Question: do we really want to reuse the scripting via setup-all.sh, or do we want something else that might be more helpful?
I'm thinking along the lines of having a configuration file and some script that makes the necessary calls.
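A minimal sketch of the "configuration file plus a script that makes the calls" idea, assuming a hypothetical file with one `:SettingName=value` entry per line. The `DRY_RUN` switch, the `apply_settings` name and the file layout are illustrative assumptions, not something that exists in the repo; only the `/api/admin/settings` endpoint is the regular admin API.

```shell
#!/bin/sh
# Hypothetical sketch: turn ":SettingName=value" lines into settings API calls.
# DRY_RUN, apply_settings and the file layout are assumptions for illustration.
DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"

apply_settings() {
  # $1: path to a config file with one ":SettingName=value" entry per line
  while IFS='=' read -r name value || [ -n "$name" ]; do
    # Skip blank lines and comment lines
    case "$name" in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default: only print what would be sent
      echo "PUT ${DATAVERSE_URL}/api/admin/settings/${name} <- ${value}"
    else
      curl -sS -X PUT -d "$value" "${DATAVERSE_URL}/api/admin/settings/${name}"
    fi
  done < "$1"
}
```

Calling `apply_settings settings.conf` prints the planned calls; setting `DRY_RUN=0` would actually send them.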
I'm open to change.
Would we have to maintain two scripts doing similar things?
Yeah, probably. It might be worth considering replacing the setup-all scripts with this, making it reusable for both
What would happen if we were even so blunt as to make this part of the app?
E.g. roles... if you really need some custom ones, use the API. But probably most stick with the default ones.
Same for root dataverse and first admin. Do we really need to use the API for this?
Same for database settings. We could read them from some file and load them in a Flyway migration
And so on and so forth
Can we small chunk this? Start with roles? That's the one you mentioned first.
Q: How do you eat an elephant? :elephant:
A: One bite at a time. :eating_utensils:
We just had a nice chat about this at tech hours.
We'll see where #9443 takes us. At the very least we'll remove hard coded ports!
No issue for this yet but if we add a new way for developers to run the API tests locally (in the new containers), as discussed, the "requires a sequence to be added to your database" test is expected to fail: https://guides.dataverse.org/en/5.13/developers/testing.html#identifier-generation
For the test requiring a sequence in postgres: https://dataverse.zulipchat.com/#narrow/stream/375812-containers/topic/running.20API.20tests/near/345214694
Alright, let's get down to serious business here. To make integration testing easier, people (@Jan Range) are asking for a GitHub Action, and it's not a good idea to always clone the whole repo just for that. We wanted some config tool / image anyway, so let's hack something together out of what we have, ship that for immediate reuse, and improve it from there.
Sure, but what are we shipping? A larger version of dvinstall.zip? Or something new?
In my head I have an image that contains just the bare necessities. So the necessary scripts, the data for those and the solr config parts
Make it a teeny tiny alpine based image to execute all the tasks
Have some nicely named shellscripts to run as entrypoints
Done
Anyone can extend that with custom logic
We can improve the scripting part
And add nice stuff like mdbtool once ready
This is kind of a small helper utility thing, bundling all the shenanigans
So it gets easy to run those on any platform and skip cloning a repo
Sounds like a plan?
This is kind of mixing milestones B and C into one solution
Btw what is a good name for this image? gdcc/config? gdcc/init? gdcc/boot?
Ok, so a new image. In a previous meeting I think we called this a configuration image. Or a bootstrapping image. I kind of like config better.
@Oliver Bertuch for testing purposes, I think this is a great solution!
For now we created a note/draft at https://github.com/orgs/IQSS/projects/34/views/17 called "configurator image" that we'll convert into an issue.
Let's call the image confighubbub :zany_face:
I converted the note/draft to an issue: configurator image for container setup #9573
"setup image" is shorter
but I'm fine with "configurator image"
I'm very interested in the naming of the image itself
What shall it be
Getting more silly by the minute. Please stop me and gimme sth to work with, thank you
@Jan Range you're online! Please vote!!!
Where's @Philip Durbin when I need one? :innocent:
@Sherry Lake what does not sound odd to you when you read it? Have your say! :smile:
Oooh I am the worst at naming things :sweat_smile: Configbaker sounds fancy, but Bootstrapper is nice too
You got off (too?) easy when you named EasyDataverse haha @Jan Range
Yea that was an easy one (no pun intended :-D)
There was a pun in there? Didn't notice. :innocent:
:beers:
lol configbaker
:birthday:
Haha looks like we got ourselves a name then
Edit: Nevermind :grinning:
Well certainly Season 3 went :face_palm: at halftime...
We'd better not follow that path. There be dragons
When Harvard Dataverse ran on bare metal the servers were called kirk and picard.
Isn't there a theory about servers being named after LotR, ST, SW like 80% of the time?
:spock:
the old Odyssey cluster at Harvard was full of Greek names for machines
Ours is called Flanders, because our group leader literally looks like him :sweat_smile:
How about gdcc/mando? It's gonna wield a fancy blade, have a neat youngling and shout "This is the way" anytime you use it
maybe start writing the README and a name will emerge
what's it going to look like on Docker Hub? the README, I mean
and will there be a :birthday: emoji?
OK, let's call it working title "configbaker" for now :innocent:
Probably it will be very similar to the other READMEs
We can decide to add more "how to use" in it as well
Like name available scripts you can use
You can call it mando as long as the README is clear :happy:
This might be helpful not needing to dig into the guide every time for that
configbaker sounds like a good codename to start with :birthday:
Oh BTW it will have a help message printed when you "just run it"
That probably should list all the scripts baked in
I added a :birthday: to the description of #9573. That's my contribution so far. :happy:
How about "it can cook up an instance"
Baking now involves cooking?
we'll workshop it some more :sweat_smile:
oh, sorry, I meant instead of bootstrap, the same sentence with icing
echo "Hello!"
echo ""
echo "This is ConfigBaker, a container image with lots of tooling to cook up a containerized Dataverse instance."
echo "It can bootstrap an instance (initial config), put icing on your Solr Search Index configuration and more."
echo ""
echo "Here's a list of command scripts available:"
Please rephrase to your liking :wink:
echo "Hello!"
echo ""
echo "I'm ConfigBaker, a container image with lots of tooling to 'bake' a containerized Dataverse instance!"
echo "I can cook up an instance (initial config), put icing on your Solr search index configuration, and more!"
echo ""
echo "Here's a list of things I can do for you:"
everybody wants to talk to bots these days so I say we go with it
Haha make ChatGPT setup a Dataverse instance for ya
What happens when you ask it that?
It tells you to RTFM:
Screen-Shot-2023-05-04-at-5.02.21-PM.png
Hahahahahahahahahaha
Utter nonsense
It ends with this:
"If you are not familiar with server administration or installing software, it is recommended that you seek the assistance of a qualified professional to help you with the installation process."
@Don Sizemore help!
Alright got a script that does the file permission fixing for us
I should create some commits and push this to some place as an image
yes!
Feel adventurous? Go try https://hub.docker.com/r/gdcc/configbaker
Here's a command for you: docker run -it --rm gdcc/configbaker:unstable
So there now is also https://github.com/IQSS/dataverse/pull/9574. Quite naked so far, but at least we started working on it!
I'm gonna close the hacking session now for today. It's been exciting! See y'all tomorrow
OK I lied - I added printing a script description to the helper script because this is user friendly (but extracting it from the script files, so this is not hardcoded...)
Works great. I love how small it is!
pdurbin@air ~ % docker run -it --rm gdcc/configbaker:unstable
Unable to find image 'gdcc/configbaker:unstable' locally
unstable: Pulling from gdcc/configbaker
c41833b44d91: Pull complete
6afcb51be105: Pull complete
8d4c0b2520e6: Pull complete
3727a8da81b4: Pull complete
Digest: sha256:47498c6d054bb4b24f13a1b3659ffe18aedc08183c1afdb1462c9f63d2d12496
Status: Downloaded newer image for gdcc/configbaker:unstable
[ASCII art logo]
Hello!
I'm ConfigBaker, a container image with lots of tooling to 'bake' a containerized Dataverse instance!
I can cook up an instance (initial config), put icing on your Solr search index configuration, and more!
Here's a list of things I can do for you:
fix-fs-permissions.sh - Fix folder permissions using 'chown' to be writeable by containers not running as root.
help.sh - This script.
Simply execute this container with the script name (and potentially arguments) as 'command'.
pdurbin@air ~ %
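A rough sketch of how a help script could pull each description out of the script files instead of hardcoding it, as mentioned above. The `# [INFO]: ...` marker line and the `list_scripts` name are assumed conventions for illustration; the real help.sh may work differently.

```shell
#!/bin/sh
# Sketch: list scripts in a directory with a description taken from each file.
# The "# [INFO]: ..." marker is an assumed convention, not confirmed by the chat.
list_scripts() {
  # $1: directory containing the *.sh scripts
  for script in "$1"/*.sh; do
    [ -f "$script" ] || continue
    # Take the first marker line and strip the marker itself
    desc="$(sed -n 's/^# \[INFO\]: *//p' "$script" | head -n 1)"
    printf '%s - %s\n' "$(basename "$script")" "${desc:-(no description)}"
  done
}
```

Adding a new script with a marker line would then be enough for it to show up in the list.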
Ok here's the first tricky question. Do we provide a full configset for Solr already or don't we?
Providing it means a very simple start command for Solr; otherwise it needs copy hacking (as we do now).
I'm fine with copy hacking for now. We can remove it later.
I'm having trouble with the compose file, the maven docker plugin and the new bakery... :sad:
This stuff doesn't really play along well
I thought I could make a compose-only build a reality for this, but it completely beats up the Maven side
:pensive:
This is kinda frustrating. I probably will go for a Maven build then first.
Sure. Sounds good.
@Philip Durbin is there a way to detect from the outside that an instance is bootstrapped?
I'd say the best way to have that is by checking for a root dataverse
But that might not be published yet
Would it make sense to add a /api/info/initialized thingy?
I'm not opposed to adding an API endpoint for this.
Checking for the root dataverse could work. @Jan Range was checking for metadata blocks, which get added later.
Maybe we could check for the very last thing that gets added, whatever that is.
Or maybe both
It definitely needs both things
OK maybe for now I'll stick to the blocks thing, thx for the idea @Jan Range
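The "check for metadata blocks" idea could look roughly like this from the outside. This is only a sketch: it assumes the instance answers on `DATAVERSE_URL` (an assumed variable name), and the grep-based JSON check is a simplification; a real JSON parser would be more robust.

```shell
#!/bin/sh
# Sketch: treat an instance as bootstrapped once it reports any metadata block.
# DATAVERSE_URL and both function names are assumptions for illustration.
DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"

has_metadata_blocks() {
  # Reads an API response on stdin; succeeds if "data" is a non-empty array
  tr -d ' \n\t' | grep -q '"data":\[[^]]'
}

is_bootstrapped() {
  # Fails if the API is unreachable or no blocks have been loaded yet
  curl -sSf "${DATAVERSE_URL}/api/metadatablocks" | has_metadata_blocks
}
```

A wait loop could call `is_bootstrapped` until it succeeds. As noted in the chat, this says nothing about whether the root dataverse has been published.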
These are the last two things the final setup script configures:
echo "Publishing root dataverse..."
curl -H "X-Dataverse-key:$API_TOKEN" -X POST "http://localhost:8080/api/dataverses/:root/actions/:publish"
echo "Allowing users to create dataverses and datasets in root..."
curl -H "X-Dataverse-key:$API_TOKEN" -X POST -H "Content-type:application/json" -d "{\"assignee\": \":authenticated-users\",\"role\": \"fullContributor\"}" "http://localhost:8080/api/dataverses/:root/assignments"
Yeah but I want sth more general ;-)
sure
I'm already adding personas so we have it easier to add more bootstrapping things
Don't have a commit yet, sry
I will touch the setup scripts in scripts/api in this PR
No way around this - I simply can't use localhost:8080 from inside a container
(The change is minimal - simply replace the static URL with a variable)
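For illustration, the "replace the static URL with a variable" change could be as small as the sketch below. The variable name `DATAVERSE_URL` and the helper `api_url` are assumptions here; see the PR for the actual diff.

```shell
#!/bin/sh
# Sketch: default to localhost, but let the caller override the base URL,
# e.g. DATAVERSE_URL=http://dataverse:8080 inside a container network.
api_url() {
  # $1: API path, e.g. /api/dataverses/:root/actions/:publish
  printf '%s%s\n' "${DATAVERSE_URL:-http://localhost:8080}" "$1"
}

# Hypothetical usage in a setup script:
#   curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$(api_url /api/dataverses/:root/actions/:publish)"
```

Existing callers that never set the variable keep hitting localhost:8080, so behavior outside containers stays the same.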
Is it worth including a fix for the issue @Akio Sone opened recently? Configurable server-port-number for the containerized Dataverse #9534
Dunno if this means we get in trouble with Kevin and QA
I fear @Akio Sone is mixing up a few things here. From inside the container network, we can simply reach out to http://dataverse:8080. There is no need to rewire these ports
At least not for configuration
If he wants to expose the container ports to some other port on his machine, that must be configured in the compose file.
Well, his use case is real (maybe a new topic). As a Java developer he has other apps he's developing running on port 8080 (on his laptop or desktop) and would like to run Dataverse on some other port.
And by the way - rewiring these internal ports is not as easy as it sounds - this needs to be rewired all the way down to the Payara server conf. I don't think that is either necessary or useful
Sure - he may simply change his local compose file
Just saying it's a real use case. :happy:
As for QA etc, can configbaker help with the SPA? That's the way to get buy in.
https://docs.docker.com/compose/compose-file/compose-file-v2/#ports
Either specify both ports (HOST:CONTAINER), or just the container port (an ephemeral host port is chosen).
Yes of course it can!
It will simply make calling the bootstrap script unnecessary!
Ok, let's make sure we talk about this at the top of the PR.
"why we need this"
Simply because it can sit there as an init container, started alongside the others when you run compose
Good point!
OK I'll go ahead and make some commits so you folks can see some results
We could even wordsmith #9573 around the SPA. Maybe even put it in the SPA column if we do a good job selling it. :birthday:
possible new titles:
Do they get any new features?
Or is the main thing that they don't have to run the final setup script?
Yeah that is the main thing
Ok. Small chunks are good!
OK I just pushed the changes to the setup scripts
It removes the necessity to cd to the script's dir first and adds the override cap
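One common way to drop the "cd into the script's directory first" requirement is to resolve the script's own location and build paths from it. A sketch under that assumption (the helper name `script_dir` and the file path in the usage comment are hypothetical; the actual commit may do it differently):

```shell
#!/bin/sh
# Sketch: resolve the directory a script lives in, so data files can be
# referenced without changing the caller's working directory.
script_dir() {
  # $1: path to the script, typically "$0"
  # The subshell keeps the cd from leaking into the caller
  ( cd -- "$(dirname -- "$1")" && pwd )
}

# Hypothetical usage inside a setup script:
#   SCRIPT_PATH="$(script_dir "$0")"
#   cat "${SCRIPT_PATH}/data/some-file.json"
```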
Cool. I'm looking at it (#9574). Did you see my comments from before?
Yes but I wanted cosmetics later :yum:
Ok, I added a review.
Not much to say. Looks good. Should work.
I'd probably want to try it with my dev rebuild script.
That is, outside the context of containers.
Sounds good!
One can also use the "old script" for initializing containers
It makes use of the scripts
So this should be easy to testdrive
Good point.
And Jenkins will exercise setup-all, I believe.
Thank god for @Don Sizemore having us covered!
@Don Sizemore
Getting closer...
Anyone feeling adventurous? I just pushed a new version of configbaker to Docker Hub
And it's looking pret-ty good around here, with it bootstrapping things for ya!
@Philip Durbin I just pushed all the changes I have so far
One should be able to run mvn -Pct clean package docker:run and see the magic happen!
Works on my machine... :yum:
@Guillermo Portas dunno if you're feeling adventurous as well :wink:
orly
Yeah rly :big_smile:
I love how friendly it is!
dev_bootstrap> Done, your instance has been configured for development. Have a nice day!
And that it works. Another triumph by @Oliver Bertuch!
It's a baker after all :birthday::cake::cookie::donut::pie:
This should be presented every time
I just tried to convert that into Ascii Art... Too many details :wink:
But actually I thought about making an ascii art presentation of a cake and have the text nicely added to the side
This one is maybe more suited for Ascii if we are able to edit it :-D
Alright, finished the large review for Jim, onto new adventures! Hit me with things about this baker. After proofing the concept now's the time to extend and polish
Should we go ahead and merge https://github.com/IQSS/dataverse-frontend/pull/87 and worry about switching to configbaker later?
Sure, why not. There will always be room for improvement
Except @Guillermo Portas wants to avoid touching it again :wink:
Ok. I won't pull it out of "ready for QA" then.
@Philip Durbin today you are in the luxurious position of telling me what to look into next: go for more baking or start a new PoC for a GitHub Action
Can you please start a branch for docs? Even a tiny diff helps. https://dataverse.zulipchat.com/#narrow/stream/375812-containers/topic/docs.20for.20milestone.20A.20.239540
In the backlog, what size should I give #9574?
Are you counting your experimentation as well? Then maybe a 10? Otherwise a 3
A 3 it is, thanks.
I have docs above it. I hope you don't mind.
Meaning you want to see the docs PR done first? No problemo senior!
Yeah. I'm talking about the column in the backlog. We should put at the top stuff that we want to be put in a sprint.
Of course, docs do have a fast track option. :thinking:
I put config baker first. :birthday:
When will it no longer be draft?
Probably we should think about a definition of done for it...
It seems like it already delivers value. Ship it? :rocket:
But it sounds like we should finish the docs for milestone A first so we can call them "done"
Oh no, please let me polish things a bit more
And it definitely needs docs
Yes, we can improve the local frontend environment with the configbaker later, so we can start using it as it is for now :)
So what _is_ a definition of done for this?
I think we still need:
That should be it for now, as long as we don't have stuff like mdbtools ready. We could add a helper for auth providers
Quite a list! :sweat_smile:
I added :birthday: configbaker :birthday: to tomorrow's agenda: https://docs.google.com/document/d/1eQVm88dP2rgM9DKn4ivoWBx6MOK6aXfkLhsZN-Y3fsc/edit?usp=sharing
"Brought to you by your local German baker" (Pun intended)
As we just discussed, #9574 is still draft. Let's make it non-draft so I can advocate to get it tested and merged. How can I help?
Hmm maybe draft some docs?
Should I push them directly to your branch?
Also, I just brought this up at standup. I think we can get it in.
Great!
Yes, please feel free to push into my branch
That's probably the easiest way forward
@Oliver Bertuch do you want to keep the support email as-is? https://github.com/IQSS/dataverse/pull/9574/files#r1185609395
$ ack forschungsdaten
modules/container-configbaker/Dockerfile
47: org.opencontainers.image.authors="Research Data Management at FZJ <forschungsdaten@fz-juelich.de>" \
modules/container-base/src/main/docker/Dockerfile
225: org.opencontainers.image.authors="Research Data Management at FZJ <forschungsdaten@fz-juelich.de>" \
src/main/docker/Dockerfile
45: org.opencontainers.image.authors="Research Data Management at FZJ <forschungsdaten@fz-juelich.de>" \
I'm fine with whatever. I'm somewhat obliged to mark things I did but sure, we can change those.
To what BTW
Maybe we should start using the REUSE framework
ok
I'm having trouble building the configbaker image.
I was trying to document it.
no pom.xml file
I guess I'll just remove Build Instructions
@Oliver Bertuch ok, I pushed some docs: #9574
Is it ready for QA? Or should we also add a GitHub Action?
... to build the image and push it to registries, I mean
Absolutely, we should have a CI workflow for that around.
Before we send it to QA?
Yes please!
Sorry, just back to desk
You probably already headed home :wink:
Nope. 15 minutes left. Need to bake a real cake tonight. The older kiddo has a birthday tomorrow.
I'd say getting the images pushed is the priority but I did just leave a comment about SOLR_URL here: https://github.com/IQSS/dataverse/pull/9574/files#r1205986701
I'd say you can simply ignore that one
And maybe more docs. I dunno. I added some, at least. Probably good enough.
To me this line doesn't make much sense. When I'm setting up Dataverse, the index should be empty.
Why would I delete it before going on?
Maybe this was added to the installer when someone wants to use it to start fresh
It's just a legacy line from my "drop database and start over" script.
But then I'm not sure why it wasn't added to that drop and rebuild script
Well we could be bold and just say drop the line
Meh. I vote later.
You're just afraid of QA :melting_face:
Just trying to keep PRs small. :upside_down:
Huh? The line is already being changed and the SOLR_URL defined as an additional line. Deleting the line would actually be of the same size or less?
Or are you referring to mental load PR size?
I'm always trying to avoid questions and discussion. :happy:
Ha now I know why this codebase is full of tech debt :stuck_out_tongue_closed_eyes: :wink:
OK here's our todo list for configbaker again:
Wow, I added that "clear out Solr" line back when we were still indexing users: https://github.com/IQSS/dataverse/commit/42692f5681ee65ea982b66656d7683bf0e015725
It's been sitting there for a while :smiley:
I can remove it if you want.
Or shall I do that commit? Then it's me to blame :see_no_evil:
No, I got it. One sec.
You'd have a believable excuse: "I told him so!"
ok, pushed
For the bullet above about docs, I did add some.
Yeah I'm just about to read through those! Lovely!
I couldn't figure out how to build the image.
It's built together with the app image
Because they are tightly coupled
If you change some scripts etc., you need to use that with the corresponding app image
That's sort of what I figured. Is it possible to build them separately? If so, should we document it?
Yeah that should be possible - you can filter the images.
Ok. If you feel like testing and documenting it, please feel free, but I'd say it's a low priority.
OK.
I think I'll start by merging in develop and then take a look at the compose file, get that done.
Sounds good. I'd better get home. :birthday:
Go bake bake bake away!
Good luck!
(No fish please)
Sorry I didn't get far with Configbaker this evening. I first had to fix https://github.com/IQSS/dataverse/issues/9617. These random test failures are driving me nuts... :angry:
Thanks! I moved it to QA.
Lovely! It hit me again yesterday when hacking on config baker. I wonder why y'all aren't receiving that many messages from Actions
Not sure. There have been flakier tests. But good to fix this one. Thanks again.
Maybe I just got lucky and had all the failing ones
Oh, I would have written some docs on how to extend configbaker with your own scripts, but I wasn't sure how this works. Do you simply fork configbaker, add your scripts, commit and push, build your image, maybe push to a registry, and run it?
Or maybe you put your custom scripts on disk somewhere and point configbaker at them?
I wouldn't call it fork. You could use it as a base image. But yes, you could also just mount the scripts
This would be especially true for the personas when bootstrapping
Either for testing without rebuilding the image (also, that is really, really fast)
Ok. Mount the scripts.
Or you might want to prepare a persona package in a pipeline (e.g. add special blocks / users / ...) and execute the pipeline's result
Both are possible
Both make a lot of sense
Depends on your use case
Ok. It might be nice to document one of these options.
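As a sketch of the "just mount the scripts" option: the helper below only builds the docker command so it can be reviewed before running. The in-container path `/scripts/bootstrap/<name>`, the `bootstrap.sh` entrypoint and the `persona_cmd` name are assumptions for illustration; run the image without arguments to see which scripts it really ships.

```shell
#!/bin/sh
# Sketch: build (not run) a docker command that mounts a local persona directory.
# The mount target and the bootstrap.sh script name are assumptions, not
# confirmed in this thread.
persona_cmd() {
  # $1: local persona directory, $2: persona name
  printf 'docker run --rm -v %s:/scripts/bootstrap/%s gdcc/configbaker:unstable bootstrap.sh %s\n' \
    "$1" "$2" "$2"
}
```

For example, `persona_cmd "$(pwd)/my-persona" my-persona` prints the command. In practice the container would also need to share a network with the running Dataverse instance.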
Any hacking you want me to do on the 9573-configbaker branch? Or should I leave it alone (or hack in a separate branch)?
Oh plz go ahead and hack on it. I'm preparing a poster... :-( Deadline coming up fast
Ok. I'm trying to get some other stuff done too. I'll make noise here if I'm going to start. I promise nothing. :happy:
@Philip Durbin question: are we fine with configbaker being built alongside the app image? Would it make sense to split that up a little more?
I'm fine with how it is.
That said, configbaker doesn't have much to do with Java, does it? Perhaps we could just have a simple docker build process instead of using mvn.
No we can't.
I don't want any more bash glue around docker-compose just to get the files into place etc
Native docker is really bad at getting not-aligned-for-Docker collections of files together
The workaround is "send the complete project into the context", which means transferring all folder and subfolders during a build
(That's slow)
@Philip Durbin et al - WDYT https://dataverse-guide--9574.org.readthedocs.build/en/9574/container/configbaker-image.html
Thanks for adding build instructions!
And the examples. Mounting scripts for a persona.
We can add more as we go and see fit
There are a lot of possibilities
Only docs left, according to the strikeouts above. Want me to hack a bit and then move it to QA?
If you want we can have a cowriting session
Let's get this done and in the pipeline for Kevin :smile:
The leftover things for docs: maybe iterate on the guide page AND update the README on Docker Hub. We need to think about what we want on that page
I sort of want the README on DockerHub to just say "please read the container guide"
Fair enough!
I'll start hacking on it a little
Wait, you're changing it from ConfigBaker to Config Baker?
I think it's easier to read that way
The image tag is still configbaker because no spaces in those
Ok, but we should be consistent. I'm hacking on docs locally. Will push.
@Oliver Bertuch are you ready to make the configbaker PR non-draft?
Did you hack on that README?
I didn't because you said you're up to sth
A tiny bit.
Should we s/dev_bootstrap/dev_configbaker/?
I wouldn't. The service is doing the bootstrapping. The solr initializer uses configbaker as well. Let's stick to names of functionality
I just pushed a small commit to make things more consistent with Config Baker
I think I'm satisfied with this as is for now
Flagged as "ready to review"
Hmm, I'm sort of in git stash hell now. :flame:
I see an IT failing... DatasetsIT.testCuratePublishedDatasetVersionCommand
ah, this is helping: https://stackoverflow.com/questions/8515729/how-to-abort-a-stash-pop/60444590#60444590
Ha but it wasn't us! Develop is failing with the same result! https://jenkins.dataverse.org/blue/organizations/jenkins/IQSS-dataverse-develop/detail/IQSS-dataverse-develop/1359/pipeline
Also here: https://github.com/IQSS/dataverse/pull/9558#issuecomment-1523864443
I left a comment here an hour ago: https://github.com/IQSS/dataverse/pull/9558#issuecomment-1570229407
Approved! Off to QA.
I added a quickstart. I hope you don't mind.
Ha! How dare you! :yum:
For people like me that are like "how do I run a container again? I forget"
Hey, there's already an image at https://hub.docker.com/r/gdcc/configbaker
22 days old though. I guess we shouldn't tell people to use it.
I had to create the repo at Docker Hub to set it up, set permissions etc
I can always push a new version manually (like I did with the one already there)
Configbaker is merged!!!
:partying_face:
:tada:
Including configbaker in the frontend local dev environment: https://github.com/IQSS/dataverse-frontend/commit/4034defae61d03f7217bf4ee9ff61d6338ccbe0d :wink:
Nice! Less code to maintain! :tada:
We have Config Baker now! https://guides.dataverse.org/en/5.14/container/configbaker-image.html
I'm resolving this topic. Please open new ones for followup.
Philip Durbin has marked this topic as resolved.
Last updated: Oct 30 2025 at 05:14 UTC