Stream: containers

Topic: containers for production


view this post on Zulip Jutta Schnabel (Apr 23 2024 at 12:59):

Hi! Sorry for opening this up again, but I am again not sure where to put my question :) We have finally finished our evaluation and want to deploy Dataverse now in production - but ideally using containers. Is there still a big "don't use this in production" flag on it all? If yes, perhaps we can contribute to create a setup for production.

view this post on Zulip Philip Durbin 🚀 (Apr 23 2024 at 13:02):

@Jutta Schnabel hi! Great question. In our documentation, we are still being cautious about recommending containers for production but there are brave souls already using them!

view this post on Zulip Philip Durbin 🚀 (Apr 23 2024 at 13:03):

Security is top of mind for me. When working on the tutorial for using containers as a demo, I made sure to show how to run the setup-all.sh script WITHOUT the --insecure flag.

view this post on Zulip Philip Durbin 🚀 (Apr 23 2024 at 13:04):

Please see https://guides.dataverse.org/en/6.2/container/running/demo.html#creating-and-running-a-demo-persona for more on this.

view this post on Zulip Philip Durbin 🚀 (Apr 23 2024 at 13:04):

Especially this:

"One of the main differences between the “dev” persona and our new “demo” persona is that we are now running the setup-all script without the --insecure flag. This makes our installation more secure, though it does block “admin” APIs that are useful for configuration."

view this post on Zulip Philip Durbin 🚀 (Apr 23 2024 at 13:07):

You could go to the equivalent of https://demo.dataverse.org/api/admin/index/solr/schema on your server and make sure you can't reach it from outside. If you can, please see https://guides.dataverse.org/en/6.2/installation/config.html#blocking-api-endpoints
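Editor's sketch of the check described above, assuming you run it from a machine outside your network. The host `dataverse.example.org` is a placeholder, and the helper names `probe_status` and `is_exposed` are made up for illustration. It only treats a 2xx answer as "exposed", because the exact status a blocked endpoint returns depends on your blocking setup.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers are raised as exceptions; the status is in err.code.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout: nothing answered at all.
        return None


def is_exposed(status: Optional[int]) -> bool:
    """Assumption: only a 2xx answer means the admin API is reachable."""
    return status is not None and 200 <= status < 300


# Usage (hypothetical host; run from outside your network):
# url = "https://dataverse.example.org/api/admin/index/solr/schema"
# print("EXPOSED" if is_exposed(probe_status(url)) else "not reachable")
```

If this reports "EXPOSED", the blocking section of the installation guide linked above is the place to look.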

view this post on Zulip Philip Durbin 🚀 (Apr 23 2024 at 13:12):

What do others think? What am I forgetting? What do we need for the images to be production-ready? More docs? More configurability?

view this post on Zulip Johannes D (Apr 23 2024 at 13:29):

Stable image tags like 6.2/6.3 are needed for me. Besides that I'm happy with the current containers and they work great in production :)

view this post on Zulip Philip Durbin 🚀 (Apr 23 2024 at 13:30):

That makes sense. Currently our tags are "alpha" and "unstable".
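Editor's sketch of why rolling tags matter for production: a small check that flags image references which can change underneath you. The tag names "alpha" and "unstable" come from the message above; treating "latest" and a missing tag as rolling is an added assumption, and `gdcc/dataverse:6.2` in the usage lines is only an example of the kind of tag being asked for, not a claim that it exists.

```python
from typing import Optional

# "alpha" and "unstable" are the tags named in the thread; "latest" is assumed.
ROLLING_TAGS = {"alpha", "unstable", "latest"}


def image_tag(ref: str) -> Optional[str]:
    """Extract the tag from an image reference, or None if it has no tag."""
    name = ref.split("@", 1)[0]        # drop a digest suffix, if any
    last = name.rsplit("/", 1)[-1]     # ignore a registry host:port prefix
    return last.split(":", 1)[1] if ":" in last else None


def is_rolling(ref: str) -> bool:
    """True if the reference may point at different images over time."""
    if "@" in ref:                     # pinned by digest
        return False
    tag = image_tag(ref)
    return tag is None or tag in ROLLING_TAGS


# Usage:
# is_rolling("gdcc/dataverse:unstable")  -> rolling, re-pulls may differ
# is_rolling("gdcc/dataverse:6.2")       -> pinned to a release-style tag
```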

view this post on Zulip Philip Durbin 🚀 (Apr 23 2024 at 13:35):

We've definitely talked about tags in recent meetings (#containers > weekly meeting ). Probably the best issue to watch is #10478. Plus we have the following topics going:

view this post on Zulip Jutta Schnabel (Apr 23 2024 at 13:36):

Great, thanks, that sounds hopeful :) We will try that out and see that we manage.

view this post on Zulip Philip Durbin 🚀 (Apr 23 2024 at 13:39):

Sounds good. Please keep the feedback coming!

view this post on Zulip Johannes D (Apr 23 2024 at 13:41):

Besides that I assume the different projects and their means to create images https://github.com/gdcc/dataverse-kubernetes , https://github.com/EOSC-synergy/dataverse-kubernetes , https://github.com/IQSS/dataverse-docker and https://github.com/IQSS/dataverse/blob/develop/docker-compose-dev.yml are a bit confusing for someone new to the community. Hence, my wish-list contains a clean-up, or more detailed documentation or clarification about the projects and their relationship (they all look like official GDCC/IQSS projects)...

view this post on Zulip Philip Durbin 🚀 (Apr 23 2024 at 14:41):

@Johannes D good idea. I just created this issue: Document competing containerization efforts and how to choose #10522

view this post on Zulip Johannes D (Apr 23 2024 at 15:08):

Not entirely a container issue, but it would be nice to rework the documentation so that it is independent of the installation method. In particular, the admin and installation guides are in some cases quite specific to a particular installation method. This makes both demo/eval and production use cases more complex than they need to be.

view this post on Zulip Oliver Bertuch (Apr 23 2024 at 15:11):

#10478

Heads up I will be out all of May (as of next Monday) and will continue working on this in June

view this post on Zulip Oliver Bertuch (Apr 23 2024 at 15:13):

Johannes D said:

are a bit confusing for someone new to the community

I agree. I've been meaning to sunset gdcc/dataverse-kubernetes for some time now, but didn't have the time. A PR is welcome. (So don't delete but leave a hint in the README and archive it as read-only)

view this post on Zulip Johannes D (Apr 23 2024 at 15:13):

And documentation of the features that do not work out of the box in a Docker environment would be nice. Like Rserve, Make Data Count, or multiple Dataverse instances... They work, but need a bit of tweaking.

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:06):

Heads up that #10672 is ready for review! It will increase production usability :smiley:

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:07):

@Philip Durbin are these container-only things going through the same process with sprint planning etc., or can/should we fast-track it?

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 14:09):

My rule of thumb is to look at the files changed and to put it on the fast track if the changes only affect containers. This one can be fast-tracked, I'd say. Please feel free to put it in "ready for review" if you like.
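Editor's sketch of that rule of thumb as a check over a pull request's changed files. The path prefixes are assumptions about where container-related files live in the repository (only `docker-compose-dev.yml` is named in this thread), and `only_affects_containers` is a made-up helper name.

```python
from typing import Iterable

# Assumed locations of container-only files; adjust to the real repo layout.
CONTAINER_PATH_PREFIXES = (
    "modules/container-base/",
    "modules/container-configbaker/",
    "src/main/docker/",
    "doc/sphinx-guides/source/container/",
)
CONTAINER_FILES = {"docker-compose-dev.yml"}


def only_affects_containers(changed_files: Iterable[str]) -> bool:
    """True if every changed path is a container-related file."""
    files = list(changed_files)
    if not files:
        return False  # nothing changed, so nothing to fast-track
    return all(
        path in CONTAINER_FILES or path.startswith(CONTAINER_PATH_PREFIXES)
        for path in files
    )
```

A single application source file in the list is enough to send the PR through the regular process.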

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:12):

Done!

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:13):

It's good to have this in place - we can expect Temurin images based on Ubuntu 24.04 to land within the next 4 to 5 weeks. Always good to be prepared!

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 14:13):

yeah

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 14:13):

I don't think anyone here can easily do this:

Suggestions on how to test this:
Run the images on a K8s cluster

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:14):

Ha! I'll edit it to say just run them in Docker :smiling:

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 14:15):

Do you think you could help get https://github.com/gdcc/api-test-runner/blob/main/.github/workflows/manual.yml working again to test it?

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:16):

Not sure - we're not talking Dataverse code here

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:17):

The test is done once you successfully deploy - this is infrastructure, so the application doesn't matter

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:17):

You could even try the base image with some other demo app

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 14:17):

Oh, even if it were working the "manual" workflow wouldn't test it?

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:17):

I dunno if we should at some point include a minimal testing app for automation of base image acceptance tests

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:18):

Well it tests that it builds (that is done in CI already), the one thing left to do is run an actual application...

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:18):

That doesn't need to be Dataverse, which is huge and clunky

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 14:19):

Sure, sounds very useful.

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:19):

Do you feel like it would be a good addition to this PR to have such small-scale tests around?

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 14:21):

Hmm, maybe? I mean, we want more testing of images before we publish them, generally.

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:21):

True!

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:21):

Do you feel we need to do this now or should we keep that for another issue/PR?

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 14:21):

Meh, I don't think we need it now.

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 14:22):

Some day I'd like to retire that api-test-runner repo and have the testing done upstream.

view this post on Zulip Oliver Bertuch (Jul 15 2024 at 14:22):

It might be a nice thing to try out https://github.com/arquillian/arquillian-testcontainers for this :smile_cat:

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 14:25):

Related: #containers > failing tests in 6.3 from api-test-runner

view this post on Zulip Philip Durbin 🚀 (Jul 15 2024 at 19:41):

@Oliver Bertuch I removed you as an assignee from #10672. Items in "ready for review" shouldn't have assignees so it's clear that anyone can pick them up and review them.

view this post on Zulip Philip Durbin 🚀 (Oct 02 2025 at 14:45):

I just made this pull request:

update docs to suggest using Docker in production #11862

You can preview the docs here: https://dataverse-guide--11862.org.readthedocs.build/en/11862/installation/prep.html#choose-your-own-installation-adventure

What do you think?

view this post on Zulip Johannes D (Oct 06 2025 at 09:06):

Since we have been using those containers in production for the last couple of years, those images are ready for production usage... and the guide reflects it. However, one could add the information that Make Data Count does not work that well, and a section about backup & restore in a Docker environment is missing.

view this post on Zulip Philip Durbin 🚀 (Oct 16 2025 at 14:53):

Hmm. I don't feel very qualified to write about backup and restore. What if we allow #11862 to be reviewed and merged and make a PR in the future for that?

view this post on Zulip Philip Durbin 🚀 (Oct 16 2025 at 14:54):

I'm also not sure what I would write about Make Data Count. @Johannes D if you want to make a PR into my PR, please go ahead! :smile:

view this post on Zulip Philip Durbin 🚀 (Oct 16 2025 at 14:55):

Maybe I should try to address this as well:

Document competing containerization efforts and how to choose #10522

I do feel pretty qualified to write about that. :smile:

view this post on Zulip Oliver Bertuch (Nov 03 2025 at 15:06):

I think this would be a good addition for a production scenario. Not just in containers... :wink: #11948 CC @Leo Andreev

view this post on Zulip Philip Durbin 🚀 (Nov 03 2025 at 15:56):

yeah

view this post on Zulip Philip Durbin 🚀 (Nov 03 2025 at 15:57):

@Oliver Bertuch any feedback on the docs I wrote?

view this post on Zulip Philip Durbin 🚀 (Nov 18 2025 at 13:38):

#11862 has been merged! Docker in production! Thanks for reviewing and merging, @Steven Winship!

view this post on Zulip Leo Andreev (Apr 16 2026 at 15:54):

Can I ask a dumb question about k8s? @Oliver Bertuch I get the "not quite production-ready" part. But you have been using it in your production for a while. The way our production works here, running it and keeping it alive relies on having people with ssh access going in and making various tweaks. I'm not just talking about tasks that must be done via localhost-restricted APIs. We have an ecosystem of scripts outside of the application proper that generate reports, validate metadata, etc. It's a very common scenario where I need to add another regex to the anti-spam script, for example. If I see an aggressive bot abusing the APIs, I can quickly block the IP by adding an Apache rule. What is a proper equivalent of such real-time work under k8s? It is of course possible to ssh into a pod and mess with things the same way... but that would obviously not be ideal.

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 15:56):

You can work quite similarly to that in Kubernetes. There is no problem running a command inside a running container (no SSH, but kubectl exec).
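Editor's sketch of what "no SSH, but kubectl exec" can look like when scripted. It only builds the argument list for `kubectl exec`; the pod, namespace and container names in the usage lines are hypothetical, and `kubectl_exec_argv` is a made-up helper.

```python
from typing import List, Optional, Sequence


def kubectl_exec_argv(
    pod: str,
    command: Sequence[str],
    namespace: Optional[str] = None,
    container: Optional[str] = None,
) -> List[str]:
    """Build the argv for running a command inside a running container."""
    if not command:
        raise ValueError("command must not be empty")
    argv = ["kubectl", "exec"]
    if namespace:
        argv += ["-n", namespace]
    argv.append(pod)
    if container:
        argv += ["-c", container]
    # Everything after "--" is passed to the container, not parsed by kubectl.
    argv.append("--")
    argv.extend(command)
    return argv


# Usage (hypothetical names; requires kubectl and cluster access):
# import subprocess
# argv = kubectl_exec_argv("dataverse-0", ["ls", "/tmp"], namespace="dv")
# subprocess.run(argv, check=True)
```

Passing the command as a list avoids shell quoting issues on the local side.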

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 15:57):

The caveat is what's actually available to you inside a container

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 15:57):

Container images should be as small as possible and as stripped down as possible

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 15:58):

This reduces attack vectors, reduces data transfer amounts and makes things stricter, tidier

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 15:59):

For example, all the bits like setup scripts and other things we need for configuration are not a part of the Dataverse image, but are in configbaker

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 15:59):

This way we don't need to pollute the container filesystem of the important application with unnecessary stuff like Python etc.

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:01):

It's important to keep the Kubernetes paradigm in mind: containers are not the atomic operation unit, pods are!

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:02):

And a pod is more like an atom in real life: it's made up of more things :wink:

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:02):

A pod can consist of three things: containers (protons), sidecar containers (electrons) and init containers (neutrons)

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:03):

Usually you'll have one main container, for example for your application

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:03):

This is accompanied by sidecars, that run alongside the main container

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:04):

These sidecars are the helpers for logging, exposing things in controlled fashions, run reports etc

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:05):

All of it usually aims to follow UNIX philosophy of "do one thing and do it well". It's cheap to have more sidecars and containers, so use 'em

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:05):

These containers bundled in a pod share a "localhost network". So a sidecar can reach something in the main container via localhost:port

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:06):

This is quite powerful, as you sometimes might want to expose something in the app server (e.g. the JMX port of Payara) to localhost only, but then reach that from somewhere else. A sidecar can be a safe bridge into that, potentially handling TLS, auth, etc.

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:07):

These containers can also share any number of mounted filesystems. This way the main container can write something like a report and a sidecar can pick that up for processing
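Editor's sketch of the pod layout described in the last few messages: one main container plus a sidecar, both mounting the same volume so the sidecar can pick up files the main container writes. It builds a plain Pod manifest as a Python dict and uses the classic "second container" sidecar pattern. All names and images are placeholders, and `pod_with_sidecar` is a made-up helper.

```python
import json
from typing import Any, Dict


def pod_with_sidecar(
    name: str,
    app_image: str,
    sidecar_image: str,
    shared_path: str = "/reports",
) -> Dict[str, Any]:
    """Return a Pod manifest whose two containers share one emptyDir volume."""
    mount = {"name": "shared", "mountPath": shared_path}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # emptyDir lives as long as the pod and is visible to both containers.
            "volumes": [{"name": "shared", "emptyDir": {}}],
            "containers": [
                {"name": "app", "image": app_image, "volumeMounts": [dict(mount)]},
                {"name": "sidecar", "image": sidecar_image, "volumeMounts": [dict(mount)]},
            ],
        },
    }


# Usage (placeholder images); kubectl accepts JSON as well as YAML:
# print(json.dumps(pod_with_sidecar("demo", "example/app:1", "example/reports:1"), indent=2))
```

Because both containers sit in one pod, the sidecar could also reach the app at localhost:port without any extra networking.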

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:08):

It's also possible to add a container to a pod at runtime. This is a neat trick often used for debugging or other special purposes.

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:09):

As the injected container shares the network and can mount the volumes in the pod, you can do something on the fly like running a special script you prepared and packaged in a container image.
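Editor's sketch of the "add a container at runtime" trick, assuming `kubectl debug` is how you inject it. The helper only builds the argument list; pod, image and container names are hypothetical. Mounting the pod's volumes into the injected container is not covered here and may need extra configuration.

```python
from typing import List, Optional


def kubectl_debug_argv(
    pod: str,
    image: str,
    target: Optional[str] = None,
    namespace: Optional[str] = None,
) -> List[str]:
    """Build the argv for attaching an ephemeral debug container to a pod."""
    argv = ["kubectl", "debug", "-it", pod, f"--image={image}"]
    if target:
        # Target an existing container to share its process namespace.
        argv.append(f"--target={target}")
    if namespace:
        argv += ["-n", namespace]
    return argv


# Usage (hypothetical names; requires kubectl and cluster access):
# import subprocess
# subprocess.run(kubectl_debug_argv("dataverse-0", "example/my-script:1", target="dataverse"))
```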

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:10):

What you usually should avoid: create a volume, store scripts there and execute as needed by entering the container. This is possible, but it violates the K8s principle of "no pets". You introduce unnecessary state, which is usually an anti-pattern.

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:11):

If you need to experiment with stuff and need to develop a script, you're usually better off creating port forwards from your development machine to the running pods/services.

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:11):

That way you can code on your machine, run the script etc all without having to deal with "how do I get this into the pod"

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:12):

This obviously has its limits when it comes to things like capturing network traffic or other low level stuff

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:12):

For the networking part you mentioned changing rules etc

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:14):

In modern Kubernetes deployments you usually will have some kind of middleware as your TLS handling and routing gateway. Formerly you'd use the "K8s Ingress API" that is being handled by an Ingress Controller. The current, newer approach that evolved from that is the Gateway API.

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:15):

Any restrictions like blocking external traffic, adding things like Anubis or Web Application Firewalls are handled at these levels

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:16):

It's of course possible to run another proxy like Apache or Nginx between ingress and the actual application, but I think this is less common these days.

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:17):

To get the configuration into this middleware, you will need to handle either ConfigMaps or Custom Resources defined by and depending on the middleware.

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:17):

You can either manage these yourself with kubectl

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:17):

Or follow the modern approach, using GitOps

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:18):

When starting fresh I'd always recommend using GitOps. Infrastructure as Code has been a well-known good, if not best, practice for a long time now.
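Editor's sketch of what "block an IP" could look like as versioned configuration instead of a hand-edited Apache rule: a list of addresses kept in Git and rendered into a ConfigMap. How the gateway or ingress middleware consumes that ConfigMap is middleware-specific and not shown; the key `denylist.txt` and the helper `denylist_configmap` are made up for illustration.

```python
import ipaddress
from typing import Any, Dict, Iterable


def denylist_configmap(
    name: str,
    cidrs: Iterable[str],
    key: str = "denylist.txt",
) -> Dict[str, Any]:
    """Validate IPs/CIDRs and render them into a ConfigMap manifest."""
    # ip_network raises ValueError on bad input, so typos fail before deploy.
    networks = sorted({str(ipaddress.ip_network(c, strict=False)) for c in cidrs})
    body = "\n".join(networks) + "\n" if networks else ""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {key: body},
    }


# Usage: add the bot's address to the list, commit, and let GitOps tooling
# (or kubectl apply) roll it out.
# denylist_configmap("blocked-clients", ["203.0.113.7", "198.51.100.0/24"])
```

The point of the design is that the change is reviewed and reproducible, rather than living only on one server.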

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:18):

But again, that's just my opinion

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:19):

You are free to use other tooling, do it manually, etc

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:19):

Depends on your needs and what you and the team feel comfortable with

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:22):

I think it's fair to say that working with Kubernetes combines classic shell-based administration with a lot of automation and abstraction. This requires more and new skills, but it's also more reliable than your average hacky pet. Obviously Harvard Dataverse is not a hacky pet, you all know what you're doing. But it requires a lot of context and experience to know all the details, while adopting IaC makes things more reproducible and rebuildable.

view this post on Zulip Oliver Bertuch (Apr 16 2026 at 16:23):

Admins adopting K8s will probably need to adapt their way of thinking about how to run a server. But that's not a bad thing - it keeps your grey matter lively :wink:

view this post on Zulip Leo Andreev (Apr 16 2026 at 17:50):

Thank you, really appreciate the info!
I may bug you with followup questions later on.
:pray:

view this post on Zulip Oliver Bertuch (Apr 23 2026 at 14:15):

Today I learned about https://github.com/kimdre/doco-cd . Very interesting! Maybe I should switch my DCM26 workshop from Flux+K8s to a potentially simpler DoCoCD?


Last updated: May 30 2026 at 09:11 UTC