As discussed during yesterday's Containerization Working Group meeting, there are now containers available for API testing!
In the past I've usually pointed authors of API client libraries to https://demo.dataverse.org for testing a live Dataverse installation, but is there interest in using GitHub Actions or similar to spin up Dataverse in a container for API testing?
Here's the newly published container: https://hub.docker.com/r/gdcc/dataverse
Please be advised that the container requires setup! For now we've only documented the setup for developers (and we plan to improve these docs) but as explained in the Dataverse Containerization doc, we have an item under milestone B to document how client library authors can use these containers. Here's what it says for now:
"Document limitations: must clone main repo (or at least download the necessary files, similar to dvinstall.zip or download the whole repo as a zip), must run docker-final-setup.sh"
Any brave souls out there who want to give this a try? :sweat_smile: I'm happy to help!
Philip Durbin said:
Yes, I'm happy to help test, though it likely won't be for the next few days.
Happy to test it!
To be honest, I think we might need @Oliver Bertuch to explain (again) how to spin up our own containers within GitHub Actions. In the doc, it's worded like this (under milestone D):
"Smoke test within GitHub Actions: deploy and bootstrap, make logs available (Size: 33)"
"This smoke test here is about creating a first step to see if the deployment works at all and shall serve as intermediate step towards the full thing in Milestone G."
@Don Sizemore @Jan Range have either of you tried running the containers on a dev laptop or whatever? I'm happy to walk you through it. Now that the containers are out there you shouldn't even need Java installed. You will have to clone the main repo, though. That's where the docker compose file is and some config and scripts you'll need.
Maybe this will help? https://github.blog/2022-02-02-build-ci-cd-pipeline-github-actions-four-steps/
Code at https://github.com/open-sauced/open-sauced
I was trying to spin up the container using mvn -Pct clean package but it failed. Got the following error message:
Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project dataverse: There are test failures.
Maybe it's also due to my MacBook; will try it on an Ubuntu server.
That is very odd as testing is usually deactivated in this profile to speed up things...
It's unlikely that your MacBook is the cause
If you're feeling lucky you could try with compose only
The dataverse image is available from Docker Hub
It might be a good idea to just copy the compose file from the IQSS repo and add that to your project... It will fail for now with the Solr things, but you could temporarily copy those over until we have that stuff sorted out
Would it help you lot if I create a demo repo someplace?
So we have some small playground?
I checked out the develop branch; maybe this is the cause?
Trying it now with the Dockerhub image
Is it working? Is no news good news? :happy:
Sorry I needed to leave. Tried it now using mvn -Pct docker:run but I ran into the following error:
DOCKER> Error occurred during container startup, shutting down...
[ERROR] DOCKER> I/O Error [Unable to pull 'iq/dataverse:5.13' : {"message":"pull access denied for iq/dataverse, repository does not exist or may require 'docker login': denied: requested access to the resource is denied"} (Not Found: 404)]
I then spun up the Dockerhub image using the unstable tag and mapped ports 8080:8080, which led me to the Payara page instead of Dataverse. Am I doing something wrong? I also ran docker-compose-dev.yml, which started Solr, SMTP and Postgres successfully, but that didn't help.
That is VERY strange. I have no idea where that image name would be coming from! I see it written out as gdcc/dataverse:unstable
Just starting up the image itself is not sufficient, you need at least Postgres.
That way the app will at least deploy, but bootstrapping and access will still fail unless Solr is also configured and ready to go
It's so funny to hear of all these problems of running these containers when at the same time I never see those :grimacing:
Probably I should just go ahead and create that demo project. Anyone interested to reuse it for some integration thing can grab the code and go for it
I tried pulling it from the Dockerhub link Phil supplied and it didn't work; some manifest was missing. So I checked the tags and unstable was available. Not sure, maybe I am doing something wrong. Either way, Postgres etc. worked using docker-compose.
I started a pull request at https://github.com/gdcc/pyDataverse/pull/158 that I hope will help.
At the very least, you should get a response from /api/info/version. This is what I see: {"status":"OK","data":{"version":"5.13","build":null}}
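For client library tests, the same smoke check can be scripted. A minimal Python sketch using only the standard library, assuming the container is reachable at http://localhost:8080 (the port mapping used in this thread); `parse_version` and `fetch_version` are hypothetical helper names, not part of pyDataverse:

```python
import json
import urllib.request

# Assumed base URL of the locally running container (port 8080 mapped).
BASE_URL = "http://localhost:8080"


def parse_version(body: str) -> str:
    """Return the version string from an /api/info/version response body.

    Raises ValueError if Dataverse reports anything other than status OK.
    """
    payload = json.loads(body)
    if payload.get("status") != "OK":
        raise ValueError(f"Dataverse returned an error: {payload}")
    return payload["data"]["version"]


def fetch_version(base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Query the running installation and return its version string."""
    url = f"{base_url}/api/info/version"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_version(response.read().decode("utf-8"))
```

Against the container from this thread, `fetch_version()` would be expected to return the `version` field shown in the response above.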
I haven't tried the next steps of actually setting up Dataverse, which involves running docker-final-setup.sh as explained at https://preview.guides.gdcc.io/en/develop/container/dev-usage.html
Starting from a blank Ubuntu 20.0 server, both Maven steps executed successfully. I am also able to retrieve the version:
ubuntu@openproject:~/dataverse$ curl "http://localhost:8080/api/info/version"
{"status":"OK","data":{"version":"5.13","build":null}}
The docker-final-setup.sh script also ran smoothly without any errors. Here is the output when inspecting the metadata blocks:
ubuntu@openproject:~/dataverse$ curl "http://localhost:8080/api/metadatablocks"
{"status":"OK","data":[{"id":1,"displayName":"Citation Metadata","name":"citation"},{"id":2,"displayName":"Geospatial Metadata","name":"geospatial"},{"id":3,"displayName":"Social Science and Humanities Metadata","name":"socialscience"},{"id":4,"displayName":"Astronomy and Astrophysics Metadata","name":"astrophysics"},{"id":5,"displayName":"Life Sciences Metadata","name":"biomedical"},{"id":6,"displayName":"Journal Metadata","name":"journal"}]}
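After docker-final-setup.sh, a test suite could assert that the metadata blocks it relies on are actually loaded. A sketch that parses the /api/metadatablocks response shown above; the expected set is simply copied from that output and `missing_blocks` is a hypothetical helper name:

```python
import json

# Block names reported by the freshly bootstrapped instance above.
# Trim this to the blocks your own tests actually depend on.
EXPECTED_BLOCKS = {"citation", "geospatial", "socialscience",
                   "astrophysics", "biomedical", "journal"}


def loaded_block_names(body: str) -> set:
    """Return the set of metadata block names from /api/metadatablocks."""
    payload = json.loads(body)
    if payload.get("status") != "OK":
        raise ValueError(f"Dataverse returned an error: {payload}")
    return {block["name"] for block in payload["data"]}


def missing_blocks(body: str, expected=EXPECTED_BLOCKS) -> set:
    """Return expected block names that are not loaded (empty set = all good)."""
    return set(expected) - loaded_block_names(body)
```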
The one thing that may have caused the error on my machine was not running everything with sudo. Usually I don't need sudo for Docker on my local machine, but this might be a good addition to the docs. Also, is it possible to pass flags such as -d to mvn -Pct docker:run to run it in daemon mode? Here on my server it runs within the session, and closing the session may shut down the container.
I am having problems with the ports at the moment, which is why I can't provide a working URL, but this has nothing to do with the Dataverse instance; it's our BWCloud server, where 8080 is always a bit tricky. I have another private server where I will also test it and hopefully can provide a working URL to inspect the UI.
Anyway, looks great so far! Awesome work @Oliver Bertuch @Philip Durbin and to the Container Working Group :tada:
Tried it on my IONOS server with OS AlmaLinux 8 installed using the following Java/Maven setup:
Apache Maven 3.5.4 (Red Hat 3.5.4-5)
Maven home: /usr/share/maven
Java version: 1.8.0_372, vendor: Red Hat, Inc., runtime: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.372.b07-1.el8_7.x86_64/jre
Default locale: en_US, platform encoding: ANSI_X3.4-1968
OS name: "linux", version: "4.18.0-425.10.1.el8_7.x86_64", arch: "amd64", family: "unix"
I got the following error message, can you help me to resolve this?
Ran yum update again to be sure. Trying again now. It seems that some class was compiled with a more recent version of Java than the installed one supports (class file version 55.0, i.e. Java 11, vs 52.0, i.e. Java 8). It's a little sus, but I'm no Java expert, so I can't fix this rn.
Updated JDK to 11 on CentOS and set JAVA_HOME to point to JDK 11, but that didn't help either. This may be exclusive to CentOS.
Huh. Let me start by pasting the error inline (minus some repeated lines about URLs) formatted a bit for readability:
[ERROR] Failed to execute goal io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0:revision (retrieve-git-details) on project dataverse: Execution retrieve-git-details of goal io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0:revision failed: Unable to load the mojo 'revision' in the plugin 'io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0' due to an API incompatibility: org.codehaus.plexus.component.repository.exception.ComponentLookupException: pl/project13/maven/git/GitCommitIdMojo has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
[ERROR] -----------------------------------------------------
[ERROR] realm = plugin>io.github.git-commit-id:git-commit-id-maven-plugin:5.0.0
[ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
[ERROR] urls[0] = file:/root/.m2/repository/io/github/git-commit-id/git-commit-id-maven-plugin/5.0.0/git-commit-id-maven-plugin-5.0.0.jar
(...snip...)
[ERROR] urls[24] = file:/root/.m2/repository/com/google/code/findbugs/jsr305/3.0.2/jsr305-3.0.2.jar
[ERROR] Number of foreign imports: 1
[ERROR] import: Entry[import from realm ClassRealm[project>edu.harvard.iq:dataverse:5.13, parent: ClassRealm[maven.api, parent: null]]]
[ERROR]
[ERROR] -----------------------------------------------------
Do you think testing this on Rocky would help, since it's also a RHEL derivative? Maybe it's similar enough to AlmaLinux? We have an issue here about Rocky and Docker: https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/issues/268
You shouldn't need sudo, by the way. I never need it on my Mac anyway.
As an alternative to mvn -Pct docker:run you can run docker-compose -f docker-compose-dev.yml up and yes, you can add -d if you like. Should we add -d to https://github.com/gdcc/pyDataverse/pull/158 ?
Mentioning docker compose is already on our todo list at https://github.com/IQSS/dataverse/issues/9540
Anyway, @Jan Range it seems like you're seeing some success if the "final setup" script worked in at least one environment! :tada: Thanks for testing!
Yeah, I realized that using sudo didn't make any difference on my machine. It just doesn't want to work there, unfortunately.
Regarding the flag, I'd say that if the inclusion of the -d flag within the maven command requires too much development, mentioning the docker compose alternative in the docs would also be nice for those who aren't aware of it.
Regarding AlmaLinux, sure, that should help! I suspect this may not be purely a Docker issue but something with Maven, since the error comes from classes compiled for a newer Java version.
@Jan Range sounds good. So, zooming out a bit, how could these Docker containers be useful for you with pyDataverse or EasyDataverse?
Would you run the containers on your laptop and point API tests at them?
Would you want to run this in some sort of CI like GitHub Actions?
How can we help you reach the next step, whatever that is? :happy:
They'd be very helpful for local and CI testing. Especially since the new connect functionality depends on the installation itself, it's the perfect timing for the containers. The same applies to pyDataverse.
For the local testing, it would be great if I could get the container running on my machine. Otherwise I can also use my server to point it to, but having it locally would be preferable. For a CI, spinning a Dataverse instance up within a GitHub Action would be the natural step after local testing. Have you already tried to do that? Otherwise we may open up a dummy repo to work on an action.
For the sake of generality, an action with the sole purpose to start a docker instance would be enough. After that any subsequent step in the action can talk to the instance. Thus, other applications could also benefit. What do you think?
Sure. A dummy repo sounds good. What would it do? I guess an MVP would be a shell script that does a curl of /api/info/version :happy:
Yes, that would be a start!
How does this look? https://github.com/gdcc/container-test
I just pushed a stub (same as from the pyDataverse PR) for local dev.
And maybe from here we can start hacking on GitHub Actions?
Looks great! Thats a good start :-)
Cool. I made you an admin @Jan Range :happy:
But do either of us know anything about GitHub Actions? :sweat_smile:
No DevOps expert, but let's give it a try! Found info on doing this with services. Will try this first.
Section "How to Run Services in Containers During a Workflow?" - https://yonatankra.com/2-ways-to-use-your-docker-image-in-github-actions/
Seems like a reasonable blog post to follow, sure!
I got the container running within the action! Wanted to get the version, but there needs to be some "off-time" to let the instance start before running curl.
Wow! Already? Go, go, go! :rocket:
I'm watching https://github.com/gdcc/container-test/actions/runs/4854159295/jobs/8651177685 live
:popcorn:
120 seconds may be a bit much :sweat_smile:
Success!!!
{"status":"OK","data":{"version":"5.13","build":null}} :tada:
@Jan Range great job!
Thanks! Didn't expect it to be that simple
expect a raise in your next paycheck
So what do we try next? Clone the repo before starting the containers? And run the final setup script after?
Or we could download the repo as a zip if that's easier or faster.
Or download just the files we need.
Yes, that would make sense! Maybe then curl the metadatablocks?
You mean curl the metadata blocks at the very end to make sure they were loaded properly? Sure!
Yes, currently trying it :-)
./dataverse/scripts/dev/docker-final-setup.sh: 3: set: Illegal option -o pipefail
I receive this error, when running the docker script. Do you know what the error could be? This is the step in the action after the DV repo has been cloned.
- name: Run docker script
  run: |
    ./dataverse/scripts/dev/docker-final-setup.sh
Weird! Could it be line endings? https://stackoverflow.com/questions/35352955/set-illegal-option-on-one-host-but-not-the-other
Tried it with bash and it finally works!
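The `: 3: set: Illegal option -o pipefail` message has the format dash (Ubuntu's /bin/sh) uses, so one plausible explanation is that the script was interpreted by sh rather than bash, for example because of its shebang line, and not every /bin/sh supports `pipefail`. Invoking bash explicitly sidesteps that; in the workflow step this presumably amounts to `bash ./dataverse/scripts/dev/docker-final-setup.sh`. From a Python test harness, a sketch (helper names hypothetical) could look like:

```python
import shutil
import subprocess

# Path of the setup script inside the cloned IQSS/dataverse repo, as used
# in the workflow step above.
SETUP_SCRIPT = "./dataverse/scripts/dev/docker-final-setup.sh"


def setup_command(script: str = SETUP_SCRIPT) -> list:
    """Build a command that runs the script under bash explicitly,
    regardless of which interpreter its shebang line names."""
    bash = shutil.which("bash") or "bash"
    return [bash, script]


def run_setup(script: str = SETUP_SCRIPT) -> None:
    """Run the setup script; raises CalledProcessError on a non-zero exit."""
    subprocess.run(setup_command(script), check=True)
```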
Here is the latest run that worked:
https://github.com/gdcc/container-test/actions/runs/4854627884/jobs/8652228977
Bundling this into a shareable action now would be the icing on the cake :star_struck:
Fantastic! You make it look easy! Oh, right, that's your branding. :happy:
Thanks :heart: It took quite a lot of commits, though :sweat_smile:
Heh. So now what? Put this in EasyDataverse and run a real test or two?
Yes, that sounds great!
Will try this the next days. Need to go to sleep now :sleeping:
Good night! Thanks again!
@Jan Range thanks for opening https://github.com/gdcc/container-test/pull/1
Meanwhile, I've been trying to build on your work, playing around at https://github.com/IQSS/dataverse-sample-data/actions/runs/4856813925/jobs/8656732449 (one of many runs) but I'm getting a strange error:
{'status': 'ERROR', 'message': 'There was an error when trying to add the new file. Temp directory is not configured.'}
Hm, this might be something specific to the runner. I found that the runner has a dedicated temp directory that should be used:
https://github.com/actions/toolkit/issues/518
https://docs.github.com/en/actions/learn-github-actions/contexts#runner-context
Good Lord! Being a holiday for one day and you guys turn on the heat here!
Glad you enjoy the experience so far @Jan Range
Working hard on improving it further :smile:
:fire::fire::fire:
@Philip Durbin solved the "waiting" issue. There is an action that pings an endpoint until a certain status code is received.
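The idea behind that action, replacing a fixed sleep (like the 120 seconds mentioned above) with polling, can be sketched in Python for local test runs. This is not the action itself; the function names, defaults and the choice of /api/info/version as the probe URL are assumptions:

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with e.g. 404 or 503
    except OSError:
        return 0  # refused, reset or timed out: the app is not up yet


def wait_for_status(url: str, expected: int = 200, timeout: float = 300.0,
                    interval: float = 5.0,
                    probe: Callable[[str], int] = http_status,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> float:
    """Poll url until it returns `expected`; return the seconds waited.

    Raises TimeoutError if `timeout` seconds pass without success.
    probe/sleep/clock are injectable so the loop can be tested offline.
    """
    start = clock()
    while True:
        if probe(url) == expected:
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError(f"{url} did not return {expected} within {timeout}s")
        sleep(interval)
```

Typical use would be `wait_for_status("http://localhost:8080/api/info/version")` before the first API test.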
@Jan Range great! Have you tried uploading a file? Do you get the same "Temp directory is not configured" error?
Shiro from the R client is excited, by the way: https://github.com/IQSS/dataverse-client-r/issues/4#issuecomment-1530287863
Not yet, but I saw your message regarding the temp error. I think this may be exclusive to GitHub's runner. Will give it a try and see if I can reproduce the error.
Awesome. Thanks!
@Oliver Bertuch might know how to fix it. Maybe we just need to set an environment variable before we run docker compose up.
There is only one way this error message can be triggered
The error is based on this exception
This condition can only be reached via this error handling, which means Dataverse has no write permissions to the directory
You need to check where we can write in a runner so the volumes are actually usable
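A sketch of that check from the runner's side: pick the first host directory the workflow user can write to, preferring `RUNNER_TEMP` (documented in the runner context) and `GITHUB_WORKSPACE`, then the system temp dir. This only tests host-side permissions; whether the user inside the Dataverse container can write to the mounted path is a separate question. Function names are hypothetical:

```python
import os
import tempfile
from typing import Mapping, Optional


def candidate_dirs(env: Mapping[str, str] = os.environ) -> list:
    """Host directories to try for the volume mounts, in order of
    preference. RUNNER_TEMP and GITHUB_WORKSPACE are set by GitHub
    Actions; outside a runner only the system temp dir remains."""
    names = ("RUNNER_TEMP", "GITHUB_WORKSPACE")
    dirs = [env[name] for name in names if env.get(name)]
    dirs.append(tempfile.gettempdir())
    return dirs


def first_writable(dirs) -> Optional[str]:
    """Return the first existing directory we can write to, else None."""
    for path in dirs:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return None
```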
Such things are hidden in the container logs... One would need to add a step with if: always() to make sure they get extracted and uploaded as artifacts of a run
Maybe even make it distinct artifacts for Postgres, Solr and Dataverse
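A sketch of the log-collection half: write one file per container via `docker logs`, so an `if: always()` step can upload each as its own artifact. The container names are hypothetical placeholders (use whatever `docker ps` shows for your compose project), and stdout and stderr are simply concatenated, so their relative ordering is lost:

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical container names; adjust to your compose project.
CONTAINERS = ("postgres", "solr", "dataverse")


def docker_logs(name: str) -> str:
    """Return what the container wrote to stdout, followed by its stderr."""
    result = subprocess.run(["docker", "logs", name],
                            capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def dump_logs(out_dir: str, containers: Iterable[str] = CONTAINERS,
              fetch: Callable[[str], str] = docker_logs) -> list:
    """Write one <name>.log file per container and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name in containers:
        path = target / f"{name}.log"
        path.write_text(fetch(name), encoding="utf-8")
        written.append(path)
    return written
```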
No write permission! :grimacing:
Note: GitHub Actions must be run by the default Docker user (root). Ensure your Dockerfile does not set the USER instruction, otherwise you will not be able to access GITHUB_WORKSPACE.
This is for Docker actions. So we may find a way around that in our case
Idea: how about we use a tmpfs mount instead of the volume mapping? This will be ephemeral anyway.
(This might be a problem for some test that checks for the presence of files uploaded in the volume directly. But maybe that's an OK limitation?)
All hacks welcome!
I tried a couple things but they didn't work:
Hmm, I'm dumping the logs now (server.log). It says:
Failed to create filesTempDirectory: /dv/temp
https://github.com/IQSS/dataverse-sample-data/actions/runs/4867550949/jobs/8680232083#step:10:1527
So maybe I'm not configuring this correctly with MPCONFIG. :thinking:
@Oliver Bertuch any ideas for me? My most recent commit:
I can't seem to change it from /dv/temp.
May I mess around with your PR branch @Philip Durbin ?
@Oliver Bertuch of course!
Please go for it! Thanks!
Here you are https://github.com/IQSS/dataverse-sample-data/actions/runs/4872123218
I already tweaked a few other places as well...
Maybe we can copy that to https://github.com/gdcc/container-test as well
Fantastic! Thanks, @Oliver Bertuch :heart:
@Jan Range this might be of interest for you as well. Here's the diff. https://github.com/IQSS/dataverse-sample-data/compare/master...container-test
This part seems important:
volumes:
  - ${RUNNER_TEMP}/app/data:/dv
Crucial, not just important.
So is the container sort of hard coded to use /dv?
OK to be fair: it might also just work with the standard GITHUB_WORKSPACE, but it seems good practice to go for the temp dir that is distinctly documented
The other crucial part is probably the initializer container that tweaks the ownership of these folders...
@Oliver Bertuch I just pointed to your work from this comment https://github.com/IQSS/dataverse-frontend/pull/85#issuecomment-1533030542
Which builds upon work by @Jan Range of course. :heart:
Awesome!! Happy to see it's working now :heart:
Firing up the EasyDataverse tests now :fire:
Heh. Great!
One of the finalists for the Inkscape 1.3 about screen contest was :fire: set the world on fire :fire:. Not that I endorse this. And it didn't win.
I just added all this great work to the agenda for tomorrow's Containerization Working Group meeting: https://docs.google.com/document/d/1U3yvg9yG5Wnm_tQkDLf5XyREYyFVoRE4_-UvnxuryVE/edit?usp=sharing
Haha "great work"... for now it's a lot of red tape and bandaids holding together some pieces that fall apart the second you look away...
Sorry to sound pessimistic, but I fear there is a lot more road to cover before champagne
Let's hope for the best :-)
I'm looking forward to additional German engineering! :flag_germany:
"This looks awesome." -- https://github.com/IQSS/dataverse-client-r/issues/4#issuecomment-1533597903
Philip Durbin said:
So is the container sort of hard coded to use /dv?
I confirmed with @Oliver Bertuch today that /dv/ is not hard coded. Phew. :sweat_smile:
@Jan Range so are you thinking you'll wait for the proposed Dataverse GitHub Action ( https://dataverse.zulipchat.com/#narrow/stream/377090-python/topic/Dataverse.20GitHub.20Action ) before using containers for API tests?
I'm asking because I'm wondering if there's interest from you or others in using config above from sample data more or less as-is, bandaids and all. :happy:
Philip Durbin said:
Philip Durbin said:
So is the container sort of hard coded to use /dv?
I confirmed with Oliver Bertuch today that /dv/ is not hard coded. Phew. :sweat_smile:
Of course this is not hard coded; it's simply using a sane default. You can set this stuff via MPCONFIG if you want to
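Setting such a value through the environment relies on the MicroProfile Config rule for environment variables: a lookup for a property tries the exact name, then the name with every non-alphanumeric character replaced by `_`, then that form uppercased. A sketch of the mapping; using `dataverse.files.directory` as the example assumes it is the setting whose `temp` subdirectory showed up as /dv/temp in the log above (check the Dataverse configuration guide before relying on that):

```python
import re


def mp_config_env_names(prop: str) -> list:
    """Environment variable names a MicroProfile Config env source checks
    for `prop`, in lookup order: exact, sanitized, sanitized + uppercase."""
    sanitized = re.sub(r"[^A-Za-z0-9_]", "_", prop)
    names = [prop, sanitized, sanitized.upper()]
    # Drop duplicates while keeping order (e.g. when prop has no dots).
    return list(dict.fromkeys(names))
```

The uppercase form, e.g. `DATAVERSE_FILES_DIRECTORY`, is the one that fits naturally in a compose `environment:` section.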
That said, usually in the container world people benefit from clear structured paths that are documented, so it's easier to follow
This is because you'll always have this layer of indirectness with the volumes you need to mount
So kind of fixing this by providing sane defaults saves headaches for devops that deploy this stuff
But again, nothing is hard-coded here. Just smoothing the experience
I have added some tests and cleaned up the repo. I think the action is ready to be used for testing pyDataverse :raised_hands:
https://github.com/gdcc/dataverse-action/pull/8
Last updated: Nov 01 2025 at 14:11 UTC