This just in from ErykKul at https://github.com/IQSS/dataverse/pull/9383#issuecomment-1486899586
"I think that running the integration tests with the Dockerized Dataverse is a great idea. Maybe we could use docker for that as well? For example: https://hub.docker.com/_/docker. Right now, the aio-docker makes some assumptions about the environment that are not true for every system (I still could not run the integration tests lately...). I am not sure about this, but it looks to me that it is no longer possible to build aio-docker using latest Rocky Linux image, like specified in the docker file? I get errors about not sufficient entropy (it also takes very long time to start the Dataverse inside the aio-docker container). Also, the "ps" command is missing, preventing the solr starting properly (procps package is not installed by default). Finally, when I get everything running, then I get errors from the sword API... I am not sure how the continuous integration runs the integration tests, I was not successful in doing that (I did get them running when trying out payara 6, but I had upgraded many packages to do that, upgraded the jdk version to 17, changed some scripts, etc.).
I have tried to fix the integration test, but since I was not able to run it locally, I am waiting for the continuous integration to finish."
I had ITs running in docker-aio at one point, but ran into trouble with merging PRs and nested environment variables. I definitely had ITs running, though.
It is a long-standing dream of mine to make our API tests (these are end-to-end tests, a very special and expensive kind of integration tests) use Testcontainers and a Maven profile. I already asked on the IQSS Slack if we could transform the API tests to use JUnit 5, so we can create actual extensions, etc., and make this great. I'm all in!
But obviously, we could also run API tests "just" against a dockerized version of things, without Testcontainers. It will still need some control over the internal environment to set up and tweak certain things like stored procedures, etc.
Probably this is also a good first step :wink:
And of course I want ITs also to run inside a GitHub workflow, which might even be speedier than using Jenkins + cheap AWS EC2 instances. :smiling_devil:
Ah, should I rename this topic to "API tests"? That's really what I'm talking about.
Probably :smiley: Oh and didn't we talk about this somewhere already? Maybe in "sunsetting"?
Hmm no we didn't. Maybe some place else, outside of this channel, some other context
Do we have an open issue about running API tests in Docker?
Philip Durbin said:
Do we have an open issue about running API tests in Docker?
I can open one. I never committed my branch because of said nested environment variable woes.
@Don Sizemore this would be for the new Docker dev setup. If you haven't tried it yet, I'm happy to walk you through it. Either way, yes, an issue would be great!
The test for /api/info/version works just fine. :happy:
Actually, I can run a real API test just fine as well: https://github.com/IQSS/dataverse/pull/9383#issuecomment-1487151653
@Don Sizemore so maybe we don't need an issue
Or rather, maybe the issue is about:
Bah. Running all these tests at once failed. 500 errors.
Ah, but some of these failures are expected, like this one:
Caused by: org.postgresql.util.PSQLException: ERROR: function generateidentifierfromstoredprocedure() does not exist
That's because our Docker image doesn't have that sequence inserted.
That must be this test:
[ERROR] DatasetsIT.testStoredProcGeneratedAsIdentifierGenerationStyle:816 » IllegalArgument
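(If we ever want that one to pass in containers, the function could be created by hand inside the database; a rough sketch only, with made-up container, database, and sequence names, not the exact SQL from the installation guide:)
# hedged sketch: create the sequence + function that test expects, inside the Postgres container
# (container name "postgres-1", db/user "dataverse", and the sequence name are assumptions)
docker exec -i postgres-1 psql -U dataverse dataverse <<'SQL'
CREATE SEQUENCE IF NOT EXISTS datasetidentifier_seq;
CREATE OR REPLACE FUNCTION generateIdentifierFromStoredProcedure()
RETURNS varchar AS $$
BEGIN
  RETURN nextval('datasetidentifier_seq')::varchar;
END;
$$ LANGUAGE plpgsql;
SQL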
Anyway, here are the errors I saw:
[ERROR] Failures:
[ERROR] DatasetsIT.testAddEmptyDatasetViaNativeAPI:362 Expected status code <403> doesn't match actual status code <500>.
[ERROR] DatasetsIT.testCuratePublishedDatasetVersionCommand:2686 Expected status code <201> doesn't match actual status code <500>.
[ERROR] DatasetsIT.testDeleteDatasetWhileFileIngesting:1367 Expected status code <201> doesn't match actual status code <500>.
[ERROR] DatasetsIT.testFilesUnchangedAfterDatasetMetadataUpdate:2620 Expected status code <201> doesn't match actual status code <500>.
[ERROR] DatasetsIT.testLinkingDatasets:2189 Expected status code <201> doesn't match actual status code <500>.
[ERROR] DatasetsIT.testRestrictFileExportDdi:2279 Expected status code <201> doesn't match actual status code <500>.
[ERROR] DatasetsIT.testRestrictFilesWORequestAccess:2901 Expected status code <201> doesn't match actual status code <500>.
[ERROR] DatasetsIT.testUpdateDatasetVersionWithFiles:2132 Expected status code <201> doesn't match actual status code <500>.
[ERROR] HarvestingClientsIT.testHarvestingClientRun:234 Last harvest not reported a success (took 0 seconds) expected:<SUCCESS> but was:<null>
[ERROR] HarvestingServerIT.testMultiRecordOaiSet:606 Wrong number of items on the first ListIdentifiers page expected:<2> but was:<5>
[ERROR] MakeDataCountApiIT.testMakeDataCountGetMetric:61 Expected status code <200> doesn't match actual status code <400>.
I'm on this commit: 753ee8451b
@Oliver Bertuch explained to me that we should be able to run our own base/app containers within GitHub Actions.
This is huge.
I'm not sure yet how it will work but maybe this blog post will help: https://github.blog/2022-02-02-build-ci-cd-pipeline-github-actions-four-steps/
They build Docker images and push them and then run them during API tests, I think.
Might depend a little on whether you want to push images to registries only after API tests passed or not
Could of course push to a temp registry first.
Has the advantage of splitting the runs
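E.g. push a throwaway candidate tag, run the API tests against it, and only promote it afterwards (a sketch; the registry, image names, and tags here are all made up):
# hedged sketch: candidate image first, promote only after the API test job is green
docker tag dataverse-app:local ghcr.io/example/dataverse-candidate:sha-abc123
docker push ghcr.io/example/dataverse-candidate:sha-abc123
# ...API test workflow runs against the candidate tag...
docker tag ghcr.io/example/dataverse-candidate:sha-abc123 ghcr.io/example/dataverse:unstable
docker push ghcr.io/example/dataverse:unstable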
What if we get all this working in a small app like https://github.com/IQSS/dataverse-people ? Faster, lighter, no red tape.
I just created this related topic: https://dataverse.zulipchat.com/#narrow/stream/377090-python/topic/containers.20for.20API.20testing
Jan Range said:
Thanks! Didn't expect it to be that simple
Sounds like we have a first happy customer for the working group's output!
I suppose the action about this topic now somehow happens in #python ... I'm gonna resolve this here.
Oliver Bertuch has marked this topic as resolved.
Maybe. It would be nice to deliver on what Eryk wanted in the top post, the ability to run the REST Assured API test suite.
Oh yeah sure, but let's stick to milestones - integrations come first :-D
Philip Durbin has marked this topic as unresolved.
That's a little like saying "let's close all the open GitHub issues we aren't actively working on", isn't it? :happy:
Like Eryk, I'm interested in this too so I'm unresolving this topic.
Next steps (yes, as time allows) as I see them:
I see @Eryk Kulikowski is interested in Jenkins access: https://dataverse.zulipchat.com/#narrow/stream/379673-dev/topic/jenkins.20access/near/362673296
That's fine, but longer term we should circle back to this topic of getting API tests running in containers, right? :sweat_smile:
Yes, of course! Just wanted to make sure they communicate and he's unblocked
Yes, thank you, thank you.
I'm just conscious of the fact that we closed and reopened this topic. If we need to start a new topic that's more specific, that's fine.
Also, I can run an idea by you if you like.
'bout API testing? Hit me!
What if we made a new repo called "second opinion"?
The first doctor said your health is not so good. You'd like a second opinion.
Not sure I'm following :sweat_smile:
We use GitHub Actions workflow dispatch or whatever it's called. We click a button to trigger it. For now at least.
It spins up the develop branch in containers and configures them. Then it runs the API tests.
The idea is you can look at Jenkins, see failures on develop, go over to Second Opinion, click the button, and see what the other doctor thinks.
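It could even start as a plain workflow_dispatch that you trigger from the UI or the CLI; a sketch, with a made-up repo and workflow file name:
# hedged sketch: fire the "second opinion" run against develop from the command line
gh workflow run api-tests.yml --repo IQSS/second-opinion --ref develop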
Still not following?
Oh sry missed that one
I think we share the same vision but I don't believe we need a second repo for that
We can have Jenkins do Jenkins and make GitHub Actions do the same with containers
That's what I figured you'd say: put it in the main repo.
But! With a new repo we can merge and innovate as fast as we want. And we can retire it when the code goes in the main repo.
Ideas how to replicate the PRs?
Does Kevin do QA for Workflow files?
People seemed to detest red crosses but love green checkmarks. So as long as we don't break too many builds...
Oh I might have missed sth here: you said "develop". Not PRs
Yeah, I hate it when develop is broken.
Sometimes Jenkins is 100% right. Sometimes there's nothing wrong with develop.
Hackathon on Sunday? :smile:
Sure!
But I think we have a couple talks to write too. :sweat_smile:
Who? Me? You? :shrug:
The auth talk. The container demo.
Hey I added some bits for the auth talk already
oh, phew, thank you!
they might record it, you know
Very basic. Just some ideas.
Ugh
Last time I checked it said I'm not presenting anything
Well, the auth talk :happy: brb :smiling_imp:
I want this. https://github.com/marketplace/actions/test-reporter
Or this. https://github.com/marketplace/actions/publish-test-results
Philip Durbin said:
But! With a new repo we can merge and innovate as fast as we want. And we can retire it when the code goes in the main repo.
I was going to delete https://github.com/gdcc/dataverse as soon as we have almost everything from the container branches incorporated (phew!)
It already contains stuff to update branches with content from the main repo
Maybe we can reuse some of that?
Sure!
@Oliver Bertuch should I try to explain Second Opinion again?
I think I know what you're up to... :wink:
BTW this run took 44 minutes to complete on Jenkins :rolling_eyes:
It would be nice to have at least one test runner saying everything is fine.
After waiting another 40 minutes, it says: all good!
Phew!
I finally got around to creating https://github.com/pdurbin/dataverse-api-test-runner
My current problem is that I'm running FitsIT which uploads a file. I get this error:
{"status":"ERROR","message":"There was an error when trying to add the new file. Temp directory is not configured."}
Here's my GitHub Action as of this writing: https://github.com/pdurbin/dataverse-api-test-runner/blob/f4cce7e919846ac14acaad5de2ca2ce41b36936b/.github/workflows/test.yml
It ends with this:
      - name: Start Dataverse containers
        run: |
          cd dataverse
          docker compose -f docker-compose-dev.yml up -d
      - name: Run API tests
        run: |
          cd dataverse
          mvn test -Ddataverse.test.baseurl=http://localhost:8080 -Dtest=UtilIT,FitsIT
[#|2023-07-18T14:57:33.342+0000|SEVERE|Payara 5.2022.5|edu.harvard.iq.dataverse.util.FileUtil|_ThreadID=85;_ThreadName=http-thread-pool::http-listener-1(2);_TimeMillis=1689692253342;_LevelValue=1000;|
Failed to create filesTempDirectory: /dv/temp|#]
[#|2023-07-18T14:57:33.345+0000|SEVERE|Payara 5.2022.5|edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper|_ThreadID=85;_ThreadName=http-thread-pool::http-listener-1(2);_TimeMillis=1689692253345;_LevelValue=1000;|
There was an error when trying to add the new file. Temp directory is not configured.|#]
[#|2023-07-18T14:57:33.346+0000|SEVERE|Payara 5.2022.5|edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper|_ThreadID=85;_ThreadName=http-thread-pool::http-listener-1(2);_TimeMillis=1689692253346;_LevelValue=1000;|
java.io.IOException: Temp directory is not configured.|#]
I'm having deja vu. I'm pretty sure we talked about this before.
Right, stuff like ${RUNNER_TEMP}/app/data:/dv at https://github.com/IQSS/dataverse-sample-data/blob/bbc104832ab0b55c2d9340df7f73f83e401ee3f5/docker-compose-ci.yml
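Something along these lines might do it for the runner too (a sketch; the service name "dataverse" and the /dv path are assumptions based on that compose file):
# hedged sketch: give the app container a writable temp dir via a compose override in CI
mkdir -p "$RUNNER_TEMP/app/data"
cat > docker-compose.override.yml <<EOF
services:
  dataverse:
    volumes:
      - $RUNNER_TEMP/app/data:/dv
EOF
docker compose -f docker-compose-dev.yml -f docker-compose.override.yml up -d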
Related conversation: https://dataverse.zulipchat.com/#narrow/stream/377090-python/topic/containers.20for.20API.20testing/near/355434190
Oliver Bertuch said:
BTW this run took 44 minutes to complete on Jenkins :rolling_eyes:
Out of curiosity here (we had a similar problem, and resolved it, a few years back); how are tests presently organized? Are they split into separate suites/modules or are they monolithic in nature?
Yes, modules, well Java classes. Please see how https://github.com/IQSS/dataverse/blob/v5.13/conf/docker-aio/run-test-suite.sh references the classes in this file: https://github.com/IQSS/dataverse/blob/v5.13/tests/integration-tests.txt
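(Roughly, that script boils down to one sequential Maven run over the whole class list; a sketch, assuming the file is a single comma-separated line and glossing over the exact flags:)
# hedged approximation of run-test-suite.sh: everything in one mvn invocation
TESTS=$(<tests/integration-tests.txt)
mvn test -Dtest="$TESTS" -Ddataverse.test.baseurl=http://localhost:8080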
So if I'm reading this correctly (correct me if I'm wrong, I'm not a Java native), these are run sequentially? If so, would it make sense to split this file into separate test suites? You could run those suites in parallel.
Background anecdote is that we had this in Python as well; build took 1.5 hours to complete on account of needing to run ~2000 tests, including some very, very fixture-heavy xml test suites. We split those out as we were revamping our runner infrastructure (migrated from Jenkins to autoscaling runners on DigitalOcean using Gitlab Bastion), and as a result, could run those build steps in parallel. Build took about 30 minutes afterwards and we could run several at a time as opposed to creating gridlock on Jenkins.
Sure. It probably makes sense to try running API tests in parallel. That would be a good topic for #dev . This topic is about running the API test suite in Docker and GitHub Actions. I'm struggling with the latter.
@Guillermo Portas thanks for reminding me about how you're doing something similar. I just copied your config from https://github.com/pdurbin/dataverse-api-test-runner/commit/c7d1fef over to my repo and the (only a few so far) Rest Assured tests passed: https://github.com/pdurbin/dataverse-api-test-runner/actions/runs/5612377863/jobs/10270069291
Thomas van Erven said:
So if I'm reading this correctly (correct me if I'm wrong, I'm not a Java native), these are run sequentially? If so, would it make sense to split this file into separate test suites? You could run those suites in parallel.
Background anecdote is that we had this in Python as well; build took 1.5 hours to complete on account of needing to run ~2000 tests, including some very, very fixture-heavy xml test suites. We split those out as we were revamping our runner infrastructure (migrated from Jenkins to autoscaling runners on DigitalOcean using Gitlab Bastion), and as a result, could run those build steps in parallel. Build took about 30 minutes afterwards and we could run several at a time as opposed to creating gridlock on Jenkins.
Yes, they're sequential by default. Just be careful of tests such as the one Jim mentioned in Slack (concerning the :publicInstall DB setting) which may interfere with one another.
Yep, unavoidable interdependency between tests, due to ordering or shared database state, is the limiting factor for parallelization. You will always end up with n test suites, where n is the number of interdependent groups.
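If we ever try it, one low-tech shape for this would be to split the class list into independent groups and point each group at its own container stack (a sketch; the groupings are made up, and in CI each group would be its own job with its own runner and containers):
# hedged sketch: what each of two parallel CI jobs might run, each against its own stack
# job 1
mvn test -Dtest=DatasetsIT,FitsIT -Ddataverse.test.baseurl=http://localhost:8080
# job 2 (separate runner, separate containers)
mvn test -Dtest=HarvestingServerIT,UtilIT -Ddataverse.test.baseurl=http://localhost:8080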
I got a good number of API tests to run on the Payara 6 branch: https://github.com/pdurbin/dataverse-api-test-runner/actions/runs/5649829177/job/15304980462
Heads up that we need https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/pull/264 in container-land for one of the harvesting tests to pass.
Discussion at https://github.com/IQSS/dataverse/issues/9457#issuecomment-1650561229
I just ran the API tests against the develop branch: https://github.com/IQSS/dataverse/issues/9457#issuecomment-1650852969
All tests are passing on the Payara 6 branch. It has a fix, related to harvesting, that develop and master don't have. See #containers > error running a harvest
https://github.com/pdurbin/dataverse-api-test-runner is by no means perfect but shows that the full API test suite can be run against containers. I'm closing this long topic. We can open up fresh topics.
Philip Durbin has marked this topic as resolved.
Hi all, not sure if this is the correct stream or not. Actually I was trying to run the tests in the BuiltinUsersIT file. I started the application in Docker and then ran these tests, but all the tests that hit the create API are giving this error message:
{
"status": "ERROR",
"message": "Dataverse config issue: No API key defined for built in user management"
}
Can anyone help me out?
@Philip Durbin
Philip Durbin has marked this topic as unresolved.
Sure. Let's use this stream. :grinning:
It's strange that the key is not defined. It should be "burrito". Given your trouble over in #containers > gitattributes files and windows auto conversion I'm worried that a later script didn't run.
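If it really didn't get set, something like this should put it back by hand (assuming the admin API is reachable on localhost, which it is in the dev setup):
# set the built-in users key the API tests expect
curl -X PUT -d burrito http://localhost:8080/api/admin/settings/BuiltinUsers.KEY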
You had to fix the first script manually, right?
There are a lot of scripts. :sweat_smile:
yes I did
haha got it, I'll check the logs to see if any other script hit the same error
That's fine but it could get a bit tedious.
Maybe we should resolve the line endings thing first.
On opening localhost:8080 I'm also getting an error
login-page-error.png
Right, I'm assuming the root collection wasn't created.
Probably there will be lots of errors in the logs. Instead of ending with "have a nice day".
Ohh, okay then I'll try and fix the line ending issue first
Awesome. Thanks. Sorry for the trouble! You get the pain but (hopefully) it will be easier for others in the future!
I understand :)
@Philip Durbin even after resolving the line ending issue I'm getting this same error when running the tests in the BuiltinUsersIT file :face_with_diagonal_mouth:
although localhost:8080 now opens without an error, the tests are still failing
Can you please try:
rm -rf docker-dev-volumes"Note that data is persisted in ./docker-dev-volumes in the root of the Git repo. For a clean start, you should remove this directory before running the mvn commands above." -- https://guides.dataverse.org/en/6.0/container/dev-usage.html
Removing the directory will delete your settings. Then, when you try again, that key you need should be inserted.
Hi @Philip Durbin , tried these steps but still getting the same result :face_with_diagonal_mouth:
D'oh!
What's the status of your Dataverse installation? Can you see the root dataverse in a browser? Can you log in as dataverseAdmin? (password: admin1)
I am able to see this screen on localhost:8080
dataverse_homepage.png
But when I opened the administration console and tried to log in using the credentials in the doc, I got an error about an incorrect username or password
Oh. That's the Payara screen.
You should see something more like this: https://demo.dataverse.org ... a running Dataverse instance.
You probably see a lot of output in your terminal. Somewhere in there might be a useful error message. I hope. :sweat_smile:
@Sakshi Jain when you have a minute... perhaps you could start a new topic about the trouble you're (unfortunately) having setting up a dev environment. We're happy to help you out! Then we can circle back to actually running API tests.
@Sakshi Jain any news? We're happy to help you get a dev environment set up!
Hi @Philip Durbin , I was sick so haven't been able to work on this since we last talked. Will try running this again today.
Oh! I'm sorry to hear! I hope you're feeling better!