@Philip Durbin @Oliver Bertuch do you have experience in sharing an action to the marketplace? Those that I did always included a script or something. Is it possible to just share the yml in its current form?
@Jan Range no but @Ana Trisovic does!
Check out https://github.com/marketplace/actions/dataverse-uploader-action !
Oh nice! Thanks for sending
What would you like to achieve with that action?
I would like to bundle all of this, so that end users can simply call this as a step. Nothing fancy, just to reduce boilerplate.
steps:
  - name: Setup Dataverse
    uses: IQSS/dataverse-dev@master
Well we can create a composite action for this https://docs.github.com/en/actions/creating-actions/creating-a-composite-action
But we would need to think about stuff like variables etc
Certainly those would be inputs...
And I'm not sure if the compose thing is the best way to go here
It's hard to put in env vars for the container
Ah wait, if we go for an action, we might create a compose file that references an env_file in the flesh... That might work for generating these vars before we start the container
OK so we need as inputs:
Anything else?
@Guillermo Portas would such an action be of use for you as well?
@Oliver Bertuch Of course! That action would be useful for running frontend e2e and integration tests in an automated way through GitHub actions
@Oliver Bertuch is configbaker part of this?
Absolutely
That's what will provide the hard part of bootstrapping et al
Ok, so it's a dependency, basically. Thanks.
@Jan Range and I talked about this briefly today. Stay tuned! :big_smile:
We ended up creating https://github.com/gdcc/dataverse-action (a private repo) but no action there yet. :big_smile:
Philip Durbin wrote:
We ended up creating https://github.com/gdcc/dataverse-action (a private repo) but no action there yet. :big_smile:
Hi Phil! Sorry for the delay, I was on vacation until today. Going to push the action today :-)
I just pushed the action to the repository. Here is an example run
https://github.com/gdcc/dataverse-action/actions/runs/6221210931/job/16882733005
I am running into issues when creating a dataverse:
{"status":"ERROR","message":"Can't find dataverse with identifier='1'"}
Tried it also with root, but no luck. My guess is that the respective dataverse hasn't been created yet, since the container setup might be lagging. If that is the cause, is there an API endpoint I can use to check?
It worked once though, but at that point the builtin user I have created does not have permission yet. How can I set this up? The given API_TOKEN should enable testers to fully operate without permission issues.
@Jan Range should we talk?
I was able to resolve the issue by adding a health check for the dataverse URL ".../api/dataverses/root", but now there is a new error message:
Creating Test Dataverse
{"status":"ERROR","message":"Command edu.harvard.iq.dataverse.engine.command.impl.CreateDataverseCommand@5424d2a6 failed: Exception thrown from bean: jakarta.ejb.EJBTransactionRolledbackException: Exception thrown from bean: org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException: Error from server at http://solr:8983/solr/collection1: SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Can't load schema /var/solr/data/collection1/conf/schema.xml: Plugin init failure for [schema.xml] fieldType \"text_ws\": Plugin init failure for [schema.xml] analyzer/tokenizer \"whitespace\": [schema.xml] analyzer/tokenizer: missing mandatory attribute 'class'"}
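The health check mentioned above (polling ".../api/dataverses/root" until it answers) could look roughly like the following. This is a minimal sketch, not the code used in the action: the function name `wait_for`, the timeout values and the localhost URL in the usage comment are assumptions.

```sh
#!/bin/sh
# wait_for TIMEOUT INTERVAL COMMAND...
# Re-runs COMMAND every INTERVAL seconds until it succeeds or roughly TIMEOUT
# seconds have passed. Returns 0 on success, 1 on timeout.
wait_for() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Hypothetical usage inside the action (URL and timings are assumptions):
#   wait_for 300 5 curl --fail --silent http://localhost:8080/api/dataverses/root
```

Keeping the probe command as an argument means the same loop could also wait for Solr or PostgreSQL with a different command.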
We should tweak a few things probably...
@Oliver Bertuch sounds great! I have a meeting now and will be available again around 14.00. Would that work for you?
Sure, should work
Alright, does WebEx work for you? Otherwise we can also use Zoom if you have a free room or sth
I can provide a Zoom room
:+1:
https://fz-juelich-de.zoom.us/j/69281413267?pwd=bDZGbFVGM1A3U0lkT3JjOE9WdkpNQT09
@Jan Range @Philip Durbin
Joining in a minute :-)
See ya on the other side
So we worked on the action a little and created two issues: https://github.com/gdcc/dataverse-action/issues/2 and https://github.com/gdcc/dataverse-action/issues/1
We want functionality from upstream! :smiley:
2 messages were moved here from #python > downloading a file by dataset DOI and filename by Philip Durbin.
Interesting issues. :big_smile:
Say more...? :thinking:
I'm just looking forward to hearing more about it. I just put it on the agenda for Thursday: https://ct.gdcc.io :big_smile:
Here's an idea for shipping the postgres/solr version: how about we use labels/annotations?
No objection.
We could use sth like "org.dataverse.deps.postgresql=13" and "org.dataverse.deps.solr=9.3.0"
These can be added when building the image, Maven will provide those
Sure, sounds fancy. :monocle:
This information definitely is metadata about the application, so it should be treated as metadata
That's what annotations are supposed to be used for
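Reading such a label back from an image could be sketched as below. The `docker inspect --format` call with a Go template is standard Docker CLI usage; the helper name `image_label` and the `DATAVERSE_IMAGE` and `SOLR_VERSION` variables in the usage comment are assumptions, and the label names are the ones proposed above.

```sh
#!/bin/sh
# image_label IMAGE LABEL
# Prints the value of LABEL from the metadata of a locally available IMAGE.
image_label() {
    docker inspect --format "{{ index .Config.Labels \"$2\" }}" "$1"
}

# Hypothetical usage: an explicitly configured SOLR_VERSION wins, otherwise
# the value shipped with the image is used.
#   SOLR_VERSION=${SOLR_VERSION:-$(image_label "$DATAVERSE_IMAGE" org.dataverse.deps.solr)}
```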
@Oliver Bertuch since we've added the bootstrap to the action, will the API Token be present in the environment? I remember that it will be written there, if I am not mistaken. If so, we could add a small step moving it into the Action Env.
I definitely wanted to make the API token available as an output of the action
But I don't think it'll be around yet
Need to look at the scripts again - that token is extracted at some place
So we should be able to expose it
Just checked the scripts and it is actually exported. Hence, should be available at runtime.
We could write it to the GitHub env by using this line right after bootstrapping:
echo "API_TOKEN=$API_TOKEN" >> "$GITHUB_ENV"
That way it is accessible in successive steps within the action. Shall I add and test it?
We could, but that is only within the action!
Alright, so you were thinking of having this as a general output of the bootstrapping alongside other outputs? Makes sense
Wait - actually what you are proposing... I doubt it'll work
The token is exposed within the script running in a container
That means the outer layer of our action will not know about it
Oh okay, that makes sense. So there needs to be a way to extract it.
Stackoverflow ftw :-D
https://stackoverflow.com/questions/34051747/get-environment-variable-from-docker-container
No sir, that is not going to help us... The bootstrap container will not be running any longer after it's done
Hm that's unfortunate :-D
Do you need the dataverseAdmin API token? To run tests or whatever, can you just create a new user? If the admin API isn't blocked that user can even make themselves a superuser.
So we could mount some dir to drop a file into
Let me evaluate one other simple idea
We could use another output channel
We have fd 0, 1 and 2 by default. Let me see if I can make use of 3 to output something
output redirection for the win
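The extra output channel idea can be illustrated in plain shell as follows. This only shows the redirection mechanism; `bootstrap_demo` and the dummy token are stand-ins, not the configbaker script, and by itself this does not carry a value out of a container that has already exited.

```sh
#!/bin/sh
# bootstrap_demo stands in for a bootstrap script: progress messages go to
# stdout, while the machine-readable result goes to file descriptor 3.
bootstrap_demo() {
    echo "Bootstrapping..."
    echo "API_TOKEN=dummy-token" >&3
    echo "Done."
}

# The caller decides where fd 3 ends up, here a temporary file, while the
# regular log is captured separately from stdout.
result_file=$(mktemp)
log=$(bootstrap_demo 3>"$result_file")
```

After this runs, `$log` holds only the progress messages and `$result_file` holds only the `API_TOKEN=...` line.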
@Philip Durbin yes, we were thinking of having the bootstrap process provide the API Token right after finishing. I've done it the way you described too, but haven't tried granting superuser rights. How can that be done?
Hmm no we need to mount a file or directory
https://guides.dataverse.org/en/6.0/api/native-api.html#make-user-a-superuser
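A call against that endpoint could be wrapped like this. It is a sketch under assumptions: the path `/api/admin/superuser/$identifier` is taken from the linked 6.0 guide section, the helper names and the localhost URL are made up for illustration, and as far as that guide describes it the request toggles the flag, so calling it twice would revert the change.

```sh
#!/bin/sh
# superuser_url BASE_URL IDENTIFIER
# Builds the admin API URL; a trailing slash on BASE_URL is tolerated.
superuser_url() {
    printf '%s/api/admin/superuser/%s\n' "${1%/}" "$2"
}

# make_superuser BASE_URL IDENTIFIER
# Sends the POST request; only works while the admin API is not blocked.
make_superuser() {
    curl --fail --silent -X POST "$(superuser_url "$1" "$2")"
}

# Hypothetical usage (URL and username are assumptions):
#   make_superuser http://localhost:8080 testuser
```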
OK this needs upstream support...
I'm going to create a new issue in upstream
So I added https://github.com/IQSS/dataverse/issues/9933 and created https://github.com/IQSS/dataverse/pull/9935
Have a look :-)
We probably need to get some stuff merged to continue with the action...
"Modify compose file to mount an existing file into the bootstrap container"... should we go ahead and add this to the dev compose file? Maybe commented out?
Hmm. This use case might be a far stretch from what we do in the main repo. Currently, I can't think of a use for it there... But I might be wrong.
Ok. In the PR can you show a diff of what you mean?
You mean an example in the description?
Like this?
diff --git a/docker-compose-dev.yml b/docker-compose-dev.yml
index ab44dbc180..d6129cf738 100644
--- a/docker-compose-dev.yml
+++ b/docker-compose-dev.yml
@@ -41,6 +41,8 @@ services:
     command:
       - bootstrap.sh
       - dev
+      - -e
+      - /tmp/envfile.txt
     networks:
       - dataverse
Close. Order is wrong. And it needs a volume mount for the file.
This is why I'm asking for a diff :sweat_smile:
Hrhr
   dev_bootstrap:
     container_name: "dev_bootstrap"
     image: gdcc/configbaker:unstable
     restart: "no"
     command:
       - bootstrap.sh
+      - -e
+      - /tmp/envfile.txt
       - dev
+    volumes:
+      - ./envfile.txt:/tmp/envfile.txt
     networks:
       - dataverse
But it is important to create that file on the host first - otherwise you will end up with a folder called "envfile.txt"... :wink:
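The pitfall described above (a missing host file turning into a directory once the bind mount is created) can be guarded against with a small pre-flight step. A minimal sketch; the helper name `ensure_file` and the compose invocation in the usage comment are assumptions.

```sh
#!/bin/sh
# ensure_file PATH
# Creates an empty regular file at PATH if nothing exists there yet, and
# fails if PATH is already a directory (e.g. left over from an earlier run).
ensure_file() {
    if [ -d "$1" ]; then
        echo "$1 is a directory, remove it first" >&2
        return 1
    fi
    [ -e "$1" ] || : > "$1"
}

# Hypothetical usage before starting the stack:
#   ensure_file ./envfile.txt && docker compose -f docker-compose-dev.yml up -d
```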
Uh oh:
$ file /tmp/envfile.txt
/tmp/envfile.txt: empty
oh dev needs to be last
closer, I think, but...
dev_bootstrap> File /tmp/envfile.txt not found, is a directory or not writeable
Did you mount the file?
$ file /tmp/envfile.txt
/tmp/envfile.txt: empty
$ cd /tmp/envfile.txt
-bash: cd: /tmp/envfile.txt: Not a directory
$ ls -l /tmp/envfile.txt
-rw-r--r-- 1 pdurbin wheel 0 Sep 19 14:05 /tmp/envfile.txt
diff --git a/docker-compose-dev.yml b/docker-compose-dev.yml
index ab44dbc180..82f320c1e9 100644
--- a/docker-compose-dev.yml
+++ b/docker-compose-dev.yml
@@ -40,7 +40,11 @@ services:
     restart: "no"
     command:
       - bootstrap.sh
+      - -e
+      - /tmp/envfile.txt
       - dev
+    volumes:
+      - ./envfile.txt:/tmp/envfile.txt
     networks:
       - dataverse
I'm glad we're talking about this here and not in the PR. :sweat_smile: Chat is so nice.
The container should be running as root, right? So it should have the required permissions...
/Library/PrivilegedHelperTools/com.docker.vmnetd is running as root but all other docker things from ps aux are running as my user, pdurbin.
Interesting - when I mount the file to /tmp, it doesn't work on my machine either!
It did work when I mounted to /
dev_bootstrap> /scripts/bootstrap.sh: line 72: /tmp/envfile.txt: Permission denied
When you say / do you really mean / of my whole file system? Or do you mean I should put my envfile.txt in the root of my git clone of dataverse?
No, I mean / of the container
please show me a diff
I suppose the /tmp folder has the sticky bit set.
Haha yes that is a security feature
https://stackoverflow.com/a/70894162
So better not mount to /tmp :see_no_evil:
I'm happy to try whatever, a diff would help
It certainly explains why the checks did not fail but the permission still was denied
Just remove /tmp
All of the above but s#/tmp##
like this?
diff --git a/docker-compose-dev.yml b/docker-compose-dev.yml
index ab44dbc180..1d155561ca 100644
--- a/docker-compose-dev.yml
+++ b/docker-compose-dev.yml
@@ -40,7 +40,11 @@ services:
     restart: "no"
     command:
       - bootstrap.sh
+      - -e
+      - envfile.txt
       - dev
+    volumes:
+      - ./envfile.txt:envfile.txt
     networks:
       - dataverse
trying it
[ERROR] DOCKER> I/O Error [Unable to create container for [gdcc/configbaker:unstable] : {"message":"invalid volume specification: '/Users/pdurbin/github/iqss/dataverse/envfile.txt:envfile.txt': invalid mount config for type \"bind\": invalid mount path: 'envfile.txt' mount path must be absolute"} (Internal Server Error: 500)]
haven't given up yet
Dude you deleted both / now
Plz go for /envfile.txt
got it!
diff --git a/docker-compose-dev.yml b/docker-compose-dev.yml
index ab44dbc180..4a0d40917e 100644
--- a/docker-compose-dev.yml
+++ b/docker-compose-dev.yml
@@ -40,7 +40,11 @@ services:
     restart: "no"
     command:
       - bootstrap.sh
+      - -e
+      - envfile.txt
       - dev
+    volumes:
+      - ./envfile.txt:/envfile.txt
     networks:
       - dataverse
$ cat envfile.txt
API_TOKEN=97cad752-983a-446f-866e-b032998ee33d
Splendid!
That's what you wanted me to do?
Yes Sir! :grinning:
phew
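With the token landing in envfile.txt on the host, a later step of the action could pick it up and hand it to subsequent steps. A minimal sketch, assuming the file contains a single `API_TOKEN=...` line as shown above; the helper names are made up, and `GITHUB_ENV` is the file path that GitHub Actions provides to each step.

```sh
#!/bin/sh
# read_env_value FILE KEY
# Prints the value of the first KEY=VALUE line in FILE; fails if KEY is absent.
read_env_value() {
    line=$(grep -m 1 "^$2=" "$1") || return 1
    printf '%s\n' "${line#*=}"
}

# export_token FILE
# Appends the token to the file named by GITHUB_ENV, which GitHub Actions
# reads after the step to set variables for later steps.
export_token() {
    token=$(read_env_value "$1" API_TOKEN) || return 1
    echo "API_TOKEN=$token" >> "$GITHUB_ENV"
}

# Hypothetical usage in a step that runs after bootstrapping has finished:
#   export_token ./envfile.txt
```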
So what do we do now...?
We're stuck a little with trivial changes to configbaker
I just approved #9935.
Now we just wait for that PR to get merged, right?
Yeah. Or PRs.
keep 'em coming!
I could hack on the action some more but I'm stuck because I don't have the upstream changes merged
Don't worry, there are only 14 PRs in QA or ready for QA! :sweat_smile:
I'm not sure our containers-only PRs are actually helping by taking up space in QA
Sounds like its own topic. :big_smile:
True, true
This topic is kinda strange already. We're talking containers in the Python stream... :see_no_evil:
/me looks
d'oh!
It's a cross section topic
As this is a cross section topic, let's move this to #dev instead of the Python channel...
This topic was moved here from #python > Dataverse GitHub Action by Oliver Bertuch.
Should we brainstorm about a list of inputs and outputs?
We should probably have an input for the tag and the image names
Should we have inputs for a Dataverse ref (tag/commit) and build images from there?
Port would also be good
That way people could use it with forks as well or simply make sure a specific DV version is used
Port for DV only or also for Solr/Postgres?
Oh and is that input only or outputs, too?
I'd say yes. Guess it's a rare occasion, but in case the default ports are occupied already, users can change it.
Oliver Bertuch wrote:
Oh and is that input only or outputs, too?
Do you mean outputting the ports?
Yes. Or maybe even provide a URL where DV can be reached once you set it up with the action
I mean that is probably sth people want - interact with Dataverse
Great idea!
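Providing the port and a ready-made URL as outputs could be sketched like this. The output names `port` and `base_url`, the default of 8080 and the helper name are assumptions; `GITHUB_OUTPUT` is the file path that GitHub Actions provides to each step, and a composite action would still have to map these step outputs to its declared outputs.

```sh
#!/bin/sh
# write_outputs [PORT]
# Appends name=value pairs to the file named by GITHUB_OUTPUT.
# PORT defaults to 8080 (an assumed default for this sketch).
write_outputs() {
    port=${1:-8080}
    {
        echo "port=$port"
        echo "base_url=http://localhost:$port"
    } >> "$GITHUB_OUTPUT"
}

# Hypothetical usage at the end of the setup step:
#   write_outputs "$DATAVERSE_PORT"
```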
Input for bootstrap timeout
Input for bootstrap persona
You've mentioned memory usage too. Would that be one to consider adding?
Absolutely! Some actions might require more than 2GB RAM
@Jan Range https://github.com/gdcc/dataverse-action/issues/3
#9935 has been merged! :tada:
Progress...
Under https://github.com/gdcc/dataverse-action/actions/runs/6254427827/job/16981926546 you can see that we already are able to gather some insights and inputs.
Cleaned up https://github.com/gdcc/dataverse-action/blob/main/action.yml and the compose file already!
Also making use of the metadata for postgres/solr version, but keep it overrideable from the action config
go go go!
Added option to supply your own JVM options (can also be set/updated after the containers started!)
And wired in the exposing mechanism added with https://github.com/IQSS/dataverse/issues/9935...
Probably time to start creating some outputs...
If people want to talk about the action, perhaps because they have a question or want to improve it, where should they go? Can we say something about this in the README? I just opened https://github.com/gdcc/dataverse-action/issues/6 about this.
Last week we spent a few minutes talking about the action and cleaning up the issues a bit at https://github.com/gdcc/dataverse-action/issues
@Jan Range is there any issue or other task we can help with?
Should we talk about it during tomorrow's meeting?
I'm happy to join tomorrow's meeting :blush: One thing that @Oliver Bertuch already addressed here is to specify a version of Dataverse. Perhaps we can utilize this to some extent to enhance the testing process of pyDataverse.
Yes! Great stuff happening. Ok! I added it to the agenda. Thanks!
I am currently setting up tests for python-dvuploader and I am using a local installation via docker-compose of the current develop branch. This stack contains the S3-compatible services minio and localstack. It would be very convenient to also have these included in the GitHub Action for CI/CD tests.
If we decide to include both minio and localstack, should they be included by default or made optional? If we choose to make them optional, how can we include them in the Action?
A challenge might be to set/change a collection's storage. Does the endpoint $SERVER_URL/api/dataverses/$PARENT allow specifying a storage location, given the client is a superuser?
Would it make sense to have a flavor kind of thing going?
We can include a base compose file with the essential services for Dataverse core
And add more compose files as flavors, each an extension that references the base file
A user could choose among the flavors, depending on their use case
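One possible way to layer such flavors is Compose's ability to take several `-f` files and merge them in order. A minimal sketch; the file naming scheme `compose.<flavor>.yml`, the flavor names and the helper name are assumptions, not the layout of the action repository.

```sh
#!/bin/sh
# compose_file_args BASE_FILE [FLAVOR...]
# Prints the -f arguments for the base file plus one file per chosen flavor.
compose_file_args() {
    args="-f $1"
    shift
    for flavor in "$@"; do
        args="$args -f compose.$flavor.yml"
    done
    printf '%s\n' "$args"
}

# Hypothetical usage (word splitting of the result is intended here):
#   docker compose $(compose_file_args compose.base.yml minio localstack) up -d
```

Because later files override earlier ones, a flavor file would only need to contain the additional services and any settings it changes.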
Sounds great! I think that's also nice for others who want to deploy DV to re-use these compose files :raised_hands:
https://docs.docker.com/compose/compose-file/14-include/
Note: I'm not sure since which Compose version this is included...
The docs for this were created ~7 months ago, so it is probably only included in very recent versions of Docker
(See https://github.com/compose-spec/compose-spec/commit/27008f85ac127e4d91e89bcad8bf7686ffde241f)
Fancy!
But at least for the Action, we should be safe if the Docker version included in the runners is recent enough
It is at least a controlled environment
Off topic: Is there a reason not to use 3? We are using 2.4 atm
Yes. Docker Maven Plugin compatibility
Ah alright, that makes sense
We probably should leave a note in the file about that...
(In upstream - downstream we can do whatever we want)
When this is sorted, how can we create a collection that uses the other storage?
The easiest way for that would probably be a different bootstrapping flavor
That will simply require us to provide such a script as a mounted file into the bootstrapping container
Ok, that's nice!
Yeah the idea behind configbaker was always to keep things modular
I will create a PR this week to test the flavoring :raised_hands:
Delicious! :yum:
I'm including the GitHub Action in a list of ways to run Dataverse in containers (#10273). What use cases am I forgetting?
Use cases for the GitHub Action include:
Testing Client Libraries that interact with Dataverse APIs
Testing Integrations of third party software with Dataverse
Would metadata blocks count as third party software?
What about testing Dataverse itself for compliance with some standard?
Is an SPI extension just an integration with third party software?
Well, I already have a page for metadata blocks: https://dataverse-guide--10273.org.readthedocs.build/en/10273/container/running/metadata-blocks.html ... but let's discuss that over at #containers > tutorial for demo or eval #10238
Hmm, like if Dataverse complies with OAI-PMH?
For SPI stuff, yeah I could maybe see "Testing custom exports" as a potential use of the action.
https://github.com/gdcc/dataverse-exporters contains a "MyJSON" exporter as an example. Maybe we could write a CI workflow that builds the jar file, drops it into the right place, spins up Dataverse, and makes an assertion (look for "myjson" or whatever).
Anyway, we could discuss any of this over in the other thread. I just wanted to let folks working on the action know that I'm writing about it.
Last updated: Nov 01 2025 at 14:11 UTC