I am currently working on my Bachelor Thesis in CS. My job is to integrate Dataverse into Galaxy, so that Galaxy users can import and export files from and to Dataverse directly from the Galaxy UI. The integration is already partially working and will be finished in the coming weeks.
I wonder if there are any Galaxy users here? It would be very interesting to gather some information about Dataverse usage.
E.g.:
How many people are using Dataverse?
How many of them are using Galaxy or could imagine using it in the future?
Hi! Welcome! That's awesome!
I think we should check with Nils Gehlenborg, even though it's been 7 years (!) since I posted this:
"Galaxy is used at Harvard Medical School: http://www.refinery-platform.org . We had a meeting in this room just last week. See https://github.com/IQSS/open-source-at-harvard/issues/11 . They might create datasets in Dataverse and upload provenance information."
Much more recently, @Vera Clemens linked to https://github.com/elixir-europe/biohackathon-projects-2024/blob/main/19.md which mentions Galaxy. We should see if she uses it.
As for your question of how many people are using Dataverse, I don't think we have a good answer for that. We know about 125 installations in 39 countries. We recently added (in Dataverse 6.2, thanks to @Steven Ferey in https://github.com/IQSS/dataverse/pull/10260) an API that returns the number of users per installation. Harvard Dataverse is the largest installation and https://dataverse.harvard.edu/api/info/metrics/accounts says there are 80237 users. So I guess you could iterate over all installations running 6.2 or higher and count them up. Of course, this only counts people who have accounts. Lots of people download data from Dataverse installations without ever creating an account.
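The "iterate over all installations and count them up" idea could be sketched roughly like this in Python. This is only a minimal sketch under assumptions: it assumes the accounts metric answers with JSON shaped like {"status": "OK", "data": {"count": N}}, the function names are made up for illustration, and the list of installation URLs has to come from somewhere else. Installations older than 6.2 (or ones that are down) are simply reported as None.

```python
import json
from urllib.request import urlopen


def fetch_account_count(base_url, opener=urlopen):
    """Return the account count reported by one installation, or None on failure.

    Assumes the response looks like {"status": "OK", "data": {"count": N}}.
    """
    url = base_url.rstrip("/") + "/api/info/metrics/accounts"
    try:
        with opener(url, timeout=10) as response:
            payload = json.load(response)
        return int(payload["data"]["count"])
    except (OSError, ValueError, KeyError, TypeError):
        # Network errors, HTTP errors (e.g. 404 on older versions),
        # invalid JSON, or an unexpected response shape.
        return None


def total_accounts(base_urls, fetch=fetch_account_count):
    """Sum the account counts over installations, skipping ones that failed."""
    counts = {url: fetch(url) for url in base_urls}
    total = sum(count for count in counts.values() if count is not None)
    return total, counts
```

Calling total_accounts(["https://dataverse.harvard.edu"]) would query just Harvard Dataverse. As noted above, the result only counts people who have accounts.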
@Kai König oh, one more thing. I highly recommend posting at https://groups.google.com/g/dataverse-community as well. You'll reach many more people there (1000+).
@Philip Durbin 🐉 Thanks a lot for the feedback!!!
I really like the idea of querying the installations to get a user count and wrote this down as a task for me.
I also posted to the Google group and emailed only Nils, since you already tagged Vera.
https://groups.google.com/g/dataverse-community/c/5MEWdFfH0cU/m/UvBYZEpDAAAJ looks great! Thanks!
So does the email to Nils (I also wrote him yesterday).
If it helps, we already have code at https://github.com/IQSS/dataverse-metrics that aggregates metrics across installations. @Dimitri Szabo opened https://github.com/IQSS/dataverse-metrics/issues/100 about adding user accounts. It's in Python.
However, I believe that code is somewhat deprecated. @Juan Pablo Tosca Villanueva is working on a new version at https://github.com/IQSS/dataverse-hub . https://github.com/IQSS/dataverse-pm/issues/271 has the details but I think aggregation is either already implemented or will be soon. It's in Java.
Either one might be a good starting point for you. Or you could do your own thing, of course! :grinning:
@Kai König what language are you writing the integration in?
Philip Durbin 🐉 wrote:
Kai König what language are you writing the integration in?
python
@Kai König out of curiosity, have you heard of (or are you using) pyDataverse? We have a #python channel here and I bet @Jan Range would be happy to help (as would I) if you want to try it.
@Kai König I am using Galaxy sometimes! Great to hear there will be a Dataverse integration
Happy to help out using pyDataverse over at our #python channel
I stumbled upon pyDataverse in the Dataverse docs but then decided to first give the native API a try and maybe use a library if I have any issues. But tbh the API documentation is really good and everything worked out of the box easily so far.
Ah, bummer, I was hoping you'd BOTH find a user! :big_smile:
How do you find the root Collection via API for any Dataverse instance? E.g. for the demo this finds the root:
https://demo.dataverse.org/api/search?q=identifier:root&type=dataverse
but the docs say:
Out of the box the top level Dataverse collection has an alias of “root” and a database id of “1” but your installation may vary. The easiest way to determine the alias of your root Dataverse collection is to click “Advanced Search” and look at the URL.
Also apparently I don't have permission to create a Dataverse collection? I tried creating a collection with parent "root" with all required fields according to the docs, but I get this error:
User @kaikoenig is not permitted to perform requested action.
is there maybe some way to create a dataverse in the user context? i.e. not with root as parent?
So to me it looks like creating a Dataverse via UI works and it automatically selects the root as parent. But via API it doesn't... :(
You can use :root as a keyword for the root collection.
This is mentioned at https://guides.dataverse.org/en/6.4/api/native-api.html#view-a-dataverse-collection
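For anyone who wants the actual alias of the root collection from Python, a small sketch using the :root keyword could look like the following. The function names are hypothetical, and it assumes the "view a Dataverse collection" response carries the alias under data.alias and that the root collection is publicly viewable.

```python
import json
from urllib.request import urlopen


def collection_url(base_url, alias=":root"):
    """Build the native API URL for a collection; ':root' means the root collection."""
    return f"{base_url.rstrip('/')}/api/dataverses/{alias}"


def root_alias(base_url, opener=urlopen):
    """Look up the real alias of the root collection via the ':root' keyword.

    Assumes a response shaped like {"status": "OK", "data": {"alias": ...}}.
    """
    with opener(collection_url(base_url), timeout=10) as response:
        payload = json.load(response)
    return payload["data"]["alias"]
```

This avoids relying on the alias being "root" or the database id being "1", which the docs say may vary per installation.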
@Philip Durbin 🚀 That's great, thanks! Do you have an idea how I can create a dataverse collection via API with the root as parent? Automatically creating a collection and a dataset inside the collection is the last puzzle piece and then from my POV the integration is basically finished.
Sure, like this:
curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$PARENT" --upload-file dataverse-complete.json
From https://guides.dataverse.org/en/6.5/api/native-api.html#create-a-dataverse-collection
but if I set $PARENT to root I get an error:
You'd pass :root as $PARENT
User @kaikoenig is not permitted to perform requested action.
oooh but :root works! amazing!!
Oh, hmm. It depends on permissions. It probably works on https://demo.dataverse.org . Where are you testing?
because "root" didn't
(the alias)
waaait, sorry I have to test again
Also, did you prepend the colon? :
:root
yes ":root" works, but "root" doesn't
perfect, thank you!!
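The curl call above, with :root as the parent, might translate to Python's standard library roughly as follows. This is a sketch, not how the Galaxy integration does it: the helper name and the example collection values are made up, and the assumed minimal fields (name, alias, dataverseContacts) follow the guide's example JSON. Whether the request succeeds still depends on the user's permissions on the installation.

```python
import json
from urllib.request import Request, urlopen


def build_create_collection_request(base_url, api_token, collection, parent=":root"):
    """Build a POST request that creates `collection` under `parent`.

    `parent` is a collection alias or the ':root' keyword.
    """
    url = f"{base_url.rstrip('/')}/api/dataverses/{parent}"
    return Request(
        url,
        data=json.dumps(collection).encode("utf-8"),
        headers={
            "X-Dataverse-key": api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical example values, for illustration only.
example_collection = {
    "name": "Galaxy Exports",
    "alias": "galaxy_exports",
    "dataverseContacts": [{"contactEmail": "someone@example.org"}],
}

# To actually send it (needs a valid token and permission on the parent):
# request = build_create_collection_request(
#     "https://demo.dataverse.org", "<your token>", example_collection)
# with urlopen(request, timeout=30) as response:
#     print(json.load(response))
```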
looking forward to finishing everything this week, not sure when it will be released though, because it will need some testing on the Galaxy side of things
@Philip Durbin 🚀 When creating a dataset via API what value can I pass as subject? (if the user selects none)
is there some kind of default value that always exists?
If you go to https://demo.dataverse.org/api/metadatablocks/citation (for example) you'll find a list of allowed values.
but I assume this depends on the configuration of the Dataverse instance? So I would always have to query first for the existing subjects and then pick one randomly
btw the PR is already opened :) https://github.com/galaxyproject/galaxy/pull/19367
Well, Dataverse ships with a certain number of subjects, 15 or so. Yes, sometimes installations change this but we ask them not to. We would prefer that they make pull requests to add the subjects they need so that they're available to all installations.
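A possible way to pick a fallback subject from the citation block, instead of choosing randomly, is sketched below. It rests on two assumptions that should be checked against a real response: that the JSON from /api/metadatablocks/citation lists the allowed values under data.fields.subject.controlledVocabularyValues, and that "Other" is among the shipped subjects. The function names are hypothetical.

```python
def allowed_subjects(citation_block):
    """Extract the allowed subject values from a parsed citation block response.

    Assumes the values live under data.fields.subject.controlledVocabularyValues.
    """
    field = citation_block["data"]["fields"]["subject"]
    return list(field.get("controlledVocabularyValues", []))


def default_subject(subjects, preferred="Other"):
    """Pick a fallback subject: the preferred one if allowed, else the first one."""
    if preferred in subjects:
        return preferred
    return subjects[0] if subjects else None
```

Preferring a fixed value and only falling back to the first entry keeps the result predictable on installations that changed their subject list.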
It's great to see the PR! Any feedback? I did mention this new upcoming integration in the community news.
@Kai König I know @Sonia Barbosa already pinged you on the mailing list thread you kicked off but I thought I'd check in as well. :smile:
Congrats on getting https://github.com/galaxyproject/galaxy/pull/19367 merged! Are there any docs? I'm asking because I'd like to add Galaxy to a future version of our integrations page and typically we link to docs.
Also, sorry I didn't notice until now that your PR was merged in January! :sweat_smile:
No worries, yes it was merged into dev in January and I don't know when it will be live or if it is already :)
The documentation is in the PR. For users I linked the PR from a previous integration as everything is the same from a UI perspective: https://github.com/galaxyproject/galaxy/pull/16381
Best,
Kai
Right, the previous Invenio integration. There's even a nice video there.
But no docs for Invenio either, that I can find. Weird!
It looks like Invenio followed up with nice blog posts with screenshots:
Should we do something similar for Dataverse? It would be great to get the word out about your work, @Kai König! :heart:
Last updated: Nov 01 2025 at 14:11 UTC