Stream: community

Topic: Dataverse-Galaxy Integration


view this post on Zulip Kai König (Dec 07 2024 at 12:02):

I am currently working on my Bachelor Thesis in CS. My job is to integrate Dataverse into Galaxy, so that Galaxy users can import and export files from and to Dataverse directly from the Galaxy UI. The integration is already partially working and will be finished in the coming weeks.

I wonder if there are any Galaxy users here? Would be very interesting to gather some information around Dataverse.

E.g.:
How many people are using Dataverse?
How many of them are using Galaxy or could imagine using it in the future?

view this post on Zulip Philip Durbin 🚀 (Dec 07 2024 at 12:22):

Hi! Welcome! That's awesome!

I think we should check with Nils Gehlenborg, even though it's been 7 years (!) since I posted this:

"Galaxy is used at Harvard Medical School: http://www.refinery-platform.org . We had a meeting in this room just last week. See https://github.com/IQSS/open-source-at-harvard/issues/11 . They might create datasets in Dataverse and upload provenance information."

view this post on Zulip Philip Durbin 🚀 (Dec 07 2024 at 12:23):

Much more recently, @Vera Clemens linked to https://github.com/elixir-europe/biohackathon-projects-2024/blob/main/19.md which mentions Galaxy. We should see if she uses it.

view this post on Zulip Philip Durbin 🚀 (Dec 07 2024 at 12:30):

As for your question of how many people are using Dataverse, I don't think we have a good answer for that. We know about 125 installations in 39 countries. We recently added (in Dataverse 6.2, thanks to @Steven Ferey in https://github.com/IQSS/dataverse/pull/10260) an API that returns the number of users per installation. Harvard Dataverse is the largest installation and https://dataverse.harvard.edu/api/info/metrics/accounts says there are 80237 users. So I guess you could iterate over all installations running 6.2 or higher and count them up. Of course, this only counts people who have accounts. Lots of people download data from Dataverse installations without ever creating an account.
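
A minimal Python sketch of the "iterate over installations and count them up" idea. It assumes the accounts metrics endpoint answers with JSON shaped like `{"status": "OK", "data": {"count": N}}`, and that you supply your own list of installation base URLs. The helper names (`fetch_json`, `account_count`, `total_accounts`) are hypothetical and come from neither Dataverse nor Galaxy.

```python
import json
from typing import Callable, Iterable, List, Optional, Tuple
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Download a URL and decode its JSON body (performs a real HTTP request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def account_count(base_url: str,
                  fetch: Callable[[str], dict] = fetch_json) -> Optional[int]:
    """Return the number of accounts for one installation, or None if unavailable.

    Assumption: the endpoint returns {"data": {"count": <int>}}.
    Installations older than 6.2 do not have this endpoint, so failures are expected.
    """
    url = base_url.rstrip("/") + "/api/info/metrics/accounts"
    try:
        return int(fetch(url)["data"]["count"])
    except (OSError, KeyError, TypeError, ValueError):
        # Network/HTTP errors, missing keys, or a non-JSON body all mean "no count".
        return None


def total_accounts(base_urls: Iterable[str],
                   fetch: Callable[[str], dict] = fetch_json) -> Tuple[int, List[str]]:
    """Sum account counts and report which installations could not be counted."""
    total = 0
    skipped: List[str] = []
    for base_url in base_urls:
        count = account_count(base_url, fetch)
        if count is None:
            skipped.append(base_url)
        else:
            total += count
    return total, skipped
```

Calling `total_accounts(["https://dataverse.harvard.edu"])` would query the URL mentioned above. As noted, the result only counts people with accounts, and installations that do not expose the endpoint end up in the skipped list.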

view this post on Zulip Philip Durbin 🚀 (Dec 07 2024 at 12:38):

@Kai König oh, one more thing. I highly recommend posting at https://groups.google.com/g/dataverse-community as well. You'll reach many more people there (1000+).

view this post on Zulip Kai König (Dec 10 2024 at 11:53):

@Philip Durbin 🐉 Thanks a lot for the feedback!!!

I really like the idea of querying the installations to get a user count and wrote this down as a task for me.

I also posted in the Google Group and emailed only Nils, since you already tagged Vera.

view this post on Zulip Philip Durbin 🚀 (Dec 10 2024 at 12:08):

https://groups.google.com/g/dataverse-community/c/5MEWdFfH0cU/m/UvBYZEpDAAAJ looks great! Thanks!

view this post on Zulip Philip Durbin 🚀 (Dec 10 2024 at 12:09):

So does the email to Nils (I also wrote him yesterday).

view this post on Zulip Philip Durbin 🚀 (Dec 10 2024 at 12:17):

If it helps, we already have code at https://github.com/IQSS/dataverse-metrics that aggregates metrics across installations. @Dimitri Szabo opened https://github.com/IQSS/dataverse-metrics/issues/100 about adding user accounts. It's in Python.

However, I believe that code is somewhat deprecated. @Juan Pablo Tosca Villanueva is working on a new version at https://github.com/IQSS/dataverse-hub . https://github.com/IQSS/dataverse-pm/issues/271 has the details but I think aggregation is either already implemented or will be soon. It's in Java.

view this post on Zulip Philip Durbin 🚀 (Dec 10 2024 at 12:17):

Either one might be a good starting point for you. Or you could do your own thing, of course! :grinning:

view this post on Zulip Philip Durbin 🚀 (Dec 10 2024 at 13:48):

@Kai König what language are you writing the integration in?

view this post on Zulip Kai König (Dec 10 2024 at 15:21):

Philip Durbin 🐉 said:

Kai König what language are you writing the integration in?

python

view this post on Zulip Philip Durbin 🚀 (Dec 10 2024 at 18:45):

@Kai König out of curiosity, have you heard of (or are you using) pyDataverse? We have a #python channel here and I bet @Jan Range would be happy to help (as would I) if you want to try it.

view this post on Zulip Jan Range (Dec 10 2024 at 19:16):

@Kai König I am using Galaxy sometimes! Great to hear there will be a Dataverse integration :dataverse_man:

view this post on Zulip Jan Range (Dec 10 2024 at 19:17):

Happy to help out using pyDataverse over at our #python channel

view this post on Zulip Kai König (Dec 10 2024 at 20:04):

I stumbled upon pyDataverse in the Dataverse docs but then decided to first give the native API a try and maybe use a library if I have any issues. But tbh the API documentation is really good and everything worked out of the box easily so far.

view this post on Zulip Philip Durbin 🚀 (Dec 10 2024 at 22:53):

Ah, bummer, I was hoping you'd BOTH find a user! :big_smile:

view this post on Zulip Kai König (Dec 16 2024 at 09:22):

How do you find the root Collection via API for any Dataverse instance? E.g. for the demo this finds the root:
https://demo.dataverse.org/api/search?q=identifier:root&type=dataverse

view this post on Zulip Kai König (Dec 16 2024 at 09:23):

but the docs say:

Out of the box the top level Dataverse collection has an alias of “root” and a database id of “1” but your installation may vary. The easiest way to determine the alias of your root Dataverse collection is to click “Advanced Search” and look at the URL.

view this post on Zulip Kai König (Dec 16 2024 at 10:27):

Also apparently I don't have permission to create a dataverse collection? I tried creating a collection with parent "root" with all required fields according to the docs, but I do get this error:

User @kaikoenig is not permitted to perform requested action.

view this post on Zulip Kai König (Dec 16 2024 at 10:29):

is there maybe some way to create a dataverse in the user context? i.e. not with root as parent?

view this post on Zulip Kai König (Dec 16 2024 at 11:15):

So to me it looks like creating a Dataverse via UI works and it automatically selects the root as parent. But via API it doesn't... :(

view this post on Zulip Philip Durbin 🚀 (Dec 16 2024 at 13:13):

You can use :root as a keyword for the root collection.

view this post on Zulip Philip Durbin 🚀 (Dec 16 2024 at 13:13):

This is mentioned at https://guides.dataverse.org/en/6.4/api/native-api.html#view-a-dataverse-collection
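
A small Python sketch of looking up the root collection with the `:root` keyword instead of guessing its alias. It assumes the "view a Dataverse collection" endpoint returns JSON with the alias under `data.alias`, and that the root collection is readable without an API token on the installation in question. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def root_collection_url(server_url: str) -> str:
    """Build the URL for viewing the root collection via the :root keyword."""
    return server_url.rstrip("/") + "/api/dataverses/:root"


def root_alias(payload: dict) -> str:
    """Extract the alias from a decoded response (assumed shape: data.alias)."""
    return payload["data"]["alias"]


def fetch_root_alias(server_url: str) -> str:
    """Ask an installation for the alias of its root collection (real HTTP request)."""
    with urlopen(root_collection_url(server_url), timeout=30) as response:
        return root_alias(json.load(response))
```

For example, `fetch_root_alias("https://demo.dataverse.org")` would request `https://demo.dataverse.org/api/dataverses/:root`. This should work regardless of what alias an installation gave its root collection.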

view this post on Zulip Kai König (Dec 16 2024 at 13:24):

@Philip Durbin 🚀 That's great, thanks! Do you have an idea how I can create a dataverse collection via API with the root as parent? Automatically creating a collection and a dataset inside the collection is the last puzzle piece and then from my POV the integration is basically finished.

view this post on Zulip Philip Durbin 🚀 (Dec 16 2024 at 13:26):

Sure, like this:

curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$PARENT" --upload-file dataverse-complete.json

view this post on Zulip Philip Durbin 🚀 (Dec 16 2024 at 13:27):

From https://guides.dataverse.org/en/6.5/api/native-api.html#create-a-dataverse-collection

view this post on Zulip Kai König (Dec 16 2024 at 13:27):

but if I set $PARENT to root I get an error:

view this post on Zulip Philip Durbin 🚀 (Dec 16 2024 at 13:27):

You'd pass :root as $PARENT

view this post on Zulip Kai König (Dec 16 2024 at 13:27):

User @kaikoenig is not permitted to perform requested action.

view this post on Zulip Kai König (Dec 16 2024 at 13:28):

oooh but :root works! amazing!!

view this post on Zulip Philip Durbin 🚀 (Dec 16 2024 at 13:28):

Oh, hmm. It depends on permissions. It probably works on https://demo.dataverse.org . Where are you testing?

view this post on Zulip Kai König (Dec 16 2024 at 13:28):

because "root" didn't

view this post on Zulip Kai König (Dec 16 2024 at 13:28):

(the alias)

view this post on Zulip Kai König (Dec 16 2024 at 13:28):

waaait, sorry I have to test again

view this post on Zulip Philip Durbin 🚀 (Dec 16 2024 at 13:28):

Also, did you prepend the colon? :

:root

view this post on Zulip Kai König (Dec 16 2024 at 13:29):

yes ":root" works, but "root" doesn't

view this post on Zulip Kai König (Dec 16 2024 at 13:29):

perfect, thank you!!
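
For reference, a rough Python equivalent of the curl command above, with `:root` as the parent. This is a sketch under assumptions: the JSON body contains only `name`, `alias`, and one contact email, which is my reading of the minimum the docs require. The function name and example values are hypothetical. Whether the call succeeds still depends on the permissions of the account behind the API token.

```python
import json
from urllib.request import Request, urlopen


def build_create_collection_request(server_url: str, api_token: str, name: str,
                                    alias: str, contact_email: str,
                                    parent: str = ":root") -> Request:
    """Prepare (but do not send) a POST that creates a collection under `parent`."""
    body = {
        "name": name,
        "alias": alias,
        # Assumption: one contact entry is enough; see dataverse-complete.json
        # in the API guide for the full set of fields.
        "dataverseContacts": [{"contactEmail": contact_email}],
    }
    return Request(
        url=f"{server_url.rstrip('/')}/api/dataverses/{parent}",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/json"},
        method="POST",
    )


def create_collection(request: Request) -> dict:
    """Send the prepared request and decode the JSON response (real HTTP request)."""
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

Splitting "build" from "send" keeps the URL and header logic easy to check without touching a server. Passing `parent="root"` relies on the installation's alias actually being `root`, whereas the `:root` keyword refers to the root collection on any installation.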

view this post on Zulip Kai König (Dec 16 2024 at 13:30):

looking forward to finishing everything this week, not sure when it will be released though, because it will need some testing on the Galaxy side of things

view this post on Zulip Kai König (Dec 17 2024 at 11:06):

@Philip Durbin 🚀 When creating a dataset via API what value can I pass as subject? (if the user selects none)

view this post on Zulip Kai König (Dec 17 2024 at 11:06):

is there some kind of default value that always exists?

view this post on Zulip Philip Durbin 🚀 (Dec 17 2024 at 12:48):

If you go to https://demo.dataverse.org/api/metadatablocks/citation (for example) you'll find a list of allowed values.

view this post on Zulip Kai König (Dec 22 2024 at 12:21):

but I assume this depends on the configuration of the Dataverse instance? So I would always have to query first for the existing subjects and then pick one randomly

view this post on Zulip Kai König (Dec 22 2024 at 12:22):

btw the PR is already opened :) https://github.com/galaxyproject/galaxy/pull/19367

view this post on Zulip Philip Durbin 🚀 (Jan 08 2025 at 15:00):

Well, Dataverse ships with a certain number of subjects, 15 or so. Yes, sometimes installations change this but we ask them not to. We would prefer that they make pull requests to add the subjects they need so that they're available to all installations.
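
A hedged Python sketch of the "query first, then pick" approach for the subject field. It assumes the citation metadata block response lists the allowed values under `data.fields.subject.controlledVocabularyValues`, which may differ between Dataverse versions. Rather than picking randomly, it falls back to "Other", on the assumption that the installation kept that shipped subject. Both helper names are hypothetical.

```python
from typing import List


def allowed_subjects(payload: dict) -> List[str]:
    """Pull the allowed subject values out of a decoded
    /api/metadatablocks/citation response (assumed JSON layout)."""
    subject_field = payload["data"]["fields"]["subject"]
    return list(subject_field["controlledVocabularyValues"])


def default_subject(subjects: List[str], preferred: str = "Other") -> str:
    """Choose a subject when the user selected none.

    Uses `preferred` if the installation offers it, otherwise the first
    allowed value, so the result is always valid for that installation.
    """
    if not subjects:
        raise ValueError("installation reports no allowed subjects")
    return preferred if preferred in subjects else subjects[0]
```

A deterministic fallback makes datasets created by the integration easier to recognize and fix later than a randomly chosen subject would.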

view this post on Zulip Philip Durbin 🚀 (Jan 08 2025 at 15:01):

It's great to see the PR! Any feedback? I did mention this new upcoming integration in the community news.

view this post on Zulip Philip Durbin 🚀 (Oct 02 2025 at 18:29):

@Kai König I know @Sonia Barbosa already pinged you on the mailing list thread you kicked off but I thought I'd check in as well. :smile:

Congrats on getting https://github.com/galaxyproject/galaxy/pull/19367 merged! Are there any docs? I'm asking because I'd like to add Galaxy to a future version of our integrations page and typically we link to docs.

view this post on Zulip Philip Durbin 🚀 (Oct 02 2025 at 21:06):

Also, sorry I didn't notice until now that your PR was merged in January! :sweat_smile:

view this post on Zulip Kai König (Oct 03 2025 at 10:18):

No worries, yes it was merged into dev in January and I don't know when it will be live or if it is already :)

The documentation is in the PR. For users I linked the PR from a previous integration as everything is the same from a UI perspective: https://github.com/galaxyproject/galaxy/pull/16381

Best,
Kai

view this post on Zulip Philip Durbin 🚀 (Oct 03 2025 at 11:53):

Right, the previous Invenio integration. There's even a nice video there.

view this post on Zulip Philip Durbin 🚀 (Oct 03 2025 at 12:00):

But no docs for Invenio either, that I can find. Weird!

It looks like Invenio followed up with nice blog posts with screenshots:

Should we do something similar for Dataverse? It would be great to get the word out about your work, @Kai König! :heart:


Last updated: Nov 01 2025 at 14:11 UTC