Let's talk about RO-Crate!
Should this be its own stream so people only interested in that topic can mute all else?
It might become a large and long endeavour
Meh. If people show up and it gets too crazy in this topic, sure we can move these messages to a stream. :grinning:
@Eryk Kulikowski is presenting on RO-Crate in 3 hours at the community call. Please see https://groups.google.com/g/dataverse-community/c/oXksy1da1KI/m/aimJx8DOGwAJ
The video is up (thanks, @Julian Gautier !): https://groups.google.com/g/dataverse-community/c/jlRanZy4b38/m/lcyiT7F4HQAJ
Hi!
I just submitted a PR with an RO-Crate exporter proof of concept implementation based on our RO-Crate manager code from the ARP project.
https://github.com/IQSS/dataverse/pull/10086
Exciting! Great work, @Balázs Pataki !
:point_up: @Eryk Kulikowski
Looks great! @Ozgur Karadeniz is also working on the export of RO-Crate with customization options. We will see how to combine our efforts on it.
@Balázs Pataki I just jumped on your branch and I'm playing around with it. This: RO-Crate exporter PoC #10086
"dsDescriptionValue" : "All earthquakes from 1930 until 2018.",
"name" : "All earthquakes from 1930 until 2018.",
"@id" : "#3::e8603172-347e-4f35-92e5-8e70ec9b977e",
"@type" : "dsDescription"
(I'm adding metadata from this dataset about earthquakes: https://doi.org/10.7910/DVN/UTY03A )
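For anyone who wants to poke at the generated file themselves, here is a minimal sketch of pulling entities of a given `@type` out of an RO-Crate `@graph`. Only the `dsDescription` entity is taken from the fragment above; the surrounding crate (the `@context` and the root `./` entity) is a made-up, trimmed-down stand-in, and `entities_of_type` is a hypothetical helper, not part of the exporter.

```python
import json

# Trimmed-down, partly invented crate: only the dsDescription entity
# comes from the fragment quoted above.
crate_text = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "./", "@type": "Dataset"},
    {
      "@id": "#3::e8603172-347e-4f35-92e5-8e70ec9b977e",
      "@type": "dsDescription",
      "name": "All earthquakes from 1930 until 2018.",
      "dsDescriptionValue": "All earthquakes from 1930 until 2018."
    }
  ]
}
"""


def entities_of_type(crate, type_name):
    """Return every entity in @graph whose @type equals or contains type_name."""
    found = []
    for entity in crate.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):  # @type may be a string or a list
            types = [types]
        if type_name in types:
            found.append(entity)
    return found


crate = json.loads(crate_text)
descriptions = entities_of_type(crate, "dsDescription")
for d in descriptions:
    print(d["@id"], "->", d["dsDescriptionValue"])
```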
Great! Any trouble so far?
Nope, this is the RO-Crate it generated: rocrate.json
I mean, I have no idea if it's right or not... :grinning:
Looks good to me. :smile:
@Balázs Pataki tomorrow is a holiday but maybe you can show me more RO-Crate stuff next week. More context or whatever.
Some new comments about runcrate: https://github.com/IQSS/dataverse/pull/10086#issuecomment-1831316380
Interesting. RSpace can export an RO-Crate to Dataverse, I'm hearing at a talk in #community > #Dataverse2024
See https://documentation.researchspace.com/article/s3j29if453-export-formats#eln_archive
@Ozgur Karadeniz @Dieuwertje Bloemen are you interested in presenting https://github.com/gdcc/dataverse-exporters/pull/15 at a future community call? (This was a suggestion from @Philipp Conzett via @Sonia Barbosa.)
Philip Durbin said:
Ozgur Karadeniz Dieuwertje Bloemen are you interested in presenting https://github.com/gdcc/dataverse-exporters/pull/15 at a future community call? (This was a suggestion from Philipp Conzett via Sonia Barbosa.)
Sure, we can present the current state as well as the future plans :)
@Ozgur Karadeniz great! I added you to https://docs.google.com/document/d/1lewTB5P7pMwzk8s8VTu0cFrhdOigkRpWbuUgu7w6sZo/edit?usp=sharing
Next we need to figure out who wants September, you or @Jan Range :grinning:
I am fine with either October or September :smile:
Both fine for us too :)
Should I flip a coin? :grinning:
I'd go for October then :-)
Sounds good
I created a new simplified doc to help organize future community calls. @Ozgur Karadeniz would you want either August 6 or Nov 5 to talk about the exporter? September is not available, sorry. And it sounds like @Jan Range wanted October.
I am fine with October :smile:
Ok, Jan said he can do August so I moved him up. @Ozgur Karadeniz I moved you to October. I hope that works for you!
Great, October works for us too :)
Some nice slides from @Ozgur Karadeniz and @Dieuwertje Bloemen - https://zenodo.org/doi/10.5281/zenodo.12548333 (via #8688).
@Ozgur Karadeniz I'm having some trouble with your RO-Crate exporter, unfortunately: https://github.com/gdcc/exporter-ro-crate/issues/2
Probably it's me. Please help! :heart:
@Oliver Bertuch by the way, I'm not sure this one should be on the official list yet: https://github.com/gdcc/dataverse-exporters#list-of-known-exporters
Earlier @Ceilyn Boyd and I were spitballing about some kind of indicator that stuff has been reviewed or QA'ed or vetted. This is a bigger topic than just RO-Crate exporters, of course!
Hi Philip,
It's not you, I also get that in some datasets :sweat_smile: I've started working on that bug and will try to fix it as soon as possible.
Thanks so much for testing! :smile:
@Ozgur Karadeniz sure! Thanks for getting back to me so quickly. Hey, I'm hoping to co-assign #10744 to you. If you're willing, please accept the invite to join our "read only" team at https://github.com/IQSS I just sent you.
@Balázs Pataki @Ozgur Karadeniz @Dieuwertje Bloemen and others, I'm aware of the wonderful exporters you've created for RO-Crate but does anyone here have researchers who upload RO-Crate files to Dataverse? Asking for a friend. :big_smile:
Well, in our ARP system we have RO-Crate import. In our case we handle RO-Crates whose context URIs match the URIs associated with Dataverse field types.
@Balázs Pataki nice. Can you please link me to an example dataset?
Sure, here's a dataset: https://repo.researchdata.hu/dataset.xhtml?persistentId=hdl:21.15109/ARP/PZINV9#describoTab
And here is its rocrate: https://repo.researchdata.hu/api/arp/rocrate/hdl:21.15109/ARP/PZINV9?version=1.0
This is the rocrate format we generate and can ingest.
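To illustrate the URI-matching idea described above, here is a rough sketch, not ARP's actual code. It assumes the crate's `@context` is a plain term-to-URI dictionary and that the installation can supply a table from field-type URI to field name. `FIELD_URIS`, the `example.org` URIs, and `importable_terms` are all made up for illustration.

```python
# Hypothetical lookup table: field-type URI -> Dataverse field name.
# A real installation would build this from its metadata blocks.
FIELD_URIS = {
    "http://purl.org/dc/terms/title": "title",
    "https://example.org/fields/dsDescriptionValue": "dsDescriptionValue",
}


def importable_terms(context, field_uris):
    """Map each crate term whose context URI is a known field-type URI
    to the corresponding field name; other terms are skipped."""
    return {
        term: field_uris[uri]
        for term, uri in context.items()
        if isinstance(uri, str) and uri in field_uris
    }


# Made-up crate context with two matching terms and one unknown term.
context = {
    "title": "http://purl.org/dc/terms/title",
    "dsDescriptionValue": "https://example.org/fields/dsDescriptionValue",
    "instrument": "https://example.org/other/instrument",
}

matches = importable_terms(context, FIELD_URIS)
print(matches)
```

Terms without a matching URI (here `instrument`) simply do not appear in the result, which is the "only what Dataverse knows about" behaviour mentioned above.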
I'm passing this along! Thanks!
I see RO-Crate got a mention in this article by @Vaida Plankytė - https://upstream.force11.org/the-time-is-now-vertical-interoperability-between-research-tools-an-essential-enabler-for-the-fairification-of-data/
@Philip Durbin ☃️ oh, you've noticed the new post so quickly! thank you for reading :sunglasses:
Sure. @Sonia Barbosa found it.
@Eryk Kulikowski when you get a moment, please check this out: https://github.com/gdcc/dataverse-previewers/issues/106
Hello! We're currently gathering information on tools that integrate RO-Crate with Dataverse. I’ve come across the RO-Crate exporter, but I'm particularly interested in whether an importer has been developed.
Specifically, we're exploring the possibility of implementing an importer similar to the one used in WorkflowHub or Zenodo—where an RO-Crate ZIP file is uploaded, automatically unpacked, and the metadata fields are populated based on the ro-crate-metadata.json file.
Do you know if a similar tool already exists for Dataverse?
Thank you in advance!
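As a rough sketch of the flow described above (unpack the ZIP, read `ro-crate-metadata.json`, populate fields), something like the following could work. It is not an existing Dataverse tool: the target keys `title`, `description` and `files` are placeholders rather than real Dataverse API fields, and the sample crate is invented. It relies on the RO-Crate 1.1 convention that the metadata descriptor's `about` property points at the root data entity.

```python
import io
import json
import zipfile

METADATA_FILE = "ro-crate-metadata.json"


def build_sample_zip():
    """Create a tiny in-memory RO-Crate ZIP (invented content) for the demo."""
    metadata = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {"@id": METADATA_FILE, "@type": "CreativeWork",
             "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
             "about": {"@id": "./"}},
            {"@id": "./", "@type": "Dataset", "name": "Sample crate",
             "description": "A made-up crate for illustration.",
             "hasPart": [{"@id": "data.csv"}]},
            {"@id": "data.csv", "@type": "File"},
        ],
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(METADATA_FILE, json.dumps(metadata))
        zf.writestr("data.csv", "a,b\n1,2\n")
    buf.seek(0)
    return buf


def read_root_metadata(zip_file):
    """Read the crate metadata from a ZIP and return placeholder dataset fields."""
    with zipfile.ZipFile(zip_file) as zf:
        crate = json.loads(zf.read(METADATA_FILE))
        payload = [n for n in zf.namelist() if n != METADATA_FILE]
    by_id = {entity["@id"]: entity for entity in crate["@graph"]}
    # The metadata descriptor's "about" identifies the root data entity.
    root = by_id[by_id[METADATA_FILE]["about"]["@id"]]
    return {
        "title": root.get("name"),
        "description": root.get("description"),
        "files": payload,
    }


result = read_root_metadata(build_sample_zip())
print(result)
```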
Hi!
We have both export and import implemented, but they are currently only available in our fork. We are in the process of making a PR out of our implementation and submitting it to the Dataverse core, but I cannot give you an ETA for this.
Here's a demo of our implementation in the Hungarian ARP repository.
I have not included RO-Crate import in this, but we have it implemented as you mentioned: you can upload a zip with files and a ro-crate-metadata.json and we populate the dataset based on that. One caveat is that we can currently only import what has been exported from (another) Dataverse, so it only works with ro-crate-metadata.json values that match the Dataverse metadata blocks.
Besides RO-Crate export/import we also have a native RO-Crate editor (AROMA), which allows editing both Dataverse metadata block based metadata and any other metadata (e.g. file metadata) you want to associate with your data.
Thank you @Balázs Pataki . Such impressive work! I'll share all this info with my team. Is there a way that I can follow up with you to know when the implementation is submitted to the Dataverse core?
Sure, we will share progress here in Zulip as well.
As Balazs points out, the import gets a bit tricky the moment you allow all kinds of RO-Crates (with more metadata than can be mapped one-to-one). We've been thinking of creating a general RO-Crate metadata importer for a while (my colleague @Ozgur Karadeniz made one of the exporters that's currently available). But after some meetings with other Dataverse installations on how an importer would work, I think more brainstorming is needed on what to do with the non-mappable metadata, and on what happens with a subsequent export if you don't discard that non-mappable metadata or if you edit the mapped metadata in Dataverse. Also, RO-Crate can contain files in structures that aren't compatible with the folder structuring in Dataverse, which complicates things even further if you want to import a full zip RO-Crate.
In other words, we would also be very interested in such an importer and would love to discuss how it should best work. For an RO-Crate importer with extra metadata to work, we need to look at the entire upload, import and export flow. Maybe it's something to involve the people from the RO-Crate consortium in as well, to get their input on the best way to do this.
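To make the "mappable vs. non-mappable" question a bit more concrete, here is a small sketch of splitting one RO-Crate entity into the part an importer could map and the leftover part that needs a decision (discard, keep in a leftover crate, and so on). The `MAPPING` table and the sample entity are hypothetical and not taken from any existing exporter or importer.

```python
# Hypothetical mapping: RO-Crate property -> Dataverse field name.
MAPPING = {
    "name": "title",
    "description": "dsDescriptionValue",
}


def split_properties(entity, mapping):
    """Split an entity's properties into (mapped, leftover).

    JSON-LD keywords such as @id and @type are skipped, since they
    identify the entity rather than describe it.
    """
    mapped, leftover = {}, {}
    for key, value in entity.items():
        if key.startswith("@"):
            continue
        if key in mapping:
            mapped[mapping[key]] = value
        else:
            leftover[key] = value
    return mapped, leftover


# Made-up root entity with one property Dataverse has no field for.
root = {
    "@id": "./",
    "@type": "Dataset",
    "name": "Sample crate",
    "description": "A made-up crate for illustration.",
    "instrument": "seismograph",
}

mapped, leftover = split_properties(root, MAPPING)
print("mapped:", mapped)
print("leftover:", leftover)
```

Whatever ends up in `leftover` is exactly the material the discussion above is about: it is either lost, or it has to be stored somewhere and kept consistent with later edits.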
@Balázs Pataki great video!
@Dieuwertje Bloemen @Balázs Pataki would you agree that the hardest bit about RO-Crate deposition is the ability of RO-Crates to be put together like Matroschka Puppets, which is not something Dataverse supports well with the current concept of Datasets and Collections?
What do you mean by "Matroschka Puppet" structure in the context of RO-Crate? Do you mean the dataset–subdataset relationship in an RO-Crate?
In our ARP case, when you upload a .zip file containing your files in a directory structure, we consider each directory a subdataset. In AROMA, you can associate metadata with each subdataset just as you would with individual files. However, all of this is contained within the RO-Crate and is not reflected in Dataverse, of course, since Dataverse has no concept of subdatasets (and directories).
We could use collections and subcollections for this in Dataverse, but I think that would really stretch the current data model.
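A minimal sketch of the "each directory is a subdataset" idea, assuming only a list of file paths from the uploaded zip. This is not ARP's or AROMA's code; `directory_entities` is a hypothetical helper. It follows the RO-Crate convention that directory `Dataset` identifiers end in `/` and that containment is expressed with `hasPart`.

```python
from pathlib import PurePosixPath


def directory_entities(file_paths):
    """Build one Dataset entity per directory, linked to its children via hasPart."""
    parts = {}  # directory @id -> set of child @ids
    for path in file_paths:
        p = PurePosixPath(path)
        child = str(p)
        # Walk from the file's own directory up to the crate root.
        for parent in p.parents:
            parent_id = "./" if str(parent) == "." else str(parent) + "/"
            parts.setdefault(parent_id, set()).add(child)
            child = parent_id
    return [
        {
            "@id": dir_id,
            "@type": "Dataset",
            "hasPart": [{"@id": c} for c in sorted(children)],
        }
        for dir_id, children in sorted(parts.items())
    ]


# Made-up file listing from an uploaded zip.
entities = directory_entities(["raw/2018/a.csv", "readme.txt"])
for entity in entities:
    print(entity["@id"], [part["@id"] for part in entity["hasPart"]])
```

The nesting lives entirely in these entities; nothing here corresponds to a Dataverse object, which is the mismatch described above.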
Yeah, that's exactly what I meant! The consequence is that much of the content in the RO-Crate being deposited is not really machine actionable unless you download it.
Yeah, and there's also the issue of what happens if you upload an RO-Crate, map only what you can, and then retain the leftover RO-Crate. That one contains information on the file structure and descriptive metadata at the file level too. But if you then delete a file from the dataset in Dataverse, the uploaded RO-Crate won't be updated, and if someone downloads the entire thing afterwards, the leftover RO-Crate won't make sense, as it has metadata on a file that is no longer there. And that's just one issue. Also, how easy is it to make a 'leftover' RO-Crate? Would it be better to just map and store the original one as a file too? But then, if you change the description in Dataverse, the RO-Crate will have a different description, and a new export will have two contradicting metadata elements in it. There are a bunch of considerations to make. The easiest option would be to discard any information that's not mappable, but I don't think that's necessarily the 'right' option, as it could mean losing a lot of data.
The question is how you think about the input ro-crate-metadata.json: is it used only at import time, after which you either keep it under a different name or remove it from the dataset, or do you retain it as is and update it as the dataset evolves? In the case of ARP, we follow the latter approach, but we do not manage any "leftovers" during import.
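The stale-file problem discussed above can at least be detected mechanically. Here is a small sketch, assuming you have the stored crate as a dictionary and the dataset's current file names as a list. `find_drift` and the sample data are hypothetical; how to react to the drift (update the crate, warn, drop it) is the open design question.

```python
def as_list(value):
    """Normalise a JSON-LD value that may be missing, a single item, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_drift(crate, current_files):
    """Return (stale, undescribed): File entities with no matching file in the
    dataset, and dataset files that have no File entity in the crate."""
    described = {
        entity["@id"]
        for entity in crate.get("@graph", [])
        if "File" in as_list(entity.get("@type"))
    }
    current = set(current_files)
    return sorted(described - current), sorted(current - described)


# Made-up crate describing two files; one was later deleted in the dataset
# and a new file was added.
stored_crate = {
    "@graph": [
        {"@id": "./", "@type": "Dataset"},
        {"@id": "data.csv", "@type": "File", "description": "Raw readings"},
        {"@id": "notes.txt", "@type": "File"},
    ]
}

stale, undescribed = find_drift(stored_crate, ["data.csv", "extra.csv"])
print("stale entities:", stale)
print("files without metadata:", undescribed)
```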
The other day @Balázs Pataki let me know about https://www.researchobject.org/ro-crate/dataverse and it looks great but I did just propose a small tweak at https://github.com/ResearchObject/ro-crate/pull/500
Last updated: Nov 01 2025 at 14:11 UTC