Stream: dev

Topic: RO-Crate


view this post on Zulip Philip Durbin 🚀 (Sep 29 2023 at 16:10):

Let's talk about RO-Crate!

view this post on Zulip Oliver Bertuch (Sep 29 2023 at 16:15):

Should this be its own stream so people only interested in that topic can mute all else?

view this post on Zulip Oliver Bertuch (Sep 29 2023 at 16:16):

It might become a large and long endeavour

view this post on Zulip Philip Durbin 🚀 (Sep 29 2023 at 16:22):

Meh. If people show up and it gets too crazy in this topic, sure we can move these messages to a stream. :grinning:

view this post on Zulip Philip Durbin 🚀 (Oct 17 2023 at 11:10):

@Eryk Kulikowski is presenting on RO-Crate in 3 hours at the community call. Please see https://groups.google.com/g/dataverse-community/c/oXksy1da1KI/m/aimJx8DOGwAJ

view this post on Zulip Philip Durbin 🚀 (Oct 17 2023 at 18:05):

The video is up (thanks, @Julian Gautier !): https://groups.google.com/g/dataverse-community/c/jlRanZy4b38/m/lcyiT7F4HQAJ

view this post on Zulip Balázs Pataki (Nov 02 2023 at 10:21):

Hi!

I just submitted a PR with an RO-Crate exporter proof of concept implementation based on our RO-Crate manager code from the ARP project.

https://github.com/IQSS/dataverse/pull/10086

view this post on Zulip Philip Durbin 🚀 (Nov 02 2023 at 11:32):

Exciting! Great work, @Balázs Pataki ! :dataverse_man:

view this post on Zulip Oliver Bertuch (Nov 02 2023 at 12:38):

:point_up: @Eryk Kulikowski

view this post on Zulip Eryk Kulikowski (Nov 06 2023 at 09:38):

Looks great! @Ozgur Karadeniz is also working on the export of RO-Crate with customization options. We will see how to combine our efforts on it.

view this post on Zulip Philip Durbin 🚀 (Nov 08 2023 at 15:44):

@Balázs Pataki I just jumped on your branch and I'm playing around with it. This: RO-Crate exporter PoC #10086

view this post on Zulip Philip Durbin 🚀 (Nov 08 2023 at 15:45):

    "dsDescriptionValue" : "All earthquakes from 1930 until 2018.",
    "name" : "All earthquakes from 1930 until 2018.",
    "@id" : "#3::e8603172-347e-4f35-92e5-8e70ec9b977e",
    "@type" : "dsDescription"

view this post on Zulip Philip Durbin 🚀 (Nov 08 2023 at 15:46):

(I'm adding metadata from this dataset about earthquakes: https://doi.org/10.7910/DVN/UTY03A )

view this post on Zulip Balázs Pataki (Nov 08 2023 at 16:04):

Great! Any trouble so far?

view this post on Zulip Philip Durbin 🚀 (Nov 08 2023 at 16:19):

Nope, this is the RO-Crate it generated: rocrate.json

view this post on Zulip Philip Durbin 🚀 (Nov 08 2023 at 16:20):

I mean, I have no idea if it's right or not... :grinning:

view this post on Zulip Balázs Pataki (Nov 08 2023 at 17:05):

Looks good to me. :smile:

view this post on Zulip Philip Durbin 🚀 (Nov 09 2023 at 16:32):

@Balázs Pataki tomorrow is a holiday but maybe you can show me more RO-Crate stuff next week. More context or whatever.

view this post on Zulip Philip Durbin 🚀 (Nov 29 2023 at 15:30):

Some new comments about runcrate: https://github.com/IQSS/dataverse/pull/10086#issuecomment-1831316380

view this post on Zulip Philip Durbin 🚀 (Mar 07 2024 at 16:47):

Interesting. RSpace can export an RO-Crate to Dataverse, I'm hearing at a talk in #community > #Dataverse2024

view this post on Zulip Philip Durbin 🚀 (Mar 07 2024 at 16:48):

See https://documentation.researchspace.com/article/s3j29if453-export-formats#eln_archive

view this post on Zulip Philip Durbin 🚀 (Jun 05 2024 at 12:42):

@Ozgur Karadeniz @Dieuwertje Bloemen are you interested in presenting https://github.com/gdcc/dataverse-exporters/pull/15 at a future community call? (This was a suggestion from @Philipp Conzett via @Sonia Barbosa.)

view this post on Zulip Ozgur Karadeniz (Jun 05 2024 at 14:06):

Philip Durbin said:

Ozgur Karadeniz Dieuwertje Bloemen are you interested in presenting https://github.com/gdcc/dataverse-exporters/pull/15 at a future community call? (This was a suggestion from Philipp Conzett via Sonia Barbosa.)

Sure, we can present the current state as well as the future plans :)

view this post on Zulip Philip Durbin 🚀 (Jun 05 2024 at 14:56):

@Ozgur Karadeniz great! I added you to https://docs.google.com/document/d/1lewTB5P7pMwzk8s8VTu0cFrhdOigkRpWbuUgu7w6sZo/edit?usp=sharing

Next we need to figure out who wants September, you or @Jan Range :grinning:

view this post on Zulip Jan Range (Jun 05 2024 at 20:01):

I am fine with either October or September :smile:

view this post on Zulip Ozgur Karadeniz (Jun 10 2024 at 12:37):

Both fine for us too :)

view this post on Zulip Philip Durbin 🚀 (Jun 10 2024 at 13:38):

Should I flip a coin? :grinning:

view this post on Zulip Jan Range (Jun 10 2024 at 13:39):

I'd go for October then :-)

view this post on Zulip Philip Durbin 🚀 (Jun 10 2024 at 13:39):

Sounds good

view this post on Zulip Philip Durbin 🚀 (Jul 01 2024 at 18:49):

I created a new simplified doc to help organize future community calls. @Ozgur Karadeniz would you want either August 6 or Nov 5 to talk about the exporter? September is not available, sorry. And it sounds like @Jan Range wanted October.

view this post on Zulip Jan Range (Jul 01 2024 at 18:51):

I am fine with October :smile:

view this post on Zulip Philip Durbin 🚀 (Jul 01 2024 at 19:07):

Ok, Jan said he can do August so I moved him up. @Ozgur Karadeniz I moved you to October. I hope that works for you!

view this post on Zulip Ozgur Karadeniz (Jul 02 2024 at 07:37):

Great, October works for us too :)

view this post on Zulip Philip Durbin 🚀 (Jul 08 2024 at 19:25):

Some nice slides from @Ozgur Karadeniz and @Dieuwertje Bloemen - https://zenodo.org/doi/10.5281/zenodo.12548333 (via #8688).

view this post on Zulip Philip Durbin 🚀 (Aug 06 2024 at 20:23):

@Ozgur Karadeniz I'm having some trouble with your RO-Crate exporter, unfortunately: https://github.com/gdcc/exporter-ro-crate/issues/2

view this post on Zulip Philip Durbin 🚀 (Aug 06 2024 at 20:23):

Probably it's me. Please help! :heart:

view this post on Zulip Philip Durbin 🚀 (Aug 06 2024 at 20:28):

@Oliver Bertuch by the way, I'm not sure this one should be on the official list yet: https://github.com/gdcc/dataverse-exporters#list-of-known-exporters

Earlier @Ceilyn Boyd and I were spitballing about some kind of indicator that stuff has been reviewed or QA'ed or vetted. This is a bigger topic than just RO-Crate exporters, of course!

view this post on Zulip Ozgur Karadeniz (Aug 07 2024 at 08:03):

Hi Philip,

It's not you, I also get that in some datasets :sweat_smile: I started working on that bug and will try to fix it as soon as possible.

Thanks so much for testing! :smile:

view this post on Zulip Philip Durbin 🚀 (Aug 07 2024 at 15:46):

@Ozgur Karadeniz sure! Thanks for getting back to me so quickly. Hey, I'm hoping to co-assign #10744 to you. If you're willing, please accept the invite to join our "read only" team at https://github.com/IQSS I just sent you.

view this post on Zulip Philip Durbin 🚀 (Jan 23 2025 at 13:50):

@Balázs Pataki @Ozgur Karadeniz @Dieuwertje Bloemen and others, I'm aware of the wonderful exporters you've created for RO-Crate but does anyone here have researchers who upload RO-Crate files to Dataverse? Asking for a friend. :big_smile:

view this post on Zulip Balázs Pataki (Jan 23 2025 at 14:03):

Well, in our ARP system we have RO-Crate import. In our case we handle RO-Crates whose context uses URIs that match the URIs associated with Dataverse field types.
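
A rough sketch of what matching on context URIs could look like, assuming a flat `@context` that maps terms directly to URIs. The `FIELD_TYPE_URIS` table and the `importable_terms` function are illustrative names for this example and are not taken from the ARP code.

```python
# Illustrative table: Dataverse field type name -> URI associated with it.
# The entries are assumptions for this example, not an authoritative list.
FIELD_TYPE_URIS = {
    "title": "http://purl.org/dc/terms/title",
    "subject": "http://purl.org/dc/terms/subject",
}


def importable_terms(context: dict) -> dict:
    """Map each crate term to a field type name when their URIs match."""
    by_uri = {uri: field for field, uri in FIELD_TYPE_URIS.items()}
    return {
        term: by_uri[uri]
        for term, uri in context.items()
        # Skip expanded term definitions (dicts); only plain URI strings match.
        if isinstance(uri, str) and uri in by_uri
    }


# A crate context with one known term and one custom term.
crate_context = {
    "title": "http://purl.org/dc/terms/title",
    "sampleDepth": "https://example.org/terms/sampleDepth",
}
print(importable_terms(crate_context))
```

Only terms whose URI appears in the table come back, so anything else in the crate would need separate handling.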

view this post on Zulip Philip Durbin 🚀 (Jan 23 2025 at 14:06):

@Balázs Pataki nice. Can you please link me to an example dataset?

view this post on Zulip Balázs Pataki (Jan 23 2025 at 14:08):

Sure, here's a dataset: https://repo.researchdata.hu/dataset.xhtml?persistentId=hdl:21.15109/ARP/PZINV9#describoTab

And here is its rocrate: https://repo.researchdata.hu/api/arp/rocrate/hdl:21.15109/ARP/PZINV9?version=1.0

view this post on Zulip Balázs Pataki (Jan 23 2025 at 14:09):

This is the rocrate format we generate and can ingest.

view this post on Zulip Philip Durbin 🚀 (Jan 23 2025 at 14:11):

I'm passing this along! Thanks!

view this post on Zulip Philip Durbin 🚀 (Feb 04 2025 at 17:34):

I see RO-Crate got a mention in this article by @Vaida Plankytė - https://upstream.force11.org/the-time-is-now-vertical-interoperability-between-research-tools-an-essential-enabler-for-the-fairification-of-data/

view this post on Zulip Vaida Plankytė 🎨 (Feb 05 2025 at 15:28):

@Philip Durbin ☃️ oh, you've noticed the new post so quickly! thank you for reading :sunglasses:

view this post on Zulip Philip Durbin 🚀 (Feb 05 2025 at 15:37):

Sure. @Sonia Barbosa found it. :dataverse_woman:

view this post on Zulip Philip Durbin 🚀 (Feb 24 2025 at 21:31):

@Eryk Kulikowski when you get a moment, please check this out: https://github.com/gdcc/dataverse-previewers/issues/106

view this post on Zulip Aina Jené (Jun 27 2025 at 10:53):

Hello! We're currently gathering information on tools that integrate RO-Crate with Dataverse. I’ve come across the RO-Crate exporter, but I'm particularly interested in whether an importer has been developed.

Specifically, we're exploring the possibility of implementing an importer similar to the one used in WorkflowHub or Zenodo—where an RO-Crate ZIP file is uploaded, automatically unpacked, and the metadata fields are populated based on the ro-crate-metadata.json file.

Do you know if a similar tool already exists for Dataverse?

Thank you in advance!

view this post on Zulip Balázs Pataki (Jun 27 2025 at 11:01):

Hi!

We have both export and import implemented but it is currently only available in our fork. We are in the process of making a PR out of our implementation and submitting it to the Dataverse core, but I cannot give you an ETA for this.

Here's a demo of our implementation in the Hungarian ARP repository.

https://youtu.be/o_ENdITtIQg

I have not included RO-Crate import in this, but we have it implemented as you mentioned: you can upload a zip with files and a ro-crate-metadata.json and we populate the dataset based on that. One caveat is that we currently can only import what has been exported from (another) Dataverse, so it only works with ro-crate-metadata.json values that match the Dataverse metadata blocks.

Besides RO-Crate export/import we also have a native RO-Crate editor (AROMA), which allows editing both Dataverse metadata block based metadata and any other metadata (e.g. file metadata) you want to associate with your data.
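
A minimal sketch of the first step of a ZIP-based import as described above: open the archive, read ro-crate-metadata.json, and find the root data entity. This is not the ARP implementation. It assumes the metadata file sits at the top level of the ZIP, and `read_root_entity` and `build_sample_zip` are hypothetical helper names.

```python
import io
import json
import zipfile


def read_root_entity(zip_bytes: bytes) -> dict:
    """Return the root data entity from an RO-Crate packaged as a ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        crate = json.loads(archive.read("ro-crate-metadata.json"))
    entities = {entity["@id"]: entity for entity in crate["@graph"]}
    # The metadata descriptor points at the root data entity via "about".
    descriptor = entities["ro-crate-metadata.json"]
    return entities[descriptor["about"]["@id"]]


def build_sample_zip() -> bytes:
    """Create a tiny in-memory crate so the example runs without any files."""
    crate = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "about": {"@id": "./"},
            },
            {"@id": "./", "@type": "Dataset", "name": "Sample crate"},
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("ro-crate-metadata.json", json.dumps(crate))
        archive.writestr("data.csv", "a,b\n1,2\n")
    return buffer.getvalue()


root = read_root_entity(build_sample_zip())
print(root["name"])
```

Populating dataset fields from `root` would be the next step, and that is where the mapping to metadata blocks comes in.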

view this post on Zulip Aina Jené (Jun 27 2025 at 11:05):

Thank you @Balázs Pataki . Such impressive work!!! I'll share all this info with my team. Is there a way that I can follow up with you to know when the implementation is submitted to the Dataverse core?

view this post on Zulip Balázs Pataki (Jun 27 2025 at 11:06):

Sure, we will share progress here in Zulip as well.

view this post on Zulip Dieuwertje Bloemen (Jun 27 2025 at 12:42):

As Balazs points out, the import is a bit tricky the moment you allow all kinds of RO-Crates (with more metadata than what can be mapped one to one). We've been thinking of creating a general RO-Crate metadata importer for a while (my colleague @Ozgur Karadeniz made one of the exporters that's currently available). After some meetings with other Dataverse installations on how an importer would work, though, I think more brainstorming is necessary on what to do with the non-mappable metadata, and on what happens with a following export if you don't discard that non-mappable metadata or if you edit the mapped metadata in Dataverse. Also, an RO-Crate can contain files in structures that aren't compatible with the folder structuring in Dataverse, which complicates things even further if you want to import a full zip RO-Crate.
In other words, we would also be very interested in such an importer and would love to discuss how it should best work. Because for an RO-Crate importer with extra metadata to work, we need to look at the entire upload, import and export flow. Maybe something to involve the people from the RO-Crate consortium in as well, to get their input on the best way to do this.
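
To make the mappable versus non-mappable split concrete, here is a small sketch. The `PROPERTY_TO_FIELD` table is a hypothetical one-to-one mapping invented for the example, and `split_metadata` is not part of any existing importer.

```python
# Hypothetical one-to-one mapping from RO-Crate properties to Dataverse fields.
PROPERTY_TO_FIELD = {"name": "title", "description": "dsDescription"}


def split_metadata(entity: dict):
    """Split an entity into (mapped fields, leftover properties)."""
    mapped, leftover = {}, {}
    for key, value in entity.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords such as @id and @type are structural
        if key in PROPERTY_TO_FIELD:
            mapped[PROPERTY_TO_FIELD[key]] = value
        else:
            leftover[key] = value
    return mapped, leftover


root_entity = {
    "@id": "./",
    "@type": "Dataset",
    "name": "All earthquakes from 1930 until 2018.",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}
mapped, leftover = split_metadata(root_entity)
print(mapped)
print(leftover)
```

Everything that ends up in `leftover` is exactly the metadata the open questions above are about: discard it, keep it in a file, or find another home for it.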

view this post on Zulip Philip Durbin 🚀 (Jun 27 2025 at 13:08):

@Balázs Pataki great video!

view this post on Zulip Oliver Bertuch (Jun 27 2025 at 13:16):

@Dieuwertje Bloemen @Balázs Pataki would you agree that the hardest bit about RO-Crate deposition is the ability of RO-Crates to be put together like Matroschka Puppets, which is not something Dataverse supports well with the current concept of Datasets and Collections?

view this post on Zulip Balázs Pataki (Jun 27 2025 at 13:27):

What do you mean by "Matroschka Puppet" structure in the context of RO-Crate? Do you mean the dataset–subdataset relationship in an RO-Crate?

In our ARP case, when you upload a .zip file containing your files in a directory structure, we consider each directory a subdataset. In AROMA, you can associate metadata with each subdataset just as you would with individual files. However, all of this is contained within the RO-Crate and is not reflected in Dataverse, of course, since Dataverse has no concept of subdatasets (and directories).

We could use collections and subcollections for this in Dataverse, but I think that would really stretch the current data model.
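
As an illustration of the "each directory is a subdataset" idea, the sketch below turns a list of file paths into RO-Crate style entities, with every directory becoming a `Dataset` linked through `hasPart`. It is a simplified guess at the shape of such a crate, not the ARP or AROMA code, and `entities_from_paths` is a made-up name.

```python
from pathlib import PurePosixPath


def entities_from_paths(paths: list) -> list:
    """Build File and Dataset entities, treating every directory as a subdataset."""
    parts = {"./": set()}  # dataset @id -> @ids of its direct children
    files = []
    for path in sorted(paths):
        pure = PurePosixPath(path)
        files.append({"@id": str(pure), "@type": "File"})
        child = str(pure)
        # Walk from the innermost directory up to the crate root.
        for parent in pure.parents:
            parent_id = "./" if str(parent) == "." else f"{parent}/"
            parts.setdefault(parent_id, set()).add(child)
            child = parent_id
    datasets = [
        {
            "@id": dataset_id,
            "@type": "Dataset",
            "hasPart": [{"@id": part} for part in sorted(children)],
        }
        for dataset_id, children in sorted(parts.items())
    ]
    return datasets + files


for entity in entities_from_paths(["raw/2018/a.csv", "readme.txt"]):
    print(entity)
```

The nesting lives entirely in the `hasPart` links inside the crate, which is why none of it shows up in the Dataverse data model.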

view this post on Zulip Oliver Bertuch (Jun 27 2025 at 13:28):

Yeah, that's exactly what I meant! The consequence is that much of the content in the RO-Crate being deposited is not really machine actionable unless you download it.

view this post on Zulip Dieuwertje Bloemen (Jun 27 2025 at 14:02):

Yeah, and there's also the issue of what happens if you upload an RO-Crate, map only what you can, and then retain the leftover RO-Crate. That one contains information on the file structure and descriptive metadata at the file level too. But if you then delete a file from the dataset in Dataverse, the uploaded RO-Crate won't be updated, and if someone downloads the entire thing afterwards, the leftover RO-Crate won't make sense because it has metadata for a file that is no longer there. And that's just one issue. Also, how easy is it to make a 'leftover' RO-Crate? Would it be better to just map and store the original one as a file too? But then if you change the description in Dataverse, the RO-Crate will have a different description, and a new export will contain two contradicting metadata elements. There are a bunch of considerations to make. The easiest option would be to discard any information that's not mappable, but I don't think that's necessarily the 'right' option, as it could mean losing a lot of data.
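
The stale-file problem can at least be detected mechanically. The sketch below compares `File` entities in a stored crate against the files currently in the dataset. Here `current_files` stands in for whatever file listing Dataverse would provide, and `stale_file_entities` is a hypothetical helper.

```python
def stale_file_entities(crate: dict, current_files: set) -> list:
    """Return the @ids of File entities that are no longer in the dataset."""
    stale = []
    for entity in crate.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]  # @type may be a single string or a list
        if "File" in types and entity["@id"] not in current_files:
            stale.append(entity["@id"])
    return stale


stored_crate = {
    "@graph": [
        {"@id": "./", "@type": "Dataset"},
        {"@id": "quakes.csv", "@type": "File", "description": "Raw events"},
        {"@id": "notes.txt", "@type": "File"},
    ]
}
# notes.txt was deleted from the dataset after the crate was uploaded.
print(stale_file_entities(stored_crate, {"quakes.csv"}))
```

Detecting the mismatch is the easy part. Deciding whether to rewrite the stored crate, flag it, or regenerate it on export is the actual design question.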

view this post on Zulip Balázs Pataki (Jun 27 2025 at 14:48):

The question is how you think about the input ro-crate-metadata.json: whether it is used only at import time (after which you either keep it under a different name or remove it from the dataset), or whether you retain it as is and update it as the dataset evolves. In the case of ARP, we follow the latter approach, but we do not manage any "leftovers" during import.

view this post on Zulip Philip Durbin 🚀 (Oct 27 2025 at 14:25):

The other day @Balázs Pataki let me know about https://www.researchobject.org/ro-crate/dataverse and it looks great but I did just propose a small tweak at https://github.com/ResearchObject/ro-crate/pull/500


Last updated: Nov 01 2025 at 14:11 UTC