Dataverse holds input data, code, and output artifacts. Users can happily read all three. But re-executing the code is both more difficult and more useful than just reading it. More useful because users can tweak the code or try new input data. More difficult because re-execution requires more than the source code alone: it also needs metadata such as the R version (more generally, a software environment specification) and the "workflow" (what code to run, in what order, with what parameters, etc.).
Should Dataverse store machine-readable metadata describing how to run the code?
One possible implementation, for example, might be to have a polymorphic "software environment" field (which could hold a requirements.txt, renv.lock, or Dockerfile) and a polymorphic "workflow" field (which could hold a script, a CWL workflow, or another kind of workflow that kicks off the rest of the code in the Dataset, to use Dataverse's term). Other implementations are possible. For this discussion, I want to ask whether _any_ such implementation is considered "in scope" for the Dataverse project by the community (especially devs!), or whether it is considered better handled by other tools.
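To make the idea concrete, here is a minimal sketch of what such polymorphic metadata could look like and how a re-execution tool might dispatch on it. None of these field names are real Dataverse metadata, and the setup commands are illustrative only; everything here is a hypothetical assumption, not an existing API.

```python
# Hypothetical sketch: "softwareEnvironment" and "workflow" are invented field
# names, not part of any real Dataverse metadata block.

DATASET_METADATA = {
    "softwareEnvironment": {
        "type": "requirements.txt",   # could also be "renv.lock", "Dockerfile", ...
        "file": "requirements.txt",
    },
    "workflow": {
        "type": "script",             # could also be "CWL", "Snakemake", ...
        "entrypoint": "run_analysis.sh",
    },
}

# Shell commands a tool might derive from each environment type (illustrative).
ENV_SETUP_COMMANDS = {
    "requirements.txt": "pip install -r {file}",
    "renv.lock": 'Rscript -e "renv::restore()"',
    "Dockerfile": "docker build -t dataset-env -f {file} .",
}

def setup_command(metadata: dict) -> str:
    """Return the shell command that would recreate the software environment."""
    env = metadata["softwareEnvironment"]
    template = ENV_SETUP_COMMANDS[env["type"]]
    return template.format(file=env["file"])

print(setup_command(DATASET_METADATA))  # pip install -r requirements.txt
```

The point of the polymorphism is that one field slot covers many ecosystems: the consumer switches on `type` rather than Dataverse hard-coding any single packaging convention.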
Hi! We have an ongoing grant with the NIH (GREI) and as part of "aim 3" we definitely plan to poke around in this area. I don't think much has been written on this yet but there are a few notes at https://github.com/IQSS/dataverse-pm/issues/15
It's a big topic. Overall, I'm just trying to say "yes, and we'd love to hear your ideas"! :grinning:
CodeMeta has "build instructions": https://guides.dataverse.org/en/6.0/user/appendix.html
There's that; I also just learned about the Workflow Run Crate profile of RO-Crate.
We had a guy at Harvard Medical School create a very specific custom metadata block for x-ray crystallography that included reprocessing instructions.
Ah, you might also be interested in #dev > RO-Crate.
Ooh, do you have a link handy about the x-ray crystallography?
Last updated: Nov 01 2025 at 14:11 UTC