This is again related to RO-Crate. So far we have worked with local filesystem storage, so we could easily create, edit, and delete ro-crate-metadata.json files in the same directory where the data files of a dataset are stored.
However, we want to support any storage type that is available in a Dataverse installation.
So, my question is: how can we manage non-data files that are stored alongside the data files of a dataset in a storage-independent manner?
For example, if a dataverse is configured to use S3 storage, how can I create an ro-crate-metadata.json file in a dataset of that dataverse? I think I somehow need to get access to a StorageIO subclass matching the configured storage of the dataverse of my target dataset, e.g. S3AccessIO in my example. Given that I have a Dataset object, how can I get an S3AccessIO to manage my ro-crate-metadata.json (create, edit, rename, delete)? Should I maybe use the AuxiliaryFile mechanism? But as far as I understand, an AuxiliaryFile is related to a DataFile and not a Dataset.
Maybe in the same way that files like export_OAI_ORE.cached are handled?
Yes, auxiliary files are associated with data files, not datasets.
I'm a little confused. How is S3 so different than local? You should be able to create a JSON file for either one... unless, are you saying you aren't creating the JSON file with Dataverse? You're using some other process.
Can you please link to an example of how it all looks with local files? It sounds like you want to replicate that for S3.
To put it simply: I just want to add a random file next to the datafiles, no matter where the dataset is stored (locally, in S3, Swift, etc.).
I think I was looking for the StorageIO interface and methods like openAuxChannel(), getAuxFileAsInputStream(), etc.
Ok, and you don't want this file to be entered into the database? It just sits there with Dataverse not knowing about it?
Yes. This is not a datafile, but a "random file" (ro-crate-metadata.json), which lives with the dataset. Much like the thumbnails, cache files, etc. that are already handled this way by Dataverse.
I'm not sure but I'm asking internally.
Going back to auxiliary files, what if you associated your ro-crate json file with one of the data files? Would that be a problem? Maybe there could always be a README.md or something.
I'm chatting with @Leo Andreev and Jim a bit.
Can you use the standard exporter framework?
Otherwise we might need to extend the idea of aux files to datasets.
This probably goes without saying, but I assume it's all related to your PR #10086. But now you want it to work with S3.
Yes, it is all in the context of RO-Crate handling. The RO-Crate metadata belongs to the dataset, not to a datafile, so I think it would be awkward to artificially attach it to an ad hoc datafile like README.md.
I think the "aux" things in StorageIO have nothing to do with the AuxiliaryFile objects and their handling. But it would be great if you could confirm it.
Well, the naming is confusing. They can be related. We had a discussion on Slack about this recently. AuxiliaryFile objects do make use of those "aux" methods in StorageIO. However, those "aux" methods have been around for a long time and are used for a number of things besides AuxiliaryFile objects, such as thumbnails, exports, and provenance files.
I know it's confusing! :sweat_smile:
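To make the idea of a storage-independent "side file" concrete, here is a minimal Java sketch. It is not the Dataverse StorageIO API: `SideFileStore`, `LocalSideFileStore`, and `InMemorySideFileStore` are hypothetical names invented for illustration, and the in-memory class only stands in for an object store such as S3. The point is just that the caller addresses the file by a name relative to the dataset and never sees which backend holds it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction for illustration only; NOT part of Dataverse.
// A "side file" is addressed by a name relative to its dataset.
interface SideFileStore {
    void write(String name, byte[] content);
    Optional<byte[]> read(String name);
    boolean delete(String name);
}

// Backend 1: side files live in the dataset's directory on local disk.
final class LocalSideFileStore implements SideFileStore {
    private final Path datasetDir;

    LocalSideFileStore(Path datasetDir) {
        this.datasetDir = datasetDir;
    }

    @Override
    public void write(String name, byte[] content) {
        try {
            Files.createDirectories(datasetDir);
            Files.write(datasetDir.resolve(name), content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Optional<byte[]> read(String name) {
        Path file = datasetDir.resolve(name);
        try {
            return Files.exists(file)
                    ? Optional.of(Files.readAllBytes(file))
                    : Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean delete(String name) {
        try {
            return Files.deleteIfExists(datasetDir.resolve(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Backend 2: an in-memory stand-in for an object store. Objects are keyed
// by "<dataset prefix>/<name>", similar to how object stores use key prefixes.
final class InMemorySideFileStore implements SideFileStore {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();
    private final String datasetPrefix;

    InMemorySideFileStore(String datasetPrefix) {
        this.datasetPrefix = datasetPrefix;
    }

    private String key(String name) {
        return datasetPrefix + "/" + name;
    }

    @Override
    public void write(String name, byte[] content) {
        objects.put(key(name), content.clone());
    }

    @Override
    public Optional<byte[]> read(String name) {
        byte[] stored = objects.get(key(name));
        return stored == null ? Optional.empty() : Optional.of(stored.clone());
    }

    @Override
    public boolean delete(String name) {
        return objects.remove(key(name)) != null;
    }
}
```

Code that manages ro-crate-metadata.json would then depend only on `SideFileStore`, so switching a dataset from local storage to an object store would not change the RO-Crate logic. Whether the existing "aux" methods in StorageIO can play this role directly is exactly what is being discussed in this thread.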
But what about using the standard exporter framework? Will that work for you?
What I'm trying to achieve here is not actually part of #10086. We don't need it there, because there we only generate the ro-crate-metadata.json for the latest version of the dataset (we actually cache it in the filesystem, but it would work without the cache as well).
However, in our custom Dataverse installation we want to keep the ro-crate-metadata.json for all versions of the dataset, and besides ro-crate-metadata.json we also store ro-crate-preview.html. So, here we need to manage all these "aux" files.
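One way to keep a separate ro-crate-metadata.json and ro-crate-preview.html per dataset version is to derive a distinct file name (or aux tag) from the version number. The sketch below is only an illustration: the `VersionedSideFileNames` class and the `_v<major>.<minor>` suffix are assumptions made up for this example, not a naming scheme used by Dataverse or by the custom installation described above.

```java
// Hypothetical helper for illustration only; the "_v<major>.<minor>" suffix
// is an assumed convention, not something defined by Dataverse or RO-Crate.
final class VersionedSideFileNames {

    private VersionedSideFileNames() {
    }

    // Inserts the version suffix before the file extension, if there is one.
    static String forVersion(String baseName, long major, long minor) {
        int dot = baseName.lastIndexOf('.');
        String stem = dot < 0 ? baseName : baseName.substring(0, dot);
        String extension = dot < 0 ? "" : baseName.substring(dot);
        return stem + "_v" + major + "." + minor + extension;
    }
}
```

With this scheme, `VersionedSideFileNames.forVersion("ro-crate-metadata.json", 1, 0)` yields `ro-crate-metadata_v1.0.json`, and the preview for version 2.1 would be `ro-crate-preview_v2.1.html`, so files for different versions can sit next to each other without colliding.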
I see. It still feels like the exporter framework is close to what you need. Maybe it could be extended somehow?
Last updated: Nov 01 2025 at 14:11 UTC