Hello,
We have a few collections we'd like to migrate into Dataverse where the files are already in an S3 bucket and curated by another application. Ideally we wouldn't have to move the files, since in theory they could be accessed from where they already are, and they already have Handles pointing to them there (not that we couldn't change this pointer, I think). We'd like to just give Dataverse access to this other bucket, in addition to its other data stores.
I know it'd be straightforward, via the native API, to move the metadata into Dataverse. For the files, if we didn't want to migrate them, would we essentially be following the process for moving a large data set?
0) ensure that the second S3 bucket is configured to be accessed by Dataverse
1) have the metadata migration create placeholder files for the datasets
2) have a script that manipulates the Dataverse database to point to the right S3 bucket and location within it. (This would be more than just replacing a placeholder, as the files wouldn't be where the placeholder was set)
Would this work?
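For step 2, one way to picture it is a direct database update that rewrites each placeholder file's storage identifier to point at the object already sitting in the other bucket. This is only a hypothetical sketch: the `dvobject.storageidentifier` column is where current Dataverse versions keep this value, but the exact `<store-id>://<bucket>:<key>` identifier format, the store id `curated`, the bucket name, and the object key below are all assumptions to verify against your installation before touching anything.

```shell
# HYPOTHETICAL sketch of step 2 -- do not run as-is.
# Assumptions: the placeholder datafile's dvobject row has id 12345, a second
# store with id "curated" is configured, and identifiers use the
# "<store-id>://<bucket>:<key>" format. Verify all of this (and back up the
# database) against your Dataverse version first.
psql dvndb <<'SQL'
UPDATE dvobject
   SET storageidentifier = 'curated://curated-bucket:path/to/existing/object'
 WHERE id = 12345;  -- the datafile created as a placeholder
SQL
```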
There are a few unknowns for us --
Note: As mentioned, we do have Handles on the files that point directly to the files in the buckets, and one thought we've had is to just use those as links to the data in the Dataverse record.
(I don't think OAI-PMH harvesting would be enough for this collection because the Datasets wouldn't technically be hosted elsewhere to point to. The goal here is to have the dataset in one place and the curation tool and the public access website (Dataverse) access it from there)
I'm still very new to Dataverse, so there might be other options I'm missing. Would love to hear some perspectives on this.
My first thought is that Jim Myers knows the most about this so you might want to cross post to https://groups.google.com/g/dataverse-community to get his attention. :grinning:
Yes, Dataverse can link to multiple S3 buckets. Each "store" can be configured separately, to use the same bucket or different buckets; it's up to you.
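Defining an additional S3 store is done with `dataverse.files.<id>.*` JVM options, as described in the installation guide. A minimal sketch, assuming Payara's `asadmin` is on the path; the store id `curated`, its label, the bucket name, and the endpoint URL are placeholders, not values from this thread:

```shell
# Sketch: register a second S3 store (id "curated" is arbitrary).
# Note the backslash-escaped colon in the URL, which asadmin requires.
./asadmin create-jvm-options "-Ddataverse.files.curated.type=s3"
./asadmin create-jvm-options "-Ddataverse.files.curated.label=Curated"
./asadmin create-jvm-options "-Ddataverse.files.curated.bucket-name=curated-bucket"
./asadmin create-jvm-options "-Ddataverse.files.curated.custom-endpoint-url=https\://s3.example.edu"
```

A restart of the application server is needed before the new store is usable.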
You might want to look into Trusted Remote Storage: https://guides.dataverse.org/en/6.3/installation/config.html#trusted-remote-storage
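For the case where the other application keeps managing the files and Dataverse only records where they live, the Trusted Remote Storage setup in the linked guide looks roughly like this. A hedged sketch only; the store id `trsa`, label, base URL, and choice of `file` as the base store are placeholder assumptions:

```shell
# Sketch of a Trusted Remote Store per the guide linked above.
# All values here are placeholders to adapt to your installation.
./asadmin create-jvm-options "-Ddataverse.files.trsa.type=remote"
./asadmin create-jvm-options "-Ddataverse.files.trsa.label=RemoteCurated"
./asadmin create-jvm-options "-Ddataverse.files.trsa.base-url=https\://curation.example.edu"
./asadmin create-jvm-options "-Ddataverse.files.trsa.base-store=file"
```

With a remote store, Dataverse records references to files served from the base URL rather than copying the bytes into its own storage.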
Do you want Dataverse to take over management of the files? Or do you want to manage them in S3 separately and simply let Dataverse know where the files live?
Thanks, Phil. I'll cross post there.
Good question about file management. For this collection of data, I think it might be the latter option, but I'll have to ask the collection manager. What does it mean to have Dataverse take over management of the files?
Here's the cross post. Thanks again.
Well, do you want Dataverse to be able to delete files?
When a dataset is in draft and files are deleted, they are removed from the S3 bucket. Poof.
A new reply from Jim: https://groups.google.com/g/dataverse-community/c/133bNBCtXYc/m/mZkJ-3W_AAAJ
Last updated: Nov 01 2025 at 14:11 UTC