So at UCLA we're still working on updating from 5.14. One of the issues is that historically we've had separate S3 buckets for direct uploads and regular uploads. Since that's no longer necessary, we'd like to merge these.
Any suggestions on how to proceed?
Thank you,
jamie
Hmm, https://guides.dataverse.org/en/6.9/developers/deployment.html#migrating-datafiles-from-local-storage-to-s3 is somewhat related. @Don Sizemore added it a while back, in #6789.
Oh good, I see you asked here as well: https://groups.google.com/g/dataverse-community/c/zONHkY6gJMM/m/xxNSQT28GQAJ
Fortunately I have a test system I can back up in case I break it.
I posed the question to ChatGPT; for what it's worth, this is its suggestion:
You must:
- Copy objects into a single bucket
- Update database references to the bucket name
- Reindex and test every dataset

Typical SQL (example only):

UPDATE datafile
SET storageidentifier = REPLACE(storageidentifier,
                                's3://old-bucket/',
                                's3://new-bucket/');

:warning: Risks:
- Broken downloads
- Corrupt previews
- Dataset integrity failures

This should only be done with:
- Full DB backup
- Test environment
- Reindex after (bin/reindex.sh)
I am rather sure (95%) that ChatGPT is wrong about reindexing the datasets.
I looked at the search/index code and schema, and the storage identifier is nowhere to be found except in one place, and there it is queried from DvObject, which comes from the DB and not Solr.
(Also, it's in SolrSearchResult.json(), so very likely unrelated.)
Aside from that: yes, you will need to change your storage identifiers to update the location.
Keep in mind that the storage identifier format has changed a bit over the versions, so take a look at the patterns your identifiers use first.
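One quick way to see which patterns you actually have is to group the identifiers by their prefix. A minimal sketch, assuming the standard dvobject table and a local database named dvndb:

# Sketch only: list the distinct storage-identifier prefixes in use,
# grouped by whatever comes before "://" (older identifiers without
# "://" will show up as the whole string).
psql dvndb -c "
  SELECT split_part(storageidentifier, '://', 1) AS store_prefix,
         count(*)
    FROM dvobject
   WHERE dtype = 'DataFile'
   GROUP BY 1
   ORDER BY 2 DESC;
"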
I assume ChatGPT's answer is questionable, but there was this in the documentation about file-to-S3 transfer:
(https://guides.dataverse.org/en/6.9/developers/deployment.html#migrating-datafiles-from-local-storage-to-s3)
Last thought here. Is there anywhere a chart of the tables in Dataverse and how they are connected? It would help when contemplating table editing or moving files.
Like this? https://guides.dataverse.org/en/latest/schemaspy/
Kudos to @Don Sizemore for keeping the lights on for that deployment... Things to automate one day, so we don't waste his precious time!
An update of sorts: merging the files in s3:dataverse-files-direct-upload into s3:dataverse-files. Four datasets. Still on Dataverse 5.14.
1) copied the files from the direct-upload bucket to dataverse-files (copied, so the files still exist in both the direct-upload bucket and the dataverse-files bucket)
2) updated PostgreSQL
3) reindexed the datasets
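For steps 1 and 3 that amounts to something like the following (a sketch, not the exact commands run; it assumes the AWS CLI and the admin reindex API on localhost):

# Step 1 (sketch): copy, not move, everything from the direct-upload
# bucket into the main bucket; "sync" leaves the source objects in place.
aws s3 sync s3://dataverse-files-direct-upload s3://dataverse-files

# Step 3 (sketch): reindex an affected dataset by its database id,
# or hit /api/admin/index to reindex everything in place.
curl http://localhost:8080/api/admin/index/datasets/<DATASET_DB_ID>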
Now the datasets in https://dataverse.ucla.edu/dataverse/textmining are giving 500 errors.
I've checked the database (no datasets with 'direct-upload') and Solr, so it looks like they are reindexed.
Could it be a problem that the files exist in two locations even though the database points to the new location? Put another way, should they be moved rather than copied to the new location?
Good morning, are there errors in Payara's server.log when you load the problem datasets?
Here is the beginning of the error.
[#|2026-03-26T15:10:25.263+0000|WARNING|Payara 5.2022.4|edu.harvard.iq.dataverse.dataaccess.DataAccess|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(4);_TimeMillis=1774537825263;_LevelValue=900;|
Could not find storage driver for: s3-dataverse-files|#]
[#|2026-03-26T15:10:25.266+0000|WARNING|Payara 5.2022.4|edu.harvard.iq.dataverse.dataaccess.DataAccess|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(4);_TimeMillis=1774537825266;_LevelValue=900;|
Could not find storage driver for: s3-dataverse-files|#]
[#|2026-03-26T15:10:25.267+0000|SEVERE|Payara 5.2022.4|javax.enterprise.resource.webcontainer.jsf.application|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(4);_TimeMillis=1774537825267;_LevelValue=1000;|
Error Rendering View[/dataset.xhtml]
Here are the JVM options for the bucket, from the original setup when there was only one S3 bucket. There is a difference in the name, but all other datasets are loading without error.
<jvm-options>-Ddataverse.files.s3.label=s3-dataverse-files</jvm-options>
<jvm-options>-Ddataverse.files.s3.bucket-name=dataverse-files</jvm-options>
<jvm-options>-Ddataverse.files.s3.type=s3</jvm-options>
Here is the SQL update code:
dvndb=# UPDATE dvobject
dvndb-# SET storageidentifier = REPLACE(storageidentifier, 'dataverse-files-direct-upload', 'dataverse-files')
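For what it's worth, a more cautious variant of that update counts the matching rows first and keeps the change inside a transaction. A sketch, assuming the old bucket/store name appears only in the identifiers that should change:

# Sketch only: the same REPLACE, but restricted with a WHERE clause and
# wrapped in a transaction. The SELECT prints how many rows will be touched;
# swap COMMIT for ROLLBACK to do a dry run first.
psql dvndb <<'SQL'
BEGIN;

SELECT count(*) AS rows_to_change
  FROM dvobject
 WHERE storageidentifier LIKE '%dataverse-files-direct-upload%';

UPDATE dvobject
   SET storageidentifier = REPLACE(storageidentifier,
                                   'dataverse-files-direct-upload',
                                   'dataverse-files')
 WHERE storageidentifier LIKE '%dataverse-files-direct-upload%';

COMMIT;
SQL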
This is the JVM config for the 2nd bucket (the one the files were moved from):
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.type=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.label=s3-dataverse-files-direct-upload</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.bucket-name=dataverse-files-direct-upload</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.upload-redirect=true</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.download-redirect=true</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.url-expiration-minutes=120</jvm-options>
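For reference, those per-store settings are plain JVM options, so they can be added or removed with asadmin rather than by editing domain.xml directly. A sketch, assuming a default Payara 5 path; whether to enable the redirect options on the remaining s3 store after the merge is a hypothetical choice, not something settled in this thread:

# Adjust the path to your Payara install; a Payara restart is needed
# for JVM option changes to take effect.
ASADMIN=/usr/local/payara5/bin/asadmin

# Hypothetical: keep direct upload/download working after the merge by
# setting the redirect options on the remaining "s3" store.
$ASADMIN create-jvm-options "-Ddataverse.files.s3.upload-redirect=true"
$ASADMIN create-jvm-options "-Ddataverse.files.s3.download-redirect=true"

# Once nothing references the old store any more, its options can be
# removed one by one, for example:
$ASADMIN delete-jvm-options "-Ddataverse.files.s3-dataverse-files-direct-upload.type=s3"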