Stream: troubleshooting

Topic: merging s3 buckets


view this post on Zulip jamie jamison (Jan 26 2026 at 18:43):

So at UCLA we're still working on updating from 5.14. One of the issues is that historically we've had separate S3 buckets for direct uploads and regular uploads. Since that's no longer necessary, we'd like to merge these.

Any suggestions on how to proceed?

Thank you,

jamie

view this post on Zulip Philip Durbin 🚀 (Jan 26 2026 at 19:33):

Hmm, https://guides.dataverse.org/en/6.9/developers/deployment.html#migrating-datafiles-from-local-storage-to-s3 is somewhat related. @Don Sizemore added it a while back, in #6789.

view this post on Zulip Philip Durbin 🚀 (Jan 26 2026 at 19:35):

Oh good, I see you asked here as well: https://groups.google.com/g/dataverse-community/c/zONHkY6gJMM/m/xxNSQT28GQAJ

view this post on Zulip jamie jamison (Jan 26 2026 at 19:51):

Fortunately I have a test system I can back up in case I break it.

I posed the question to ChatGPT; for what it's worth, this is the suggestion:
Option B: True merge (advanced, risky)

You must:

  1. Copy objects into a single bucket

  2. Update database references to the bucket name

  3. Reindex and test every dataset

Typical SQL (example only):

UPDATE datafile
SET storageidentifier = REPLACE(storageidentifier,
    's3://old-bucket/',
    's3://new-bucket/');

:warning: Risks:

This should only be done with:

Reindex after (bin/reindex.sh)
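The REPLACE in step 2 can be dry-run outside the database first. Here is a minimal Python sketch of the same substring rewrite, using ChatGPT's placeholder bucket names and made-up identifier values, just to sanity-check which rows would change:

```python
# Dry-run of the REPLACE from step 2 on sample identifiers,
# before running anything against the live database.
OLD = "s3://old-bucket/"
NEW = "s3://new-bucket/"

def rewrite(storage_identifier: str) -> str:
    """Mirrors SQL REPLACE(storageidentifier, OLD, NEW)."""
    return storage_identifier.replace(OLD, NEW)

samples = [
    "s3://old-bucket/17f2b4c1a2b-example",    # should change
    "s3://other-bucket/17f2b4c1a2b-example",  # should be untouched
]
for s in samples:
    print(s, "->", rewrite(s))
```

Note that REPLACE hits the substring wherever it appears in the value, so it is worth checking the output on every identifier shape present in the database, not just one.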

view this post on Zulip Oliver Bertuch (Jan 26 2026 at 22:32):

I am rather sure (95%) that ChatGPT is wrong about reindexing the datasets.

view this post on Zulip Oliver Bertuch (Jan 26 2026 at 22:33):

I looked at the search/index code and schema, and the storage identifier is nowhere to be found except for one place, but there it is queried from DvObject, which comes from the DB and not Solr.

view this post on Zulip Oliver Bertuch (Jan 26 2026 at 22:34):

(Also, it's in SolrSearchResult.json(), so very likely to be unrelated)

view this post on Zulip Oliver Bertuch (Jan 26 2026 at 22:35):

Aside from that: yes, you will need to change your storage identifiers by updating the location.

view this post on Zulip Oliver Bertuch (Jan 26 2026 at 22:35):

Keep in mind that the storage identifier format has changed a bit over the versions, so take a look at the patterns your identifiers use first.
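One way to take that look is to tally the scheme/driver prefixes that actually occur in your storageidentifier values before writing any rewrite rule. A small sketch; the sample values below are made up for illustration, not real rows:

```python
import re
from collections import Counter

# Tally the driver/scheme prefixes present in a dump of
# storageidentifier values, so every pattern gets handled.
PREFIX = re.compile(r"^([a-z0-9-]+)://")

def prefix_of(identifier: str) -> str:
    m = PREFIX.match(identifier)
    return m.group(1) if m else "(no prefix / legacy)"

identifiers = [
    "s3://dataverse-files:17f2b4c1a2b-aaaa",
    "s3-direct://dataverse-files-direct-upload:17f2b4c1a2b-bbbb",
    "17f2b4c1a2b-cccc",  # bare identifier, older style
]
print(Counter(prefix_of(i) for i in identifiers))
```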

view this post on Zulip jamie jamison (Jan 27 2026 at 00:33):

I assume ChatGPT is questionable, but there is this in the documentation on local-file-to-S3 transfer:
(https://guides.dataverse.org/en/6.9/developers/deployment.html#migrating-datafiles-from-local-storage-to-s3)

view this post on Zulip jamie jamison (Mar 10 2026 at 22:10):

Last thought here. Is there anywhere a chart of the tables in Dataverse and how they are connected? It would help when contemplating table editing or moving files.

view this post on Zulip Oliver Bertuch (Mar 11 2026 at 00:39):

Like this? https://guides.dataverse.org/en/latest/schemaspy/

view this post on Zulip Oliver Bertuch (Mar 11 2026 at 00:39):

Kudos to @Don Sizemore for keeping the lights on for that deployment... Things to automate one day, so we don't waste his precious time!

view this post on Zulip jamie jamison (Mar 26 2026 at 00:08):

An update of sorts: merging files from s3:dataverse-files-direct-upload into s3:dataverse-files. Four datasets. Still on Dataverse 5.14.

1) copied the files from the direct-upload bucket to dataverse-files (copied, so the files still exist in both the direct-upload bucket and the dataverse-files bucket)

2) updated postgresql

3) reindexed the datasets

Now the datasets in https://dataverse.ucla.edu/dataverse/textmining are giving 500 errors.

I've checked the database (no datasets with 'direct-upload') and Solr, so it looks like they are reindexed.

Could it be a problem that the files exist in two locations even though the database points to the new location? Put another way, should they be moved rather than copied to the new location?
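That kind of leftover check can be rehearsed in a self-contained mock, with sqlite3 standing in for Postgres and fake rows, but the same LIKE pattern one would run against dvobject:

```python
import sqlite3

# Mock of the leftover-identifier check: count rows still
# mentioning the old direct-upload bucket after the UPDATE.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dvobject (id INTEGER, storageidentifier TEXT)")
con.executemany(
    "INSERT INTO dvobject VALUES (?, ?)",
    [
        (1, "s3://dataverse-files:17f2b4c1a2b-aaaa"),
        (2, "s3://dataverse-files-direct-upload:17f2b4c1a2b-bbbb"),
    ],
)
leftover = con.execute(
    "SELECT count(*) FROM dvobject "
    "WHERE storageidentifier LIKE '%direct-upload%'"
).fetchone()[0]
print(leftover)  # 1 row still points at the old bucket
```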

view this post on Zulip Don Sizemore (Mar 26 2026 at 13:30):

Good morning, are there errors in Payara's server.log when you load the problem datasets?

view this post on Zulip jamie jamison (Mar 26 2026 at 15:26):

Here is the beginning of the error.
[#|2026-03-26T15:10:25.263+0000|WARNING|Payara 5.2022.4|edu.harvard.iq.dataverse.dataaccess.DataAccess|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(4);_TimeMillis=1774537825263;_LevelValue=900;|
Could not find storage driver for: s3-dataverse-files|#]

[#|2026-03-26T15:10:25.266+0000|WARNING|Payara 5.2022.4|edu.harvard.iq.dataverse.dataaccess.DataAccess|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(4);_TimeMillis=1774537825266;_LevelValue=900;|
Could not find storage driver for: s3-dataverse-files|#]

[#|2026-03-26T15:10:25.267+0000|SEVERE|Payara 5.2022.4|javax.enterprise.resource.webcontainer.jsf.application|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(4);_TimeMillis=1774537825267;_LevelValue=1000;|
Error Rendering View[/dataset.xhtml]

Here are the JVM options for the bucket, from the original configuration when there was only one S3 bucket. There is a difference in the name, but all other datasets are loading without error.
<jvm-options>-Ddataverse.files.s3.label=s3-dataverse-files</jvm-options>
<jvm-options>-Ddataverse.files.s3.bucket-name=dataverse-files</jvm-options>
<jvm-options>-Ddataverse.files.s3.type=s3</jvm-options>

Here is the sql update code:
dvndb=# UPDATE dvobject
dvndb-# SET storageidentifier = REPLACE(storageidentifier, 'dataverse-files-direct-upload', 'dataverse-files')

This is the jvm for the 2nd bucket (that files were moved from):
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.type=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.label=s3-dataverse-files-direct-upload</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.bucket-name=dataverse-files-direct-upload</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.upload-redirect=true</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.download-redirect=true</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.url-expiration-minutes=120</jvm-options>
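One possible reading of the "Could not find storage driver for: s3-dataverse-files" warning, assuming the direct-upload identifiers carried the store id as a prefix (an assumption worth verifying against the dvobject table): a blanket REPLACE rewrites that driver prefix as well as the bucket name, producing a driver id that no configured store matches. A minimal Python reproduction of the string handling:

```python
# If a direct-upload identifier looked like this (an assumption,
# check against your dvobject table):
ident = "s3-dataverse-files-direct-upload://dataverse-files-direct-upload:17f2b4c1a2b"

# ...then the blanket REPLACE from the SQL above rewrites the
# driver prefix too, not just the bucket name:
rewritten = ident.replace("dataverse-files-direct-upload", "dataverse-files")
print(rewritten)  # s3-dataverse-files://dataverse-files:17f2b4c1a2b

# Driver ids configured via the jvm-options above are "s3" and
# "s3-dataverse-files-direct-upload"; the rewritten prefix matches neither.
driver_id = rewritten.split("://", 1)[0]
configured = {"s3", "s3-dataverse-files-direct-upload"}
print(driver_id, driver_id in configured)  # s3-dataverse-files False
```

If that is what happened, restricting the rewrite so those identifiers end up with the `s3://` prefix of the surviving store (rather than replacing the bare substring everywhere) would line the database up with the configured driver.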


Last updated: Apr 03 2026 at 06:08 UTC