Stream: troubleshooting

Topic: merging s3 buckets


view this post on Zulip jamie jamison (Jan 26 2026 at 18:43):

So at UCLA we're still working on updating from 5.14. One of the issues is that historically we've had separate S3 buckets for direct uploads and regular uploads. Since that's no longer necessary, we'd like to merge these.

Any suggestions on how to proceed?

Thank you,

jamie

view this post on Zulip Philip Durbin 🚀 (Jan 26 2026 at 19:33):

Hmm, https://guides.dataverse.org/en/6.9/developers/deployment.html#migrating-datafiles-from-local-storage-to-s3 is somewhat related. @Don Sizemore added it a while back, in #6789.

view this post on Zulip Philip Durbin 🚀 (Jan 26 2026 at 19:35):

Oh good, I see you asked here as well: https://groups.google.com/g/dataverse-community/c/zONHkY6gJMM/m/xxNSQT28GQAJ

view this post on Zulip jamie jamison (Jan 26 2026 at 19:51):

Fortunately I have a test system I can back up in case I break it.

I posed the question to ChatGPT; for what it's worth, this is the suggestion:
Option B: True merge (advanced, risky)

You must:

  1. Copy objects into a single bucket

  2. Update database references to the bucket name

  3. Reindex and test every dataset

Typical SQL (example only):

UPDATE datafile
SET storageidentifier = REPLACE(storageidentifier,
    's3://old-bucket/',
    's3://new-bucket/');

:warning: Risks:

This should only be done with:

Reindex after (bin/reindex.sh)
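The REPLACE in step 2 can be dry-run outside the database first. Here is a minimal Python sketch of the same substring rewrite, using ChatGPT's placeholder bucket names and made-up identifier values, just to sanity-check which rows would change:

```python
# Dry-run of the REPLACE from step 2 on sample identifiers,
# before running anything against the live database.
OLD = "s3://old-bucket/"
NEW = "s3://new-bucket/"

def rewrite(storage_identifier: str) -> str:
    """Mirrors SQL REPLACE(storageidentifier, OLD, NEW)."""
    return storage_identifier.replace(OLD, NEW)

samples = [
    "s3://old-bucket/17f2b4c1a2b-example",    # should change
    "s3://other-bucket/17f2b4c1a2b-example",  # should be untouched
]
for s in samples:
    print(s, "->", rewrite(s))
```

Note that REPLACE hits the substring wherever it appears in the value, so it is worth checking the output on every identifier shape present in the database, not just one.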

view this post on Zulip Oliver Bertuch (Jan 26 2026 at 22:32):

I am rather sure (95%) that ChatGPT is wrong about reindexing the datasets.

view this post on Zulip Oliver Bertuch (Jan 26 2026 at 22:33):

I looked at the search/index code and schema, and the storage identifier is nowhere to be found except for one place, but there it is queried from DvObject, which comes from the DB and not Solr.

view this post on Zulip Oliver Bertuch (Jan 26 2026 at 22:34):

(Also, it's in SolrSearchResult.json(), so very likely to be unrelated)

view this post on Zulip Oliver Bertuch (Jan 26 2026 at 22:35):

Aside from that: yes, you will need to change your storage identifiers by updating the location.

view this post on Zulip Oliver Bertuch (Jan 26 2026 at 22:35):

Keep in mind that the storage identifier format has changed a bit over the versions, so take a look at the patterns your identifiers use first.
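One way to take that look is to tally the scheme/driver prefixes that actually occur in your storageidentifier values before writing any rewrite rule. A small sketch; the sample values below are made up for illustration, not real rows:

```python
import re
from collections import Counter

# Tally the driver/scheme prefixes present in a dump of
# storageidentifier values, so every pattern gets handled.
PREFIX = re.compile(r"^([a-z0-9-]+)://")

def prefix_of(identifier: str) -> str:
    m = PREFIX.match(identifier)
    return m.group(1) if m else "(no prefix / legacy)"

identifiers = [
    "s3://dataverse-files:17f2b4c1a2b-aaaa",
    "s3-direct://dataverse-files-direct-upload:17f2b4c1a2b-bbbb",
    "17f2b4c1a2b-cccc",  # bare identifier, older style
]
print(Counter(prefix_of(i) for i in identifiers))
```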

view this post on Zulip jamie jamison (Jan 27 2026 at 00:33):

I assume ChatGPT is questionable, but there is this in the documentation on local-file-to-S3 transfer:
(https://guides.dataverse.org/en/6.9/developers/deployment.html#migrating-datafiles-from-local-storage-to-s3)

view this post on Zulip jamie jamison (Mar 10 2026 at 22:10):

Last thought here. Is there anywhere a chart of the tables in Dataverse and how they are connected? It would help when contemplating table editing or moving files.

view this post on Zulip Oliver Bertuch (Mar 11 2026 at 00:39):

Like this? https://guides.dataverse.org/en/latest/schemaspy/

view this post on Zulip Oliver Bertuch (Mar 11 2026 at 00:39):

Kudos to @Don Sizemore for keeping the lights on for that deployment... Things to automate one day, so we don't waste his precious time!

view this post on Zulip jamie jamison (Mar 26 2026 at 00:08):

An update of sorts: merging files from s3:dataverse-files-direct-upload into s3:dataverse-files. Four datasets. Still on Dataverse 5.14.

1) copied the files from the direct-upload bucket to dataverse-files (copied, so the files still exist in both the direct-upload bucket and the dataverse-files bucket)

2) updated postgresql

3) reindexed the datasets

Now the datasets in https://dataverse.ucla.edu/dataverse/textmining are giving 500 errors.

I've checked the database (no datasets with 'direct-upload') and Solr, so it looks like they are reindexed.

Could it be a problem that the files exist in two locations even though the database points to the new location? Put another way, should they be moved rather than copied to the new location?
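That kind of leftover check can be rehearsed in a self-contained mock, with sqlite3 standing in for Postgres and fake rows, but the same LIKE pattern one would run against dvobject:

```python
import sqlite3

# Mock of the leftover-identifier check: count rows still
# mentioning the old direct-upload bucket after the UPDATE.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dvobject (id INTEGER, storageidentifier TEXT)")
con.executemany(
    "INSERT INTO dvobject VALUES (?, ?)",
    [
        (1, "s3://dataverse-files:17f2b4c1a2b-aaaa"),
        (2, "s3://dataverse-files-direct-upload:17f2b4c1a2b-bbbb"),
    ],
)
leftover = con.execute(
    "SELECT count(*) FROM dvobject "
    "WHERE storageidentifier LIKE '%direct-upload%'"
).fetchone()[0]
print(leftover)  # 1 row still points at the old bucket
```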

view this post on Zulip Don Sizemore (Mar 26 2026 at 13:30):

Good morning, are there errors in Payara's server.log when you load the problem datasets?

view this post on Zulip jamie jamison (Mar 26 2026 at 15:26):

Here is the beginning of the error.
[#|2026-03-26T15:10:25.263+0000|WARNING|Payara 5.2022.4|edu.harvard.iq.dataverse.dataaccess.DataAccess|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(4);_TimeMillis=1774537825263;_LevelValue=900;|
Could not find storage driver for: s3-dataverse-files|#]

[#|2026-03-26T15:10:25.266+0000|WARNING|Payara 5.2022.4|edu.harvard.iq.dataverse.dataaccess.DataAccess|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(4);_TimeMillis=1774537825266;_LevelValue=900;|
Could not find storage driver for: s3-dataverse-files|#]

[#|2026-03-26T15:10:25.267+0000|SEVERE|Payara 5.2022.4|javax.enterprise.resource.webcontainer.jsf.application|_ThreadID=97;_ThreadName=http-thread-pool::jk-connector(4);_TimeMillis=1774537825267;_LevelValue=1000;|
Error Rendering View[/dataset.xhtml]

Here are the JVM options for the bucket, from the original configuration when there was only one S3 bucket. There is a difference in the name, but all other datasets are loading without error.
<jvm-options>-Ddataverse.files.s3.label=s3-dataverse-files</jvm-options>
<jvm-options>-Ddataverse.files.s3.bucket-name=dataverse-files</jvm-options>
<jvm-options>-Ddataverse.files.s3.type=s3</jvm-options>

Here is the sql update code:
dvndb=# UPDATE dvobject
dvndb-# SET storageidentifier = REPLACE(storageidentifier, 'dataverse-files-direct-upload', 'dataverse-files')

This is the jvm for the 2nd bucket (that files were moved from):
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.type=s3</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.label=s3-dataverse-files-direct-upload</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.bucket-name=dataverse-files-direct-upload</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.upload-redirect=true</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.download-redirect=true</jvm-options>
<jvm-options>-Ddataverse.files.s3-dataverse-files-direct-upload.url-expiration-minutes=120</jvm-options>
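One possible reading of the "Could not find storage driver for: s3-dataverse-files" warning, assuming the direct-upload identifiers carried the store id as a prefix (an assumption worth verifying against the dvobject table): a blanket REPLACE rewrites that driver prefix as well as the bucket name, producing a driver id that no configured store matches. A minimal Python reproduction of the string handling:

```python
# If a direct-upload identifier looked like this (an assumption,
# check against your dvobject table):
ident = "s3-dataverse-files-direct-upload://dataverse-files-direct-upload:17f2b4c1a2b"

# ...then the blanket REPLACE from the SQL above rewrites the
# driver prefix too, not just the bucket name:
rewritten = ident.replace("dataverse-files-direct-upload", "dataverse-files")
print(rewritten)  # s3-dataverse-files://dataverse-files:17f2b4c1a2b

# Driver ids configured via the jvm-options above are "s3" and
# "s3-dataverse-files-direct-upload"; the rewritten prefix matches neither.
driver_id = rewritten.split("://", 1)[0]
configured = {"s3", "s3-dataverse-files-direct-upload"}
print(driver_id, driver_id in configured)  # s3-dataverse-files False
```

If that is what happened, restricting the rewrite so those identifiers end up with the `s3://` prefix of the surviving store (rather than replacing the bare substring everywhere) would line the database up with the configured driver.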


Last updated: Apr 03 2026 at 06:08 UTC