Hi folks, hope this is the right place to ask (and happy to relay elsewhere). We're testing Dataverse with direct uploads to a Cloudian S3 bucket with immutability enabled, which in their implementation requires that the Content-MD5 header is sent when PUTting an object to the backend. My understanding is that the presigned URL would need to include Content-MD5 as one of the signed headers and incorporate it into its signature, in addition to the client separately sending that header in the direct PUT request. I don't think that's possible without the UI prompting the user or otherwise calculating the MD5 before submitting the Dataverse API request that generates the presigned URL. But I just wanted to throw it out there in case anyone else had run into this specific issue! Any thoughts would be appreciated.
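For concreteness, here's a minimal sketch of the kind of presigned PUT I mean, using the AWS SDK for Java v2 presigner (bucket, key, and the base64 MD5 value are just placeholders, and I'm assuming the presigner signs whatever headers are set on the request, which is my understanding of how it behaves):
```java
import java.time.Duration;

import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.PresignedPutObjectRequest;
import software.amazon.awssdk.services.s3.presigner.model.PutObjectPresignRequest;

public class PresignPutWithMd5 {

    /**
     * Presigns a PUT whose signature covers the Content-MD5 header.
     * contentMd5 is the base64-encoded MD5 digest of the file, which has to be
     * known *before* the URL is generated -- that's exactly the UI problem above.
     */
    static PresignedPutObjectRequest presign(S3Presigner presigner,
                                             String bucket, String key,
                                             String contentMd5) {
        PutObjectRequest put = PutObjectRequest.builder()
                .bucket(bucket)             // placeholder bucket name
                .key(key)                   // placeholder object key
                .contentMD5(contentMd5)     // included among the signed headers
                .build();

        return presigner.presignPutObject(PutObjectPresignRequest.builder()
                .signatureDuration(Duration.ofMinutes(15))
                .putObjectRequest(put)
                .build());
    }

    public static void main(String[] args) {
        try (S3Presigner presigner = S3Presigner.create()) {
            PresignedPutObjectRequest presigned =
                    presign(presigner, "my-bucket", "my-object",
                            "PLACEHOLDER-base64-md5==");
            System.out.println(presigned.url());
            // The client then has to send the exact same Content-MD5 value on the
            // PUT, or the signature check (and Cloudian's immutability check) fails.
            System.out.println(presigned.signedHeaders());
        }
    }
}
```
The catch is that the file's MD5 has to be in hand before asking Dataverse for the presigned URL, which is why I don't see how to do it without the UI computing or prompting for it first.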
Hmm. Could you please open an issue about this at https://github.com/IQSS/dataverse/issues ?
@Philip Durbin 🚀 Will do! I'll file that later today.
Ok, submitted —> https://github.com/IQSS/dataverse/issues/11901 Hope that's clear. This is not a big ask for us, as we're going to use a workaround, but given that it touches on data integrity I think it's useful for you (us?) all to ponder even if it doesn't make it into Dataverse.
As for our workaround(s):
Looks great. Thanks! Have you looked at https://guides.dataverse.org/en/6.8/developers/s3-direct-upload-api.html ?
You'll find md5Hash in there.
We have, but unless I'm mistaken that doesn't solve this problem. We're not able to upload files at all to S3 buckets with Object Lock enabled b/c the Content-MD5 header needs to be submitted at upload time. That's true whether it goes through Dataverse (direct-upload=false) or from the client (direct-upload=true, in which case the presigned URL would also need to contain the header+value). Does that make sense?
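To be concrete about why: if I'm reading the guide right, the md5Hash you pass back to Dataverse when registering the directly uploaded file is the hex digest, whereas the Content-MD5 header on the PUT wants the base64-encoded raw digest (RFC 1864). Same MD5, two encodings, something like this (plain JDK, file name is a placeholder):
```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.HexFormat;

public class Md5Encodings {
    public static void main(String[] args) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(Files.readAllBytes(Path.of("data.bin"))); // placeholder file

        // Hex form: what (I believe) goes into the md5Hash field when
        // registering the directly uploaded file with Dataverse afterwards.
        String md5Hex = HexFormat.of().formatHex(digest);

        // Base64 form: what the Content-MD5 header carries (RFC 1864), and what
        // Cloudian checks on the PUT when immutability is enabled.
        String contentMd5 = Base64.getEncoder().encodeToString(digest);

        System.out.println("md5Hash (hex):     " + md5Hex);
        System.out.println("Content-MD5 (b64): " + contentMd5);
    }
}
```
So the checksum Dataverse records after the fact and the one the bucket demands at upload time are the same value, but only the latter is enforced by Cloudian at PUT time, which is where we get stuck.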
It brings up a related question — how are folks verifying file integrity when using S3 backends? Is that all out-of-band? If we're not validating checksums on submission, then the obvious answer would be to audit buckets by comparing metadata in Dataverse with the results of a ComputeChecksum job on the bucket.
Hmm, I see what you mean, I think. That md5Hash I mentioned is what you tell Dataverse the md5 is for the file. But Dataverse is just trusting you on that, right?
Anyway, I think Jim is on his way to the Head of the Charles but he'll probably see your issue and get back to you next week. He implemented most of this. :smile: