Uploading a PNG file via dvuploader resulted in a file of type plain text???
You need to supply the mimeType for the file. I did experiment with leaving mimeType out of the request body, but it did not work. @Philip Durbin once mentioned that there is a trick to trigger MIME type detection in Dataverse, but I don't remember exactly what it was. Happy to fix this!
Please note that it’s possible to “trick” a Dataverse installation into giving a file a content type (MIME type) of your choosing. For example, you can make a text file be treated like a video file with -F 'file=@README.txt;type=video/mpeg4'. If the Dataverse installation does not properly detect a file type, specifying the content type via API like this is a potential workaround.
https://guides.dataverse.org/en/6.2/api/native-api.html#add-a-file-to-a-dataset
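For reference, the curl form field quoted above maps onto a multipart upload in Python roughly like this. It is only a sketch: it assumes the third-party requests package and the add-file endpoint and X-Dataverse-key header described in the linked guide, and build_file_part and add_file are hypothetical helper names, not part of dvuploader.

```python
from pathlib import Path
from typing import Tuple


def build_file_part(path: str, mime_type: str) -> Tuple[str, bytes, str]:
    """Return a (filename, content, content_type) tuple for a multipart upload."""
    p = Path(path)
    return (p.name, p.read_bytes(), mime_type)


def add_file(server_url: str, pid: str, api_token: str, path: str, mime_type: str):
    """Upload one file and state its content type explicitly (sketch only)."""
    import requests  # third-party; imported lazily so build_file_part has no deps

    return requests.post(
        f"{server_url}/api/datasets/:persistentId/add",
        params={"persistentId": pid},
        headers={"X-Dataverse-key": api_token},
        # The third tuple element plays the role of ";type=..." in the curl call.
        files={"file": build_file_part(path, mime_type)},
    )
```

Passing "image/png" as mime_type for a PNG file would be the equivalent of the workaround described in the guide.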
Works for the native upload now! In the S3 case, it seems not to be possible to leave out the mimeType in the JSON. It will result in a failed registration of each file:
Bad Request: The file content type cannot be determined. <-- Is actually an XML file
I guess that, because of the direct upload to S3, no type detection happens in Dataverse. Is this correct? If so, I would add a step that checks whether each file object has a MIME type before the upload starts.
IIRC, when registering these files you need to provide this metadata yourself. There is also no ingest / analysis / unzip happening when using direct upload.
Alright, the mime type is then essential for the upload. I will add a check before uploading.
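A pre-upload check along these lines could look like the sketch below. It assumes each file is described by a dict carrying a "mimeType" key, as in the registration JSON discussed above; the "fileName" key and both function names are hypothetical and only serve the illustration.

```python
from typing import Dict, List


def files_missing_mime_type(files: List[Dict[str, str]]) -> List[str]:
    """Return the names of file entries that lack a non-empty mimeType."""
    return [f.get("fileName", "<unnamed>") for f in files if not f.get("mimeType")]


def check_before_upload(files: List[Dict[str, str]]) -> None:
    """Fail early, before any bytes are sent, if a MIME type is missing."""
    missing = files_missing_mime_type(files)
    if missing:
        raise ValueError("Missing mimeType for: " + ", ".join(missing))
```

Failing before the transfer avoids uploading every file to S3 only to have the registration rejected afterwards.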
It would mean more deps, but would it make sense to have a mime detection library do this for us?
There is one built into Python: mimetypes. It covers most cases but has its limits.
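A minimal sketch of what the standard-library module gives us. guess_mime_type is a hypothetical wrapper name; mimetypes.guess_type is the real stdlib call, and it looks only at the file name, never at the content.

```python
import mimetypes
from typing import Optional


def guess_mime_type(filename: str) -> Optional[str]:
    """Guess a MIME type from the file extension; None if it is unknown."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type
```

The limits show up quickly: a name without an extension yields None, and a PNG saved as data.txt would be reported as plain text because the bytes are never inspected.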
An option would be magic but it requires libmagic to be installed, which is not a Python library.
I don't know if this is worth it since there are extra steps required to make it work.
@Jan Range what would you think about some sort of interface here? E.g., dvuploader could ship with mimetypes as the default, but you could plug in another option if you wanted (e.g. magic, or something that calls an external tool).
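One way such an interface might look, as a sketch only: a detector is any callable that maps a path to a MIME type or None, and mimetypes is the default. MimeDetector, mimetypes_detector and Uploader are hypothetical names and do not reflect dvuploader's actual API.

```python
import mimetypes
from typing import Callable, Optional

# A detector takes a file path and returns a MIME type, or None if unsure.
MimeDetector = Callable[[str], Optional[str]]


def mimetypes_detector(path: str) -> Optional[str]:
    """Default detector: extension-based lookup from the standard library."""
    return mimetypes.guess_type(path)[0]


class Uploader:
    """Hypothetical stand-in showing where a pluggable detector would sit."""

    def __init__(self, detector: MimeDetector = mimetypes_detector):
        self.detector = detector

    def resolve_mime_type(self, path: str) -> str:
        mime_type = self.detector(path)
        if mime_type is None:
            raise ValueError(f"Could not determine MIME type for {path}")
        return mime_type
```

Someone who has libmagic installed could then pass their own callable, for instance one wrapping python-magic's magic.from_file(path, mime=True), without dvuploader depending on it.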
Sure, that is a great idea. As far as I know, the magic package requires the libmagic binaries.
An alternative would be to port infer from Rust via Python bindings. This way, users would not need to install libmagic manually, and we could ship the interface without that requirement. Maybe some bindings already exist. Otherwise, they are quite easy to set up, since the crate is simple.
Yeah, the marcel gem used by Ruby on Rails takes a similar approach: it uses the signatures from Apache Tika without otherwise adding a dependency on Tika itself.
There are no bindings yet, but I have created a simple one that guesses the mime type.
It only handles binary formats and fails on CSV and other text-based ones. I can either include another detector, or we can simply combine it with Python's mimetypes.
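The combination could be sketched as follows: try a content signature first, then fall back to the extension-based lookup for text formats. The two-entry SIGNATURES table is only an illustrative stand-in for the infer-based binding, and sniff_signature and detect_mime_type are hypothetical names.

```python
import mimetypes
from typing import Optional

# Tiny illustrative signature table; a real binding would cover far more formats.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
)


def sniff_signature(head: bytes) -> Optional[str]:
    """Match the first bytes of a file against known binary signatures."""
    for magic_bytes, mime_type in SIGNATURES:
        if head.startswith(magic_bytes):
            return mime_type
    return None


def detect_mime_type(filename: str, head: bytes) -> Optional[str]:
    """Prefer the content signature; fall back to the file extension."""
    return sniff_signature(head) or mimetypes.guess_type(filename)[0]
```

With this ordering a mislabeled binary file is still recognized by its content, while text-based files without a signature are resolved from their extension.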
Last updated: Nov 01 2025 at 14:11 UTC