Looks like something is wrong with the demo page? https://demo.dataverse.org/
I really wonder if I just killed it, because it was working until now and I just tried to send a compressed TGZ archive (to test a feature for the galaxy integration). The reason is that zip files get unarchived automatically and I wanted to test if the same happens with .tgz files.
This is the response I just got after I tried to upload it via API and since then the server is not responsive anymore:
raised unexpected: Exception('Request to https://demo.dataverse.org/api/v1/datasets/:persistentId/add?persistentId=doi:10.70122/FK2/3HKFAU failed with status code 400: Failed to add file to dataset.')
CC @Philip Durbin 🚀
reimport-test-3.tar.gz
this should be the file that was sent, just a tar.gz file with two images inside
page is back up apparently
Sorry, we just released Dataverse 6.5 (#community > Dataverse 6.5 is here! ) and were updating the demo site to it.
haha okay I'm glad :D was just in exactly that moment it went down
I still get the 400 error though. Are .tar.gz files not accepted?
hmm I tested it directly via API and it uploads fine actually. I will have to investigate further what exactly Galaxy is sending.
curl -H "X-Dataverse-key:XXX" -X POST -F file=@test.tar.gz "https://demo.dataverse.org/api/datasets/:persistentId/add?persistentId=doi:10.70122/FK2/DIG2DG"
It's weird, I can't see why it works with curl and not with python. I had a look at the raw requests by using both postman and python to send the post requests to https://httpbin.org/post and the requests look virtually equal. However curl successfully uploads the file and in python I get failed with status code 400: Failed to add file to dataset.
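One way to compare the requests without sending anything is to let requests build the multipart body and then print it. This is only a sketch: the function name build_upload_request is made up for illustration, the file content is fake bytes, and the URL and DOI are the ones from this thread. It prints the per-part headers, so they can be compared with what curl sends.

```python
import io
from typing import Optional

import requests


def build_upload_request(
    filename: str, content: bytes, content_type: Optional[str] = None
) -> requests.PreparedRequest:
    """Prepare (but never send) the multipart POST so it can be inspected."""
    # (hypothetical helper) A 2-tuple is what the Galaxy code passes;
    # a 3-tuple additionally sets the Content-Type of the file part.
    if content_type is None:
        file_tuple = (filename, io.BytesIO(content))
    else:
        file_tuple = (filename, io.BytesIO(content), content_type)

    request = requests.Request(
        "POST",
        "https://demo.dataverse.org/api/datasets/:persistentId/add",
        params={"persistentId": "doi:10.70122/FK2/3HKFAU"},
        files={"file": file_tuple},
        headers={"X-Dataverse-key": "XXX"},
    )
    return request.prepare()


# Fake payload for illustration; use the real archive bytes when debugging.
prepared = build_upload_request("test.tar.gz", b"\x1f\x8b fake gzip bytes")

# Top-level header, including the multipart boundary.
print(prepared.headers["Content-Type"])
# Headers of the file part (everything before the first blank line).
print(prepared.body.split(b"\r\n\r\n")[0].decode())
```

If the part headers differ from the curl request (for example in filename or Content-Type), that difference would be the first thing to try changing in the Python code.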
It would be amazing if somebody could help me out with the dataverse server logs next week :folded_hands: . I just made two requests via galaxy (python), one with the failing .tar.gz file and one with the working .zip (time is CET):
[2024-12-14 12:10:55,328: DEBUG/main] https://demo.dataverse.org:443 "POST /api/v1/datasets/:persistentId/add?persistentId=doi:10.70122/FK2/3HKFAU HTTP/1.1" 400 61
[2024-12-14 12:10:55,330: WARNING/main] RESPONSE: {'status': 'ERROR', 'message': 'Failed to add file to dataset.'}
[2024-12-14 12:12:23,311: DEBUG/main] https://demo.dataverse.org:443 "POST /api/v1/datasets/:persistentId/add?persistentId=doi:10.70122/FK2/3HKFAU HTTP/1.1" 200 None
[2024-12-14 12:12:23,312: WARNING/main] RESPONSE: {'status': 'OK', 'message': {'message': 'This file has the same content as test.txt_64e0efaf9500cb29.txt that is in the dataset. '} ...,
oh and I just made a third one, the working curl request with the .tar.gz at 12:15 CET:
curl -H "X-Dataverse-key:XXX" -X POST -F file=@small-file-history-test.tar.gz "https://demo.dataverse.org/api/datasets/:persistentId/add?persistentId=doi:10.70122/FK2/3HKFAU"
{"status":"OK","message":{"message":"This file has the same content as small-file-history-test.tar.gz that is in the dataset. "},"data":{"files":[{"description":"","label":"small-file-history-test.tar-3.gz","restricted":false,"version":1,"datasetVersionId":276019,"dataFile":{"id":2476460,"persistentId":"doi:10.70122/FK2/3HKFAU/9RJV3C","pidURL":"https://doi.org/10.70122/FK2/3HKFAU/9RJV3C","filename":"small-file-history-test.tar-3.gz","contentType":"application/gzip","friendlyType":"Gzip Archive","filesize":1894,"description":"","storageIdentifier":"s3://demo-dataverse-org:193c4e10eff-c0474fd01718","rootDataFileId":-1,"md5":"f804a9a4e5f8f373dd87938ad1d01325","checksum":{"type":"MD5","value":"f804a9a4e5f8f373dd87938ad1d01325"},"tabularData":false,"creationDate":"2024-12-14","fileAccessRequest":false}}]}}
Oh, so it wasn't the demo site being down. :thinking:
@Kai König what's the latest, please? It works with curl but not Python? (We might want to move this topic to #python.)
Do you want to give it a try on https://beta.dataverse.org ?
yes exactly works with curl but not with python, my last messages are still the current state. I ignored this issue for now because zip upload works
This topic was moved here from #community > uploading .tar.gz files via API by Philip Durbin 🚀.
Can you please show us your python script?
Sure!
with open(file_path, "rb") as file:
    files = {'file': (filename, file)}
    payload = dict()
    add_files_url = self.add_files_to_dataset_url(dataset_id)
    response = requests.post(
        add_files_url,
        data=payload,
        files=files,
        headers=headers)
self._ensure_response_has_expected_status_code(response, 200)

def add_files_to_dataset_url(self, dataset_id: str) -> str:
    return f"{self.api_base_url}/datasets/:persistentId/add?persistentId={dataset_id}"
@Kai König thanks for reaching out! I'll read through the messages and get back to you asap
@Kai König are you using pyDataverse or plain requests? I have just tested it using the former and everything works well using tar.gz.
When using requests, it's essential to provide the form-data section jsonData as a string. Providing it as a dict may lead to issues. This might explain why the replace endpoint didn't function correctly too.
You may want to check out the pyDataverse implementation as guidance.
@Philip Durbin 🚀 would it make sense to mention this in the general docs or is it too specific for Python?
Well, dicts are Python-specific.
Let's see if @Kai König is unblocked now. Thanks for helping! Then we can figure out where to highlight the fix in the docs.
thanks guys! Will have a look at this tomorrow. If what you wrote is the source of this issue, I would find it weird anyway. Because the .zip file uses the exact same function and it works without problems there.
@Kai König please do feel free to open an issue at https://github.com/IQSS/dataverse/issues about the confusion
It depends. If you post metadata such as a description, the jsonData field needs to be a string. Very odd, but otherwise you'll get an error. But I missed that you are in fact not passing any metadata, so that might not be relevant here. It probably is in the replace case though.
I have used requests to reproduce your error, but I was not able to. Here is the code I have been using to upload a tar.gz file:
from rich import print
import json
import requests

pid = "doi:10.70122/FK2/4ZCAHN"
url = f"https://demo.dataverse.org/api/datasets/:persistentId/add?persistentId={pid}"

files = {
    "file": ("some_other_name.tar.gz", open("test.tar.gz", "rb"), "application/octet-stream")
}
metadata = json.dumps({
    "description": "Look, I am a DataFile!",
})
headers = {"X-Dataverse-Key": "..."}

resp = requests.post(
    url,
    files=files,
    data={"jsonData": metadata},
    headers=headers
)
print(resp.json())
By the way, you get this when you don't serialize the payload to a string.
@Jan Range thanks for looking into this! Well it is odd, I tried again and added "application/octet-stream" as filetype and completely removed the metadata parameter. But I still get the same error
So I guess the galaxy application is doing something to the file that makes the request fail. The server logs would definitely be helpful
but tbh, it's not a super high priority, because people can just export as zip and that works
I'm going to integrate the last feature and then will try to wrap up the integration. If anyone provides me server logs or more insights I might look into this again.
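To check whether Galaxy changes the file before the upload, the bytes that are about to be sent could be fingerprinted locally. The sketch below uses only the standard library, and describe_archive is a made-up helper name. It reports size, MD5, whether the data starts with the gzip magic bytes, and the tar member names. The MD5 and size can then be compared with the md5 and filesize fields that Dataverse returns for the working curl upload of the same file.

```python
import hashlib
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def describe_archive(data: bytes) -> dict:
    """Summarize the exact bytes that would be uploaded (hypothetical helper)."""
    info = {
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "is_gzip": data[:2] == GZIP_MAGIC,
        "members": None,  # stays None if the data is not a readable tar archive
    }
    try:
        # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
            info["members"] = tar.getnames()
    except tarfile.TarError:
        pass
    return info


# Build a small in-memory .tar.gz with two fake "images" for demonstration.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    for name in ("a.png", "b.png"):
        payload = b"fake image bytes"
        member = tarfile.TarInfo(name=name)
        member.size = len(payload)
        tar.addfile(member, io.BytesIO(payload))
sample = buffer.getvalue()

print(describe_archive(sample))
print(describe_archive(b"not an archive"))
```

In the Galaxy code this would be called on the content of file_path right before requests.post. If the MD5 differs from the one for the file uploaded via curl, the file is being modified somewhere on the way.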
@Kai König Regarding server logs, you can also host Dataverse via Docker locally. There should be no difference to the Demo website. It helps a lot with debugging, especially when the API responses are not specific enough. You can clone the main repo and run the following, given mvn is installed:
mvn -Pct clean package docker:start
I usually do it that way, but @Philip Durbin 🚀 may have better approaches?
Also, if you want to CI/CD test your Python library, I highly recommend using our GitHub Action. Here is an example workflow, which you can mostly copy-paste.
Thanks Jan, I'm basically finished now with the integration and in my experience creating a local env is always more work than expected... Is the docker setup really easy or do I have to configure anything?
From my experience, it has always been straightforward. There was no additional configuration necessary, except for adding certain services. I just found the part in the docs, if that helps:
https://guides.dataverse.org/en/latest/container/dev-usage.html
The quickstart ( https://guides.dataverse.org/en/latest/developers/dev-environment.html#quickstart ) should "just work" and if it doesn't, please let us know in #containers! :sweat_smile:
Last updated: Nov 01 2025 at 14:11 UTC