i've been starting to use python-dvuploader's cli for most of our uploads, and i'm running into an issue with zip files that have been "double-zipped" to work around cases where a zip file in the dataset contains more files than the Dataverse limit. the upload seems to succeed, but results in an error like ValueError: ('File DXXFAM.zip.zip not found in Dataverse repository.', 'This may be due to the file not being uploaded to the repository:').
i'm guessing this is fine, as the unpacked double-zipped files' contents are actually there. any suggestions about what the error handling logic might be here?
@María A. Matienzo Thanks for the feedback! The issue likely stems from the postponed metadata update. I guess you are using the non-S3-upload?
In this case, all files provided are zipped and uploaded. Due to the zipping, the individual file metadata cannot be passed, and this call updates the metadata for each file afterwards. My guess is that this is what's causing the issue.
I will look into this and replicate it locally. Maybe the double-zipped case is an edge case I need to take care of.
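To illustrate what I mean, here is a rough sketch of that mapping step; the function and variable names are made up for the example and are not the actual dvuploader internals:

```python
# Rough sketch of the post-upload metadata mapping step; `update_file_metadata`,
# `local_files`, and `remote_files` are illustrative names only.

def update_file_metadata(local_files: list[str], remote_files: list[str]) -> None:
    """Update the metadata of each uploaded file by matching it to the dataset listing."""
    for name in local_files:
        if name not in remote_files:
            # Dataverse unpacks the outer zip, so e.g. "DXXFAM.zip.zip" never
            # shows up in the dataset listing and the lookup fails here.
            raise ValueError(
                f"File {name} not found in Dataverse repository.",
                "This may be due to the file not being uploaded to the repository:",
            )
        # ... send the metadata update for the matched file ...


# The double-zipped archive reproduces the reported error:
update_file_metadata(["DXXFAM.zip.zip"], ["file_a.csv", "file_b.csv"])
```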
yes, that's correct - we're still using the native API as opposed to direct upload.
i'll also note that the case where a zip file is not double-zipped and contains more files than the limit fails silently with dvuploader, which leads to a retry loop (this is based on testing a rebased version of the branch for the tabIngest PR).
silent failure in this case means that it's not reported back from dvuploader to the user, despite the API endpoint returning an error.
That's good to know! I was not aware of this. I guess it would make sense to explicitly check for this here.
I am checking for the status through raise_for_status, but it seems like it is not catching the error. Is the status code a different one in this case?
i'm not sure what the HTTP return code for this is coming from Dataverse, but the following message is returned as JSON:
{"status":"ERROR","message":"The number of files in the zip archive is over the limit (1000); please upload a zip archive with fewer files, if you want them to be ingested as individual DataFiles."}
Okay, it may have a different status code that httpx does not recognize as an error. Reproducing now and will look into the status code.
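For reference, regardless of which status code Dataverse returns, checking the JSON body explicitly would catch it. Something along these lines, assuming an httpx response; the function name is a placeholder, not current dvuploader code:

```python
# Hedged sketch of an explicit body check, independent of the exact HTTP status
# code Dataverse returns for the zip-limit error.
import httpx

ZIP_LIMIT_HINT = "The number of files in the zip archive is over the limit"

def check_upload_response(response: httpx.Response) -> None:
    """Surface the zip-limit error even if the HTTP status alone slips through."""
    try:
        body = response.json()
    except ValueError:
        body = {}
    if body.get("status") == "ERROR" and ZIP_LIMIT_HINT in body.get("message", ""):
        # Fail loudly instead of letting the caller retry the same upload.
        raise ValueError(body["message"])
    # Still raise on ordinary 4xx/5xx responses.
    response.raise_for_status()
```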
Okay, I got both cases fixed:
The zip-zip case was related to the update metadata function, which tries to map the local files to the ones at Dataverse. Since the zip is unpacked and not present in the dataset, this case is now skipped in the update step, as there is nothing to update.
The zip limit case is now handled explicitly as well. Dataverse returns a 400 in this case, and if the status code and the message match, the code raises a ValueError and stops the upload process. Hence, the retry loop is broken too.
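Roughly, the two fixes look like this; the names and the httpx usage are illustrative, not the exact code that will land in the PR:

```python
# Rough sketch of both fixes, with illustrative function names.
import httpx

ZIP_LIMIT_HINT = "The number of files in the zip archive is over the limit"

def update_file_metadata(local_files: list[str], remote_files: set[str]) -> None:
    for name in local_files:
        if name not in remote_files:
            # Zip-zip case: the outer zip was unpacked server-side, so there is
            # nothing to update for it -- skip instead of raising a ValueError.
            continue
        # ... send the metadata update for the matched file ...

def handle_upload_response(response: httpx.Response) -> None:
    if response.status_code == 400:
        message = response.json().get("message", "")
        if ZIP_LIMIT_HINT in message:
            # Zip limit case: abort the upload instead of entering the retry loop.
            raise ValueError(message)
    response.raise_for_status()
```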
Since these are rather small changes, I would ship them with the tabIngest PR after I have added test cases for this. Thanks again for raising awareness of this :smile:
thank you! this is great. :)
Perfect! I will add these to the PR this week and merge it :smile:
@María A. Matienzo The new version of python-dvuploader has just been released :raised_hands:
https://pypi.org/project/dvuploader/
Nice, I added it to the upcoming news.
wonderful!