Good morning!
I am testing uploading 1GB files simultaneously. I quite often get a failure with a 500 error, but nothing seems to be output in the server.log. Has anyone done similar tests? Smaller file sizes seem to work fine. I wonder what is provoking the 500 error and how I can track it.
@Simon Carroll this fix might help: https://github.com/gdcc/python-dvuploader/pull/24
Thanks! Let me see (I was actually using the native API before).
Simon Carroll said:
Thanks! Let me see (I was actually using the native API before).
OK, now I remember. We are not using S3, so the direct upload fails. I suppose it is not expected that 3 concurrent uploads would cause this with the Native API. I can try to investigate more.
@Simon Carroll are you using the Python-DVUploader native upload or the Native API directly?
Jan Range said:
Simon Carroll are you using the Python-DVUploader native upload or the Native API directly?
Good morning! I was using the Native API. I was seeing 500 errors when launching 3 concurrent uploads of 1GB. Occasionally I was able to upload 2 concurrently, but normally launching 3 causes all of them to fail. I just tried using the Python-DVUploader, but since we don't have object storage (yet) it reverts to using the Native API (I assume the results would be the same, but I haven't tested it yet). Is this somewhat expected, or a surprising result? We are imagining the use case of several users/jobs uploading data at the same time.
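(For context, a rough sketch of the kind of concurrency test described above. This is not the actual script attached later in the thread; the URL, token, persistent ID, and file names are placeholders.)

```python
import json
from concurrent.futures import ThreadPoolExecutor

import requests

# Placeholder values, not taken from the thread.
BASE_URL = "https://dataverse.example.edu"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
PID = "doi:10.70122/FK2/EXAMPLE"


def upload(path):
    """Upload one file to the dataset via the Native API add-file endpoint."""
    with open(path, "rb") as fh:
        resp = requests.post(
            f"{BASE_URL}/api/datasets/:persistentId/add",
            params={"persistentId": PID},
            headers={"X-Dataverse-key": API_TOKEN},
            files={"file": fh},
            data={"jsonData": json.dumps({"description": "1GB test file"})},
        )
    return path, resp.status_code


# Launch three uploads at the same time, as in the test described above.
with ThreadPoolExecutor(max_workers=3) as pool:
    for name, status in pool.map(upload, ["test1.bin", "test2.bin", "test3.bin"]):
        print(name, status)
```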
Good morning @Simon Carroll :smile:
I assume that the 500 error stems from a dataset lock due to ingestion. This is typically the case for tabular files, which subsequently induce the lock and no further uploads/edits to the dataset are possible. There are two ways to circumvent this:
Zip files into an archive and upload it. If enabled, Dataverse will unzip the files and register each individually. This is the way Python-DVUploader handles this case in the non-S3 upload.
Disable tabIngest within the payload when sending the request to your instance. This will skip the ingestion process and no locks will happen. The downside is that you need to trigger the ingestion manually, if desired (see the sketch after this message).
I think the latter is the easiest way to get around this, but the zipping workflow really shines when you have a lot of small files.
If you are uploading tabular files, this could potentially fix the issue :smile:
As a last resort, you could move to sequential uploads and check for dataset locks, but I guess that's not as efficient as concurrent uploads.
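For illustration, a minimal sketch of the second option with the Native API's add-file endpoint and Python requests. The URL, token, persistent ID, and file name are placeholders, and the other jsonData fields are just examples; the key point is that tabIngest goes inside the jsonData form field (the payload), not the query parameters:

```python
import json

import requests

# Placeholder values, not taken from the thread.
BASE_URL = "https://dataverse.example.edu"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
PID = "doi:10.70122/FK2/EXAMPLE"

# "tabIngest": "false" skips tabular ingestion, so no ingest lock is placed
# on the dataset; ingestion can still be triggered manually later if wanted.
json_data = {
    "description": "1GB test file",
    "tabIngest": "false",
}

with open("bigfile.csv", "rb") as fh:
    resp = requests.post(
        f"{BASE_URL}/api/datasets/:persistentId/add",
        params={"persistentId": PID},
        headers={"X-Dataverse-key": API_TOKEN},
        files={"file": fh},
        data={"jsonData": json.dumps(json_data)},
    )

print(resp.status_code, resp.json())
```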
Good morning! Thanks a lot for the comprehensive feedback. I will try the different approaches to see what can work for us. Many thanks!
Jan Range said:
Good morning Simon Carroll :)
I assume that the 500 error stems from a dataset lock due to ingestion. This is typically the case for tabular files, which subsequently induce the lock and no further uploads/edits to the dataset are possible. There are two ways to circumvent this:
Zip files into an archive and upload it. If enabled, Dataverse will unzip the files and register each individually. This is the way Python-DVUploader handles this case in the non-S3 upload.
Disable tabIngest within the payload when sending the request to your instance. This will skip the ingestion process and no locks will happen. The downside is that you need to trigger the ingestion manually, if desired.
I think the latter is the easiest way to get around this, but the zipping workflow really shines when you have a lot of small files.
If you are uploading tabular files, this could potentially fix the issue :)
Good morning! I am playing around. If I upload 2 files via the Native API with tabIngest disabled, it seems one fails with an internal server error 500. I will attach an example. I am uploading the files into two separate datasets (in the same collection). The point is that it seems to be another problem, outside of the locks.
nativeAPIuploadTabIngestDisabledFailed.log
I don't see anything in the server logs, which is quite strange. Is there some class I need to explicitly add to the debug options that could help?
@Simon Carroll thanks for testing! That is odd, given that tabIngest is turned off. Can you send me the script you are using? For debugging, upon failure the function will raise an error with the message returned by the Dataverse instance. Do you have a full traceback to inspect where the error is happening?
OK, here comes a bombardment. Here is the Python script:
Here is a log of a single upload
Here is a concurrent upload with ingestion on
TwoConcurrentUploadsDiffEnvIngestionOn.log
and here with it off
TwoConcurrentUploadsDiffEnvIngestionOff.log
I have included the errors from just one environment. There is actually no error in Dataverse, and I have noticed that the file that seems to fail via the API upload is in Dataverse and valid. I suppose this is why I am not seeing an error log server side?
Thanks for providing the files! Now it is a bit clearer, because I thought you were using python-dvuploader.
Have you tried adding tabIngest to the jsonData payload? As far as I know, passing it in the query parameters does not work, but maybe I am wrong @Philip Durbin? In the docs it says to add it to the payload.
Yes, it looks like tabIngest:false goes into the payload, the JSON you send.
@Simon Carroll which version of Dataverse are you running? tabIngest:false might be somewhat new. :thinking:
OK, thanks. With the param in the jsonData it works as expected. About this:
"* Zip files into an archive and upload it. If enabled, Dataverse will unzip the files and register each individually. This is the way Python-DVUploader handles this case in the non-S3 upload."
Do you mean the Python library does this automatically in the case of a non-S3 upload, when falling back on the Native API?
Philip Durbin said:
Simon Carroll which version of Dataverse are you running? tabIngest:false might be somewhat new. :thinking:
6.5, but I suppose it came down to me not reading the documentation properly :)
@Simon Carroll yes, the Python library takes care of zipping the data and shipping it. Data >2 GB will be zipped into multiple zips and uploaded simultaneously.
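For reference, a minimal sketch of how that looks with python-dvuploader (placeholder URL, token, and PID; the names follow the library's README and may vary between versions). Without S3 direct upload configured, the library falls back to the Native API and applies the zipping behaviour described above:

```python
import dvuploader as dv

# Placeholder values, not taken from the thread.
DV_URL = "https://dataverse.example.edu"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
PID = "doi:10.70122/FK2/EXAMPLE"

# Register files individually and/or add a whole directory.
files = [
    dv.File(filepath="./small.txt"),
    dv.File(directory_label="some/dir", filepath="./medium.csv"),
    *dv.add_directory("./data"),
]

uploader = dv.DVUploader(files=files)
uploader.upload(
    api_token=API_TOKEN,
    dataverse_url=DV_URL,
    persistent_id=PID,
    n_parallel_uploads=2,  # whatever your instance can handle
)
```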