Stream: troubleshooting

Topic: ✔ API Rate limits when registering files


view this post on Zulip Jan Range (Feb 28 2024 at 12:12):

Hi @all, I have encountered issues registering many files after directly uploading to an S3 storage.

For context, upon direct upload the python-dvuploader library first uploads all files to the storage using the ticket system and then registers each file asynchronously at the instance. This is where I am running into, I guess, rate-limiting problems. The library that I am using throws an exception that the connection is closed and I suspect that it's simply too many requests at once or too frequent.

Hence, my question is: how many concurrent requests can the backend reasonably handle, and roughly how many per minute? I am trying to find a good default that fits most instances equally well while still being somewhat more performant than synchronous requests.

I've been testing this with demo Dataverse and locally using the docker compose variant with localstack. In both cases I have encountered this issue.
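One way to experiment with such a default is to cap the number of in-flight requests on the client side. The following is only a minimal sketch of that idea using `asyncio.Semaphore`; it is not how python-dvuploader is implemented. `run_bounded`, `fake_register`, and `demo` are hypothetical names, and `fake_register` merely sleeps instead of contacting a Dataverse instance.

```python
import asyncio


async def run_bounded(factories, limit):
    """Await every coroutine factory, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories))


async def fake_register(name, state):
    # Hypothetical stand-in for one registration request (no network I/O).
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)
    state["active"] -= 1
    return name


def demo(limit=3, n_files=10):
    # Track how many fake requests overlap so the cap can be observed.
    state = {"active": 0, "peak": 0}
    factories = [
        (lambda i=i: fake_register(f"file{i}.txt", state))
        for i in range(n_files)
    ]
    results = asyncio.run(run_bounded(factories, limit))
    return results, state["peak"]


if __name__ == "__main__":
    names, peak = demo(limit=3, n_files=10)
    print(len(names), "files processed, peak concurrency:", peak)
```

With `limit=1` the calls run strictly one after another, which corresponds to the synchronous case.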

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 12:23):

I'm reminded of this cartoon. :sweat_smile:
bridge.png

view this post on Zulip Jan Range (Feb 28 2024 at 12:26):

Well, then it is time to load up the trucks :joy:

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 12:27):

I'd say so. We'll learn something!

view this post on Zulip Jan Range (Feb 28 2024 at 14:57):

So I did a little more testing with a larger number of files (1000 small ones) and looked into the Dataverse logs. As soon as more than one request is sent simultaneously, the dataset gets locked. On top of that, I am not able to remove the lock via the UI. If the requests are sent synchronously, there is no issue and no lock.

Do you know what could be the cause of this? I have added the logs below:

2024-02-28 15:52:03 dev_dataverse   | [#|2024-02-28T14:52:03.794+0000|SEVERE|Payara 6.2023.8|edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper|_ThreadID=104;_ThreadName=http-thread-pool::http-listener-1(5);_TimeMillis=1709131923794;_LevelValue=1000;|
2024-02-28 15:52:03 dev_dataverse   |   Failed to add file to dataset.|#]
2024-02-28 15:52:03 dev_dataverse   |
2024-02-28 15:52:03 dev_dataverse   | [#|2024-02-28T14:52:03.795+0000|SEVERE|Payara 6.2023.8|edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper|_ThreadID=104;_ThreadName=http-thread-pool::http-listener-1(5);_TimeMillis=1709131923795;_LevelValue=1000;|
2024-02-28 15:52:03 dev_dataverse   |   Dataset cannot be edited due to dataset lock.|#]

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 16:39):

Sadly, I'm sort of not surprised that you're getting locks when sending files asynchronously.

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 16:39):

What do you want to know the cause of? The locks? Or why you can't remove them? Or why you have to upload files synchronously? Or all of the above? :grinning:

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 17:24):

Have you tried this? https://guides.dataverse.org/en/6.1/developers/s3-direct-upload-api.html#to-add-multiple-uploaded-files-to-the-dataset

view this post on Zulip Jan Range (Feb 28 2024 at 17:32):

Sometimes you can't see the forest for the trees :dizzy: That did the trick! Thanks Phil :dataverse_man:
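For reference, the section of the direct upload guide linked above registers all uploaded files with a single POST to the dataset's `addFiles` endpoint, passing a JSON array in the `jsonData` form field, instead of sending one request per file. A minimal sketch of assembling such a request is shown below. `build_add_files_request` is a hypothetical helper, and the server URL, DOI, token, storage identifier, and checksum are placeholders; the field names follow the guide's example as I understand it and should be checked against the guide for the Dataverse version in use.

```python
import json
from urllib.parse import urlencode


def build_add_files_request(server_url, persistent_id, api_token, files):
    """Return (url, headers, form) for one bulk registration request.

    Hypothetical helper: it only assembles the request and sends nothing.
    """
    query = urlencode({"persistentId": persistent_id})
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/addFiles?{query}"
    headers = {"X-Dataverse-key": api_token}
    # One JSON array describing every file that was already uploaded to S3.
    form = {"jsonData": json.dumps(files)}
    return url, headers, form


# Placeholder metadata for two files uploaded earlier via the ticket system.
example_files = [
    {
        "storageIdentifier": "s3://example-bucket:0000000000-aaaaaaaaaaaa",
        "fileName": "file1.txt",
        "mimeType": "text/plain",
        "checksum": {"@type": "SHA-1", "@value": "123456"},
    },
    {
        "storageIdentifier": "s3://example-bucket:0000000000-bbbbbbbbbbbb",
        "fileName": "file2.txt",
        "mimeType": "text/plain",
        "checksum": {"@type": "SHA-1", "@value": "123789"},
    },
]

url, headers, form = build_add_files_request(
    "https://demo.example.org",
    "doi:10.5072/FK2/EXAMPLE",
    "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    example_files,
)
```

The guide sends `jsonData` as a multipart form field (`curl -F`). With the `requests` library this would presumably be `requests.post(url, headers=headers, files={"jsonData": (None, form["jsonData"])})`, but that call is an assumption and is not taken from this thread.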

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 17:38):

Fantastic! :tada:

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 17:38):

Jim wrote the code. I'm just the messenger. :grinning:

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 17:39):

Should we improve the docs somehow? :thinking:

view this post on Zulip Jan Range (Feb 28 2024 at 17:52):

It's working flawlessly now, even with a bunch of files! Thanks again :smile:

image.png

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 17:52):

Great!

view this post on Zulip Jan Range (Feb 28 2024 at 17:52):

Philip Durbin said:

Should we improve the docs somehow? :thinking:

No, I think this was just my fault. I thought the default file add endpoint was used and assumed that it was the only way. You never stop learning :grinning:

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 17:54):

Ok, well, this should probably go in a new thread but I've been thinking that perhaps we need more tutorials in the API Guide.

view this post on Zulip Philip Durbin 🚀 (Feb 28 2024 at 17:54):

We have https://guides.dataverse.org/en/6.1/api/getting-started.html#uploading-files but it only references the default way.

view this post on Zulip Jan Range (Feb 29 2024 at 06:09):

I think that listing the direct upload feature would also be good. At the very least, it raises awareness of its existence. The sentence within the Native API docs would be sufficient to add to the "uploading files" guide:

when a Dataverse installation is configured to use S3 storage with direct upload enabled, there is API support to send a file directly to S3. This is more complex and is described in the Direct DataFile Upload/Replace API guide.

view this post on Zulip Philip Durbin 🚀 (Feb 29 2024 at 11:31):

Sounds good. Do you want to make a PR?

view this post on Zulip Jan Range (Mar 01 2024 at 07:36):

Of course, just opened a PR for this

https://github.com/IQSS/dataverse/pull/10347

view this post on Zulip Philip Durbin 🚀 (Mar 01 2024 at 11:11):

Thanks! :heart: I'm making a couple minor tweaks.

view this post on Zulip Philip Durbin 🚀 (Mar 01 2024 at 12:04):

Merged! Thanks again!

view this post on Zulip Jan Range (Mar 01 2024 at 16:06):

Awesome! Thanks :heart:

view this post on Zulip Notification Bot (Mar 01 2024 at 16:06):

Jan Range has marked this topic as resolved.


Last updated: Jan 09 2026 at 14:18 UTC