Stream: large-data

Topic: setting parameters for direct upload with python-dvuploader


Mattias de Hollander (Mar 05 2025 at 16:26):

python-dvuploader works quite nicely now that I can test direct upload on demo.dataverse.org. I was wondering if it is possible to tune the upload parameters, for example when you have a lot of small files or a few larger ones. @Jan Range, what is your experience? For example, my experience with s5cmd is that setting the number of workers and concurrency can improve the transfer speed: https://github.com/peak/s5cmd?tab=readme-ov-file#configuring-concurrency In my case I get 500 MB/s or higher with s5cmd to another S3 bucket, but with dvuploader it is around 80 MB/s, even when I increase the number of jobs. The number of jobs does not seem to have much effect. Not sure if this is expected. Happy to hear what others do and what speeds they get for large data transfers.

Mattias de Hollander (Mar 11 2025 at 14:29):

Just a friendly reminder about my previous question on optimizing upload speeds with dvuploader. I'm still curious to know if it can utilize parallel threads to boost speeds like s5cmd does - I've noticed s5cmd (written in Go) can reach 500 MB/s or higher, while dvuploader (in Python) tops out at around 80 MB/s for me. Is this expected, or is there a way to tweak dvuploader to get closer to maxing out my network bandwidth?
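
For anyone who wants to check whether the job count matters at all on their side, here is a rough, stdlib-only sketch of how throughput could be compared across worker counts. It does not call dvuploader or s5cmd: `measure_throughput` and `fake_upload` are hypothetical names, and `fake_upload` only sleeps to simulate per-file latency. The assumption is that you would swap in your own function that uploads a single file.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(upload_one, sizes_bytes, n_workers):
    """Call upload_one(size) for every size using n_workers threads.

    Returns the aggregate rate in MB/s (total bytes / wall-clock time).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() forces all tasks to finish before the timer stops.
        list(pool.map(upload_one, sizes_bytes))
    elapsed = time.perf_counter() - start
    return sum(sizes_bytes) / 1e6 / elapsed


def fake_upload(size_bytes, per_file_latency_s=0.02):
    # Hypothetical stand-in for a real single-file upload: it only sleeps,
    # so it models per-request latency, not real network bandwidth.
    time.sleep(per_file_latency_s)
    return size_bytes


if __name__ == "__main__":
    sizes = [1_000_000] * 32  # 32 simulated files of 1 MB each
    for n in (1, 4, 16):
        rate = measure_throughput(fake_upload, sizes, n)
        print(f"{n:>2} workers: {rate:.0f} MB/s (simulated)")
```

If the measured rate stays flat as the worker count grows with a real upload function, the bottleneck is probably somewhere other than the number of parallel jobs.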

Philip Durbin 🚀 (Mar 11 2025 at 14:33):

I'm not sure, but I moved the somewhat off-topic posts to their own thread.


Last updated: Nov 01 2025 at 14:11 UTC