Stream: python

Topic: S3 Downloads


Jan Range (Nov 17 2025 at 14:55):

A couple of months ago, @Oliver Bertuch requested optimized S3 downloads that could be parallelized. We’ve identified potential libraries to achieve this, but I’m stuck figuring out the S3 URLs. I know these URLs are typically exposed through the redirect when using the DataAccess API, but I’m wondering if there’s a more efficient way to obtain them.

My concern is that not all files will be stored in S3, so the download might fail on instances that use a different storage backend. Therefore, I’d like to distinguish the two cases: support S3 downloads whenever possible, and fall back to normal HTTP downloads using httpx otherwise.

Or would you suggest that using the redirect URL and checking whether s3 is in storageIdentifier is sufficient?

Example DataFile Output

Philip Durbin 🚀 (Nov 17 2025 at 15:15):

Yeah, checking the storageIdentifier feels like the right way.
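
For illustration, a minimal sketch of such a check might look like the following. It assumes that storageIdentifier has the form `<label>://<rest>` and that S3-backed stores use a label containing "s3"; that may not hold on every installation, since the label can be configured. The function names `is_s3_backed` and `pick_download_strategy` are hypothetical and not part of any existing library.

```python
from typing import Optional


def is_s3_backed(storage_identifier: Optional[str]) -> bool:
    """Return True if the identifier's scheme suggests S3 storage.

    Assumption: the part before "://" names the storage driver, and
    S3 drivers carry "s3" somewhere in that label.
    """
    if not storage_identifier:
        return False
    scheme, sep, _ = storage_identifier.partition("://")
    # Without "://" there is no scheme to inspect, so treat it as non-S3.
    return bool(sep) and "s3" in scheme.lower()


def pick_download_strategy(storage_identifier: Optional[str]) -> str:
    """Choose "s3" for the parallel S3 path, otherwise plain "http"."""
    return "s3" if is_s3_backed(storage_identifier) else "http"
```

Anything that does not clearly look like S3 ends up on the plain HTTP path, so a wrong guess costs speed rather than causing a failed download.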


Last updated: Jan 09 2026 at 14:18 UTC