A couple of months ago, @Oliver Bertuch requested optimized S3 downloads that could be parallelized. We’ve identified potential libraries to achieve this, but I’m stuck figuring out the S3 URLs. I know these URLs are typically exposed through the redirect when using the DataAccess API, but I’m wondering if there’s a more efficient way to obtain them.
My concern is that not all files will be stored in S3, so the download might fail on instances that use a different storage backend. Therefore, I’d like to distinguish the two cases: support S3 downloads whenever possible, and fall back to normal HTTP downloads using httpx otherwise.
Or would you suggest using the redirect URL, and is checking whether s3 is in storageIdentifier sufficient?
Example DataFile Output
Yeah, checking the storageIdentifier feels like the right way.
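A minimal sketch of the check discussed above, assuming the file metadata is available as a dict with a `storageIdentifier` key and that S3-backed files carry an `s3://` prefix there. The function names are hypothetical, and the prefix test is only a heuristic: an instance could label its S3 store differently, so the plain HTTP path (e.g. via httpx) remains the safe default.

```python
from typing import Any, Mapping


def is_s3_backed(datafile: Mapping[str, Any]) -> bool:
    """Heuristic: treat a file as S3-backed when its storageIdentifier
    starts with 's3://'. Assumes the instance labels its S3 store 's3';
    a differently named store would not be detected."""
    storage_id = datafile.get("storageIdentifier") or ""
    return storage_id.lower().startswith("s3://")


def pick_download_strategy(datafile: Mapping[str, Any]) -> str:
    """Return a hypothetical strategy label: 's3-parallel' for files that
    look S3-backed, 'http' (plain download, e.g. with httpx) otherwise."""
    return "s3-parallel" if is_s3_backed(datafile) else "http"


if __name__ == "__main__":
    # Made-up example values, not taken from a real instance.
    examples = [
        {"storageIdentifier": "s3://example-bucket:abc123"},
        {"storageIdentifier": "file://abc123"},
        {},  # metadata without a storageIdentifier falls back to HTTP
    ]
    for df in examples:
        print(df.get("storageIdentifier"), "->", pick_download_strategy(df))
```

Files with a missing or unrecognized identifier fall through to the HTTP path, so instances using other storage still work.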
Last updated: Jan 09 2026 at 14:18 UTC