Stream: python

Topic: Dataverse filesystem-like extension


view this post on Zulip Jan Range (Apr 12 2024 at 08:04):

@Oliver Bertuch had the idea of providing a custom filesystem to remotely access and upload files via PyDataverse. This works already and will soon be put into a PR.

Another idea was to allow crawling ZIP files, similar to the Zip Previewer. Unfortunately the previewer initially downloads the Zip file and then displays the content. Hence my question, is there a way or could you think of providing this as a Dataverse endpoint? At least listing the content would be beneficial.

view this post on Zulip Philip Durbin 🚀 (Apr 12 2024 at 11:42):

I may not be following very well..

I did see this issue about a PyFilesystem implementation: https://github.com/gdcc/pyDataverse/issues/178

But now you're asking about zip files? You want to preview the contents from pyDataverse?

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 11:43):

Exactly! The idea goes like this: someone wants to use a single file from a ZIP living on Dataverse. Using the ZIP Pyfilesystem backed by the Dataverse Pyfilesystem, you would be able to retrieve it without downloading all of it.

view this post on Zulip Philip Durbin 🚀 (Apr 12 2024 at 11:45):

Does the existing HTTP Range header support help here? Or do we need a new API endpoint? You just want the list of contents in the zip file, right?

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 11:45):

I'm not sure how the ZIP file previewer does it

view this post on Zulip Jan Range (Apr 12 2024 at 11:46):

It uses HTTP Range too

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 11:46):

It must extract the list of files from the ZIP to display it

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 11:46):

And then it probably computes the ranges so it can download a single file from the ZIP without downloading the whole ZIP file first

view this post on Zulip Jan Range (Apr 12 2024 at 11:47):

I have never used HTTP Range though. Would need to dig a bit into this.

view this post on Zulip Philip Durbin 🚀 (Apr 12 2024 at 11:48):

Range has your name on it! Please see https://guides.dataverse.org/en/6.2/api/dataaccess.html#headers

view this post on Zulip Jan Range (Apr 12 2024 at 11:49):

Hehe, that should be a familiar thing to me :grinning:

view this post on Zulip Philip Durbin 🚀 (Apr 12 2024 at 11:49):

In my imagination the zip previewer/downloader uses the Range header to get the listing of files. Then it presents the list to the user. When the user clicks a file to download, it uses the Range header again to download just the bytes for that file.
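
For a sense of what that looks like from Python, here is a minimal sketch of a suffix Range request against the data access API (the base URL and file ID are placeholders, not from this thread):

import requests

# Hypothetical instance and file ID, for illustration only
BASE_URL = "https://demo.dataverse.org"
FILE_ID = 42

# Suffix range: ask only for the last 64 KiB, which is where a ZIP's
# central directory typically lives
resp = requests.get(
    f"{BASE_URL}/api/access/datafile/{FILE_ID}",
    headers={"Range": "bytes=-65536"},
)
print(resp.status_code)   # 206 Partial Content when ranges are honored
print(len(resp.content))  # at most 65536 bytes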

view this post on Zulip Philip Durbin 🚀 (Apr 12 2024 at 11:50):

I don't think it downloads the entire zip file to get the listing of files. I sure hope not.

view this post on Zulip Jan Range (Apr 12 2024 at 11:50):

Maybe there is already something in PyFilesystem to deal with this. Will check!

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 11:56):

Probably https://docs.pyfilesystem.org/en/latest/_modules/fs/zipfs.html#ZipFS does not support remote ZIP files or range requests...

view this post on Zulip Jan Range (Apr 12 2024 at 11:56):

I have found another library that supports opening remote S3 objects and, I guess, HTTP too:

https://pypi.org/project/smart-open/

view this post on Zulip Jan Range (Apr 12 2024 at 11:57):

Giving it a try now

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 11:58):

Oh wow!

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 11:58):

Looks amazing!

view this post on Zulip Jan Range (Apr 12 2024 at 11:58):

The unicorn we needed :grinning:

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 12:00):

Hmm does it support ZIP?

view this post on Zulip Jan Range (Apr 12 2024 at 12:00):

Loads ZIPs very well from remote! At least we are getting some binary. Checking the content now

view this post on Zulip Jan Range (Apr 12 2024 at 12:00):

image.png

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 12:01):

Might need to add a compression handler...

view this post on Zulip Jan Range (Apr 12 2024 at 12:01):

Remote file was a tar file btw.

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 12:02):

From looking at the library, I'm not sure it supports extracting the list of files or extracting parts of the ZIP file via HTTP range requests

view this post on Zulip Jan Range (Apr 12 2024 at 12:04):

Yes, that seems impossible. I can't find any documentation about this, and there is no dedicated method. Would it be a contender for the S3 download though? Seems pretty simple to me

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 12:06):

Related and Interesting: https://github.com/piskvorky/smart_open/issues/725

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 12:13):

I suppose the ZIP file previewer is loading a few kilobytes from the end of the zip file (the size is known from metadata IIRC or maybe ranges support negative values)

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 12:14):

If you find the central directory header signature (0x02014b50), you know you've got it all and can start browsing for the files

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 12:14):

https://en.wikipedia.org/wiki/ZIP_(file_format)

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 12:15):

Oh wait, it's actually even easier when you know the last byte... there is an End of Central Directory record at the very end that points to the central directory
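
For the record, locating that End of Central Directory (EOCD) record in a downloaded tail is pure standard library; a hedged sketch (the function name is illustrative):

import struct

EOCD_SIG = b"PK\x05\x06"  # End of Central Directory signature (0x06054b50)

def find_central_directory(tail: bytes):
    # Scan backwards for the EOCD signature in the tail bytes
    pos = tail.rfind(EOCD_SIG)
    if pos == -1:
        raise ValueError("EOCD not found; fetch a larger tail")
    # EOCD layout: total entries (u16 at +10), central directory
    # size (u32 at +12), and its offset in the full file (u32 at +16)
    entries, cd_size, cd_offset = struct.unpack_from("<HII", tail, pos + 10)
    return entries, cd_size, cd_offset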

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 12:17):

Again, it should be possible to see what the ZIP file previewer is doing and try to transfer that to Python

view this post on Zulip Jan Range (Apr 12 2024 at 12:17):

Cool! Learned something new today :smile: Going to check it out!

view this post on Zulip Jan Range (Apr 12 2024 at 12:19):

Feels so good to do some coding again after a week full of enzyme catalysis stuff :grinning:

view this post on Zulip Jan Range (Apr 12 2024 at 12:56):

Some StackOverflow digging has helped!

https://stackoverflow.com/a/17434121 (especially the last section with ZipFile)
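
The gist of that answer, as a sketch (the URL is a placeholder): fetch only the tail of the remote ZIP and let zipfile parse the central directory out of it:

import io
import zipfile
import requests

# Placeholder URL for a Dataverse file download
url = "https://demo.dataverse.org/api/access/datafile/42"

# Last 256 KiB; enough as long as the whole central directory fits in it
tail = requests.get(url, headers={"Range": "bytes=-262144"}).content
zf = zipfile.ZipFile(io.BytesIO(tail))
print(zf.namelist())  # listing works; reading members would need further ranges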

view this post on Zulip Philip Durbin 🚀 (Apr 12 2024 at 13:07):

@Markus Haarländer created the zip previewer/downloader. Maybe he can help.

view this post on Zulip Jan Range (Apr 12 2024 at 13:12):

I think I have it!

image.png

view this post on Zulip Jan Range (Apr 12 2024 at 13:13):

Original file - https://darus.uni-stuttgart.de/file.xhtml?persistentId=doi:10.18419/darus-3372/7

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 13:15):

That looks promising!

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 13:16):

It might be necessary to repeat the 256k read with a larger window if the ZIP file is large

view this post on Zulip Jan Range (Apr 12 2024 at 13:17):

Yes, I will add an iterative process. According to StackOverflow, ZipFile raises an error if the window is not large enough, so I would just repeat with an increased size until there is no error.
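
That iterative approach could look roughly like this (fetch_tail is a hypothetical helper doing a "Range: bytes=-n" request):

import io
import zipfile

def list_remote_zip(fetch_tail, start=256 * 1024, max_bytes=64 * 1024 * 1024):
    size = start
    while size <= max_bytes:
        try:
            return zipfile.ZipFile(io.BytesIO(fetch_tail(size))).namelist()
        except zipfile.BadZipFile:
            size *= 2  # central directory not fully inside the window yet
    raise RuntimeError("central directory exceeds the maximum window")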

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 13:17):

Sounds good to me!

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 13:18):

So this would be a feature of DataverseFS, right?

view this post on Zulip Jan Range (Apr 12 2024 at 13:18):

Yes, would be a great feature to have! Next challenge though is to extract a specific file

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 13:19):

Oh and what about retrieval of a file from the ZIP? You didn't take a look at that yet, right?

view this post on Zulip Jan Range (Apr 12 2024 at 13:19):

I would suggest first nailing this down in the example and then generalizing it in DataverseFS

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 13:19):

Ha! You beat me to it :racecar:

view this post on Zulip Jan Range (Apr 12 2024 at 13:19):

Haha

view this post on Zulip Jan Range (Apr 12 2024 at 13:20):

Will look into this. Guess the ZIP Preview has some ideas already

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 13:21):

Probably they extract the byte locations from the ZIP directory and merge it all into a request including the range

view this post on Zulip Jan Range (Apr 12 2024 at 13:23):

That makes sense

view this post on Zulip Jan Range (Apr 12 2024 at 13:23):

Maybe ZipFile has some nice utilities for that

view this post on Zulip Markus Haarländer (Apr 12 2024 at 13:30):

Hi guys.
Seems you have already mastered most of it. Yes, the ZipPreviewer utilizes HTTP Range requests to read the central directory of a ZIP file first, and to download and extract single files from the ZIP (using ranges from the central directory). It makes use of a great JavaScript library which can do all of these things: https://github.com/gildas-lormeau/zip.js. But I don't know of a Python library

view this post on Zulip Jan Range (Apr 12 2024 at 15:49):

@Markus Haarländer thanks for the info! Glad to hear we are on the right track :smile:

view this post on Zulip Jan Range (Apr 12 2024 at 15:50):

Coincidentally, right after reading your message I stumbled across something similar to zip.js and it does exactly what we want! The library is called remotezip

image.png
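
For reference, basic remotezip usage looks roughly like this (the URL is a placeholder; RemoteZip subclasses zipfile.ZipFile and issues Range requests under the hood):

from remotezip import RemoteZip

url = "https://demo.dataverse.org/api/access/datafile/42"  # placeholder

with RemoteZip(url) as rz:
    print(rz.namelist())          # listing via ranged reads of the central directory
    rz.extract(rz.namelist()[0])  # fetches only the bytes of that one member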

view this post on Zulip Philip Durbin 🚀 (Apr 12 2024 at 15:51):

mmm, pythonic :yum:

view this post on Zulip Jan Range (Apr 12 2024 at 15:55):

For reference, this is a 2.8 GB file, and even this works pretty well/fast. Super nice!!

image.png

view this post on Zulip Philip Durbin 🚀 (Apr 12 2024 at 15:57):

We're going to have a lot to talk about at the next pyDataverse meeting. :grinning:

view this post on Zulip Jan Range (Apr 12 2024 at 15:57):

True :grinning_face_with_smiling_eyes:

view this post on Zulip Jan Range (Apr 15 2024 at 09:37):

The code for listing the contents of a ZIP file and downloading specific parts is working well. I have separated the Dataverse and Zip Filesystem, so passing a DataFile object to the ZIP filesystem is necessary. Here is a working example:

image.png

view this post on Zulip Jan Range (Apr 15 2024 at 09:38):

On the left sidebar, you can see the downloaded part of the ZIP file. Once I have implemented the write method to upload data files, I will create a pull request :smile:

view this post on Zulip Oliver Bertuch (Apr 15 2024 at 09:47):

This looks amazing!

view this post on Zulip Oliver Bertuch (Apr 15 2024 at 09:48):

Question: should the name of the ZipFS rather be "RemoteZipFS" to make it more obvious what this is about? Someone might want to combine it with the ZipFS shipped with PyFilesystem

view this post on Zulip Jan Range (Apr 15 2024 at 09:48):

Yes, that makes sense! Will rename it :smile:

view this post on Zulip Jan Range (Apr 15 2024 at 11:39):

Upload works too :smile:

image.png

view this post on Zulip Jan Range (Apr 15 2024 at 11:44):

@Philip Durbin Would Stefano be interested in putting the "Compute on Data" logic into the filesystem? I think this would be the right place

view this post on Zulip Philip Durbin 🚀 (Apr 16 2024 at 13:39):

Maybe! Let's see what @Leo Andreev thinks.

view this post on Zulip Jan Range (Apr 17 2024 at 06:13):

I think the filesystem is pretty close to finished. It is now possible to write files as on a regular filesystem, using the open("my.file", "w") pattern. Closing a file creates a new data file, and S3 is supported via DVUploader. Here is an example:

image.png

view this post on Zulip Jan Range (Apr 17 2024 at 06:14):

Feels almost like you are writing files to the hard drive :stuck_out_tongue:

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:41):

Would it be an option to have a "with" thing? So on any close the file automatically gets uploaded?

view this post on Zulip Jan Range (Apr 17 2024 at 06:41):

Do you mean for the filesystem itself?

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:42):

Or is this already happening?

view this post on Zulip Jan Range (Apr 17 2024 at 06:42):

The with statement is already available for the files themselves

view this post on Zulip Jan Range (Apr 17 2024 at 06:42):

You can either use with to close the file automatically or invoke the close yourself afterwards to trigger the upload
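
In other words, something like this sketch (fs stands for the DataverseFS instance; the path is made up):

with fs.open("results/output.csv", "w") as f:
    f.write("a,b\n1,2\n")
# leaving the with block closes the file, which triggers the upload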

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:42):

Ah! So the second loop already does the upload in the background

view this post on Zulip Jan Range (Apr 17 2024 at 06:42):

image.png

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:42):

And the third loop is just an example to upload other files on disk?

view this post on Zulip Jan Range (Apr 17 2024 at 06:43):

Yes, exactly :smile:

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:43):

Great!

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:43):

This is really great!

view this post on Zulip Jan Range (Apr 17 2024 at 06:43):

I thought it would make sense to have a local file option too, since the usual open operation is blocking and hence can't be parallelized

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:43):

Does it support remote files (for download), too?

view this post on Zulip Jan Range (Apr 17 2024 at 06:44):

Yes, you can download any file from a dataset including zip members

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:44):

No I meant files that are in a remote store.

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:44):

So not in S3 and not stored in Dataverse

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:44):

But referenced using a URL

view this post on Zulip Jan Range (Apr 17 2024 at 06:45):

Ah alright, I have not tested this yet, but I am sure there are ways to integrate it

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:45):

It would be great to enable registering URL handlers here

view this post on Zulip Jan Range (Apr 17 2024 at 06:46):

You mean in a way to transfer from a remote store to dataverse?

view this post on Zulip Jan Range (Apr 17 2024 at 06:46):

Would be great if there is a way to not have to download the intermediate file then

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:46):

So people could store something using git-annex and receive the file when they execute the python script

view this post on Zulip Jan Range (Apr 17 2024 at 06:47):

Do you have an example for this? Haven't used git-annex yet

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:47):

That way they could very naturally interact with the files and they would be fetched as necessary

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:47):

Ha storing the git annex thing was a wild idea at distribits

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:47):

For now there might be examples using HTTP and Globus links

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 06:48):

Let me get to work; then I'll try to cook up a better example. Typing this on my mobile is hard...

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 07:15):

Alright. From https://guides.dataverse.org/en/latest/api/native-api.html#add-remote-file-api we know that files registered as remote files contain lots of information about the file. The most important bit is probably the storage identifier.

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 07:16):

It will contain a URL that has been configured as a valid base url in the store plus a path within that location

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 07:16):

The filesystem will be presented with this information when downloading the file metadata

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 07:17):

So it would know about the files and folder structures

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 07:17):

But it could not download the file from the Dataverse instance

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 07:21):

Which means that the filesystem would need to understand how to resolve the URLs into a file

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 07:22):

The example JSON to register the remote file has the example storage ID trsa://themes/custom/qdr/images/CoreTrustSeal-logo-transparent.png

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 07:23):

Users of the filesystem would need some means to register a handler that knows how to deal with protocol "trsa://" and the rest of the URL

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 07:24):

In case of Datalad, the idea is to store "git-annex://" URLs that encode a git-annex remote file reference as a URL.

view this post on Zulip Oliver Bertuch (Apr 17 2024 at 07:24):

Using a handler to access the git-annex URL and download the file would be great
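
A possible shape for such a registry, as a hedged sketch (all names here are illustrative, not part of DataverseFS):

from typing import BinaryIO, Callable, Dict
from urllib.parse import urlsplit

HANDLERS: Dict[str, Callable[[str], BinaryIO]] = {}

def register_handler(scheme: str, opener: Callable[[str], BinaryIO]) -> None:
    # e.g. register_handler("trsa", my_trsa_opener)
    HANDLERS[scheme] = opener

def open_remote(storage_identifier: str) -> BinaryIO:
    # "trsa://themes/custom/..." parses with scheme "trsa"
    scheme = urlsplit(storage_identifier).scheme
    if scheme not in HANDLERS:
        raise ValueError(f"no handler registered for scheme {scheme!r}")
    return HANDLERS[scheme](storage_identifier)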

view this post on Zulip Jan Range (Apr 17 2024 at 11:30):

Okay, got it. Sounds great! How can I set up my local instance to test it? Can I add any remote store I want? Tried it using the docs, but I have not gotten it to work.

view this post on Zulip Jan Range (Apr 17 2024 at 11:58):

I've added these JVM args:

        -Ddataverse.files.trsa.type=remote
        -Ddataverse.files.trsa.label=SomeRemoteStorage
        -Ddataverse.files.trsa.base-url=trsa://
        -Ddataverse.files.trsa.base-store=trsa

The script:

export API_TOKEN=7a51588f-8422-4868-bc66-c791016e4a30
export SERVER_URL=http://localhost:8080
export PERSISTENT_ID=doi:10.5072/FK2/ZTXNOV
export JSON_DATA=$(<body.json)

curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_ID" -F "jsonData=$JSON_DATA"

The request body:

{
  "description": "A remote image.",
  "storageIdentifier": "trsa://hello/testlogo.png",
  "checksumType": "MD5",
  "md5Hash": "509ef88afa907eaf2c17c1c8d8fde77e",
  "label": "testlogo.png",
  "fileName": "testlogo.png",
  "mimeType": "image/png"
}

view this post on Zulip Jan Range (Apr 17 2024 at 11:58):

Pretty sure I am doing something wrong :grinning:

view this post on Zulip Jan Range (Apr 17 2024 at 12:57):

I receive the following message every time I try to add a remote file:

{"status":"ERROR","message":"Dataset store configuration does not allow provided storageIdentifier."}

The storage identifier follows the base URL scheme but does not match.

view this post on Zulip Jan Range (Apr 17 2024 at 12:58):

Tried setting the collection storage to the remote store, but without effect

view this post on Zulip Philip Durbin 🚀 (Apr 17 2024 at 13:14):

I'm not sure if this helps, but we have a test on the Java side (that isn't exercised regularly): https://github.com/IQSS/dataverse/blob/v6.2/src/test/java/edu/harvard/iq/dataverse/api/RemoteStoreIT.java

view this post on Zulip Jan Range (Apr 17 2024 at 13:16):

Awesome! Thanks for the hint. Could it be that I am missing this line?

-Ddataverse.files.trsa.base-store=file

I thought it had a default, but I will give it a try!

view this post on Zulip Philip Durbin 🚀 (Apr 17 2024 at 13:20):

Could be. I'm pretty sure you need a base store for thumbnails, etc.

view this post on Zulip Jan Range (Apr 18 2024 at 12:37):

Stupid idea, but what if we defined this filesystem instance-wide? Instead of connecting to a single dataset, you could access all datasets. I am thinking of something like this:

from dataversefs import DataverseFS

fs = DataverseFS(base_url="https://demo.dataverse.org")

# list a directory inside a specific dataset
fs.listdir("doi:10.70122/FK2/TDI8JO://some/dir")

# open a file from that dataset through the filesystem (fs.open, not the builtin)
file = fs.open("doi:10.70122/FK2/TDI8JO://some/dir/myfile.txt", "r")

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:38):

You don't even need the ://

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:39):

It's not a resolvable DOI URI but who cares

view this post on Zulip Jan Range (Apr 18 2024 at 12:39):

Maybe there is a better way, but I think it would be cool to grab from any dataset you want.

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:40):

Maybe for the sake of validity go for doi:10.70122/FK2/TDI8JO?file=/path/to/file

view this post on Zulip Jan Range (Apr 18 2024 at 12:40):

That's nice!

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:40):

It becomes a valid URI this way but is simple to parse because of the "separator string"

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:41):

If you don't want a parameter, you could use anchors

view this post on Zulip Jan Range (Apr 18 2024 at 12:41):

True that, haven't thought about this. The idea just popped into my head :grinning:

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:41):

doi:10.70122/FK2/TDI8JO#...

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:41):

Absolutely! Makes a lot of sense.

view this post on Zulip Jan Range (Apr 18 2024 at 12:42):

I am a fan of the hash - Looks smooth

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:42):

Maybe support both. The names are much shorter when not always needing the fully qualified one

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:42):

There might be character limitations for anchors!

view this post on Zulip Jan Range (Apr 18 2024 at 12:43):

I will hack something together. New material for the weekend :grinning_face_with_smiling_eyes:

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:43):

Sorry the correct term is "fragment"

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:43):

Which even makes more sense here - you want a fragment of a dataset

view this post on Zulip Jan Range (Apr 18 2024 at 12:43):

Dataset Fragments sounds nice :grinning:

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:45):

The characters slash ("/") and question mark ("?") are allowed to represent data within the fragment identifier. Beware that some older, erroneous implementations may not handle this data correctly when it is used as the base URI for relative references (Section 5.1).

https://www.rfc-editor.org/rfc/rfc3986#page-24

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:48):

You could even support having a query part

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:48):

doi:10.70122/FK2/TDI8JO?direct-download=false#path/to/file.ext
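
Nicely enough, that form parses with the standard library as-is:

from urllib.parse import parse_qs, urlsplit

parts = urlsplit("doi:10.70122/FK2/TDI8JO?direct-download=false#path/to/file.ext")
print(parts.scheme)           # 'doi'
print(parts.path)             # '10.70122/FK2/TDI8JO'
print(parse_qs(parts.query))  # {'direct-download': ['false']}
print(parts.fragment)         # 'path/to/file.ext'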

view this post on Zulip Oliver Bertuch (Apr 18 2024 at 12:52):

Wouldn't it be nice if we had a Dataverse API endpoint like this?

view this post on Zulip Oliver Bertuch (May 20 2025 at 14:57):

@Jan Range have you ever tried benchmarking the pyfilesystem? Wondering what kind of performance one could get using the underlying S3, potentially caching files locally so that multiple uses of a file don't reload it over and over.

view this post on Zulip Oliver Bertuch (May 20 2025 at 14:58):

Had a discussion with a few RSEs today who do Electron Microscopy. Their sensors stream a solid 2 GiB/s, and I was wondering, once they put that kind of data into S3 (maybe with Dataverse in the mix), what kind of speed they could achieve reading the data back again.

view this post on Zulip Oliver Bertuch (May 20 2025 at 14:59):

Currently they heavily rely on filesystems and direct IO to avoid page table madness...

view this post on Zulip Oliver Bertuch (May 20 2025 at 15:00):

But they also want to expose data to analysis stations using SMB/NFS, so going through a network stack. Wondering if the S3 direct download stuff with PyFilesystem2 would be able to compete.

view this post on Zulip Jan Range (May 21 2025 at 11:54):

@Oliver Bertuch I have not benchmarked it yet, but I can test it in the upcoming weeks.

I’ve checked the source code, and the fs-s3fs package that provides S3 support in pyFileSystem uses boto3, the official AWS SDK. According to the implementation, it automatically utilizes the Range header and parallel downloads, which should make it noticeably faster than a standard sequential download.

However, when I tried using both boto3 and pyFileSystem’s S3 backend, I ran into an issue: these libraries require AWS credentials, even for publicly accessible files. I attempted to extract credentials or any required information from the S3 redirect URL, but that wasn’t sufficient to get these libraries working. Do you have any ideas on how we could make this work using only the pre-signed URL?

That said, I believe we could still achieve better download speeds by leveraging Range headers and parallelizing the download process ourselves. For reference, I ran a quick benchmark comparing the current, non-parallelized PyDataverse download of a 1.5 GB file to a Rust implementation that uses Range requests for parallel downloading.

Library       Size     Time taken   Throughput (GB/s)
PyDataverse   1.5 GB   ~90 s        ~0.01
DVCLI (Rust)  1.5 GB   ~30 s        ~0.05

The Rust implementation follows the S3 redirect and uses the Range header to enable partial downloads. It splits the file into 5 MB chunks and distributes the workload across 64 workers, which turned out to be the optimal configuration in my tests. To ensure realistic results, I used a file from production as the test case.

https://darus.uni-stuttgart.de/file.xhtml?persistentId=doi:10.18419/DARUS-444/1&version=1.0

I am not sure if we can get any faster, since the AWS libraries practically do the same thing. Do you have any ideas how we could match the insane 2 GiB/s? :smile:
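
A rough Python equivalent of that chunked, parallel Range download could look like this (a sketch, not the actual DVCLI implementation; URL handling is simplified, and the chunk size and worker count mirror the numbers above):

import concurrent.futures as cf
import requests

CHUNK = 5 * 1024 * 1024  # 5 MB chunks, as in the Rust implementation
WORKERS = 64

def fetch_range(url, start, end):
    resp = requests.get(url, headers={"Range": f"bytes={start}-{end}"})
    resp.raise_for_status()
    return start, resp.content

def parallel_download(url, out_path):
    # Determine the total size, following the S3 redirect
    size = int(requests.head(url, allow_redirects=True).headers["Content-Length"])
    ranges = [(s, min(s + CHUNK, size) - 1) for s in range(0, size, CHUNK)]
    with open(out_path, "wb") as f, cf.ThreadPoolExecutor(WORKERS) as pool:
        futures = [pool.submit(fetch_range, url, s, e) for s, e in ranges]
        for fut in cf.as_completed(futures):
            start, data = fut.result()
            f.seek(start)   # chunks may complete out of order
            f.write(data)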

view this post on Zulip Jan Range (May 21 2025 at 11:56):

It could be that my WiFi is not the best setup for benchmarking. I imagine a direct, cabled connection would perform even better.

