Overview

DataverseFS exposes the files of a single Dataverse dataset version as an fsspec filesystem. Once you have an instance, you can browse, read, write, and delete files using the standard fsspec interface (ls, info, open, cat, rm, find, glob, …) plus a few Dataverse-specific helpers for tabular data.

It is the same machinery that powers the high-level Dataset file operations — dataset.open(...), dataset.files, dataset.upload_file(...) all delegate to a DataverseFS under the hood — but you can also use it directly whenever you want a filesystem-style view of a dataset.

Treating a dataset as a filesystem unlocks two things that are otherwise awkward with a plain REST client:

  • Streaming, not buffering. Reads are served by HTTP Range requests, so seeking or reading a slice never downloads the whole file. Writes are streamed to Dataverse in bounded chunks, so uploading a large file never holds it all in memory.
  • The fsspec ecosystem. Importing pyDataverse registers a dataverse:// URL protocol with fsspec. Any fsspec-aware library — pandas, Dask, Polars, PyArrow, Zarr — can then read a dataset file directly from a URL, with no glue code. This is what makes pd.read_csv("dataverse://...") work.
from pyDataverse.filesystem import DataverseFS

fs = DataverseFS(
    base_url="https://demo.dataverse.org",
    identifier="doi:10.5072/FK2/ABCDEF",
)

fs.ls("/")  # list files

with fs.open("data/notes.txt", "r") as f:  # stream a file
    print(f.read())
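
To make the "streaming, not buffering" point above concrete, here is a minimal, library-independent sketch of the two ideas involved: translating a seek-and-read into an HTTP Range header, and splitting an upload into bounded chunks. The helper names `range_header` and `iter_chunks` and the 8 MiB default chunk size are illustrative assumptions. They are not part of pyDataverse, and the library's actual implementation may differ.

```python
import io
from typing import BinaryIO, Dict, Iterator


def range_header(offset: int, length: int) -> Dict[str, str]:
    """Build the HTTP Range header for `length` bytes starting at `offset`.

    Hypothetical helper: shows how a seek + read maps onto a Range request.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def iter_chunks(stream: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> Iterator[bytes]:
    """Yield successive chunks so that at most `chunk_size` bytes are held at once.

    Hypothetical helper: the default chunk size is an assumption, not a
    documented pyDataverse setting.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        yield chunk


# Reading 512 bytes at offset 1024 only asks the server for that slice.
print(range_header(1024, 512))  # {'Range': 'bytes=1024-1535'}

# A 10-byte payload split into chunks of at most 4 bytes.
print([len(c) for c in iter_chunks(io.BytesIO(b"x" * 10), chunk_size=4)])  # [4, 4, 2]
```

With this pattern the client's memory use is bounded by the chunk size rather than the file size, which is the property the overview attributes to DataverseFS reads and writes.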