Overview
DataverseFS exposes the files of a single Dataverse dataset version as an
fsspec filesystem. Once you have an
instance, you can browse, read, write, and delete files using the standard
fsspec interface (ls, info, open, cat, rm, find, glob, …) plus a
few Dataverse-specific helpers for tabular data.
It is the same machinery that powers the high-level Dataset
file operations — dataset.open(...), dataset.files, dataset.upload_file(...)
all delegate to a DataverseFS under the hood — but you can also use it directly
whenever you want a filesystem-style view of a dataset.
Why a filesystem?
Treating a dataset as a filesystem unlocks two things that are otherwise awkward with a plain REST client:
- Streaming, not buffering. Reads are served by HTTP Range requests, so seeking or reading a slice never downloads the whole file. Writes are streamed to Dataverse in bounded chunks, so uploading a large file never holds it all in memory.
- The fsspec ecosystem. Importing pyDataverse registers a dataverse:// URL protocol with fsspec. Any fsspec-aware library (pandas, Dask, Polars, PyArrow, Zarr) can then read a dataset file directly from a URL, with no glue code. This is what makes pd.read_csv("dataverse://...") work.
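The streaming behaviour described in the first bullet can be illustrated with a minimal, standard-library-only sketch. This is not DataverseFS's actual implementation: `RangeReader`, `iter_chunks`, and `fake_fetch` are hypothetical names, and the in-memory `REMOTE` buffer stands in for a file on the server. It only shows the general idea that a seek moves a cursor without transferring data, a read fetches just the requested byte range, and an upload can be consumed in bounded chunks.

```python
import io
from typing import BinaryIO, Callable, Iterator


class RangeReader:
    """Read-only, seekable view that fetches only the bytes requested.

    Hypothetical helper for illustration; `fetch(start, end)` must return
    the bytes in the half-open interval [start, end).
    """

    def __init__(self, fetch: Callable[[int, int], bytes], size: int) -> None:
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def seek(self, offset: int) -> int:
        # Seeking only moves the cursor; nothing is transferred.
        self._pos = max(0, min(offset, self._size))
        return self._pos

    def read(self, n: int) -> bytes:
        end = min(self._pos + n, self._size)
        if end <= self._pos:
            return b""
        data = self._fetch(self._pos, end)
        self._pos = end
        return data


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most `chunk_size` bytes from `stream`."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


# Stand-in for a remote 1 MiB file. A real HTTP client would instead send
# the header "Range: bytes=<start>-<end - 1>" (HTTP ranges are inclusive).
REMOTE = bytes(range(256)) * 4096
transferred = []  # number of bytes moved by each simulated request


def fake_fetch(start: int, end: int) -> bytes:
    transferred.append(end - start)
    return REMOTE[start:end]


# Reading a 16-byte slice at offset 1000 transfers 16 bytes, not 1 MiB.
reader = RangeReader(fake_fetch, len(REMOTE))
reader.seek(1000)
piece = reader.read(16)

# Consuming the same data as an upload source, 256 KiB at a time,
# so only one chunk is held in memory at once.
chunk_sizes = [len(c) for c in iter_chunks(io.BytesIO(REMOTE), 256 * 1024)]
```

The chunk size of 256 KiB is an arbitrary choice for the sketch; the source only states that writes are streamed in bounded chunks, not what the bound is.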
Quickstart
```python
from pyDataverse.filesystem import DataverseFS

fs = DataverseFS(
    base_url="https://demo.dataverse.org",
    identifier="doi:10.5072/FK2/ABCDEF",
)

fs.ls("/")  # list files
with fs.open("data/notes.txt", "r") as f:  # stream a file
    print(f.read())
```

In this section
- Connecting: Create a DataverseFS directly, from a URL, or from a high-level Dataset.
- Browsing & metadata: List, glob, and inspect files; read rich Dataverse metadata.
- Reading files: Stream file content in text or binary, including byte-range reads.
- Writing files: Create, replace, and delete files; attach metadata on upload.
- Tabular data: Load ingested tabular files straight into pandas DataFrames.
- pandas & the fsspec ecosystem: Read datasets by URL from pandas, Dask, Polars, and the command line.