pandas & the fsspec ecosystem
Importing pyDataverse registers a dataverse:// URL protocol with fsspec (a
packaging entry point also registers it on install). From then on, any
fsspec-aware library can open a dataset file directly from a URL — no filesystem
object to construct, no download step to manage.
The dataverse:// URL
A URL carries the dataset’s connection details in its query string; the path component is the file’s path inside the dataset:

    dataverse://<host>/<file/path>?persistentId=doi:...&version=:latest
    dataverse://<host>/<file/path>?id=12345

| Query parameter | Purpose |
|---|---|
| persistentId | Dataset DOI (e.g. doi:10.5072/FK2/ABCDEF). |
| id | Dataset numeric database ID (alternative to persistentId). |
| version | Dataset version (:latest, :draft, 1.0, …). Optional. |
| scheme | Transport for the host, https (default) or http. Optional. |

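
The URL layout and query parameters above can be assembled programmatically. The following is a minimal sketch using only the standard library; `build_dataverse_url` is a hypothetical helper written for illustration, not part of pyDataverse, and the rule that exactly one of `persistentId` or `id` must be given is an assumption based on the table describing them as alternatives.

```python
from urllib.parse import urlencode


def build_dataverse_url(host, file_path, *, persistent_id=None,
                        dataset_id=None, version=None, scheme=None):
    """Assemble a dataverse:// URL (hypothetical helper, not a pyDataverse API)."""
    # Assumption: a dataset is identified by a DOI or a numeric ID, not both.
    if (persistent_id is None) == (dataset_id is None):
        raise ValueError("pass exactly one of persistent_id or dataset_id")

    params = {}
    if persistent_id is not None:
        params["persistentId"] = persistent_id
    else:
        params["id"] = str(dataset_id)
    if version is not None:
        params["version"] = version
    if scheme is not None:
        params["scheme"] = scheme

    # Keep ':' and '/' unescaped so the DOI stays readable, as in the examples.
    query = urlencode(params, safe=":/")
    return f"dataverse://{host}/{file_path.lstrip('/')}?{query}"


url = build_dataverse_url(
    "demo.dataverse.org",
    "data/table.tab",
    persistent_id="doi:10.5072/FK2/ABCDEF",
    version=":latest",
)
print(url)
# dataverse://demo.dataverse.org/data/table.tab?persistentId=doi:10.5072/FK2/ABCDEF&version=:latest
```

Building the query with `urlencode` rather than string concatenation means any unusual characters in a version string or identifier are escaped consistently.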
The API token is not read from the URL. Pass it — when needed — through the
reading library’s storage_options:
    storage_options = {"api_token": "your-token"}

Reading with pandas
For a tabular file, point read_csv at the URL. Dataverse serves ingested files
as tab-delimited, so use sep="\t":
    import pandas as pd
    import pyDataverse  # registers the dataverse:// protocol

    url = (
        "dataverse://demo.dataverse.org/data/table.tab"
        "?persistentId=doi:10.5072/FK2/ABCDEF"
    )

    df = pd.read_csv(url, sep="\t")

    # Restricted dataset? Add the token:
    # df = pd.read_csv(url, sep="\t", storage_options={"api_token": "your-token"})

Other fsspec-aware libraries
Any tool that delegates path handling to fsspec accepts dataverse:// URLs the
same way:
    # Dask: lazy, parallel reads
    import dask.dataframe as dd
    ddf = dd.read_csv(url, sep="\t", storage_options={"api_token": "your-token"})

    # Polars (via fsspec)
    import fsspec
    import polars as pl
    with fsspec.open(url, "rb", api_token="your-token") as f:
        df = pl.read_csv(f, separator="\t")

    # Plain fsspec: open any file, tabular or not
    with fsspec.open(url, "rb") as f:
        raw = f.read()

From the command line
fsspec has no standalone CLI, but because the protocol is registered, any
fsspec-aware command-line tool — and a short python -c one-liner — can read a
dataset by URL. The import pyDataverse is what triggers registration:
    # Print a file's contents
    python -c "import pyDataverse, fsspec; \
    print(fsspec.open('dataverse://demo.dataverse.org/data/notes.txt?persistentId=doi:10.5072/FK2/ABCDEF', \
    'rt').open().read())"

    # List the files in a dataset
    python -c "import pyDataverse, fsspec; \
    fs = fsspec.filesystem('dataverse', base_url='https://demo.dataverse.org', \
    identifier='doi:10.5072/FK2/ABCDEF'); \
    print('\n'.join(fs.ls('/', detail=False)))"

Worked example: a public DaRUS dataset
This reads a real, public tabular file from
DaRUS — no token required. The dataset is
doi:10.18419/DARUS-5539, a kinetic-modeling
study; results/summary.tab is a model-comparison table.
    import pandas as pd
    import pyDataverse  # registers the dataverse:// protocol

    url = (
        "dataverse://darus.uni-stuttgart.de/results/summary.tab"
        "?persistentId=doi:10.18419/DARUS-5539"
    )

    df = pd.read_csv(url, sep="\t")
    print(df[["name", "n_parameters", "r2", "aic", "bic"]])

           name  n_parameters        r2         aic         bic
    0  model_04            12  0.997071  214.849030  260.068878
    1  model_07            10  0.998349   27.439997   65.123207
    2  model_06             9  0.996706  246.452515  280.367401
    3  model_08             9  0.998378   19.794846   53.709736

The equivalent through a filesystem instance, letting pyDataverse pick the delimiter for you:
    from pyDataverse.filesystem import DataverseFS

    fs = DataverseFS(
        base_url="https://darus.uni-stuttgart.de",
        identifier="doi:10.18419/DARUS-5539",
    )
    df = fs.open_tabular("results/summary.tab", api_token=None)

Writing back with pandas
With a token and edit permission, the protocol also works for writing. For
example, DataFrame.to_csv to a dataverse:// URL streams a new (or replacement)
file into the dataset:
    df.to_csv(
        "dataverse://demo.dataverse.org/results/out.csv?persistentId=doi:10.5072/FK2/ABCDEF",
        index=False,
        storage_options={"api_token": "your-token"},
    )

See Writing files for the details of how uploads stream and how to attach metadata.
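
Writing always needs a token, and a literal token in a script is easy to leak. One option is to build `storage_options` from the environment. This is a minimal sketch: the helper `storage_options_from_env` and the variable name `DATAVERSE_API_TOKEN` are assumptions made for illustration, and nothing here implies that pyDataverse reads that variable itself.

```python
import os


def storage_options_from_env(var="DATAVERSE_API_TOKEN", environ=None):
    """Return storage_options for a dataverse:// URL, or {} if no token is set.

    Hypothetical helper; the environment variable name is an assumption.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(var, "").strip()
    # An empty dict lets the same call work for public datasets.
    return {"api_token": token} if token else {}


# Intended use (not executed here, since it needs pandas and network access):
# df.to_csv(url, index=False, storage_options=storage_options_from_env())
```

The same dictionary can be passed to `read_csv` for restricted datasets, so reading and writing share one source for the token.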