I have a question about pyDataverse. Is its advanced usage meant for adding datasets that already have DOIs? Can it be used to batch-upload datasets that don't have DOIs yet?
I assume you're looking at https://pydataverse.readthedocs.io/en/latest/user/advanced-usage.html
"This tutorial will show you how to mass-import metadata from pyDataverse’s own CSV format (see CSV templates), create pyDataverse objects from it (Datasets and Datafiles) and upload the data and metadata through the API."
Hmm, judging from https://github.com/gdcc/pyDataverse/blob/v0.3.3/pyDataverse/templates/datasets.csv it looks like it wants an existing DOI, but I've never used this before.
@Hansen Zhang I guess you can try it and see. :grinning:
@Hansen Zhang I haven’t used it myself, but based on the examples, the CSV appears to be a flattened version of the dataset or datafile metadata. The DOI looks optional; if provided, it should update that dataset’s metadata rather than create a new one. As shown in the documentation, the list of dictionaries (obtained from the CSV using read_csv_as_dicts) is used to generate new datasets:
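Roughly like this, though I haven't run it end to end, so treat it as a sketch. The `read_csv_as_dicts`, `Dataset.set`, `Dataset.json`, and `NativeApi.create_dataset` names come from the pyDataverse docs; the CSV columns, URL, token, and collection alias below are placeholders:

```python
# Sketch (untested): mass-create datasets from a flat CSV, no pre-existing DOIs.
import csv
import io

def rows_from_csv(text):
    """Stand-in for pyDataverse's read_csv_as_dicts: one flat dict per row."""
    return list(csv.DictReader(io.StringIO(text)))

def upload_all(rows, base_url, api_token, dataverse_alias):
    # Import deferred so the parsing part runs even without pyDataverse installed.
    from pyDataverse.api import NativeApi
    from pyDataverse.models import Dataset

    api = NativeApi(base_url, api_token)
    for row in rows:
        ds = Dataset()
        ds.set(row)  # map the flattened columns onto the Dataset model
        # No PID in the row -> the installation mints a new DOI on create:
        resp = api.create_dataset(dataverse_alias, ds.json())
        print(resp.json())

# Tiny demo of the CSV -> dict step (column names are made up):
demo = "title,author\nMy first dataset,Zhang"
print(rows_from_csv(demo))
```

In real use you'd call `upload_all(read_csv_as_dicts("datasets.csv"), ...)` against your installation, and the required metadata fields (title, author, description, subject, etc.) would need to be present for the create call to validate.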
Hence, once each row is transformed into a dict, you can handle it much as you would when creating or editing a normal dataset (see here).
Personally, I'm not sure the CSV approach is the best choice, given that we're working with nested structures. @Philip Durbin 🐉, perhaps using a JSONL format would be more practical for parsing and validation. I'll add this to the agenda for the next pyDataverse WG meeting.
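To illustrate what I mean: with JSONL each line is a complete JSON document, so nested metadata round-trips without any flattening/unflattening convention (a stdlib-only sketch; the field names here are made up, not Dataverse's native JSON layout):

```python
import json

# Two hypothetical dataset records, one JSON document per line:
jsonl = "\n".join([
    json.dumps({"title": "Dataset A", "author": [{"authorName": "Zhang"}]}),
    json.dumps({"title": "Dataset B", "author": [{"authorName": "Durbin"}]}),
])

# Parsing is one json.loads per non-empty line; nesting survives intact,
# unlike a compound value squeezed into a flat CSV cell:
records = [json.loads(line) for line in jsonl.splitlines() if line.strip()]
print(records[0]["author"][0]["authorName"])
```

Each record could also be validated against a JSON Schema before upload, which is harder to do with a flattened CSV row.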
Last updated: Nov 01 2025 at 14:11 UTC