Stream: python

Topic: advanced usage of pyDataverse


Hansen Zhang (Sep 04 2024 at 18:30):

I have a question about pyDataverse. Does its advanced usage only cover adding datasets that already have DOIs? Can it be used to batch upload datasets without existing DOIs?

Philip Durbin 🚀 (Sep 04 2024 at 18:34):

I assume you're looking at https://pydataverse.readthedocs.io/en/latest/user/advanced-usage.html

Philip Durbin 🚀 (Sep 04 2024 at 18:35):

"This tutorial will show you how to mass-import metadata from pyDataverse’s own CSV format (see CSV templates), create pyDataverse objects from it (Datasets and Datafiles) and upload the data and metadata through the API."

Philip Durbin 🚀 (Sep 04 2024 at 18:37):

Hmm, judging from https://github.com/gdcc/pyDataverse/blob/v0.3.3/pyDataverse/templates/datasets.csv it looks like it wants an existing DOI, but I've never used this before.

Philip Durbin 🚀 (Sep 04 2024 at 18:38):

@Hansen Zhang I guess you can try it and see. :grinning:

Jan Range (Sep 11 2024 at 12:31):

@Hansen Zhang I haven’t used it myself, but based on the examples, it appears to be a flattened version of the dataset or datafile metadata. The DOI is optional; if provided, it should modify the existing dataset’s metadata. As shown in the documentation, the list of dictionaries (obtained from the CSV using read_csv_as_dicts) is used to generate new datasets:

image.png (screenshot of the documentation example)

Hence, once transformed into the dict state, you can use each row/dataset similar to how you would create/edit a normal dataset (see here).
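The flow Jan describes can be sketched in plain Python. This is a minimal, hypothetical example: the column names in the CSV are illustrative (not the exact template headers), the CSV-to-dicts step is reproduced with the stdlib `csv` module in the spirit of pyDataverse's `read_csv_as_dicts`, and the actual upload calls are shown only as comments, since they need a live Dataverse installation and an API token.

```python
import csv
import io

# Hypothetical flattened metadata CSV; column names are illustrative,
# not the exact headers from pyDataverse's datasets.csv template.
CSV_TEXT = """\
title,author,description,doi
My First Dataset,Jane Doe,Example description,
My Second Dataset,John Doe,Another description,doi:10.5072/FK2/ABCDEF
"""

def read_csv_as_dicts(text):
    """Parse a flattened metadata CSV into a list of dicts, one per
    dataset -- similar in spirit to pyDataverse.utils.read_csv_as_dicts."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_csv_as_dicts(CSV_TEXT)

for row in rows:
    if row["doi"]:
        # Existing DOI: you would edit that dataset's metadata, e.g.
        # something like api.edit_dataset_metadata(...) via NativeApi.
        print(f"update existing dataset {row['doi']}")
    else:
        # No DOI: you would create a new dataset in a collection, e.g.
        # something like api.create_dataset(collection, ds.json()).
        print(f"create new dataset: {row['title']}")
```

Each row then behaves like any other dict-backed dataset: populate a `Dataset` model from it and call the create/edit endpoint depending on whether the DOI column is filled.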

Jan Range (Sep 11 2024 at 12:42):

Personally, I'm not sure the CSV approach is the best choice, given that we're working with nested structures. @Philip Durbin 🐉, perhaps using a JSONL format would be more practical for parsing and validation. I'll add this to the agenda for the next pyDataverse WG meeting.
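For context on why JSONL could be a better fit: each line holds one complete, arbitrarily nested JSON document, so nested Dataverse metadata (compound fields, multiple authors) round-trips without flattening, and each line can be parsed and validated independently. A minimal stdlib sketch with made-up field names:

```python
import json
import io

# Two hypothetical dataset records with nested metadata that a flat CSV
# cannot represent cleanly (e.g. multiple authors with affiliations).
datasets = [
    {
        "title": "My First Dataset",
        "authors": [
            {"name": "Jane Doe", "affiliation": "Example University"},
            {"name": "John Doe", "affiliation": "Example Lab"},
        ],
    },
    {
        "title": "My Second Dataset",
        "doi": "doi:10.5072/FK2/ABCDEF",
        "authors": [{"name": "Ada Lovelace", "affiliation": "Example Institute"}],
    },
]

# Write one JSON document per line (JSONL)...
buf = io.StringIO()
for ds in datasets:
    buf.write(json.dumps(ds) + "\n")

# ...and read it back; every line is a self-contained record.
parsed = [json.loads(line) for line in buf.getvalue().splitlines()]
```

A failed or malformed record only invalidates its own line, which also makes per-record validation and error reporting straightforward in a batch import.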


Last updated: Nov 01 2025 at 14:11 UTC