Given an Excel file with columns like "dataset title", "description", etc., what's the recommended way to create datasets using Python? pyDataverse? EasyDataverse? Something else?
So great you asked this today! I was just talking with the manager of a collection about this, but for editing the metadata of multiple datasets. I pointed them to the pyDataverse documentation and its advanced usage page, which mentions using their CSV templates. Seems like you can create and edit dataset metadata that way. I've never got it to work, but always figured I didn't know enough about Python :grimacing:
In practice I'm using EasyDataverse. I'll push my script to GitHub soon.
Wouldn't it be interesting to have a (community) repository with examples of concrete cases using Python or EasyDataverse? For people who aren't necessarily developers but can modify scripts. There could also be an entry somewhere in the documentation, in https://guides.dataverse.org/en/latest/admin/reporting-tools-and-queries.html#reporting-tools-and-common-queries for example.
Yes, absolutely. Great idea. I'm not sure where these scripts should go.
We could host a repository that includes Jupyter notebooks with examples. The upside is that these can be used pretty nicely within an mkdocs documentation. Did this for my latest Python course (Doc Page)
@Jan Range we forgot to talk about this today! Where to put scripts, I mean. Next time. :big_smile: (#python > meetings)
I'm still not quite sure where to put this script but I'll go ahead and attach it here so people can see it: create-datasets-from-excel.py
This is a slightly redacted version but quite similar to the one we plan to use.
Note that we plan to fix the license issue mentioned in the script. The discussion is happening here: #python > setting a license with EasyDataverse
We could open a general repository. Maybe something along the lines of "Dataverse Recipes" - Could then be used for Python and other languages.
Back in the day, Perl had a cookbook: https://en.wikipedia.org/wiki/Perl_Cookbook :cook:
It looks like https://github.com/ubc-library-rc/dataverse_utils is still being maintained. We link to it from https://guides.dataverse.org/en/6.5/api/client-libraries.html#python
All commits by Paul Lesack, it looks like. I don't think he's on Zulip but here's his GitHub: https://github.com/plesubc
From the perspective of the person trying to get something done with Python, it probably doesn't matter if there's a mix of examples that use pyDataverse, EasyDataverse, requests, the standard library, etc.
We had an intern suggest that we add even more Python snippets to the API Guide in https://github.com/IQSS/dataverse/issues/4255 but we resisted this. We have been trying to keep the API Guide agnostic when it comes to language. So we document everything with curl.
@Don Sizemore you also have https://github.com/uncch-rdmc/dataverse-toolbox
Whether it's that "dataverse_utils" repo or that "dataverse-toolbox" repo, I think it's the right approach, to have a separate repo, rather than trying to maintain these scripts as part of the API Guide.
I agree, a separate repo is a cleaner solution and easier in terms of maintenance. What do you think of the idea of having documentation on top of it, like I did for the Python course?
Docs - https://jr-1991.github.io/PythonProgrammingBio24/
Repo - https://github.com/JR-1991/PythonProgrammingBio24
Jupyter notebooks integrate quite well into MkDocs and could essentially function as doc and recipe at the same time. Plus, we could add a CI that tests these using the DV Action by running the notebooks. So, when something falls out of sync we know it.
Sure! Docs on top sounds great!
And no objection to CI, of course! Fancy! :unicorn:
I created a dedicated topic for this: #python > a place for Python scripts
Now we can keep talking about Excel if we like. :big_smile:
Sorry for the hijacking
Ha, no problem. :big_smile:
We have a place now! See #python > a place for Python scripts
@Jan Range thanks for https://github.com/gdcc/dataverse-recipes/pull/2 ! Merged! Can you (and others, if you like) please look again at https://github.com/gdcc/dataverse-recipes/pull/1 ? Thanks!
Heads up that I closed PR#1 in favor of this one: https://github.com/gdcc/dataverse-recipes/pull/4
I'm having a little trouble with Excel actually. When I use the original file Sonia gave me, I'm able to extract the hyperlinks.
However, if I edit the file in Excel to edit the hyperlinks (to make them anonymous), the script can no longer extract them.
@Jan Range or others, I'm happy to send you the original file if you want to play with it.
Happy to check this :smile:
Maybe I should add a comment saying that the hyperlink extraction doesn't work with the sample file. What do you think?
Is the data_example.xlsx causing the issue?
Tested it outside the script and the extraction is working. Looking through your script now.
Oh! Interesting.
Please try it with my script. It doesn't work for me. If you can get it working, great!
Or we could even switch to your way.
Yes, will do. Wanted to sort out the openpyxl stuff beforehand
Got it!
pandas is probably more standard anyway :shrug:
The issue lies in the following:
if values[3].hyperlink:
    access_link_url = values[3].hyperlink.display  # <-- Extracts the shown text
It should work by using
if values[3].hyperlink:
    access_link_url = values[3].hyperlink.target  # <-- Extracts the URL
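For anyone who wants to try the target-based extraction outside the script, here is a minimal sketch. It assumes openpyxl is installed and builds a tiny workbook in memory instead of reading the real Excel file. The helper name extract_url, the column headers, and the example.org URL are made up for illustration and are not taken from the actual script.

```python
from openpyxl import Workbook


def extract_url(cell):
    """Return the URL behind a cell's hyperlink, or None if the cell has none.

    Hypothetical helper for illustration; not part of the script in the PR.
    """
    link = cell.hyperlink
    if link is None:
        return None
    # .target holds the URL; .display holds the text shown for the link.
    return link.target


# Build a one-row worksheet in memory (made-up headers and URL).
wb = Workbook()
ws = wb.active
ws.append(["Title", "Description", "Author", "Access link"])

# iter_rows yields tuples of cell objects, so values[3] is the fourth cell.
values = next(ws.iter_rows(min_row=1, max_row=1))
values[3].hyperlink = "https://example.org/dataset/1"

print(extract_url(values[3]))  # the URL behind the fourth cell
print(extract_url(values[0]))  # no hyperlink on the first cell
```

The explicit None check is a design choice here, so the helper does not depend on how the hyperlink object behaves in a boolean context.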
That's a good primer for my fresh local DV 6.5 now :stuck_out_tongue:
Giving it a test now
Go, go, go!
Okay, it works now, but my guess is that it is just some oddity of openpyxl
you got it working? with that change above?
display vs target?
The boolean check if values[3].hyperlink: is what caused the condition to never be true, so the result was always None. Being explicit by using values[3].hyperlink is not None works, though.
But in my Jupyter notebook using the exact same way does not cause these issues :shaking_face:
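As general Python background on the truthiness check versus the explicit None check, here is a small standard-library sketch. The FakeLink class is a hypothetical stand-in and is not openpyxl's hyperlink class. Whether anything like this explains the behavior seen in this thread is an open assumption, not something established here.

```python
class FakeLink:
    """Hypothetical stand-in object; NOT openpyxl's hyperlink class."""

    def __init__(self, target, parts=()):
        self.target = target
        self.parts = list(parts)

    def __len__(self):
        # Python falls back to __len__ for truthiness when __bool__ is absent,
        # so an object with length 0 is falsy even though it is not None.
        return len(self.parts)


link = FakeLink("https://example.org/dataset/1")

truthy = bool(link)          # what "if link:" evaluates
present = link is not None   # what "if link is not None:" evaluates

print(truthy, present)
```

The two checks answer different questions: one asks whether the object counts as true, the other asks whether an object exists at all.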
But why would it work with the original Excel file and not the one I created from scratch?
I don't know. Strangely, I just fumbled around and reverted the explicit check and now it works too. I am confused :-D
Feels like rolling dice :grinning:
But yeah, using target extracts the link as expected
Yeah, seems like a good fix. I tested with the original file and my sample. Pushed! Thanks! https://github.com/gdcc/dataverse-recipes/pull/4/commits/9363ce7d705e490abb5cae0d7a9f8d1b0e977fdf
Anything else to fix before we merge it?
Runs fine on my machine, nothing to fix from my side :smile:
Merged!
Last updated: Nov 01 2025 at 14:11 UTC