Over at #python > creating datasets from an Excel file we've been talking about a variety of things including a centralized place to put Python scripts from the community. What do you think?
The consensus so far seems to be a repo under https://github.com/gdcc so maybe the next step is to pick a name (one of the hardest things in computer science :sweat_smile: ).
+1 for "dataverse-recipes"
Done! https://github.com/gdcc/dataverse-recipes
Any thoughts on how to organize it? Should we create a "python" directory?
Or @Jan Range did you want to create docs like you did here?
Docs - https://jr-1991.github.io/PythonProgrammingBio24/
Repo - https://github.com/JR-1991/PythonProgrammingBio24
Yes, I believe it makes sense to categorize languages roughly. Should we also create subdirectories for libraries?
@Philip Durbin once we have some content, I would re-use the docs action of my repo to create an mkdocs page.
@Jan Range I just made you an admin. Please feel free to push whatever!
Awesome, thanks!
Do you want me to go ahead and push the Excel script? The one from #python > creating datasets from an Excel file? Into a "python" directory?
Sounds good! Shall we default to Jupyter notebooks? I think the extra docs and compatibility with mkdocs come in handy
Oh. Mine is just a boring old script. Can I add it anyway? :sweat_smile:
Of course :smile:
I made a PR: https://github.com/gdcc/dataverse-recipes/pull/1
@Jan Range what do you think? Should I merge it? Do you want to?
@Philip Durbin I would like to transfer it to a Jupyter notebook and add some more documentation to it if you don't mind. This would be a great test to look into the documentation functionalities and I could directly implement the MkDocs workflow.
In terms of structure, I have a question:
Do we want to structure by library or concepts?
- dataset containing notebooks and scripts for handling datasets for each library
- pydataverse, easyDataverse and dvcli
I am leaning more towards the former, because I think structuring by library requires prior knowledge of each. Concepts may be "quicker" to access, and the decision ultimately comes down to what a user prefers in terms of code.
Well, we'll structure by language first, right? Then I was thinking by concept. I guess you're considering a dataset to be a concept, which makes sense.
The reason I don't think we should structure by library is that each library (pyDataverse, etc.) should have its own docs, I would think.
I have a dumb question about notebooks, though. Can I easily extract my script from a notebook? I'd rather not force people to use notebooks if they'd rather just run a script.
Yes, language as the primary structure makes sense.
Would you rather leave the scripts/notebooks as they are in the "concept" directory and leave it to the user to decide which flavour (pyDataverse, easyDataverse) is preferred?
Yes, you can convert a Jupyter notebook to a script. We could even employ a CI that does this. I was thinking that depending on the size of the task, a single cell should be sufficient. Hence, one could simply copy the cell and proceed. This way we could put multiple operations into one notebook.
For instance, a single notebook describes how you can handle a dataset from creation, editing and file uploads. You just pick what suits you and re-use the code snippet.
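As a rough illustration of the notebook-to-script conversion mentioned above, here is a minimal sketch that relies only on the fact that an .ipynb file is JSON with a list of cells. The function name `notebook_to_script` and the file names are hypothetical and not part of the dataverse-recipes repo; a CI job could run something along these lines, though this is only one possible approach.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path, script_path):
    """Copy the code cells of a notebook into a plain Python script.

    Hypothetical helper for illustration; returns the number of cells written.
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        # Skip markdown and raw cells; only code is wanted in the script.
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip() + "\n")
    # Separate cells with a blank line so the script stays readable.
    Path(script_path).write_text("\n".join(chunks), encoding="utf-8")
    return len(chunks)


if __name__ == "__main__":
    # Hypothetical file names, used only as an example.
    if Path("dataset.ipynb").exists():
        count = notebook_to_script("dataset.ipynb", "dataset.py")
        print(f"Wrote {count} code cells to dataset.py")
```

In practice the same result is available from the command line with `jupyter nbconvert --to script`, which is probably the simpler choice when Jupyter is already installed.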
Interesting. Sure, that copy/paste workflow sounds nice.
I'm tempted to bring up the new https://github.com/gdcc/dataverse-recipes repo during the community call. :thinking: If not today, hopefully next month. People might want to help.
Sounds great! Apologies. I promised to review the PR but have not done so until now. I will do it this week!
I didn't mention the recipes during the call. It was pretty busy with @Slava Tykhonov talking about AI. :big_smile:
First PR merged! https://github.com/gdcc/dataverse-recipes/pull/4 :tada:
Time for a merge party :partying_face:
Ha. Definitely. :tada:
Should I mention the repo on the March 4th community call?
Sounds great!
Should we set up a small readme and contribution guide beforehand?
Sure! PRs welcome! Or just push if you want. :smiling_imp:
Will sketch something by the 4th!
Great. Added to the agenda.
I started a new topic here: #community > recipes
Last updated: Nov 01 2025 at 14:11 UTC