Stream: python

Topic: a place for Python scripts


view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 19:43):

Over at #python > creating datasets from an Excel file we've been talking about a variety of things including a centralized place to put Python scripts from the community. What do you think?

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 19:44):

The consensus so far seems to be a repo under https://github.com/gdcc so maybe the next step is to pick a name (one of the hardest things in computer science :sweat_smile: ).

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 19:45):

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 19:45):

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 19:45):

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 19:46):

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 19:47):

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 19:47):

view this post on Zulip Jan Range (Jan 16 2025 at 20:52):

+1 for "dataverse-recipes"

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 20:55):

Done! https://github.com/gdcc/dataverse-recipes

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 20:55):

Any thoughts on how to organize it? Should we create a "python" directory?

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 20:55):

Or @Jan Range did you want to create docs like you did here?

Docs - https://jr-1991.github.io/PythonProgrammingBio24/
Repo - https://github.com/JR-1991/PythonProgrammingBio24

view this post on Zulip Jan Range (Jan 16 2025 at 20:56):

Yes, I believe it makes sense to categorize languages roughly. Should we also create subdirectories for libraries?

view this post on Zulip Jan Range (Jan 16 2025 at 20:57):

@Philip Durbin ☃️ once we have some content, I would reuse the docs action from my repo to create an MkDocs page.

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 20:57):

@Jan Range I just made you an admin. Please feel free to push whatever!

view this post on Zulip Jan Range (Jan 16 2025 at 20:57):

Awesome, thanks!

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 20:58):

Do you want me to go ahead and push the Excel script? The one from #python > creating datasets from an Excel file ? Into a "python" directory?

view this post on Zulip Jan Range (Jan 16 2025 at 20:59):

Sounds good! Shall we default to Jupyter notebooks? I think the extra docs and compatibility with mkdocs come in handy

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 20:59):

Oh. Mine is just a boring old script. Can I add it anyway? :sweat_smile:

view this post on Zulip Jan Range (Jan 16 2025 at 21:00):

Of course :smile:

view this post on Zulip Philip Durbin 🚀 (Jan 16 2025 at 21:02):

I made a PR: https://github.com/gdcc/dataverse-recipes/pull/1

view this post on Zulip Philip Durbin 🚀 (Jan 21 2025 at 11:58):

@Jan Range what do you think? Should I merge it? Do you want to?

view this post on Zulip Jan Range (Jan 21 2025 at 14:32):

@Philip Durbin ☃️ I would like to convert it to a Jupyter notebook and add some more documentation to it, if you don't mind. This would be a great test of the documentation features, and I could implement the MkDocs workflow at the same time.

view this post on Zulip Jan Range (Jan 21 2025 at 14:36):

In terms of structure, I have a question:

Do we want to structure by library or concepts?

I am leaning more towards the latter, because I think structuring by library requires prior knowledge of each one. Concepts may be "quicker" to access, and the choice of library then comes down to what a user prefers in terms of code.

view this post on Zulip Philip Durbin 🚀 (Jan 21 2025 at 14:47):

Well, we'll structure by language first, right? Then I was thinking by concept. I guess you're considering a dataset to be a concept, which makes sense.

The reason I don't think we should structure by library is that each library (pyDataverse, etc.) should have its own docs, I would think.

view this post on Zulip Philip Durbin 🚀 (Jan 21 2025 at 14:47):

I have a dumb question about notebooks, though. Can I easily extract my script from a notebook? I'd rather not force people to use notebooks if they'd rather just run a script.

view this post on Zulip Jan Range (Jan 21 2025 at 14:55):

Yes, language as the primary structure makes sense.

"The reason I don't think we should structure by library is that each library (pyDataverse, etc.) should have its own docs, I would think."

Would you rather leave the scripts/notebooks as they are in the "concept" directory and leave it to the user to decide which flavour (pyDataverse, easyDataverse) is preferred?

"I have a dumb question about notebooks, though. Can I easily extract my script from a notebook? I'd rather not force people to use notebooks if they'd rather just run a script."

Yes, you can convert a Jupyter notebook to a script. We could even set up a CI job that does this. I was thinking that, depending on the size of the task, a single cell should be sufficient. Hence, one could simply copy the cell and proceed. This way we could put multiple operations into one notebook.

For instance, a single notebook could describe how to handle a dataset from creation through editing to file uploads. You just pick what suits you and reuse the code snippet.
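
For reference, the notebook-to-script conversion mentioned above is usually done with `jupyter nbconvert --to script <notebook>.ipynb`. The sketch below shows the same idea using only the standard library, relying on the fact that an `.ipynb` file is JSON with a list of cells. The function name `notebook_to_script` and the file paths are hypothetical, chosen for illustration; this is not the workflow used in the dataverse-recipes repo.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path, script_path):
    """Write the code cells of a Jupyter notebook to a plain Python script.

    Hypothetical helper for illustration; returns the script text it wrote.
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))

    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip() + "\n")

    # Separate cells with two blank lines so the script stays readable.
    script = "\n\n".join(chunks)
    Path(script_path).write_text(script, encoding="utf-8")
    return script
```

A call such as `notebook_to_script("create_dataset.ipynb", "create_dataset.py")` (file names assumed) would give users who prefer scripts a runnable file. Note that this sketch copies IPython magics such as `%pip` verbatim, so cells that use them would need extra handling.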

view this post on Zulip Philip Durbin 🚀 (Jan 21 2025 at 14:58):

Interesting. Sure, that copy/paste workflow sounds nice.

view this post on Zulip Philip Durbin 🚀 (Feb 04 2025 at 14:35):

I'm tempted to bring up the new https://github.com/gdcc/dataverse-recipes repo during the community call. :thinking: If not today, hopefully next month. People might want to help.

view this post on Zulip Jan Range (Feb 04 2025 at 15:14):

Sounds great! Apologies, I promised to review the PR but have not done so yet. I will do it this week!

view this post on Zulip Philip Durbin 🚀 (Feb 04 2025 at 16:22):

I didn't mention the recipes during the call. It was pretty busy with @Slava Tykhonov talking about AI. :big_smile:

view this post on Zulip Philip Durbin 🚀 (Feb 28 2025 at 15:44):

First PR merged! https://github.com/gdcc/dataverse-recipes/pull/4 :tada:

view this post on Zulip Jan Range (Feb 28 2025 at 15:44):

Time for a merge party :partying_face:

view this post on Zulip Philip Durbin 🚀 (Feb 28 2025 at 15:45):

Ha. Definitely. :tada:

Should I mention the repo on the March 4th community call?

view this post on Zulip Jan Range (Feb 28 2025 at 15:45):

Sounds great!

view this post on Zulip Jan Range (Feb 28 2025 at 15:45):

Should we set up a small readme and contribution guide beforehand?

view this post on Zulip Philip Durbin 🚀 (Feb 28 2025 at 15:46):

Sure! PRs welcome! Or just push if you want. :smiling_imp:

view this post on Zulip Jan Range (Feb 28 2025 at 15:47):

Will sketch something by the 4th!

view this post on Zulip Philip Durbin 🚀 (Feb 28 2025 at 15:47):

Great. Added to the agenda.

view this post on Zulip Philip Durbin 🚀 (Mar 03 2025 at 19:26):

I started a new topic here: #community > recipes


Last updated: Nov 01 2025 at 14:11 UTC