Stream: python

Topic: tabular data addition


Philip Durbin 🚀 (Mar 15 2024 at 13:36):

I have a question about https://github.com/gdcc/easyDataverse/pull/14

Ultimately, a dataset receives a .tab file, right?

But more is going on? At first I thought the code was going to call https://guides.dataverse.org/en/6.1/api/native-api.html#editing-variable-level-metadata but apparently not.

I'm sure it's useful, I just don't quite get it. :sweat_smile:

Jan Range (Mar 15 2024 at 16:48):

When uploading tab files to Dataverse, the code writes the file to a temporary directory and uploads it using dvuploader. Also, upon adding a DataFrame to a Dataset object, stats and column metadata (name, dtype) are extracted to the TabData object for users to retrieve. Variable metadata editing via DDI has not been implemented yet.
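The "write to a temporary directory, then upload" step Jan describes can be sketched with the standard library alone. This is not the easyDataverse code: `write_tab_file` is a hypothetical helper, and with pandas the writing would more likely be a call such as `DataFrame.to_csv(path, sep="\t", index=False)`. The dvuploader call is deliberately left out rather than guessed at.

```python
import csv
import os
import tempfile


def write_tab_file(columns, rows, directory, filename="data.tab"):
    """Hypothetical helper: write a header plus rows as tab-separated text."""
    path = os.path.join(directory, filename)
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)  # header row with the column names
        writer.writerows(rows)  # one line per data row
    return path


# The temporary directory (and the file in it) is deleted when the block exits,
# so the upload has to happen inside it.
with tempfile.TemporaryDirectory() as tmp:
    path = write_tab_file(["id", "name"], [[1, "a"], [2, "b"]], tmp)
    with open(path, encoding="utf-8") as fh:
        content = fh.read()
    # ... upload `path` here (e.g. with dvuploader; not shown) ...
```

Using a temporary directory keeps the client from leaving stray files behind, at the cost of having to finish the upload before the context manager closes.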

Did that help, or did I misunderstand something? Would it be better to store the variable metadata as DDI within the object already, or to provide it as an exporter?

Philip Durbin 🚀 (Mar 15 2024 at 16:52):

This is the part I'm struggling with:

"Also, upon adding a DataFrame to a Dataset object, stats and column metadata (name, dtype) are extracted to the TabData object for users to retrieve."

Is this persisted in Dataverse? Or is it all happening client-side?

Jan Range (Mar 15 2024 at 16:53):

All is happening client-side at the moment.

Philip Durbin 🚀 (Mar 15 2024 at 16:54):

Ok. I'm just thinking if you want to persist the types for variables (date, string, integer, whatever) in Dataverse, you can use that API above.
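As we read the linked 6.1 guide, variable-level metadata is edited by sending a DDI XML file with an HTTP `PUT` to `/api/edit/{id}` (where `id` is the file's id), authenticating with the `X-Dataverse-key` header; check the guide before relying on the exact path. The sketch below only *prepares* such a request with `urllib`, so nothing is sent. The server URL, file id `42`, token and XML body are placeholders, and `build_variable_metadata_request` is a hypothetical name.

```python
import urllib.request


def build_variable_metadata_request(server_url, file_id, api_token, ddi_xml):
    """Prepare (but do not send) a PUT of DDI XML to /api/edit/{file_id}."""
    url = f"{server_url.rstrip('/')}/api/edit/{file_id}"
    return urllib.request.Request(
        url,
        data=ddi_xml.encode("utf-8"),  # request body: the DDI document
        method="PUT",
        headers={"X-Dataverse-key": api_token},  # Dataverse API token header
    )


req = build_variable_metadata_request(
    "https://demo.dataverse.org/", 42, "xxxxxxxx-placeholder", "<codeBook/>"
)
# Actually sending it would be: urllib.request.urlopen(req)
```

Note that `urllib` normalizes header names, so the token is stored under `X-dataverse-key`; HTTP header names are case-insensitive, so the server sees the same header.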

Jan Range (Mar 15 2024 at 16:55):

But since there is an endpoint for adding the metadata, I am happy to add that. I haven't been using the tabular features as much and just learned about the edit endpoint.

Philip Durbin 🚀 (Mar 15 2024 at 16:56):

I think Dataverse will try to guess types for variables. Definitely for rich formats like Stata. Not sure about CSV.

Philip Durbin 🚀 (Mar 15 2024 at 16:57):

So when you say "for users to retrieve" you mean users of EasyDataverse retrieving locally from objects in EasyDataverse, right? Not retrieval over the wire from a Dataverse server.

Jan Range (Mar 15 2024 at 16:58):

Yes, it uses pandas to retrieve types and statistics.
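In pandas terms, "types and statistics" roughly correspond to what `DataFrame.dtypes` and `DataFrame.describe()` return. To show the idea of purely client-side extraction without a pandas dependency, here is a standard-library sketch. The int, then float, then str fallback is a hypothetical simplification, not pandas' inference and not the exact contents of easyDataverse's TabData object.

```python
import statistics


def infer_type(values):
    """Hypothetical rule: 'int' if every value parses as int, else 'float', else 'str'."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue  # some value failed; try the next, looser type
    return "str"


def column_summary(columns, rows):
    """Collect name, inferred type and, for numeric columns, basic statistics."""
    summary = []
    for i, name in enumerate(columns):
        values = [row[i] for row in rows]
        dtype = infer_type(values)
        entry = {"name": name, "dtype": dtype, "count": len(values)}
        if dtype in ("int", "float"):
            numbers = [float(v) for v in values]
            entry.update(
                mean=statistics.mean(numbers), min=min(numbers), max=max(numbers)
            )
        summary.append(entry)
    return summary


cols = ["id", "height", "label"]
rows = [["1", "1.5", "a"], ["2", "2.5", "b"], ["3", "3.5", "c"]]
summary = column_summary(cols, rows)
```

Everything here stays in local objects; nothing reaches the Dataverse server unless it is pushed explicitly.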

Philip Durbin 🚀 (Mar 15 2024 at 16:58):

Cool, cool. Sounds very useful.

Philip Durbin 🚀 (Mar 15 2024 at 16:59):

I'm just thinking if we can feed information about types or whatever (descriptions of columns) back into Dataverse, others can benefit.
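Feeding types and column descriptions back into Dataverse means expressing them as DDI. The sketch below builds a minimal `codeBook/dataDscr` with one `var` element per column, using the DDI Codebook attributes `intrvl` ("discrete"/"contin"), `labl level="variable"` for the description, and `varFormat type` ("numeric"/"character"). Several pieces are assumptions to verify against the guide's example file: the namespace URI, the exact subset of elements the edit endpoint accepts, and the variable `ID` values (which must match the variables of the already-ingested file). The type mapping and the `v1` id are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from simple client-side types to DDI attributes.
DDI_TYPES = {
    "int": ("discrete", "numeric"),
    "float": ("contin", "numeric"),
    "str": ("discrete", "character"),
}


def build_ddi(variables):
    """Build a minimal DDI codeBook with one <var> per column.

    variables: list of dicts with 'id', 'name', 'dtype' and an optional 'label'.
    """
    # Namespace URI is an assumption; compare with the guide's example file.
    codebook = ET.Element("codeBook", xmlns="http://www.icpsr.umich.edu/DDI")
    data_dscr = ET.SubElement(codebook, "dataDscr")
    for var in variables:
        intrvl, fmt = DDI_TYPES[var["dtype"]]
        el = ET.SubElement(
            data_dscr, "var", ID=var["id"], name=var["name"], intrvl=intrvl
        )
        if var.get("label"):  # column description, if the author supplied one
            ET.SubElement(el, "labl", level="variable").text = var["label"]
        ET.SubElement(el, "varFormat", type=fmt)
    return ET.tostring(codebook, encoding="unicode")


ddi_xml = build_ddi(
    [{"id": "v1", "name": "height", "dtype": "float", "label": "Height in metres"}]
)
```

The resulting string is what would go into the body of the variable-metadata edit request, which only a user with edit permission on the dataset can send.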

Philip Durbin 🚀 (Mar 15 2024 at 16:59):

Of course, only the author can push this info back into Dataverse.

Jan Range (Mar 15 2024 at 16:59):

I'll look into pushing the pandas data to Dataverse directly. If metadata is already present on Dataverse, it will override the pandas stuff upon fetching/uploading.
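The precedence rule Jan mentions (metadata already on Dataverse wins over pandas-derived values) can be sketched on plain per-column dicts. `merge_variable_metadata` is a hypothetical name, and the choice that empty remote values (`None` or `""`) do *not* override local ones is an assumption, made so that a missing server-side label doesn't erase a locally inferred one.

```python
def merge_variable_metadata(local, remote):
    """Combine per-column metadata; non-empty remote values win over local ones.

    local, remote: dicts mapping column name -> dict of metadata fields.
    """
    merged = {}
    for name in local.keys() | remote.keys():
        fields = dict(local.get(name, {}))  # start from client-side values
        fields.update(
            {k: v for k, v in remote.get(name, {}).items() if v not in (None, "")}
        )  # then let non-empty server-side values override them
        merged[name] = fields
    return merged


local = {"height": {"dtype": "float", "label": None}, "id": {"dtype": "int"}}
remote = {"height": {"label": "Height in metres", "dtype": None}}
merged = merge_variable_metadata(local, remote)
```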

Philip Durbin 🚀 (Mar 15 2024 at 17:00):

Sounds good. And you should definitely play with the Data Curation Tool and the new Croissant Editor. Please see #dev > Croissant Editor

Jan Range (Mar 15 2024 at 17:00):

I have checked out mlcroissant too. It looks quite simple, but I have a couple of questions that I hope will be resolved on Wednesday.


Last updated: Nov 01 2025 at 14:11 UTC