tabular data addition · python · Zulip Chat Archive

Stream: python

Topic: tabular data addition

Philip Durbin 🚀 (Mar 15 2024 at 13:36):

I have a question about https://github.com/gdcc/easyDataverse/pull/14

Ultimately, a dataset receives a .tab file, right?

But is more going on? At first I thought the code was going to call https://guides.dataverse.org/en/6.1/api/native-api.html#editing-variable-level-metadata but apparently not.

I'm sure it's useful, I just don't quite get it. :sweat_smile:

Jan Range (Mar 15 2024 at 16:48):

When uploading tab files to Dataverse, the code writes the file to a temporary directory and uploads it using dvuploader. Also, upon adding a DataFrame to a Dataset object, stats and column metadata (name, dtype) are extracted to the TabData object for users to retrieve. Variable metadata editing via DDI has not been implemented yet.

Did that help, or did I misunderstand something? Would it be better to store the variable metadata as DDI directly within the object? Or would it be better as an exporter?
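The client-side upload path Jan describes (write the DataFrame to a temporary directory as a tab file, then hand it to dvuploader) can be sketched roughly like this. It is a minimal sketch assuming a plain tab-separated export via pandas `DataFrame.to_csv`; the helper name `write_tab_file` is hypothetical, and the dvuploader call itself is deliberately left out rather than guessed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_tab_file(df: pd.DataFrame, directory: Path, name: str = "data.tab") -> Path:
    """Hypothetical helper: write a DataFrame as a tab-separated file and return its path."""
    path = directory / name
    # Tab-separated, without the index, so only the DataFrame's own columns end up in the file
    df.to_csv(path, sep="\t", index=False)
    return path


df = pd.DataFrame({"sample": ["a", "b", "c"], "value": [1.5, 2.0, 3.25]})

with tempfile.TemporaryDirectory() as tmp:
    tab_path = write_tab_file(df, Path(tmp))
    # Upload step not shown: per the discussion, easyDataverse passes the file to dvuploader.
    print(tab_path.read_text())
```

Using a temporary directory keeps the intermediate file from lingering after the upload; the directory and its contents are removed when the `with` block exits.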

Philip Durbin 🚀 (Mar 15 2024 at 16:52):

This is the part I'm struggling with:

"Also, upon adding a DataFrame to a Dataset object, stats and column metadata (name, dtype) are extracted to the TabData object for users to retrieve."

Is this persisted in Dataverse? Or is it all happening client-side?

Jan Range (Mar 15 2024 at 16:53):

All is happening client-side at the moment.

Philip Durbin 🚀 (Mar 15 2024 at 16:54):

Ok. I'm just thinking that if you want to persist the types of variables (date, string, integer, whatever) in Dataverse, you can use the API above.
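A rough sketch of what pushing pandas-derived variable metadata through that API might look like, assuming the endpoint in the linked guide is `PUT {server}/api/edit/{fileId}` with an `X-Dataverse-key` header and a DDI XML body (check the guide before relying on this). The function names, the placeholder `v1`/`v2` variable IDs, the `Content-Type` header, the file id `42`, and the token are all hypothetical; element names follow DDI Codebook 2.5 (`codeBook`, `dataDscr`, `var`, `labl`, `varFormat`), and the request is built but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

import pandas as pd


def build_ddi_fragment(df: pd.DataFrame, descriptions: dict[str, str]) -> bytes:
    """Hypothetical helper: minimal DDI <dataDscr> with one <var> per column.

    ASSUMPTION: the v1, v2, ... IDs are placeholders; the server expects IDs
    matching the variables it already created for the ingested file.
    """
    code_book = ET.Element("codeBook", {"xmlns": "ddi:codebook:2_5"})
    data_dscr = ET.SubElement(code_book, "dataDscr")
    for i, column in enumerate(df.columns, start=1):
        var = ET.SubElement(data_dscr, "var", {"ID": f"v{i}", "name": str(column)})
        if column in descriptions:
            # Column description as a variable-level label
            labl = ET.SubElement(var, "labl", {"level": "variable"})
            labl.text = descriptions[column]
        # Map pandas dtypes onto DDI's two format types
        kind = "numeric" if pd.api.types.is_numeric_dtype(df[column]) else "character"
        ET.SubElement(var, "varFormat", {"type": kind})
    return ET.tostring(code_book, encoding="utf-8", xml_declaration=True)


def build_edit_request(server_url: str, file_id: int, api_token: str, ddi: bytes) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) the variable-metadata PUT request."""
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/api/edit/{file_id}",
        data=ddi,
        method="PUT",
        # ASSUMPTION: the Content-Type value is a guess; the guide's curl example just uploads the file
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/xml"},
    )


df = pd.DataFrame({"sample": ["a", "b"], "value": [1.5, 2.0]})
ddi = build_ddi_fragment(df, {"value": "Measured value"})
req = build_edit_request("https://demo.dataverse.org", 42, "YOUR-API-TOKEN", ddi)
# urllib.request.urlopen(req) would send it; intentionally not called here.
print(req.get_method(), req.full_url)
```

Only the author (someone holding a token with edit rights) could actually send such a request.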

Jan Range (Mar 15 2024 at 16:55):

But since there is an endpoint for adding the metadata, I am happy to add that. I haven't been using the tabular features much and only just learned about the edit endpoint.

Philip Durbin 🚀 (Mar 15 2024 at 16:56):

I think Dataverse will try to guess types for variables. Definitely for rich formats like Stata. Not sure about CSV.

Philip Durbin 🚀 (Mar 15 2024 at 16:57):

So when you say "for users to retrieve" you mean users of EasyDataverse retrieving locally from objects in EasyDataverse, right? Not retrieval over the wire from a Dataverse server.

Jan Range (Mar 15 2024 at 16:58):

Yes, it uses pandas to retrieve types and statistics.
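A minimal sketch of that client-side extraction, assuming only plain pandas calls (`Series.dtype`, `Series.describe`, `Series.nunique`). The function name and the returned dictionary layout are hypothetical and not easyDataverse's actual `TabData` structure.

```python
import pandas as pd


def extract_column_metadata(df: pd.DataFrame) -> dict[str, dict]:
    """Hypothetical helper: name, dtype and summary statistics per column, computed locally."""
    metadata = {}
    for column in df.columns:
        series = df[column]
        entry = {"name": str(column), "dtype": str(series.dtype)}
        if pd.api.types.is_numeric_dtype(series):
            # For a numeric Series, describe() gives count, mean, std, min, quartiles and max
            entry["stats"] = series.describe().to_dict()
        else:
            # For non-numeric columns, keep simple counts instead
            entry["stats"] = {"count": int(series.count()), "unique": int(series.nunique())}
        metadata[str(column)] = entry
    return metadata


df = pd.DataFrame({"sample": ["a", "b", "c"], "value": [1.5, 2.0, 3.25]})
meta = extract_column_metadata(df)
print(meta["value"]["dtype"], meta["value"]["stats"]["mean"])
```

Nothing here touches a Dataverse server; the result lives only in the local object.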

Philip Durbin 🚀 (Mar 15 2024 at 16:58):

Cool, cool. Sounds very useful.

Philip Durbin 🚀 (Mar 15 2024 at 16:59):

I'm just thinking that if we can feed information about types or whatever (descriptions of columns) back into Dataverse, others can benefit.

Philip Durbin 🚀 (Mar 15 2024 at 16:59):

Of course, only the author can push this info back into Dataverse.

Jan Range (Mar 15 2024 at 16:59):

I'll look into pushing the pandas data to Dataverse directly. When metadata is already present on Dataverse, it will override the pandas-derived metadata upon fetching/uploading.
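The precedence rule Jan describes (metadata already on Dataverse wins over the pandas-derived values) could look roughly like this. It is a sketch assuming both sides are plain per-column dictionaries; the helper name and field names are hypothetical.

```python
def merge_variable_metadata(local: dict[str, dict], remote: dict[str, dict]) -> dict[str, dict]:
    """Hypothetical helper: combine per-column metadata, letting server values win."""
    merged = {}
    # Keep local column order first, then any columns known only to the server
    for column in dict.fromkeys([*local, *remote]):
        # Start from the pandas-derived fields, then overlay whatever Dataverse already has
        merged[column] = {**local.get(column, {}), **remote.get(column, {})}
    return merged


local = {"value": {"dtype": "float64", "description": None}}
remote = {"value": {"description": "Measured value"}}
print(merge_variable_metadata(local, remote))
```

Fields the server does not know about (here `dtype`) survive from the local side, so nothing computed by pandas is lost unless Dataverse has its own value for it.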

Philip Durbin 🚀 (Mar 15 2024 at 17:00):

Sounds good. And you should definitely play with the Data Curation Tool and the new Croissant Editor. Please see #dev > Croissant Editor

Jan Range (Mar 15 2024 at 17:00):

I have checked out mlcroissant too. It looks quite simple, but I have a couple of questions that I hope will be resolved on Wednesday.

Last updated: Jan 09 2026 at 14:18 UTC