I have a question about https://github.com/gdcc/easyDataverse/pull/14
Ultimately, a dataset receives a .tab file, right?
But is more going on? At first I thought the code was going to call https://guides.dataverse.org/en/6.1/api/native-api.html#editing-variable-level-metadata but apparently not.
I'm sure it's useful, I just don't quite get it. :sweat_smile:
When uploading tab files to Dataverse, the code writes the file to a temporary directory and uploads it using dvuploader. Also, upon adding a DataFrame to a Dataset object, stats and column metadata (name, dtype) are extracted to the TabData object for users to retrieve. Variable metadata editing via DDI has not been implemented yet.
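A minimal sketch of the "write to a temporary directory" step, assuming pandas and a plain tab-separated layout. This is not easyDataverse's actual code, and the dvuploader upload call is deliberately left out rather than guessing its signature. The function name is hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_tab_to_tempdir(df: pd.DataFrame, filename: str = "data.tab") -> Path:
    """Write a DataFrame as a tab-separated file in a fresh temporary directory.

    Hypothetical helper: the caller is expected to upload the returned path
    (e.g. with dvuploader) and to remove the directory afterwards.
    """
    tmpdir = Path(tempfile.mkdtemp())
    path = tmpdir / filename
    # Tab-separated, no index column, so the file only contains the DataFrame's columns.
    df.to_csv(path, sep="\t", index=False)
    return path


df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
path = write_tab_to_tempdir(df)
```

Using `mkdtemp()` instead of a `TemporaryDirectory` context manager keeps the file alive until the upload has finished; the trade-off is that cleanup becomes the caller's job.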
Did that help, or did I misunderstand something? Would it be better to store the variable metadata as DDI within the object already, or to generate it with an exporter?
This is the part I'm struggling with:
"Also, upon adding a DataFrame to a Dataset object, stats and column metadata (name, dtype) are extracted to the TabData object for users to retrieve."
Is this persisted in Dataverse? Or is it all happening client-side?
All is happening client-side at the moment.
Ok. I'm just thinking if you want to persist the types for variables (date, string, integer, whatever) in Dataverse, you can use that API above.
But since there is an endpoint for adding the metadata, I am happy to add that. I haven't been using the tabular features as much and just learned about the edit endpoint.
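A rough sketch of what calling that endpoint could look like, assuming it is `PUT {server}/api/edit/{fileId}` with an API token in the `X-Dataverse-key` header and a DDI XML body, as in the linked guide. The minimal `codeBook/dataDscr/var/labl` shape, the `application/xml` content type, and the variable IDs (which would have to come from the file's existing DDI on Dataverse) are assumptions to check against the guide. Both function names are hypothetical. The request is only prepared here, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"


def build_variable_ddi(labels: dict[str, str]) -> bytes:
    """Build a minimal DDI codebook with one <var> and label per variable.

    `labels` maps a Dataverse variable ID (assumed format, e.g. "v101") to a
    human-readable label. Which elements Dataverse accepts is an assumption.
    """
    # Serialize the DDI namespace as the default xmlns (process-wide setting).
    ET.register_namespace("", DDI_NS)
    codebook = ET.Element(f"{{{DDI_NS}}}codeBook")
    data_dscr = ET.SubElement(codebook, f"{{{DDI_NS}}}dataDscr")
    for var_id, label in labels.items():
        var = ET.SubElement(data_dscr, f"{{{DDI_NS}}}var", ID=var_id)
        labl = ET.SubElement(var, f"{{{DDI_NS}}}labl", level="variable")
        labl.text = label
    return ET.tostring(codebook, encoding="utf-8", xml_declaration=True)


def prepare_edit_request(
    server_url: str, file_id: int, api_token: str, ddi: bytes
) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request to the variable-metadata endpoint."""
    return urllib.request.Request(
        url=f"{server_url.rstrip('/')}/api/edit/{file_id}",
        data=ddi,
        method="PUT",
        # Content-Type is an assumption; the guide's curl example just uploads the file.
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/xml"},
    )


# Hypothetical variable ID, file ID and token, for illustration only.
ddi = build_variable_ddi({"v101": "Respondent age in years"})
req = prepare_edit_request("https://demo.dataverse.org/", 42, "YOUR-API-TOKEN", ddi)
# urllib.request.urlopen(req) would actually send it.
```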
I think Dataverse will try to guess types for variables. Definitely for rich formats like Stata. Not sure about CSV.
So when you say "for users to retrieve" you mean users of EasyDataverse retrieving locally from objects in EasyDataverse, right? Not retrieval over the wire from a Dataverse server.
Yes, it is using pandas to retrieve the types and statistics.
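Roughly, the client-side extraction could look like this, assuming `DataFrame.describe(include="all")` for the statistics and the column dtype as the type. The returned dict is a hypothetical stand-in, not the actual fields of easyDataverse's `TabData`.

```python
import pandas as pd


def extract_column_metadata(df: pd.DataFrame) -> dict[str, dict]:
    """Collect name, dtype and summary statistics per column, all client-side."""
    # include="all" covers numeric and non-numeric columns in one table.
    stats = df.describe(include="all")
    meta = {}
    for col in df.columns:
        # Drop statistics that don't apply to this column (they come back as NaN).
        col_stats = stats[col].dropna().to_dict()
        meta[col] = {"name": col, "dtype": str(df[col].dtype), "stats": col_stats}
    return meta


df = pd.DataFrame({"age": [30, 40, 50], "name": ["a", "b", "c"]})
meta = extract_column_metadata(df)
# meta["age"]["stats"] holds count, mean, std, min, quartiles and max;
# meta["name"]["stats"] holds count, unique, top and freq.
```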
Cool, cool. Sounds very useful.
I'm just thinking if we can feed information about types or whatever (descriptions of columns) back into Dataverse, others can benefit.
Of course, only the author can push this info back into Dataverse.
I'll look into pushing the pandas data to Dataverse directly. When metadata is already present on Dataverse, it will override the pandas-derived metadata upon fetching/uploading.
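The override rule described above could be as simple as a per-column dict merge where fields from Dataverse win. A sketch under that assumption; the function name and the shape of the metadata dicts are hypothetical.

```python
def merge_variable_metadata(
    local: dict[str, dict], remote: dict[str, dict]
) -> dict[str, dict]:
    """Merge per-column metadata; fields already on Dataverse win over pandas-derived ones."""
    # Keep local column order, then append columns only known remotely.
    columns = list(local) + [c for c in remote if c not in local]
    # Later keys win in a dict merge, so remote fields override local ones.
    return {col: {**local.get(col, {}), **remote.get(col, {})} for col in columns}


local = {"age": {"dtype": "int64"}, "name": {"dtype": "object"}}
remote = {"age": {"dtype": "integer", "label": "Respondent age in years"}}
merged = merge_variable_metadata(local, remote)
# merged["age"]  -> {"dtype": "integer", "label": "Respondent age in years"}
# merged["name"] -> {"dtype": "object"}
```

Merging field by field (rather than replacing a whole column's entry) means pandas-derived fields such as statistics survive when Dataverse only has, say, a label.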
Sounds good. And you should definitely play with the Data Curation Tool and the new Croissant Editor. Please see #dev > Croissant Editor
I have checked out mlcroissant too. It looks quite simple, but I have a couple of questions that I hope will be resolved on Wednesday.
Last updated: Nov 01 2025 at 14:11 UTC