I've been reviewing this pull request: JSON Schema creator and validator #10109
I didn't write the code but I'm happy to discuss and answer any questions.
Preliminary docs are here: https://dataverse-guide--10109.org.readthedocs.build/en/10109/api/native-api.html#retrieve-a-dataset-json-schema-for-a-collection
In short, you can ask a specific collection for a JSON Schema for creating a dataset within it.
The idea is that some collections require additional fields.
And those fields are reflected in the JSON Schema.
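For anyone who wants to try it from a script, here is a minimal hedged sketch in Python, assuming the datasetSchema endpoint described in the linked guide; the base URL and collection alias are placeholders:

import requests

BASE_URL = "http://localhost:8080"   # placeholder: your Dataverse installation
COLLECTION = "root"                  # placeholder: alias of the collection to ask

# Ask the collection for the JSON Schema describing a valid dataset-creation payload.
resp = requests.get(f"{BASE_URL}/api/dataverses/{COLLECTION}/datasetSchema")
resp.raise_for_status()
print(resp.json())  # the schema, including any extra fields this collection requires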
I accidentally marked this as resolved, and now it is back to its original state :rolling_on_the_floor_laughing:. I am doing QA and trying to understand the Issue/PR. I am sure I will ask around here for more information.
Sure! I started this thread initially because @Jan Range and I were talking about the JSON Schema stuff in the new pyDataverse revamp doc at #python > PyDataverse Re-Vamp but everyone is absolutely welcome!
That's awesome and solves a couple of issues for pyDataverse/EasyDataverse! Is there a way to test this functionality already?
ghcr.io/gdcc/dataverse:9464-schema-creator-validator I guess. :smile:
You want the code running on your laptop or a server?
Either way works fine for me :blush:
Do you have Java and Maven installed? If so, switch to the 9464-schema-creator-validator branch and run the quickstart: https://guides.dataverse.org/en/6.0/developers/dev-environment.html#quickstart
Smooth! Installation and endpoint working flawlessly :raised_hands:
@Jan Range fantastic! Is the JSON Schema more or less what you expect? Can you work with it?
It looks great so far! I will experiment with it and see how to plug it into EasyDataverse.
One thing I found missing is that controlled vocabularies are not included. Afaik the subject field is a controlled vocabulary; maybe it could be added as an enum (rough sketch below)?
Happy to comment on the PR if I am not missing something.
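For illustration only, a hedged sketch (not taken from the PR) of how the subject vocabulary could be expressed as an enum on the field's value schema; the term list is abbreviated:

subject_value_schema = {
    "type": "array",
    "items": {
        "enum": [
            "Agricultural Sciences",
            "Arts and Humanities",
            "Astronomy and Astrophysics",
            # ...remaining controlled vocabulary terms from the citation block
        ]
    },
}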
Yes, please comment on the PR. Thanks!
Thanks for that @Jan Range I will bring it up during standup today. :smile:
Hey, this PR also addresses this old issue: Query Dataverse for mandatory metadata fields via API #6978
Is it possible to also add information about datatypes (int, float) to the schema?
If I am correct, the new schemas are meant for the payload input to endpoints that add/update
metadata (see Example 1). These do not contain the type information that is shipped with the basic metadata block schemas.
Also afaik, the endpoints expect (an array of) strings for the value property, given it is a primitive. This is also part of the schema for a field (see Example 2), so I expect the types cannot be added in the typical way. Hence, the payload has no type enforcement per se and types are handled on Dataverse's side. Please correct me if I am wrong @Philip Durbin
Example 1
Example 2
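(The attached examples are not reproduced in this transcript. As a hedged stand-in, a primitive field in an add/update payload looks roughly like the snippet below; the value is a plain string even when the metadata is numeric, which is why a generic field schema cannot enforce int/float types.)

title_field = {
    "typeName": "title",
    "multiple": False,
    "typeClass": "primitive",
    "value": "Replication Data for My Study",  # primitives are serialized as strings
}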
Now that both schemas (basic and collection-specific) are at hand, one could condense this into an intermediate schema that complies with the collection requirements and the types expected by the metadata block. That's essentially what EasyDataverse is doing.
Here is an example of a JSON schema for Citation generated by EasyDataverse. @Johannes D would this be useful for you?
@Johannes D thanks for leaving a comment on the PR. That's perfect. I just replied there.
@Jan Range I think I'm confused by how you're saying old and new. To me there's only one schema but I'm sure I'm simply misunderstanding what you're saying. :sweat_smile:
Sorry, I should have rephrased that - end of the week and my brain goes :dizzy:
By "old" I mean the basic block schema (such as this), and with "new" the novel collection JSON schema.
Oh, that makes much more sense. Thanks.
Kind of. The old format allows specifying a fieldType, and I'd like to have that integrated into the new schema. One use case would be the SPA using something like this (https://rjsf-team.github.io/react-jsonschema-form/) to auto-generate a form based on the schema. Here the field type is needed to create nice forms for numbers or dates...
Thanks for the explanation @Johannes D :blush: I am unsure whether it is possible to include fieldType in the novel JSON schema. The schema only validates whether a typeName is given in the payload to check compliance with the collection, and it uses a generic field schema.
To verify, I checked the JSON schema for a collection that uses the astrophysics config and requires a float field. Unfortunately, the schema does not include any type checks. Only upon sending will you receive a validation error.
Collection schema
@Johannes D thanks! I just copied your comment about that React tool over at https://github.com/IQSS/dataverse-frontend/issues/231#issuecomment-1836116092 (we are actively building forms in React now)
However, given the collection and metadata block schemas, it is possible to create a new schema from both. @Philip Durbin I think receiving a schema such as this one would be awesome, since it checks types too and can easily be plugged into other tools such as the Form Creator.
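A hedged Python sketch of that idea, assuming the standard metadata blocks API (/api/metadatablocks/$name) and a local installation; the type map is partial and illustrative:

import requests

BASE_URL = "http://localhost:8080"  # placeholder

# Map Dataverse field types to JSON Schema types (illustrative, not exhaustive).
TYPE_MAP = {"INT": "integer", "FLOAT": "number", "TEXT": "string",
            "TEXTBOX": "string", "DATE": "string", "URL": "string"}

def field_types(block_name: str) -> dict:
    """Return {typeName: JSON Schema type} for one metadata block."""
    block = requests.get(f"{BASE_URL}/api/metadatablocks/{block_name}").json()["data"]
    return {name: TYPE_MAP.get(field["type"], "string")
            for name, field in block["fields"].items()}

# The per-field types could then be merged into the collection schema,
# e.g. by tightening each field's generic value subschema with a "type" keyword.
print(field_types("citation"))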
@Jan Range The lack of further distinction for primitive values (int, float, boolean, date) is the problem, and I hope we can fix that with the new schema representation. IMHO the new schema would be a perfect start for v2 of the API.
Philip Durbin said:
Johannes D thanks! I just copied your comment about that React tool over at https://github.com/IQSS/dataverse-frontend/issues/231#issuecomment-1836116092 (we are actively building forms in React now)
Thanks. Before one can use the tooling, we need to translate the rather complex Dataverse JSON representation into a more readable, intuitive JSON representation... basically into what Jan suggested. Otherwise the form represents the complex internal data structure, which is something the normal user should not see.
Actually, that's one reason why we have a Python facade between our React SPA and Dataverse.
right, v2 territory
well, maybe we could implement a facade in js-dataverse
I'd rather see that in the backend, so that other non-JS clients would also benefit from it. I foresee two tasks in the backend: creation of collection-specific simple schemas that include all the needed information, and transformation of JSON in the specific schema to the DB model and vice versa.
Oh, sure. I just meant that until we have a slick v2 API maybe js-dataverse could follow your lead and implement a similar facade.
@Jan Range python stuff ^^ :grinning:
@Johannes D are you actually using react-jsonschema-form or is it just a dream?
@Philip Durbin We wanted to use it, but our designers and users requested a complex stepper for the input forms. The effort to adapt the lib for that use case was greater than writing a form by hand, so we are not using it in this project. In a different project I used the library and was happy with it :)
Very interesting. Thanks.
How do folks feel about phase two of "JSON Schema for dataset"? Is #10543 what you expected? I just left a comment but maybe I'm confused: https://github.com/IQSS/dataverse/pull/10543#pullrequestreview-2119147899
@Oliver Bertuch you were part of the discussion early on and created this issue: bklog: Deliverable - As a system integrator, I would appreciate a JSON Schema for validating my dataset JSON before uploading via API - https://github.com/IQSS/dataverse-pm/issues/26
Any thoughts on my comment above?
@Jan Range this is the thread I just mentioned on the pyDataverse call.
@Jan Range @Oliver Bertuch have you had a chance to think about https://github.com/IQSS/dataverse/pull/10543 ?
It came up again in sprint planning yesterday.
I think we all agree that the PR should add value (more detailed error messages). However, it doesn't improve the JSON Schema we offer for datasets. Does that matter? Is that what you want?
To me, it looks great already! I have two small points that could be beneficial for integration into external tools/libs:
The message is good, but it is currently limited to a human-readable format. Adding a JSONPath (or any other path that pinpoints the exact location) would allow other libraries to do more with the validation result. Furthermore, adding distinct error types could help; for instance, if a type validation fails, this could be indicated.
If I could imagine a response example it would look something like this:
Paths are not accurate
{
"is_valid": "yes",
"errors": [
{
"location": "citation/fields/0/value",
"error_type": "required",
"message": "The title field is required."
},
{
"location": "citation/fields/1/value",
"error_type": "invalid",
"message": "The description must be a string."
}
]
}
Is it possible to derive such a format from your validator? I know in Python and Rust it is possible, but I am a Java Noob :grinning:
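For reference, a hedged sketch of how the Python jsonschema library exposes that kind of structured information (Draft202012Validator.iter_errors yields a JSON path, the failing keyword, and a message); whether the Java validator in the PR can do the same is exactly the open question:

from jsonschema import Draft202012Validator

schema = {
    "type": "object",
    "required": ["title"],
    "properties": {"description": {"type": "string"}},
}
instance = {"description": 42}  # missing "title", wrong type for "description"

errors = [
    {
        "location": err.json_path,    # e.g. "$.description"
        "error_type": err.validator,  # e.g. "required" or "type"
        "message": err.message,
    }
    for err in Draft202012Validator(schema).iter_errors(instance)
]
print(errors)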
Great, thanks. @Jan Range would you mind copying and pasting your comment into the PR or linking here?
Done :smile:
#10543 has been merged