Stream: dev

Topic: metadata validators per field


view this post on Zulip Oliver Bertuch (Jul 05 2024 at 05:15):

I have a crazy idea. How about we enable configuring a metadata validator per field via MPCONFIG? We could allow loading these as plugins like exporters if an internal one does not suffice. (Even with the SPA we still need a server side validation.) These configurations could even be requested as a JSON Schema expression, allowing for client side validation. Comments?

view this post on Zulip Jan Range (Jul 05 2024 at 06:38):

I like it! How would I write such a field-validator?

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:44):

I'd say some stuff should be included like numeric min/max

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:44):

Then you just use it by configuration

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:45):

But to enable custom validators, these should use an SPI loaded Java plugin

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:45):

Obviously you can use the same tricks again to go from there to JavaScript/Python, but this comes with a performance penalty

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:46):

They'd be loaded and connected to a field by configuration

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:46):

It would certainly allow for much more complex validations

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:47):

Probably for a first step it would be instance wide configuration only, but one should consider a later extension to configuration per collection.
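
A minimal sketch of how the pieces described so far could fit together: a built-in numeric min/max validator selected purely by configuration, plus custom validators discovered through Java's `ServiceLoader` SPI. Everything named here is an assumption for illustration: the `FieldValidator` interface, the `numeric-range` identifier, and the `validator.<field>.*` keys are hypothetical, and `java.util.Properties` only stands in for MPCONFIG so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.ServiceLoader;

// Hypothetical sketch: none of these types or config keys exist in Dataverse.
final class FieldValidatorSketch {

    /** Hypothetical SPI; a plugin jar would list its implementation under META-INF/services. */
    public interface FieldValidator {
        String name(); // identifier referenced from configuration
        Optional<String> validate(String value, Map<String, String> options); // empty = valid
    }

    /** Built-in numeric min/max check, usable by configuration alone. */
    public static final class NumericRange implements FieldValidator {
        @Override public String name() { return "numeric-range"; }

        @Override public Optional<String> validate(String value, Map<String, String> options) {
            double number;
            try {
                number = Double.parseDouble(value);
            } catch (NumberFormatException | NullPointerException e) {
                return Optional.of("not a number: " + value);
            }
            double min = Double.parseDouble(options.getOrDefault("min", "-Infinity"));
            double max = Double.parseDouble(options.getOrDefault("max", "Infinity"));
            if (Double.isNaN(number) || number < min || number > max) {
                return Optional.of(value + " is outside [" + min + ", " + max + "]");
            }
            return Optional.empty();
        }
    }

    /** Collects built-in validators and any plugins found on the classpath. */
    public static Map<String, FieldValidator> loadValidators() {
        Map<String, FieldValidator> byName = new HashMap<>();
        FieldValidator builtIn = new NumericRange();
        byName.put(builtIn.name(), builtIn);
        // No plugin jars are present in this example, so this loop finds nothing.
        for (FieldValidator plugin : ServiceLoader.load(FieldValidator.class)) {
            byName.put(plugin.name(), plugin);
        }
        return byName;
    }

    /** Instance-wide lookup: which validator, with which options, guards this field? */
    public static Optional<String> validateField(Properties config, String field, String value) {
        String prefix = "validator." + field + ".";
        String type = config.getProperty(prefix + "type");
        if (type == null) {
            return Optional.empty(); // no validator configured for this field
        }
        FieldValidator validator = loadValidators().get(type);
        if (validator == null) {
            return Optional.of("unknown validator: " + type);
        }
        Map<String, String> options = new HashMap<>();
        for (String key : config.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                options.put(key.substring(prefix.length()), config.getProperty(key));
            }
        }
        return validator.validate(value, options);
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("validator.depth.type", "numeric-range");
        config.setProperty("validator.depth.min", "0");
        config.setProperty("validator.depth.max", "11000");
        System.out.println(validateField(config, "depth", "42")); // within range
        System.out.println(validateField(config, "depth", "-5")); // below the minimum
    }
}
```

Keying the lookup by field name keeps this instance wide; a per-collection variant would presumably need an extra level in the key, which is left out here.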

view this post on Zulip Jan Range (Jul 05 2024 at 06:54):

Could the numeric and format validation logic be outsourced to the JSON Schema of a metadata block? This way, other instances could reuse the validation logic, and some existing features could be used already. Or is this not possible with the current implementation?

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:56):

This starts to sound like a chicken and egg problem

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:56):

We could of course write a custom validator that extracts information like this from a JSON Schema

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:56):

In fact, that would be one of the ideas for these validators, enabling a textbox to be filled with JSON controlled by a schema

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:58):

Using JSON Schemas as a format for metadata schema definitions in Java (so they become the source, not the target like we're talking about in the JSON schema topic) is a long-standing dream, but goes beyond what I envisioned

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 06:58):

Also, this would still need configuration per collection I suppose

view this post on Zulip Oliver Bertuch (Jul 05 2024 at 07:01):

The validators I envision would have a Java interface that would be asked to hand out a JSON schema thing, to be included in the JSON schema you can retrieve now via the Dataverse API
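
A rough sketch of such an interface, under stated assumptions: `SchemaAwareValidator`, `IntegerRange`, and the field name `sampleCount` are hypothetical, and the JSON is assembled by hand only to avoid dependencies. Only the JSON Schema keywords (`type`, `minimum`, `maximum`, `properties`) are standard. How a fragment would be merged into the schema the Dataverse API returns today is left open.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: these types do not exist in Dataverse.
final class SchemaFragmentSketch {

    /** A validator that can check a value and describe its own rule as JSON Schema. */
    public interface SchemaAwareValidator {
        boolean isValid(String value);
        String jsonSchemaFragment();
    }

    /** Example rule: an integer within an inclusive range. */
    public static final class IntegerRange implements SchemaAwareValidator {
        private final long min;
        private final long max;

        public IntegerRange(long min, long max) {
            this.min = min;
            this.max = max;
        }

        @Override public boolean isValid(String value) {
            try {
                long number = Long.parseLong(value.trim());
                return number >= min && number <= max;
            } catch (NumberFormatException | NullPointerException e) {
                return false;
            }
        }

        @Override public String jsonSchemaFragment() {
            return "{\"type\":\"integer\",\"minimum\":" + min + ",\"maximum\":" + max + "}";
        }
    }

    /** Combines per-field fragments into one object schema (field names assumed to need no escaping). */
    public static String propertiesSchema(Map<String, SchemaAwareValidator> validatorsByField) {
        StringJoiner properties = new StringJoiner(",", "{\"type\":\"object\",\"properties\":{", "}}");
        for (Map.Entry<String, SchemaAwareValidator> entry : validatorsByField.entrySet()) {
            properties.add("\"" + entry.getKey() + "\":" + entry.getValue().jsonSchemaFragment());
        }
        return properties.toString();
    }

    public static void main(String[] args) {
        Map<String, SchemaAwareValidator> validators = new LinkedHashMap<>();
        validators.put("sampleCount", new IntegerRange(1, 10));
        System.out.println(propertiesSchema(validators));
    }
}
```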

view this post on Zulip Philip Durbin 🚀 (Jul 08 2024 at 13:36):

Custom validators sound fun. Out of curiosity, do you have a specific use case?

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:37):

Yes!

view this post on Zulip Philip Durbin 🚀 (Jul 08 2024 at 13:37):

Do tell.

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:37):

I'm grabbing the links as we speak

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:37):

https://data.fz-juelich.de/guide/juelich/data-linking.html

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:37):

We have this custom metadata field

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:38):

In our fork, we added a custom metadata type "uri" for it

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:38):

As the URL type would not be sufficient

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:38):

With a text field and a custom validator, we could achieve the same but keep upstream compatibility, no fork necessary

view this post on Zulip Philip Durbin 🚀 (Jul 08 2024 at 13:40):

Ah, so you want to support both http:// and smb:// for example.

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:41):

Exactly :-)
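
A minimal sketch of the kind of check discussed here, assuming a plain text field whose value must be a URI with one of a configured set of schemes. The class name and the example hosts are hypothetical; only `java.net.URI` from the standard library is used, and this is not the code from the Jülich fork.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a scheme-restricted URI check for a text field.
final class UriSchemeValidatorSketch {

    private final Set<String> allowedSchemes = new HashSet<>();

    public UriSchemeValidatorSketch(Collection<String> schemes) {
        for (String scheme : schemes) {
            allowedSchemes.add(scheme.toLowerCase(Locale.ROOT)); // URI schemes are case-insensitive
        }
    }

    /** True if the value parses as a URI and carries one of the allowed schemes. */
    public boolean isValid(String value) {
        if (value == null) {
            return false;
        }
        try {
            String scheme = new URI(value.trim()).getScheme();
            return scheme != null && allowedSchemes.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // not a syntactically valid URI at all
        }
    }

    public static void main(String[] args) {
        UriSchemeValidatorSketch validator =
                new UriSchemeValidatorSketch(Arrays.asList("http", "https", "smb"));
        System.out.println(validator.isValid("http://example.org/dataset"));
        System.out.println(validator.isValid("smb://fileserver/share/run42"));
        System.out.println(validator.isValid("ftp://example.org/file"));
    }
}
```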

view this post on Zulip Philip Durbin 🚀 (Jul 08 2024 at 13:41):

Sounds fairly custom but maybe someone can reuse your validator some day. :grinning:

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:41):

I can immediately envision more features for this

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:42):

Controlled vocabularies without adding them to the schema

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:42):

Lookup services

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:43):

Restriction of URLs

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:43):

Someone might want to disallow using certain author schemes that are in citation.tsv

view this post on Zulip Philip Durbin 🚀 (Jul 08 2024 at 13:43):

#9750 from @luddaniel made it into 6.3 but it would be nice to drop in a jar and not wait for a release.

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 13:43):

But instead of forking the schema, the validator would bark at sth that is forbidden
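
A small sketch of that "bark at something forbidden" idea, assuming a deny list kept in configuration while the schema itself stays untouched. The class name is hypothetical and the scheme names passed in are only illustrative.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: rejects values an installation has chosen to forbid.
final class DenyListValidatorSketch {

    private final Set<String> forbidden = new HashSet<>();

    public DenyListValidatorSketch(Collection<String> forbiddenValues) {
        for (String value : forbiddenValues) {
            forbidden.add(normalize(value));
        }
    }

    /** Returns an error message for a forbidden value, or empty if the value is acceptable. */
    public Optional<String> validate(String value) {
        if (value != null && forbidden.contains(normalize(value))) {
            return Optional.of("'" + value.trim() + "' is not allowed on this installation");
        }
        return Optional.empty();
    }

    // Compare case-insensitively and ignore surrounding whitespace.
    private static String normalize(String value) {
        return value.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        DenyListValidatorSketch validator =
                new DenyListValidatorSketch(Arrays.asList("ISNI", "VIAF"));
        System.out.println(validator.validate("ORCID"));
        System.out.println(validator.validate("isni"));
    }
}
```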

view this post on Zulip Philip Durbin 🚀 (Jul 08 2024 at 18:41):

Could the custom validators work on guestbook fields? See #10661 opened by @Dimitri Szabo

view this post on Zulip Philip Durbin 🚀 (Jul 08 2024 at 18:42):

Also, what's the plan for keeping React in sync with these custom validators?

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 18:50):

From a technical viewpoint, these validators would probably hook into Bean Validations.

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 18:51):

So it should be possible to use these on anything we want to use them on.

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 18:51):

Usually Bean Validators get attached using annotations (decorator style), so I don't see why this shouldn't be possible for guestbooks.

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 18:53):

My idea to expose these validators would be to make any of these plugins express themselves as JSON Schema. That way they could be picked up by any client.

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 18:53):

As far as I know, the backend will still be the source of authority for any validation, right @Guillermo Portas ?

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 18:55):

If I understood it correctly, the frontend will now use the API to retrieve the fields and data types. So aside from including this in the JSON Schema API endpoints, it should be possible to embed these validators in some serialized form in any other API endpoint as well.

view this post on Zulip Oliver Bertuch (Jul 08 2024 at 18:56):

It's not like we would remove everything else - we'd keep the data type around, but extend the definition to possible values/ranges/...

view this post on Zulip Philip Durbin 🚀 (Jul 08 2024 at 19:00):

For email validation there was a mismatch that was corrected in https://github.com/IQSS/dataverse-frontend/pull/402

view this post on Zulip Oliver Bertuch (Jul 10 2024 at 17:59):

Yeah, but that is a duplication of the constraint check. I'm suggesting we enable receiving these constraints as regex or whatever using the API, based on the custom validator implementation. In addition to being able to receive a JSON Schema thing, the same interface could request responding with some JavaScript validator, reusable in the SPA and other clients.
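
One way this single-source idea could look, as a sketch under assumptions: the server keeps one regex, checks values against it, and hands the same rule to clients as a JSON Schema `pattern`. `PatternConstraintSketch` is hypothetical. Since `pattern` in JSON Schema is unanchored while `Matcher.matches()` tests the whole string, the exported expression is wrapped in `^(?:...)$`. Java and ECMAScript regex dialects differ in places, so only constructs common to both would be safe to share.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: one regex, enforced server side and exported for clients.
final class PatternConstraintSketch {

    private final Pattern pattern;

    public PatternConstraintSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** Server-side check: the whole value must match. */
    public boolean isValid(String value) {
        return value != null && pattern.matcher(value).matches();
    }

    /** The same rule as a JSON Schema fragment, anchored to keep whole-string semantics. */
    public String toJsonSchema() {
        String anchored = "^(?:" + pattern.pattern() + ")$";
        return "{\"type\":\"string\",\"pattern\":\"" + escapeJson(anchored) + "\"}";
    }

    // Escapes backslashes and double quotes only; control characters are not handled here.
    static String escapeJson(String raw) {
        return raw.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        PatternConstraintSketch constraint = new PatternConstraintSketch("[a-z]+://\\S+");
        System.out.println(constraint.isValid("smb://fileserver/share"));
        System.out.println(constraint.toJsonSchema());
    }
}
```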

view this post on Zulip Philip Durbin 🚀 (Jul 10 2024 at 18:49):

Reusable in the SPA would be great.


Last updated: Nov 01 2025 at 14:11 UTC