I have been talking with the folks from https://join2.de today. (My library is part of the consortium.) They are currently transitioning from Invenio to MyCoRe. One of the pieces I am very interested in for our institutional repository, Jülich DATA, is using their controlled vocabularies in Dataverse. I could of course write some JavaScript that reads their REST API.
But I'm wondering if it wouldn't make more sense, also with the SPA on the horizon where we will still need server-side validation of data, for Dataverse itself (or some microservice that the Dataverse backend knows how to talk to) to harvest these vocabularies (e.g. via OAI-PMH), offer them via the Dataverse REST API / JSON Schema to a client script and via JSF as options, and use them for validation of datasets coming in via the API.
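Roughly, I imagine the cached terms ending up in something like a JSON Schema fragment that both clients and the server-side validator could use. A minimal sketch (all field names and values here are made up, nothing that exists in Dataverse today):

```python
# Sketch only: turn a harvested vocabulary into a JSON Schema fragment.
# Field name and terms are hypothetical, not real Dataverse metadata.
import json

def vocabulary_to_schema(field_name: str, terms: list[str]) -> dict:
    """Build a JSON Schema snippet restricting a metadata field to known terms."""
    return {
        "type": "object",
        "properties": {
            field_name: {
                "type": "string",
                "enum": sorted(set(terms)),  # harvested controlled vocabulary values
            }
        },
        "required": [field_name],
    }

# Example: terms previously harvested from the vocabulary provider
schema = vocabulary_to_schema("subjectClassification", ["Physics", "Chemistry", "Biology"])
print(json.dumps(schema, indent=2))
```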
Harvesting controlled vocabularies makes a lot of sense from an architectural and availability point of view. What do y'all think @Slava Tykhonov @Philip Durbin @Julian Gautier @Philipp Conzett ?
(This might be connected to the idea of custom validators for fields; see #dev > metadata validators per field.)
Hi Oliver, how is it different from https://zenodo.org/records/8133723?
scripts and docs here https://github.com/gdcc/dataverse-external-vocab-support
To quote from the paper:
This mechanism does not currently take advantage of the configuration mechanism, data-* attributes, or caching of our external vocabulary support mechanism which makes it harder to see how they could be shared across repositories.
What I'm envisioning heads in that direction. If we could harvest the controlled vocabularies from external sources, we would get this caching in place. A Dataverse installation would be more independent from the vocabulary provider, as it keeps a _synchronized_ copy. OAI-PMH has been used for this kind of harvesting for a very long time and would come in handy here.
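Just to sketch how simple the harvesting side is, a minimal OAI-PMH ListRecords loop could look roughly like this (base URL and metadataPrefix are placeholders; a real harvester would also handle errors and incremental harvesting):

```python
# Minimal OAI-PMH ListRecords harvest with resumption token handling.
# The base URL and metadataPrefix are placeholders for whatever the provider offers.
import requests
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url: str, metadata_prefix: str = "oai_dc"):
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(requests.get(base_url, params=params, timeout=30).content)
        for record in root.iter(f"{OAI_NS}record"):
            yield record  # store/update this record in the local vocabulary cache
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Placeholder endpoint, just to show the usage
for rec in harvest("https://example.org/oai"):
    header = rec.find(f"{OAI_NS}header/{OAI_NS}identifier")
    print(header.text if header is not None else "(no identifier)")
```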
Also, the JavaScript solution is great for the UI part. But it doesn't yet allow any server-side validation of the content, and it cannot be used by API clients that do not run JavaScript.
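What I'd like the backend to be able to do is roughly this (a sketch against a hypothetical local cache, not actual Dataverse code):

```python
# Sketch of server-side validation against the locally cached vocabulary.
# In Dataverse this would live in the backend; all names here are made up.
CACHED_VOCAB = {"Physics", "Chemistry", "Biology"}  # filled by the harvester

def validate_field(field_name: str, value: str) -> None:
    """Reject values that are not part of the synchronized vocabulary copy."""
    if value not in CACHED_VOCAB:
        raise ValueError(
            f"'{value}' is not a valid term for '{field_name}'; "
            f"expected one of the harvested vocabulary values."
        )

validate_field("subjectClassification", "Physics")   # passes
# validate_field("subjectClassification", "Alchemy") # would raise ValueError
```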
Come to think of it: if we have a cache, we can also expose it via OAI-PMH, so others can harvest the vocabularies again...
We could even go a step further and change the data model of our controlled vocabularies to implement them in a more SKOS-like manner. Exposing those again as SKOS via OAI-PMH could at least partially address the point you mentioned in the paper about "how they could be shared across repositories".
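For illustration, a single term as a skos:Concept could look like this, sketched with rdflib (all URIs invented):

```python
# Tiny sketch: one controlled vocabulary value expressed as a skos:Concept.
# The namespace and URIs are invented for illustration.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("https://vocab.example.org/subjects/")

g = Graph()
g.bind("skos", SKOS)
scheme = EX["scheme"]
concept = EX["physics"]

g.add((scheme, RDF.type, SKOS.ConceptScheme))
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.inScheme, scheme))
g.add((concept, SKOS.prefLabel, Literal("Physics", lang="en")))
g.add((concept, SKOS.prefLabel, Literal("Physik", lang="de")))

print(g.serialize(format="turtle"))
```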
How are they hosting their vocabularies now? If they're in SKOS, you can upload them directly into Skosmos (https://skosmos.org) and get the connection to Dataverse working.
We've implemented a "cache" from Dataverse in Jena Fuseki, which is a component of the Skosmos platform.
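For example, looking terms up in that cache is just SPARQL against Fuseki, roughly like this (the endpoint URL is a placeholder for an actual installation):

```python
# Rough sketch: look up concept labels in the Fuseki store behind Skosmos.
# The endpoint URL is a placeholder for an actual installation.
import requests

SPARQL_ENDPOINT = "https://skosmos.example.org/fuseki/vocab/sparql"
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
} LIMIT 10
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
for binding in resp.json()["results"]["bindings"]:
    print(binding["concept"]["value"], binding["label"]["value"])
```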
And forget about OAI-PMH; it's not suitable for controlled vocabularies. The OAI-ORE export has it all.
OAI-PMH would only be used as the transport / sync protocol. The payload can be OAI-ORE, serialized as RDF/XML.
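So it would be the same ListRecords call, just asking for a different payload format (the "ore" metadataPrefix here is purely hypothetical; ListMetadataFormats would tell us what the provider really offers):

```python
# Sketch: the transport stays OAI-PMH; only the payload format changes.
# "ore" is a hypothetical metadataPrefix, and the base URL is a placeholder.
import requests

resp = requests.get(
    "https://example.org/oai",
    params={"verb": "ListRecords", "metadataPrefix": "ore"},
    timeout=30,
)
print(resp.status_code)  # each record's <metadata> would carry the OAI-ORE map as RDF/XML
```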
Currently, MyCoRe exposes "classifications" as a custom XML format. They are experimenting with exposing them as SKOS, though. https://cmswiki.rrz.uni-hamburg.de/hummel/MyCoRe/Organisation/AnwenderWorkshop2022?action=AttachFile&do=view&target=221109_MyCoRe-ObjectListing_SKOS.pdf
If you can get vocabs in SKOS, the integration is pretty straightforward like we designed it.
It sounds like @Oliver Bertuch wants to pull down and sync the vocab values locally.
@Slava Tykhonov is saying he already implemented a cache. Is that enough? A cache?
It sounds like Oliver wants a local service as well.
The technical details here go over my head, but not having to pull data from an API every time a depositor uses a metadata field could be really helpful. When the external vocabulary support mechanism was used to suggest names from a Crossref API, we saw some performance-related issues that might make it tough for depositors to use those metadata fields, and we talked about how maintaining "local" copies of what's in that API might help.
Yes, exactly, it should be a performance win.
Awesome, yeah. In the UX WG's plans to usability test a redesign of the Citation metadata block that uses the external vocabulary support mechanism, I mention that moderators should look out for problems that might be caused by these performance-related challenges.
Don't forget about ontologies like Dublin Core during the redesign.
@Slava Tykhonov could you write more about what that could mean? For example, would this involve thinking about how what's entered in the deposit form is included in the Dublin Core metadata that Dataverse repositories export?
Hi @Slava Tykhonov. The UX WG has been getting more in-depth about the redesign and we haven't discussed Dublin Core in any capacity, including how deposit metadata is imported into and exported out of repositories that use Dataverse. We have discussed the DataCite schema.
But assuming that metadata mapping to Dublin Core, import/export, and controlled vocabularies is generally what you were thinking of last month: I think there are more appropriate standards for sharing metadata that includes values from controlled vocabularies, and a lot of the details about the controlled vocabulary terms used to describe deposits would be lost if Dublin Core were used.
Let me know what you think or if you were thinking about something else. :)
Hi Julian, sorry for the late reaction. It could be very interesting if Dataverse supported DCAT and Croissant next to Dublin Core for the import/export of metadata.
Ah, @Sonia Barbosa asked about interest in DCAT, too, in another Zulip thread at https://dataverse.zulipchat.com/#narrow/stream/375707-community/topic/Support.20for.20Importing.20and.20Exporting.20DCAT.20Metadata