Thanks.
I have raised this issue in Paul Boon's repo for the DCAT-AP exporter: https://github.com/Dans-labs/dataverse-exporter-dcatap/issues/1
We can find the metadata export function in the UI for individual datasets which uses the ordinary API, but we cannot find the exporter in our list of OAI metadata exporters. See: https://demo.dataverse.deic.dk/oai?verb=ListMetadataFormats
Do you have a take on why that is?
Hi, just improved that exporter, now it also has isHarvestable return true, should make it available via OAI then.
The code and JAR are here: https://github.com/Dans-labs/dataverse-exporter-dcatap/releases/tag/v0.1.0
@Philip Durbin 🚀 I hope you understand that plugin loading better than I do, because I do have some issues with the class loading. When I run it from my dev environment (just the Maven test via that clean install) I get different results than when I run it via the Dataverse application. Somehow it uses different implementations of libs, even though I build a 'fat' jar with all libs in it.
Getting the basic DCAT-AP in RDF (XML) does not look too complicated because there are not many mandatory fields. Those DCAT-AP extensions however are problematic, also because some information must be extracted from custom metadatablocks, as indicated by @Johannes D .
But, before continuing with making the plugin more flexible I want to have the RDF eyeballed by some RDF experts...
@Philip Durbin 🚀 Is it possible for an exporter plugin to be in the OAI output? I see this line in the OAIServlet.java: if (exporter != null && (exporter instanceof XMLExporter) && exporter.isHarvestable()) .
And that plugin is an io.gdcc.spi.export.Exporter. Should the test look for MediaType.APPLICATION_XML instead?
Ok, I will try with io.gdcc.spi.export.XMLExporter, that should work.
@Paul Boon any luck? How's it going?
@Philip Durbin 🚀 Somehow it is not working, so I changed back to Exporter and started improving the RDF output. But I will try to wrap things up by the end of the week. Thanks for asking.
I have found the file for the OAI DDI exporter in the Dataverse source code for inspiration, if this might help:
https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/export/OAI_DDIExporter.java
The main problem, I think, is that RDF/XML is not compatible with the OAI format, which needs an XML schema (xsd file), which we cannot provide.
The alternative (for harvesting clients) might be to use OAI harvesting to retrieve the identifiers and use those to retrieve the individual DCAT-AP metadata from the API.
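A minimal sketch of that two-step approach from a harvesting client's side: read the identifiers out of a ListIdentifiers response, then build one export API URL per identifier. The sample response and the base URL `https://demo.example.org` are made up for illustration, and it is an assumption here that the OAI identifier can be passed straight through as `persistentId` and that the exporter is registered under the name `dcatap`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative ListIdentifiers response; a real client would fetch this over HTTP.
SAMPLE_LIST_IDENTIFIERS = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>doi:10.5072/FK2/6ZUDGC</identifier>
      <datestamp>2025-12-02</datestamp>
    </header>
    <header status="deleted">
      <identifier>doi:10.5072/FK2/GONE01</identifier>
      <datestamp>2025-12-01</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def harvested_identifiers(oai_xml):
    """Return the identifiers of all non-deleted records in the response."""
    root = ET.fromstring(oai_xml)
    return [
        header.findtext("oai:identifier", namespaces=OAI_NS)
        for header in root.iterfind(".//oai:header", OAI_NS)
        if header.get("status") != "deleted"
    ]


def export_url(base_url, persistent_id, exporter="dcatap"):
    """Build the per-dataset export URL (exporter name is an assumption)."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url}/api/datasets/export?{query}"


urls = [
    export_url("https://demo.example.org", pid)
    for pid in harvested_identifiers(SAMPLE_LIST_IDENTIFIERS)
]
```

The client would then fetch each URL in `urls` to get the RDF/XML, so OAI is only used for discovery and change tracking.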
So, it might be that we won't provide OAI for the DCAT-AP at all!
Huh. Interesting. @Leo Andreev check this out ^^
Maybe I am wrong; I would like to have that RDF/XML in there;
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
<GetRecord>
<record>
<header>
<identifier>oai:example.org:item123</identifier>
<datestamp>2025-12-02</datestamp>
<metadataFormat>RDF</metadataFormat>
</header>
<metadata>
<!-- RDF Content Here -->
</metadata>
</record>
</GetRecord>
</OAI-PMH>
@Paul Boon as someone who did a lot of work on the OAI-PMH lib we use (XOAI) I am fairly familiar with the protocol. When you say you need an XSD, what exactly are you talking about? (I'm not sure I follow... :thinking: )
Does the example at https://www.openarchives.org/OAI/openarchivesprotocol.html#Record help clear up the situation? (Assuming you're talking about how to provide the namespace URLs and attach the schemaLocations to it)
@Oliver Bertuch The oai?verb=ListMetadataFormats returns this:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2025-12-03T14:07:45Z</responseDate>
<request verb="ListMetadataFormats">https://dev.dataverse.nl/oai</request>
<ListMetadataFormats>
<metadataFormat>
<metadataPrefix>dcatap</metadataPrefix>
<schema/>
<metadataNamespace/>
</metadataFormat>
<metadataFormat>
<metadataPrefix>Datacite</metadataPrefix>
<schema>http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.5/metadata.xsd</schema>
<metadataNamespace>http://datacite.org/schema/kernel-4</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>oai_datacite</metadataPrefix>
<schema>http://schema.datacite.org/meta/kernel-4.1/metadata.xsd</schema>
<metadataNamespace>http://datacite.org/schema/kernel-4</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>oai_ddi</metadataPrefix>
<schema>https://ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd</schema>
<metadataNamespace>ddi:codebook:2_5</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>dataverse_json</metadataPrefix>
<schema>https://dataverse.org/schema/core.xsd</schema>
<metadataNamespace>https://dataverse.org/schema/core</metadataNamespace>
</metadataFormat>
</ListMetadataFormats>
</OAI-PMH>
So it has dcatap, but with no schema and no namespace...
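A small sketch of how a client could spot such incomplete announcements automatically. It parses a ListMetadataFormats response and reports which formats have an empty `schema` or `metadataNamespace`. The sample is a shortened copy of the response above; the function name is made up for this example.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Shortened copy of the ListMetadataFormats response from the thread.
SAMPLE_FORMATS = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>dcatap</metadataPrefix>
      <schema/>
      <metadataNamespace/>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""


def incomplete_formats(oai_xml):
    """Map each metadataPrefix to the announced fields that are empty."""
    root = ET.fromstring(oai_xml)
    report = {}
    for fmt in root.iterfind(".//oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", default="", namespaces=OAI_NS)
        missing = [
            field
            for field in ("schema", "metadataNamespace")
            if not fmt.findtext(f"oai:{field}", default="", namespaces=OAI_NS).strip()
        ]
        if missing:
            report[prefix] = missing
    return report


report = incomplete_formats(SAMPLE_FORMATS)
```

For the sample, `report` contains only the `dcatap` entry, with both fields listed as missing.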
Then if I request the records with oai?verb=ListRecords&metadataPrefix=dcatap I get:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2025-12-03T14:11:32Z</responseDate>
<request>https://dataverse.nl/oai</request>
<error code="cannotDisseminateFormat">Format 'dcatap' not applicable in this context</error>
</OAI-PMH>
So it might be because it has no schema?
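For debugging this kind of response from the client side, a short sketch that pulls out the error code, the message, and the `request` URL the server echoes back. In OAI-PMH, `cannotDisseminateFormat` means the requested `metadataPrefix` is not supported by the item or by the repository, so the code alone does not say whether the schema is involved. The sample is a trimmed copy of the response above; the function name is made up.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed copy of the error response from the thread.
SAMPLE_ERROR = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2025-12-03T14:11:32Z</responseDate>
  <request>https://dataverse.nl/oai</request>
  <error code="cannotDisseminateFormat">Format 'dcatap' not applicable in this context</error>
</OAI-PMH>
"""


def oai_error(oai_xml):
    """Return error details as a dict, or None if the response has no error."""
    root = ET.fromstring(oai_xml)
    error = root.find("oai:error", OAI_NS)
    if error is None:
        return None
    return {
        "code": error.get("code"),
        "message": (error.text or "").strip(),
        # The echoed request URL shows which server actually answered.
        "request": root.findtext("oai:request", default="", namespaces=OAI_NS).strip(),
    }


details = oai_error(SAMPLE_ERROR)
```

Comparing `details["request"]` with the server that produced the ListMetadataFormats output is a cheap sanity check before digging into the exporter itself.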
Code code code I need code :smiley: Can you point me to a repo with your exporter code?
(BTW why "dcatap" and not just "dcat"?)
Code is here: https://github.com/PaulBoon/dataverse-exporter-dcatap/tree/FixOAIExport
The format is DCAT-AP; https://semiceu.github.io/DCAT-AP/releases/3.0.0
Let me suggest using "dcat_ap" in this case then. As they use "oai_dc" to make it more readable.
Let me also suggest using https://github.com/gdcc/dataverse-exporters as the Parent POM. It's not strictly necessary, but it may be useful to have an intermediate parent pom for all of the exporters. I'm happy to hear contra arguments! :smile:
WRT no schemas being announced... I think I know the culprit. While you are using XMLExporter (good!), you're returning empty values for these: https://github.com/PaulBoon/dataverse-exporter-dcatap/blob/e12307d3ba32e5178610acebae3cdf6a3068dc1d/src/main/java/io/gdcc/spi/export/dcatap/DCATAPExporter.java#L94-L107
I'd have a hard time figuring this out as library, too :smile:
Exactly, empty values, because I don't have a schema...
But you will need a namespace!
The RDF output starts like this:
<rdf:RDF
xmlns:dct="http://purl.org/dc/terms/"
xmlns:spdx="http://spdx.org/rdf/terms#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dcatap="http://data.europa.eu/r5r/"
xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
xmlns:dcat="http://www.w3.org/ns/dcat#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<rdf:Description rdf:about="https://doi.org/10.5072/FK2/6ZUDGC">
...
Also, you'll want to use the DCAT-AP namespace, aye?
Yeah that seems right
All the namespaces properly defined
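To check which namespaces an RDF/XML export really declares, a sketch using the standard library's `start-ns` parse events. The sample is a shortened version of the RDF header above; the function name is made up for this example.

```python
import io
import xml.etree.ElementTree as ET

# Shortened version of the RDF/XML header from the thread.
SAMPLE_RDF = """\
<rdf:RDF
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcatap="http://data.europa.eu/r5r/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="https://doi.org/10.5072/FK2/6ZUDGC"/>
</rdf:RDF>
"""


def declared_namespaces(xml_text):
    """Return {prefix: uri} for every xmlns declaration in the document."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    # Each "start-ns" event carries a (prefix, uri) tuple.
    events = ET.iterparse(source, events=("start-ns",))
    return {prefix: uri for _event, (prefix, uri) in events}


namespaces = declared_namespaces(SAMPLE_RDF)
```

This makes the mismatch with a one-namespace-per-format model visible: the export declares several namespaces, of which `http://data.europa.eu/r5r/` is only one.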
We may need to extend XOAI's model classes here. Currently, they expect 1 metadata format to have 1 namespace and 1 schemaLocation.
With RDF, that may no longer be the case.
Also, the XMLExporter interface may need an extension to reflect these facts.
@Oliver Bertuch I guess it is somewhere validating the XML and discovers it's not valid, then throws an error?
I mean, there is no workaround ?
Not sure about that.
I'd need to dive in deeper, can't tell from the top of my head what'll happen with empty strings like that.
The XML bit is usually just copied from the export (which is cached) and then written. There usually is no more validation happening.
See https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/harvest/server/xoai/DataverseXoaiItemRepository.java#L258-L258 - there's the bit that simply takes the exported data, reads it and then dumps it into the OAI metadata record.
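The idea of copying an already exported document into the record without validating it can be sketched like this. This is a Python stand-in for illustration only, not the Dataverse or XOAI code (which is Java); the function name and sample values are made up.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"

# Stand-in for a cached export; a real one would come from the exporter.
CACHED_EXPORT = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#">
  <dcat:Dataset rdf:about="https://doi.org/10.5072/FK2/6ZUDGC"/>
</rdf:RDF>
"""


def build_record(identifier, datestamp, exported_xml):
    """Wrap an exported XML document in an OAI-PMH <record> element."""
    record = ET.Element(f"{{{OAI}}}record")
    header = ET.SubElement(record, f"{{{OAI}}}header")
    ET.SubElement(header, f"{{{OAI}}}identifier").text = identifier
    ET.SubElement(header, f"{{{OAI}}}datestamp").text = datestamp
    metadata = ET.SubElement(record, f"{{{OAI}}}metadata")
    # Parsing only checks well-formedness; nothing is validated against an XSD.
    metadata.append(ET.fromstring(exported_xml))
    return record


record = build_record("doi:10.5072/FK2/6ZUDGC", "2025-12-02", CACHED_EXPORT)
payload = record.find(f"{{{OAI}}}metadata")[0]
```

Under this model, RDF/XML fits inside `<metadata>` like any other well-formed XML payload; a missing schema only affects what ListMetadataFormats can announce.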
Looks like I need to use the debugger...
I'd try to get the metadata format available first.
It's in the list, but not completed.
This may be a problem.
Oliver Bertuch said:
Let me also suggest using https://github.com/gdcc/dataverse-exporters as the Parent POM. It's not strictly necessary, but it may be useful to have an intermediate parent pom for all of the exporters. I'm happy to hear contra arguments! :smile:
@Oliver Bertuch please don't forget that it's still using the legacy ossrh config: https://github.com/gdcc/dataverse-exporters/issues/54
Yikes :see_no_evil:
Wasn't me
:grimacing:
Someone should change that... :speak_no_evil: :hear_no_evil: :see_no_evil:
Well, it was never updated. ossrh was the right way to do things back in the day. :smile:
@Paul Boon I should also mention that you should not put the exporters in the SPI package. It should be its own package. Here's an example: https://github.com/gdcc/exporter-ddipdf/tree/main/src/main/java/io/gdcc/export/ddipdf
Uhm, seems to be working... I was looking at the wrong server. Sorry, my bad! :exploding_head:
There is only that empty schema in the ListMetadataFormats, but maybe harvesting clients have no trouble with that.
Yes, we have a breakthrough! Awesome.
We get this info from list of metadata formats:
<metadataFormat>
<metadataPrefix>dcat_ap</metadataPrefix>
<schema/>
<metadataNamespace>http://purl.org/dc/terms/ http://spdx.org/rdf/terms# http://www.w3.org/1999/02/22-rdf-syntax-ns# http://data.europa.eu/r5r/ http://www.w3.org/2006/vcard/ns# http://www.w3.org/ns/dcat# http://www.w3.org/2000/01/rdf-schema# http://xmlns.com/foaf/0.1/</metadataNamespace>
</metadataFormat>
Why does the namespace list several namespaces? The namespace for DCAT-AP according to the spec is:
http://data.europa.eu/r5r/
source:
5. Terminology
https://semiceu.github.io/DCAT-AP/r5r/releases/3.0.0/#specterminology
The namespace for this vocabulary is http://data.europa.eu/r5r/.
Sorry for the short notice but in two hours @Sjaak Derksen will be talking about DCAT at the community call: https://groups.google.com/g/dataverse-community/c/sM5Kdwy1lT0/m/2Z-7a8fcBQAJ
Last updated: Jan 09 2026 at 14:18 UTC