Stream: troubleshooting

Topic: Collection custom metadata & DCAT-AP


view this post on Zulip Asbjørn Skødt (Nov 26 2025 at 11:02):

Thanks.

I have raised this issue in Paul Boon's repo for the DCAT-AP exporter: https://github.com/Dans-labs/dataverse-exporter-dcatap/issues/1

We can find the metadata export function in the UI for individual datasets which uses the ordinary API, but we cannot find the exporter in our list of OAI metadata exporters. See: https://demo.dataverse.deic.dk/oai?verb=ListMetadataFormats

Do you have a take on why that is?

view this post on Zulip Paul Boon (Nov 26 2025 at 13:06):

Hi, I just improved that exporter; it now also has isHarvestable return true, which should make it available via OAI.
The code and JAR are here: https://github.com/Dans-labs/dataverse-exporter-dcatap/releases/tag/v0.1.0

view this post on Zulip Paul Boon (Nov 26 2025 at 13:10):

@Philip Durbin 🚀 I hope you understand that plugin loading better than I do, because I have some issues with the class loading. When I run it from my dev environment (just the Maven test via that clean install) I get different results from running it via the Dataverse application. Somehow it uses different implementations of libs, even though I build a 'fat' jar with all libs in it.

view this post on Zulip Paul Boon (Nov 26 2025 at 13:24):

Getting the basic DCAT-AP in RDF (XML) does not look too complicated because there are not many mandatory fields. Those DCAT-AP extensions, however, are problematic, also because some information must be extracted from custom metadata blocks, as indicated by @Johannes D .
But before continuing with making the plugin more flexible, I want to have the RDF eyeballed by some RDF experts...

view this post on Zulip Paul Boon (Nov 27 2025 at 15:37):

@Philip Durbin 🚀 Is it possible for an exporter plugin to be in the OAI output? I see this line in OAIServlet.java: if (exporter != null && (exporter instanceof XMLExporter) && exporter.isHarvestable()).
And that plugin is an io.gdcc.spi.export.Exporter. Should the test look for MediaType.APPLICATION_XML instead?

view this post on Zulip Paul Boon (Dec 01 2025 at 12:53):

Ok, I will try with io.gdcc.spi.export.XMLExporter, that should work.

view this post on Zulip Philip Durbin 🚀 (Dec 01 2025 at 14:44):

@Paul Boon any luck? How's it going?

view this post on Zulip Paul Boon (Dec 02 2025 at 18:31):

@Philip Durbin 🚀 Somehow it is not working, so I changed back to Exporter and started improving the RDF output. But I will try to wrap things up by the end of the week. Thanks for asking.

view this post on Zulip Asbjørn Skødt (Dec 03 2025 at 11:57):

I have found the file for the OAI DDI exporter in the Dataverse source code for inspiration, if this might help:
https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/export/OAI_DDIExporter.java

view this post on Zulip Paul Boon (Dec 03 2025 at 13:08):

The main problem, I think, is that RDF/XML is not compatible with the OAI format, which needs an XML schema (xsd file), which we cannot provide.
The alternative (for harvesting clients) might be to use OAI harvesting to retrieve the identifiers and use those to retrieve the individual DCAT-AP metadata from the API.
So, it might be that we won't provide OAI for the DCAT-AP at all!
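A minimal sketch of that two-step alternative from a harvesting client's point of view, in Python. It assumes that the OAI identifiers are the datasets' persistent IDs, that the exporter is registered under the name "dcatap", and that the standard /api/datasets/export endpoint serves it; the sample response and the base URL are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, trimmed ListIdentifiers response (illustration only).
SAMPLE_LIST_IDENTIFIERS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>doi:10.5072/FK2/6ZUDGC</identifier>
      <datestamp>2025-12-02</datestamp>
    </header>
    <header status="deleted">
      <identifier>doi:10.5072/FK2/REMOVED</identifier>
      <datestamp>2025-12-01</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def identifiers_from_list_identifiers(xml_text):
    """Return the identifiers of all non-deleted records in a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    ids = []
    for header in root.findall(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue  # deleted records have no metadata to fetch
        ident = (header.findtext("oai:identifier", default="", namespaces=OAI_NS) or "").strip()
        if ident:
            ids.append(ident)
    return ids


def export_url(base_url, persistent_id, exporter="dcatap"):
    """Build the export API URL for one dataset (exporter name is an assumption)."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


if __name__ == "__main__":
    for pid in identifiers_from_list_identifiers(SAMPLE_LIST_IDENTIFIERS):
        print(export_url("https://dataverse.example.org", pid))
```

A real client would also have to follow resumptionToken paging in ListIdentifiers, which this sketch leaves out.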

view this post on Zulip Philip Durbin 🚀 (Dec 03 2025 at 13:37):

Huh. Interesting. @Leo Andreev check this out ^^

view this post on Zulip Paul Boon (Dec 03 2025 at 13:43):

Maybe I am wrong; I would like to have that RDF/XML in there, something like this:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
    <GetRecord>
        <record>
            <header>
                <identifier>oai:example.org:item123</identifier>
                <datestamp>2025-12-02</datestamp>
                <metadataFormat>RDF</metadataFormat>
            </header>
            <metadata>
                <!-- RDF Content Here -->
            </metadata>
        </record>
    </GetRecord>
</OAI-PMH>

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:11):

@Paul Boon as someone who did a lot of work on the OAI-PMH lib we use (XOAI) I am fairly familiar with the protocol. When you say you need an XSD, what exactly are you talking about? (I'm not sure I follow... :thinking: )

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:14):

Does the example at https://www.openarchives.org/OAI/openarchivesprotocol.html#Record help clear up the situation? (Assuming you're talking about how to provide the namespace URLs and attach the schemaLocations to it)

view this post on Zulip Paul Boon (Dec 03 2025 at 14:16):

@Oliver Bertuch The oai?verb=ListMetadataFormats returns this:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2025-12-03T14:07:45Z</responseDate>
<request verb="ListMetadataFormats">https://dev.dataverse.nl/oai</request>
<ListMetadataFormats>
<metadataFormat>
<metadataPrefix>dcatap</metadataPrefix>
<schema/>
<metadataNamespace/>
</metadataFormat>
<metadataFormat>
<metadataPrefix>Datacite</metadataPrefix>
<schema>http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.5/metadata.xsd</schema>
<metadataNamespace>http://datacite.org/schema/kernel-4</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>oai_datacite</metadataPrefix>
<schema>http://schema.datacite.org/meta/kernel-4.1/metadata.xsd</schema>
<metadataNamespace>http://datacite.org/schema/kernel-4</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>oai_ddi</metadataPrefix>
<schema>https://ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd</schema>
<metadataNamespace>ddi:codebook:2_5</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>dataverse_json</metadataPrefix>
<schema>https://dataverse.org/schema/core.xsd</schema>
<metadataNamespace>https://dataverse.org/schema/core</metadataNamespace>
</metadataFormat>
</ListMetadataFormats>
</OAI-PMH>

So it has dcatap, but with no schema and no namespace...
Then if I request the records with oai?verb=ListRecords&metadataPrefix=dcatap I get:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2025-12-03T14:11:32Z</responseDate>
<request>https://dataverse.nl/oai</request>
<error code="cannotDisseminateFormat">Format 'dcatap' not applicable in this context</error>
</OAI-PMH>

So it might be because it has no schema?
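For anyone who wants to check this on their own installation, here is a small Python sketch that lists the formats advertised with an empty schema or namespace. The sample is a trimmed copy of the response above; the function name is made up. It only reports what is advertised and says nothing about whether the empty values cause the cannotDisseminateFormat error.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed from the ListMetadataFormats response quoted above.
SAMPLE_LIST_METADATA_FORMATS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>dcatap</metadataPrefix>
      <schema/>
      <metadataNamespace/>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def incomplete_formats(xml_text):
    """Map each metadataPrefix to the fields it advertises as empty or missing."""
    root = ET.fromstring(xml_text)
    problems = {}
    for fmt in root.findall(".//oai:metadataFormat", OAI_NS):
        prefix = (fmt.findtext("oai:metadataPrefix", default="", namespaces=OAI_NS) or "").strip()
        missing = [
            field
            for field in ("schema", "metadataNamespace")
            if not (fmt.findtext(f"oai:{field}", default="", namespaces=OAI_NS) or "").strip()
        ]
        if missing:
            problems[prefix] = missing
    return problems


if __name__ == "__main__":
    print(incomplete_formats(SAMPLE_LIST_METADATA_FORMATS))
```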

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:17):

Code code code I need code :smiley: Can you point me to a repo with your exporter code?

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:19):

(BTW why "dcatap" and not just "dcat"?)

view this post on Zulip Paul Boon (Dec 03 2025 at 14:21):

The code is here: https://github.com/PaulBoon/dataverse-exporter-dcatap/tree/FixOAIExport
The format is DCAT-AP; https://semiceu.github.io/DCAT-AP/releases/3.0.0

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:22):

Let me suggest using "dcat_ap" in this case then, the way "oai_dc" uses an underscore to make it more readable.

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:24):

Let me also suggest using https://github.com/gdcc/dataverse-exporters as the Parent POM. It's not strictly necessary, but it may be useful to have an intermediate parent pom for all of the exporters. I'm happy to hear contra arguments! :smile:

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:25):

WRT no schemas being announced... I think I know the culprit. While you are using XMLExporter (good!), you're returning empty values for these: https://github.com/PaulBoon/dataverse-exporter-dcatap/blob/e12307d3ba32e5178610acebae3cdf6a3068dc1d/src/main/java/io/gdcc/spi/export/dcatap/DCATAPExporter.java#L94-L107

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:26):

I'd have a hard time figuring this out as a library, too :smile:

view this post on Zulip Paul Boon (Dec 03 2025 at 14:29):

Exactly, empty values, because I don't have a schema...

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:30):

But you will need a namespace!

view this post on Zulip Paul Boon (Dec 03 2025 at 14:31):

The RDF output starts like this:

<rdf:RDF
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:spdx="http://spdx.org/rdf/terms#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcatap="http://data.europa.eu/r5r/"
    xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="https://doi.org/10.5072/FK2/6ZUDGC">
...

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:31):

Also, you want to use the DCAT-AP namespace, aye?

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:32):

Yeah that seems right

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:32):

All the namespaces properly defined

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:33):

We may need to extend XOAI's model classes here. Currently, they expect 1 metadata format to have 1 namespace and 1 schemaLocation.

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:33):

With RDF, that may no longer be the case.

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:34):

Also, the XMLExporter interface may need an extension to reflect these facts.

view this post on Zulip Paul Boon (Dec 03 2025 at 14:35):

@Oliver Bertuch I guess it is somewhere validating the XML and discovers it's not valid, then throws an error?
I mean, there is no workaround ?

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:37):

Not sure about that.

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:38):

I'd need to dive in deeper, can't tell off the top of my head what'll happen with empty strings like that.

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:38):

The XML bit is usually just copied from the export (which is cached) and then written. There usually is no more validation happening.

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:42):

See https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/harvest/server/xoai/DataverseXoaiItemRepository.java#L258-L258 - there's the bit that simply takes the exported data, reads it and then dumps it into the OAI metadata record.

view this post on Zulip Paul Boon (Dec 03 2025 at 14:48):

Looks like I need to use the debugger...

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:50):

I'd try to get the metadata format available first.

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:50):

It's in the list, but not completed.

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:50):

This may be a problem.

view this post on Zulip Philip Durbin 🚀 (Dec 03 2025 at 14:54):

Oliver Bertuch said:

Let me also suggest using https://github.com/gdcc/dataverse-exporters as the Parent POM. It's not strictly necessary, but it may be useful to have an intermediate parent pom for all of the exporters. I'm happy to hear contra arguments! :smile:

@Oliver Bertuch please don't forget that it's still using the legacy ossrh config: https://github.com/gdcc/dataverse-exporters/issues/54

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:55):

Yikes :see_no_evil:

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:55):

Wasn't me

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:55):

:grimacing:

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 14:56):

Someone should change that... :speak_no_evil: :hear_no_evil: :see_no_evil:

view this post on Zulip Philip Durbin 🚀 (Dec 03 2025 at 14:56):

Well, it was never updated. ossrh was the right way to do things back in the day. :smile:

view this post on Zulip Oliver Bertuch (Dec 03 2025 at 15:24):

@Paul Boon I should also mention that you should not put the exporters in the SPI package. It should be its own package. Here's an example: https://github.com/gdcc/exporter-ddipdf/tree/main/src/main/java/io/gdcc/export/ddipdf

view this post on Zulip Paul Boon (Dec 03 2025 at 17:00):

Uhm, seems to be working... I was looking at the wrong server. Sorry, my bad! :exploding_head:
The only thing left is that empty schema in ListMetadataFormats, but maybe harvesting clients have no trouble with that.

view this post on Zulip Asbjørn Skødt (Dec 05 2025 at 12:30):

Yes, we have a breakthrough! Awesome.

We get this info from the list of metadata formats:

<metadataFormat>
<metadataPrefix>dcat_ap</metadataPrefix>
<schema/>
<metadataNamespace>http://purl.org/dc/terms/ http://spdx.org/rdf/terms# http://www.w3.org/1999/02/22-rdf-syntax-ns# http://data.europa.eu/r5r/ http://www.w3.org/2006/vcard/ns# http://www.w3.org/ns/dcat# http://www.w3.org/2000/01/rdf-schema# http://xmlns.com/foaf/0.1/</metadataNamespace>
</metadataFormat>

Why does the namespace list several namespaces? The namespace for DCAT-AP according to the spec is:
http://data.europa.eu/r5r/

source:

5. Terminology

https://semiceu.github.io/DCAT-AP/r5r/releases/3.0.0/#specterminology
The namespace for this vocabulary is http://data.europa.eu/r5r/ .
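A quick Python sketch of how a harvesting client might compare the advertised value with the single namespace from the spec. The function name is made up, and the check does not explain why the exporter lists all of the namespaces; it only makes the mismatch explicit.

```python
# Namespace given in the DCAT-AP 3.0.0 terminology section quoted above.
DCAT_AP_NS = "http://data.europa.eu/r5r/"

# Value of <metadataNamespace> as returned for the dcat_ap prefix above.
ADVERTISED = (
    "http://purl.org/dc/terms/ http://spdx.org/rdf/terms# "
    "http://www.w3.org/1999/02/22-rdf-syntax-ns# http://data.europa.eu/r5r/ "
    "http://www.w3.org/2006/vcard/ns# http://www.w3.org/ns/dcat# "
    "http://www.w3.org/2000/01/rdf-schema# http://xmlns.com/foaf/0.1/"
)


def check_namespace(advertised, expected=DCAT_AP_NS):
    """Split a whitespace-separated metadataNamespace value and compare it with the expected URI."""
    uris = advertised.split()
    return {
        "count": len(uris),
        "contains_expected": expected in uris,
        "is_single_expected": uris == [expected],
    }


if __name__ == "__main__":
    print(check_namespace(ADVERTISED))
```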

view this post on Zulip Philip Durbin 🚀 (Jan 06 2026 at 13:01):

Sorry for the short notice but in two hours @Sjaak Derksen will be talking about DCAT at the community call: https://groups.google.com/g/dataverse-community/c/sM5Kdwy1lT0/m/2Z-7a8fcBQAJ


Last updated: Jan 09 2026 at 14:18 UTC