Stream: community

Topic: Harvesting with from= parameter


luddaniel (Nov 12 2024 at 15:43):

Hi :)
I'm experiencing issues with the from= parameter when doing a partial harvest.
Most of my repositories expect the format YYYY-MM-DDThh:mm:ssZ (which Dataverse sends), but a few expect the format YYYY-MM-DD.
And one expects a real timestamp (in seconds).
The errors look like this:

<error code="badArgument">From must be a datestamp</error>
<error code="badArgument">from: Invalid date & time</error>
<error code="badArgument" />
<error code="badArgument">The request includes illegal arguments, is missing required arguments, includes a repeated argument, or values for arguments have an illegal syntax.</error>

Examples:
https://cds.unistra.fr/registry/?verb=ListRecords&metadataPrefix=oai_dc&from=2024-11-07T14%3A40%3A49Z
https://cds.unistra.fr/registry/?verb=ListRecords&metadataPrefix=oai_dc&from=2024-11-07
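The two URLs above differ only in how the same last-harvest instant is written in `from=`. Here is a minimal Python sketch of how such a request URL is built with the standard `urllib.parse.urlencode`. `list_records_url` is a hypothetical helper, not part of Dataverse.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, from_value: str, prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords URL for a selective (from=) harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix, "from": from_value}
    # urlencode percent-encodes ':' as %3A, as in the example URLs above.
    return f"{base_url}?{urlencode(params)}"


base = "https://cds.unistra.fr/registry/"
print(list_records_url(base, "2024-11-07T14:40:49Z"))  # seconds granularity
print(list_records_url(base, "2024-11-07"))            # day granularity
```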

Have you experienced this?
Should we add a harvesting client setting to specify which format to send?

One related post: https://groups.google.com/g/dataverse-community/c/ceXljDp2uTw/m/1hHX3taGAQAJ

Philip Durbin 🚀 (Nov 12 2024 at 15:51):

Sorry, I'm having a little trouble following this. The problem occurs when Dataverse is acting as a harvesting client, right?

luddaniel (Nov 12 2024 at 15:52):

Yes.
The first run works fine; subsequent runs add the from= parameter.

Philip Durbin 🚀 (Nov 12 2024 at 15:57):

To pick up any changes. Got it. Sounds like a bug. Or some ambiguity in the spec? Does the spec have anything to say about timestamp vs datestamp?

luddaniel (Nov 12 2024 at 16:00):

Both seem to be OK, and most of my repositories work with both.
But I couldn't find a definitive answer.

Philip Durbin 🚀 (Nov 12 2024 at 16:01):

Ok, can you please create an issue?

luddaniel (Nov 12 2024 at 16:01):

Do you think Dataverse should handle it? Or should I tell the repositories that they are not permissive enough?

Philip Durbin 🚀 (Nov 12 2024 at 16:16):

Well, I'm curious what the spec says. Shouldn't the spec say who's right? :grinning:

luddaniel (Nov 12 2024 at 16:31):

https://www.openarchives.org/OAI/openarchivesprotocol.html#Dates

Datestamps used as values of the optional arguments from and until in the ListIdentifiers and ListRecords requests are encoded using ISO8601 and are expressed in UTC. These arguments are used to specify datestamp-based selective harvesting. These arguments support the "Complete date" and the "Complete date plus hours, minutes and seconds" granularities defined in ISO8601. The legitimate formats are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ. Both arguments must have the same granularity. All repositories must support YYYY-MM-DD. A repository that supports YYYY-MM-DDThh:mm:ssZ should indicate so in the Identify response. A request by a harvester with finer granularity than that supported by a repository must produce an error.
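The rules in that quote are mechanical enough to express in code. Below is a minimal Python sketch of the checks a repository could apply to `from`/`until`. The function names and error strings are illustrative assumptions; only the `badArgument` code comes from the errors quoted earlier. The regexes check syntax only, not calendar validity.

```python
import re
from typing import Optional

# The two granularities OAI-PMH 2.0 allows for from/until.
DAY = "YYYY-MM-DD"
SECONDS = "YYYY-MM-DDThh:mm:ssZ"
DAY_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
SECONDS_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def granularity_of(value: str) -> Optional[str]:
    """Return the granularity a datestamp uses, or None if its syntax is illegal."""
    if DAY_RE.fullmatch(value):
        return DAY
    if SECONDS_RE.fullmatch(value):
        return SECONDS
    return None


def check_range(from_: Optional[str], until: Optional[str], supported: str) -> Optional[str]:
    """Return an error message, or None if the arguments are acceptable.

    `supported` is the granularity declared in the repository's Identify response.
    """
    grans = [granularity_of(v) for v in (from_, until) if v is not None]
    if None in grans:
        return "badArgument: illegal datestamp syntax"
    if len(set(grans)) > 1:
        return "badArgument: from and until must have the same granularity"
    # A request finer than the repository supports must produce an error.
    if supported == DAY and SECONDS in grans:
        return "badArgument: granularity finer than supported"
    return None
```

Under these rules a bare Unix timestamp, like the one the repository in the first post expects, matches neither legal form and is rejected.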

luddaniel (Nov 12 2024 at 16:33):

Now we have the answer :D
I'll get back to the repository admins.

Notification Bot (Nov 12 2024 at 16:33):

luddaniel has marked this topic as resolved.

luddaniel (Nov 14 2024 at 10:05):

My god, I read that so badly :D

All repositories must support YYYY-MM-DD. A repository that supports YYYY-MM-DDThh:mm:ssZ should indicate so in the Identify response.

https://api.nakala.fr/oai2?verb=Identify
<granularity>YYYY-MM-DD</granularity>
https://api.nakala.fr/oai2?verb=ListRecords&metadataPrefix=oai_dc&from=2024-11-07T14%3A40%3A49Z
Error Code badArgument

https://www.seanoe.org/oai/OAIHandler?verb=Identify
<granularity>YYYY-MM-DD</granularity>
https://www.seanoe.org/oai/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&from=2024-11-07T14%3A40%3A49Z
<error code="badArgument">The request includes illegal arguments, is missing required arguments, includes a repeated argument, or values for arguments have an illegal syntax.</error>
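In both cases the granularity to use is the one declared in the Identify response. Here is a minimal sketch of reading it with Python's standard `xml.etree.ElementTree`. The sample response is an abridged, made-up document shaped like the ones above. Falling back to YYYY-MM-DD when the element is missing is a defensive assumption, based on the rule that all repositories must support it.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def declared_granularity(identify_xml: str) -> str:
    """Return the <granularity> value from an OAI-PMH Identify response."""
    root = ET.fromstring(identify_xml)
    node = root.find("oai:Identify/oai:granularity", OAI_NS)
    if node is None or not (node.text or "").strip():
        # Assumption: fall back to the granularity every repository must support.
        return "YYYY-MM-DD"
    return node.text.strip()


# Abridged, made-up Identify response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example</repositoryName>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""

print(declared_granularity(SAMPLE))
```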

So there is an issue in Dataverse: it always uses YYYY-MM-DDThh:mm:ssZ, regardless of the granularity a repository declares. I'll create an issue and look into a fix soon.
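A fix along these lines would format `from=` at the declared granularity instead of always sending seconds. Here is a minimal Python sketch. It is not Dataverse's actual code, which is written in Java. `format_from` is a hypothetical name, and the function expects a timezone-aware datetime.

```python
from datetime import datetime, timezone


def format_from(last_harvest: datetime, granularity: str) -> str:
    """Format a from= value at the granularity the repository declares.

    Anything other than the seconds granularity falls back to YYYY-MM-DD,
    which every repository must support.
    """
    utc = last_harvest.astimezone(timezone.utc)  # OAI-PMH datestamps are UTC
    if granularity == "YYYY-MM-DDThh:mm:ssZ":
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Truncating to the day widens the window, so records changed earlier
    # that same day are fetched again.
    return utc.strftime("%Y-%m-%d")
```

The trade-off of day granularity is some re-fetching. In exchange, the request is one the repository is required to accept.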

luddaniel (Nov 14 2024 at 10:45):

https://github.com/IQSS/dataverse/issues/11020

Notification Bot (Nov 14 2024 at 12:09):

Philip Durbin 🐉 has marked this topic as unresolved.

Philip Durbin 🚀 (Nov 14 2024 at 12:09):

@luddaniel no worries and thanks for creating that issue!


Last updated: Nov 01 2025 at 14:11 UTC