Hello .. me again with a probably stupid question .. I admit I set up our dataverse harvesting server when I first installed the site and really haven't touched it since. I only have the default (no-name) set that doesn't have a "setspec" defined, which should be all published datasets, right? Anyway, I've noticed recently that the feed is missing some datasets .. specifically there should be 83 and there are only 79. I recently upgraded to v6.5 and did a full index of the site at that time. Should I delete/recreate the set and/or create a set that is explicitly defined? Or, any ideas why else datasets would be missing or how I can fix it?
when i try to re-run the default export i see these messages in the log (and it just hangs forever and never completes):
[[setService, findAllNamedSets; query: select object(o) from OAISet as o where o.spec != '' order by o.spec]]
[[ 0 results found.]]
What have I completely messed up?
Hmm, yep, should be all. I'm not sure why a few are missing. I'm also not sure if Solr is involved or not. :thinking:
Yeah, it does look like Solr is involved.
Do you see something like this in your logs?
"set query expanded to " + datasetIds.size() + " datasets."
I think harvesting might have its own log, apart from server.log I mean.
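Something like this should turn it up, assuming a standard Payara layout (the log path here is just the default, adjust for your install):
# default Payara domain log location - an assumption, adjust as needed
grep "set query expanded to" /usr/local/payara6/glassfish/domains/domain1/logs/server.log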
Actually, I take it back. I don't think Solr is involved for the default set.
if (!oaiSet.isDefaultSet()) {
datasetIds = expandSetQuery(query);
exportLogger.info("set query expanded to " + datasetIds.size() + " datasets.");
} else {
// The default set includes all the local, published datasets.
// findAllLocalDatasetIds() finds the ids of all the local datasets -
// including the unpublished drafts and deaccessioned ones.
// Those will be filtered out further down the line.
datasetIds = datasetService.findAllLocalDatasetIds();
databaseLookup = true;
}
yes, i do see that .. and it looked like it was recreating the missing datasets, and maybe i just didn't wait long enough ..
.. i also created a new set and used the example for pulling the identifier .. and it looked like it was going to export 83 records as well ..
.. so wonder why the original set stopped updating?
Not sure. Strange. Please feel free to open an issue if you think it's a bug.
ohhh okay, so the new set i created says "83 datasets (79 records exported, 0 marked as deleted)" .. so it is not exporting 4 for some reason
the actual OAI log just says "Calling OAI Record Service to re-export 93 datasets."
Weird. I wonder if we'll be able to reproduce it, though. Is it particular to your database? :thinking:
93? that would include the unpublished ones i think
i just don't know how to figure out why it isn't exporting those 4 .. they are from various time periods and don't seem to have weird formatting .. although there are some differences in all of them
What if you make a set with one of the missing datasets? Does it work?
trying now ..
it says "1 dataset (0 records exported, 0 marked as deleted)"
it finds the dataset but can't export those particular ones for some reason
And if you create a set for a working dataset? Does it say 1 exported?
yes it worked
ok, so something is wrong with those few, hmm :thinking:
anything in server.log?
the only thing that i'm seeing in server.log are messages like this:
[2025-02-11T20:09:02.324+0000] [Payara 6.2024.7] [INFO] [] [edu.harvard.iq.dataverse.harvest.server.OAISetServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1739304542324] [levelValue: 800] [[
setService, findAllNamedSets; query: select object(o) from OAISet as o where o.spec != '' order by o.spec]]
[2025-02-11T20:09:02.325+0000] [Payara 6.2024.7] [INFO] [] [edu.harvard.iq.dataverse.harvest.server.OAISetServiceBean] [tid: _ThreadID=93 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1739304542325] [levelValue: 800] [[
3 results found.]]
but that looks like the "all published" set and not sure why it just says "3 results found"??
Hi,
Yes, "83 datasets (79 records exported ..." would almost certainly indicate that the search query has found 83 published datasets, but only 79 of them have been successfully exported, so, the remaining 4 were not included in the OAI set advertised to the clients.
You already know the actual DOIs of the 4 missing/unexported datasets, correct? (ok, it looks like you know at least one - the one you've tried creating a set with...)
First step would be to identify which of the metadata formats is failing to export. (So, yes, this is a limitation of our export system - it's kind of binary/all-or-nothing; it's enough for just one format out of 10+ to fail for the dataset to end up being "unexported". Which is a bit counter-productive, especially for the purposes of OAI - since the 3 formats OAI actually needs are somewhat less likely to fail...)
Take a look at the storage folder for the dataset in question, on the filesystem or on S3, whichever is the case, and look for files with names like export_*.cached - for example, export_oai_dc.cached etc. - and see which ones are missing compared to one of the successfully exported datasets.
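For a filesystem store it would look something like this (the storage root is a placeholder - it depends on how your file store is configured - and the authority/identifier path mapping is an assumption):
# list the cached export files for the problem dataset vs. a known-good one
ls /path/to/files/<authority>/<identifier>/export_*.cached
ls /path/to/files/<authority>/<good-identifier>/export_*.cached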
Once you see which formats are missing, try exporting them individually via
curl "http://localhost:8080/api/datasets/export?exporter=xxx&persistentId=yyy"
while watching the server log; and hopefully there will be some errors/exceptions that will tell us what Dataverse doesn't like in the metadata. (during a bulk export of an OAI set error messages are suppressed, I believe).
This is pretty time-consuming, unfortunately. (but maybe someone can chime in with something easier in mind)
for one of the datasets, in the storage location I see "export_Datacite.cache", "export_oai_dc.cache" and "export_OAI_ORE.cache" ..
I tried the curl command for all three exporters for a dataset id that did export and for one that did not, and they all seemed to work .. nothing appears in server.log
these curl commands:
curl "http://localhost:8080/api/datasets/export?exporter=oai_dc&persistentId=doi:<ours>/C1CWX9"
curl "http://localhost:8080/api/datasets/export?exporter=OAI_ORE&persistentId=doi:<ours>/C1CWX9"
curl "http://localhost:8080/api/datasets/export?exporter=Datacite&persistentId=doi:<ours>/C1CWX9"
all of those seemed to generate results for the non-exported dataset
looking at one of the datasets that is working, it has more "export" files in the storage location .. ie: ddi, dcterms, dc, schema.org, etc
Yes, so, the next step should be to try the export API for the formats that are NOT there/not cached.
Note that if there are a few formats that are missing, it does NOT mean that all of them have failed to export; it may just mean that the exporter stopped once it encountered the first format it wasn't able to produce.
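A quick way to hit all of them in one go - just a sketch, using the standard exporter names from the guide (plus whatever extra ones you have configured), and the same placeholder DOI as above:
# loop over the exporters; a failing format should come back as an error instead of metadata
for fmt in ddi oai_ddi dcterms oai_dc schema.org OAI_ORE Datacite oai_datacite dataverse_json; do
  echo "== $fmt =="
  curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:8080/api/datasets/export?exporter=$fmt&persistentId=doi:<ours>/C1CWX9"
done
Anything that doesn't come back as a 200 is worth re-running without -o /dev/null while watching server.log.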
so try each of these?:
https://guides.dataverse.org/en/latest/api/native-api.html#export-metadata-of-a-dataset-in-various-formats
the oai_datacite one failed (but the Datacite one worked)
and this in the log:
IllegalStateException caught when exporting oai_datacite for dataset doi:10.48349/ASU/C1CWX9; may or may not be due to a mismatch between an exporter code and a metadata block update.
how do i fix it? :sweat_smile:
we don't have any custom metadata blocks .. other than the computational workflow one that I accidentally installed
"may or may not"
just tell me!
haha yea!
Yes! Except you _may_ have more metadata formats configured, on top of the 9 listed in the guide. (for example, we also have "croissant" added). So, comparing to what's cached in one of the known exported datasets directories may be the safest...
OK, sounds like you have already found one that is failing. The error message in the log is not super helpful, unfortunately... if you haven't yet, could you please try all the remaining formats too, and _maybe_ we'll see something more interesting in the log?
and tell me what it is and how to fix it :smile:
The comment above the error is not very promising: https://github.com/IQSS/dataverse/blob/v6.5/src/main/java/edu/harvard/iq/dataverse/export/ExportService.java#L340
i tried all of these: ddi, oai_ddi, dcterms, oai_dc, schema.org, OAI_ORE, Datacite, oai_datacite and dataverse_json
and the only one that failed was the oai_datacite one
Which dataset is failing? Can you please give us the doi or landing page?
... Another (also very time-consuming) way of going about it is to open the full metadata edit form for this dataset next to the one for one of the known "good" datasets, and then stare at the two looking for any visible differences. Some obscure field where you have 2 populated entries in the former, but only one in the latter, etc.
(I'm wondering if this fix will help: Openaire fix for multiple productionPlaces #11194 )
Finally, since the formats that the OAI server _actually needs_ have been exported successfully, let me think of a way to cheat and mark the dataset as "exported", for the purposes of adding it to the OAI set.
okay, i will start working on comparing the metadata .. i was doing that and didn't really see anything but i understand better now what i am looking for
Multiple productionPlace:
{
  "typeName": "productionPlace",
  "multiple": true,
  "typeClass": "primitive",
  "value": [
    "Phoenix, Arizona, USA",
    "Los Angeles, California, USA",
    "Santa Barbara, California, USA",
    "Austin, Texas, USA"
  ]
},
At https://dataverse.asu.edu/dataset.xhtml?persistentId=doi:10.48349/ASU/C1CWX9
thank you both!
So for that dataset, #11194 should help. (Thank you, @Florian Fritze !)
yay! okay have to go to a meeting then will read/try it
The fix in the PR above will only be added in 6.6.
If you are willing to resort to hacks, you could try and set the lastexporttime to some time today in the dataset table for this dataset, and re-export the OAI set again. (I would only try that with datasets for which at least the oai_dc format exports successfully). May or may not work, no promises. :)
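Roughly like this - a sketch only, assuming the usual dataset/dvobject layout, so please double-check the table and column names against your actual schema first:
-- find the dataset through its persistent id and bump its lastexporttime;
-- the authority/identifier split shown here is an assumption about how your DOIs are stored
UPDATE dataset SET lastexporttime = NOW()
 WHERE id = (SELECT id FROM dvobject
             WHERE protocol = 'doi' AND authority = '10.48349' AND identifier = 'ASU/C1CWX9');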
oh got it .. i will try updating the table .. that sounds like a good solution for now if it works!
looking at the rest of the datasets that won't export to see if it is the same thing
they do all have multiple production locations
the db hack worked at least for that one dataset .. trying the rest
THANK YOU!! :tada:
well, it worked for the new oai set that i created with the persistent id set, but not for the default set .. it still says 79 .. wonder why? we will probably need to change our primo feed to point to the new one i guess
and it worked for 2 of the datasets but not the other 2 .. :confused:
when i point to one of the "fixed" ones it says it has been deleted
https://dataverse.asu.edu/oai?verb=GetRecord&identifier=doi%3A10.48349%2FASU%2FC1CWX9&metadataPrefix=oai_dc
I'm sorry I sent you on this hacky path!
Let's try and erase and rebuild the default set from scratch; but if these datasets are still left out of it after that, I think the sensible thing to do will be to wait for 6.6 to fix it properly
So, first, please erase all the records in the default set:
DELETE FROM oairecord WHERE setname = '';
(please be super careful! deleting things from the database directly is inherently risky...)
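A read-only sanity check first wouldn't hurt, e.g.:
-- should roughly match the 79 records the default set is currently reporting
SELECT COUNT(*) FROM oairecord WHERE setname = '';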
After that, the control panel should be showing "no active records" for the default set. Then re-export the set, and see what happens.
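To double-check from the client side, something like
curl "https://dataverse.asu.edu/oai?verb=ListIdentifiers&metadataPrefix=oai_dc"
(with no set parameter) should list the records in the default set, if I remember right.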
ha no worries I always learn things that I didn't know doing hacky things! :big_smile: I would like to clean up that default set anyway. Will do this morning.
well, it's better, it is now showing 81 rows .. so it included 2 of the ones that were missing before but 2 are still missing .. I'll figure out which ones are still missing and look at the metadata and see if there is anything else that could be causing this .. otherwise we will wait for the fix! :smile:
thanks so much for your help! :glowing_star:
Well, we are 2 datasets better off than we were yesterday, so I'll call it progress :)
ha yes for sure!
The 2 datasets still not showing are:
https://dataverse.asu.edu/dataset.xhtml?persistentId=doi:10.48349/ASU/7DCWIK
https://dataverse.asu.edu/dataset.xhtml?persistentId=doi:10.48349/ASU/C1CWX9
They both have multiple production locations. I tried changing the lastexporttime for both to an earlier date (most of them show november '24) .. but it didn't make a difference, still didn't show in the feed after re-export.
I guess we could try deleting all but one of the production locations until the fix is released and see if that works, and then add them back after the fix.
Should work, unless that exporter is failing for more reasons than just Production Place.
And maybe double-check the lastexporttime in the dataset table for these 2 - that it is actually something later than the releasetime on their latest datasetversions?
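Something along these lines - again, column names per my recollection of the schema, so verify before trusting it:
-- compare lastexporttime against the latest releasetime; the record only counts
-- as exported if lastexporttime is the later of the two
SELECT d.id, d.lastexporttime, max(v.releasetime) AS releasetime
  FROM dataset d JOIN datasetversion v ON v.dataset_id = d.id
 WHERE d.id IN (<the two dataset ids>)
 GROUP BY d.id, d.lastexporttime;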
ohhhhhh that was it! The release date on those 2 was blank again (maybe because I tried to set it earlier?) .. idk .. but i made sure the lastexportdate was later than the release date, and now everything is showing as exported! I swear, yesterday when i did this I set them all to yesterday's date and it didn't work .. but for some reason those had to be set again. THANK YOU! We are good for now but will wait for the fix!