getting dataset ID in a timely manner · community

Péter Pallinger (Oct 10 2024 at 09:53):

I am trying to find a way to get the internal dataset ID for a dataset with many files (170k); the PID is known.
The Search API finishes in about two minutes and does not return the dataset ID (/api/search?q=PID).
The native API times out, apparently because it tries to serialize all file metadata (/api/datasets/:persistentId/?persistentId=PID).

Péter Pallinger (Oct 10 2024 at 09:56):

So, can the dataset ID be obtained from the Search API somehow, or can the files be excluded from the dataset listing in the native API?

Philip Durbin 🚀 (Oct 10 2024 at 10:21):

I would suggest setting show_entity_ids=true when using the Search API: https://guides.dataverse.org/en/6.4/api/search.html
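A minimal sketch of that suggestion in Python, for anyone scripting it. The helper names `search_url` and `dataset_id_from_search` are hypothetical, the base URL is a placeholder, and the sketch assumes that the Search API response lists results under `data.items`, each with a `global_id` (the PID) and, when `show_entity_ids=true` is set, an `entity_id` (the database ID). The `type=dataset` filter is an optional addition that was not part of the original query.

```python
from urllib.parse import urlencode


def search_url(base_url, pid):
    """Build a Search API URL that asks for database IDs in the results."""
    query = urlencode({
        "q": pid,                    # same query as in the thread: the PID itself
        "type": "dataset",           # optional: restrict results to datasets
        "show_entity_ids": "true",   # include entity_id in each result
    })
    return f"{base_url.rstrip('/')}/api/search?{query}"


def dataset_id_from_search(payload, pid):
    """Return the entity_id of the result whose global_id matches pid, else None.

    Assumes the response shape {"data": {"items": [{"global_id": ..., "entity_id": ...}]}}.
    """
    for item in payload.get("data", {}).get("items", []):
        if item.get("global_id") == pid:
            return item.get("entity_id")
    return None
```

The JSON body fetched from `search_url(...)` (with any HTTP client) can be passed, once parsed, to `dataset_id_from_search`. Matching on `global_id` guards against the search returning other datasets that merely mention the PID.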

Péter Pallinger (Oct 10 2024 at 11:40):

Thanks, this seems to work, even if it is quite slow.

Philip Durbin 🚀 (Oct 10 2024 at 12:12):

Is something like this faster? https://dataverse.harvard.edu/api/datasets/:persistentId/versions/:latest-published?persistentId=doi:10.7910/DVN/TJCLKP&excludeFiles=false

Péter Pallinger (Oct 10 2024 at 12:26):

I tried both excludeFiles=true and excludeFiles=false, but neither works, most probably because our Dataverse installation is still on 6.1 :(. I will keep that in mind in case we upgrade.

Philip Durbin 🚀 (Oct 10 2024 at 13:17):

Oh, right. I forget when it was added. Pretty recently, for the SPA, because it was slow to retrieve that data otherwise.

Philip Durbin 🚀 (Oct 10 2024 at 13:19):

It looks like it was added in 6.2 (or at least renamed then): https://guides.dataverse.org/en/6.4/api/changelog.html#v6-2

Philip Durbin 🚀 (Oct 10 2024 at 13:20):

Renamed in PR #10191.
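For installations on 6.2 or later, the version endpoint with `excludeFiles=true` could be scripted roughly as follows. This is a sketch under assumptions: the helper names `latest_version_url` and `dataset_id_from_version` are hypothetical, the base URL is a placeholder, and the code assumes the version JSON carries the database ID of the dataset in a `datasetId` field under `data` (the `id` field there would be the version's own ID, not the dataset's).

```python
from urllib.parse import urlencode


def latest_version_url(base_url, pid, exclude_files=True):
    """Build the latest-published version URL, optionally skipping file metadata."""
    query = urlencode({
        "persistentId": pid,
        # excludeFiles=true asks the server to leave out the per-file metadata
        "excludeFiles": "true" if exclude_files else "false",
    })
    return (
        f"{base_url.rstrip('/')}"
        f"/api/datasets/:persistentId/versions/:latest-published?{query}"
    )


def dataset_id_from_version(payload):
    """Return data.datasetId from a parsed version response, or None if absent.

    Assumption: the response shape is {"data": {"id": ..., "datasetId": ...}}.
    """
    return payload.get("data", {}).get("datasetId")
```

On 6.1 the parameter is not recognized, as noted above, so the Search API route remains the fallback there.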

Philip Durbin 🚀 (Oct 10 2024 at 13:21):

Anyway, @Péter Pallinger, I guess we could add a new API that takes a DOI/PID and simply returns the database ID. If you want something like this, please feel free to open an issue.

Last updated: Jan 09 2026 at 14:18 UTC