Stream: community

Topic: getting dataset ID in a timely manner


Péter Pallinger (Oct 10 2024 at 09:53):

I am trying to find a way to get the internal dataset ID for a dataset with many files (170k); the PID is known.
The Search API (/api/search?q=PID) finishes in about two minutes and does not return the dataset ID.
The native API (/api/datasets/:persistentId/?persistentId=PID) times out, apparently because it tries to serialize all file metadata.

Péter Pallinger (Oct 10 2024 at 09:56):

So can the dataset ID be obtained from the Search API somehow, or can the files be excluded from the dataset listing in the native API?

Philip Durbin 🚀 (Oct 10 2024 at 10:21):

I would suggest setting show_entity_ids=true when using the Search API: https://guides.dataverse.org/en/6.4/api/search.html
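
A minimal Python sketch of this suggestion, for illustration only. It builds a Search API URL with show_entity_ids=true and pulls the database ID out of the JSON response. The helper names, the example base URL, the quoting of the PID in q, and the type=dataset filter are assumptions added here, not something stated in the thread; the entity_id and global_id field names follow the Search API guide linked above and should be checked against your installation's version.

```python
import json
from urllib.parse import urlencode


def search_url(base_url, pid):
    """Build a Search API URL that asks for database IDs (hypothetical helper)."""
    params = {
        "q": f'"{pid}"',            # assumption: quote the PID so it is searched as a phrase
        "type": "dataset",          # assumption: restrict results to datasets
        "show_entity_ids": "true",  # the parameter suggested in the thread
    }
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


def entity_id_from_search(response_text, pid):
    """Return the entity_id of the search hit whose global_id equals pid, else None."""
    items = json.loads(response_text).get("data", {}).get("items", [])
    for item in items:
        if item.get("global_id") == pid:
            return item.get("entity_id")
    return None
```

Fetching the URL (for example with urllib.request.urlopen) and passing the response body to entity_id_from_search would give the dataset's database ID, or None if the PID is not among the hits.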

Péter Pallinger (Oct 10 2024 at 11:40):

Thanks, this seems to work, even if it is quite slow.

Philip Durbin 🚀 (Oct 10 2024 at 12:12):

Is something like this faster? https://dataverse.harvard.edu/api/datasets/:persistentId/versions/:latest-published?persistentId=doi:10.7910/DVN/TJCLKP&excludeFiles=false
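
A rough Python sketch of the same request, for anyone scripting it. It builds the versions URL shown above with excludeFiles set explicitly and reads the dataset's database ID from the response. The helper names and the example base URL are made up for illustration, and the assumption that the version JSON carries a datasetId field should be verified against your own installation's output.

```python
import json
from urllib.parse import urlencode


def latest_version_url(base_url, pid, exclude_files=True):
    """Build the :latest-published version URL for a PID (hypothetical helper)."""
    params = {
        "persistentId": pid,
        # excludeFiles=true asks the server to skip the per-file metadata
        "excludeFiles": "true" if exclude_files else "false",
    }
    path = "/api/datasets/:persistentId/versions/:latest-published"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


def dataset_id_from_version(response_text):
    """Return data.datasetId from a version response, or None if absent.

    Assumption: the version JSON exposes the database ID as "datasetId".
    """
    return json.loads(response_text).get("data", {}).get("datasetId")
```

With exclude_files=True the URL ends in excludeFiles=true, which is the variant that would be expected to avoid serializing the file list.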

Péter Pallinger (Oct 10 2024 at 12:26):

I tried with both excludeFiles=true and excludeFiles=false, but neither works, most probably because our Dataverse installation is still on 6.1 :( . But I will keep that in mind in case we upgrade.

Philip Durbin 🚀 (Oct 10 2024 at 13:17):

Oh, right. I forget when it was added. Pretty recently, for the SPA, because it was slow to retrieve that data otherwise.

Philip Durbin 🚀 (Oct 10 2024 at 13:19):

It looks like it was added in 6.2 (renamed at least): https://guides.dataverse.org/en/6.4/api/changelog.html#v6-2

Philip Durbin 🚀 (Oct 10 2024 at 13:20):

Renamed in PR #10191.

Philip Durbin 🚀 (Oct 10 2024 at 13:21):

Anyway, @Péter Pallinger I guess we could add a new API that takes a DOI/PID and simply returns the database id. If you want something like this, please feel free to open an issue.


Last updated: Nov 01 2025 at 14:11 UTC