Stream: python

Topic: Network search


Jan Range (Apr 10 2025 at 20:10):

Now that we have the hub, wouldn’t it be great to have a “hub search” feature that queries each installation's Search API? I think this could be a useful addition to DVCLI or pyDataverse.

Philip Durbin 🚀 (Apr 10 2025 at 20:11):

Yes!

Philip Durbin 🚀 (Apr 10 2025 at 20:11):

I don't really think of this as "hub search" though... maybe "network search"?

Philip Durbin 🚀 (Apr 10 2025 at 20:12):

A while back Jon Crabtree had the idea of a Dataverse installation that doesn't have any data of its own but harvests from all known installations of Dataverse.

Philip Durbin 🚀 (Apr 10 2025 at 20:12):

@Don Sizemore ^^

Philip Durbin 🚀 (Apr 10 2025 at 20:12):

But it sounds like you have a different implementation in mind, Jan.

Jan Range (Apr 10 2025 at 20:13):

Network search is nice!

Philip Durbin 🚀 (Apr 10 2025 at 20:15):

The software used to be called DVN for Dataverse Network. But now that we call it just Dataverse I suppose that frees up the word "network" for us. :big_smile:

Jan Range (Apr 10 2025 at 20:16):

DVCLI already implements a client for the Search API. It could grab the URLs of all installations, or a subset of them, and send the search query to each one. This would be roughly equivalent to the "empty" harvester, but it would not require hosting a separate installation.
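A minimal Python sketch of that fan-out, not taken from DVCLI or pyDataverse. It assumes the caller already has a list of installation base URLs (for example from the hub) and uses the Search API endpoint `GET /api/search` with its `q`, `type` and `per_page` parameters. `fetch_json`, `search_installation` and `network_search` are hypothetical helper names, and the HTTP layer is passed in as a callable so it can be swapped or stubbed.

```python
import json
from typing import Any, Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical type: any callable that GETs a URL and returns decoded JSON.
FetchJson = Callable[[str], Any]


def fetch_json(url: str) -> Any:
    """Plain-stdlib HTTP GET; a real tool would add retries and a User-Agent."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def search_url(base_url: str, query: str, per_page: int = 10) -> str:
    """Build a Search API URL (GET /api/search) for one installation."""
    params = urlencode({"q": query, "type": "dataset", "per_page": per_page})
    return f"{base_url.rstrip('/')}/api/search?{params}"


def search_installation(base_url: str, query: str,
                        fetch: FetchJson = fetch_json) -> List[Dict[str, Any]]:
    """Query one installation and tag every hit with where it came from."""
    body = fetch(search_url(base_url, query))
    items = body.get("data", {}).get("items", [])
    return [{"installation": base_url, **item} for item in items]


def network_search(base_urls: List[str], query: str,
                   fetch: FetchJson = fetch_json) -> List[Dict[str, Any]]:
    """Send the same query to each installation in turn and merge the hits."""
    hits: List[Dict[str, Any]] = []
    for base in base_urls:
        hits.extend(search_installation(base, query, fetch))
    return hits
```

For example, `network_search(["https://demo.dataverse.org"], "climate")` would return the dataset hits from that one installation, each carrying an extra `installation` key so merged results stay attributable.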

Jan Range (Apr 10 2025 at 20:16):

As a CLI command this could be pretty neat. What do you think?

Philip Durbin 🚀 (Apr 10 2025 at 20:17):

I think playing around with client code for now sounds like a good idea. Longer term a service might be nice.

Jan Range (Apr 10 2025 at 20:17):

True, I'll add this once the Hub implementation is in both libraries :dataverse_man:

Philip Durbin 🚀 (Apr 10 2025 at 20:18):

How well would it scale, client side, I wonder. :thinking:

Jan Range (Apr 10 2025 at 20:24):

Roughly a hundred parallel requests should be manageable, at least in Rust. I fear the output would be quite large, though. Exporting to CSV or XLSX could make it more user-friendly.
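On the scaling question, a rough Python sketch (threads rather than Rust async) of running the per-installation searches concurrently on a bounded pool, so one slow or unreachable installation does not sink the whole search, and then flattening the merged hits into CSV. `search_one` is a hypothetical callable that searches a single installation, and the column list is an assumption based on fields the Search API returns for datasets (`name`, `type`, `global_id`, `url`).

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical: searches one installation and returns its hits as dicts.
SearchOne = Callable[[str], List[Dict[str, Any]]]


def parallel_search(base_urls: List[str], search_one: SearchOne,
                    max_workers: int = 32) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """Run search_one for every installation on a bounded thread pool.

    Failures are recorded per installation instead of aborting the run.
    """
    hits: List[Dict[str, Any]] = []
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(search_one, base): base for base in base_urls}
        for future in as_completed(futures):
            base = futures[future]
            try:
                hits.extend(future.result())
            except Exception as exc:  # e.g. timeout, HTTP error, bad JSON
                errors[base] = repr(exc)
    return hits, errors


# Assumed output columns; keys missing from a hit are written as empty cells.
COLUMNS = ["installation", "name", "type", "global_id", "url"]


def to_csv(hits: List[Dict[str, Any]]) -> str:
    """Flatten hits into CSV with a fixed set of columns; extra keys are dropped."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for hit in hits:
        writer.writerow(hit)
    return buf.getvalue()
```

Returning the error map alongside the hits lets a CLI report which installations did not answer instead of silently showing fewer results.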

Philip Durbin 🚀 (Apr 10 2025 at 20:25):

yeah

Philip Durbin 🚀 (Apr 10 2025 at 20:25):

One could also search DataCite: https://commons.datacite.org

Philip Durbin 🚀 (Apr 10 2025 at 20:25):

Or https://datasetsearch.research.google.com

Philip Durbin 🚀 (Apr 10 2025 at 20:25):

But I don't see a way to filter down to just Dataverse installations. :big_smile:

Jan Range (Apr 10 2025 at 20:26):

Good point :grinning:

Philip Durbin 🚀 (Apr 10 2025 at 20:26):

It would be nice to have our own thing.

Philip Durbin 🚀 (Apr 10 2025 at 20:26):

A benefit of being part of the club.

Jan Range (Apr 10 2025 at 20:27):

"The Dataverse Club" - Hosting merge parties :grinning:

Philip Durbin 🚀 (Apr 10 2025 at 20:27):

we have our own DJ even

Sherry Lake (Apr 10 2025 at 20:35):

Philip Durbin ☀️ said:

But I don't see a way to filter down to just Dataverse installations. :big_smile:

We do have the DOI prefixes for Dataverse installations in the crowdsourced spreadsheet. Maybe they are all DataCite and none are Crossref? There is also a way to tell which registration agency minted a DOI prefix (10.18130 is UVa's prefix, which comes from DataCite):

https://doi.org/ra/10.18130
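A small Python sketch of that check: pull the prefix out of each DOI, ask the doi.org RA lookup which registration agency it belongs to, and group the prefixes by agency. It assumes the endpoint returns a JSON list such as `[{"DOI": "10.18130", "RA": "DataCite"}]`; `fetch_json`, `registration_agency` and `group_by_agency` are hypothetical helpers. Handle identifiers are not DOIs, so `doi_prefix` rejects them.

```python
import json
from typing import Any, Callable, Dict, List
from urllib.request import urlopen

# Hypothetical type: any callable that GETs a URL and returns decoded JSON.
FetchJson = Callable[[str], Any]


def fetch_json(url: str) -> Any:
    """Plain-stdlib HTTP GET that decodes a JSON response body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def doi_prefix(doi: str) -> str:
    """Return the prefix of a DOI, e.g. 'doi:10.18130/EXAMPLE' -> '10.18130'."""
    doi = doi.strip()
    for scheme in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(scheme):
            doi = doi[len(scheme):]
            break
    prefix = doi.split("/", 1)[0]
    if not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")  # e.g. a Handle
    return prefix


def registration_agency(prefix: str, fetch: FetchJson = fetch_json) -> str:
    """Ask doi.org which agency a prefix belongs to (response shape assumed)."""
    body = fetch(f"https://doi.org/ra/{prefix}")
    return body[0].get("RA", "unknown") if body else "unknown"


def group_by_agency(prefixes: List[str],
                    fetch: FetchJson = fetch_json) -> Dict[str, List[str]]:
    """Map each agency name (e.g. 'DataCite') to the prefixes it registered."""
    groups: Dict[str, List[str]] = {}
    for prefix in prefixes:
        groups.setdefault(registration_agency(prefix, fetch), []).append(prefix)
    return groups
```

Feeding it the prefixes from the spreadsheet or the hub would answer the "all DataCite, none Crossref?" question directly.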

Philip Durbin 🚀 (Apr 10 2025 at 20:38):

True, and they're in https://hub.dataverse.org as well.

Philip Durbin 🚀 (Apr 10 2025 at 20:38):

And yeah, I don't think there are any Crossref ones in there yet.

Philip Durbin 🚀 (Apr 10 2025 at 20:38):

A few Handles.


Last updated: Nov 01 2025 at 14:11 UTC