Improved "Related datasets" · dev

Stream: dev

Topic: Improved "Related datasets"

Vera Clemens (Aug 18 2025 at 14:56):

@Philip Durbin 🚀 asked me to open a topic to discuss improving "Related datasets", since it ties into a new project, "Trusted Data".

I previously wrote a feature proposal on replacing the current unstructured string fields for dataset relationships with a more structured approach. I also built a small prototype using the CVOC mechanism, which can be installed in any Dataverse instance for testing. The prototype includes autosuggest for entering related datasets within the same Dataverse, displays related datasets on the dataset page with clickable links, and automatically infers inverse relationships.
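To illustrate the "automatically infers inverse relationships" part, here is a minimal sketch in Python. It is not the prototype's code: the relation names are borrowed from DataCite `relationType` values as an assumption (the prototype may use a different vocabulary), and `infer_inverses` and the example DOIs are hypothetical.

```python
# Hypothetical relation vocabulary; the pairs are borrowed from DataCite
# relationType values and may not match what the prototype actually uses.
INVERSE = {
    "IsSupplementTo": "IsSupplementedBy",
    "IsNewVersionOf": "IsPreviousVersionOf",
    "IsPartOf": "HasPart",
}
# Make the mapping symmetric so lookups work in both directions.
INVERSE.update({v: k for k, v in list(INVERSE.items())})


def infer_inverses(relations):
    """Return the inverse triples that are implied but not yet declared.

    `relations` is a set of (source, relation_type, target) triples.
    """
    inferred = set()
    for source, rel, target in relations:
        inverse = INVERSE.get(rel)
        if inverse is None:
            continue  # unknown relation type: nothing to infer
        triple = (target, inverse, source)
        if triple not in relations:
            inferred.add(triple)
    return inferred


declared = {("doi:10.5072/A", "IsSupplementTo", "doi:10.5072/B")}
print(infer_inverses(declared))
```

With this, a dataset page for B could list A as a related dataset even though only A's metadata declares the relationship.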

Since then, we’ve also integrated and extended this in our custom UI (see screenshots). The code is still somewhat prototype/hack-ish and in internal testing, but our implementation is based on a custom datasetGroupId field. This lets us group related datasets in search results and display all related datasets (including inferred inverse and transitive relationships) on the dataset page. Dataset groups are automatically updated whenever "Related datasets" metadata changes, via a pre-publication workflow.

Screenshot from 2025-08-18 16-45-53.png
(Dataset page)
Screenshot from 2025-08-18 16-45-12.png
(Search result)
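For readers wondering how a group ID can cover inverse and transitive relationships, one common approach is a union-find (disjoint-set) pass over the "Related datasets" links. The sketch below is an assumption about how such grouping could work, not the implementation described above; `assign_group_ids` and the dataset identifiers are hypothetical, and only the name `datasetGroupId` comes from the message.

```python
def assign_group_ids(dataset_ids, related_pairs):
    """Map each dataset to a group ID shared by all transitively related datasets.

    `related_pairs` is an iterable of (dataset_a, dataset_b) links. The link
    direction is ignored, which also covers inverse relationships. The group
    ID is the smallest dataset identifier in the group, so it is stable
    regardless of the order in which links are processed.
    """
    parent = {d: d for d in dataset_ids}

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in related_pairs:
        # Datasets seen only in links still get their own entry.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller identifier as the root (the group ID).
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {d: find(d) for d in parent}


# B relates to A, C relates to B: all three share one group; D stays alone.
groups = assign_group_ids(["A", "B", "C", "D"], [("B", "A"), ("C", "B")])
print(groups)
```

In a setup like the one described, the resulting value would be what gets stored in the custom datasetGroupId field and recomputed whenever the links change.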

Philip Durbin 🚀 (Aug 18 2025 at 15:40):

Hi! Yes, thanks for kicking off this topic! (I know there's a thread going on the Google Group as well.)

Philip Durbin 🚀 (Aug 18 2025 at 15:43):

In short, we are playing around with having a new research object in Dataverse. For now, we're calling it a "review" and the idea is that people would be able to review datasets.

We figured we'd try building on the new-ish "dataset type" functionality and create a new type: datasetType=review. That's what #11747 is about and I have some code I'm playing around with at https://github.com/IQSS/dataverse/compare/11747-review-dataset-type

Philip Durbin 🚀 (Aug 18 2025 at 15:46):

Reviews should refer to datasets so in the code you'll see I'm using the Related Dataset metadata field.

However, very quickly I was reminded that Related Dataset has only a single, unstructured field, as you say. "Primitive" we call it in the code.

So I thought I'd at least (finally) play around with the prototype at https://github.com/vera/related-datasets-cvoc

Philip Durbin 🚀 (Aug 18 2025 at 15:48):

As for actually changing the fields in Related Dataset, yes, we'd need to migrate the existing data somehow: https://github.com/vera/related-datasets-cvoc/issues/4

Philip Durbin 🚀 (Aug 18 2025 at 15:48):

Overall, I like the prototype, especially how you can link to local datasets and any remote URL.

Philip Durbin 🚀 (Aug 18 2025 at 15:49):

I gave @Ceilyn Boyd a demo on Friday.

Bethany Seeger (Aug 18 2025 at 20:45):

I'm very interested in following this. We were talking about something like this today, where we could see harvesting datasets and also having a "review" dataset about them, for reproducibility, etc. This review dataset would essentially augment the original one by containing some files created during curation/review. The goal would be to not copy the original files, just to have these extra ones available for review/reproducibility.

Philip Durbin 🚀 (Aug 18 2025 at 20:46):

Ah, great. @Ceilyn Boyd how do you feel about a Zulip topic on review datasets? Or maybe trusted data generally, since review datasets are a bit of an implementation detail (one of the options).

Notification Bot (Aug 19 2025 at 13:07):

A message was moved from this topic to #dev > review datasets by Philip Durbin 🚀.

Philip Durbin 🚀 (Aug 21 2025 at 12:03):

We created a new topic on review datasets: #dev > review datasets

Philip Durbin 🚀 (Aug 21 2025 at 12:05):

@Vera Clemens meanwhile, I briefly demoed your new and improved related datasets prototype to @Ceilyn Boyd, @Sonia Barbosa, @Julian Gautier, @Ellen K, and @Danny Ebanks yesterday.

Last updated: Jan 09 2026 at 14:18 UTC