Stream: dev

Topic: Improved "Related datasets"


Vera Clemens (Aug 18 2025 at 14:56):

@Philip Durbin 🚀 asked me to open a topic to discuss improving "Related datasets", since it ties into a new Project "Trusted Data".

I previously wrote a feature proposal on replacing the current unstructured string fields for dataset relationships with a more structured approach. I also built a small prototype using the CVOC mechanism, which can be installed in any Dataverse instance for testing. The prototype includes autosuggest for entering related datasets within the same Dataverse, displays related datasets on the dataset page with clickable links, and automatically infers inverse relationships.

Since then, we've also integrated and extended this in our custom UI (see screenshots). The code is still somewhat prototype/hack-ish and in internal testing, but our implementation is based on a custom datasetGroupId field. This lets us group related datasets in search results and display all related datasets (including inferred inverse and transitive relationships) on the dataset page. Dataset groups are automatically updated whenever "Related datasets" metadata changes, via a pre-publication workflow.

Screenshot from 2025-08-18 16-45-53.png
(Dataset page)
Screenshot from 2025-08-18 16-45-12.png
(Search result)
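
To illustrate the grouping idea described in this message, here is a minimal sketch that treats "related dataset" links as undirected edges and assigns every dataset in a connected component the same group id, which covers inverse and transitive relationships in one pass. This is not the code from the prototype or the custom UI, which is not shown in this thread; the function name `group_related`, the example identifiers, and the choice of the smallest identifier as the group id are all assumptions made for illustration.

```python
from collections import defaultdict


def group_related(relations):
    """Group datasets connected by "related dataset" links.

    relations: iterable of (dataset_id, related_dataset_id) pairs.
    Returns a dict mapping a group id to the set of dataset ids in it.
    The group id is the smallest dataset id in the group (an arbitrary,
    hypothetical convention; a real datasetGroupId may look different).
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        # Links are treated as symmetric, so inverse relations are implied.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as root so the group id is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for a, b in relations:
        union(a, b)

    groups = defaultdict(set)
    for dataset_id in list(parent):
        groups[find(dataset_id)].add(dataset_id)
    return dict(groups)


# Hypothetical identifiers, for illustration only.
example = [("doi:A", "doi:B"), ("doi:B", "doi:C"), ("doi:D", "doi:E")]
groups = group_related(example)
```

In this sketch, "doi:A" and "doi:C" end up in the same group even though no direct link between them was entered, which is the transitive case mentioned above.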

Philip Durbin 🚀 (Aug 18 2025 at 15:40):

Hi! Yes, thanks for kicking off this topic! (I know there's a thread on the google group going as well.)

Philip Durbin 🚀 (Aug 18 2025 at 15:43):

In short, we are playing around with having a new research object in Dataverse. For now, we're calling it a "review" and the idea is that people would be able to review datasets.

We figured we'd try building on the new-ish "dataset type" functionality and create a new type: datasetType=review. That's what #11747 is about and I have some code I'm playing around with at https://github.com/IQSS/dataverse/compare/11747-review-dataset-type

Philip Durbin 🚀 (Aug 18 2025 at 15:46):

Reviews should refer to datasets so in the code you'll see I'm using the Related Dataset metadata field.

However, very quickly I was reminded that Related Dataset has only a single, unstructured field, as you say. "Primitive" we call it in the code.

So I thought I'd at least (finally) play around with the prototype at https://github.com/vera/related-datasets-cvoc

Philip Durbin 🚀 (Aug 18 2025 at 15:48):

As far as actually changing the fields in Related Dataset, yes, we'd need to migrate the existing data somehow: https://github.com/vera/related-datasets-cvoc/issues/4

Philip Durbin 🚀 (Aug 18 2025 at 15:48):

Overall, I like the prototype, especially how you can link to local datasets and any remote URL.

Philip Durbin 🚀 (Aug 18 2025 at 15:49):

I gave @Ceilyn Boyd a demo on Friday.

Bethany Seeger (Aug 18 2025 at 20:45):

I'm very interested in following this. We were talking about something like this today, where we could see harvesting data sets, and also having a "review" dataset about them, for reproducibility, etc. This review dataset would essentially augment the original one by containing some files created during curation / review. The goal would be to not copy the original files, just have these extra ones available for review/reproducibility.

Philip Durbin 🚀 (Aug 18 2025 at 20:46):

Ah, great. @Ceilyn Boyd how do you feel about a Zulip topic on review datasets? Or maybe trusted data generally, since review datasets are a bit of an implementation detail (one of the options).

Notification Bot (Aug 19 2025 at 13:07):

A message was moved from this topic to #dev > review datasets by Philip Durbin 🚀.

Philip Durbin 🚀 (Aug 21 2025 at 12:03):

We created a new topic on review datasets: #dev > review datasets @ 💬

Philip Durbin 🚀 (Aug 21 2025 at 12:05):

@Vera Clemens meanwhile, I briefly demo'ed your new and improved related datasets prototype to @Ceilyn Boyd @Sonia Barbosa @Julian Gautier @Ellen K and @Danny Ebanks yesterday.


Last updated: Nov 01 2025 at 14:11 UTC