Stream: dev

Topic: datasetType (software, workflow, etc.)


view this post on Zulip Philip Durbin 🚀 (Jul 12 2024 at 10:45):

For this sprint I picked up #10517 which is part of #10489.

I think of this project as allowing type=software for a dataset (and other types such as type=workflow, etc.)

I will, of course, be looking at Proposal: Supporting Multiple Dataset Types in Dataverse but I'm happy to get the latest thoughts from people who are interested.

view this post on Zulip Philip Durbin 🚀 (Jul 12 2024 at 10:45):

@Oliver Bertuch this is important for HERMES, of course. I'm happy to hear what you think.

view this post on Zulip Philip Durbin 🚀 (Jul 16 2024 at 02:34):

I made a couple commits:

view this post on Zulip Philip Durbin 🚀 (Jul 16 2024 at 13:30):

As of those commits you can create a dataset of type "software". Here's a test: https://github.com/pdurbin/dataverse/blob/b3c2dce4c4ade96dabadadf5139d3868ac6b8854/src/test/java/edu/harvard/iq/dataverse/api/DatasetTypesIT.java
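A minimal sketch of what a "software" dataset payload might look like, for readers who don't want to open the test. The top-level `datasetType` key and the helper name `build_dataset_json` are assumptions for illustration, not confirmed in this thread; the linked `DatasetTypesIT.java` is the authoritative reference. A real dataset also needs more citation fields than the title shown here.

```python
import json


def build_dataset_json(title: str, dataset_type: str = "dataset") -> dict:
    """Build a minimal dataset-creation payload tagged with a dataset type.

    Hypothetical helper: the "datasetType" key is an assumption.
    """
    return {
        # Assumption: the type travels as a top-level key next to the version.
        "datasetType": dataset_type,
        "datasetVersion": {
            "metadataBlocks": {
                "citation": {
                    "fields": [
                        {
                            "typeName": "title",
                            "typeClass": "primitive",
                            "multiple": False,
                            "value": title,
                        }
                    ]
                }
            }
        },
    }


payload = build_dataset_json("pyDataverse", "software")
print(json.dumps(payload, indent=2))
```

Defaulting to "dataset" keeps payloads that omit the type behaving as ordinary datasets, which is also an assumption of this sketch.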

view this post on Zulip Oliver Bertuch (Jul 17 2024 at 18:55):

I had a quick glimpse at your types code today. Didn't have the time to dig deep, but I have some ideas. This is not a PR yet, where should I leave comments and thoughts?

view this post on Zulip Oliver Bertuch (Jul 17 2024 at 18:56):

Should this be a draft PR?

view this post on Zulip Philip Durbin 🚀 (Jul 17 2024 at 18:59):

I mean, I can make a draft PR if that would help you capture some thoughts.

view this post on Zulip Philip Durbin 🚀 (Jul 17 2024 at 20:12):

Here you go, a draft PR: dataset types (software, workflow, etc.) - initial support #10694

view this post on Zulip Philip Durbin 🚀 (Jul 17 2024 at 20:17):

(From a new branch I pushed to IQSS instead of pdurbin, by the way.)

view this post on Zulip Philip Durbin 🚀 (Jul 23 2024 at 20:04):

Thanks all, for listening to me talk about and demo dataset types so far at tech hours.

view this post on Zulip Philip Durbin 🚀 (Jul 26 2024 at 20:40):

As discussed, I stopped hard-coding the dataset types in an enum and moved them into the database. Please see https://github.com/IQSS/dataverse/pull/10694/commits/c8adf259ec9684c7db3d6a2a324973e0407cdfd5

view this post on Zulip Philip Durbin 🚀 (Jul 29 2024 at 21:18):

Ok as of https://github.com/IQSS/dataverse/pull/10694/commits/8593d328ea5fbb94c1435b0ff4651199dcf7abef we are sending "Dataset", "Software", or "Workflow" to DataCite (using this XML: <resourceType resourceTypeGeneral="${resourceTypeGeneral}"/>).

view this post on Zulip Philip Durbin 🚀 (Jul 29 2024 at 21:18):

Here's a software example:

Screenshot-2024-07-29-at-5.16.31PM.png

view this post on Zulip Philip Durbin 🚀 (Jul 29 2024 at 21:19):

(Notice that it says "Software" next to the name, pyDataverse.)
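The DataCite mapping from the Jul 29 post can be sketched roughly as follows. The template string and the three values ("Dataset", "Software", "Workflow") come from the thread; the lowercase type names, the function name `resource_type_xml`, and the fallback to "Dataset" for unknown types are assumptions made for illustration, not a description of the actual Dataverse code.

```python
from string import Template

# Template text as quoted in the thread.
RESOURCE_TYPE_TEMPLATE = Template(
    '<resourceType resourceTypeGeneral="${resourceTypeGeneral}"/>'
)

# Assumed mapping from dataset type name to the value sent to DataCite.
RESOURCE_TYPE_GENERAL = {
    "dataset": "Dataset",
    "software": "Software",
    "workflow": "Workflow",
}


def resource_type_xml(dataset_type: str) -> str:
    """Render the resourceType element for a dataset type (hypothetical helper)."""
    # Assumption: unknown types fall back to the generic "Dataset".
    general = RESOURCE_TYPE_GENERAL.get(dataset_type.lower(), "Dataset")
    return RESOURCE_TYPE_TEMPLATE.substitute(resourceTypeGeneral=general)


print(resource_type_xml("software"))
```

Because the values come from a fixed mapping rather than user input, the sketch skips XML attribute escaping.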

view this post on Zulip Philip Durbin 🚀 (Sep 06 2024 at 14:02):

#10694 has been merged! :tada:

view this post on Zulip Philip Durbin 🚀 (Sep 06 2024 at 17:56):

The next issue is this one: Implement datasetType metadata block support (at global level) #10519

"Designate specific metadata fields for a specific datasetType"

view this post on Zulip Philip Durbin 🚀 (Sep 06 2024 at 17:56):

@Oliver Bertuch is this something you want?

view this post on Zulip Philip Durbin 🚀 (Sep 06 2024 at 17:57):

I'm having trouble imagining what the API test would look like. :thinking:

view this post on Zulip Philip Durbin 🚀 (Sep 09 2024 at 14:00):

I'm reluctant to pick it up because I'm not sure what it entails.

Also, it seems like it will make a lot of changes to the old JSF UI, which we will be throwing away some day. :trash_can:

view this post on Zulip Philip Durbin 🚀 (Sep 09 2024 at 16:05):

At standup I argued that #10519 seems to involve a lot of JSF work and should be deferred. We're planning on talking about it more during tech hours tomorrow.

view this post on Zulip Philip Durbin 🚀 (Dec 09 2024 at 12:36):

I made this pull request a little over a month ago. I'm happy to get some (more) feedback on it:

allow links between dataset types and metadata blocks #11001

view this post on Zulip Philip Durbin 🚀 (Feb 05 2025 at 18:58):

Merged! allow links between dataset types and metadata blocks #11001 :tada:

view this post on Zulip Philip Durbin 🚀 (Feb 26 2025 at 14:43):

@Oliver Bertuch @Dorothea Iglezakis @Jan Range are you still interested in this feature, in general, the idea of designating a dataset as a "software" dataset? With different metadata blocks (e.g. CodeMeta) and different licenses (Apache, etc.)? Sending "software" as the type to DataCite?

view this post on Zulip Jan Range (Feb 26 2025 at 14:43):

I would definitely +1 this :raised_hands:

view this post on Zulip Philip Durbin 🚀 (Feb 26 2025 at 14:45):

Ok, great. I've worked on the last two PRs (both merged now) and there's another to work on but I'm feeling a bit disconnected from actual users. I hope we're building the right thing!

view this post on Zulip Philip Durbin 🚀 (Feb 26 2025 at 14:45):

This is the next thing to work on: Implement list of allowed datasetType licenses (at global level) #10520

view this post on Zulip Jan Range (Feb 26 2025 at 14:51):

Happy to test once #10520 is merged :smile:

view this post on Zulip Philip Durbin 🚀 (Feb 26 2025 at 15:09):

The thing is that these features are API-only. We don't plan to modify the old JSF UI code. We'll be making use of the features in the new React UI but only after the new React UI gets to a certain level of readiness.

view this post on Zulip Philip Durbin 🚀 (Feb 26 2025 at 15:11):

But for this feature about licenses, for example, how should it work? If you set datasetType=software, what would you like to happen with regard to licenses?

view this post on Zulip Jan Range (Feb 26 2025 at 19:23):

I am fine with an API-first solution. Ideally, the transfer of software to a dataset happens via some GitHub/Lab Action.

view this post on Zulip Jan Range (Feb 26 2025 at 19:24):

In terms of licenses, it would be great if it would switch to the subset of software licenses. I will ping Doro as she is more into the license topic and probably has some better suggestions/wishes.

view this post on Zulip Philip Durbin 🚀 (Feb 26 2025 at 19:30):

Hmm, a subset of software licenses... not all software licenses? :big_smile:

view this post on Zulip Philip Durbin 🚀 (Feb 26 2025 at 19:30):

Anyway, yes, please ping Doro!

view this post on Zulip Jan Range (Feb 26 2025 at 19:51):

Philip Durbin ☃️ wrote:

Hmm, a subset of software licenses... not all software licenses? :big_smile:

Ah, fingers faster than the brain :grinning: Meant the subset of licenses meant for software :woozy_face:

view this post on Zulip Philip Durbin 🚀 (Feb 26 2025 at 20:38):

Ok, makes sense!

view this post on Zulip Dorothea Iglezakis (Feb 28 2025 at 08:52):

Yes, this sounds perfectly fine. In practice, we have - at the moment - three different cases: datasets that consist only of data, datasets that consist only of software code, and datasets that consist partly of code, partly of data. For the first two cases, licensing is fairly straightforward, and it would be perfect if only data licenses (CC licenses, ODbL, ...) were available for data datasets and only software licenses (MIT, GPL, BSD, Apache, ...) for software datasets. It gets trickier for the mixed datasets. Sometimes the dataset is mainly code with a bit of data, or mainly data with a bit of code, but in most cases we handle the license of a mixed dataset with a custom license, like darus-1854 or darus-2134. Other interesting cases are software that consists of other software parts licensed under a different license. The problem with these custom terms is that they are not machine-readable.

For these cases, it would be perfect if more than one license could be chosen and it were possible to specify which part of the dataset each one covers (for example, everything in folder xyz is licensed under license A, the rest under license B). I know that there is also discussion about licensing at the file level, but this could get complicated if there are a lot of files in a dataset.
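The two wishes above (a license list restricted by dataset type, and per-folder licenses for mixed datasets) could be sketched like this. It is an illustration only, not how Dataverse implements or plans to implement #10520. The license families come from the post, but the specific identifiers, the type names, and both function names are assumptions.

```python
from pathlib import PurePosixPath

# Assumed per-type allow-list; identifiers are illustrative examples of the
# families named in the post (CC, ODbL vs. MIT, GPL, BSD, Apache).
ALLOWED_LICENSES = {
    "dataset": {"CC0 1.0", "CC BY 4.0", "ODbL 1.0"},
    "software": {"MIT", "GPL-3.0", "BSD-3-Clause", "Apache-2.0"},
}


def allowed_licenses(dataset_type: str) -> list[str]:
    """Return the licenses offered for a dataset type (empty if unknown)."""
    return sorted(ALLOWED_LICENSES.get(dataset_type, set()))


def license_for_file(path: str, folder_licenses: dict[str, str], default: str) -> str:
    """Pick the license of the deepest folder containing `path`, else `default`."""
    file_path = PurePosixPath(path)
    best_depth, best_license = -1, default
    for folder, license_name in folder_licenses.items():
        folder_path = PurePosixPath(folder)
        depth = len(folder_path.parts)
        # A folder applies if it is an ancestor of the file; deeper wins.
        if folder_path in file_path.parents and depth > best_depth:
            best_depth, best_license = depth, license_name
    return best_license


# Mixed dataset: code under "code/" is MIT, everything else CC BY 4.0.
folders = {"code": "MIT"}
print(allowed_licenses("software"))
print(license_for_file("code/run.py", folders, "CC BY 4.0"))
print(license_for_file("data/results.csv", folders, "CC BY 4.0"))
```

Resolving by folder rather than by file keeps the license metadata small even when a dataset holds many files, which is the concern raised about file-level licensing.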

view this post on Zulip Philip Durbin 🚀 (Feb 28 2025 at 14:19):

@Dorothea Iglezakis thanks! I'm glad two of the three use cases are straightforward! :sweat_smile:

view this post on Zulip Philip Durbin 🚀 (Feb 28 2025 at 14:22):

I just linked to your post in a comment in the design doc. Thanks again. :heart:

view this post on Zulip Philip Durbin 🚀 (Jun 06 2025 at 15:10):

In another topic I'm musing about what should go in the <head> for non-datasets. :thinking:

view this post on Zulip Philip Durbin 🚀 (Jun 24 2025 at 15:30):

@Oliver Bertuch I mentioned #11589 in standup this morning. Please feel free to add a 6.7 milestone if you really want it for that release.

view this post on Zulip Oliver Bertuch (Jun 25 2025 at 15:47):

@Philip Durbin done! I added a test and some doc tweaks as requested.

view this post on Zulip Philip Durbin 🚀 (Jun 26 2025 at 11:03):

Merged! Thanks!


Last updated: Nov 01 2025 at 14:11 UTC