Stream: community

Topic: Datasets - Migration - License JSON


view this post on Zulip Julien C (Sep 28 2023 at 07:34):

Hello Dataverse community,

I have a question about how to migrate datasets with the correct licensing terms.

The documentation (version 5.10) says that you must specify the license you want to transfer in the JSON file,
but the API command does not take it into account (https://guides.dataverse.org/en/5.10/api/native-api.html#id41)

I want to integrate this, but the "license" field under "datasetVersion" in the JSON doesn't seem to work.
I tested it on the same Dataverse installation with the same version 5.10.

The other terms fields work, but not this one...
Is there an explanation for this, or a correct way to set this field in the JSON file?
Note: the field 'termsOfUse' is either absent or set to null (it doesn't change the result)

Thanks in advance,
Best regards

view this post on Zulip Julien C (Sep 28 2023 at 07:36):

One important clarification: the controlled vocabulary field for 'license' is the same on the target server.

view this post on Zulip Philip Durbin 🚀 (Sep 28 2023 at 11:07):

Quite possibly you have found a bug.

view this post on Zulip Philip Durbin 🚀 (Sep 28 2023 at 11:07):

What license are you using here?

    "license": {
      "name": "CC0 1.0",
      "uri": "http://creativecommons.org/publicdomain/zero/1.0"
    },

view this post on Zulip Julien C (Sep 29 2023 at 06:16):

Hello Philip,
The license concerned is:
"license": {"name": "etalab 2.0", "uri": "https://spdx.org/licenses/etalab-2.0.html"}
Same license on source server and target server.
I can't get it to work.

view this post on Zulip Philip Durbin 🚀 (Sep 29 2023 at 21:07):

Sorry, sorry, busy with #9919. If I don't look at this early next week, please remind me! Have a good weekend!

view this post on Zulip Julien C (Oct 02 2023 at 07:53):

Hello Philip,
No problem, I checked it several times and tested it, same results every time.
The licenses are the same on both sides (origin and target server with the same API version 5.10).
This is annoying because other terms and conditions are taken into account.
Maybe I'm defining things the wrong way in JSON for licensing.

view this post on Zulip Philip Durbin 🚀 (Oct 10 2023 at 14:00):

@Sherry Lake does this sound reminiscent of problems you were having updating licenses/terms via API?

view this post on Zulip Sherry Lake (Oct 10 2023 at 23:35):

Could be... but my question about this particular license is: has it been defined ("added/configured") before it is used in the migration JSON?

view this post on Zulip Philip Durbin 🚀 (Oct 11 2023 at 00:30):

I had lunch with @Dimitri Szabo today and yes, he confirmed that the license should be there on the server side.

view this post on Zulip Julien C (Oct 13 2023 at 15:03):

Hello, thank you for your answers.
I specified that my test was on a duplicate server (same licenses on both sides) with the correct license indicated in the JSON file, and it doesn't work: the API request returned OK, but the license was not selected.
Am I missing something for this to work correctly?

view this post on Zulip Philip Durbin 🚀 (Oct 13 2023 at 20:57):

Jim thinks we should try the Migration API. However, it requires an API token. Can you please send me the JSON from the "get dataset metadata" curl command at https://guides.dataverse.org/en/5.10/developers/dataset-semantic-metadata-api.html ?
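
The "get dataset metadata" call referenced here can also be made from Python. This is a minimal sketch, assuming the `requests` library and the endpoint and headers shown in the linked guide. The server URL, dataset ID, and token are placeholders, and both helper names are hypothetical:

```python
def build_metadata_request(server_url, dataset_id, api_token):
    """Build the URL and headers for a JSON-LD 'get dataset metadata' call.

    Hypothetical helper: the endpoint path and header names follow the
    Dataset Semantic Metadata API guide; all argument values are placeholders.
    """
    url = f"{server_url.rstrip('/')}/api/datasets/{dataset_id}/metadata"
    headers = {
        "X-Dataverse-key": api_token,      # API token of a user allowed to read the dataset
        "Accept": "application/ld+json",   # ask for the JSON-LD representation
    }
    return url, headers


def fetch_metadata(url, headers):
    """Send the GET request and return the decoded JSON response."""
    import requests  # third-party; imported here so the builder works without it

    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()
```

Calling `fetch_metadata(*build_metadata_request(server, dataset_id, token))` against a real installation should return the dataset's JSON-LD, including whatever license information that server exposes.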

view this post on Zulip Philip Durbin 🚀 (Oct 17 2023 at 15:48):

@Julien C did you get a chance to try this?

view this post on Zulip Philip Durbin 🚀 (Oct 18 2023 at 14:01):

@Julien C and did you see this? A new Python script for migrating datasets from one installation to another: https://groups.google.com/g/dataverse-community/c/PfKIZFxFZhE/m/RBC0j4q1HQAJ

view this post on Zulip Julien C (Oct 20 2023 at 09:31):

Hello and thanks for sharing (I see some SQL on the dvobject table). This is not something I can do on the target server.
Maybe it can be done in the future, but I need confirmation that this type of modification directly in SQL is OK for the integrity of the whole database.

view this post on Zulip Julien C (Oct 20 2023 at 09:37):

https://guides.dataverse.org/en/5.10/developers/dataset-semantic-metadata-api.html
I will run some tests. Thanks. If it works, it will need to be done separately, with one more step for the license.
My initial attempt was to integrate this with all the other available metadata (not working for the license field, but all the other fields on the Terms tab work):
{ "datasetVersion": {"license": {"name": "etalab 2.0", "uri": "https://spdx.org/licenses/etalab-2.0.html"}, ...}}

view this post on Zulip Julien C (Oct 20 2023 at 09:49):

Philip Durbin said:

Julien C and did you see this? A new Python script for migrating datasets from one installation to another: https://groups.google.com/g/dataverse-community/c/PfKIZFxFZhE/m/RBC0j4q1HQAJ

There are many things to consider.
Some datasets are published and only at version 1.0, others have several versions and possibly a draft, and some exist only as a draft with no publication yet.
Getting a perfect image of the installation is possible (I'm pretty close to getting it done).
But there remains the problem of DOIs for datasets and DOIs for datafiles, and of changing the publication date (versions included) and the upload date for resources. It's a lot to consider. Dataset versions are more complicated because you need DOI validation on the first version before you can update metadata for the dataset itself and for the datafiles in each version (with additions and deletions).
My main problem is handling all the DOI migration correctly. If the publication date and upload date can be set with SQL, OK, but I need these too before sending updates to the DOI provider for the first version of a dataset.

I forgot to mention that all my work is actually scripted (Python in an ETL). So no manual editing is possible, especially when you have 15,000 published datafiles for a total of 22,000, around 100 GB of data.

view this post on Zulip Philip Durbin 🚀 (Oct 20 2023 at 10:53):

Right, it sounds like keeping the DOIs (for both datasets and files) is super important for you, which makes sense.

view this post on Zulip Philip Durbin 🚀 (Oct 20 2023 at 10:53):

Please let me know how the testing with the Migration API goes.

view this post on Zulip Philip Durbin 🚀 (Oct 20 2023 at 11:09):

At a high level, especially for the SQL question (which I can't answer), I'll be looking for help from the team via the ticket you opened: https://help.hmdc.harvard.edu/Ticket/Display.html?id=349765

view this post on Zulip Philip Durbin 🚀 (Oct 20 2023 at 18:17):

@Julien C I just checked with our tech lead and either he or someone else will reach out via that ticket on Monday.

view this post on Zulip Julien C (Oct 23 2023 at 09:18):

SOLVED
With the Dataset Semantic Metadata API, modifying the license assignment works (tested and approved).
https://guides.dataverse.org/en/5.12/developers/dataset-semantic-metadata-api.html

The only important thing is that the target server must already have the equivalent license configured so that it can be selected.
In Python with the requests library it's simple: the JSON-LD must be passed in the data parameter of the PUT request.

Maybe it will help other people in the same situation.
Regards
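
The approach described above can be sketched roughly as follows. This is not the script used in the thread, only a minimal illustration under several assumptions: the JSON-LD key `http://schema.org/license` with the license URI as its value, the `replace=true` query parameter (assumed to be needed because a dataset holds a single license), and the helper names are all hypothetical and should be checked against the Dataset Semantic Metadata API guide for your Dataverse version. The server URL, dataset ID, and token are placeholders.

```python
import json


def build_license_update(server_url, dataset_id, api_token, license_uri):
    """Build URL, headers, and JSON-LD body for a license update via PUT.

    Assumptions (verify against your installation's guide):
    - the license is addressed by the term URI "http://schema.org/license"
    - replace=true is required to overwrite the existing single-valued license
    - license_uri matches a license already configured on the target server
    """
    url = f"{server_url.rstrip('/')}/api/datasets/{dataset_id}/metadata?replace=true"
    headers = {
        "X-Dataverse-key": api_token,
        "Content-Type": "application/ld+json",
    }
    body = json.dumps({"http://schema.org/license": license_uri})
    return url, headers, body


def send_license_update(url, headers, body):
    """Send the PUT request; the JSON-LD string goes in the `data` parameter."""
    import requests  # third-party; imported here so the builder works without it

    response = requests.put(url, headers=headers, data=body, timeout=30)
    response.raise_for_status()
    return response.json()
```

For the license discussed in this thread, `license_uri` would be `https://spdx.org/licenses/etalab-2.0.html`. Keeping the request construction separate from the network call makes it easy to inspect the payload before running it across thousands of datasets in a scripted migration.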

view this post on Zulip Philip Durbin 🚀 (Oct 23 2023 at 11:44):

Great news! :tada:


Last updated: Nov 01 2025 at 14:11 UTC