Hi! I am very happy to see the support for multiple PID providers in 6.2. Right now I am looking for a way to set the PID provider for a dataverse via API. Maybe I have looked in the wrong places, but I didn't find anything. Only for datasets/datafiles: https://guides.dataverse.org/en/latest/api/native-api.html#configure-the-pid-generator-a-dataset-uses-if-enabled
Is it possible for dataverses as well? Thank you!
It looks like setting the PID provider for a collection is only possible via the UI for now. It seems we overlooked adding an Admin API endpoint like the one we have for a collection's storage driver.
I'll ping folks on Slack, but I fear this will be released with 6.3. Usually a patch release is only done when we find very critical bugs.
Aw, I see. Thanks for checking.
Another question regarding multiple PID providers, is it possible at all to change the PID provider of a dataset after it has been created? Example: I moved a dataset from a Permalink dataverse to a DOI dataverse, but the dataset still has a Permalink. Is it possible for that dataset to lose the Permalink and receive a DOI, without re-creating it?
I don't think that's possible.
It kinda violates the principle that a PID is permanent...
There might be an exception with the FAKE and PermaLink providers, but that has not been put into code
As far as I know, that is
Hmm. Yes. In our use case, we would like to optionally mint DOIs for datasets, otherwise only a Permalink. So the default would be to receive a Permalink, with the option to receive a DOI instead or "upgrade" to a DOI later. (The Permalink would actually also continue to exist, but afaik Dataverse doesn't support multiple PIDs per dataset.)
If it were possible to change the PID provider of a dataset from FAKE or Permalink to DOI, that would be very nice. Is this something that might be put into code, you think?
No API for this?
> In our use case, we would like to optionally mint DOIs for datasets, otherwise only a Permalink.
That is possible: everything that needs a real DOI goes into a DOI-PID-provider enabled collection and permalink stays the default provider for the instance.
> with the option to receive a DOI instead or "upgrade" to a DOI later
Feel free to open an issue for this feature request
> afaik Dataverse doesn't support multiple PIDs per dataset
We kind of do. We allow "alternative identifiers", but obviously these are unmanaged. On the other hand, a permalink does not really need to be managed, as its metadata is not going anywhere outside the instance.
> something that might be put into code, you think?
I don't see why not. It will require extending the DOI provider interface with something like "allowMigration", so no one moves real PIDs to another real provider and trips over it. Migrating datasets from FAKE to a real provider after some initial demo phase etc. sounds very reasonable to me, so I don't see why it shouldn't happen.
Obviously, if you want such a feature and can contribute a PR, it speeds things up. Please let us know in the issue description.
@Vera Clemens Jim says this about migrating PIDs on the internal Slack:
As for migrating PIDs, there is no support for changing the PID of a dataset, aside from editing in the db, but that's probably not the only model (presumably people are referencing the existing PID). What is supported now is migrating a dataset into Dataverse with a PID that doesn't match the local protocol/authority/shoulder, and either 1) adding a new provider that matches the protocol/authority/shoulder to allow that dataset to be managed, or 2) asking DataCite to move the specific PID(s) to the existing account and adding those additional PIDs to the managed list for the existing provider (which would then work for the original protocol/authority/shoulder plus only the specific listed PIDs that don't match the pattern).
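To make the two options above concrete, here is a minimal sketch of how "pattern match plus managed list" could work. It is not Dataverse's actual code: `PidProvider`, `can_manage`, `find_provider`, and all PID values are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PidProvider:
    """Hypothetical stand-in for a configured PID provider (not a Dataverse class)."""
    provider_id: str
    protocol: str
    authority: str
    shoulder: str
    managed_list: frozenset = field(default_factory=frozenset)

    def can_manage(self, pid: str) -> bool:
        # Option 2: an individually listed PID is managed even if it is off-pattern.
        if pid in self.managed_list:
            return True
        # Default: the PID must match protocol, authority, and shoulder.
        protocol, _, rest = pid.partition(":")
        authority, _, identifier = rest.partition("/")
        return (
            protocol == self.protocol
            and authority == self.authority
            and identifier.startswith(self.shoulder)
        )


def find_provider(pid, providers):
    """Return the first provider that can manage the PID, or None."""
    return next((p for p in providers if p.can_manage(pid)), None)


# Illustrative values only.
local = PidProvider("local", "doi", "10.5072", "FK2/")
migrated_pid = "doi:10.9999/OLD/ABC123"

# Option 1: add a second provider that matches the migrated PID's pattern.
legacy = PidProvider("legacy", "doi", "10.9999", "OLD/")

# Option 2: keep one provider and list the off-pattern PID explicitly.
local_with_list = PidProvider(
    "local", "doi", "10.5072", "FK2/", frozenset({migrated_pid})
)
```

With only `local` configured, `find_provider(migrated_pid, [local])` returns `None`, which corresponds to the "unmanaged" situation before either option is applied.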
Again, please feel free to create more feature requests and discuss possible code contributions! :smiley_cat:
@Vera Clemens I see Oliver already encouraged you to open an issue. I'll just echo that. We've been talking about it in Slack and we all agree we need this.
I have opened two issues for what we have discussed here: https://github.com/IQSS/dataverse/issues/10497 and https://github.com/IQSS/dataverse/issues/10496
@Philip Durbin @Oliver Bertuch I believe the current implementation of multiple PID providers has an inconvenient bug.
There is an API to move an unpublished dataset from one dataverse to another. Assume the two dataverses have different PID providers configured: which PID provider do we expect to mint the DOI? IMHO it should be that of the new parent, i.e. the dataverse the dataset was moved to. Hence, the PID properties must be altered during the move operation, right?
Nope, the PID will stay consistent.
Anything already minted will not be changed
An unpublished dataset does not have a minted DOI, does it?
Oh well yeah that is a corner case
From the data model perspective it has though...
I'm looking through the code to learn what would happen
So we are talking about this case: the dataset has not been published, and the create time for the PID is null. https://github.com/poikilotherm/dataverse/blob/222b326aab6be19be9d7bc8907504801bc362343/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/FinalizeDatasetPublicationCommand.java#L101-L101
So we're now looking at this: https://github.com/poikilotherm/dataverse/blob/222b326aab6be19be9d7bc8907504801bc362343/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/AbstractDatasetCommand.java#L156-L156
So the provider will be looked up from the PID that has been assigned when the dataset was created
So it will still use the provider that was selected on creation
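A minimal sketch of the behaviour described above, assuming a toy model rather than Dataverse's real classes: the provider is resolved from the PID string assigned at creation, so moving the dataset does not change which provider is used at publication. `Collection`, `Dataset`, `PROVIDERS_BY_PREFIX`, and the functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    alias: str
    pid_provider: str  # id of the provider configured for this collection


@dataclass
class Dataset:
    pid: str           # assigned once, when the dataset is created
    owner: Collection


# Hypothetical registry mapping a PID prefix to a provider id.
PROVIDERS_BY_PREFIX = {"perma:": "permalink", "doi:": "datacite"}


def create_dataset(owner, identifier):
    # The PID is built from the provider of the collection at creation time.
    prefix = next(
        prefix
        for prefix, provider_id in PROVIDERS_BY_PREFIX.items()
        if provider_id == owner.pid_provider
    )
    return Dataset(pid=prefix + identifier, owner=owner)


def move_dataset(dataset, target):
    # Mirrors the observation in this thread: the move does not touch the PID.
    dataset.owner = target


def provider_used_on_publish(dataset):
    # The lookup goes through the PID, not through dataset.owner.
    for prefix, provider_id in PROVIDERS_BY_PREFIX.items():
        if dataset.pid.startswith(prefix):
            return provider_id
    raise LookupError(f"no provider configured for {dataset.pid}")


perma_collection = Collection("perma-coll", "permalink")
doi_collection = Collection("doi-coll", "datacite")

draft = create_dataset(perma_collection, "ABC123")
move_dataset(draft, doi_collection)
```

After the move, `draft.owner` points at the DOI collection, but `provider_used_on_publish(draft)` still resolves to the permalink provider, which is the corner case discussed here.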
So this definitely is a corner case. Let's see what's in MoveDatasetCommand about this
I agree with you that things that have been minted shall not be altered! However, tracing minting issues and managing the configuration of moved datasets will be a mess... but that's another story. This is possibly a special case of a previously configured provider no longer being available...
@Vera Clemens is currently hacking on some unit/integration tests to see what's happening.
Yep, nothing in that command deals with migrating the PID provider.
@Vera Clemens go go go! I would love to see some tests for this! It would be great to see some BDD testing here. Describing the scenario and forming it into a test...
It would be really great to see some Cucumber around!
We have so many tests, but they would benefit a lot from story driven testing
Is it a bug or a feature? IMHO we need to think about some edge scenarios and at least document the expected behaviour. I have these cases in mind: a) a PID provider that was used to mint resources has been removed: what happens during publication of an update? b) a dataset with a minted PID is moved to another dataverse: what happens if I publish an updated version when 1) the provider is still present and 2) the provider is no longer present (analogous to a)? c) a non-minted dataset is moved to another dataverse: which PID provider is used?
You're right, this should be documented somewhere.
I'm not sure if there is a design document somewhere, I might have lost the link
@Philip Durbin @Gustavo Durand another reason to make those open, at least after the implementation is done...
(Maybe even shape them into Architecture Decision Records?)
The US is not awake yet, let's pester them once they are around
Maybe we can create a PID FAQ somewhere in the docs? I assume lay/non-technical users have the same questions and are overwhelmed by the technical details in the design documents.
Sounds like a great addition to the Admin Guide. Usually moving datasets is admin only
With DataCite anyway, the DOI is "registered" (reserved) on the DataCite side when a draft dataset is created in Dataverse.
Researchers might put this DOI in their publication. So I think it would be potentially surprising to them if the DOI/PID changes.
Once the DOI is published, it changes state from "registered" to "findable". Please see also https://support.datacite.org/docs/doi-states
Further up in this discussion I said this might be a feature depending on the provider
So DRAFTS in Dataverse are registered with DataCite and thus are already present via the Handle System and subsequently cannot be deleted anymore? I assumed they are just draft records within DataCite and are deleted if the dataset is deleted in Dataverse. What happens with the DOI when a draft is deleted?
Registered != published. A DataCite DOI in registered mode is not discoverable and still deletable
Honestly I'm not sure what happens :see_no_evil:
registered != published but also registered != draft
From this code snippet, I assume it's a DRAFT record and not a REGISTERED record. This means it can be deleted...
And if it's only a DRAFT, it shall not be used in any publication or published content. Thus, we can delete it and create a new DOI according to another DOI provider's configuration.
https://support.datacite.org/docs/doi-states
Reserving in Dataverse means draft state at DataCite
IIRC
Let me check
That is a good thing, yet a bit confusing.
publicizeIdentifier() is used to switch from draft to findable.
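Since the three states are easy to mix up, here is a small sketch of the state model as I read the DataCite DOI states page linked above: only a draft can be deleted, and a DOI can never return to draft. The enum and helper names are hypothetical and are not part of Dataverse or any DataCite client.

```python
from enum import Enum


class DoiState(Enum):
    DRAFT = "draft"
    REGISTERED = "registered"
    FINDABLE = "findable"


# Transitions as described in the DataCite DOI states documentation:
# a draft can become registered or findable; registered and findable
# can switch between each other; nothing goes back to draft.
ALLOWED_TRANSITIONS = {
    DoiState.DRAFT: {DoiState.REGISTERED, DoiState.FINDABLE},
    DoiState.REGISTERED: {DoiState.FINDABLE},
    DoiState.FINDABLE: {DoiState.REGISTERED},
}


def can_transition(current, target):
    return target in ALLOWED_TRANSITIONS[current]


def can_delete(state):
    # Only draft DOIs can be deleted.
    return state is DoiState.DRAFT
```

In these terms, a DOI reserved for an unpublished Dataverse dataset would be in `DoiState.DRAFT` if the reading in this thread is right, and publishing corresponds to the `DRAFT` to `FINDABLE` transition.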
This means we could alter the MoveDatasetCommand to delete the DRAFT DOI at one provider and create a new draft with another provider, so that the PID provider configured for the target dataverse is used.
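A rough sketch of that proposal, using an in-memory fake instead of real providers. Nothing here reflects the actual MoveDatasetCommand: `FakeProvider`, the dict-based dataset, and `move_dataset` are hypothetical, and the rule "published datasets keep their PID" is taken from the discussion above.

```python
class FakeProvider:
    """In-memory stand-in for a PID provider; makes no network calls."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.drafts = set()
        self._counter = 0

    def reserve_draft(self):
        self._counter += 1
        pid = f"{self.prefix}{self._counter:04d}"
        self.drafts.add(pid)
        return pid

    def delete_draft(self, pid):
        self.drafts.remove(pid)


def move_dataset(dataset, target_alias, collections, providers):
    """Move a dataset; swap its draft PID if the target uses another provider.

    Returns the replaced PID (so the caller can inform the author) or None.
    """
    dataset["owner"] = target_alias
    target_provider_id = collections[target_alias]
    # Anything already published, or already on the right provider, is untouched.
    if dataset["published"] or dataset["pid_provider"] == target_provider_id:
        return None
    old_pid = dataset["pid"]
    # Reserve the new draft first, so a failure leaves the old PID intact.
    new_pid = providers[target_provider_id].reserve_draft()
    providers[dataset["pid_provider"]].delete_draft(old_pid)
    dataset["pid"] = new_pid
    dataset["pid_provider"] = target_provider_id
    return old_pid


providers = {"permalink": FakeProvider("perma:"), "datacite": FakeProvider("doi:10.5072/FK2/")}
collections = {"perma-coll": "permalink", "doi-coll": "datacite"}

draft = {
    "pid": providers["permalink"].reserve_draft(),
    "pid_provider": "permalink",
    "published": False,
    "owner": "perma-coll",
}
replaced = move_dataset(draft, "doi-coll", collections, providers)
```

Returning the replaced PID is a design choice for this sketch: it gives the caller what it needs to tell the dataset author that the reserved PID has changed.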
Please create a feature request if not yet existing :smile_cat:
Already created I assume, but as said Vera is on the issue and will create some tests for it. This is not the only edge case we need to think of.
It might be good to talk about this at a future tech hour discussion @Gustavo Durand @Philip Durbin
Sure, but again, please warn the user:
"Are you sure you want to change the PID of this dataset? If you put the dataset's original PID in your unpublished paper, please update it!"
Is there a UI feature to move a dataset?
Yes. It's superuser-only right now. Screenshots:
Docs: https://guides.dataverse.org/en/6.2/admin/dashboard.html#move-data
Thanks, I've never noticed this button
Would this call-out/documentation be sufficient and describe the intended behaviour of the system? "This function can be used to transfer a dataset from one dataverse to another. The PID settings of the target dataverse become active when an unpublished (i.e. draft) dataset is moved to another dataverse. This invalidates the existing PID and creates a new one. If the PID has already been in use outside the system, it will have to be adjusted. The PID configuration is not adjusted for datasets that have already been published."
It certainly helps! My concern is... is the dataset author in the loop? Do they know what the superuser is up to?
Fair enough, but I imagine this function is performed as part of a service request on behalf of a user. We could add a notification to inform the user about the performed change.
Sure, that makes sense.
At some point we should summarize in #10497
This could be something like: The dataset [] was moved from [] to []. Since it wasn't published, the planned PID changed to []. If the former PID has already been in use outside the system, it will have to be adjusted.
Philip Durbin said:
At some point we should summarize in #10497
This is another feature request.
Sure, I would show the old PID as well.
Oh, well, maybe we need a new issue?
It's different from this one: set dataverse PID provider via API #10496
This thread talks about moved datasets: they should pick up the new PID configuration if not already published. #10497 is about upgrading published datasets from a suboptimal PID system to a "better" one (e.g. started with internal PermaLinks and later upgraded to nice DOI PIDs). #10496 is just about configuring a dataverse PID provider via API.
Originally, this Zulip topic was about changing dataverse PID provider via API. :grinning:
Now it's about much more. :grinning:
We could start new topics and move messages around.
Or maybe re-title this thread? We're trying to say "collection" instead of "dataverse" these days. Maybe "changing collection PID provider"? Broad enough?
That's nice, we also use "collection" in our project and renamed "dataverse".
Great, I renamed this topic.
But do we have the right number of issues? We still need one more, right? "moved datasets, they should pick up the new PID configuration if not already published"
I'm going to create one and most likely implement it. Hopefully this feature can be part of the next release.
Here is the issue https://github.com/IQSS/dataverse/issues/10501
Looks great. Could you please add a link back to this topic on Zulip?
I wanted to, but I cannot find the option to create a link in Zulip.
Never mind, got it now!
I usually use the sidebar
Ah, you linked to where you brought this up. Perfect.
I have been working on the tests to illustrate this issue. I played around with Cucumber @Oliver Bertuch thank you for the pointer. It has been a while since I used Cucumber for testing.
I think it makes the most sense to implement the tests as API tests, so that is what I started with. I have now run into the following issue: how can we best test the cases involving a DOI provider? I can configure the tested dataverse with a DOI provider, however we don't want the tests to cause actual requests to be sent to the DataCite APIs. We also don't want the tests to fail because we have configured a fake DataCite API URL that doesn't respond in the expected way. Is mocking the DataCite API endpoints the right way to go? How would I go about this?
Or do you envision some other way for the tests to be implemented?
I've pushed my current state here https://github.com/vera/dataverse/tree/moving-datasets-between-pid-providers it's very WIP, happy to receive feedback on it (run tests with mvn test -Dtest=RunCucumberTest)
Hmm, mocking sounds fine to me.
We discussed this issue with DataCite some time ago on Slack as well
One idea to avoid real calls to the DataCite (Test) Fabrica was to use something like WireMock
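To illustrate the idea of HTTP-level stubbing, here is a small sketch that uses only the Python standard library rather than WireMock itself (WireMock is a Java tool; this is an analogy, not its API). The stub answers with a canned response, so the code under test never reaches DataCite. The `/dois` path, the response body, and the `reserve_doi` client are assumptions for illustration and do not claim to match the real DataCite API or Dataverse's client code.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubDataCite(BaseHTTPRequestHandler):
    """Canned responses standing in for a DOI registration API (illustrative)."""

    received = []  # requests seen by the stub, for later assertions

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        StubDataCite.received.append((self.path, body))
        payload = json.dumps({"state": "draft", "doi": body.get("doi")}).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep test output quiet


def reserve_doi(host, port, doi):
    """Hypothetical client call that the code under test would make."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request(
            "POST",
            "/dois",
            body=json.dumps({"doi": doi}),
            headers={"Content-Type": "application/json"},
        )
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()


def run_against_stub(doi):
    # Port 0 asks the OS for any free port, so parallel test runs do not clash.
    server = HTTPServer(("127.0.0.1", 0), StubDataCite)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return reserve_doi("127.0.0.1", server.server_port, doi)
    finally:
        server.shutdown()
        server.server_close()
```

A test would point the configured DataCite API URL at the stub's address, run the scenario, and then assert on `StubDataCite.received` to check which calls were made.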
Last updated: Nov 01 2025 at 14:11 UTC