Hi,
We are currently installing Dataverse. The purpose is twofold: publish datasets externally and/or make datasets available for internal consumption and reuse. Depending on the dataset, all three combinations are possible (external only, internal only, or both). The idea is to have two web portals: one public and one internal. I'm wondering whether such a setup is possible at all to start with.
If so, the next problem is the "and" case. We want to have the same dataset available with two metadata descriptions which in general overlap but also have distinct fields. Think, for instance, of the contact field. On the public portal, we want our service desk as the contact. On the internal portal, we want a real person as the contact, or alternatively we would have an additional field "internal contact" which is not publicly available.
Is such a setup possible?
Best regards,
Sjaak Derksen
Everything is possible. But it might be harder to do than you think. If I understood you correctly, you want to have your inner portal as the "source of truth" and then mirror this, with some minor modifications, to an externally visible portal.
This kind of setup is similar to what @Leo Andreev and others have been doing by mirroring the database, just the other way around (from production to an internal test system). So if you can have a sync job that enables you to automate the mirror, inject or transform any metadata you want changed from inside to outside, sure, that could work.
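To make the "inject or transform" step of such a sync job concrete, here is a minimal sketch in Python. It assumes a simplified, flat record layout; the field names `contact` and `internalContact`, the service desk address, and the function `to_public_record` are all hypothetical and do not reflect Dataverse's actual JSON export structure.

```python
import copy

# Hypothetical field names and contact details, chosen for illustration only.
INTERNAL_ONLY_FIELDS = {"internalContact"}
PUBLIC_CONTACT = {"name": "Service Desk", "email": "servicedesk@example.org"}


def to_public_record(internal_record):
    """Return a copy of the record that is safe to push to the public instance."""
    public = copy.deepcopy(internal_record)
    # Drop fields that must never leave the internal portal.
    for field in INTERNAL_ONLY_FIELDS:
        public.pop(field, None)
    # Replace the named person with the generic service desk contact.
    if "contact" in public:
        public["contact"] = dict(PUBLIC_CONTACT)
    return public


internal = {
    "title": "Sample dataset",
    "contact": {"name": "Jane Doe", "email": "jane.doe@example.org"},
    "internalContact": "jane.doe@example.org",
}
public = to_public_record(internal)
```

The transformation works on a copy, so the internal record stays untouched and the mirror remains strictly one-way.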
I would take great care, though, about who is able to make changes on the outside instance. It should be "read-only", as you're likely to get into trouble when you need to do two-way merges. Also, you need to think about (read-only!) file access.
Another way to go about this might be using OAI-PMH. Your internal instance would do all the heavy lifting like PIDs, etc., and you would use a search query to build a specific set of datasets that should be harvested by your outside instance. Keep in mind, though, that this will only take care of mirroring the metadata and will not grant file access. There are probably some other kinks and caveats one needs to look at.
If you're willing to code some stuff or spend money for someone else to do that, there's probably more that can be made possible, but that would need a more in-depth discussion. It would probably be good to reach out to the GDCC (www.gdcc.io) and ask for a consultation.
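For orientation, the request the outside instance would send for such a set is a plain OAI-PMH `ListRecords` call. The sketch below only builds the URL and does not contact any server. The host name, the set name `public_mirror`, and the `/oai` endpoint path are assumptions for illustration; check your installation for the actual values.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_name, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords URL for one set (endpoint path is assumed)."""
    query = urlencode(
        {
            "verb": "ListRecords",  # standard OAI-PMH verb
            "metadataPrefix": metadata_prefix,  # oai_dc is mandatory in OAI-PMH
            "set": set_name,  # hypothetical set defined on the internal instance
        }
    )
    return f"{base_url.rstrip('/')}/oai?{query}"


url = list_records_url("https://internal.example.org", "public_mirror")
```

In practice the harvesting client on the outside instance would issue this request on a schedule; building the URL by hand is mainly useful for checking what a set exposes before turning harvesting on.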
Thanks Oliver. It is a bit more intricate, I think. We would have datasets that are only available internally (we want an internal repository, since not all our data is public or worth publicizing). We would also have datasets that are available both externally and internally, the latter with metadata that is partly common and partly distinct. We need to think a bit more about this. Perhaps mirroring is a viable approach.
Also, we have two realms (IAM-wise): an internal realm (hooked up to SSO) and an external realm (user+pwd+token). I guess a portal cannot be in two realms at the same time. So I think we need an internal portal for our admins and an external one: public most of the time, but with some datasets private. For the time being we focus on the external portal.
I'll also forward this to Jim (GDCC) as it sounded like he might have some related work.
I don't think Dataverse easily supports the exact requirements above but like Oliver said, everything is possible with code. :smile:
If you only needed to advertise to the outside world the existence of some datasets (only their metadata, that is), yes, you could harvest (using OAI-PMH) from the private to the public instance.
As long as you don't use the "native JSON" format when harvesting, you could put your "internal only" fields in a custom metadata block in the private instance. That is, if you use Dublin Core or DDI to harvest, those formats will contain an incomplete record (which is good for you!) from the private instance. So the public instance wouldn't have those "internal only" fields.
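If you go this route, it may be worth checking a few harvested records for leaks before opening the public instance. The sketch below scans one record for known internal values. The sample XML is hand-written for illustration, not real Dataverse output, and `leaked_values` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample of a Dublin Core record, for illustration only.
SAMPLE_RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample dataset</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:publisher>Example Organisation</dc:publisher>
</oai_dc:dc>"""


def leaked_values(record_xml, internal_values):
    """Return the internal values that appear in any element text of the record."""
    root = ET.fromstring(record_xml)
    texts = [element.text or "" for element in root.iter()]
    return sorted(
        value for value in internal_values if any(value in text for text in texts)
    )


# Values that should only exist in the internal-only metadata block.
internal_values = {"jane.doe@example.org", "Room 4.12"}
leaks = leaked_values(SAMPLE_RECORD, internal_values)
```

An empty result means none of the listed values appear in the record's element text. It does not check attributes, so treat it as a quick sanity check rather than a guarantee.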
Like Oliver says, this wouldn't help at all with file requests. It would just be a way to advertise some metadata for discovery.
Last updated: Nov 01 2025 at 14:11 UTC