Good afternoon,
I'm currently working on a task to migrate our Dataverse installation from a VM-based setup to a new Docker-based architecture. I have a question regarding the migration of production data (version 5.12.1), including users, metadata, and files, to the new 6.8 version.
Is there a recommended strategy for backing up and migrating all data between these versions?
I've checked the official documentation, which mentions the option of migrating data between repositories, but I'd like to confirm whether that is the only available approach or if there are better methods for this kind of upgrade.
My initial plan is:
Could you advise on the best approach to ensure data integrity and consistency throughout the migration process?
@Marcos Anjos hi! Unfortunately, you just missed our Containerization Working Group meeting. This would have been a great topic! Would you like us to put it on the agenda for next month? Here's our website: https://ct.gdcc.io
Meanwhile, we'll try to get you some answers but I'll probably need to lean on others' expertise.
Generally, I'll say that your plan sounds fine! Please let us know if you find any bumps in the road!
Philip Durbin said:
Meanwhile, we'll try to get you some answers but I'll probably need to lean on others' expertise.
Just tag @Don Sizemore and me :stuck_out_tongue_closed_eyes: :wink:
Welcome @Marcos Anjos and congrats on choosing containers! You're in for a fun joyride :smiling_imp: But don't worry, this is #troubleshooting and we're here to help.
For the migration, let me highlight some bits I'm currently doing with my own container to container migration from 4.20 to 6.8...
It's really handy to have a snapshot of the database around for experiments. With containers, you can do most of these experiments on your laptop. For many scenarios it's not necessary to do this on some beefy machine.
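Something along these lines works for me; a minimal sketch, assuming your Postgres container is named "postgres" and the role/database are both "dataverse" (adjust to your setup):

```bash
# Dump the production database (container/role/db names are examples).
docker exec postgres pg_dump -U dataverse -d dataverse > dataverse.sql

# Load the dump into a scratch Postgres container on your laptop.
docker exec -i postgres-exp psql -U dataverse -d dataverse < dataverse.sql
```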
In case you don't want to jump through the releases one by one (which is the only officially supported way of doing upgrades), there are a few ways you can go about this.
Some have dumped the metadata and re-imported it into a fresh installation.
I will not do it like that. Here's what I presented at DCM2025 about my upgrade adventure: http://talks.bertuch.name/dcm2025-upgrade/#/
The most important bit around the actual data is that you need to be careful how you move it around.
Depending on the storage system you're using, you need to make sure not to alter the "identifier" of the storage system in the config. Otherwise, you will have to do a data migration in the database to make your files available again (the ID is part of the information stored about each file in the database).
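To illustrate (a sketch, not your exact setup): the part before "://" in each storageidentifier must keep matching a configured dataverse.files.<id> driver after the move. You can check what your database actually references like this:

```bash
# The driver id ("file1" below) is set via MicroProfile options such as:
#   dataverse.files.file1.type=file
#   dataverse.files.file1.label=file1
#   dataverse.files.file1.directory=/dv/data
# List the driver ids referenced in the database
# (container/role/db names are examples):
docker exec postgres psql -U dataverse -d dataverse -c \
  "SELECT DISTINCT split_part(storageidentifier, '://', 1)
   FROM dvobject WHERE storageidentifier IS NOT NULL;"
```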
Also, if you plan on moving the data between places, you must make sure the directory structure stays exactly the same.
If you are planning to migrate between storage types, things will become a bit more tricky... :see_no_evil: You'll probably be best off with a data migration for it, so you can have the new config in the upgraded instance.
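A very rough sketch of what such a data migration could look like, e.g. going from a file driver to an S3 driver. The identifier formats and names here are assumptions, so verify them for your drivers and versions, and test this on a database snapshot, never on production:

```bash
# HYPOTHETICAL: rewrite the driver prefix of all file storage identifiers
# to point at an S3 driver with bucket "my-bucket". Double-check the exact
# identifier format in your database before running anything like this.
docker exec postgres psql -U dataverse -d dataverse -c \
  "UPDATE dvobject
   SET storageidentifier = regexp_replace(storageidentifier, '^file://', 's3://my-bucket:')
   WHERE storageidentifier LIKE 'file://%';"
```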
For your experiments you should know that in theory, it is enough to have Dataverse and Postgres running.
This will at least let you make sure the database is still available. In a browser, though, the UI won't show you anything, because Solr will be either missing or empty.
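For example, something like this is enough to poke at an upgraded database. A sketch only: the image tag and the DATAVERSE_DB_* variable names are assumptions, so check the container guides for your target version:

```bash
# Minimal experiment: just the application and the database, no Solr.
docker network create dv-exp

docker run -d --name postgres --network dv-exp \
  -e POSTGRES_USER=dataverse \
  -e POSTGRES_PASSWORD=secret \
  -e POSTGRES_DB=dataverse \
  postgres:16

docker run -d --name dataverse --network dv-exp -p 8080:8080 \
  -e DATAVERSE_DB_HOST=postgres \
  -e DATAVERSE_DB_USER=dataverse \
  -e DATAVERSE_DB_PASSWORD=secret \
  gdcc/dataverse:6.8
```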
While it is possible to migrate Solr data, I'd advise against doing it. The reason is simple: 6.0 will require a full re-index anyway. So why bother moving that data around just to throw it away?
Of course, if you are using fulltext indexing in your instance, you will need to either disable it during experiments or make sure the indexer can reach your data. Just a word of caution: a full re-index will take a long time, depending on how much data you have...
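The re-index itself is just two admin API calls (assuming the admin API is reachable and unblocked on localhost:8080):

```bash
# Clear whatever is in the index, then kick off a full async re-index.
curl http://localhost:8080/api/admin/index/clear
curl http://localhost:8080/api/admin/index
```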
If you have custom metadata schemas, don't forget to apply the necessary changes to the latest schema.xml. Otherwise your re-index is probably going to fail, as Dataverse will not be able to add the metadata from the DB to the index.
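The update-fields.sh script shipped with Dataverse releases (under conf/solr/) can merge the field definitions from a running instance into schema.xml; the Solr path below is just an example:

```bash
# Pull the full field list (including custom blocks) from a running
# Dataverse and merge it into Solr's schema.xml, then reload/restart
# Solr and re-index.
curl "http://localhost:8080/api/admin/index/solr/schema" \
  | ./update-fields.sh /var/solr/data/collection1/conf/schema.xml
```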
That's all I have for now!