I asked this in the google forum (Borealis is also interested): https://groups.google.com/g/dataverse-community/c/Zf9ME-8Uc0Y
How does Harvard (IQSS) refresh demo.dataverse.org? The note at the top of the page says datasets are deleted after 30 days.
We would like to incorporate that on our test server.
Thanks.
So, I took a quick look and it's a bit complicated. Let me at least throw some of the code over the wall to you (and Borealis).
From root's crontab:
0 5 1 * * (cd /usr/local/dvn-admin/clean; /usr/local/dvn-admin/clean/delete_older_datasets.sh) > /dev/null 2>&1
# cat delete_older_datasets.sh
#!/bin/sh
datestamp=`date +%Y%m%d`; export datestamp
psql -U postgres -d dvndb -t -f finddatasetstoclean.sql > cleandatasets_${datestamp}.sh
if /bin/sh cleandatasets_${datestamp}.sh
then
/bin/rm cleandatasets_${datestamp}.sh
fi
# cat finddatasetstoclean.sql
select 'curl --header "X-Dataverse-key: REDACTED" -X DELETE http://localhost:8080/api/datasets/:persistentId/destroy?persistentId=doi:'||authority||'/'||identifier||';echo;sleep 1;' from dvobject where dtype='Dataset' and modificationtime < current_date-60 and identifier not in ('FK2/Q1RSNG','FK2/XSAZXH','FK2/U6AEZM','FK2/PPPORT','FK2/AJM2AT','FK2/HJVOXL','FK2/PPIAXE','FK2/HXJVJU','FK2/QZDNI4') and owner_id not in (select id from dataverse where alias in ('trade_statistics','csvconfv723','CAP_demo','CAP_USA', 'rdprincipal')) order by indextime asc;
You end up with a shell script with lines like this in it:
curl --header "X-Dataverse-key: REDACTED" -X DELETE http://localhost:8080/api/datasets/:persistentId/destroy?persistentId=doi:10.70122/FK2/MSSBNE;echo;sleep 1;
I hope that helps! :sweat_smile:
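For anyone adapting this, it can help to preview the generated commands before anything is destroyed. Here is a minimal sketch of a helper that only prints the destroy command for one dataset. `SERVER_URL`, `API_TOKEN`, and `destroy_cmd` are made-up names for illustration and are not part of the scripts above. The URL is quoted so the shell does not try to interpret the `?` in it.

```shell
#!/bin/sh
# Sketch only: SERVER_URL, API_TOKEN and destroy_cmd are assumed names,
# not taken from the original scripts.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"
API_TOKEN="${API_TOKEN:-REDACTED}"

# Print (do not run) the destroy command for one dataset, given its DOI
# authority and identifier, e.g. "10.70122" and "FK2/MSSBNE".
destroy_cmd() {
    printf 'curl --header "X-Dataverse-key: %s" -X DELETE "%s/api/datasets/:persistentId/destroy?persistentId=doi:%s/%s"\n' \
        "$API_TOKEN" "$SERVER_URL" "$1" "$2"
}
```

Calling `destroy_cmd 10.70122 FK2/MSSBNE` prints one curl line that you can inspect, and pipe to `/bin/sh` once you are happy with it.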
This is awesome! We are interested as well. I have been running a terraform script to completely destroy and rebuild our QA environment from snapshots and this is waaaayy better!
so, you keep the last 60 days and then a few demo datasets that are preserved?
Yes, we skip over a few datasets to avoid deleting them.
60 days? Well, whatever that crontab says. :grinning:
Oh, I was looking at the "and modificationtime < current_date-60" in the SQL query .. I may be reading it wrong. This is awesome! Thanks for posting.
It worked! Just need another script/query to delete empty dataverses using that API .. working on that
Nice. I'm glad it's useful. I'm pretty sure we have @Leo Andreev to thank for writing those scripts. :grinning:
oops that is deleting everything within the last 60 days .. lol
i created a script that just deletes all of the dataverses except the root dataverse (ID = 1) using that API .. but we currently just want to wipe everything clean .. so we could set the filter to filter on specific IDs if we wanted to keep some
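A rough sketch of what such a filter could look like, in the same generate-then-run style as the dataset script. It reads collection IDs on stdin and prints a `DELETE /api/dataverses/$ID` command for every ID not on a keep list. `KEEP_IDS`, `SERVER_URL`, `API_TOKEN`, and `collection_delete_cmds` are made-up names, and this is not the script mentioned above.

```shell
#!/bin/sh
# Sketch only: KEEP_IDS, SERVER_URL, API_TOKEN and collection_delete_cmds
# are assumed names for illustration.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"
API_TOKEN="${API_TOKEN:-REDACTED}"
KEEP_IDS="${KEEP_IDS:-1}"   # space-separated collection IDs to keep; 1 is the root

# Read collection IDs (one per line) on stdin and print a DELETE command for
# each ID that is not listed in KEEP_IDS. Nothing is executed here.
collection_delete_cmds() {
    while read -r id; do
        [ -n "$id" ] || continue          # skip blank lines
        case " $KEEP_IDS " in
            *" $id "*) continue ;;        # ID is on the keep list
        esac
        printf 'curl --header "X-Dataverse-key: %s" -X DELETE "%s/api/dataverses/%s"\n' \
            "$API_TOKEN" "$SERVER_URL" "$id"
    done
}
```

One way to feed it, assuming the same `dvndb` database as above: `psql -U postgres -d dvndb -t -c "select id from dataverse order by id desc" | collection_delete_cmds`. A collection has to be empty before it can be deleted, so datasets go first and child collections before their parents. Ordering by descending ID is only a heuristic for that.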
Oh, if you want to delete everything, you might like destroy_all_dvobjects.py from https://github.com/IQSS/dataverse-sample-data
Oh that's awesome, too! This would work well for our Dev env so will try it. For QA we will likely want to keep some, so will still use Leonid's script/query and adjust the filters, as we can be much more selective.
so for the demo dataverse, you just keep all of the dataverses ever created?
sorry to hijack this stream, @Sherry Lake .. I've been working on the same thing! :laughing:
Yes, I'm pretty sure we do keep collections. They're "free" in the sense that they only take up a tiny bit of space in the database. No files.
No problem @Dieuwertje Bloemen I'm learning lots too!
Last updated: Nov 01 2025 at 14:11 UTC