Stream: community

Topic: Refreshing Demo Sites


Sherry Lake (Aug 21 2024 at 12:51):

I asked this in the Google Group (Borealis is also interested): https://groups.google.com/g/dataverse-community/c/Zf9ME-8Uc0Y

How does Harvard (IQSS) refresh demo.dataverse.org? The note at the top of the page says datasets are deleted after 30 days.

We would like to incorporate that on our test server.
Thanks.

Philip Durbin 🚀 (Aug 21 2024 at 13:35):

So, I took a quick look and it's a bit complicated. Let me at least throw some of the code over the wall to you (and Borealis).

Philip Durbin 🚀 (Aug 21 2024 at 13:35):

From root's crontab:

0 5 1 * * (cd /usr/local/dvn-admin/clean; /usr/local/dvn-admin/clean/delete_older_datasets.sh) > /dev/null 2>&1

Philip Durbin 🚀 (Aug 21 2024 at 13:35):

# cat delete_older_datasets.sh
#!/bin/sh

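# Build a one-off script of curl DELETE calls from the database, run it,
# and remove it only if it exits with status 0.
datestamp=$(date +%Y%m%d); export datestamp
psql -U postgres -d dvndb -t -f finddatasetstoclean.sql > cleandatasets_${datestamp}.sh
if /bin/sh cleandatasets_${datestamp}.sh
then
    /bin/rm cleandatasets_${datestamp}.sh
fi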

Philip Durbin 🚀 (Aug 21 2024 at 13:37):

# cat finddatasetstoclean.sql

-- Emit one curl "destroy" command per dataset not modified in the last 60 days,
-- skipping preserved datasets and datasets directly inside preserved collections.
select 'curl --header "X-Dataverse-key: REDACTED" -X DELETE http://localhost:8080/api/datasets/:persistentId/destroy?persistentId=doi:'
       || authority || '/' || identifier || ';echo;sleep 1;'
from dvobject
where dtype = 'Dataset'
  and modificationtime < current_date - 60
  and identifier not in ('FK2/Q1RSNG','FK2/XSAZXH','FK2/U6AEZM','FK2/PPPORT','FK2/AJM2AT',
                         'FK2/HJVOXL','FK2/PPIAXE','FK2/HXJVJU','FK2/QZDNI4')
  and owner_id not in (select id from dataverse
                       where alias in ('trade_statistics','csvconfv723','CAP_demo','CAP_USA','rdprincipal'))
order by indextime asc;

Philip Durbin 🚀 (Aug 21 2024 at 13:38):

You end up with a shell script with lines like this in it:

curl --header "X-Dataverse-key: REDACTED" -X DELETE http://localhost:8080/api/datasets/:persistentId/destroy?persistentId=doi:10.70122/FK2/MSSBNE;echo;sleep 1;
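Two things worth knowing about this approach: the generated script has the API key written into it, and its exit status is that of its last command (sleep 1), so the if in delete_older_datasets.sh removes it whether or not the individual DELETE calls worked (curl without --fail also exits 0 on HTTP errors). A minimal alternative sketch, assuming hypothetical SERVER_URL and API_TOKEN environment variables and a made-up function name destroy_datasets, reads persistent IDs on stdin and calls the same destroy endpoint directly. DRY_RUN=1 prints the requests instead of sending them, and the function returns a non-zero status if any call fails:

```shell
# Hypothetical alternative: destroy datasets whose persistent IDs arrive on
# stdin, one per line (e.g. doi:10.70122/FK2/MSSBNE), instead of generating
# a script. Assumes SERVER_URL and API_TOKEN are set in the environment;
# DRY_RUN=1 prints the requests instead of sending them.
destroy_datasets() {
    failures=0
    # "|| [ -n ... ]" also handles a last line with no trailing newline.
    while read -r pid || [ -n "$pid" ]; do
        [ -n "$pid" ] || continue      # skip blank lines (read trims spaces)
        url="$SERVER_URL/api/datasets/:persistentId/destroy?persistentId=$pid"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "DELETE $url"
            continue
        fi
        # --fail makes curl exit non-zero on HTTP error responses.
        if ! curl --silent --show-error --fail -X DELETE \
                --header "X-Dataverse-key: $API_TOKEN" "$url"; then
            failures=$((failures + 1))
        fi
        echo                           # newline after each response body
        sleep 1                        # same pacing as the original script
    done
    [ "$failures" -eq 0 ]              # non-zero status if any call failed
}
```

It could be fed by a psql query that selects 'doi:'||authority||'/'||identifier with the same where clause, which keeps the key out of any file on disk and lets the caller react to failures.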

Philip Durbin 🚀 (Aug 21 2024 at 13:38):

I hope that helps! :sweat_smile:

Deirdre Kirmis (Aug 21 2024 at 15:15):

This is awesome! We are interested as well. I have been running a Terraform script to completely destroy and rebuild our QA environment from snapshots and this is waaaayy better!

Deirdre Kirmis (Aug 21 2024 at 15:30):

so, you keep the last 60 days and then a few demo datasets that are preserved?

Philip Durbin 🚀 (Aug 21 2024 at 15:32):

Yes, we skip over a few datasets to avoid deleting them.

Philip Durbin 🚀 (Aug 21 2024 at 15:33):

60 days? Well, whatever that crontab says. :grinning:

Deirdre Kirmis (Aug 21 2024 at 15:34):

Oh, I was looking at the "and modificationtime < current_date-60" in the SQL query .. I may be reading it wrong. This is awesome! Thanks for posting.

Deirdre Kirmis (Aug 21 2024 at 16:29):

It worked! Just need another script/query to delete empty dataverses using that API .. working on that

Philip Durbin 🚀 (Aug 21 2024 at 17:33):

Nice. I'm glad it's useful. I'm pretty sure we have @Leo Andreev to thank for writing those scripts. :grinning:

Deirdre Kirmis (Aug 21 2024 at 17:56):

oops that is deleting everything within the last 60 days .. lol

I created a script that just deletes all of the dataverses except the root dataverse (ID = 1) using that API .. but we currently just want to wipe everything clean .. so we could filter on specific IDs if we wanted to keep some
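A rough sketch of that kind of collection cleanup, under stated assumptions: collection database IDs arrive on stdin one per line (for example from a query against the dataverse table), SERVER_URL and API_TOKEN are set, and the root collection is ID 1 as above. It calls the native API's DELETE /api/dataverses/$ID endpoint, which as far as I know only succeeds for collections that are already empty, so it works through IDs from highest to lowest on the assumption that sub-collections were created after their parents. The function name delete_collections and the ROOT_ID and DRY_RUN variables are made up for illustration:

```shell
# Hypothetical sketch: delete collections by database ID (read from stdin,
# one per line) through the native API, never touching the root collection.
# Assumes SERVER_URL and API_TOKEN are set; DRY_RUN=1 only prints requests.
ROOT_ID=${ROOT_ID:-1}   # root collection ID, 1 in the setup described above

delete_collections() {
    # Highest IDs first, assuming sub-collections have larger IDs than parents.
    sort -rn | {
        failures=0
        while read -r id; do
            case $id in
                ''|"$ROOT_ID") continue ;;   # blank line or the root collection
            esac
            url="$SERVER_URL/api/dataverses/$id"
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "DELETE $url"
                continue
            fi
            # --fail: count HTTP error responses (presumably including a
            # collection that is not empty yet) as failures.
            if ! curl --silent --show-error --fail -X DELETE \
                    --header "X-Dataverse-key: $API_TOKEN" "$url"; then
                failures=$((failures + 1))
            fi
            echo
        done
        [ "$failures" -eq 0 ]   # non-zero status if any DELETE failed
    }
}
```

Descending ID order only approximates child-before-parent order (a collection moved under a newer one would break it), so a second run may be needed to catch parents that were still non-empty the first time.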

Philip Durbin 🚀 (Aug 21 2024 at 17:57):

Oh, if you want to delete everything, you might like destroy_all_dvobjects.py from https://github.com/IQSS/dataverse-sample-data

Deirdre Kirmis (Aug 21 2024 at 18:04):

Oh that's awesome, too! This would work well for our Dev env so will try it. For QA we will likely want to keep some, so will still use Leonid's script/query and adjust the filters, as we can be much more selective.
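If the preserved datasets differ per environment, one option is to keep their identifiers in a plain-text file and generate the quoted list for the "identifier not in (...)" filter from it. A small sketch, where the function sql_quoted_list and the file name keep_identifiers.txt are hypothetical:

```shell
# Hypothetical helper: turn identifiers on stdin (one per line, e.g.
# FK2/Q1RSNG) into a quoted SQL list like 'FK2/Q1RSNG','FK2/XSAZXH'.
sql_quoted_list() {
    # Trim surrounding whitespace, drop blank lines, double any single
    # quotes (SQL string escaping), wrap each value in quotes, join with commas.
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e '/^$/d' -e "s/'/''/g" -e "s/.*/'&'/" | paste -s -d, -
}
```

For example, keep=$(sql_quoted_list < keep_identifiers.txt) yields a string that can be substituted into the not in (...) clause when generating the query, so the SQL file itself no longer needs editing for each environment.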

Deirdre Kirmis (Aug 21 2024 at 18:05):

so for the demo dataverse, you just keep all of the dataverses ever created?

Deirdre Kirmis (Aug 21 2024 at 18:06):

sorry to hijack this stream, @Sherry Lake .. I've been working on the same thing! :laughing:

Philip Durbin 🚀 (Aug 21 2024 at 18:06):

Yes, I'm pretty sure we do keep collections. They're "free" in the sense that they only take up a tiny bit of space in the database. No files.

Sherry Lake (Aug 21 2024 at 18:09):

No problem @Deirdre Kirmis, I'm learning lots too!


Last updated: Nov 01 2025 at 14:11 UTC