I've created some API scripts to update the metadata of multiple datasets. I'm now testing the scripts in our test environment, which currently is a copy of production from yesterday, but we switched the DOIs from production DOIs (10.18710) to our test DOIs (10.21337). Here's what I've been doing:
*{
"id": 2966,
"datasetId": 175757,
"datasetPersistentId": "doi:10.21337/MXCA5S",
"storageIdentifier": "S3://10.18710/MXCA5S",
"versionNumber": 1,
"versionMinorNumber": 0,
"versionState": "RELEASED",*
Hmm, what errors do you get from the client side (curl, python, etc.)?
And what errors do you get in server.log?
Thanks! No errors in the command line. I'll need to ask our devops for the server.log. I'm now trying this on production for one dataset without publishing.
I forgot to mention, we only copied the metadata, not the data to test.
What if you try downloading the JSON from a dataset in your test environment and try to make a change? Does that work? Just a simple change like an edit to the description or something.
Yes, that's basically what I've been doing, but uploading the modified JSON file does not work :-/
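For reference, a minimal sketch of that download, edit, upload round trip. It assumes (my reading of the Native API guide, not something confirmed in this thread) that the update endpoint expects the contents of `data` without the outer wrapper, and that `files` can be dropped for a metadata-only edit. The function names are hypothetical, and whether other keys must also be removed on a given Dataverse version is not something this sketch settles. It returns the HTTP status and response body, so a failure shows up even when the command line prints no error.

```python
import copy
import json
import urllib.error
import urllib.parse
import urllib.request


def build_update_payload(api_response, new_description):
    """Strip the API wrapper and change the first description value."""
    # Work on a copy so the downloaded JSON stays untouched.
    version = copy.deepcopy(api_response["data"])
    # Assumption: file entries are not needed for a metadata-only edit.
    version.pop("files", None)
    for field in version["metadataBlocks"]["citation"]["fields"]:
        if field["typeName"] == "dsDescription":
            field["value"][0]["dsDescriptionValue"]["value"] = new_description
    return version


def put_draft_metadata(server, pid, api_token, payload):
    """PUT the payload to the draft version; return (status, body)."""
    query = urllib.parse.urlencode({"persistentId": pid})
    url = f"{server}/api/datasets/:persistentId/versions/:draft?{query}"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "X-Dataverse-key": api_token,
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # Return the error body instead of hiding it; it often names the cause.
        return error.code, error.read().decode("utf-8")
```

Printing the returned status and body for one test dataset should show what the server answered, independent of server.log.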
Maybe we should get you set up with a dev environment on your laptop so you can see server.log. :big_smile:
@Oliver Bertuch what do you think? Is it time for Docker?
Probably...?
Easiest way to set up a clean environment.
Yes, but the idea was to test it on ~identical datasets before we run it on prod.
What if I tried to import your prod JSON into my dev environment? Would that be a good test? I'm running the tip of the develop branch.
Thanks, I might want to do that. Let me just first test a couple of datasets on prod.
In the same clean-up job, I'll be publishing new versions of about 800 datasets. From previous, similar jobs (e.g. uploading many files to a dataset via API), I've learned to put a sleep command after each API publishing command. This means running the script will take about 8-10 hours. The idea is to disable login during that time. Now, to reduce the workload, I'm considering turning off file validation, like this:
Before the script is run:
curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:FileValidationOnPublishEnabled
After the script is run:
curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FileValidationOnPublishEnabled
None of the changes are at file level. Are there any concerns with turning off file validation in this case?
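A rough sketch of how the whole job could be wrapped so that file validation is switched back on even if the script dies halfway through. The settings URL is the one from the curl commands above. The 30-second pause, the `minor` release type, and the function names are assumptions for illustration only, and `send` and `sleep` are injectable so the loop can be dry-run without touching a server.

```python
import time
import urllib.parse
import urllib.request

# Same admin setting as in the curl commands above.
SETTING_URL = (
    "http://localhost:8080/api/admin/settings/:FileValidationOnPublishEnabled"
)


def http_send(method, url, data=None, headers=None):
    """Send one request with the standard library and return the status."""
    request = urllib.request.Request(
        url, data=data, method=method, headers=headers or {}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def publish_url(server, pid, release_type="minor"):
    """Build the publish URL for one dataset (release type is an assumption)."""
    query = urllib.parse.urlencode({"persistentId": pid, "type": release_type})
    return f"{server}/api/datasets/:persistentId/actions/:publish?{query}"


def run_publish_job(server, pids, api_token, pause_seconds=30,
                    send=http_send, sleep=time.sleep):
    """Publish each dataset with a pause in between; return the failures."""
    failed = []
    send("PUT", SETTING_URL, data=b"false")  # turn validation off
    try:
        for pid in pids:
            try:
                send("POST", publish_url(server, pid),
                     headers={"X-Dataverse-key": api_token})
            except Exception as error:
                # Record the failure and continue with the next dataset.
                failed.append((pid, str(error)))
            sleep(pause_seconds)
    finally:
        # Runs even on Ctrl-C or a crash, so validation is restored.
        send("PUT", SETTING_URL, data=b"true")
    return failed
```

The `try`/`finally` is the point here: over an 8-10 hour run, an interrupted script would otherwise leave validation disabled until someone notices. The returned list of failed identifiers can be fed back in for a second pass.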
I don't think so. And I saw your mailing list post. You're only changing metadata, not files. Should be fine.
Great, thanks, good to get this confirmed. I think turning it off will make the process smoother and faster.
Last updated: Nov 01 2025 at 14:11 UTC