Hi @all,
We recently had a user who uploaded more than 35,000 files into one dataset on our Dataverse (6.5) instance. Subsequently, not only the dataset but the whole system became unstable and was unusable most of the time. Deleting the dataset via API or UI no longer worked; we had to remove the files directly from the database tables to get the system running again.
I know that Dataverse does not get along well with such a large number of files, so I was looking for a configuration setting that allows limiting the number of files per dataset. I couldn't find one, and I also could not find any related issues on GitHub. I know that @Eryk Kulikowski worked on performance improvements for datasets with large numbers of files some time ago, and I recall that things worked better for a while, but it seems those improvements are gone in v6.5?
So we would be interested in how other instances handle such issues. Is there some kind of configuration option I just can't find? Wouldn't it make sense to have such an option, so that a single user cannot bring the whole system to a halt just by uploading too many files?
There's an issue for this that is less than a month old: "Feature Request: (internal request) Add quota-like limit on the number of files in a dataset" #11275
But we've talked about it off and on for years. :sweat_smile:
It completely makes sense that users should not be able to bring the system to a halt by uploading too many files!
We do have a rate limiting feature: https://guides.dataverse.org/en/6.5/installation/config.html#rate-limiting
But I'm wondering if there's a command you can target. :thinking:
When lots of uploads are happening, do you see a lot of the same command in the actionlogrecord table? https://guides.dataverse.org/en/6.5/admin/monitoring.html#actionlogrecord
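Something like this should show which commands dominate (column names from memory, so please double-check against your schema):
-- most frequent commands logged over the last 24 hours
SELECT actionsubtype, COUNT(*) AS n
FROM actionlogrecord
WHERE actiontype = 'Command'
  AND starttime > now() - interval '24 hours'
GROUP BY actionsubtype
ORDER BY n DESC
LIMIT 20;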
Thanks, Phil.
There's the "CreateNewDataFilesCommand" in the actionlogrecord, but I'm not sure if rate limiting is the right way to go here.
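If we do end up trying it, a query like this could at least give us a feel for what a reasonable per-hour limit would be (untested, column names from memory):
-- hourly counts of CreateNewDataFilesCommand over the last week
SELECT date_trunc('hour', starttime) AS hour, COUNT(*) AS uploads
FROM actionlogrecord
WHERE actionsubtype = 'CreateNewDataFilesCommand'
  AND starttime > now() - interval '7 days'
GROUP BY 1
ORDER BY 1;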
But glad to hear that the discussion about this is alive. We'll also have to discuss internally what numbers would make sense if such a feature becomes available; an instance-wide config option would be sufficient for us. I'm currently quite busy with another project, but when I find the time I'll try to understand the current implementation of the collection quotas and maybe get some ideas from it for such a file limit configuration.
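In the meantime we'll probably just keep an eye on file counts per dataset with a quick query along these lines (assuming I'm reading dvobject's dtype and owner_id correctly):
-- datasets with the most files, to spot problem cases early
SELECT owner_id AS dataset_id, COUNT(*) AS n_files
FROM dvobject
WHERE dtype = 'DataFile'
GROUP BY owner_id
ORDER BY n_files DESC
LIMIT 20;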
Sounds good!
@Markus Haarländer
Can you say more about which tables you edited to remove the files? We at the University of Virginia have such a dataset that I can't delete (over 14,000 files). See my question in the Google group: https://groups.google.com/g/dataverse-community/c/WFf34d8R0Aw
You can either reply here or send email to shlake@virginia.edu
Hi Sherry
Here's a (not very sophisticated) SQL script that I used. I'm not 100% sure it removes everything, but it worked for us. The files didn't have any tags or restrictions; if yours do, other tables may need to be cleaned up as well.
-- clear the dataset thumbnail in case it points to one of the files being removed
UPDATE dataset SET thumbnailfile_id = NULL WHERE id = '<dataset-id>';
-- delete the file metadata rows of all files owned by the dataset
DELETE FROM filemetadata WHERE datafile_id IN (SELECT id FROM dvobject WHERE owner_id = '<dataset-id>');
-- delete the datafile rows themselves
DELETE FROM datafile WHERE id IN (SELECT id FROM dvobject WHERE owner_id = '<dataset-id>');
-- finally, delete the corresponding dvobject rows
DELETE FROM dvobject WHERE owner_id = '<dataset-id>' AND dtype='DataFile';
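I'd also recommend running the whole thing inside a transaction (BEGIN; ... COMMIT;) and doing a quick count before committing, e.g.:
-- should return 0 once all file objects of the dataset are gone
SELECT COUNT(*) FROM dvobject WHERE owner_id = '<dataset-id>' AND dtype = 'DataFile';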