We have a dataset with a large number of files (about 450,000). Loading this dataset takes about 30 seconds and raises the (permanent) memory usage of Payara by at least 1 GB (based on the garbage collection logs).
Is there a known mitigation for this? Some switch to disable file listing, or similar?
As it is now, about 20 loads of this dataset in the browser send Payara thrashing into GC hell. The only fix I know of is restarting Payara.
Hmm, can you put those ~half a million files in a zip and use that instead as a new version of the dataset? We do have a nice zip previewer/downloader.
Possibly, of course. I will have to talk with the uploader. Is this the only way?
Of course, no single files could be downloaded then. Nor even smaller parts (sub-directories)...
With the zip previewer/downloader, single files can be downloaded. You can try it on a zip file in my dataset if you like: https://dataverse.harvard.edu/file.xhtml?fileId=6867328&version=4.0
Cool, I will look into this.
https://dataverse.harvard.edu/dataverse/ashkelonexcavations has 28K files. One dataset per file. Perhaps an extreme example but another way of avoiding having too many files in a single dataset.
Yeah, one file per dataset is not really a good solution IMHO.
Also, the zip previewer needs to get the file list from the zip on the server, and parse it. With 450k files, it may crash the browser doing that...
Also, the "all files in a dataverse" approach makes it possible to search among them.
Well, perhaps @Markus Haarländer, author of the zip previewer/downloader, can confirm, but I believe it only downloads the bytes it needs to get the list.
Yes, it really only downloads the needed parts of the zip. However, representing a 450k-entry list in JavaScript and/or in the DOM is challenging for most browsers.
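For illustration, here is a minimal sketch (not the previewer's actual code) of how a client can list a remote zip's contents by fetching only the End of Central Directory record and the central directory via HTTP Range requests. The function name and the 64 KB tail size are arbitrary, and archives large enough to need zip64 (which a 450k-entry zip would) require extra handling:

```python
# Minimal sketch: list a remote zip's entries without downloading the whole file.
import struct
import requests

def remote_zip_listing(url, tail_bytes=64 * 1024):
    # Fetch the tail of the file, which contains the EOCD record (signature PK\x05\x06).
    head = requests.head(url, allow_redirects=True)
    size = int(head.headers["Content-Length"])
    start = max(0, size - tail_bytes)
    tail = requests.get(url, headers={"Range": f"bytes={start}-{size - 1}"}).content

    eocd_pos = tail.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("EOCD not found (zip64 archives need extra handling)")
    # EOCD layout: total entry count at offset 10, central directory size at 12, offset at 16.
    total_entries, cd_size, cd_offset = struct.unpack(
        "<HII", tail[eocd_pos + 10:eocd_pos + 20]
    )

    # Fetch only the central directory and walk its fixed-size file headers.
    cd = requests.get(
        url, headers={"Range": f"bytes={cd_offset}-{cd_offset + cd_size - 1}"}
    ).content
    names, pos = [], 0
    for _ in range(total_entries):
        name_len, extra_len, comment_len = struct.unpack("<HHH", cd[pos + 28:pos + 34])
        names.append(cd[pos + 46:pos + 46 + name_len].decode("utf-8", "replace"))
        pos += 46 + name_len + extra_len + comment_len
    return names
```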
Thank you for your help. I will try to convince the dataset owner to use a smaller number (~1000) of zip files; that way some of the search functionality remains, but listing would be fast and would not leak (that much) memory.
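As an illustration of that repackaging plan, a rough sketch assuming a local copy of the loose files (the paths and files-per-zip count below are hypothetical):

```python
# Rough sketch: bundle loose files into a fixed number of zip archives.
import zipfile
from pathlib import Path

SRC_DIR = Path("dataset_files")   # assumed local copy of the loose files
OUT_DIR = Path("zipped")          # where the ~1000 archives will be written
FILES_PER_ZIP = 450               # ~450,000 files / ~1000 archives

OUT_DIR.mkdir(exist_ok=True)
files = sorted(p for p in SRC_DIR.rglob("*") if p.is_file())

for i in range(0, len(files), FILES_PER_ZIP):
    archive = OUT_DIR / f"part_{i // FILES_PER_ZIP:04d}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files[i:i + FILES_PER_ZIP]:
            # Keep the relative directory structure so sub-directories stay findable.
            zf.write(f, arcname=str(f.relative_to(SRC_DIR)))
```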
If it helps, @Ceilyn Boyd gave a talk fairly recently called "Transforming a Digital Collection into a Data Collection". About 80K files were in play: https://groups.google.com/g/dataverse-community/c/Teb7_Pj2ajg/m/HO0E0vMnAQAJ
If I upload a zip (even through the API), it will be decompressed. How can you upload a zip so that it is left as a zip?
Double-zip it :smiley: (The official workaround)
Yes, and please consider voting and commenting on this issue: Support uploading of archives (ZIP, other). #8029
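For reference, a hedged sketch of the double-zip workaround combined with an upload through the native API's add-file endpoint (the server URL, persistent ID, and API token below are placeholders):

```python
# Sketch: wrap the real zip in an outer zip so the server unpacks only the outer
# layer and keeps the inner zip as a single file, then upload it via the native API.
import zipfile
import requests

INNER = "dataset.zip"             # the archive you actually want stored as-is
OUTER = "dataset.double.zip"

# ZIP_STORED: no point re-compressing an already compressed zip.
with zipfile.ZipFile(OUTER, "w", zipfile.ZIP_STORED) as zf:
    zf.write(INNER)

SERVER = "https://demo.dataverse.org"               # placeholder
PID = "doi:10.5072/FK2/EXAMPLE"                     # placeholder persistent ID
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder

# Add the double-zipped file to an existing dataset.
with open(OUTER, "rb") as fh:
    r = requests.post(
        f"{SERVER}/api/datasets/:persistentId/add",
        params={"persistentId": PID},
        headers={"X-Dataverse-key": API_TOKEN},
        files={"file": fh},
    )
r.raise_for_status()
```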