Does anyone have experience with how MATLAB .fig figure files interact with Dataverse's full-text index? One of our scientists uploaded such files, and the Tika parser crashes with the following exception:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.mat.MatParser@4a18daf3
Unfortunately, I have no easy way to verify whether the files themselves are corrupted, and I would be fine with just excluding them from the full-text index.
Is there any way to exclude those files from indexing? They are not particularly large, so excluding them via size is not an option.
Hmm, as you say we have a setting called :SolrMaxFileSizeForFullTextIndexing but nothing about file types.
@Henning Timm please feel free to open an issue: https://github.com/IQSS/dataverse/issues
Thanks for the reply, Phil! In our case this error was probably responsible for gobbling up enough memory to make the Dataverse server swap and then stay in HTTP 503 until we did a restart. We will try to reproduce the problem before opening an issue. It's still unclear whether our users' files were the problem (e.g. if they were corrupted somehow).
Sounds good, thanks.
Some small updates on this:
The MatParser crashes for (our) MATLAB figure files, but this does not seem to cause any further problems. I will open an issue once I have a MATLAB .fig file that causes this problem and that I am allowed to pass on. Our affected files are under embargo.
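On the open question of whether the files are corrupted: a rough first check is to look at the file header, since .fig files are MAT-file containers. The sketch below is only a header sanity check, not a full integrity test, and it has not been run against the embargoed files. It assumes the usual layout: a 128-byte header whose text starts with "MATLAB 5.0 MAT-file" and ends in an "IM"/"MI" endian indicator, or, for v7.3, text starting with "MATLAB 7.3 MAT-file" followed by an HDF5 signature at byte 512. The function names are made up for illustration.

```python
from pathlib import Path

# Standard 8-byte HDF5 file signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def classify_mat_header(data: bytes) -> str:
    """Classify leading file bytes as "mat5", "mat73", or "unknown".

    Only the header is inspected; a file with a valid header can
    still be truncated or damaged further in.
    """
    if len(data) < 128:
        return "unknown"
    text = data[:116]  # human-readable description field
    if text.startswith(b"MATLAB 7.3 MAT-file"):
        # Assumption: v7.3 files are HDF5 with a 512-byte user block.
        if data[512:520] == HDF5_SIGNATURE:
            return "mat73"
        return "unknown"
    if text.startswith(b"MATLAB 5.0 MAT-file"):
        # Bytes 126-127 hold the endian indicator, "IM" or "MI".
        if data[126:128] in (b"IM", b"MI"):
            return "mat5"
    return "unknown"


def classify_fig_file(path: str) -> str:
    """Read just enough of the file at `path` to classify its header."""
    with Path(path).open("rb") as handle:
        return classify_mat_header(handle.read(520))
```

Running `classify_fig_file("figure.fig")` on each affected file (the file name is a placeholder) would at least show whether any of them fails this basic check. An "unknown" result points to a damaged or non-MAT file; a "mat5" or "mat73" result does not prove the file is intact.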
Last updated: Oct 30 2025 at 05:14 UTC