Stream: troubleshooting

Topic: Full text indexing MATLAB .fig files


view this post on Zulip Henning Timm (Mar 25 2024 at 18:19):

Does anyone have experience with how MATLAB .fig figure files interact with Dataverse's full-text index? One of our scientists uploaded such files and the Tika parser crashes with the following exception:

  org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.mat.MatParser@4a18daf3

Unfortunately, I have no easy way to verify whether the files themselves are corrupted, and I would be fine with just excluding them from the full-text index.

Is there any way to exclude those files from indexing? They are not particularly large, so excluding them via size is not an option.

view this post on Zulip Philip Durbin 🚀 (Mar 25 2024 at 18:21):

Hmm, as you say we have a setting called :SolrMaxFileSizeForFullTextIndexing but nothing about file types.

view this post on Zulip Philip Durbin 🚀 (Mar 25 2024 at 18:22):

@Henning Timm please feel free to open an issue: https://github.com/IQSS/dataverse/issues

view this post on Zulip Henning Timm (Mar 27 2024 at 15:19):

Thanks for the reply, Phil! In our case this error was probably responsible for gobbling up enough memory to make the Dataverse server swap and then keep returning HTTP 503 until we did a restart. We will try to reproduce the problem before opening an issue. It's still unclear whether our users' files were the problem (e.g. if they were corrupted somehow).

view this post on Zulip Philip Durbin 🚀 (Mar 27 2024 at 15:52):

Sounds good, thanks.

view this post on Zulip Henning Timm (Apr 26 2024 at 14:06):

Some small updates on this:

The MatParser crashes for (our) MATLAB figure files, but this does not seem to cause any further problems. I will open an issue once I have a MATLAB .fig file that causes this problem and that I am allowed to pass on. Our affected files are under embargo.
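
For anyone who needs a rough check of whether such .fig files are corrupted without having MATLAB at hand: .fig files are MAT-files, so the file header can be inspected directly. The following is only a minimal sketch and not something Dataverse or Tika provides. The function name check_mat_header is made up for illustration, and the byte offsets assume the documented Level 5 MAT-file layout (116 bytes of descriptive text, a version field at offset 124, an endian indicator at offset 126) and, for v7.3 files, an HDF5 signature after a 512-byte user block. A plausible header does not prove the rest of the file is intact, and a MatParser failure may still be a parser limitation rather than a damaged file.

```python
import struct

# Signature that starts every HDF5 file; v7.3 MAT-files are assumed to
# carry it after a 512-byte user block.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"


def check_mat_header(path):
    """Rough header check for a MAT-file or .fig file (hypothetical helper).

    Returns a tuple (ok, reason). Only the header is inspected, so
    ok=True means "not obviously broken", nothing more.
    """
    with open(path, "rb") as fh:
        data = fh.read(520)

    if len(data) < 128:
        return False, "file is shorter than the 128-byte MAT-file header"

    if not data.startswith(b"MATLAB"):
        return False, "header text does not start with 'MATLAB'"

    if data.startswith(b"MATLAB 7.3"):
        # v7.3 files are HDF5 containers behind the MATLAB text header.
        if data[512:520] != HDF5_MAGIC:
            return False, "v7.3 header but no HDF5 signature at offset 512"
        return True, "v7.3 (HDF5-based) header looks plausible"

    # Level 5 files: bytes 126-127 hold the endian indicator.
    endian = data[126:128]
    if endian == b"IM":
        order = "<"  # little-endian file
    elif endian == b"MI":
        order = ">"  # big-endian file
    else:
        return False, "unexpected endian indicator %r" % endian

    (version,) = struct.unpack(order + "H", data[124:126])
    if version != 0x0100:
        return False, "unexpected version field 0x%04x" % version

    return True, "Level 5 header looks plausible"
```

Calling check_mat_header("figure.fig") (the file name is a placeholder) returns a pair such as (False, "header text does not start with 'MATLAB'"), which at least separates files that are clearly not MAT-files from files that deserve a closer look.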


Last updated: Oct 30 2025 at 05:14 UTC