Stream: troubleshooting

Topic: ingest job exhausting resources


view this post on Zulip Jay Sundu (Mar 20 2025 at 14:23):

Hi there. We have an ingest job that is exhausting all our resources. We have run the imqcmd purge command to try to clear the job, but it will not clear for some reason. Are there any other steps we can take to clear the job?
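For reference, the commands we ran were along these lines (Payara path adjusted for our install; yours may differ):

    # list the ingest destination to see how many messages are queued
    /usr/local/payara6/mq/bin/imqcmd -u admin query dst -t q -n DataverseIngest
    # purge the queued (not yet running) ingest messages
    /usr/local/payara6/mq/bin/imqcmd -u admin purge dst -t q -n DataverseIngest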

view this post on Zulip Don Sizemore (Mar 20 2025 at 14:47):

It was my understanding that I could purge the job queue, but not running jobs; I just had to wait.

view this post on Zulip Don Sizemore (Mar 20 2025 at 15:20):

A number of installations preempt this problem by setting https://guides.dataverse.org/en/latest/installation/config.html#tabularingestsizelimit to some fraction of your Payara JVM heap setting. Leonid has said that R formats in particular can consume up to 10x the file size in memory during ingest.
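If it helps, it's a regular database setting, so something along these lines (value is in bytes; pick a fraction of your heap):

    # e.g. skip ingest for tabular files larger than ~2 GB
    curl -X PUT -d 2000000000 http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit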

view this post on Zulip Jay Sundu (Mar 20 2025 at 15:22):

That's very helpful @Don Sizemore I'll try that once this job finishes.

view this post on Zulip Jay Sundu (Mar 20 2025 at 16:29):

Is it possible to separate out the ingest process onto another machine? Has anyone else done that? We're thinking of using a worker on another machine whose only job would be to process ingest jobs.

view this post on Zulip Don Sizemore (Mar 20 2025 at 16:30):

There is a proposal to do exactly that but I don't think the work has been planned / picked up yet.

view this post on Zulip Philip Durbin 🚀 (Mar 20 2025 at 16:36):

Yeah. Here's a related issue: Ingest Modularity/Improvements #7852

view this post on Zulip Jay Sundu (Mar 20 2025 at 17:30):

Is there a way to know if this particular job is actually making progress? How can we monitor it and know when it's done?

view this post on Zulip Jay Sundu (Mar 20 2025 at 17:51):

Is it feasible (and safe) to run two active Dataverse instances on different VMs, but using the same database, filesystem, etc.? We're wondering if, in that setup, we could load balance ingestion requests to one of the DV instances and web requests to the other. If it's possible without risking data corruption, that would eliminate the problem of ingestion interfering with web users.

view this post on Zulip Philip Durbin 🚀 (Mar 20 2025 at 18:16):

That's what https://github.com/IQSS/dataverse.harvard.edu/issues/111 is about, setting up a dedicated ingest server for Harvard Dataverse. We haven't done it though, and that issue is quite old at this point.

view this post on Zulip Don Sizemore (Mar 20 2025 at 18:17):

Harvard runs with a dual-application-node setup and has for some time: https://guides.dataverse.org/en/latest/installation/prep.html though there were, I think, two concurrency problems in the database over the years.

view this post on Zulip Philip Durbin 🚀 (Mar 20 2025 at 18:17):

It is possible and I dare say safe to run multiple app servers pointed at the same database. We do this for Harvard Dataverse (two app servers) but you'll want to keep in mind the caveats at https://guides.dataverse.org/en/6.6/installation/advanced.html#multiple-app-servers
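One of those caveats, if I remember right, is that only one node should act as the timer server, so on the other app server(s) you'd set something like:

    # run scheduled tasks (timers) on only one node; set this on the others
    ./asadmin create-jvm-options "-Ddataverse.timerServer=false"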

view this post on Zulip Jay Sundu (Mar 20 2025 at 18:23):

Thanks! What about the question of monitoring the ingest job? Is there a way to observe its progress? We just want to make sure that it is in fact making progress.

view this post on Zulip Philip Durbin 🚀 (Mar 20 2025 at 18:34):

Hmm, nothing at https://guides.dataverse.org/en/6.6/admin/troubleshooting.html#long-running-ingest-jobs-have-exhausted-system-resources

view this post on Zulip Philip Durbin 🚀 (Mar 20 2025 at 18:34):

I assume that's where you found the imqcmd command.

view this post on Zulip Don Sizemore (Mar 20 2025 at 18:38):

@Jay Sundu if you're running Linux and have strace installed, you can watch the system calls made by the sub-process handling ingest. In my case I could see it reading and seeking, and just let it finish.

view this post on Zulip Don Sizemore (Mar 20 2025 at 18:40):

IIRC you can find the busy subprocess in top by pressing H (to show threads), then attach strace to its PID.
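Something like:

    # show individual threads of the app server process (pressing H inside top toggles the same view)
    top -H
    # attach to the busy thread/process ID reported by top and watch its system calls
    strace -p <pid>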

view this post on Zulip Don Sizemore (Mar 20 2025 at 18:40):

Stopping and starting Payara will only slow things down, as Payara will maintain job state and pick up where it left off once you start it back up.

view this post on Zulip Jay Sundu (Mar 20 2025 at 19:54):

FYI, our long-running job just finished, and I've put the TabularIngestSizeLimit in place, so hopefully that'll give us some safety. We're still looking at perhaps setting up another instance to offload the ingest process. Thanks for all your help!

view this post on Zulip Philip Durbin 🚀 (Mar 20 2025 at 20:11):

Phew! How long did it take?

view this post on Zulip Jay Sundu (Mar 20 2025 at 21:02):

About twenty hours.

view this post on Zulip Philip Durbin 🚀 (Mar 20 2025 at 21:03):

Wow, what kind of file was it?

view this post on Zulip Jay Sundu (Mar 20 2025 at 21:05):

There were six 3-5 GB files, TXT and CSV. I haven't seen them myself yet; I was just told what they were by the person who did the uploading.

view this post on Zulip Don Sizemore (Mar 20 2025 at 21:07):

now THAT's gonna be some variable-level metadata!

view this post on Zulip Philip Durbin 🚀 (Mar 20 2025 at 21:07):

Interesting. Was ingest successful?

view this post on Zulip Jay Sundu (Mar 20 2025 at 22:02):

Apparently the publish is still in progress

view this post on Zulip Don Sizemore (Mar 21 2025 at 11:33):

@Jay Sundu Dataverse will verify checksums on dataset publication; on larger files this can take some time depending on your datastore type. There is a maximum setting for that as well, but I haven't yet implemented it.

view this post on Zulip Philip Durbin 🚀 (Mar 21 2025 at 12:14):

https://guides.dataverse.org/en/6.6/installation/config.html#datasetchecksumvalidationsizelimit
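It's another database setting, so roughly (value in bytes, adjust to taste):

    # skip checksum validation at publish time above this size
    curl -X PUT -d 5000000000 http://localhost:8080/api/admin/settings/:DatasetChecksumValidationSizeLimit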

