✔ UCLA Dataverse down might be thread pool issue · troubleshooting

Stream: troubleshooting

Topic: ✔ UCLA Dataverse down might be thread pool issue

jamie jamison (Apr 25 2024 at 19:54):

UCLA Dataverse has been down for about a day and we can't bring it up. It might be thread-pool related. Current setting:
<orb use-thread-pool-ids="thread-pool-1"></orb>

What are people setting the thread pool to?

jamie jamison (Apr 25 2024 at 22:30):

Definitely screaming time. At this point we can't start up Dataverse.

Philip Durbin 🚀 (Apr 26 2024 at 01:55):

@jamie jamison sorry, I don't know what Harvard Dataverse has this set to.

And I don't see this in our guides.

In our Docker image we set this:

${ASADMIN} set default-config.thread-pools.thread-pool.thread-pool-1.max-thread-pool-size="250"

But I'm not sure if that helps.

You might want to post at https://groups.google.com/g/dataverse-community where you'll reach more people. Hang in there!
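
One way to answer "what is the thread pool set to?" on a given installation is to read the values straight out of Payara's domain.xml. The following is a minimal sketch, not an official tool: it assumes the usual layout where each `<config>` holds `<thread-pool>` elements with an optional `max-thread-pool-size` attribute, and the sample XML and its numbers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt for illustration; a real domain.xml is much larger.
SAMPLE_DOMAIN_XML = """
<domain>
  <configs>
    <config name="server-config">
      <thread-pools>
        <thread-pool name="http-thread-pool"></thread-pool>
        <thread-pool name="thread-pool-1" max-thread-pool-size="200"></thread-pool>
      </thread-pools>
    </config>
    <config name="default-config">
      <thread-pools>
        <thread-pool name="thread-pool-1" max-thread-pool-size="250"></thread-pool>
      </thread-pools>
    </config>
  </configs>
</domain>
"""


def thread_pool_sizes(domain_xml: str) -> dict:
    """Map 'config/pool' to max-thread-pool-size (None when the attribute is unset)."""
    root = ET.fromstring(domain_xml)
    sizes = {}
    for config in root.iter("config"):
        for pool in config.iter("thread-pool"):
            size = pool.get("max-thread-pool-size")
            key = f"{config.get('name')}/{pool.get('name')}"
            sizes[key] = int(size) if size else None
    return sizes


if __name__ == "__main__":
    for name, size in thread_pool_sizes(SAMPLE_DOMAIN_XML).items():
        print(name, size if size is not None else "(server default)")
```

Note that the Docker command above targets `default-config`, while a running instance normally uses `server-config`, so it is worth checking which config a value actually lives in.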

jamie jamison (Apr 26 2024 at 04:44):

Guess I'm not the only one who works odd hours (9:42 in Los Angeles). As you suggested I posted the question to the google group. Tim was wondering if this relates to the past bot problem.
As always, thank you and I'll post what we change to fix this.

Philip Durbin 🚀 (Apr 26 2024 at 10:35):

Thanks, yes, I see it: https://groups.google.com/g/dataverse-community/c/pD-TF7IJiQU/m/Tu7qhbExBAAJ

Don Sizemore (Apr 26 2024 at 11:21):

what symptoms are you seeing when you try to launch Dataverse? sustained CPU usage, or does it go idle? what does server.log look like when things stall out?

Don Sizemore (Apr 26 2024 at 13:14):

I should've asked first: how much RAM does the system have, and what's your Xmx setting in domain.xml?

jamie jamison (Apr 26 2024 at 16:08):

It's a t2.xlarge instance: 4 vCPUs, 16 GiB RAM
Xms 4096m

jamie jamison (Apr 26 2024 at 16:15):

Restarting seems to require that I restart Shibboleth, httpd and payara5. I'll check with Tim Dennis for a better description but it seems to go down within about 5 minutes.

jamie jamison (Apr 26 2024 at 16:30):

[#|2024-04-25T23:26:32.917+0000|SEVERE|Payara 5.2022.4|org.glassfish.jersey.server.ServerRuntime$Responder|_ThreadID=99;_ThreadName=http-thread-pool::jk-connector(5);_TimeMillis=1714087592917;_LevelValue=1000;| An I/O error has occurred while writing a response message entity to the container output stream. org.glassfish.jersey.server.internal.process.MappableException: java.io.IOException: Connection is closed
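
For skimming a large server.log, records like the one above can be split into fields and tallied by level. This is only a sketch: it assumes the pipe-delimited layout visible in that record (timestamp, level, product, logger, key=value metadata, message) and assumes each record ends with `|#]`, which the truncated excerpt does not show. The function names are made up for illustration.

```python
from collections import Counter

# Shortened copy of the record posted above; the trailing "|#]" is an assumption.
SAMPLE_RECORD = (
    "[#|2024-04-25T23:26:32.917+0000|SEVERE|Payara 5.2022.4|"
    "org.glassfish.jersey.server.ServerRuntime$Responder|"
    "_ThreadID=99;_ThreadName=http-thread-pool::jk-connector(5);"
    "_TimeMillis=1714087592917;_LevelValue=1000;|"
    " An I/O error has occurred while writing a response message entity"
    " to the container output stream.|#]"
)


def parse_record(record: str) -> dict:
    """Split one '[#|...' log record into named fields."""
    if not record.startswith("[#|"):
        raise ValueError("not a Payara-style log record")
    body = record[3:].rstrip()
    if body.endswith("|#]"):
        body = body[:-3]
    # The message may itself contain '|', so split at most five times.
    timestamp, level, product, logger, meta, message = body.split("|", 5)
    fields = dict(kv.split("=", 1) for kv in meta.split(";") if "=" in kv)
    return {
        "timestamp": timestamp,
        "level": level,
        "product": product,
        "logger": logger,
        "thread": fields.get("_ThreadName"),
        "message": message.strip(),
    }


def count_by_level(records) -> Counter:
    """Count how many records were logged at each level."""
    return Counter(parse_record(r)["level"] for r in records)


if __name__ == "__main__":
    rec = parse_record(SAMPLE_RECORD)
    print(rec["level"], rec["thread"], rec["message"][:40])
```

A sudden rise in SEVERE records from `http-thread-pool` threads in the minutes before a stall would be one thing to look for.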

Tim Dennis (Apr 26 2024 at 16:42):

thanks jamie for posting, any help would be appreciated.

Don Sizemore (Apr 26 2024 at 16:54):

try bumping your JVM heap to 8192m, and restarting Payara? if the problem is Payara, httpd/shibd shouldn't need a restart
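
Before and after changing the heap, it can help to confirm what domain.xml actually contains, since `-Xms` (initial heap) and `-Xmx` (maximum heap) are separate flags. A rough sketch, assuming the options are stored as `<jvm-options>` elements under the config in use; the sample values are hypothetical apart from the 4096m Xms reported above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical excerpt; only the Xms value comes from this thread.
SAMPLE_DOMAIN_XML = """
<domain><configs><config name="server-config">
  <java-config>
    <jvm-options>-Xms4096m</jvm-options>
    <jvm-options>-Xmx4096m</jvm-options>
    <jvm-options>-XX:+UseG1GC</jvm-options>
  </java-config>
</config></configs></domain>
"""

_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def heap_options(domain_xml: str, config_name: str = "server-config") -> dict:
    """Return e.g. {'Xms': bytes, 'Xmx': bytes} for the flags found in one config."""
    root = ET.fromstring(domain_xml)
    found = {}
    for config in root.iter("config"):
        if config.get("name") != config_name:
            continue
        for opt in config.iter("jvm-options"):
            m = re.fullmatch(r"-(Xms|Xmx)(\d+)([kKmMgG]?)", (opt.text or "").strip())
            if m:
                found[m.group(1)] = int(m.group(2)) * _UNITS[m.group(3).lower()]
    return found


def fraction_of_ram(heap_bytes: int, ram_gib: float) -> float:
    """Share of physical RAM that the given heap size would claim."""
    return heap_bytes / (ram_gib * 1024**3)


if __name__ == "__main__":
    opts = heap_options(SAMPLE_DOMAIN_XML)
    print({k: v // 1024**2 for k, v in opts.items()}, "MiB")
    # An 8192m heap on a 16 GiB machine claims half of the RAM.
    print(fraction_of_ram(8192 * 1024**2, 16))
```

The fraction matters because Payara shares the box with httpd, shibd and possibly Solr and PostgreSQL, so the heap cannot simply grow to fill the machine.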

Don Sizemore (Apr 26 2024 at 16:54):

also feel free to send any pertinent server.logs to support@dataverse.org , I'll be glad to take a look.

Tim Dennis (Apr 26 2024 at 17:20):

Thanks Don

jamie jamison (Apr 26 2024 at 17:22):

I raised the heap to 8192 and restarted. Dataverse came up rather quickly, was able to log in. Cautious success!

jamie jamison (Apr 26 2024 at 17:52):

So far system still up and running. I'm going to send the payara and access logs to the support email.

I really need to learn how to monitor this system and catch problems before they get critical. How do you guys monitor your installation? (Subject wasn't covered in Art School)

Philip Durbin 🚀 (Apr 26 2024 at 17:58):

Ha, well we do have a page at https://guides.dataverse.org/en/6.2/admin/monitoring.html but it's sort of a stub. We should add more from what various installations actually do (we could start a new topic for this).

Oliver Bertuch (Apr 26 2024 at 18:33):

I suggest you take a look at the /metrics endpoint, ready to be digested by Prometheus. :hamburger:

Oliver Bertuch (Apr 26 2024 at 18:34):

You can have simple alerts with Prometheus, too.
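
Short of a full Prometheus setup, the same text output can be checked with a small script. The sketch below parses the plain-text exposition format (`name value` lines, `#` comments) and computes heap usage. The metric names `base_memory_usedHeap_bytes` and `base_memory_maxHeap_bytes` are assumptions based on MicroProfile Metrics naming and may differ on your Payara version; the sample numbers are invented, and samples with trailing timestamps are not handled.

```python
import urllib.request

# Hypothetical output for illustration; real endpoints expose many more series.
SAMPLE_METRICS = """\
# TYPE base_memory_usedHeap_bytes gauge
base_memory_usedHeap_bytes 3221225472
# TYPE base_memory_maxHeap_bytes gauge
base_memory_maxHeap_bytes 4294967296
base_thread_count 212
"""


def fetch_metrics(url: str) -> str:
    """Download the metrics text from the given URL (not called in this sketch)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> dict:
    """Parse 'name value' or 'name{labels} value' lines, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def heap_usage_ratio(samples: dict,
                     used: str = "base_memory_usedHeap_bytes",
                     maximum: str = "base_memory_maxHeap_bytes") -> float:
    """Used heap divided by maximum heap, using the assumed metric names."""
    return samples[used] / samples[maximum]


if __name__ == "__main__":
    ratio = heap_usage_ratio(parse_metrics(SAMPLE_METRICS))
    print("heap usage:", ratio, "ALERT" if ratio > 0.9 else "ok")
```

Run from cron against the real endpoint via `fetch_metrics`, something like this could send a warning well before the heap is exhausted, though a proper Prometheus alert rule is the sturdier option.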

Philip Durbin 🚀 (Apr 26 2024 at 18:36):

/me looks at https://demo.dataverse.org/metrics

jamie jamison (Apr 28 2024 at 00:41):

I have been to the monitoring page. So far I'm trying to install munin. I seem to have dependency problems (RHEL 7.9). I might try compiling it on our test server and see how that runs.

jamie jamison (Apr 29 2024 at 04:46):

Don Sizemore said:

try bumping your JVM heap to 8192m, and restarting Payara? if the problem is Payara, httpd/shibd shouldn't need a restart

Don - I've sent the payara5 server log to the support email.
Thank you,
Jamie

jamie jamison (Apr 29 2024 at 22:11):

About munin. I've had no luck installing munin on RHEL7, either from the repo or from source. I seem to have a problem with perl dependencies.

Philip Durbin 🚀 (Apr 30 2024 at 01:14):

Hmm, maybe we could start a new thread about Munin but for me the killer feature was that it was so easy to install. If that has changed, it's way less attractive!

jamie jamison (Apr 30 2024 at 01:15):

Perhaps it's because I'm running an older RHEL - RHEL 7.9.

Philip Durbin 🚀 (Apr 30 2024 at 01:19):

Or not old enough. I probably last used it on CentOS 6. :crazy:

Philip Durbin 🚀 (Apr 30 2024 at 01:24):

I'm just looking at https://guide.munin-monitoring.org/en/latest/installation/install.html#redhat-centos-fedora

... do you have EPEL installed?

jamie jamison (Apr 30 2024 at 01:39):

I believe so.
Repo-id : epel/x86_64
Repo-name : Extra Packages for Enterprise Linux 7 - x86_64
Repo-revision: 1713925428
Repo-updated : Wed Apr 24 02:26:33 2024
Repo-pkgs : 13,604
Repo-size : 15 G
Repo-metalink: https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=x86_64&infra=$infra&content=$contentdir
Updated : Wed Apr 24 02:26:33 2024
Repo-baseurl : https://d2lzkl7pfhq30w.cloudfront.net/pub/epel/7/x86_64/ (64 more)
Repo-expire : 21,600 second(s) (last: Thu Apr 25 21:40:15 2024)
Filter : read-only:present
Repo-excluded: 194
Repo-filename: /etc/yum.repos.d/epel.repo

Philip Durbin 🚀 (Apr 30 2024 at 17:59):

This is what I see on my Rocky 8 server when I run yum install munin: yum-install-munin.txt

Philip Durbin 🚀 (Apr 30 2024 at 18:39):

Bah, SELinux errors. I give up. :shrug:

jamie jamison (Apr 30 2024 at 21:03):

I'm going to try setting up the payara5 web UI again and see if that helps.

Philip Durbin 🚀 (Apr 30 2024 at 21:07):

Sounds good. I did ask at standup today if anyone had any monitoring ideas but it was just crickets. :cricket:

Philip Durbin 🚀 (Apr 30 2024 at 21:07):

You might want to ask on the google group. I'm sure someone has set up monitoring.

jamie jamison (Apr 30 2024 at 21:08):

:+1:

Notification Bot (May 16 2024 at 20:46):

Philip Durbin has marked this topic as resolved.

Last updated: Jan 09 2026 at 14:18 UTC