Stream: troubleshooting

Topic: ✔ UCLA Dataverse down might be thread pool issue


view this post on Zulip jamie jamison (Apr 25 2024 at 19:54):

UCLA Dataverse has been down for about a day and we can't bring it up. It might be thread-pool related. Current setting:
<orb use-thread-pool-ids="thread-pool-1"></orb>

What are people setting the thread pool to?

view this post on Zulip jamie jamison (Apr 25 2024 at 22:30):

Definitely screaming time. At this point we can't start up Dataverse.

view this post on Zulip Philip Durbin 🚀 (Apr 26 2024 at 01:55):

@jamie jamison sorry, I don't know what Harvard Dataverse has this set to.

And I don't see this in our guides.

In our Docker image we set this:

${ASADMIN} set default-config.thread-pools.thread-pool.thread-pool-1.max-thread-pool-size="250"

But I'm not sure if that helps.
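Before changing anything, it can help to see what domain.xml currently says. Below is a minimal Python sketch that lists each thread pool's max-thread-pool-size and which pool the ORB uses. It assumes Payara's usual domain.xml layout (`<configs><config name=...><thread-pools><thread-pool .../>` plus an `<orb use-thread-pool-ids=...>` element) with no XML namespaces. The function names are hypothetical, and a pool with no explicit size is reported as None rather than guessed.

```python
import xml.etree.ElementTree as ET


def thread_pool_settings(domain_xml: str) -> dict:
    """Map (config name, pool name) -> max-thread-pool-size, or None if unset.

    Assumes the layout <domain><configs><config><thread-pools><thread-pool/>.
    """
    root = ET.fromstring(domain_xml)
    settings = {}
    for config in root.findall("./configs/config"):
        cfg_name = config.get("name")
        for pool in config.findall("./thread-pools/thread-pool"):
            size = pool.get("max-thread-pool-size")
            settings[(cfg_name, pool.get("name"))] = int(size) if size else None
    return settings


def orb_thread_pools(domain_xml: str) -> list:
    """Return the value of use-thread-pool-ids for every <orb> element."""
    root = ET.fromstring(domain_xml)
    return [orb.get("use-thread-pool-ids") for orb in root.iter("orb")]


def report(path: str) -> None:
    """Print the pool sizes and ORB pool ids found in a domain.xml file."""
    with open(path, encoding="utf-8") as fh:
        xml_text = fh.read()
    for (cfg, pool), size in thread_pool_settings(xml_text).items():
        print(f"{cfg}: {pool} max-thread-pool-size={size}")
    print("orb uses:", ", ".join(filter(None, orb_thread_pools(xml_text))))
```

Calling `report()` on your domain's `config/domain.xml` (the exact path depends on the install) shows whether thread-pool-1 has an explicit size in server-config, as opposed to default-config, which is what the Docker image line above changes.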

You might want to post at https://groups.google.com/g/dataverse-community where you'll reach more people. Hang in there!

view this post on Zulip jamie jamison (Apr 26 2024 at 04:44):

Guess I'm not the only one who works odd hours (9:42 in Los Angeles). As you suggested, I posted the question to the Google group. Tim was wondering if this relates to the past bot problem.
As always, thank you and I'll post what we change to fix this.

view this post on Zulip Philip Durbin 🚀 (Apr 26 2024 at 10:35):

Thanks, yes, I see it: https://groups.google.com/g/dataverse-community/c/pD-TF7IJiQU/m/Tu7qhbExBAAJ

view this post on Zulip Don Sizemore (Apr 26 2024 at 11:21):

what symptoms are you seeing when you try to launch Dataverse? sustained CPU usage, or does it go idle? what does server.log look like when things stall out?

view this post on Zulip Don Sizemore (Apr 26 2024 at 13:14):

I should've asked first: how much RAM does the system have, and what's your Xmx setting in domain.xml?

view this post on Zulip jamie jamison (Apr 26 2024 at 16:08):

It's a t2.xlarge instance: 4 CPUs, 16 GiB RAM.
Xms 4096m
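Don's two questions (RAM and the heap setting in domain.xml) can be answered from the config file itself. Here is a minimal Python sketch, assuming the heap flags live as `<jvm-options>` entries under `<java-config>` in the `server-config` config of domain.xml. It reports -Xms and -Xmx in MiB and the share of RAM a given heap would take. The helper names are hypothetical, and the RAM figure is passed in by hand (16 GiB for the t2.xlarge above).

```python
import re
import xml.etree.ElementTree as ET

# Multipliers for JVM size suffixes; no suffix means bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}
# Matches options such as -Xmx8192m or -Xms4g.
_HEAP_RE = re.compile(r"^-Xm([sx])(\d+)([kKmMgGtT]?)$")


def heap_settings_mib(domain_xml: str, config_name: str = "server-config") -> dict:
    """Return {"Xms": MiB, "Xmx": MiB} for one config; absent flags are omitted."""
    root = ET.fromstring(domain_xml)
    result = {}
    for config in root.findall("./configs/config"):
        if config.get("name") != config_name:
            continue
        for opt in config.findall("./java-config/jvm-options"):
            match = _HEAP_RE.match((opt.text or "").strip())
            if match:
                kind, number, unit = match.groups()
                result["Xm" + kind] = int(number) * _UNITS[unit.lower()] // 1024**2
    return result


def share_of_ram(heap_mib: int, ram_gib: float) -> float:
    """Fraction of physical RAM that a heap of heap_mib MiB would occupy."""
    return heap_mib / (ram_gib * 1024)
```

With the numbers in this thread, a 4096m heap is a quarter of 16 GiB and 8192m is half, leaving the rest for the OS and whatever else runs on the same instance.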

view this post on Zulip jamie jamison (Apr 26 2024 at 16:15):

Restarting seems to require that I restart Shibboleth, httpd and payara5. I'll check with Tim Dennis for a better description, but it seems to go down within about 5 minutes.

view this post on Zulip jamie jamison (Apr 26 2024 at 16:30):

[#|2024-04-25T23:26:32.917+0000|SEVERE|Payara 5.2022.4|org.glassfish.jersey.server.ServerRuntime$Responder|_ThreadID=99;_ThreadName=http-thread-pool::jk-connector(5);_TimeMillis=1714087592917;_LevelValue=1000;| An I/O error has occurred while writing a response message entity to the container output stream. org.glassfish.jersey.server.internal.process.MappableException: java.io.IOException: Connection is closed
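Since Don asked what server.log looks like when things stall, here is a rough Python sketch for tallying records by level and logger. It assumes Payara's uniform log format as seen in the line above (`[#|timestamp|level|product|logger|key=value;...|message|#]`). It only parses the first line of each record, so stack-trace continuation lines are skipped, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def parse_record(line: str) -> Optional[dict]:
    """Split the first line of a '[#|...|#]' record into fields, or return None."""
    line = line.strip()
    if not line.startswith("[#|"):
        return None  # continuation line (e.g. stack trace) or unrelated text
    body = line[3:]
    if body.endswith("|#]"):
        body = body[:-3]
    parts = body.split("|", 5)  # the message itself may contain '|'
    if len(parts) < 6:
        return None
    timestamp, level, product, logger, kv, message = parts
    # key=value pairs such as _ThreadID=99;_ThreadName=...;
    fields = dict(item.split("=", 1) for item in kv.split(";") if "=" in item)
    return {
        "timestamp": timestamp,
        "level": level,
        "product": product,
        "logger": logger,
        "thread": fields.get("_ThreadName"),
        "message": message.strip(),
    }


def summarize(lines: Iterable[str], level: str = "SEVERE") -> Counter:
    """Count records at the given level, keyed by logger name."""
    counts = Counter()
    for line in lines:
        record = parse_record(line)
        if record and record["level"] == level:
            counts[record["logger"]] += 1
    return counts
```

`summarize(open(path, encoding="utf-8")).most_common(10)` lists the loggers producing the most SEVERE records, which is a quick way to see whether errors like the Jersey I/O error above dominate the log before a stall.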

view this post on Zulip Tim Dennis (Apr 26 2024 at 16:42):

thanks jamie for posting, any help would be appreciated.

view this post on Zulip Don Sizemore (Apr 26 2024 at 16:54):

try bumping your JVM heap to 8192m, and restarting Payara? if the problem is Payara, httpd/shibd shouldn't need a restart

view this post on Zulip Don Sizemore (Apr 26 2024 at 16:54):

also feel free to send any pertinent server.logs to support@dataverse.org , I'll be glad to take a look.

view this post on Zulip Tim Dennis (Apr 26 2024 at 17:20):

Thanks Don

view this post on Zulip jamie jamison (Apr 26 2024 at 17:22):

I raised the heap to 8192m and restarted. Dataverse came up rather quickly and I was able to log in. Cautious success!

view this post on Zulip jamie jamison (Apr 26 2024 at 17:52):

So far system still up and running. I'm going to send the payara and access logs to the support email.

I really need to learn how to monitor this system and catch problems before they get critical. How do you guys monitor your installations? (Subject wasn't covered in art school.)

view this post on Zulip Philip Durbin 🚀 (Apr 26 2024 at 17:58):

Ha, well we do have a page at https://guides.dataverse.org/en/6.2/admin/monitoring.html but it's sort of a stub. We should add more from what various installations actually do (we could start a new topic for this).
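As a starting point short of a full monitoring stack, here is a minimal Python availability check that could run from cron on another machine. It assumes the Dataverse native API endpoint `/api/info/version` answers with JSON of the form `{"status":"OK","data":{"version":...}}`. The base URL, the function names and how you get notified on failure are all placeholders; the fetcher is injectable so the logic can be tested without a network.

```python
import json
import urllib.request
from typing import Callable, Tuple


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, bytes]:
    """Fetch a URL; urlopen raises on timeouts, refused connections and HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def check_dataverse(
    base_url: str,
    fetcher: Callable[[str], Tuple[int, bytes]] = fetch,
) -> Tuple[bool, str]:
    """Return (is_up, message) for a Dataverse installation."""
    url = base_url.rstrip("/") + "/api/info/version"
    try:
        status, body = fetcher(url)
    except Exception as exc:  # any failure to get a response counts as "down"
        return False, f"{url}: {exc}"
    if status != 200:
        return False, f"{url}: HTTP {status}"
    try:
        version = json.loads(body)["data"]["version"]
    except (ValueError, KeyError, TypeError):
        return False, f"{url}: unexpected response"
    return True, f"{url}: up (version {version})"
```

A cron job could call `check_dataverse("https://your-dataverse.example.edu")` every few minutes and send mail when the first element is False. Checking from outside the server also catches the case where Payara is running but httpd or shibd in front of it is not.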

view this post on Zulip Oliver Bertuch (Apr 26 2024 at 18:33):

I suggest you take a look at the /metrics endpoint, ready to be digested by Prometheus. :hamburger:

view this post on Zulip Oliver Bertuch (Apr 26 2024 at 18:34):

You can have simple alerts with Prometheus, too.
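In practice Prometheus does the scraping and alerting itself (through alerting rules), but to see what the endpoint exposes, here is a rough Python sketch that parses the Prometheus text format and flags samples above a limit. The metric name in the usage note, `base_memory_usedHeap_bytes`, is an assumption based on MicroProfile Metrics naming; check what your `/metrics` actually returns. The parser skips `# HELP`/`# TYPE` comment lines and does not handle `}` inside label values.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# name{labels} value [timestamp]
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?P<labels>\{[^}]*\})?"
    r"\s+(?P<value>\S+)(?:\s+\S+)?$"
)


def fetch_metrics(url: str, timeout: float = 10.0) -> str:
    """Download the text exposition from a /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> Dict[str, List[Tuple[str, float]]]:
    """Map metric name -> list of (label string, value)."""
    samples: Dict[str, List[Tuple[str, float]]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _SAMPLE_RE.match(line)
        if not match:
            continue
        value = float(match["value"])  # float() also accepts NaN and +Inf
        samples.setdefault(match["name"], []).append((match["labels"] or "", value))
    return samples


def over_threshold(samples: Dict[str, List[Tuple[str, float]]], name: str, limit: float) -> List[str]:
    """Describe every sample of `name` whose value exceeds `limit`."""
    return [
        f"{name}{labels} = {value} > {limit}"
        for labels, value in samples.get(name, [])
        if value > limit
    ]
```

For example, `over_threshold(parse_metrics(fetch_metrics("https://demo.dataverse.org/metrics")), "base_memory_usedHeap_bytes", 6e9)` returns an empty list when no sample of that metric exceeds 6e9. A non-empty list is the condition a real Prometheus alerting rule would encode.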

view this post on Zulip Philip Durbin 🚀 (Apr 26 2024 at 18:36):

/me looks at https://demo.dataverse.org/metrics

view this post on Zulip jamie jamison (Apr 28 2024 at 00:41):

I have been to the monitoring page. So far I'm trying to install Munin, but I seem to have dependency problems (RHEL 7.9). I might try compiling it on our test server and see how that runs.

view this post on Zulip jamie jamison (Apr 29 2024 at 04:46):

Don Sizemore said:

try bumping your JVM heap to 8192m, and restarting Payara? if the problem is Payara, httpd/shibd shouldn't need a restart

Don - I've sent the payara5 server log to the support email.
Thank you,
Jamie

view this post on Zulip jamie jamison (Apr 29 2024 at 22:11):

About Munin: I've had no luck installing Munin on RHEL 7, either from the repo or from source. I seem to have a problem with Perl dependencies.

view this post on Zulip Philip Durbin 🚀 (Apr 30 2024 at 01:14):

Hmm, maybe we could start a new thread about Munin, but for me the killer feature was that it was so easy to install. If that has changed, it's way less attractive!

view this post on Zulip jamie jamison (Apr 30 2024 at 01:15):

Perhaps it's because I'm running an older RHEL - RHEL 7.9.

view this post on Zulip Philip Durbin 🚀 (Apr 30 2024 at 01:19):

Or not old enough. I probably last used it on CentOS 6. :crazy:

view this post on Zulip Philip Durbin 🚀 (Apr 30 2024 at 01:24):

I'm just looking at https://guide.munin-monitoring.org/en/latest/installation/install.html#redhat-centos-fedora

... do you have EPEL installed?

view this post on Zulip jamie jamison (Apr 30 2024 at 01:39):

I believe so.
Repo-id : epel/x86_64
Repo-name : Extra Packages for Enterprise Linux 7 - x86_64
Repo-revision: 1713925428
Repo-updated : Wed Apr 24 02:26:33 2024
Repo-pkgs : 13,604
Repo-size : 15 G
Repo-metalink: https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=x86_64&infra=$infra&content=$contentdir
Updated : Wed Apr 24 02:26:33 2024
Repo-baseurl : https://d2lzkl7pfhq30w.cloudfront.net/pub/epel/7/x86_64/ (64 more)
Repo-expire : 21,600 second(s) (last: Thu Apr 25 21:40:15 2024)
Filter : read-only:present
Repo-excluded: 194
Repo-filename: /etc/yum.repos.d/epel.repo

view this post on Zulip Philip Durbin 🚀 (Apr 30 2024 at 17:59):

This is what I see on my Rocky 8 server when I run yum install munin: yum-install-munin.txt

view this post on Zulip Philip Durbin 🚀 (Apr 30 2024 at 18:39):

Bah, SELinux errors. I give up. :shrug:

view this post on Zulip jamie jamison (Apr 30 2024 at 21:03):

I'm going to try setting up the payara5 web UI again and see if that helps.

view this post on Zulip Philip Durbin 🚀 (Apr 30 2024 at 21:07):

Sounds good. I did ask at standup today if anyone had any monitoring ideas but it was just crickets. :cricket:

view this post on Zulip Philip Durbin 🚀 (Apr 30 2024 at 21:07):

You might want to ask on the google group. I'm sure someone has set up monitoring.

view this post on Zulip jamie jamison (Apr 30 2024 at 21:08):

:+1:

view this post on Zulip Notification Bot (May 16 2024 at 20:46):

Philip Durbin has marked this topic as resolved.


Last updated: Oct 30 2025 at 06:21 UTC