UCLA Dataverse has been down for about a day and we can't bring it up. It might be thread-pool related. Current setting:
<orb use-thread-pool-ids="thread-pool-1"></orb>
What are people setting the thread pool to?
Definitely screaming time. At this point we can't start up Dataverse.
@jamie jamison sorry, I don't know what Harvard Dataverse has this set to.
And I don't see this in our guides.
In our Docker image we set this:
${ASADMIN} set default-config.thread-pools.thread-pool.thread-pool-1.max-thread-pool-size="250"
But I'm not sure if that helps.
You might want to post at https://groups.google.com/g/dataverse-community where you'll reach more people. Hang in there!
Guess I'm not the only one who works odd hours (9:42 in Los Angeles). As you suggested, I posted the question to the Google group. Tim was wondering if this relates to the past bot problem.
As always, thank you and I'll post what we change to fix this.
Thanks, yes, I see it: https://groups.google.com/g/dataverse-community/c/pD-TF7IJiQU/m/Tu7qhbExBAAJ
what symptoms are you seeing when you try to launch Dataverse? sustained CPU usage, or does it go idle? what does server.log look like when things stall out?
I should've asked first: how much RAM does the system have, and what's your Xmx setting in domain.xml?
It's a t2.xlarge instance: 4 CPUs, 16 GiB RAM
Xms 4096m
Restarting seems to require that I restart Shibboleth, httpd and payara5. I'll check with Tim Dennis for a better description, but it seems to go down within about 5 minutes.
[#|2024-04-25T23:26:32.917+0000|SEVERE|Payara 5.2022.4|org.glassfish.jersey.server.ServerRuntime$Responder|_ThreadID=99;_ThreadName=http-thread-pool::jk-connector(5);_TimeMillis=1714087592917;_LevelValue=1000;| An I/O error has occurred while writing a response message entity to the container output stream. org.glassfish.jersey.server.internal.process.MappableException: java.io.IOException: Connection is closed
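For anyone triaging a server.log like this, here is a minimal Python sketch that splits records of the `[#|timestamp|LEVEL|product|logger|fields|message` shape shown above and counts them by level. The function names `parse_record` and `count_by_level` are made up for illustration, and the sketch assumes every record starts on its own line with `[#|`; continuation lines such as stack traces are simply skipped.

```python
from collections import Counter

# The record quoted above, reproduced as a test input.
SAMPLE_LINE = (
    "[#|2024-04-25T23:26:32.917+0000|SEVERE|Payara 5.2022.4|"
    "org.glassfish.jersey.server.ServerRuntime$Responder|"
    "_ThreadID=99;_ThreadName=http-thread-pool::jk-connector(5);"
    "_TimeMillis=1714087592917;_LevelValue=1000;| "
    "An I/O error has occurred while writing a response message entity "
    "to the container output stream."
)


def parse_record(line):
    """Parse one log record; return None for lines that are not record starts."""
    if not line.startswith("[#|"):
        return None
    # Split into at most six parts so '|' inside the message is preserved.
    parts = line[3:].split("|", 5)
    if len(parts) < 6:
        return None
    timestamp, level, product, logger, fields, message = parts
    # The fields part is a ';'-separated list of key=value pairs.
    pairs = dict(item.split("=", 1) for item in fields.split(";") if "=" in item)
    message = message.strip()
    if message.endswith("|#]"):  # complete records close with this marker
        message = message[:-3].rstrip()
    return {
        "timestamp": timestamp,
        "level": level,
        "product": product,
        "logger": logger,
        "thread": pairs.get("_ThreadName", ""),
        "message": message,
    }


def count_by_level(lines):
    """Count parsed records per log level, ignoring continuation lines."""
    counts = Counter()
    for line in lines:
        record = parse_record(line)
        if record is not None:
            counts[record["level"]] += 1
    return counts


if __name__ == "__main__":
    record = parse_record(SAMPLE_LINE)
    print(record["level"], record["thread"])
```

Running `count_by_level` over the lines of a log file gives a quick sense of whether SEVERE entries cluster around the time things stall.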
thanks jamie for posting, any help would be appreciated.
try bumping your JVM heap to 8192m, and restarting Payara? if the problem is Payara, httpd/shibd shouldn't need a restart
also feel free to send any pertinent server.logs to support@dataverse.org , I'll be glad to take a look.
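To check what heap is actually configured before and after a change like this, a rough Python sketch can read the `-Xms`/`-Xmx` entries from the `<jvm-options>` elements in domain.xml. The XML below is a trimmed, hypothetical fragment (only the `Xms 4096m` value was quoted in this thread; the `Xmx` value and the `server-config` name are assumptions), and `heap_settings_mib` is a made-up helper name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain.xml fragment for illustration only.
SAMPLE_DOMAIN_XML = """
<domain>
  <configs>
    <config name="server-config">
      <java-config>
        <jvm-options>-Xms4096m</jvm-options>
        <jvm-options>-Xmx4096m</jvm-options>
        <jvm-options>-Djava.awt.headless=true</jvm-options>
      </java-config>
    </config>
  </configs>
</domain>
"""

# Matches options such as -Xmx8192m or -Xms4g.
HEAP_OPTION = re.compile(r"^-(Xms|Xmx)(\d+)([kKmMgG]?)$")
# Conversion factors from each unit suffix to MiB (no suffix means bytes).
UNIT_TO_MIB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def heap_settings_mib(domain_xml, config_name="server-config"):
    """Return {'Xms': ..., 'Xmx': ...} in MiB for the named config, if present."""
    root = ET.fromstring(domain_xml)
    settings = {}
    for config in root.iter("config"):
        if config.get("name") != config_name:
            continue
        for option in config.iter("jvm-options"):
            match = HEAP_OPTION.match((option.text or "").strip())
            if match:
                flag, amount, unit = match.groups()
                settings[flag] = int(amount) * UNIT_TO_MIB[unit.lower()]
    return settings


if __name__ == "__main__":
    print(heap_settings_mib(SAMPLE_DOMAIN_XML))
```

On a real installation you would pass in the contents of your own domain.xml; a missing `Xmx` key in the result means no explicit maximum heap was found for that config.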
Thanks Don
I raised the heap to 8192 and restarted. Dataverse came up rather quickly, was able to log in. Cautious success!
So far system still up and running. I'm going to send the payara and access logs to the support email.
I really need to learn how to monitor this system and catch problems before they get critical. How do you guys monitor your installation? (Subject wasn't covered in Art School)
Ha, well we do have a page at https://guides.dataverse.org/en/6.2/admin/monitoring.html but it's sort of a stub. We should add more from what various installations actually do (we could start a new topic for this).
I suggest you take a look at the /metrics endpoint, ready to be digested by Prometheus. :hamburger:
You can have simple alerts with Prometheus, too.
/me looks at https://demo.dataverse.org/metrics
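Short of a full Prometheus setup, a small script can already turn that endpoint into a crude alert. Below is a minimal Python sketch, assuming the endpoint returns the Prometheus text exposition format (`name{labels} value` lines, with `#` comment lines). The metric names in the sample are invented for illustration and are not what Dataverse or Payara actually exports; `parse_metrics` and `over_threshold` are hypothetical helper names.

```python
# Made-up sample in Prometheus text exposition format; real metric names differ.
SAMPLE_METRICS = """\
# HELP example_heap_used_bytes Made-up gauge for illustration.
# TYPE example_heap_used_bytes gauge
example_heap_used_bytes 6.0e9
example_busy_threads{pool="http-thread-pool"} 240
"""


def parse_metrics(text):
    """Map each series (name plus optional label set) to its float value.

    Simplification: assumes label values contain no '}' characters.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        if "}" in line:
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        try:
            samples[series] = float(fields[0])  # an optional timestamp may follow
        except ValueError:
            continue
    return samples


def over_threshold(samples, thresholds):
    """Return the series whose current value exceeds its configured limit."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in samples and samples[name] > limit
    )


if __name__ == "__main__":
    current = parse_metrics(SAMPLE_METRICS)
    limits = {
        "example_heap_used_bytes": 7.0e9,
        'example_busy_threads{pool="http-thread-pool"}': 200,
    }
    print(over_threshold(current, limits))
```

In practice the text would come from fetching the /metrics URL on a schedule (cron, for example), and the thresholds would be picked after watching normal values for a while.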
I have been to the monitoring page. So far I've been trying to install munin. I seem to have dependency problems (Red Hat 7.9). I might try compiling it on our test server and see how that runs.
Don Sizemore said:
try bumping your JVM heap to 8192m, and restarting Payara? if the problem is Payara, httpd/shibd shouldn't need a restart
Don - I've sent the payara5 server log to the support email.
Thank you,
Jamie
About munin. I've had no luck installing munin on RHEL7, either from the repo or from source. I seem to have a problem with perl dependencies.
Hmm, maybe we could start a new thread about Munin but for me the killer feature was that it was so easy to install. If that has changed, it's way less attractive!
Perhaps it's because I'm running an older RHEL - RHEL 7.9.
Or not old enough. I probably last used it on CentOS 6. :crazy:
I'm just looking at https://guide.munin-monitoring.org/en/latest/installation/install.html#redhat-centos-fedora
... do you have EPEL installed?
I believe so.
Repo-id : epel/x86_64
Repo-name : Extra Packages for Enterprise Linux 7 - x86_64
Repo-revision: 1713925428
Repo-updated : Wed Apr 24 02:26:33 2024
Repo-pkgs : 13,604
Repo-size : 15 G
Repo-metalink: https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=x86_64&infra=$infra&content=$contentdir
Updated : Wed Apr 24 02:26:33 2024
Repo-baseurl : https://d2lzkl7pfhq30w.cloudfront.net/pub/epel/7/x86_64/ (64
: more)
Repo-expire : 21,600 second(s) (last: Thu Apr 25 21:40:15 2024)
Filter : read-only:present
Repo-excluded: 194
Repo-filename: /etc/yum.repos.d/epel.repo
This is what I see on my Rocky 8 server when I run yum install munin: yum-install-munin.txt
Bah, SELinux errors. I give up. :shrug:
I'm going to try setting up the payara5 web UI again and see if that helps.
Sounds good. I did ask at standup today if anyone had any monitoring ideas but it was just crickets. :cricket:
You might want to ask on the google group. I'm sure someone has set up monitoring.
:+1:
Philip Durbin has marked this topic as resolved.
Last updated: Oct 30 2025 at 06:21 UTC