Stream: community

Topic: optimal architecture for Dataverse


view this post on Zulip jamie jamison (Mar 13 2024 at 23:33):

I'm not sure where this question should go. The system admin helping me with upgrades asked this.

About system architecture:

view this post on Zulip jamie jamison (Mar 13 2024 at 23:37):

I'm not completely sure where this question should go. It's from the system admin who helps with our upgrades. The tipping point between one machine for everything and splitting into dedicated hosts doesn't seem to be covered in the documentation, so I couldn't come close to answering his question.

About the system architecture:
what is the preferred and/or optimal architecture for supporting the Dataverse application?

all-in-one (where dataverse, solr, postgres reside on a single host)
or distributed (where dataverse, solr, postgres reside on their own separate dedicated hosts)

what are the metrics (e.g. system resource utilization, number of users, etc.) that indicate moving from an all-in-one to distributed architecture is recommended?
at what point does the community recommend an all-in-one architecture be adjusted to use a distributed architecture?

If we moved our architecture to a distributed model that includes a single dataverse node, a single solr node, and a single postgres node - does the community see any reason why this could be problematic from a configuration, complexity, or supportability standpoint?

Thank you,

Jamie Jamison / UCLA Dataverse

view this post on Zulip Philip Durbin 🚀 (Mar 14 2024 at 00:11):

If it helps, here's the write up of the architecture for Harvard Dataverse (not all-in-one): https://guides.dataverse.org/en/6.1/installation/prep.html#hardware-requirements

view this post on Zulip Oliver Bertuch (Mar 14 2024 at 07:43):

I'm not an expert on classic installations. That said: these questions really are one of the reasons why containers are so strong. The different components of Dataverse are separated into individual containers, easy to move around, etc. A single VM per function is a healthy pattern, but creates a lot of overhead for OS maintenance.

view this post on Zulip Oliver Bertuch (Mar 14 2024 at 07:44):

If you go for some tooling to deal with that (e.g. Terraform), it gets cheaper to set up these things. But it will require learning new tools.

view this post on Zulip Oliver Bertuch (Mar 14 2024 at 07:45):

I suggest going for an AIO first and setting up proper monitoring to gather these metrics

view this post on Zulip Oliver Bertuch (Mar 14 2024 at 07:45):

RAM is probably the most pressing issue, depending on your workload.

view this post on Zulip Oliver Bertuch (Mar 14 2024 at 07:46):

I dunno about your environment, but maybe they already have a system like Prometheus / InfluxDB / ... in place where you could gather information and see if you're running short on resources.
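
A minimal sketch of how "running short on resources" could be turned into a check once utilization samples are being collected. The function name and both thresholds (85% utilization, half of the samples) are illustrative assumptions, not community recommendations; the samples would come from whatever monitoring system is already in place.

```python
def sustained_pressure(samples, threshold=0.85, min_fraction=0.5):
    """Return True if at least `min_fraction` of the samples exceed `threshold`.

    `samples` is a list of utilization ratios between 0.0 and 1.0, for
    example RAM used / RAM total, collected over a period of time.
    Both default values are assumptions chosen for illustration only.
    """
    if not samples:
        # No data means no evidence of pressure.
        return False
    over = sum(1 for s in samples if s > threshold)
    return over / len(samples) >= min_fraction
```

Looking at a sustained fraction rather than a single peak avoids reacting to a short spike, such as one large upload or a reindex.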

view this post on Zulip Oliver Bertuch (Mar 14 2024 at 07:49):

The /metrics endpoint already exposes a lot of details about the resources Payara uses. Combined with metrics from the reverse proxy about the number of requests, etc., this might give you some real insight into how your folks use your instance and where the bottlenecks are.
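
A minimal sketch of reading such an endpoint by hand, assuming it returns the Prometheus text exposition format. The host and port in `fetch_metrics` are assumptions for a local install, and the parser is simplified: it does not handle a `}` inside a quoted label value. In practice a Prometheus server would normally scrape the endpoint directly.

```python
import re
import urllib.request

# Matches sample lines such as:  name{label="x"} 12.5   or   name 3
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Parse Prometheus-style text into {name: [(labels, value), ...]}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if not match:
            continue
        name, labels, raw_value = match.groups()
        try:
            value = float(raw_value)
        except ValueError:
            continue
        samples.setdefault(name, []).append((labels or "", value))
    return samples


def fetch_metrics(url="http://localhost:8080/metrics"):
    """Fetch and parse the endpoint. The default URL is an assumption."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

The metric names an instance actually exposes depend on the Payara version, so it is worth listing the keys of the returned dictionary first before deciding which values to track.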

view this post on Zulip Philip Durbin 🚀 (Mar 14 2024 at 11:39):

These are the sorts of things I worry about with regard to the deployed architecture being distributed or centralized:

view this post on Zulip jamie jamison (Mar 19 2024 at 17:43):

Thank you both for the input. The question was really from the systems person who helps me with the updates and maintenance. I'll go over the guide and try to get back to him with an answer.
UCLA Dataverse started with a smaller system: AWS, on an EC2 instance, with S3 buckets for data. Depositing is limited to the UCLA community, though others can download via federated login. At some point in the future it will have to be broken up, but that is beyond my experience.

view this post on Zulip Philip Durbin 🚀 (Mar 20 2024 at 01:20):

Ok, please keep us posted and let us know if we can answer any questions!


Last updated: Nov 01 2025 at 14:11 UTC