https://github.com/gdcc/dataverse-ansible/issues/268 has the details, but in short:
@Ash Manda thanks for working on this! In the issue you asked for more details.
Yes, I think this should be an additional flag.
Here are the current flags:
% ./ec2-create-instance.sh -h
Usage: ./ec2-create-instance.sh -b <branch> -r <repo> -p <pem_path> -g <group_vars> -a <dataverse-ansible branch> -i aws_image -u aws_user -s aws_size -t aws_tag -f aws_security group -e aws_profile -l local_log_path -d -v
default branch is develop
default repo is https://github.com/IQSS/dataverse
default .pem location is /Users/pdurbin
example group_vars may be retrieved from https://raw.githubusercontent.com/GlobalDataverseCommunityConsortium/dataverse-ansible/develop/defaults/main.yml
default AWS AMI ID is ami-08ace1224b75d38c1, find the full list at https://rockylinux.org/ami/
default AWS user is rocky
default AWS instance size is t3a.large
default AWS security group is dataverse-sg
local log path will rsync Payara, Jacoco, Maven and other logs back to the specified path
-d will destroy (terminate) the AWS instance once testing and reporting completes
-v increases Ansible output verbosity
How about -c for "container mode"? (-d is already used for "destroy".)
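A minimal sketch of how a -c switch could be parsed with bash getopts next to the existing boolean -d and -v flags. The script's real option-parsing code is not shown in this thread, so the function name parse_flags and the variables BRANCH, DESTROY, VERBOSE and CONTAINER_MODE are hypothetical. Only -b is wired up among the options that take a value, and -h handling is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: adding "-c" (container mode) next to the existing
# boolean switches -d (destroy) and -v (verbose). Variable names are made up.

parse_flags() {
  local opt
  OPTIND=1               # reset so the function can be called repeatedly
  BRANCH="develop"       # default branch, as in the usage text
  DESTROY=false
  VERBOSE=false
  CONTAINER_MODE=false

  # A trailing ":" means the option takes a value; d, v and c do not.
  # The leading ":" makes getopts report errors via "?" and ":" silently.
  while getopts ":b:r:p:g:a:i:u:s:t:f:e:l:dvc" opt; do
    case "$opt" in
      b) BRANCH="$OPTARG" ;;
      r|p|g|a|i|u|s|t|f|e|l) ;;   # accepted but ignored in this sketch
      d) DESTROY=true ;;
      v) VERBOSE=true ;;
      c) CONTAINER_MODE=true ;;
      :) echo "option -$OPTARG requires a value" >&2; return 1 ;;
      \?) echo "unknown option: -$OPTARG" >&2; return 1 ;;
    esac
  done
}
```

A caller would then run parse_flags "$@" and check "$CONTAINER_MODE" = true to take the Docker path instead of the Ansible one. Because -c is a plain switch, it composes with -d, so a container-mode instance could still be destroyed after testing.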
Putting my shell dev hat on: -c sounds reasonable.
Putting my devops hat on: dropping the script and replacing it with OpenTofu and potentially some Ansible sounds like a better plan than maintaining a shell script.
(Ansible is probably no longer necessary when using OpenTofu to create all the AWS stuff and using cloud-init to bootstrap the node. But it may still come in handy.)
@Oliver Bertuch yeah? Are you thinking greenfield? :smile: Don't hack on the existing script? :thinking:
Here's where we document the existing script, by the way: https://guides.dataverse.org/en/6.10.1/developers/deployment.html#deploying-the-dataverse-software-to-amazon-web-services-aws
If we write a new script, we could just document it under a different heading, or otherwise tweak those docs so that they make sense.
OpenTofu != script. OpenTofu means Infrastructure as Code: using a declarative language to define the infrastructure and then create it. It usually keeps state, too, which might not be that relevant for ephemeral deployments.
But again: that's just me. If y'all feel just adding "-c" is good enough, do it!
@Don Sizemore can you please remind us... some of the environments we spin up with the existing script are not especially ephemeral, right? I think you may have created beta.dataverse.org with it. :thinking:
@Oliver Bertuch imagine if we spin up beta or demo.dataverse.org on Docker some day.
Please don't.
A mere Docker thing is not very scalable.
For anything that is meant for production, you probably need more sophisticated things.
sure
The primary use case is to demo features before they are merged. Maybe the instance stays up for two weeks or a month.
Also, seeing how the industry and tech stack change, for a system that needs long-term stability, I wouldn't recommend Docker. It has too many security problems. In addition, Docker itself is a company that needs to make money, so it may be a good idea to use something with fewer dependencies on it. On the other hand, you will have to buy into some container runtime anyway...
Sure, for something like that, very close to dev, Docker or Podman can be a great fit, as it is well known by many devs already.
right
At some point it makes sense to try to rope in things like K8s (or similar orchestrators) into development as well. This makes development and production more aligned, making the ops side a lot easier for everyone. But that's another tale to tell :see_no_evil:
sure
I should stop babbling now and take the hint :face_with_peeking_eye:
If it helps, these are the steps we want to automate with a -c flag:
docker compose up
I think we should start with the first four steps and see where this goes! :crazy:
This sounds like a perfect use case for Terraform
Or as @Oliver Bertuch rightly mentioned, OpenTofu
Are they both open source? Our existing bash script is. :smile:
OpenTofu is a fork of Terraform as HashiCorp is not very open for contributions to Terraform.
I love the idea, @Philip Durbin, and @Don Sizemore loves it too!
I am, however, of the opinion that Jenkins is a legacy project, and if you ask me, we should shy away from adding more features to it and figure out more modern ways to do prototyping. Please do correct me if I'm wrong.
With OpenTofu, you clone the repo, run tofu init and tofu apply, and the EC2 instance stays up for however long y'all want it to. Run tofu destroy and it's gone.
Hmm, would you be able to put together a proof of concept I can try?
Absolutely
Podman instead of Docker?
Just to make sure we're on the same page here.
Philip Durbin said:
- Spin up an empty EC2 instance
- Install Docker
That's the Tofu job.
- Download the compose file from https://guides.dataverse.org/en/6.10.1/container/running/demo.html#quickstart
This is part Tofu, part cloud-init
- Run docker compose up
- Install Traefik? Or nginx? https://github.com/IQSS/dataverse/issues/10359 . And configure it.
- Wait, what about the frontend? We need that too. I think it was added in https://github.com/gdcc/dataverse-ansible/pull/278
- And now we need Keycloak. Add it to the compose file above? Or create a new compose file? :thinking:
- etc., etc.
That's the part for a cloud-init script.
yep, that's what i was thinking too
I do something very similar in our Dataverse installations to renew certs without having to manually place them.
I'll come back with a PoC, and y'all can help me iterate on it
can I use the same creds that I currently use for ec2-create-instance on Harvard's AWS VPC?
The interesting part is always: how do I ship config, etc., into the spun-up instance? This is where cloud-init and GitOps can shine.
If I were to write this as a Tofu job, I'd ship the compose file as part of the cloud-init payload via Tofu, plus a systemd service unit, enabled by cloud-init, that spins up compose with it.
This leaves everything else, like configuring the application, as a job for compose, which makes it very flexible what to include.
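A rough sketch of that approach as a bash helper that prints cloud-init user-data: write_files drops the compose file and a systemd unit onto the instance, and runcmd enables the unit. The output could then be handed to Tofu, for example via the user_data argument of an aws_instance resource. The paths, the unit name dataverse-compose.service and the function name make_user_data are hypothetical, and the sketch assumes Docker with the compose plugin is already installed on the image; it does not cover installing Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print cloud-init user-data that ships a compose file plus
# a systemd unit which runs "docker compose up -d" at boot.
# Assumes Docker and the compose plugin are already present on the image.

make_user_data() {
  local compose_file="$1" line
  cat <<'EOF'
#cloud-config
write_files:
  - path: /opt/dataverse/compose.yml
    permissions: "0644"
    content: |
EOF
  # Indent each compose line by six spaces so it nests under "content: |".
  # The "|| [ -n ... ]" check keeps a last line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '      %s\n' "$line"
  done < "$compose_file"
  cat <<'EOF'
  - path: /etc/systemd/system/dataverse-compose.service
    permissions: "0644"
    content: |
      [Unit]
      Description=Dataverse via Docker Compose
      Requires=docker.service
      After=docker.service network-online.target
      Wants=network-online.target

      [Service]
      Type=oneshot
      RemainAfterExit=yes
      WorkingDirectory=/opt/dataverse
      ExecStart=/usr/bin/docker compose up -d
      ExecStop=/usr/bin/docker compose down

      [Install]
      WantedBy=multi-user.target
runcmd:
  - systemctl daemon-reload
  - systemctl enable --now dataverse-compose.service
EOF
}
```

Usage would be something like make_user_data compose.yml > user-data.yaml. Keeping the compose file as the unit of configuration means adding the frontend or Keycloak only touches the compose file, not the Tofu code.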
One thing to keep note of: cloud-init may need a list of /etc/hosts entries, depending on how the composed services are exposed via AWS.
well only one way to find out!
Ash Manda said:
can I use the same creds that I currently use for ec2-create-instance on Harvard's AWS VPC?
?
Ash Manda said:
Ash Manda said:
can I use the same creds that I currently use for ec2-create-instance on Harvard's AWS VPC?
?
I suppose only @Philip Durbin can answer that. I don't have the authority to spend money for IQSS :see_no_evil: They have standup now
okee thank you
Should we do a zoom call? :sweat_smile:
Love to, but I need to run errands with my kids and will be off in ~10 minutes
we can set up a call later at a time that works for everyone. I'll get back to you with a PoC and my findings in the meantime
19:00 my time could work for me
@Oliver Bertuch wanna try the clock button? :smile:
Hell yeah
would work for me!
so much easier, thanks!
Could work for me. What do you think, @Ash Manda?
Sure, as long as it is a quick call! I'll be off the clock this week and back on Monday. Since I'm a GRA, I can only work 20 hours a week :(
Great, we'll make it quick. We can use my Zoom at https://harvard.zoom.us/my/pdurbin?pwd=em1WNUZGbnY2YjhxNEdSbjJJMXNSUT09 (in ~90 minutes)
Philip Durbin said:
https://harvard.zoom.us/my/pdurbin?pwd=em1WNUZGbnY2YjhxNEdSbjJJMXNSUT09
Whatever works!
Ash Manda said:
can I use the same creds that I currently use for ec2-create-instance on Harvard's AWS VPC?
yes
talk soon!
I created an issue we can discuss and edit:
Dataverse in Docker in the cloud (with Keycloak) #12369
@Oliver Bertuch @Ash Manda ready? https://harvard.zoom.us/my/pdurbin?pwd=em1WNUZGbnY2YjhxNEdSbjJJMXNSUT09
Almost there
do we even want global state for a use case like this?
Not necessarily. I was asking mostly because we will need ways to keep track of individual state to enable tearing down things again.
Otherwise keeping all these EC2s around is gonna cost IQSS :money:
probably a lifecycle block with local state?
I dunno, we'll figure smth out
It would probably be good to spin up environments by command or trigger (using a CI job to execute it), with every environment always getting an expiry date. The expiry can be extended by updates (CI trigger) or commands (triggering a CI job).
On creation, the resources created become part of a persistent state, keeping track of the inventory.
A scheduled janitor job removes any expired environments and leaves comments/warnings/notes, etc.
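A minimal sketch of the expiry bookkeeping and the janitor's selection step, assuming the inventory is a plain-text file with one NAME EXPIRY line per environment, where EXPIRY is a Unix epoch timestamp in seconds. The file format, the function names extend_expiry and list_expired, and the two-week default TTL (borrowed from the demo use case mentioned earlier) are all hypothetical; the actual teardown, such as tofu destroy or terminating the EC2 instance, is left out.

```shell
#!/usr/bin/env bash
# Hypothetical janitor helpers. Inventory format (made up for this sketch):
# one "NAME EXPIRY_EPOCH" pair per line; blank lines and "#" comments are skipped.

DEFAULT_TTL=$((14 * 24 * 3600))   # two weeks, in seconds

# Print a new expiry timestamp: NOW plus TTL (defaults to DEFAULT_TTL).
# Used both on creation and when an update or command extends the lifetime.
extend_expiry() {
  local now="$1" ttl="${2:-$DEFAULT_TTL}"
  echo $(( now + ttl ))
}

# Print the name of every environment whose expiry is at or before NOW.
list_expired() {
  local inventory="$1" now="$2" name expiry
  while read -r name expiry || [ -n "$name" ]; do
    case "$name" in
      ''|'#'*) continue ;;
    esac
    if [ "$expiry" -le "$now" ]; then
      echo "$name"
    fi
  done < "$inventory"
}
```

A scheduled job could run list_expired inventory.txt "$(date +%s)" and tear down each name it prints, while an extension trigger rewrites that environment's line with the result of extend_expiry "$(date +%s)".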
If it helps, in the bash script there's a -d flag for delete that we use when spinning up EC2 instances from Jenkins. Plus -t that we use to tag images with "jenkins_delete_me". I think @Don Sizemore has something set up on the AWS side to delete them automatically.
@Oliver Bertuch Hello Oliver! Good evening! If you had the time, I wanted to pick your brain about some concerns I have regarding using any state-driven system like Tofu or Pulumi or CloudFormation for this specific use case. Would you prefer I dump them here or write a clean proposal sorta document that outlines my main concerns?
Whatever works for you. If you feel like this needs a more formal discussion with comments and all, a document may be better.
https://docs.google.com/document/d/1kQkYeW3FMCrpnDkWmBGq1ePeFR8aIrPC9OtHfcmj5fY/edit?usp=sharing
These are some thoughts I had when thinking about this workflow. Please correct me wherever I'm wrong!
I completely agree with your analysis: using / involving too much IaC for a developer is a nightmare waiting to happen.
Where should I put more thoughts?
Comments should be enabled. Or I can enable edits too
Done
OK will do - need to continue hacking for now.
Oliver Bertuch said:
OK will do - need to continue hacking for now.
Thank you! I appreciate your time and thoughts on this a lot. It's really good to have someone around to guide me who knows these systems well
When it comes to IaC, I'm also still learning. The ecosystem is so vast and you can do things on so many different levels, sometimes it's hard to see the forest for the trees.
Wow, this is getting a bit complicated! :sweat_smile: Thanks for all the work so far!
Last updated: May 30 2026 at 09:11 UTC