Stream: containers

Topic: speedup ARM64 builds


view this post on Zulip Oliver Bertuch (Apr 11 2024 at 14:39):

The ARM64 builds using QEMU on AMD64 GitHub runners are a pain...

view this post on Zulip Oliver Bertuch (Apr 11 2024 at 14:39):

@Don Sizemore @Philip Durbin how about we use an AWS T4g or M6g machine as a remote builder?

view this post on Zulip Oliver Bertuch (Apr 11 2024 at 14:39):

(Not as a GitHub runner!)
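
For readers unfamiliar with remote builders, here is a minimal sketch of how a buildx builder can combine a local AMD64 node with a remote ARM64 node reached over SSH. It is one common approach, not necessarily the setup used in this thread. The builder name, SSH user and host are hypothetical placeholders, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: builder name, SSH user and host below are hypothetical placeholders.
BUILDER="${BUILDER:-multiarch}"
ARM_HOST="${ARM_HOST:-ssh://ubuntu@arm-builder.example.org}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

setup_builder() {
  # Local node on the AMD64 runner builds linux/amd64 natively.
  run docker buildx create --name "$BUILDER" --driver docker-container --platform linux/amd64
  # Append the remote ARM64 machine as a second node, reached over SSH.
  run docker buildx create --name "$BUILDER" --append --platform linux/arm64 "$ARM_HOST"
  # Make this builder the default for subsequent buildx commands.
  run docker buildx use "$BUILDER"
}

setup_builder
```

With such a builder selected, a multi-platform build (`docker buildx build --platform linux/amd64,linux/arm64 ...`) can route the ARM64 part to the remote node instead of emulating it with QEMU.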

view this post on Zulip Philip Durbin 🚀 (Apr 11 2024 at 14:42):

If you look at https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/blob/2f605efb9ac22614bce7cc4b9acf718ba2d66b3b/ec2/ec2-create-instance.sh#L213C1-L213C315 you can see how we spin up images:

INSTANCE_ID=$(aws $PROFILE ec2 run-instances --image-id $AMI_ID --security-groups $AWS_SG $TAGARG --count 1 --instance-type $SIZE --key-name $KEY_NAME --query 'Instances[0].InstanceId' --block-device-mappings '[ { "DeviceName": "/dev/sda1", "Ebs": { "DeleteOnTermination": true, "VolumeSize": 20 } } ]' | tr -d \")

view this post on Zulip Oliver Bertuch (Apr 11 2024 at 14:42):

I see!

view this post on Zulip Philip Durbin 🚀 (Apr 11 2024 at 14:42):

So maybe you could give us an AMI id, etc.

view this post on Zulip Oliver Bertuch (Apr 11 2024 at 14:43):

It looks like you folks already gave me AWS access back in the day...

view this post on Zulip Philip Durbin 🚀 (Apr 11 2024 at 14:43):

oh dear!

view this post on Zulip Oliver Bertuch (Apr 11 2024 at 14:44):

The account is in the name of one Matthew Dunlap...

view this post on Zulip Oliver Bertuch (Apr 11 2024 at 14:44):

Does that seem right????

view this post on Zulip Philip Durbin 🚀 (Apr 11 2024 at 14:45):

Hmm, we should update that. Thanks.

view this post on Zulip Oliver Bertuch (Apr 11 2024 at 15:27):

OK I created https://us-west-2.console.aws.amazon.com/ec2/home?region=us-west-2#InstanceDetails:instanceId=i-0d27e1b1fc1c0a8db. Named it "github-docker-buildx-arm64"

view this post on Zulip Oliver Bertuch (Apr 11 2024 at 15:28):

I'm heading out now, will play with it tomorrow morning :smile:

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 06:16):

Wow this is really hacky... I need to manually give DMP a buildx instance configuration to make it pick up the remote builder...

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 06:18):

I probably need to develop another extension for the plugin to push nodes config from the outside...

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 06:27):

It takes about 500 MB of RAM peak when the base image starts the Payara instance. Almost 2 CPU cores are used

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 06:31):

image.png

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 06:34):

Someone definitely needs to explain to me how the AWS flavor says t4g.micro has 1 GB of RAM and it ends up with 815 MB

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 06:39):

Oh wow it looks like we can even scale up one more and get a machine for free till 2024-12-31: https://aws.amazon.com/ec2/faqs//#t4g-instances

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 06:39):

I can definitely say that this is MUCH faster

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 06:39):

Let me try to inject this into a CI config

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 08:39):

Seems like we can cut the runtime at least in half:
image.png

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 08:50):

(Both v6.2 and develop do not have the injected builder config because the feature branch isn't merged and obviously for the tag this will never happen. But for scheduled rebuilds I don't care about build time in the middle of the night. This is important for pushes to develop only. v6.0 and v6.1 base image builds are skipped on purpose)

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 09:48):

I changed the setup on purpose to hit the builder with 3 jobs at once...

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 09:49):

That made it go cry for resources and two of three jobs failed (OOM killed the processes). So let's see what happens if we double the RAM

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 10:45):

With a t4g.small this works perfectly for 3 parallel jobs!
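
A back-of-the-envelope check of the numbers above, as a rough sketch: it takes the roughly 500 MB peak per build and the 815 MB usable on the t4g.micro reported earlier, and assumes the nominal 2 GiB (2048 MB) for the t4g.small. Real usage also includes BuildKit and OS overhead, which this ignores.

```shell
#!/bin/sh
# fits JOBS PEAK_MB AVAILABLE_MB: does the combined peak fit into the available RAM?
fits() {
  need=$(( $1 * $2 ))
  if [ "$need" -le "$3" ]; then
    echo "fits ($need MB needed, $3 MB available)"
  else
    echo "too small ($need MB needed, $3 MB available)"
  fi
}

fits 3 500 815    # t4g.micro, usable RAM as observed in this thread
fits 3 500 2048   # t4g.small, nominal RAM (assumption: usable RAM is somewhat lower)
```

Three parallel jobs need about 1500 MB at peak, which is consistent with the OOM kills on the micro instance and the successful runs on the small one.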

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 10:46):

We can still use the QEMU variant when doing scheduled builds (because we don't need to care about build time on Sunday nights), but there is a chance multiple PRs come in at the same time and require the builder
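
The split between scheduled and push-triggered builds could be expressed roughly like this. This is a hedged sketch, not the actual workflow: `GITHUB_EVENT_NAME` is the variable GitHub Actions sets for the triggering event, while the two builder labels are hypothetical.

```shell
#!/bin/sh
# select_builder EVENT: pick a build strategy from the triggering event name.
# The labels "qemu" and "remote-arm64" are hypothetical placeholders.
select_builder() {
  case "$1" in
    schedule) echo "qemu" ;;          # nightly/weekly rebuilds: build time does not matter
    *)        echo "remote-arm64" ;;  # pushes and PRs: use the fast native builder
  esac
}

select_builder "${GITHUB_EVENT_NAME:-push}"
```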

view this post on Zulip Philip Durbin 🚀 (Apr 12 2024 at 11:32):

Twice as fast? Great! :tada:

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 11:33):

Even a bit more than twice :smile_cat:

view this post on Zulip Oliver Bertuch (Apr 12 2024 at 14:30):

TODO for the runner: add a cron job that stops the builder process every day and deletes the volumes (docker volume prune -a), as we don't have much space on the disk
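
A minimal sketch of what such a cleanup script might look like. The container name is an assumption (buildx's docker-container driver typically names containers `buildx_buildkit_<builder>0`), the script path in the cron line is hypothetical, and by default the script only prints the commands (set `DRY_RUN=0` to execute them).

```shell
#!/bin/sh
# Hypothetical daily cleanup for the ARM64 builder host (a sketch, not the real setup).
# Example crontab entry (hypothetical path): 0 3 * * * DRY_RUN=0 /usr/local/bin/builder-cleanup.sh
BUILDER_CONTAINER="${BUILDER_CONTAINER:-buildx_buildkit_multiarch0}"  # assumed name
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

cleanup() {
  # Stop the BuildKit container so nothing is writing to its volumes.
  run docker stop "$BUILDER_CONTAINER"
  # Remove all unused volumes, named ones included (-a), without prompting (-f).
  run docker volume prune -a -f
}

cleanup
```

One caveat: `docker volume prune` skips volumes still referenced by a container, including a stopped one, so the builder container may also need to be removed for its state volume to be freed.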

view this post on Zulip Philip Durbin 🚀 (Apr 29 2024 at 13:31):

https://github.blog/2024-04-26-github-actions-arm64-and-the-future-of-automotive-software-development/ seems related at least. I didn't really dig into the details.

view this post on Zulip Oliver Bertuch (Apr 29 2024 at 13:45):

It's going on and on about how they now provide ARM based runners and GPU enabled runners in lots of industry/sales buzzwords.

view this post on Zulip Philip Durbin 🚀 (Apr 29 2024 at 13:46):

That was my sense too. :grinning:

view this post on Zulip Oliver Bertuch (Apr 29 2024 at 13:46):

It's kind of unrelated to us if we're not willing to make the pipeline a lot more complex to ship the images built on ARM64 and glue them together manually

view this post on Zulip Oliver Bertuch (Apr 29 2024 at 13:47):

It would be an awesome feature to have a sidecar runner available that you can interact with from a job

view this post on Zulip Philip Durbin 🚀 (Apr 29 2024 at 13:47):

Right. Sounds like we should stay the course. Sorry for the noise!


Last updated: Oct 30 2025 at 05:14 UTC