
What Migrating My AWS Fleet to Graviton Taught Me

Over the last while I have moved everything I run on AWS to Graviton, the ARM CPUs AWS designs in-house: the EC2 instances behind my services and the ECS containers that make up most of the actual workload. The short version is that it was worth it, the bill went down, and almost nothing about how I write or run code had to change. The longer version has a few sharp edges worth writing down, plus some genuinely interesting silicon arriving now in Graviton5.

Why ARM in the Cloud

The pitch is not “ARM is faster,” it is price/performance. For the same dollar you generally get more useful work out of a Graviton instance than the equivalent Intel or AMD one, and AWS prices the instances lower on top of that. Set expectations correctly, though: a single Graviton core will not smoke a top-bin x86 core on a single-threaded benchmark. The win shows up across a fleet of web servers, API workers, queue consumers, and databases: more cores per dollar and steady throughput under real concurrent load. If you are chasing the lowest latency on one hot thread, measure first.
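To make "price/performance" concrete, here is a minimal sketch of the arithmetic: divide the hourly instance price by the throughput you actually measured under load. The function name is my own and the prices and request rates in the demo are made-up placeholders, not AWS pricing or benchmark results; substitute your own measurements.

```python
def cost_per_million_requests(hourly_price_usd: float, requests_per_second: float) -> float:
    """Dollars spent per one million requests at a sustained, measured throughput."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    requests_per_hour = requests_per_second * 3600
    return hourly_price_usd / requests_per_hour * 1_000_000


# Placeholder inputs for illustration only: replace with your own bill and load test.
candidates = {
    "x86 instance (placeholder)": (0.10, 900.0),
    "Graviton instance (placeholder)": (0.08, 1000.0),
}

for name, (price, rps) in candidates.items():
    print(f"{name}: ${cost_per_million_requests(price, rps):.4f} per million requests")
```

Comparing this one number per instance type keeps the decision tied to your workload rather than to a single-threaded benchmark.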

ARM vs Intel vs AMD in Practice

  • Graviton is my default: best price/performance for general compute, web tiers, microservices, and most databases, at the lowest cost and power.
  • AMD is the value x86 option when something genuinely needs x86 but not Intel specifically.
  • Intel is a deliberate choice for AVX-512, Intel-specific acceleration, or vendors that only certify on Intel. It is the priciest of the three.

What actually decides it is rarely the chip and almost always the software. If everything you run is open source or built from your own source, ARM is a non-issue. The moment a proprietary, x86-only binary shows up, that workload stays on x86 until the vendor ships an ARM build. So the real question per service is not “is ARM fast enough,” it is “does every binary in this container have an arm64 version.”

What Actually Bit Me

None of these were dealbreakers, but I wish I had batched them up front instead of finding them one service at a time.

The first one is the biggest mindset shift: your images must be multi-arch. Your base images, every layer, and your own build all need linux/arm64. I moved to docker buildx and publish multi-arch manifests so one tag resolves correctly wherever it runs:

docker buildx build \
    --platform linux/amd64,linux/arm64 \
    -t myrepo/myservice:latest --push .

Then ECS has to be told which architecture to use. If a task definition omits runtimePlatform, cpuArchitecture defaults to X86_64, so set it to ARM64 explicitly and run on Graviton capacity or Fargate:

"runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
}
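If you keep task definitions as JSON files in a repository, a small helper can apply this setting consistently and flag any definition that still targets x86. This is a sketch under that assumption; the function names and the sample task definition are hypothetical, and registering the result with ECS is left to whatever tooling you already use.

```python
import copy
import json


def with_arm64_runtime(task_def: dict) -> dict:
    """Return a copy of a task definition that targets ARM64 on Linux."""
    patched = copy.deepcopy(task_def)  # leave the caller's dict untouched
    patched["runtimePlatform"] = {
        "cpuArchitecture": "ARM64",
        "operatingSystemFamily": "LINUX",
    }
    return patched


def targets_arm64(task_def: dict) -> bool:
    """True only when the definition explicitly asks for ARM64."""
    # A missing runtimePlatform is treated as x86, matching the default.
    platform = task_def.get("runtimePlatform") or {}
    return platform.get("cpuArchitecture") == "ARM64"


# Hypothetical minimal task definition, for illustration only.
example = {
    "family": "myservice",
    "containerDefinitions": [{"name": "app", "image": "myrepo/myservice:latest"}],
}

print(targets_arm64(example))
print(json.dumps(with_arm64_runtime(example), indent=4))
```

Running `targets_arm64` over every task definition in CI is a cheap way to catch a service that was missed during the migration.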

When a service misbehaves after the switch, the first thing I check is whether the published tag actually contains both architectures. This one-liner saves a lot of guessing:

docker buildx imagetools inspect myrepo/myservice:latest
# look for linux/amd64 and linux/arm64 in the platform list
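The same check can be automated. The sketch below assumes you feed it the JSON printed by `docker buildx imagetools inspect --raw`, which for a multi-arch tag is an image index with a `manifests` list. The function names are my own, and the sample index is a trimmed, hand-written stand-in rather than real registry output.

```python
import json

REQUIRED = {"linux/amd64", "linux/arm64"}


def platforms_in_index(raw_json: str) -> set:
    """Collect os/architecture pairs from an image index (manifest list)."""
    index = json.loads(raw_json)
    found = set()
    # A single-arch image has no "manifests" key, so this yields an empty set.
    for entry in index.get("manifests", []):
        platform = entry.get("platform") or {}
        os_name = platform.get("os")
        arch = platform.get("architecture")
        # Attestation entries report unknown/unknown and are skipped.
        if os_name and arch and "unknown" not in (os_name, arch):
            found.add(f"{os_name}/{arch}")
    return found


def missing_platforms(raw_json: str, required=REQUIRED) -> set:
    """Return the required platforms the tag does not provide."""
    return set(required) - platforms_in_index(raw_json)


# Hand-written sample: an index that was only built for amd64.
sample = json.dumps({
    "schemaVersion": 2,
    "manifests": [
        {"platform": {"os": "linux", "architecture": "amd64"}},
        {"platform": {"os": "unknown", "architecture": "unknown"}},
    ],
})

print(sorted(missing_platforms(sample)))
```

Failing the pipeline when `missing_platforms` returns anything stops a single-arch tag from reaching an ARM64 task in the first place.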

The rest are smaller traps worth knowing up front:

  • Emulated CI is slow. Building arm64 on an x86 runner means QEMU, which is painful for anything compiled. You no longer have to: GitHub-hosted native arm64 runners are generally available, free for public repositories and now offered for private ones too, so the pipeline builds on real ARM hardware instead of emulating it.
  • Native dependencies are the long tail. Interpreted code moves for free; compiled extensions (native npm modules, Python wheels, cgo, old vendored binaries) each need an arm64 build.
  • Check your sidecars. Logging, metrics, and security agents need arm64 images too. The mainstream ones have them, but a sidecar that silently fails to start will ruin your afternoon.
  • Managed services are the easy win. RDS, ElastiCache, and Lambda offer Graviton with no code change. Start there to build confidence.
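For the native dependency audit, Python wheels are a convenient example because the target platform is encoded in the filename. The sketch below reads the platform tag from a wheel filename and reports whether it can run on Linux arm64. The function names are my own, the filenames are illustrative, and the check deliberately ignores the glibc versus musl distinction, so treat a match as "worth trying" rather than proof.

```python
def wheel_platform_tags(filename: str) -> list:
    """Extract platform tags from name-version[-build]-python-abi-platform.whl."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    stem = filename[: -len(".whl")]
    # The platform tag is the last dash-separated field; compound tags use dots.
    return stem.split("-")[-1].split(".")


def wheel_runs_on_linux_arm64(filename: str) -> bool:
    """True for pure-Python wheels and for Linux wheels built for aarch64."""
    for tag in wheel_platform_tags(filename):
        if tag == "any":  # pure Python, no compiled code
            return True
        if tag.startswith(("manylinux", "musllinux", "linux")) and tag.endswith("_aarch64"):
            return True
    return False


# Illustrative filenames only.
for name in [
    "requests-2.31.0-py3-none-any.whl",
    "numpy-1.26.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
    "numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
]:
    print(name, wheel_runs_on_linux_arm64(name))
```

When a package has no matching wheel, pip falls back to building from source, which is where missing compilers and headers in a slim base image tend to surface.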

Graviton5: What Changed

While I was mid-migration, AWS moved the goalposts. The lineage shows the trajectory:

Gen        Cores  Microarch    ISA        Clock    Memory
Graviton2  64     Neoverse N1  Armv8.2-A  2.5 GHz  DDR4
Graviton3  64     Neoverse V1  Armv8.4-A  2.6 GHz  DDR5-4800
Graviton4  96     Neoverse V2  Armv9.0-A  2.8 GHz  DDR5-5600
Graviton5  192    Neoverse V3  Armv9.2-A  3.3 GHz  DDR5-8800

Graviton5, announced in December 2025 with M9g instances reaching general availability in mid-2026, is a bigger step than the version bump suggests. It doubles Graviton4's core count to 192 Neoverse V3 cores, built on a 3nm process as four chiplets of 48 cores each with up to ~420 GB/s between chiplets. It carries 192 MB of L3 cache (around five times the prior generation), 2 MB of L2 per core, DDR5-8800, and PCIe Gen6. It also ships a new Nitro Isolation Engine, which AWS describes as a formally verified hypervisor: the VM isolation property is mathematically proven, not just tested.

AWS quotes roughly 25% better compute per core than Graviton4, with up to ~35% on web apps, up to ~35% on ML inference, and ~30% on databases from improved branch prediction. Take vendor percentages with the usual grain of salt and benchmark your own workload, but the architecture behind them is real. For densely packed container fleets the per-core number matters less than doubling the cores while growing cache and memory bandwidth. The general-purpose M9g and M9gd are out now; the compute-optimized C9g and memory-optimized R9g are the variants I am waiting on, since most of my fleet maps to those shapes.

Is It Worth It

For me, clearly yes. If your stack is containers and managed services built from source or mainstream images, the migration is mostly mechanical: make images multi-arch, set the architecture on your tasks, fix the native dependency stragglers. Start with managed services for an easy win, then move containers one service at a time so you can roll back cleanly. The one piece of advice I would give my past self: audit dependencies and sidecars for arm64 support before flipping anything. Ninety percent of the work is trivial. The other ten percent is one unmaintained binary you forgot about, and it is much nicer to find it on a spreadsheet than in a failing deployment.

Monitoring Internal and Private CA Certificates with Generator Labs

External certificate monitoring works well for public-facing infrastructure, but it has an obvious blind spot: it can’t reach anything inside your private network. Internal APIs, databases with TLS-encrypted connections, mail servers on non-public ports, self-signed certificates, and infrastructure issued by a private CA all go completely unmonitored. Those certificates still expire. When they do, the failures tend to be worse, because internal services rarely have the same visibility as public ones.

Generator Labs internal certificate monitoring solves this with a lightweight on-premise agent you deploy as a Docker container inside your network.

How It Works

[Diagram: the Generator Labs private monitoring agent connecting internal hosts to the platform over outbound HTTPS]

The agent runs inside your private network, connects to your internal hosts, retrieves their certificates, and reports the data back to the Generator Labs platform over outbound HTTPS. No inbound firewall rules are required. Private keys never leave your network. From the platform’s side, internal monitors look and behave exactly like external ones.

What It Can Monitor

The agent connects to any TLS endpoint your network can reach:

  • Internal web servers and APIs
  • Databases with TLS connections (PostgreSQL, MySQL, MongoDB, Redis)
  • Internal mail servers (SMTP, IMAP, POP3 with STARTTLS or implicit TLS)
  • IoT devices and embedded systems serving TLS on custom ports
  • Any service running TLS on any port

It runs the same eight checks as external monitoring: expiration, chain integrity, hostname validation, CA trust, revocation, fingerprint changes, flapping, and CAA records.
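To give a feel for what two of these checks involve, here is a minimal sketch of an expiration check and a fingerprint-change check using only the Python standard library. It is not the agent's code; the function names are my own, and the inputs are the `notAfter` string and DER bytes you would get from Python's `ssl` module after a handshake.

```python
import hashlib
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days from `now` until expiry; negative once the certificate has expired.

    `not_after` uses the format ssl's getpeercert() returns,
    for example 'Jan  1 00:00:00 2030 GMT'. `now` must be timezone-aware.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def fingerprint_sha256(der_bytes: bytes) -> str:
    """Hex SHA-256 digest of the DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def fingerprint_changed(previous_fingerprint: str, der_bytes: bytes) -> bool:
    """True when the certificate served now differs from the one seen before."""
    return previous_fingerprint != fingerprint_sha256(der_bytes)


# Illustrative values only.
now = datetime(2029, 12, 2, tzinfo=timezone.utc)
print(days_until_expiry("Jan  1 00:00:00 2030 GMT", now))
```

An alert threshold then becomes a simple comparison, for example warning when `days_until_expiry` drops below whatever lead time your renewal process needs.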

Private CA Support

If your internal certificates are issued by a private CA, you can import that CA’s root certificate into the platform. The agent then validates certificate chains all the way to your private root, so chain integrity checks work correctly for internally issued certificates, not just publicly trusted ones.
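The idea of validating against a private root can be sketched with Python's standard `ssl` module: build a context that trusts your CA bundle, then let the handshake verify the chain and hostname. This is an illustration of the concept, not how the agent is implemented; the function names are my own, and the CA file path and host you pass in are yours to supply.

```python
import socket
import ssl
from typing import Optional


def private_ca_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that verifies the chain and the hostname.

    With ca_file set, only the CAs in that PEM file are trusted.
    With ca_file=None, the system trust store is used instead.
    """
    return ssl.create_default_context(cafile=ca_file)


def fetch_validated_cert(host: str, port: int, ca_file: Optional[str] = None,
                         timeout: float = 5.0) -> dict:
    """Connect, validate the chain against the given root, and return the peer certificate.

    Raises ssl.SSLCertVerificationError when the chain or hostname does not validate.
    """
    context = private_ca_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


context = private_ca_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

The returned dictionary includes `notAfter`, so a successful call gives you both a chain verdict and the expiry date for the same endpoint.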

Alerts

All the same notification channels are available: email, Slack, PagerDuty, Discord, webhooks, AWS SNS, and more. Internal certificate expiration or chain failures trigger the same alert pipeline as any other monitoring event.

Getting Started

The agent is open source and available at github.com/generator-labs/agent. Deploying it takes a few minutes: pull the Docker image, set your API credentials as environment variables, and configure the hosts you want to monitor. Full setup instructions are on the internal certificate monitoring page.