When you need to encrypt data in transit — whether for a TLS session, a firmware update payload, or a sensor telemetry stream — two authenticated encryption (AEAD) schemes dominate the conversation: AES-256-GCM and ChaCha20-Poly1305. Both are battle-tested, both use 256-bit keys with 128-bit authentication tags, and both are standard cipher suites in TLS 1.3. But they achieve their guarantees through fundamentally different design philosophies, and the right choice depends heavily on your hardware platform, performance requirements, and threat model.
This article provides a head-to-head engineering comparison of AES-256-GCM and ChaCha20-Poly1305, covering their internal mechanics, real-world performance on various CPU architectures, security properties, and practical guidance for IoT and embedded systems engineers.
Authenticated Encryption Algorithms: An AEAD Primer
Both AES-256-GCM and ChaCha20-Poly1305 are Authenticated Encryption with Associated Data (AEAD) ciphers. This means they provide three guarantees in a single operation: confidentiality (the data cannot be read without the key), integrity (the data has not been tampered with), and authenticity (the data originated from a party holding the key). An AEAD cipher takes a plaintext, a key, a nonce (number-used-once), and optionally some associated data (such as packet headers that must be authenticated but not encrypted), and outputs ciphertext plus an authentication tag. The receiver re-computes the tag and rejects the message if it does not match — silently, without leaking any information about the plaintext.
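To make the seal/open flow concrete, here is a minimal sketch in Go using only the standard library's `crypto/aes` and `crypto/cipher` packages. The choice of Go, the helper name `newAESGCM`, and the sample header and payload are illustrative assumptions. It encrypts a payload with associated data, then shows that altering a single ciphertext bit makes decryption fail without releasing any plaintext.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// newAESGCM wraps a 32-byte key as an AES-256-GCM AEAD
// (12-byte nonce and 16-byte tag, the standard library defaults).
func newAESGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAESGCM(key)
	if err != nil {
		panic(err)
	}

	// The nonce must never repeat under this key; random is fine for a demo.
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}
	header := []byte("sensor-id=42")         // associated data: authenticated, not encrypted
	plaintext := []byte("temperature=21.5C") // confidential payload

	sealed := aead.Seal(nil, nonce, plaintext, header) // ciphertext || 16-byte tag
	opened, err := aead.Open(nil, nonce, sealed, header)
	fmt.Println(string(opened), err) // temperature=21.5C <nil>

	// Flipping one ciphertext bit makes Open fail and return no plaintext.
	sealed[0] ^= 0x01
	_, err = aead.Open(nil, nonce, sealed, header)
	fmt.Println(err != nil) // true
}
```

The same `cipher.AEAD` interface (`Seal`, `Open`, `NonceSize`, `Overhead`) is what an application would code against regardless of which AEAD sits behind it.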
AEAD is the cryptographic primitive behind nearly all modern secure transport protocols. Choosing the wrong construction for your workload can leave performance on the table, and mishandling its nonces can break its security outright.
AES-256-GCM: The Established Standard
AES (Advanced Encryption Standard) is a block cipher operating on 128-bit blocks, designed by Joan Daemen and Vincent Rijmen and standardized by NIST in 2001 (FIPS 197). AES-256 uses a 256-bit key and performs 14 rounds of substitution-permutation operations. GCM (Galois/Counter Mode) turns AES into a stream cipher by encrypting an incrementing counter and XORing the resulting keystream with the plaintext. In the same pass, it authenticates the ciphertext and associated data with GHASH, a polynomial hash computed with multiplications in the Galois field GF(2^128), and masks the result to produce the tag.
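The counter-mode half of GCM can be checked directly. For a 96-bit nonce, NIST SP 800-38D defines the pre-counter block J0 as the nonce followed by the 32-bit value 1, reserves J0 for masking the tag, and encrypts the data starting from counter value 2. The Go sketch below reproduces the ciphertext portion of the standard library's `Seal` output with plain AES-CTR. It is an illustration, not a GCM implementation (there is no GHASH), and the helper names `gcmCiphertextViaCTR` and `ctrMatchesGCM` are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// gcmCiphertextViaCTR reproduces only the encryption half of AES-GCM for a
// 96-bit nonce: AES-CTR whose first counter block is nonce || 0x00000002.
// (Counter value 1, i.e. J0, is reserved for masking the authentication tag.)
func gcmCiphertextViaCTR(key, nonce, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	counter := make([]byte, aes.BlockSize)
	copy(counter, nonce) // bytes 0..11: the 96-bit nonce
	counter[15] = 2      // bytes 12..15: 32-bit big-endian counter = 2
	out := make([]byte, len(plaintext))
	cipher.NewCTR(block, counter).XORKeyStream(out, plaintext)
	return out, nil
}

// ctrMatchesGCM seals plaintext with the standard library's AES-GCM and checks
// that everything before the 16-byte tag equals the plain CTR output.
func ctrMatchesGCM(key, nonce, plaintext []byte) (bool, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return false, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return false, err
	}
	sealed := gcm.Seal(nil, nonce, plaintext, nil) // ciphertext || tag
	viaCTR, err := gcmCiphertextViaCTR(key, nonce, plaintext)
	if err != nil {
		return false, err
	}
	return bytes.Equal(sealed[:len(plaintext)], viaCTR), nil
}

func main() {
	key := bytes.Repeat([]byte{0x42}, 32) // 32-byte key selects AES-256
	nonce := bytes.Repeat([]byte{0x07}, 12)
	ok, err := ctrMatchesGCM(key, nonce, []byte("counter mode turns a block cipher into a stream cipher"))
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // true
}
```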
Performance Characteristics
AES-GCM's critical dependency is hardware acceleration. On x86, that means AES-NI (AES New Instructions), which Intel first shipped in 2010 and AMD later adopted, together with the PCLMULQDQ carry-less multiplication instruction that accelerates GHASH. On modern server CPUs with these instructions, AES-256-GCM can encrypt data at speeds exceeding 1 GB/s per core, often outperforming ChaCha20-Poly1305 by a factor of 2–3. This makes it the throughput champion on x86 platforms.
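Because these figures depend so heavily on the CPU, it is worth measuring on your own hardware. A rough sketch in Go follows; `measureGCMThroughput` is a hypothetical helper, and the all-zero key and 16 KiB record size are arbitrary choices. Go's `crypto/aes` uses AES hardware instructions automatically where the platform provides them, so running the same program on a server and on a board without AES support shows the gap directly. For careful comparisons, Go's `testing` benchmarks give more stable numbers than a single timed loop.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
	"time"
)

// measureGCMThroughput seals `total` bytes in `chunk`-byte records with
// AES-256-GCM and returns the observed rate in MB/s (10^6 bytes per second).
func measureGCMThroughput(chunk, total int) (float64, error) {
	block, err := aes.NewCipher(make([]byte, 32)) // all-zero key: benchmarking only
	if err != nil {
		return 0, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return 0, err
	}
	nonce := make([]byte, aead.NonceSize())
	record := make([]byte, chunk)
	out := make([]byte, 0, chunk+aead.Overhead())

	start := time.Now()
	var seq uint64
	for done := 0; done < total; done += chunk {
		// A fresh counter-based nonce per record, as real code must use.
		binary.BigEndian.PutUint64(nonce[4:], seq)
		seq++
		out = aead.Seal(out[:0], nonce, record, nil)
	}
	elapsed := time.Since(start).Seconds()
	return float64(total) / elapsed / 1e6, nil
}

func main() {
	rate, err := measureGCMThroughput(16*1024, 256<<20) // 256 MiB in 16 KiB records
	if err != nil {
		panic(err)
	}
	fmt.Printf("AES-256-GCM: %.0f MB/s on this core\n", rate)
}
```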
However, without AES hardware support (on ARM Cortex-M microcontrollers, low-power RISC-V cores, or soft CPUs in FPGAs), AES-256-GCM degrades dramatically. Fast software AES relies on lookup tables indexed by secret data, which can leak key material through cache timing; constant-time alternatives such as bitsliced implementations avoid the leak but run slower. GHASH's field multiplication is also expensive in pure software and faces the same table-versus-timing trade-off. On these platforms, throughput can drop to the 20–60 MB/s range or, on small microcontrollers, much lower, and the implementation risk of timing side-channels increases.
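The cache-timing problem with table-based AES comes from indexing a table with a secret byte. The Go sketch below contrasts a direct lookup with a constant-time one that reads every entry and keeps the match via a bitmask, using the standard library's `crypto/subtle.ConstantTimeByteEq`. The 256-entry table is a hypothetical stand-in, not the real AES S-box, and Go's compiler makes no formal constant-time guarantee for ordinary code, so treat this as an illustration of the technique. It also shows why constant-time software AES is slow: each lookup costs 256 reads instead of one, which is one reason constant-time implementations typically use bitslicing instead.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// leakyLookup indexes the table directly. The memory address touched depends
// on the secret index, so cache timing can reveal which entry was read.
func leakyLookup(table *[256]byte, secret byte) byte {
	return table[secret]
}

// ctLookup reads every entry and keeps only the matching one via a bitmask,
// so the sequence of memory accesses does not depend on the secret index.
func ctLookup(table *[256]byte, secret byte) byte {
	var result byte
	for i := 0; i < 256; i++ {
		// ConstantTimeByteEq returns 1 on match, 0 otherwise; negating and
		// truncating to a byte yields the mask 0xFF or 0x00 without branching.
		mask := byte(-subtle.ConstantTimeByteEq(uint8(i), secret))
		result |= table[i] & mask
	}
	return result
}

func main() {
	var table [256]byte // hypothetical table, not the AES S-box
	for i := range table {
		table[i] = byte(i*7 + 3)
	}
	secret := byte(0x5a)
	fmt.Println(leakyLookup(&table, secret) == ctLookup(&table, secret)) // true
}
```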
Security and Nonce Considerations
Standard AES-GCM uses a 96-bit (12-byte) nonce. Reusing the same nonce with the same key is catastrophic: an attacker learns the XOR of the two plaintexts (and the keystream itself wherever one plaintext is known), and can solve for the GHASH authentication key, which allows forging valid ciphertexts under that key. With random 96-bit nonces, NIST SP 800-38D caps usage at 2^32 messages per key, which keeps the birthday-bound probability of any nonce collision at roughly 2^-32 or below. For high-throughput servers encrypting billions of messages, this requires disciplined nonce management (e.g., deterministic counters or a dedicated nonce generation service).
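One widely deployed form of deterministic nonce management is the one TLS 1.3 uses (RFC 8446, Section 5.3): each record's nonce is the 64-bit record sequence number, left-padded with zeros to the IV length and XORed with a static per-direction IV derived during the handshake. A minimal Go sketch of that construction follows; `recordNonce` is a hypothetical helper name. Uniqueness comes entirely from never reusing a sequence number under the same key, which is why RFC 8446 forbids sequence numbers from wrapping and requires a rekey or a closed connection instead.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// recordNonce derives a per-record AEAD nonce in the style of TLS 1.3:
// the 64-bit sequence number, big-endian and left-padded to the IV length,
// is XORed into a secret static IV.
func recordNonce(staticIV []byte, seq uint64) ([]byte, error) {
	if len(staticIV) < 8 {
		return nil, errors.New("static IV must be at least 8 bytes")
	}
	nonce := make([]byte, len(staticIV))
	copy(nonce, staticIV)

	var seqBytes [8]byte
	binary.BigEndian.PutUint64(seqBytes[:], seq)
	off := len(nonce) - 8 // the sequence number occupies the last 8 bytes
	for i := 0; i < 8; i++ {
		nonce[off+i] ^= seqBytes[i]
	}
	return nonce, nil
}

func main() {
	staticIV := []byte{0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab}
	for seq := uint64(0); seq < 3; seq++ {
		n, err := recordNonce(staticIV, seq)
		if err != nil {
			panic(err)
		}
		fmt.Printf("seq %d: %x\n", seq, n)
	}
}
```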
No practical attacks exist against AES-256-GCM when it is used correctly. The relevant failure modes are nonce reuse (already discussed) and forgery attacks that become feasible when the authentication tag is truncated below 128 bits. With full 128-bit tags and proper nonce handling, GCM has security proofs that reduce its confidentiality and authenticity to the security of AES as a block cipher.
ChaCha20-Poly1305: The Modern Contender
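ChaCha20 is a stream cipher designed by Daniel J. Bernstein in 2008 as a variant of Salsa20. It operates on a 512-bit state (sixteen 32-bit words) using only ARX operations: Addition, Rotation, and XOR. There are no S-boxes, no lookup tables, and no Galois field multiplications. Poly1305, also designed by Bernstein, is a Wegman-Carter MAC that evaluates a polynomial over the message modulo the prime 2^130 - 5. It relies on integer multiplication rather than ARX, but it is likewise table-free. In the RFC 8439 construction, a fresh one-time Poly1305 key is derived from the first ChaCha20 block for each nonce.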
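To show what "only ARX operations" means in practice, here is the ChaCha20 quarter round from RFC 8439, Section 2.1, written in Go and checked against the test vector in Section 2.1.1. It is a building block for illustration, not a usable cipher. Each step is a wrapping 32-bit addition, an XOR, or a rotation by a fixed distance, so nothing about the memory access pattern depends on secret data.

```go
package main

import (
	"fmt"
	"math/bits"
)

// quarterRound is the ChaCha20 quarter round (RFC 8439, Section 2.1):
// wrapping 32-bit addition, XOR, and fixed-distance left rotation only.
func quarterRound(a, b, c, d uint32) (uint32, uint32, uint32, uint32) {
	a += b
	d ^= a
	d = bits.RotateLeft32(d, 16)
	c += d
	b ^= c
	b = bits.RotateLeft32(b, 12)
	a += b
	d ^= a
	d = bits.RotateLeft32(d, 8)
	c += d
	b ^= c
	b = bits.RotateLeft32(b, 7)
	return a, b, c, d
}

func main() {
	// Test vector from RFC 8439, Section 2.1.1.
	a, b, c, d := quarterRound(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)
	fmt.Printf("%08x %08x %08x %08x\n", a, b, c, d)
	// ea2a92f4 cb1cf8ce 4581472e 5881c4bb
}
```

A full ChaCha20 block applies this function 80 times (10 double rounds of 8 quarter rounds each) to the 4×4 matrix of 32-bit words, then adds the original state back in.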
Performance Characteristics
ChaCha20-Poly1305 is designed to be fast in software. Because it uses only simple integer operations with no hardware dependency, it performs consistently well across virtually any CPU architecture. On ARM Cortex-M4 and Cortex-M7 microcontrollers, ChaCha20-Poly1305 can outperform software AES-256-GCM by 100–300%, although absolute throughput depends heavily on clock speed and implementation quality, so benchmark on your actual target. On ARMv8 application processors, ChaCha20 has no dedicated instructions but benefits from NEON SIMD, while AES-GCM gains the ARMv8 Cryptographic Extensions where present; there the gap narrows or reverses, but ChaCha remains highly competitive.
This consistent performance profile is ChaCha's killer feature: your encryption throughput does not collapse when you move from a development workstation to an embedded target. For IoT firmware engineers, this predictability is often more valuable than raw peak performance.
Key insight: On systems without hardware AES acceleration, ChaCha20-Poly1305 is typically 2–4x faster than AES-256-GCM. On systems with AES-NI, AES-256-GCM is typically 2–3x faster than ChaCha20-Poly1305.
Security and Nonce Considerations
The standard ChaCha20-Poly1305 construction used in TLS 1.3 (RFC 8439) combines a 96-bit nonce with a 32-bit block counter. Nonce reuse is just as catastrophic as with AES-GCM: it exposes the XOR of the plaintexts and reuses the one-time Poly1305 key, which enables forgeries. The extended variant XChaCha20-Poly1305 (available in libsodium's crypto_aead_xchacha20poly1305_ietf API and in Monocypher) uses a 192-bit nonce, which is large enough that random nonce generation is safe even for extremely large message counts. If you control the protocol design, XChaCha20's 192-bit nonce removes nonce management as a practical concern.
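The gap between 96-bit and 192-bit random nonces follows from the birthday bound: for q random n-bit nonces, the probability of any collision is approximately q^2 / 2^(n+1) while that value is small. The short Go sketch below works in log2 to avoid overflow. The helper names and the 2^-32 risk target (borrowed from NIST's guidance for GCM) are illustrative assumptions.

```go
package main

import "fmt"

// collisionLog2 returns log2 of the approximate probability that at least two
// of 2^log2q uniformly random nonceBits-bit nonces collide: p ≈ q^2 / 2^(n+1).
func collisionLog2(log2q, nonceBits float64) float64 {
	return 2*log2q - (nonceBits + 1)
}

// maxMessagesLog2 inverts the bound: the log2 message count at which the
// collision probability reaches 2^targetLog2P.
func maxMessagesLog2(nonceBits, targetLog2P float64) float64 {
	return (targetLog2P + nonceBits + 1) / 2
}

func main() {
	fmt.Println(collisionLog2(32, 96))     // -33: 2^32 messages, 96-bit nonces
	fmt.Println(collisionLog2(64, 192))    // -65: 2^64 messages, 192-bit nonces
	fmt.Println(maxMessagesLog2(96, -32))  // 32.5
	fmt.Println(maxMessagesLog2(192, -32)) // 80.5
}
```

At the same 2^-32 collision risk, a single key can therefore handle roughly 2^80 messages with random 192-bit nonces versus roughly 2^32 with random 96-bit nonces.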
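ChaCha20's ARX-based design is naturally resistant to timing side-channels: there are no data-dependent memory accesses or table lookups that could leak key material through cache timing. This is a significant advantage in embedded environments, where constant-time implementations can be difficult to verify. Poly1305 still needs care on cores whose multiply instructions take a variable number of cycles, but it avoids secret-indexed tables entirely.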
AES-256-GCM vs ChaCha20-Poly1305: Side-by-Side Comparison
| Property | AES-256-GCM | ChaCha20-Poly1305 |
|---|---|---|
| Type | Block cipher (AES) in counter mode | Stream cipher (ChaCha20) + Poly1305 MAC |
| Designer | Daemen & Rijmen (AES), McGrew & Viega (GCM); standardized by NIST | Daniel J. Bernstein (ChaCha20, Poly1305); AEAD standardized by IETF (RFC 8439) |
| Core operations | S-box substitution + GF(2^128) multiplication (GHASH) | ARX (Add, Rotate, XOR) + Poly1305 (arithmetic mod 2^130 - 5) |
| Hardware acceleration | AES-NI + PCLMULQDQ (x86), ARMv8 Crypto Ext. | None needed; benefits from SIMD (SSE/AVX2, NEON) |
| Software-only throughput (indicative) | 20–60 MB/s | 60–120 MB/s |
| Hardware/SIMD-accelerated throughput (indicative) | 1–3 GB/s | 0.5–1.5 GB/s |
| Nonce size (standard) | 96 bits | 96 bits (192-bit with XChaCha20) |
| Side-channel resistance | Requires constant-time impl. | Natural (table-free) |
| FIPS 140 compliance | Yes (NIST SP 800-38D) | No (not NIST-approved) |
| TLS 1.3 suite (RFC 8446) | TLS_AES_256_GCM_SHA384 (SHOULD implement) | TLS_CHACHA20_POLY1305_SHA256 (SHOULD implement) |
When to Use AES-256-GCM
- Server-side workloads on x86 — With AES-NI present on virtually all modern server CPUs, AES-256-GCM delivers the highest throughput per watt. Data centers encrypting bulk traffic (cloud storage, CDN edge, database encryption) will benefit from the 2–3x speed advantage.
- FIPS compliance required — AES-GCM is a NIST-approved mode (SP 800-38D) and is available in FIPS 140-validated cryptographic modules, whereas ChaCha20-Poly1305 is not NIST-approved. If your deployment requires FIPS certification (government, defense, healthcare), AES-256-GCM is the only choice on this list.
- High-throughput telemetry pipelines — When ingesting millions of sensor readings per second on server-class hardware, AES-GCM's hardware-accelerated throughput translates directly to lower latency and fewer CPU cycles consumed.
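- ARMv8 application processors — ARM application cores can include the ARMv8 Cryptographic Extensions, which add AES and polynomial-multiply (PMULL) instructions and narrow or eliminate ChaCha's software advantage. The extensions are optional even on recent cores (the Cortex-A72-based Raspberry Pi 4, for example, omits them), so confirm they are present on your specific SoC.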
When to Use ChaCha20-Poly1305
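- Microcontrollers and embedded MCUs — ARM Cortex-M0/M3/M4/M7 and most low-power RISC-V cores have no AES instructions in the CPU itself. Some parts (the ESP32 and some STM32 variants, for example) add a memory-mapped AES peripheral, so check the datasheet. Without one, ChaCha20-Poly1305 is faster, simpler to implement in constant time, and consumes less flash and RAM for the cipher implementation.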
- IoT and battery-powered devices — ChaCha's lower cycle count per byte means less CPU time and lower energy consumption for each encryption operation — critical for devices running on coin cells or harvesting energy.
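- Side-channel sensitive environments — For smart cards, secure elements, or any device where an attacker may have physical access and precise timing measurement, ChaCha20's table-free ARX design removes cache-timing leakage and significantly reduces the attack surface. It does not by itself defend against power or electromagnetic analysis, which still calls for masking or other hardware countermeasures.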
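- Protocols using XChaCha20 — If you control the protocol specification, XChaCha20's 192-bit nonce eliminates nonce management complexity, because nonces can simply be generated at random. For comparison, WireGuard uses standard ChaCha20-Poly1305 with a 64-bit counter nonce for its transport data and uses XChaCha20-Poly1305 only for its cookie reply messages.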
- Mobile devices without crypto extensions — Older or lower-tier mobile SoCs may lack the ARMv8 Cryptographic Extensions. ChaCha20-Poly1305 provides consistent, predictable performance across the entire device fleet.
Internet Standards and TLS 1.3
Both ciphers are standard cipher suites in TLS 1.3 (RFC 8446): TLS_AES_256_GCM_SHA384 and TLS_CHACHA20_POLY1305_SHA256. Strictly, RFC 8446 makes only TLS_AES_128_GCM_SHA256 mandatory to implement and says implementations SHOULD support these two, but in practice most modern TLS stacks (including OpenSSL, BoringSSL, wolfSSL, and Mbed TLS) implement both. During the handshake, the client advertises the suites it supports and the server selects one. A common deployment pattern is to prefer AES-256-GCM on servers with AES-NI while still selecting ChaCha20-Poly1305 for clients that lack hardware acceleration; OpenSSL's SSL_OP_PRIORITIZE_CHACHA option, for example, does this when the client lists ChaCha20 first.
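As one concrete stack, Go's `crypto/tls` exposes the suites it implements through `tls.CipherSuites()`. The sketch below lists the TLS 1.3 entries; the helper name `tls13SuiteNames` is hypothetical. Go documents that TLS 1.3 suites are not configurable through `Config.CipherSuites` and orders them itself based on hardware support, so this only inspects what is available rather than setting a preference.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// tls13SuiteNames returns the names of the TLS 1.3 cipher suites that Go's
// crypto/tls implements, as reported by tls.CipherSuites().
func tls13SuiteNames() []string {
	var names []string
	for _, cs := range tls.CipherSuites() {
		for _, v := range cs.SupportedVersions {
			if v == tls.VersionTLS13 {
				names = append(names, cs.Name)
				break
			}
		}
	}
	return names
}

func main() {
	for _, name := range tls13SuiteNames() {
		fmt.Println(name)
	}
}
```

On a typical build this lists TLS_AES_128_GCM_SHA256 alongside the two suites discussed here.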
In the TLS context, the performance difference shows up in bulk record encryption rather than in the handshake, whose cost is dominated by public-key operations: servers pushing large volumes of traffic across many connections will see measurable CPU savings from preferring AES-GCM when hardware acceleration is available.
IoT Encryption Recommendation for Embedded Systems
For the majority of embedded IoT deployments on MCU-class devices (ESP32, STM32, nRF52, and similar), ChaCha20-Poly1305 is the recommended choice. It delivers superior software throughput, natural side-channel resistance, and a simpler implementation that is less prone to subtle errors. Some of these parts include an AES peripheral (the ESP32 and certain STM32 variants, for example); if yours supports AES-256 and your crypto library actually uses it, benchmark before deciding. If your IoT device runs on an ARM Cortex-A processor with cryptographic extensions or an x86 CPU with AES-NI (e.g., an industrial gateway running Linux on an x86 SoC), AES-256-GCM is appropriate and will deliver higher throughput.
The decision matrix can be summarized in three questions:
- Does your target CPU have hardware AES acceleration? If yes, use AES-256-GCM. If no, use ChaCha20-Poly1305.
- Do you need FIPS 140 compliance? If yes, AES-256-GCM is your only option.
- Do you want to eliminate nonce management as a concern? If yes, use XChaCha20-Poly1305 (available in libsodium and Monocypher).
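For the first question on Linux-based targets, the kernel reports CPU features in /proc/cpuinfo: x86 lists `aes` (and `pclmulqdq`, used for GHASH) on its `flags` lines, and arm64 lists `aes` (and `pmull`) on its `Features` lines. The Go sketch below checks for the `aes` token; `hasAESFlag` is a hypothetical helper. Bare-metal microcontrollers have no /proc/cpuinfo, so for those the datasheet is the authority.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasAESFlag reports whether a Linux /proc/cpuinfo dump advertises hardware
// AES: the "aes" token on an x86 "flags" line or an arm64 "Features" line.
func hasAESFlag(cpuinfo string) bool {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		key = strings.TrimSpace(key)
		if key != "flags" && key != "Features" {
			continue // only feature lines are meaningful here
		}
		for _, tok := range strings.Fields(value) {
			if tok == "aes" {
				return true
			}
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	fmt.Println("hardware AES advertised:", hasAESFlag(string(data)))
}
```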
Conclusion: Choosing Your Authenticated Encryption Algorithm
AES-256-GCM and ChaCha20-Poly1305 are both excellent choices for authenticated encryption. Neither has been practically broken, and neither is objectively "better" in all contexts. Outside of compliance requirements such as FIPS, the engineering decision comes down to one question: what hardware is running your encryption workload?
On server-class x86 hardware with AES-NI, AES-256-GCM is the throughput king by a wide margin. On embedded microcontrollers and mobile devices without cryptographic extensions, ChaCha20-Poly1305 dominates with consistently fast software performance and inherent side-channel resistance. Both are production-proven, both are standard TLS 1.3 cipher suites, and both will serve you well, provided you respect their nonce requirements and deploy them on hardware that suits them.
When in doubt for IoT deployments, start with ChaCha20-Poly1305. Your firmware will be simpler, you will have fewer timing side-channel pitfalls to worry about, and your performance will be predictable across the full range of hardware your devices ship with.