MQTT Production Infrastructure: From Prototype to Production
MQTT (Message Queuing Telemetry Transport) has become the de facto standard for IoT messaging. Its lightweight publish-subscribe model, minimal bandwidth overhead, and persistent connection semantics make it ideal for constrained devices. But there is a vast gulf between wiring a Mosquitto broker on a Raspberry Pi for a proof of concept and deploying a production MQTT infrastructure capable of handling tens of thousands of concurrent devices.
At Aletheia Tech, we have designed and operated MQTT infrastructure for clients in industrial automation, smart building, and logistics tracking. This article consolidates that experience into a practical reference for teams building production-grade MQTT systems. Whether you are migrating from a prototype or designing a new system from scratch, the decisions you make about topic structure, quality of service, session management, and broker topology will determine whether your system scales gracefully or collapses under load.
Topic Hierarchy Design
A well-structured topic namespace is the single most important architectural decision in any MQTT system. Poor hierarchy leads to security vulnerabilities, unmanageable ACLs, and routing inefficiencies. We recommend a hierarchical pattern that mirrors your physical or organisational topology:
org/site/zone/device-type/device-id/data-type
For example, a temperature sensor on the third floor of a factory in Malta would publish to:
aletheiatech/malta-factory/floor-03/temperature-sensor/ts-4427/temperature
This hierarchy delivers several benefits. First, it enables granular ACL enforcement at every level of the tree. Second, it allows subscribers to use wildcards at the appropriate granularity. Third, it naturally partitions data for storage and analytics. Every level should be deliberate: org separates multi-tenant deployments, site enables geographic routing, zone allows facility-level filtering, device-type enables fleet-wide commands, and device-id provides unique identification. The data-type leaf distinguishes telemetry from commands, configuration, or diagnostics.
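One way to keep publishers on this scheme is to build every topic string through a single helper, rather than scattering string concatenation across firmware. The Python sketch below is illustrative only. `build_topic` and its validation rules are our own assumptions, not part of any MQTT library. It rejects empty levels, and it rejects `/`, `+`, `#` and NUL characters inside a level.

```python
LEVEL_NAMES = ("org", "site", "zone", "device_type", "device_id", "data_type")
# '/' would silently add a level; '+' and '#' are wildcards and are illegal in topic names.
FORBIDDEN_IN_LEVEL = ("/", "+", "#", "\x00")


def build_topic(org: str, site: str, zone: str, device_type: str,
                device_id: str, data_type: str) -> str:
    """Join the six hierarchy levels into a topic name, rejecting unsafe values."""
    levels = (org, site, zone, device_type, device_id, data_type)
    for name, value in zip(LEVEL_NAMES, levels):
        if not value:
            raise ValueError(f"{name} must not be empty")
        if any(ch in value for ch in FORBIDDEN_IN_LEVEL):
            raise ValueError(f"{name} contains a reserved character: {value!r}")
    topic = "/".join(levels)
    # MQTT encodes topic names as UTF-8 strings with a 16-bit length prefix.
    if len(topic.encode("utf-8")) > 65535:
        raise ValueError("topic exceeds the 65,535-byte MQTT limit")
    return topic


print(build_topic("aletheiatech", "malta-factory", "floor-03",
                  "temperature-sensor", "ts-4427", "temperature"))
```

Centralising this check means a device ID containing a slash fails loudly at build time. Without it, the same ID would shift every level below it and quietly break ACLs and wildcard subscriptions.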
Wildcard Usage
MQTT defines two wildcard characters that subscribers use to match multiple topics. Understanding when to use each is essential for both performance and security.
Single-level wildcard (+): Matches exactly one topic level. A dashboard that displays all temperature readings across a site would subscribe to aletheiatech/malta-factory/+/temperature-sensor/+/temperature. The + wildcard is safe, explicit, and should be your default choice wherever possible because it constrains the subscription to a known depth.
Multi-level wildcard (#): Matches the remainder of the topic tree, including the parent level itself. A logging service that archives all messages from a site would subscribe to aletheiatech/malta-factory/#. The # wildcard is powerful but dangerous. It must be the last character in a subscription filter and must occupy a level on its own (a/# is valid, a/b# is not), and it bypasses topic-level structure entirely. Overuse of # subscriptions on busy brokers creates routing bottlenecks and can accidentally expose topics that should have remained private.
Our rule of thumb: prefer + wildcards with explicit level counts, restrict # subscriptions to administrative and logging services, and enforce these constraints through topic-level ACLs on the broker.
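The matching rules are compact enough to express directly. The Python sketch below validates subscription filters and matches topic names against them. It follows the specification's rules: a/# also matches a, + never spans more than one level, and a leading wildcard does not match $SYS topics. It is an illustration, not a replacement for the broker's matcher, and the function names are our own.

```python
def is_valid_filter(topic_filter: str) -> bool:
    """Check where wildcards may appear in a subscription filter."""
    if not topic_filter:
        return False
    levels = topic_filter.split("/")
    for i, level in enumerate(levels):
        # '#' must occupy a whole level and must be the last level.
        if "#" in level and (level != "#" or i != len(levels) - 1):
            return False
        # '+' must occupy a whole level.
        if "+" in level and level != "+":
            return False
    return True


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a topic name matches a valid subscription filter."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Topics starting with '$' (such as $SYS/...) are not matched by a leading wildcard.
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # matches all remaining levels, including none at all
        if i >= len(t_levels):
            return False  # filter is deeper than the topic
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)  # '+' filters must match the exact depth


dashboard = "aletheiatech/malta-factory/+/temperature-sensor/+/temperature"
print(topic_matches(dashboard, "aletheiatech/malta-factory/floor-03/temperature-sensor/ts-4427/temperature"))
```

The final depth comparison is what makes + "explicit": a + filter can never reach topics deeper or shallower than the level count you wrote.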
Production Rule
Never allow untrusted devices to subscribe with #. A compromised sensor subscribing to # could siphon every message on the broker. Use ACLs to restrict wildcard subscriptions to specific authorised clients.
Quality of Service Levels
MQTT defines three QoS levels, and choosing the right one for each message type directly impacts system reliability and throughput. The tendency of newcomers is to default to QoS 2 for everything, assuming more reliability is always better. In practice, this is a mistake.
QoS 0 (At most once): The message is delivered with no acknowledgement or retry. Use this for high-frequency telemetry where occasional drops are acceptable, such as temperature readings, power consumption, or GPS coordinates. QoS 0 offers the lowest latency and highest throughput. For telemetry streams of 100,000 messages per second or more, it is usually the only economical choice.
QoS 1 (At least once): The message is acknowledged with a PUBACK and retried until confirmed. This is the right choice for device commands, configuration updates, and alarm events where delivery must be guaranteed but duplicates can be handled idempotently at the application layer. The PUBACK round trip, together with the state the broker holds until it arrives, makes QoS 1 noticeably more expensive than QoS 0: in the benchmarks below, single-node throughput fell by roughly 40%.
QoS 2 (Exactly once): The message is delivered with a four-way handshake guaranteeing exactly-once semantics. Use QoS 2 sparingly — for financial transactions, critical state transitions, or regulatory audit trails. The overhead is substantial: each QoS 2 message requires four control packets (PUBLISH, PUBREC, PUBREL, PUBCOMP) and doubles broker memory consumption for in-flight messages.
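Because QoS 1 can deliver the same message more than once, the consumer has to tolerate duplicates. One common approach is to remember recently processed IDs in a bounded LRU set. The sketch below assumes that each command payload carries a unique application-level `command_id`, a hypothetical field. The MQTT packet identifier cannot serve this purpose because it is reused once acknowledged.

```python
from collections import OrderedDict
from typing import Callable


class IdempotentHandler:
    """Runs `action` at most once per command_id among the last `capacity` IDs seen."""

    def __init__(self, action: Callable[[dict], None], capacity: int = 10_000) -> None:
        self._action = action
        self._capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def handle(self, message: dict) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        command_id = message["command_id"]
        if command_id in self._seen:
            self._seen.move_to_end(command_id)  # keep recently repeated IDs from eviction
            return False
        # Record the ID only after the action succeeds, so a failed attempt
        # is retried when the broker redelivers the message.
        self._action(message)
        self._seen[command_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the least recently seen ID
        return True


applied = []
handler = IdempotentHandler(lambda m: applied.append(m["setpoint"]))
handler.handle({"command_id": "cmd-1", "setpoint": 21.5})
handler.handle({"command_id": "cmd-1", "setpoint": 21.5})  # QoS 1 redelivery
print(applied)  # [21.5]
```

The capacity bounds memory, but it also bounds the redelivery window you are protected against. Size it to cover the longest time a duplicate could plausibly arrive after the original.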
Performance Impact
In our benchmarks, a single EMQX node handles ~200,000 messages/second at QoS 0, ~120,000 at QoS 1, and ~60,000 at QoS 2. Choose QoS 2 only when the business cost of a duplicate message exceeds the infrastructure cost of the reduced throughput.
Clean vs. Persistent Sessions
Session management determines what happens when a client disconnects and reconnects. This is not a binary choice — MQTT 5.0 adds granular controls.
Clean sessions (Clean Session = 1 in MQTT 3.1.1; in MQTT 5.0, Clean Start = true combined with a Session Expiry Interval of 0): The broker discards all session state when the client disconnects. Subscriptions are lost, queued messages are dropped, and the client starts fresh on every connection. Use clean sessions for sensors that publish telemetry but do not subscribe to commands, or for any device where offline message queuing provides no value. Clean sessions minimise broker memory usage and are appropriate for the majority of IoT sensors.
Persistent sessions (Clean Session = 0 in MQTT 3.1.1; in MQTT 5.0, Clean Start = false combined with a non-zero Session Expiry Interval): The broker maintains subscription state and queued messages across disconnections. When the client reconnects, it receives any messages published while it was offline. Persistent sessions are essential for command-and-control scenarios: actuators, gateway devices that aggregate downstream sensors, or mobile applications that need to receive notifications. The cost is broker-side memory and disk storage for queued messages, which must be provisioned accordingly.
With MQTT 5.0, session state can also be assigned a Session Expiry Interval. This allows the broker to clean up sessions after a configurable period rather than storing state indefinitely. Set a session expiry of a few hours for devices that reconnect periodically, and use infinite expiry only for critical infrastructure gateways.
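The MQTT 3.1.1 and 5.0 flags do not map one-to-one, so it helps to derive the CONNECT session fields from a device profile in one place. The sketch below is illustrative. The profile names are hypothetical. The two-wake-cycle rule is taken from the 15-minute sensor with a 30-minute expiry described under MQTT 5.0 features. The returned dataclass is not any client library's API.

```python
from dataclasses import dataclass
from typing import Optional

# In MQTT 5.0 the maximum Session Expiry Interval (0xFFFFFFFF) means "never expire".
SESSION_NEVER_EXPIRES = 0xFFFFFFFF


@dataclass(frozen=True)
class SessionSettings:
    clean_start: bool
    session_expiry_interval_s: int


def session_settings(profile: str, wake_interval_s: Optional[int] = None) -> SessionSettings:
    """Map a hypothetical device profile to MQTT 5.0 CONNECT session fields."""
    if profile == "telemetry-only":
        # Equivalent to Clean Session = 1 in MQTT 3.1.1: nothing outlives the connection.
        return SessionSettings(clean_start=True, session_expiry_interval_s=0)
    if profile == "periodic":
        if wake_interval_s is None or wake_interval_s <= 0:
            raise ValueError("periodic devices need a positive wake interval")
        # Keep state for two wake cycles so one missed check-in does not lose it.
        return SessionSettings(clean_start=False, session_expiry_interval_s=2 * wake_interval_s)
    if profile == "gateway":
        return SessionSettings(clean_start=False, session_expiry_interval_s=SESSION_NEVER_EXPIRES)
    raise ValueError(f"unknown profile: {profile!r}")


print(session_settings("periodic", wake_interval_s=900))
```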
MQTT 5.0 Features Worth Using
MQTT 5.0, standardised in 2019, introduced several features that materially improve production operations. If your broker and client libraries support it, MQTT 5.0 should be preferred over 3.1.1 for new deployments.
Session Expiry Interval: As mentioned above, this replaces the binary clean-session model with a configurable TTL for session state. Set it per-client based on expected reconnection patterns. A sensor that wakes every 15 minutes can have a 30-minute session expiry; a gateway that is always on can have a 24-hour or infinite expiry.
Message Expiry Interval: Every PUBLISH can carry a TTL. If the message cannot be delivered within that window, the broker discards it. This prevents stale commands from accumulating in queues and is invaluable for time-sensitive actuation.
User Properties: Custom key-value pairs that can be attached to most MQTT 5.0 packets, including CONNECT, PUBLISH, and SUBSCRIBE. Use them for tracing, correlation IDs, tenant identifiers, or routing hints without polluting the topic namespace. A tracing middleware can inject a trace-id user property that propagates through the entire message pipeline.
Reason Codes: MQTT 5.0 replaces the binary ACK/NACK with descriptive reason codes in all acknowledgement packets. When a PUBLISH is rejected or a SUBSCRIBE fails, the broker now says why: the topic is unauthorised, the packet exceeds the maximum size, the client identifier is malformed. This alone is reason enough to upgrade — debugging MQTT 3.1.1 failures was like debugging HTTP without status codes.
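The Message Expiry Interval has one detail that is easy to miss. When the broker finally forwards a queued message, the PUBLISH it sends carries the original interval minus the time the message spent waiting. The toy queue below models that behaviour for one offline session. It is a Python sketch with our own naming and an injectable clock so it can be tested, not how any particular broker implements its queues.

```python
import math
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional, Tuple


@dataclass
class QueuedMessage:
    topic: str
    payload: bytes
    expiry_interval_s: Optional[int]  # None: the publisher set no Message Expiry Interval
    enqueued_at: float


class OfflineQueue:
    """Toy model of the queue a broker keeps for one offline persistent session."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._queue: Deque[QueuedMessage] = deque()

    def enqueue(self, topic: str, payload: bytes,
                expiry_interval_s: Optional[int] = None) -> None:
        self._queue.append(QueuedMessage(topic, payload, expiry_interval_s, self._clock()))

    def drain(self) -> List[Tuple[str, bytes, Optional[int]]]:
        """On reconnect: return (topic, payload, expiry to forward) for each live message."""
        now = self._clock()
        deliverable = []
        while self._queue:
            msg = self._queue.popleft()
            if msg.expiry_interval_s is None:
                deliverable.append((msg.topic, msg.payload, None))
                continue
            remaining = msg.expiry_interval_s - (now - msg.enqueued_at)
            if remaining <= 0:
                continue  # expired while the client was away: discarded, never delivered
            # Forward the received interval minus the time spent waiting, in whole seconds.
            deliverable.append((msg.topic, msg.payload, math.ceil(remaining)))
        return deliverable
```

A valve that was offline for ten seconds would therefore receive a 30-second "open" command with 20 seconds of validity left. If it was offline longer than 30 seconds, it would never receive the command at all.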
MQTT Broker Selection: EMQX, Mosquitto, and VerneMQ
The open-source MQTT ecosystem offers several mature brokers, each with distinct trade-offs. Your choice depends on scale, operational maturity, and feature requirements.
Eclipse Mosquitto: The workhorse of the MQTT world. Mosquitto is a single-threaded, event-loop broker written in C. It is lightweight, memory-efficient, and trivially simple to configure. Maximum practical throughput is roughly 50,000–100,000 messages per second on modern hardware. Mosquitto is the best choice for edge gateways, development environments, and deployments under 10,000 clients where operational simplicity is paramount. It does not natively cluster, so high-availability deployments require a load balancer with active-passive failover.
EMQX: A distributed MQTT broker written in Erlang/OTP. EMQX is designed for massive scale from the ground up: a single EMQX node can handle up to around 2 million concurrent connections, and a cluster can scale horizontally to tens of millions. It provides built-in clustering, hot upgrades, a rule engine for data integration, and extensive plugin support. EMQX is our default recommendation for production deployments exceeding 10,000 devices, multi-tenant architectures, or any system requiring a high SLA.
VerneMQ: Another Erlang-based broker, VerneMQ distinguishes itself with a strong focus on multi-tenancy and fine-grained operational control. It uses a LevelDB-backed metadata store that enables extremely fast subscribe/unsubscribe operations under churn. VerneMQ's clustering model is fully distributed with no single point of failure. It is an excellent choice for IoT platforms serving multiple tenants where tenant isolation and per-tenant metrics are first-class requirements.
MQTT Broker Clustering & EMQX on Kubernetes for High Availability
For production systems that cannot tolerate downtime, a single broker instance is insufficient. Clustering provides both high availability and horizontal scalability.
EMQX on Kubernetes represents the current state of the art for large-scale MQTT deployments. EMQX 5.0 introduced the core/replicant architecture: a small number of core nodes maintain the distributed database (routing table, session state, ACLs), while lightweight replicant nodes handle client connections and forward traffic to the core layer. Replicants are stateless and can be scaled in and out rapidly based on connection load, while the core layer provides data durability.
In a typical EMQX Kubernetes deployment, you might run three core nodes for redundancy and anywhere from three to thirty replicant nodes depending on client count. A Kubernetes Service of type LoadBalancer distributes MQTT and MQTTS connections across the replicant pool. Node failures need little operator involvement. Replicants restart in seconds, disconnected clients reconnect through the load balancer, and the surviving core nodes continue to hold the replicated cluster database. Whether a persistent session survives the loss of the node its client was connected to depends on the EMQX version and on whether durable session storage is enabled. Verify that behaviour for your release before relying on it.
Reference Architecture
For a deployment targeting 100,000 concurrent devices, we recommend: 3x EMQX core nodes (4 vCPU, 8 GB RAM each) on dedicated Kubernetes nodes, 6x EMQX replicant nodes (2 vCPU, 4 GB RAM), and a managed PostgreSQL or Redis instance for session persistence fallback. This configuration handles ~500,000 messages per second with sub-10ms P99 latency.
Securing MQTT for IoT: Security Best Practices
MQTT security is multi-layered, and each layer addresses a different threat model. We implement all of the following in production.
TLS 1.3 (mandatory): All MQTT traffic should be encrypted with TLS 1.3. There is no acceptable reason to run plain TCP MQTT in production. Use MQTTS (port 8883) or secure WebSockets (port 443). Certificate validation on both sides prevents man-in-the-middle attacks. For constrained devices that cannot handle full TLS, consider a secure tunnel (WireGuard or similar) terminating at an edge gateway.
mTLS (mutual TLS): For high-security deployments, require clients to present X.509 certificates that the broker validates against a trusted CA. mTLS provides device identity without shared secrets and is the strongest authentication mechanism available. The Common Name (CN) or Subject Alternative Name (SAN) in the certificate can encode the device ID and even the authorised topic prefix, enabling certificate-bound topic permissions.
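Certificate-bound permissions come down to comparing an identity from the verified client certificate with one level of the topic. The sketch below assumes a hypothetical convention in which the certificate's Common Name is exactly the device ID. It reads the subject in the nested-tuple shape returned by Python's `ssl.SSLSocket.getpeercert()`. In production the broker performs this check, so read this as an illustration of the logic rather than something to run in the data path.

```python
from typing import Optional

DEVICE_ID_LEVEL = 4  # position of device-id in org/site/zone/device-type/device-id/data-type


def common_name(peercert: dict) -> Optional[str]:
    """Extract the subject CN from a dict shaped like ssl.SSLSocket.getpeercert() output."""
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def may_publish(peercert: dict, topic: str) -> bool:
    """Allow a publish only if the topic's device-id level equals the certificate CN."""
    cn = common_name(peercert)
    if not cn:
        return False
    levels = topic.split("/")
    return len(levels) == 6 and levels[DEVICE_ID_LEVEL] == cn


cert = {"subject": ((("countryName", "MT"),),
                    (("organizationName", "Example Org"),),
                    (("commonName", "ts-4427"),))}
print(may_publish(cert, "aletheiatech/malta-factory/floor-03/temperature-sensor/ts-4427/temperature"))
```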
Username/Password Authentication: When mTLS is not feasible, use strong password authentication with bcrypt or SCRAM-based credential verification. Never store passwords in plaintext. EMQX and VerneMQ both support extensible authentication via HTTP or LDAP backends, enabling integration with existing identity providers.
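Python's standard library has no bcrypt, so the sketch below substitutes salted PBKDF2-HMAC-SHA256 via `hashlib.pbkdf2_hmac`. This is the same key-derivation function that SCRAM-SHA-256 builds on. The storage-format string and the default iteration count are assumptions; tune the count for your hardware. Two properties carry over to any scheme: a random salt per password and a constant-time comparison.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: tune so one verification costs an acceptable delay


def hash_password(password: str, *, iterations: int = ITERATIONS) -> str:
    """Return a self-describing record: scheme$iterations$salt$digest (hex-encoded)."""
    salt = os.urandom(16)  # a fresh random salt per password defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    parts = stored.split("$")
    if len(parts) != 4 or parts[0] != "pbkdf2_sha256":
        return False
    _, iterations, salt_hex, digest_hex = parts
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    bytes.fromhex(salt_hex), int(iterations))
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Storing the iteration count in the record lets you raise it later. Old hashes remain verifiable, and you can rehash them at the next successful login.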
ACLs and Topic-Level Authorization: ACLs are the enforcement mechanism for the principle of least privilege. Each client should be permitted to publish and subscribe to only the topics it requires. A temperature sensor should not be able to subscribe to actuator commands. EMQX's built-in ACL system supports file-based, database-backed, and HTTP-driven ACLs. We recommend HTTP ACLs backed by a REST API that queries a centralised policy database — this enables dynamic ACL updates without broker restarts.
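Whatever backend serves the ACLs, the evaluation logic is usually first-match-wins with a default deny. The sketch below shows that logic in plain Python. The `Rule` type, the `{client_id}` placeholder, and the example client IDs are hypothetical, and real brokers use their own rule formats and placeholder syntax. Wildcards in a subscription request are compared literally, so a device cannot widen its access by subscribing with #.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Literal, Optional

Action = Literal["publish", "subscribe"]


@dataclass(frozen=True)
class Rule:
    permission: Literal["allow", "deny"]
    action: Action
    pattern: str                              # may contain the placeholder {client_id}
    clients: Optional[FrozenSet[str]] = None  # None: the rule applies to every client


def _pattern_covers(pattern: str, requested: str) -> bool:
    """True if an ACL pattern covers a requested topic (publish) or filter (subscribe)."""
    p_levels, r_levels = pattern.split("/"), requested.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":
            return True
        if i >= len(r_levels):
            return False
        if p == "+":
            if r_levels[i] == "#":  # a multi-level request is broader than a one-level grant
                return False
            continue
        if p != r_levels[i]:
            return False
    return len(p_levels) == len(r_levels)


def authorize(client_id: str, action: Action, requested: str, rules: List[Rule]) -> bool:
    """First matching rule wins; anything not explicitly allowed is denied."""
    if not client_id or any(ch in client_id for ch in "/+#"):
        return False  # such IDs could widen a pattern after placeholder substitution
    for rule in rules:
        if rule.action != action:
            continue
        if rule.clients is not None and client_id not in rule.clients:
            continue
        if _pattern_covers(rule.pattern.replace("{client_id}", client_id), requested):
            return rule.permission == "allow"
    return False


RULES = [
    # Only the archiver may use a multi-level wildcard.
    Rule("allow", "subscribe", "aletheiatech/malta-factory/#", frozenset({"archiver-01"})),
    # Every device may publish under its own device-id level...
    Rule("allow", "publish", "aletheiatech/+/+/+/{client_id}/+"),
    # ...and subscribe only to its own command topic.
    Rule("allow", "subscribe", "aletheiatech/+/+/+/{client_id}/command"),
]
```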
Monitoring & Observability
An unmonitored MQTT broker will fail silently. A production monitoring stack should cover four dimensions: broker health, connection metrics, message throughput, and system resources.
Prometheus metrics: EMQX and VerneMQ expose Prometheus-compatible metrics natively, while Mosquitto publishes its statistics on $SYS topics that a separate exporter can convert. Metric names differ between brokers (EMQX, for example, exports names such as emqx_sessions_count and emqx_delivery_dropped). The signals worth alerting on are the same everywhere:
- connection and session counts (sessions also include offline persistent sessions, so they can exceed connected clients)
- messages received and sent (throughput)
- subscription count (subscription table size)
- dropped deliveries (messages lost due to queue overflow)

EMQX additionally exposes per-topic and per-client metrics through its dashboard and REST API.
Broker health checks: Configure TCP health checks on the MQTT port (1883 or 8883) at the load balancer level. More sophisticated checks can connect, subscribe to an internal health topic, and verify that the broker responds within a threshold. EMQX exposes a /status HTTP endpoint suitable for Kubernetes liveness and readiness probes.
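A TCP check only proves that the port is open. The slightly deeper check below, written in Python with only the standard library, completes an MQTT 3.1.1 CONNECT/CONNACK exchange and times it. It stops short of the subscribe-and-echo check described above. It assumes a plain-TCP listener; for port 8883 you would wrap the socket with the `ssl` module. The client ID `healthcheck-probe` is a placeholder that your ACLs would need to permit.

```python
import socket
import struct
import time
from typing import Tuple


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit set while more bytes follow."""
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n:
            byte |= 0x80
        out.append(byte)
        if not n:
            return bytes(out)


def build_connect(client_id: str, keep_alive_s: int = 30) -> bytes:
    """Minimal MQTT 3.1.1 CONNECT: protocol level 4, Clean Session = 1, no credentials."""
    cid = client_id.encode("utf-8")
    variable_header = b"\x00\x04MQTT" + bytes([4, 0x02]) + struct.pack("!H", keep_alive_s)
    body = variable_header + struct.pack("!H", len(cid)) + cid
    return b"\x10" + encode_remaining_length(len(body)) + body


def probe_broker(host: str, port: int = 1883, timeout_s: float = 2.0,
                 client_id: str = "healthcheck-probe") -> Tuple[bool, float]:
    """Return (healthy, seconds taken); healthy means CONNACK with return code 0 in time."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(build_connect(client_id))
            connack = b""
            while len(connack) < 4:  # CONNACK is 4 bytes: 0x20, 0x02, flags, return code
                chunk = sock.recv(4 - len(connack))
                if not chunk:
                    break
                connack += chunk
            healthy = len(connack) == 4 and connack[:2] == b"\x20\x02" and connack[3] == 0
            if healthy:
                sock.sendall(b"\xe0\x00")  # DISCONNECT, so the broker logs a clean exit
    except OSError:  # refused, reset, or timed out
        healthy = False
    return healthy, time.monotonic() - start
```

Give each probing instance a distinct client ID. A broker disconnects an existing client when another connects with the same ID, so probes sharing an ID would knock each other off.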
Connection monitoring: Track session creation rates, authentication failures, and unexpected disconnections. A sudden spike in authentication failures may indicate a brute-force attack. A surge of clean-session connections in rapid succession may point to a firmware bug causing device reboot loops. Grafana dashboards over Prometheus data provide real-time visibility, while structured logs forwarded to Loki or Elasticsearch enable forensic analysis.
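The reboot-loop signal can be computed from connection events using a sliding window per client ID. The sketch assumes you can feed it (client ID, timestamp) pairs from broker logs or connect hooks. The threshold of five connections per 60 seconds is an illustrative default, not a recommendation. The same structure works for authentication failures keyed by source address.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class ReconnectStormDetector:
    """Flags client IDs that connect more than `max_connects` times within `window_s` seconds."""

    def __init__(self, max_connects: int = 5, window_s: float = 60.0) -> None:
        self.max_connects = max_connects
        self.window_s = window_s
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_connect(self, client_id: str, timestamp: float) -> bool:
        """Record a connection; return True if the client now looks like it is reboot-looping."""
        events = self._events[client_id]
        events.append(timestamp)
        # Drop events that have slid out of the window.
        while events and events[0] <= timestamp - self.window_s:
            events.popleft()
        return len(events) > self.max_connects
```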
Last Will and Testament (LWT) and Retained Messages
Two often-overlooked MQTT features become critical at scale: Last Will and Testament (LWT) and retained messages.
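LWT: Every client connection can specify a will message that the broker publishes if the client disconnects without first sending a DISCONNECT packet, for example after a crash, power loss, or network failure. This is the IoT equivalent of a dead-man's switch. A gateway device publishing its online/offline status via LWT enables downstream systems to detect connectivity loss within about one and a half keep-alive intervals, which is the point at which the broker gives up on a silent client. Keep the keep-alive short on devices where fast detection matters. Design your LWT topic as org/site/zone/device-id/status with a simple payload like {"state":"offline"} and subscribe your orchestration layer to +/+/+/+/status. The will is registered when the client connects, so it cannot carry the time of the disconnect; consumers should timestamp status messages on receipt.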
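Two design points from the LWT discussion can be captured in small helper functions: a will payload without a timestamp, and a detection delay tied to keep-alive. The dictionary keys below are our own names rather than a client library's parameters, though most clients accept equivalent will topic, payload, QoS, and retain arguments. Marking the will as retained is an additional assumption. It ensures that a dashboard subscribing later still sees the last known state.

```python
import json


def will_settings(org: str, site: str, zone: str, device_id: str) -> dict:
    """Will-message fields for CONNECT (key names are illustrative, not a library's API)."""
    return {
        "will_topic": f"{org}/{site}/{zone}/{device_id}/status",
        # Fixed at CONNECT time, so it deliberately carries no disconnect timestamp.
        "will_payload": json.dumps({"state": "offline"}).encode("utf-8"),
        "will_qos": 1,
        "will_retain": True,
    }


def worst_case_detection_s(keep_alive_s: int, will_delay_s: int = 0) -> float:
    """Upper bound on the time from a silent failure until the will is published."""
    if keep_alive_s <= 0:
        raise ValueError("keep-alive 0 disables the broker's timeout for silent clients")
    # The broker drops a client after 1.5 x Keep Alive with no packets (MQTT 3.1.1 and 5.0);
    # an MQTT 5.0 Will Delay Interval postpones publication further.
    return 1.5 * keep_alive_s + will_delay_s


print(worst_case_detection_s(60))  # 90.0
```

If you retain the will, have the device publish a retained {"state":"online"} to the same topic right after connecting. Otherwise the stale offline message remains the retained value after the device recovers.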
Retained messages: A publisher can mark a message as retained, instructing the broker to store the last value on the topic and deliver it immediately to any new subscriber. This is ideal for device state, configuration versions, or firmware versions. For example, a smart thermostat can retain its current target temperature so that a newly connected dashboard instantly displays the current setpoint without waiting for the next telemetry cycle. Use retained messages sparingly, as each retained message consumes broker memory proportional to the payload size. Monitor retained message count as part of your regular housekeeping.
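Retained-message semantics are easy to misremember, so the toy model below spells them out. A retained publish replaces the previous value on that topic. A retained publish with an empty payload deletes it. A non-retained publish leaves it untouched. This is an in-memory Python sketch, not how any particular broker stores retained messages. To stay short, it matches subscriptions by exact topic only, and the thermostat topic is a hypothetical example.

```python
from typing import Dict, Optional, Tuple


class RetainedStore:
    """Toy model of a broker's retained-message table: at most one message per topic."""

    def __init__(self) -> None:
        self._by_topic: Dict[str, bytes] = {}

    def publish(self, topic: str, payload: bytes, retain: bool) -> None:
        if not retain:
            return  # routed to current subscribers, but never stored or used to clear
        if payload == b"":
            self._by_topic.pop(topic, None)  # empty retained payload deletes the entry
        else:
            self._by_topic[topic] = payload  # replaces any previous retained value

    def on_subscribe(self, topic: str) -> Optional[bytes]:
        """Value delivered immediately to a new subscriber, if any."""
        return self._by_topic.get(topic)

    def stats(self) -> Tuple[int, int]:
        """(count, total payload bytes): the housekeeping numbers worth graphing."""
        return len(self._by_topic), sum(len(p) for p in self._by_topic.values())


store = RetainedStore()
setpoint = "aletheiatech/malta-factory/floor-03/thermostat/th-0012/setpoint"
store.publish(setpoint, b"21.5", retain=True)
print(store.on_subscribe(setpoint))  # b'21.5'
```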
Scaling MQTT Production Infrastructure
Scaling MQTT infrastructure requires planning in three dimensions: connections, message throughput, and storage.
Connections: Each MQTT connection consumes a TCP socket, a small amount of RAM for connection state, and (for persistent sessions) session storage. On modern hardware with tuned kernel parameters, a single EMQX node handles 1–2 million concurrent TCP connections. Beyond that, you need horizontal scaling. EMQX's core/replicant architecture is purpose-built for this: replicant nodes terminate the TCP connections, while the core nodes keep the routing table consistent across the cluster. The load balancer spreads new connections across the replicants automatically.
Message throughput: Throughput is bound by CPU (packet parsing and routing) and network bandwidth. At high throughput, the bottleneck shifts from the broker to the subscribers. If a subscriber cannot keep up with the publish rate, the broker either drops messages (QoS 0) or queues them (QoS 1/2), potentially exhausting memory. Implement backpressure mechanisms. Use MQTT 5.0's Receive Maximum to cap the number of unacknowledged QoS 1 and QoS 2 messages in flight to each client. Configure per-client queue limits. Use a message bridge to route high-volume telemetry streams to a streaming platform like Apache Kafka for asynchronous consumption.
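Receive Maximum and per-client queue limits work together as a two-stage buffer: a small window of unacknowledged messages in flight, with a bounded queue behind it. The sketch below models one client's outbox for QoS 1 delivery. The class name and the drop-oldest overflow policy are assumptions for illustration. Brokers expose this behaviour as configuration, and their defaults vary.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class ClientOutbox:
    """At most `receive_maximum` unacknowledged QoS 1 messages, plus a bounded queue behind them."""

    def __init__(self, receive_maximum: int, max_queue_len: int) -> None:
        self.receive_maximum = receive_maximum
        self.max_queue_len = max_queue_len
        self._queue: Deque[bytes] = deque()
        self._in_flight: Dict[int, bytes] = {}
        self._next_id = 1
        self.dropped = 0  # export this as a metric; it is your queue-overflow signal

    def enqueue(self, payload: bytes) -> None:
        if len(self._queue) >= self.max_queue_len:
            self._queue.popleft()  # assumed policy: drop the oldest queued message
            self.dropped += 1
        self._queue.append(payload)

    def next_to_send(self) -> Optional[Tuple[int, bytes]]:
        """Move one message into flight if the client's Receive Maximum allows it."""
        if not self._queue or len(self._in_flight) >= self.receive_maximum:
            return None
        packet_id = self._next_id
        while packet_id in self._in_flight:  # packet identifiers run 1..65535 and must be unique
            packet_id = packet_id % 65535 + 1
        self._next_id = packet_id % 65535 + 1
        payload = self._queue.popleft()
        self._in_flight[packet_id] = payload
        return packet_id, payload

    def on_puback(self, packet_id: int) -> None:
        self._in_flight.pop(packet_id, None)  # frees one slot in the window
```

A slow subscriber therefore costs at most `receive_maximum + max_queue_len` messages of broker memory. Anything beyond that shows up in `dropped` rather than in an out-of-memory event.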
Storage: Message brokers are not databases. Configure retention policies that purge messages from queues after a reasonable interval. EMQX's built-in rule engine can forward all telemetry to a time-series database, enabling long-term storage and analytics without burdening the broker. For audit trails, forward QoS 2 command messages to an immutable log store. Plan for at least 2x the storage your current telemetry volume requires, accounting for growth.
Capacity Planning Formula
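Estimate broker memory as: base_memory + (connections × 8 KB) + (subscriptions × 0.5 KB) + (retained_messages × avg_payload_bytes) + (in_flight_qos1_2 × avg_payload_bytes × 2). Consider 100,000 connections, 500,000 subscriptions, 10,000 retained messages at 256 bytes, and 5,000 in-flight QoS 1 messages (also assumed to average 256 bytes). The variable terms come to roughly 1.1 GB: about 0.8 GB for connections, 0.25 GB for subscriptions, and about 5 MB for retained and in-flight payloads. Adding base memory and headroom for traffic spikes, plan for approximately 2.5–3 GB of broker RAM.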
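The formula translates directly into code, which makes it easy to rerun as the fleet grows. The sketch reads "KB" as 1,024 bytes and, like the formula, uses a single average payload size for both retained and in-flight messages. The `base_memory` value must come from measuring an idle broker on your own hardware.

```python
KIB = 1024  # assumption: the formula's "KB" read as binary kilobytes


def estimate_broker_memory_bytes(base_memory: int, connections: int, subscriptions: int,
                                 retained_messages: int, in_flight_qos1_2: int,
                                 avg_payload_bytes: int) -> int:
    """Capacity-planning formula from the text, term by term."""
    return (
        base_memory
        + connections * 8 * KIB                      # per-connection state
        + subscriptions * KIB // 2                   # 0.5 KB per subscription
        + retained_messages * avg_payload_bytes      # retained payloads
        + in_flight_qos1_2 * avg_payload_bytes * 2   # in-flight QoS 1/2, counted twice
    )


estimate = estimate_broker_memory_bytes(0, 100_000, 500_000, 10_000, 5_000, 256)
print(f"{estimate / 2**30:.2f} GiB before base memory")  # 1.01 GiB before base memory
```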
Conclusion
Architecting a production MQTT infrastructure requires deliberate decisions at every layer of the stack. Topic hierarchy determines whether your system is manageable or chaotic. QoS levels determine the performance envelope. Session management determines reliability under disconnection. Broker selection and clustering determine the ceiling on scale. And security determines whether any of it matters at all.
The common thread across every production deployment we have operated is this: plan for scale from day one. A topic hierarchy designed for five devices will not serve 5,000. An ACL model absent from v1 will be painful to retrofit in v3. A single-node Mosquitto chosen for simplicity in the prototype can become a bottleneck as the fleet grows, and migrating to a clustered EMQX deployment mid-project is a costly exercise.
Start with a well-structured topic namespace, choose the broker that matches your scale requirements, implement TLS and ACLs before the first device connects, and instrument everything. For complementary guidance on securing your IoT backend, read our IoT security mistakes guide. Your future self — and the operations team on call at 3 AM — will thank you.
If your team is designing or scaling an MQTT infrastructure and would like an architectural review, we at Aletheia Tech offer production-readiness assessments for IoT messaging systems. Get in touch.