Multi-Tenancy and Dynamic Workload Distribution in Messaging Systems
Published on 15.12.2025
TLDR: This article explores architectural patterns for dynamically distributing messaging workloads in multi-tenant systems. It contrasts the classic, static approach with dynamic strategies required for SaaS products, weighing the trade-offs between horizontal scaling with physical shards and vertical scaling with logical partitions inside a single service.
Summary:
The author delves into the complexities of distributing messaging workloads, moving beyond simple, static event filtering. In a typical scenario, a consumer knows which event types to process for its projections. However, in a multi-tenant SaaS application, tenants are not known upfront, which necessitates a dynamic approach to workload distribution. The simplest solution involves creating dedicated consumers per tenant, filtering events by a tenant property in the event metadata. This can be managed at the infrastructure level by spinning up new containers (e.g., Kubernetes pods) for each tenant, providing physical separation and independent scaling.
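To make the per-tenant filtering concrete, here is a minimal TypeScript sketch of a dedicated consumer that processes only events whose metadata matches its own tenant. The event shape and the TENANT_ID environment variable are illustrative assumptions, not an API from the article:

```typescript
// Minimal sketch: one container (e.g., one Kubernetes pod) per tenant,
// each started with its own TENANT_ID and ignoring everything else.
type TenantEvent = {
  type: string;
  data: Record<string, unknown>;
  metadata: { tenantId: string };
};

const tenantId = process.env.TENANT_ID!;

function shouldProcess(event: TenantEvent): boolean {
  // Filter by the tenant property carried in the event metadata.
  return event.metadata.tenantId === tenantId;
}

async function handle(events: TenantEvent[]): Promise<void> {
  for (const event of events.filter(shouldProcess)) {
    // Update this tenant's projections here.
  }
}
```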
This horizontal scaling approach, or physical sharding, offers strong isolation and the ability to tailor resources to a specific tenant's workload. However, it can be more costly and complex to manage, especially with a large number of tenants. Spinning up a thousand containers for a thousand tenants is often impractical due to coordination overhead. An alternative is vertical scaling, or logical partitioning, where a single service instance handles multiple tenants. In this model, the consumer processes messages for all tenants but internally spins up dedicated workers (threads, processes) for each, using consistent hashing to distribute the load. This is generally cheaper and easier to manage but risks the "noisy neighbor" problem, where a single busy tenant can impact the performance of others.
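The logical-partitioning variant can be sketched as follows: a single process subscribes for all tenants, but each tenant is pinned to one of a fixed pool of in-process workers. A plain hash-modulo assignment is used here as a simplified stand-in for consistent hashing; the worker count and queue shape are assumptions for illustration:

```typescript
import { createHash } from 'node:crypto';

const WORKER_COUNT = 8;

// Each worker owns an in-memory FIFO queue processed independently, so a
// busy tenant only delays tenants that hash to the same worker.
const workers: Array<Array<() => Promise<void>>> = Array.from(
  { length: WORKER_COUNT },
  () => [],
);

function workerFor(tenantId: string): number {
  // Hash the tenant id and map it onto one of the workers.
  const digest = createHash('sha256').update(tenantId).digest();
  return digest.readUInt32BE(0) % WORKER_COUNT;
}

function dispatch(tenantId: string, task: () => Promise<void>): void {
  workers[workerFor(tenantId)].push(task);
}

async function drain(workerIndex: number): Promise<void> {
  // Process one worker's queue sequentially, preserving per-tenant order.
  const queue = workers[workerIndex];
  while (queue.length > 0) {
    await queue.shift()!();
  }
}
```

With this layout, adding capacity means raising the worker count or moving the hottest tenants out to their own deployment, which is where the hybrid approach below comes in.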
For architects designing such systems, the choice between these two patterns is not always clear-cut, and often a hybrid solution is best. Smaller tenants can be grouped together within a single service, while larger or more security-sensitive tenants can be given their own dedicated, isolated infrastructure. The article also touches upon the role of mature messaging systems like Kafka, which have built-in mechanisms for partitioning and consumer groups. Kafka partitions data on the producer side, using a consistent hash of a key to assign a message to a specific partition, which is then consumed by a single consumer within a group. This strategy is efficient but less flexible if you need to partition based on different criteria dynamically on the consumer side.
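For comparison, producer-side partitioning with Kafka can look like the sketch below, using kafkajs; the broker address and topic name are assumptions made for illustration:

```typescript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({
  clientId: 'tenant-producer',
  brokers: ['localhost:9092'],
});
const producer = kafka.producer();

async function publishTenantEvent(tenantId: string, event: object): Promise<void> {
  // The tenant id is used as the message key: Kafka hashes the key to pick
  // a partition, so all of a tenant's events land on the same partition and
  // are consumed by exactly one consumer within each consumer group.
  await producer.send({
    topic: 'tenant-events',
    messages: [{ key: tenantId, value: JSON.stringify(event) }],
  });
}

async function main(): Promise<void> {
  await producer.connect();
  await publishTenantEvent('tenant-42', { type: 'OrderPlaced', orderId: '123' });
  await producer.disconnect();
}
```

Because the partition is fixed once the key is hashed on the producer side, repartitioning by a different criterion later requires re-keying and republishing, which is the flexibility limit the article points out.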
The author proposes a path forward for the Emmett framework that favors a partitioned producer model. A projector could be configured to partition work by a specific event property, like the tenant ID. The consumer would then handle the internal distribution to worker threads, using distributed locking to ensure that only one worker instance processes messages for a given partition at a time. This approach aims to provide a good balance of flexibility and performance for the majority of use cases, without trying to reinvent the complex distributed consensus algorithms that mature messaging brokers already provide.
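A hypothetical shape for such a partitioned projector is sketched below. This is not the current Emmett API; the option names, the lock signature, and the runner are invented here purely to illustrate the proposed model of partitioning by an event property and holding a per-partition lock:

```typescript
// Hypothetical configuration sketch, NOT an existing Emmett interface.
type PartitionedProjectorOptions = {
  name: string;
  // Derive the partition key from the event, e.g., the tenant id.
  partitionBy: (event: { metadata: { tenantId: string } }) => string;
  // Only one worker instance may hold the lock for a partition at a time;
  // the returned function releases the lock.
  acquireLock: (partition: string) => Promise<() => Promise<void>>;
};

async function runPartition(
  options: PartitionedProjectorOptions,
  partition: string,
  events: Array<{ metadata: { tenantId: string } }>,
): Promise<void> {
  const release = await options.acquireLock(partition);
  try {
    for (const event of events) {
      // Project events for this partition only.
    }
  } finally {
    await release();
  }
}
```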
Key takeaways:
- Dynamic workload distribution is essential for multi-tenant messaging systems where tenants are not known in advance.
- Horizontal scaling (physical sharding per tenant) provides strong isolation but can be costly and complex at scale.
- Vertical scaling (logical partitioning within a service) is cheaper but can lead to noisy neighbor problems.
- A hybrid approach, combining both strategies based on tenant size and needs, is often the most practical solution.
- Leveraging the built-in partitioning mechanisms of mature messaging systems like Kafka is often preferable to building custom distribution logic.
Tradeoffs:
- Gain: Better scalability and resource alignment with physical sharding. Sacrifice: Increased operational cost and complexity.
- Gain: Lower cost and simpler management with logical partitioning. Sacrifice: Weaker isolation and potential for resource contention between tenants.
Link: Multi-tenancy and dynamic messaging workload distribution