Apache Kafka powers event-driven architectures across countless organizations, handling everything from real-time analytics to microservices communication. As the Kafka landscape evolves, especially with the shift from ZooKeeper to KRaft, teams face important decisions about their infrastructure strategy.
As we add Strimzi Operator support, which applies to the 2.3+ GA release trains, this article explains what the technology does and why it matters for new and existing deployments.
For years, Kafka relied on Apache ZooKeeper for cluster metadata and coordination. The architecture works reliably, and our current ZooKeeper-based solution continues serving customers well.
However, the Kafka community has chosen a clear path forward with KRaft (Kafka Raft metadata mode), which eliminates ZooKeeper entirely. KRaft simplifies Kafka's architecture by moving metadata management directly into Kafka brokers running in controller mode. Beyond removing a component, it reduces operational complexity, improves scalability, and aligns with Kafka's long-term roadmap.
Strimzi is a Cloud Native Computing Foundation (CNCF) project, originally accepted at the sandbox level and since moved to incubating, that provides Kubernetes operators for managing Apache Kafka. Instead of manually configuring clusters through Helm charts and scripts, Strimzi uses Custom Resource Definitions (CRDs) to represent Kafka infrastructure as native Kubernetes resources.
You declare your desired Kafka cluster state in YAML manifests, and Strimzi's operators handle the details: provisioning, configuration, security, upgrades, and operational tasks that traditionally required manual work. For organizations managing Kafka on Kubernetes, the declarative approach changes how operations work.
The Strimzi Operator continuously reconciles the actual cluster state with your declared desired state, providing self-healing capabilities and preventing configuration drift. The automation becomes especially valuable when managing KRaft deployments across multiple environments.
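The reconciliation idea can be illustrated with a minimal Python sketch. This is not Strimzi's actual implementation (the operator works against the Kubernetes API and compares full resources); the `reconcile` function and the flat dictionaries are hypothetical stand-ins for declared and observed state.

```python
def reconcile(desired: dict, actual: dict) -> dict:
    """Return the settings that must change to move `actual` to `desired`.

    Hypothetical illustration only: a real operator compares complete
    Kubernetes resources, not flat dictionaries.
    """
    changes = {}
    for key, want in desired.items():
        if actual.get(key) != want:
            changes[key] = want
    return changes


desired = {"replicas": 3, "min.insync.replicas": 2}
actual = {"replicas": 3, "min.insync.replicas": 1}  # drifted after a manual edit

changes = reconcile(desired, actual)  # only the drifted setting is reported
actual.update(changes)                # applying the changes removes the drift
```

Running the same comparison again after the update yields no changes, which is the property that makes repeated reconciliation safe.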
Strimzi deployments use several components working together.
The Cluster Operator manages Kafka clusters, Kafka Connect, and MirrorMaker deployments. It watches for Kafka and KafkaNodePool custom resources, then creates and maintains the necessary Kubernetes resources: pods (managed through StrimziPodSets rather than StatefulSets in current Strimzi releases), Services, and configurations.
The Entity Operator includes two sub-components: the Topic Operator manages Kafka topics declaratively, while the User Operator handles user credentials and authorization. Topics and access control are managed through version-controlled manifests alongside other infrastructure.
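As a rough sketch of what a declaratively managed topic looks like, the Python below builds a `KafkaTopic` resource as a dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The topic name, cluster name, and retention value are hypothetical, and the `kafka.strimzi.io/v1beta2` API version is an assumption that should be checked against your Strimzi release.

```python
import json


def kafka_topic(name: str, cluster: str, partitions: int, replicas: int) -> dict:
    """Build a KafkaTopic custom resource for the Topic Operator."""
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",  # assumed API version
        "kind": "KafkaTopic",
        "metadata": {
            "name": name,
            # This label tells the operator which Kafka cluster owns the topic.
            "labels": {"strimzi.io/cluster": cluster},
        },
        "spec": {
            "partitions": partitions,
            "replicas": replicas,
            "config": {"retention.ms": 7 * 24 * 60 * 60 * 1000},  # 7 days
        },
    }


topic = kafka_topic("orders", "my-cluster", partitions=12, replicas=3)
print(json.dumps(topic, indent=2))
```

Because the resource is plain data, it can be reviewed and versioned like any other manifest.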
For KRaft deployments, KafkaNodePool resources define separate pools for controller nodes (managing metadata consensus) and broker nodes (handling client connections and data). The separation is necessary for production KRaft architecture and enables independent scaling.
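A minimal sketch of the two pools, again expressed as Python dictionaries that serialize to manifests. Pool names, the cluster name, replica counts, and storage sizes are hypothetical; the API version is an assumption to verify against your Strimzi release.

```python
def node_pool(name: str, cluster: str, role: str, replicas: int, size: str) -> dict:
    """Build a KafkaNodePool resource with a single role."""
    if role not in ("controller", "broker"):
        raise ValueError(f"unknown role: {role}")
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",  # assumed API version
        "kind": "KafkaNodePool",
        "metadata": {"name": name, "labels": {"strimzi.io/cluster": cluster}},
        "spec": {
            "replicas": replicas,
            "roles": [role],
            "storage": {
                "type": "persistent-claim",
                "size": size,
                "deleteClaim": False,  # keep volumes if the pool is deleted
            },
        },
    }


# Controllers hold only metadata; brokers hold topic data, hence the larger size.
controllers = node_pool("controllers", "my-cluster", "controller", 3, "20Gi")
brokers = node_pool("brokers", "my-cluster", "broker", 3, "500Gi")
```

Keeping the pools separate means `replicas` and `storage` can change for brokers without touching the controller quorum.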
One of Strimzi's best features is bringing Kafka into standard Kubernetes workflows. Your entire Kafka configuration (clusters, topics, users, ACLs) lives in version-controlled YAML manifests deployed through existing CI/CD pipelines.
The declarative approach has clear advantages. Configuration drift becomes detectable and automatically correctable. Environments stay consistent because the same manifests work across development, staging, and production. Disaster recovery improves because configurations are version-controlled artifacts that can quickly recreate infrastructure.
For teams already managing Kubernetes workloads, the consistency helps. Kafka uses the same operational patterns as other services. No special tooling or processes required.
Security is one of the more complex aspects of Kafka operations. Strimzi offers flexibility in how security artifacts are managed, and the flexibility matters for enterprise deployments.
Strimzi can automatically generate and manage TLS certificates, keystores, and truststores. The operator handles certificate rotation, distribution to all cluster components, and integration with cert-manager for enterprise certificate management. This automation removes a real operational burden.
However, organizations with established security infrastructure may prefer custom-generated artifacts. Maybe you have existing PKI systems, specific compliance requirements, or security tooling that generates standardized certificates. Strimzi supports custom certificate configurations, allowing integration with existing security workflows.
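The sketch below shows one way the custom-certificate option might be expressed on a Kafka resource's `spec`: disabling operator-generated CAs and pointing a listener at a user-supplied Secret. The helper name, listener name, and Secret name are hypothetical, and the field names reflect our understanding of the Strimzi `Kafka` schema, so verify them against the API reference for your release.

```python
import copy


def use_custom_certificates(kafka_spec: dict, listener_name: str, secret_name: str) -> dict:
    """Return a copy of a Kafka `spec` that relies on user-supplied certificates."""
    spec = copy.deepcopy(kafka_spec)
    # Ask the operator not to generate its own CAs; the CA Secrets must then
    # be created by you before the cluster is deployed.
    spec["clusterCa"] = {"generateCertificateAuthority": False}
    spec["clientsCa"] = {"generateCertificateAuthority": False}
    for listener in spec["kafka"]["listeners"]:
        if listener["name"] == listener_name:
            # Serve this listener with a certificate issued by existing PKI.
            listener.setdefault("configuration", {})["brokerCertChainAndKey"] = {
                "secretName": secret_name,
                "certificate": "tls.crt",
                "key": "tls.key",
            }
    return spec


base_spec = {
    "kafka": {"listeners": [{"name": "tls", "port": 9093, "type": "internal", "tls": True}]}
}
custom_spec = use_custom_certificates(base_spec, "tls", "corp-pki-listener-cert")
```

The original `base_spec` is left untouched, so the automated and custom variants can be kept side by side for comparison.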
The choice involves balancing automation benefits against control and compliance requirements. We're ensuring both approaches work well, providing flexibility for different customer needs.
Our implementation follows a logical progression: Docker Compose for local development, then moving through validation environments to production Kubernetes clusters. Strimzi supports the progression naturally.
Maintaining consistent configurations that mirror production deployment structures gives developers predictable Kafka behavior between local and cloud environments. The consistency reduces the "works on my machine" problem that affects complex distributed systems.
The same Kubernetes custom resources that work in testing environments work in production. Infrastructure capacity scales, but configuration patterns stay consistent.
Enterprise deployments often need custom Kafka images for compliance, custom patches, performance optimizations, or internal enhancements. Strimzi fully supports custom images and private container registries.
Organizations can maintain their own Kafka distributions while using Strimzi's operational automation. You control the image contents and versioning. Strimzi handles orchestration and lifecycle management.
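As an illustration, a custom image and a pull secret for a private registry could be set on the Kafka resource's `spec` roughly as follows. The registry URL, image tag, and Secret name are hypothetical, and the `image` and `template.pod.imagePullSecrets` fields should be confirmed against the Strimzi API reference for your release.

```python
def with_custom_image(kafka_spec: dict, image: str, pull_secret: str) -> dict:
    """Return a copy of a Kafka `spec` that runs a custom broker image."""
    kafka = dict(kafka_spec.get("kafka", {}))
    kafka["image"] = image
    # Credentials for the private registry, referenced by Secret name.
    kafka["template"] = {"pod": {"imagePullSecrets": [{"name": pull_secret}]}}
    return {**kafka_spec, "kafka": kafka}


spec = with_custom_image(
    {"kafka": {"version": "3.9.0"}},               # hypothetical version
    "registry.example.com/platform/kafka:custom",  # hypothetical image
    "registry-credentials",                        # hypothetical Secret
)
```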
Moving to Strimzi affects monitoring and management tooling. Strimzi exposes Prometheus metrics for all Kafka components: brokers, controllers, topics, and consumer groups. The operator includes metric exporters that integrate with existing observability platforms.
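A hedged sketch of how metrics might be switched on in the Kafka resource's `spec`: broker metrics through the JMX Prometheus exporter configured from a ConfigMap, and topic and consumer-group metrics through Kafka Exporter. The ConfigMap name and key are hypothetical, and the field names are our reading of the Strimzi schema rather than a verified configuration.

```python
def with_metrics(kafka_spec: dict, configmap: str, key: str) -> dict:
    """Return a copy of a Kafka `spec` with Prometheus metrics enabled."""
    kafka = dict(kafka_spec.get("kafka", {}))
    kafka["metricsConfig"] = {
        "type": "jmxPrometheusExporter",
        # Exporter rules live in a ConfigMap that you create separately.
        "valueFrom": {"configMapKeyRef": {"name": configmap, "key": key}},
    }
    spec = {**kafka_spec, "kafka": kafka}
    # Kafka Exporter adds topic and consumer-group (lag) metrics.
    spec["kafkaExporter"] = {"topicRegex": ".*", "groupRegex": ".*"}
    return spec


monitored = with_metrics({"kafka": {}}, "kafka-metrics", "kafka-metrics-config.yml")
```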
Administrative tools require connectivity pattern adjustments when moving from ZooKeeper-based clusters. Instead of connecting to ZooKeeper for cluster metadata, tools connect to Kafka's controller endpoints. Strimzi's Service configurations expose the necessary endpoints, but existing tooling still needs to be validated to confirm it continues working correctly.
The validation matters. Production Kafka deployments typically involve multiple systems: monitoring dashboards, alerting rules, administrative UIs, and operational scripts. Ensuring these integrations work with KRaft-based clusters managed by Strimzi requires thorough testing.
Kafka's performance depends heavily on storage configuration, and Strimzi provides the flexibility needed for production deployments.
Controllers need less storage since they only maintain cluster metadata. Brokers need substantially more storage for data retention based on your retention policies and throughput requirements.
Using JBOD (Just a Bunch of Disks) configurations with multiple persistent volumes per broker can maximize throughput for write-heavy workloads. Storage class selection impacts performance. Different Kubernetes storage providers offer different latency and throughput characteristics.
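To make the sizing reasoning concrete, here is a back-of-the-envelope sketch. The formula (write rate times retention times replication factor, spread across brokers, plus headroom) is a common rule of thumb rather than anything Strimzi prescribes, and every input value and the storage class name are hypothetical.

```python
import math


def storage_per_broker_gib(write_mib_s: float, retention_hours: float,
                           replication_factor: int, brokers: int,
                           headroom: float = 0.3) -> int:
    """Estimate per-broker storage in GiB for a steady write rate."""
    total_mib = write_mib_s * retention_hours * 3600 * replication_factor
    per_broker_gib = total_mib / brokers / 1024
    return math.ceil(per_broker_gib * (1 + headroom))


def jbod_storage(volumes: int, size_gib: int, storage_class: str) -> dict:
    """Build a JBOD `storage` section with one persistent volume per disk."""
    return {
        "type": "jbod",
        "volumes": [
            {"id": i, "type": "persistent-claim", "size": f"{size_gib}Gi",
             "class": storage_class, "deleteClaim": False}
            for i in range(volumes)
        ],
    }


per_broker = storage_per_broker_gib(10, 24, replication_factor=3, brokers=3)
storage = jbod_storage(2, math.ceil(per_broker / 2), "fast-ssd")
```

With these inputs the estimate comes to 1097 GiB per broker, or 549 GiB on each of two volumes. Real sizing should also account for compression, compaction, and uneven partition distribution.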
Kafka version upgrades have historically required careful manual orchestration. Strimzi automates much of the complexity through intelligent rolling updates.
When you update the Kafka version in your configuration, the operator performs controlled rolling restarts. It maintains controller quorum, respects minimum in-sync replicas for broker partitions, and validates version compatibility automatically.
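The in-sync replica check can be sketched as follows. This is a simplified illustration of the idea, not the operator's actual algorithm: `safe_to_restart` is a hypothetical helper, and each partition is reduced to the set of broker IDs currently in its in-sync replica set.

```python
def safe_to_restart(broker_id: int, partition_isrs: list, min_isr: int) -> bool:
    """Return True if taking `broker_id` offline keeps every partition it
    hosts at or above `min_isr` in-sync replicas."""
    for isr in partition_isrs:
        if broker_id in isr and len(isr - {broker_id}) < min_isr:
            return False
    return True


# Two partitions: one fully in sync, one already missing broker 0.
partition_isrs = [{0, 1, 2}, {1, 2}]

can_restart_0 = safe_to_restart(0, partition_isrs, min_isr=2)  # True
can_restart_1 = safe_to_restart(1, partition_isrs, min_isr=2)  # False
```

Broker 1 must wait until broker 0 rejoins the second partition's in-sync set; restarting it earlier would block producers that use `acks=all`.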
Automation reduces the operational burden of keeping clusters current with security patches and feature updates. Configuration changes follow the same pattern. Modify the manifest, and Strimzi orchestrates changes safely across your cluster.
Production deployments typically involve multiple specialized node pools.
Controller pools with three or more replicas ensure high availability for metadata consensus. Controllers form a Raft quorum that must maintain majority agreement for cluster operations.
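The majority requirement is simple arithmetic, sketched here with two hypothetical helper functions:

```python
def quorum_majority(controllers: int) -> int:
    """Smallest number of controllers that forms a majority."""
    return controllers // 2 + 1


def tolerated_failures(controllers: int) -> int:
    """Controllers that can fail while a majority remains available."""
    return controllers - quorum_majority(controllers)


for n in (3, 4, 5):
    print(n, quorum_majority(n), tolerated_failures(n))
```

Three controllers tolerate one failure, four still tolerate only one, and five tolerate two, which is why odd controller counts are the usual choice.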
Broker pools scale according to throughput requirements and storage needs. Brokers handle client connections, data replication, and partition leadership.
Network listeners vary by access pattern. Internal listeners optimize performance for communication inside the Kubernetes cluster. Secure listeners provide encrypted application connectivity. External listeners enable clients outside Kubernetes to connect when needed.
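A sketch of the three listener patterns as entries in the Kafka resource's `spec.kafka.listeners` list. Listener names and ports are hypothetical, `loadbalancer` is just one of the external listener types Strimzi offers, and the validation helper is an illustrative local check rather than part of Strimzi.

```python
def listener(name: str, port: int, kind: str, tls: bool) -> dict:
    """Build one listener entry."""
    return {"name": name, "port": port, "type": kind, "tls": tls}


def validate_listeners(listeners: list) -> None:
    """Reject duplicate names or ports before the manifest is applied."""
    names = [item["name"] for item in listeners]
    ports = [item["port"] for item in listeners]
    if len(set(names)) != len(names):
        raise ValueError("listener names must be unique")
    if len(set(ports)) != len(ports):
        raise ValueError("listener ports must be unique")


listeners = [
    listener("plain", 9092, "internal", tls=False),        # in-cluster traffic
    listener("tls", 9093, "internal", tls=True),           # encrypted in-cluster
    listener("external", 9094, "loadbalancer", tls=True),  # clients outside Kubernetes
]
validate_listeners(listeners)
```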
Replication factors and minimum in-sync replica settings balance data durability with performance, tuned based on workload characteristics.
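The trade-off can be expressed as a small calculation, assuming producers use `acks=all`. The helper name and the example settings are hypothetical.

```python
def write_fault_tolerance(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas that can be unavailable while `acks=all` writes still succeed."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and the replication factor")
    return replication_factor - min_insync_replicas


# Broker-level defaults as they might appear under spec.kafka.config.
config = {"default.replication.factor": 3, "min.insync.replicas": 2}

tolerance = write_fault_tolerance(
    config["default.replication.factor"], config["min.insync.replicas"]
)
```

A replication factor of 3 with a minimum of 2 in-sync replicas tolerates one unavailable replica. Raising the minimum to 3 strengthens durability but means any single broker outage blocks writes.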
Adopting Strimzi and KRaft is a strategic infrastructure investment with concrete benefits, not an exercise in chasing trends.
Simpler new deployments: Starting with KRaft eliminates an entire distributed system (ZooKeeper) from the operational stack. Fewer moving parts means fewer failure modes and simpler operations.
Consistent operations: Managing Kafka through Kubernetes-native patterns reduces complexity. The same tools, workflows, and operational practices apply across all infrastructure.
Future alignment: The Kafka community has clearly chosen KRaft as the architectural future. ZooKeeper mode was deprecated in Kafka 3.5 and is removed entirely in Kafka 4.0. Starting new installations with KRaft positions organizations well for what's coming.
Self-service: Declarative Kafka management enables teams to provision isolated clusters through GitOps workflows, accelerating development while maintaining governance.
Avoiding migration pain: For new projects, starting with KRaft avoids future migration complexity entirely. The value becomes clear when you consider the operational overhead of migrating live production clusters.
We're approaching the work pragmatically. Our current ZooKeeper-based solution works well and doesn't create immediate pressure. However, we want new customers to start with the best possible foundation rather than inheriting technical debt that requires future migration.
We're focused on thorough validation across increasingly complex environments, running configurations through regular test cycles to identify issues before they affect production deployments.
The timeline balances thoroughness against urgency. New customers benefit most from starting with KRaft, avoiding future migration complexity. Existing customers can maintain stable ZooKeeper-based deployments until natural infrastructure transitions (cloud migrations, major upgrades, or platform refreshes) provide logical migration opportunities.
As we finalize our Strimzi implementation and prepare for release, the operational simplicity it brings to Kafka management is clear. The combination of KRaft's architectural clarity with Strimzi's intelligent automation provides a solid foundation for event-driven systems.
Whether you're building event sourcing architectures, real-time analytics pipelines, or microservices communication layers, the modern approach to Kafka infrastructure delivers production-ready foundations without the operational complexity that has traditionally accompanied Kafka deployments.
For new projects, starting with KRaft and Strimzi means avoiding technical debt and migration complexity entirely. For existing deployments, the path forward is clearer: maintain stability now, plan migration during natural transition points, and benefit from improved architecture when the timing makes strategic sense.