Customers need to replicate data from their Apache Kafka clusters for a variety of reasons, such as compliance requirements, cluster migrations, and disaster recovery (DR) implementations. However, the right replication strategy can vary depending on the application context. In this post, we walk through the different considerations for using Amazon MSK Replicator over Apache Kafka’s MirrorMaker 2 and help you choose the right replication solution for your use case. We also discuss how to make applications using Amazon Managed Streaming for Apache Kafka (Amazon MSK) resilient to disasters with a multi-Region Kafka architecture built on MSK Replicator.
Challenges with choosing DR strategies
Customers create business continuity plans and DR strategies to maximize resiliency for their applications, because downtime or data loss can result in losing revenue or halting operations. DR planning helps the business continue running in the event of a disaster impacting a subset of their application architecture. For customers using Kafka as a core streaming and messaging service in their applications, planning for DR for their Kafka infrastructure is an essential part of meeting targets for their application Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
Amazon MSK is a fully managed service that makes it easy to build and run Kafka to process streaming data. Amazon MSK provides high availability by offering multi-AZ configurations to distribute brokers across multiple Availability Zones within an AWS Region. A single MSK cluster deployment provides message durability through intra-cluster data replication. Data replication with a replication factor of 3 and a min-ISR value of 2, together with the producer setting acks=all, provides the strongest durability guarantees, because it makes sure other brokers in the cluster acknowledge receiving the data before the leader broker responds to the producer. This design provides strong protection against single broker failure as well as single-AZ failure.
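The effect of these settings can be illustrated with a small model. The following is a minimal sketch, not broker code: it assumes every replica that has not failed is still in sync, and the function name `acks_all_write_succeeds` is our own. The configuration keys `min.insync.replicas` and `acks` are the standard Kafka names for the settings described above.

```python
def acks_all_write_succeeds(replication_factor: int,
                            min_insync_replicas: int,
                            failed_replicas: int) -> bool:
    """Return True if a producer using acks=all can still write to a partition.

    Simplified model (assumption): every replica that has not failed is
    in sync, so the in-sync replica count is replication_factor minus
    the number of failed replicas.
    """
    if not 0 <= failed_replicas <= replication_factor:
        raise ValueError("failed_replicas must be between 0 and replication_factor")
    in_sync = replication_factor - failed_replicas
    # With acks=all, the leader rejects writes when the in-sync replica
    # count drops below min.insync.replicas.
    return in_sync >= min_insync_replicas


# Settings described in the text, using the standard Kafka configuration names.
TOPIC_SETTINGS = {"replication.factor": 3, "min.insync.replicas": 2}
PRODUCER_SETTINGS = {"acks": "all"}

for failed in range(TOPIC_SETTINGS["replication.factor"] + 1):
    ok = acks_all_write_succeeds(
        TOPIC_SETTINGS["replication.factor"],
        TOPIC_SETTINGS["min.insync.replicas"],
        failed,
    )
    print(f"{failed} replica(s) down -> writes accepted: {ok}")
```

Under this model, writes continue with one replica down (for example, a single broker or a single Availability Zone in a three-AZ layout) and are rejected once two replicas are down, which is the trade-off the min-ISR value controls.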
For enhanced resilience within a single Region, Amazon MSK also offers Express brokers, which significantly improve Kafka cluster reliability, throughput, and recovery times. Express brokers include pay-as-you-go storage, automated best-practice reliability configurations, no maintenance windows, and faster broker scaling and recovery times. This architecture reduces recovery time, minimizes the chance of errors from misconfigurations, and increases throughput, making your Kafka clusters more resilient across Availability Zones.
However, if an unlikely issue is impacting your applications or infrastructure across more than one Availability Zone, the architecture outlined in this post can help you prepare for, respond to, and recover from it.
For companies that can withstand a longer RTO but require a lower RPO on Amazon MSK, backing up data to Amazon Simple Storage Service (Amazon S3) is sufficient as a DR plan. This approach requires you to think through how to handle restarting the application after a DR failover. In this approach, you build a system to recover the data from Amazon S3 to Kafka topics (as described in Back up and restore Kafka topic data using Amazon MSK Connect). Depending on the volume of data being restored, it might take a long time to recover in this scenario. Additionally, you must consider how to handle consumer group offsets, and whether to allow applications to consume from the latest offset in the restored Kafka topics. Due to the high RTO, as well as the complexity and challenges associated with this approach, most streaming use cases rely on the availability of the MSK cluster itself for their business continuity plan. In these cases, setting up MSK clusters in multiple Regions and configuring data replication between clusters provides the required business resilience and continuity.
Choosing the right replication solution: MSK Replicator vs MirrorMaker 2
AWS recommends two primary solutions for cross-Region Kafka replication: MSK Replicator and MirrorMaker 2. Understanding when to use each solution is crucial for designing an effective DR strategy.
MSK Replicator: For most MSK cluster replications in the same account
MSK Replicator is a fully managed, serverless Kafka replication service that makes it easy to reliably replicate data across MSK clusters in different Regions or within the same Region. MSK Replicator is the recommended solution for application scenarios replicating data within the same AWS account. MSK Replicator has the following benefits:
Replication between MSK clusters – It supports replicating between MSK clusters in the same AWS account (including active-active or active-passive DR architectures for Amazon MSK)
No infrastructure management – It’s fully serverless with automatic scaling and simple setup through the AWS Management Console, AWS Command Line Interface (AWS CLI), or APIs
Built-in monitoring – It’s integrated with Amazon CloudWatch metrics and logging
Built-in high availability – As a managed service, it offers built-in fault tolerance across Availability Zones
MirrorMaker 2: For migrations and complex and hybrid scenarios
MirrorMaker 2 (MM2) remains the preferred solution for specific use cases that require more flexibility or involve non-Amazon MSK environments. MM2 is a utility bundled as part of Kafka that helps replicate data between Kafka clusters using the Kafka Connect framework.
We recommend MirrorMaker 2 for the following use cases:
Cross-account replication – Replicating data between MSK clusters in different AWS accounts
Migrations to Amazon MSK – Migrating from existing Kafka clusters on premises, in other clouds, or on self-managed Amazon Elastic Compute Cloud (Amazon EC2) deployments
Cross-cloud or hybrid cloud scenarios – Replicating between Kafka running on premises or on different cloud providers and Amazon MSK for disaster recovery or data analytics use cases
Using mTLS or SASL/SCRAM authentication – When you need mutual TLS certificate-based or SASL/SCRAM authentication and can’t enable AWS Identity and Access Management (IAM) authentication on your MSK cluster (for replication from one MSK cluster to another in these scenarios, you can still use MSK Replicator by enabling IAM authentication in addition to existing authentication methods)
Custom replication policies – Advanced topic naming or transformation requirements
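The selection criteria in the two lists above can be condensed into a small decision helper. This is an illustrative sketch only: the `ReplicationScenario` class, its field names, and the `recommend_tool` function are hypothetical and simply encode the guidance in this post. It is not an AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationScenario:
    """Hypothetical description of a replication requirement."""
    source_is_msk: bool
    target_is_msk: bool
    same_aws_account: bool
    iam_auth_possible: bool  # IAM auth can be enabled on the MSK clusters
    needs_custom_replication_policy: bool = False


def recommend_tool(scenario: ReplicationScenario) -> str:
    """Return "MSK Replicator" or "MirrorMaker 2" following the post's guidance."""
    # Migrations and hybrid or cross-cloud setups involve a non-MSK cluster.
    if not (scenario.source_is_msk and scenario.target_is_msk):
        return "MirrorMaker 2"
    # Cross-account replication between MSK clusters.
    if not scenario.same_aws_account:
        return "MirrorMaker 2"
    # mTLS or SASL/SCRAM only, with no option to also enable IAM auth.
    if not scenario.iam_auth_possible:
        return "MirrorMaker 2"
    # Advanced topic naming or transformation requirements.
    if scenario.needs_custom_replication_policy:
        return "MirrorMaker 2"
    return "MSK Replicator"


# Example: cross-Region DR between two MSK clusters in one account.
dr_scenario = ReplicationScenario(
    source_is_msk=True,
    target_is_msk=True,
    same_aws_account=True,
    iam_auth_possible=True,
)
print(recommend_tool(dr_scenario))
```

The checks are ordered from the broadest constraint (cluster type) to the narrowest (custom policies), so the first matching condition explains why MirrorMaker 2 was chosen.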
In the following sections, we discuss the architecture and deployment approaches for use cases where MSK Replicator and MirrorMaker 2 are the appropriate choices.
MSK Replicator solution overview
The following diagram illustrates the architecture for using MSK Replicator.
We create two MSK clusters – one in the primary Region, the other in the secondary Region as a standby cluster for disaster recovery. We deploy MSK Replicator in the secondary Region to replicate topics, ACLs, data, and consumer group offsets from the primary cluster. In this solution, we showcase a single-direction replication for active-passive disaster recovery. This solution can also be extended for active-active disaster recovery scenarios. Our Kafka clients connect to the primary cluster and can be configured to connect to the secondary cluster in the event of a disaster recovery failover.
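One way to make clients switchable between the primary and secondary clusters is to resolve the bootstrap servers from a single setting at startup. The following is a minimal sketch under stated assumptions: the broker hostnames, the `ACTIVE_KAFKA_CLUSTER` environment variable, and the `client_config` function are all hypothetical placeholders, and a real deployment might use a parameter store or DNS record instead. The only Kafka-defined names are the client properties `bootstrap.servers` and `client.id`.

```python
import os

# Hypothetical bootstrap broker strings; replace with the values for your
# own primary and secondary MSK clusters.
BOOTSTRAP_SERVERS = {
    "primary": "b-1.primary.example.com:9098,b-2.primary.example.com:9098",
    "secondary": "b-1.secondary.example.com:9098,b-2.secondary.example.com:9098",
}


def client_config(active_cluster: str) -> dict:
    """Build base Kafka client properties for the currently active cluster."""
    if active_cluster not in BOOTSTRAP_SERVERS:
        raise ValueError(f"unknown cluster: {active_cluster!r}")
    return {
        "bootstrap.servers": BOOTSTRAP_SERVERS[active_cluster],
        "client.id": f"example-app-{active_cluster}",
    }


# ACTIVE_KAFKA_CLUSTER is an assumed variable name. During a failover, an
# operator would set it to "secondary" and restart the clients.
active = os.environ.get("ACTIVE_KAFKA_CLUSTER", "primary")
print(client_config(active)["bootstrap.servers"])
```

Keeping the cluster choice outside the application code means a failover changes one setting rather than requiring a new build. The dictionary returned here would be merged with your authentication settings before being passed to the Kafka client library you use.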
For details on implementation steps, refer to Introducing Amazon MSK Replicator – Fully Managed Replication across MSK Clusters in Same or Different AWS Regions. For details on disaster recovery scenarios, refer to Use replication to increase the resiliency of a Kafka streaming application across Regions. These resources provide the following benefits:
Complete deployment steps – Step-by-step deployment process for MSK Replicator between Regions
Comprehensive examples – Multiple deployment scenarios and configurations
Failover process – Key steps in executing a disaster recovery failover when using MSK Replicator
MirrorMaker 2 solution overview
The following diagram illustrates the architecture for using MirrorMaker 2.
We create an MSK cluster in the primary Region, with the existing Kafka cluster on premises. This Kafka cluster is analogous to Kafka clusters running in other clouds, or in self-managed Kafka clusters on Amazon EC2. In this solution, we showcase a single-direction replication for cluster migration scenarios. Our Kafka clients interact with the on-premises Kafka cluster and can be migrated to run on AWS to interact with the MSK cluster.
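To make the single-direction flow concrete, the sketch below generates a minimal MirrorMaker 2 properties file of the kind used by MM2's dedicated mode. It is an assumption-laden illustration: the cluster aliases `source` and `target`, the function name `build_mm2_properties`, and the example hostnames are ours, and a deployment on Kafka Connect (such as the one in the resources referenced in this post) is configured through connector settings rather than this file. Authentication properties are omitted.

```python
def build_mm2_properties(source_bootstrap: str,
                         target_bootstrap: str,
                         topics: str = ".*") -> str:
    """Render a minimal one-way MirrorMaker 2 configuration as text.

    "source" and "target" are arbitrary aliases chosen for this sketch;
    MM2 accepts any alias names listed under "clusters".
    """
    lines = [
        "clusters = source, target",
        f"source.bootstrap.servers = {source_bootstrap}",
        f"target.bootstrap.servers = {target_bootstrap}",
        # Replicate from the existing cluster to Amazon MSK only.
        "source->target.enabled = true",
        "target->source.enabled = false",
        # Regular expression selecting the topics to replicate.
        f"source->target.topics = {topics}",
        # Replication factor for topics that MM2 creates on the target.
        "replication.factor = 3",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical endpoints for an on-premises cluster and an MSK cluster.
print(build_mm2_properties(
    "onprem-kafka-1.example.internal:9092",
    "b-1.msk.example.com:9098",
    topics="orders.*",
))
```

Disabling the reverse flow explicitly documents that this is a migration path rather than an active-active setup.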
Rather than manually configuring each component, we recommend using the automated deployment resources available in the following GitHub repository. For a step-by-step walkthrough of deploying MirrorMaker 2 on Amazon ECS with Fargate using auto scaling, refer to Amazon MSK Migration Workshop: Modernizing with Express Brokers. These resources provide the following benefits:
Infrastructure as code – Terraform for MSK clusters and supporting infrastructure
Containerized Kafka Connect – Docker images optimized for AWS
Amazon ECS with AWS Fargate deployment – Scalable, serverless container deployment using Amazon Elastic Container Service (Amazon ECS) with AWS Fargate
Auto scaling configuration – Automatic scaling based on workload demands
Comprehensive examples – Multiple deployment scenarios and configurations
Migration process – Key steps in executing a Kafka migration using MM2
Conclusion
Choosing the right replication solution depends on your specific requirements. We recommend using MSK Replicator when you are replicating from one MSK cluster to another and want a fully managed solution for disaster recovery. MirrorMaker 2 is recommended for migrations to Amazon MSK, hybrid environments, or when you need complex custom replication policies.
For MSK Replicator deployments, refer to Introducing Amazon MSK Replicator – Fully Managed Replication across MSK Clusters in Same or Different AWS Regions and Use replication to increase the resiliency of a Kafka streaming application across Regions.
For MirrorMaker 2 deployments, refer to the GitHub repository and Amazon MSK Migration Workshop to implement production-ready solutions with automated deployment, monitoring, and scaling capabilities.
These approaches provide a customizable set of options for the data redundancy and business continuity capabilities needed to meet regulatory compliance and disaster recovery requirements, while minimizing operational overhead through automation and best practices.
About the Author
Mazrim a number of
Mazrim is a Sr. Specialist Solutions Architect for messaging and streaming workloads. Mazrim works with customers to build and support systems that process and analyze terabytes of streaming data in real time, run enterprise machine learning pipelines, and create systems to share data across teams seamlessly with varying data toolsets and software stacks.