Cluster Linking on Confluent Cloud

How it Works

  • Cluster Linking allows one Confluent cluster to mirror data directly from another. You can establish a cluster link between a source cluster and a destination cluster in a different region, cloud, line of business, or organization. You choose which topics to replicate from the source cluster to the destination. You can even mirror consumer offsets and ACLs, making it straightforward to move Kafka consumers from one cluster to another.
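The offset- and ACL-mirroring behavior described above is controlled by options on the cluster link itself. As an illustrative sketch (option names taken from the Cluster Linking configuration reference; verify them against the current documentation), a link configuration file might look like:

```
# Mirror consumer group offsets from the source cluster
consumer.offset.sync.enable=true
# Mirror ACLs from the source cluster to the destination
acl.sync.enable=true
# Automatically create mirror topics on the destination for source topics
auto.create.mirror.topics.enable=true
```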

What is Cluster Linking?

  • Cluster Linking on Confluent Cloud is a fully-managed service for moving data from one Confluent cluster to another. Programmatically, it creates perfect copies of your topics and keeps data in sync across clusters. Cluster Linking is a powerful geo-replication technology for:

    • Multi-cloud and global architectures powered by real-time data in motion

    • Data sharing between different teams, lines of business, or organizations

    • High Availability (HA)/Disaster Recovery (DR) during an outage of a cloud provider’s region

    • Data and workload migration from an Apache Kafka® cluster or a Confluent Platform cluster to Confluent Cloud

    • Protection of Tier 1, customer-facing applications and workloads from disruption, by creating a read-replica cluster for lower-priority applications and workloads

    • Hybrid cloud architectures that supply real-time data to applications across on-premises datacenters and the cloud

    • Syncing data between production environments and staging or development environments

  • Cluster Linking is fully-managed in Confluent Cloud, so you don’t need to manage or tune data flows. Its usage-based pricing puts multi-cloud and multi-region costs into your control. Cluster Linking reduces operational burden and cloud egress fees while improving the performance and reliability of your cloud data pipelines.

Use Cases

Confluent provides multi-cloud, multi-region, and hybrid capabilities in Confluent Cloud. Many of these are demonstrated in the Tutorials, as well as in tutorials specific to each use case.

  • Global and Multi-Cloud Replication: Move and aggregate real-time data across regions and clouds. By making geo-local reads of real-time data possible, this can act like a content delivery network (CDN) for your Kafka events throughout the public cloud, private cloud, and at the edge.

  • Data Sharing - Share data in real-time with other teams, lines-of-business, or organizations.

  • Data Migration - Migrate data and workloads from one cluster to another.

  • Disaster Recovery and High Availability - Create a disaster recovery cluster, and fail over to it during an outage.

  • Tiered Separation of Critical Applications - Protect Tier 1, customer-facing applications and workloads from disruption by creating a read-replica cluster for lower-priority applications and workloads.

Cluster Linking mirroring throughput (the bandwidth used to read data or write data to your cluster) is counted against your Limits per CKU.
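Because mirroring throughput counts against the destination cluster's per-CKU limits, it can be worth sanity-checking that client traffic plus mirroring traffic fits within the cluster's capacity before creating a link. A minimal sketch follows; the limit value used here is a hypothetical placeholder, so look up the actual Limits per CKU for your cluster type:

```python
# Sanity-check that expected mirroring throughput fits within a cluster's
# per-CKU ingress limit. The limit below is hypothetical; consult the
# documented Limits per CKU for your cluster type.

def fits_ingress_limit(mirror_mbps: float, client_mbps: float,
                       ckus: int, ingress_limit_mbps_per_cku: float) -> bool:
    """Return True if combined client and mirroring writes stay under the limit."""
    return mirror_mbps + client_mbps <= ckus * ingress_limit_mbps_per_cku

# Example: 4 CKUs at a hypothetical 50 MBps ingress each (200 MBps total)
print(fits_ingress_limit(mirror_mbps=60, client_mbps=100,
                         ckus=4, ingress_limit_mbps_per_cku=50))  # True: 160 <= 200
```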

Supported Cluster Types

  • A cluster link sends data from a “source cluster” to a “destination cluster”. The supported cluster types are shown in the table below.

  • Unsupported cluster types and other limits are described in Limitations.

Source Cluster Options → Destination Cluster Options

  • Any Dedicated Confluent Cloud cluster → Any Dedicated Confluent Cloud cluster

  • Dedicated Confluent Cloud cluster with private networking → Dedicated Confluent Cloud cluster under certain networking circumstances, see Cluster Linking with Private Networking on Confluent Cloud

  • Apache Kafka® 2.4+ or Confluent Platform 5.4+ with public internet IP addresses on all brokers → Any Dedicated Confluent Cloud cluster

  • Confluent Platform 7.1+ (even behind a firewall) → Any Dedicated Confluent Cloud cluster under certain networking circumstances, see Cluster Linking with Private Networking on Confluent Cloud

  • Any Dedicated Confluent Cloud cluster under certain networking circumstances, see Cluster Linking with Private Networking on Confluent Cloud → Confluent Platform 7.0+ (even behind a firewall)

How to Check the Cluster Type

Pricing

  • Confluent Cloud clusters that use Cluster Linking are charged based on the number of cluster links and the volume of mirroring throughput to or from the cluster.

  • For a detailed breakdown of how Cluster Linking is billed, including guidelines for using metrics to track your costs, see Cluster Linking in Confluent Cloud Billing.

  • More general information about Confluent Cloud prices is available on the Confluent Cloud pricing page.
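Since billing is driven by the number of cluster links and the volume of mirrored data, a rough cost estimate is a simple product-and-sum. The sketch below uses placeholder rates that are NOT Confluent's actual prices; substitute the current rates from the Confluent Cloud pricing page:

```python
# Back-of-the-envelope Cluster Linking cost estimate. Billing is based on
# cluster links (per link-hour) and mirrored throughput (per GB). The rates
# passed in below are placeholders, not Confluent's actual prices.

def estimate_link_cost(links: int, hours: float, gb_mirrored: float,
                       rate_per_link_hour: float, rate_per_gb: float) -> float:
    """Total cost = link-hours charge + mirrored-data charge."""
    return links * hours * rate_per_link_hour + gb_mirrored * rate_per_gb

# Two links for a 730-hour month, mirroring 500 GB, at placeholder rates
print(round(estimate_link_cost(2, 730, 500,
                               rate_per_link_hour=0.01, rate_per_gb=0.03), 2))
```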


Tutorials

To get started, try one or more of the tutorials; each maps to a use case.

Mirror Topics

Read-only mirror topics, which reflect the data in their original (source) topics, are the building blocks of Cluster Linking. For a deep dive on this specialized type of topic and how it works, see Mirror Topics.

Commands and Prerequisites

  • For the CLI-based approach, see the Cluster Linking commands and prerequisites documentation page.
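Cluster links can also be created with the Confluent CLI (`confluent kafka link create`) or through the Kafka REST v3 API. As a sketch, the REST request for creating a link on the destination cluster might be shaped as follows; the path and field names here are assumptions based on the v3 API and should be verified against the current Cluster Linking REST reference, and all IDs and addresses are placeholders:

```python
# Sketch of a Kafka REST v3 request for creating a cluster link on the
# destination cluster. Endpoint shape and field names are assumptions to
# verify against the current API reference; IDs below are placeholders.
import json

dest_cluster_id = "lkc-destination"   # placeholder destination cluster ID
path = f"/kafka/v3/clusters/{dest_cluster_id}/links?link_name=my-link"

body = {
    "source_cluster_id": "lkc-source",   # placeholder source cluster ID
    "configs": [
        # The link needs the source cluster's bootstrap servers
        {"name": "bootstrap.servers", "value": "SOURCE_BOOTSTRAP:9092"},
        {"name": "consumer.offset.sync.enable", "value": "true"},
    ],
}
print(path)
print(json.dumps(body, indent=2))
```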

Limitations

  • This section details support and known limitations in terms of cluster types, cluster management, and performance.

    • Cluster Types and Networking

      • Currently supported cluster types are described in Supported Cluster Types.

      • A given cluster can only be the destination for five cluster links. Cluster Linking does not currently support aggregating data from more than five sources.

      • Cluster links between two Transit Gateway clusters in different regions or different Confluent Cloud networks must be created with the Confluent CLI, REST API, or Terraform; they cannot yet be created in the Confluent Cloud Console.

    • Security

      • OAuth is not supported for the Cluster Linking credential on Confluent Cloud clusters.

      • To learn more, see information about the Security Model for Cluster Linking.

    • ACL Syncing

    • Management Limitations

    • Performance Limits

      • Throughput

        For Cluster Linking, throughput indicates bytes-per-second of data replication. The following performance factors and limitations apply.

        • Cluster Linking throughput (bytes-per-second of data replication) counts toward the destination cluster’s produce limits (also known as “ingress” or “write” limits). However, production from Kafka clients is prioritized over Cluster Linking writes, so the two are exposed as separate metrics in the Metrics API: Kafka client writes are reported as received_bytes and Cluster Linking writes as cluster_link_destination_response_bytes.

        • Cluster Linking consumes from the source cluster much like a Kafka consumer: its throughput (bytes-per-second of data replication) is treated the same as consumer throughput and contributes to any quotas and hard or soft limits on your source cluster. Kafka client reads and Cluster Linking reads are therefore included in the same metric in the Metrics API: sent_bytes.

        • Cluster Linking can max out the throughput of your CKUs. The physical distance between the clusters is a factor in Cluster Linking performance. Confluent monitors cluster links and optimizes their performance. Unlike Replicator and Kafka MirrorMaker 2, Cluster Linking has no separate scaling unit (such as tasks), so you do not need to scale your cluster links up or down to increase performance.

      • Connections

        • Cluster Linking connections count towards any connection limits on your clusters.

      • Request rate

        • Cluster Linking contributes requests which count towards your source cluster’s request rate limits.
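The metrics named in the performance limits above (cluster_link_destination_response_bytes, received_bytes, sent_bytes) can be queried through the Confluent Cloud Metrics API. Below is a minimal sketch of such a query payload; the field names (resource.kafka.id, the io.confluent.kafka.server/ metric prefix) are assumptions to verify against the Metrics API reference, and the cluster ID and interval are placeholders:

```python
# Sketch of a Confluent Cloud Metrics API (v2) query body that separates
# Cluster Linking writes from Kafka client writes on a destination cluster.
# Field and metric names are assumptions to verify against the reference.
import json

def metrics_query(metric: str, cluster_id: str, interval: str) -> dict:
    """Build a query body for one metric on one Kafka cluster."""
    return {
        "aggregations": [{"metric": metric}],
        "filter": {"field": "resource.kafka.id", "op": "EQ", "value": cluster_id},
        "granularity": "PT1M",
        "intervals": [interval],
    }

interval = "2024-01-01T00:00:00Z/2024-01-01T01:00:00Z"  # placeholder window
link_writes = metrics_query(
    "io.confluent.kafka.server/cluster_link_destination_response_bytes",
    "lkc-destination", interval)
client_writes = metrics_query(
    "io.confluent.kafka.server/received_bytes", "lkc-destination", interval)
print(json.dumps(link_writes, indent=2))
```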

Related documentation from Confluent Cloud:

Backup and Restore Google Cloud Storage Source Connector for Confluent Platform

The Kafka Connect Backup and Restore Google Cloud Storage (GCS) Source connector reads data exported to GCS by the Kafka Connect GCS Sink connector and publishes it back to a Kafka topic. Depending on the format and partitioner used to write the data to GCS, this connector can write to the destination topic using the same partitions as the original messages exported to GCS and maintain the same message order. The connector selects folders based on the partitioner configuration and reads each folder’s GCS objects in alphabetical order. Each record is read based on the selected format. The configuration is designed to mirror that of the Kafka Connect GCS Sink connector, so it should be possible to create source connector configurations with only minor changes to the original sink configuration.
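Because the source connector mirrors the sink connector's configuration, a minimal config might look like the following sketch. The property values (bucket name, credentials path, format class) are placeholders, and the exact property names should be checked against the connector's configuration reference:

```
{
  "name": "gcs-backup-source",
  "config": {
    "connector.class": "io.confluent.connect.gcs.GcsSourceConnector",
    "gcs.bucket.name": "my-backup-bucket",
    "gcs.credentials.path": "/path/to/credentials.json",
    "format.class": "io.confluent.connect.gcs.format.avro.AvroFormat",
    "tasks.max": "1"
  }
}
```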

  • The recommended practice is to create topics manually in the destination Kafka cluster, with the correct number of partitions, before running the source connector. If the topics do not exist, Connect relies on auto topic creation for source connectors, and the number of partitions is based on the Kafka broker defaults. If the destination cluster has more partitions than the exported data references, the extra partitions are not used. If it has fewer, the connector task throws an exception and stops as soon as it tries to write to a Kafka partition that does not exist.
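The pre-creation advice above can be sketched as a simple pre-flight check: before starting the source connector, confirm that each destination topic exists and has at least as many partitions as the exported data references. The helper and the partition counts below are illustrative; in practice the counts would come from your admin tooling:

```python
# Pre-flight check for the Backup and Restore source connector: verify
# destination topics exist with enough partitions. check_partitions is an
# illustrative helper, not part of the connector; counts are examples.

def check_partitions(expected: dict, destination: dict) -> list:
    """Return a list of problems; an empty list means writes are safe."""
    problems = []
    for topic, needed in expected.items():
        have = destination.get(topic)
        if have is None:
            problems.append(f"{topic}: missing (auto-creation uses broker defaults)")
        elif have < needed:
            problems.append(f"{topic}: has {have} partitions, data references {needed}")
    return problems

# "orders" was exported with 6 partitions but the destination only has 3
print(check_partitions({"orders": 6}, {"orders": 3}))
```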
