Confluent Cloud Migration Checklist

The migration of a Kafka cluster is a complex process. This checklist is a guide for migrating into Confluent Cloud and a starting point for discussions with development and operations teams.

Confluent Cloud Provisioning

Before migration, you need a Confluent Cloud instance to migrate into. Provisioning should be done with careful planning, ensuring access and networking are configured correctly.

  • Cluster Type
  • Networking
  • Provisioning
  • Connectors
    • Managed - Which will be managed by Confluent Cloud?
    • Self Hosted - Which need to be self-hosted?
      • Where will self-hosted connectors be deployed?
      • How will it be secured?
  • Schema Registry
    • How many schemas do you anticipate having?
    • Understand logical (soft) and physical (hard) schema deletes.
  • Developer Access
    • Confluent Cloud Control plane and data plane are separate.
    • Will the network of developer machines have access?
  • Environments
    • Number of environments
      • Naming standards; sharing Confluent Cloud environments across teams could make migration harder.
    • Sharing (replicating) data between environments
      • Do you replicate production data to staging for testing?
      • PCI/PII concerns?
  • Network Validation
    • Can you produce to Confluent Cloud from the network of your source cluster?
    • Can you consume from the source cluster from the network you will be using for your Confluent Cloud applications?
  • Monitoring
    • How will you be monitoring your cluster?
    • Metric Understanding
      • Unlike typical Apache Kafka metrics collection, many metrics are deltas (“since the previous data point”) sampled per minute, not per second.
        • From the documentation: “Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.”
        • Metrics should be reviewed and understood; see the rate-conversion sketch after this list.
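
A quick sketch of converting those per-minute delta samples into the per-second rates most dashboards expect; the sample value is illustrative:

    // Convert a Confluent Cloud delta metric (bytes since the previous data
    // point, sampled every 60 seconds per the documentation) to bytes/sec.
    public final class MetricRate {
        private static final double SAMPLE_INTERVAL_SECONDS = 60.0;

        static double toBytesPerSecond(double bytesSincePreviousSample) {
            return bytesSincePreviousSample / SAMPLE_INTERVAL_SECONDS;
        }

        public static void main(String[] args) {
            double sample = 12_582_912.0; // illustrative: 12 MiB in one minute
            System.out.printf("%.0f bytes/sec%n", toBytesPerSecond(sample));
        }
    }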

Client Migration

Prior to migrating data, a plan for client migration is also required. Apache Kafka provides a rich set of security options, but not all Kafka-API SaaS services support all of them, so there is likely a need to change how your client applications connect between instances. Ensure each application instance can connect to either cluster with appropriate credentials; this flexibility makes migration smoother.

  • Will the clients’ security.protocol and/or sasl.mechanism need to change? (see the configuration sketch after this list)
    • Migration from mTLS
    • Migration from IAM (MSK)
    • SASL mechanism change
  • How will security configuration (credentials, certificates) be provisioned to clients?
  • Serdes Migration
    • Migration from AWS Glue Schema Registry requires migrating schemas as well as changing the message wire format.
  • Schema Registry Migration
  • Consumer group.id: reuse or rename?
  • Kafka Streams application.id: reuse or rename?
  • Smoke Test
    • How do you deploy a consumer in both environments without impacting downstream systems?
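
Externalizing the security settings lets one build connect to either cluster. A minimal sketch, assuming a hypothetical properties file per cluster (cc.properties, source.properties) supplied at deploy time; key and secret values are placeholders:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // Load bootstrap.servers, security.protocol, sasl.*, ssl.* (and group.id)
    // from an external file instead of hard-coding them, so the same binary
    // can point at the source cluster (e.g., mTLS) or Confluent Cloud.
    //
    // Example cc.properties (placeholders):
    //   bootstrap.servers=<broker>.confluent.cloud:9092
    //   security.protocol=SASL_SSL
    //   sasl.mechanism=PLAIN
    //   sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule \
    //       required username="<api-key>" password="<api-secret>";
    public final class ConfigurableClient {
        public static KafkaConsumer<String, String> consumerFor(String configPath) throws IOException {
            Properties props = new Properties();
            try (FileInputStream in = new FileInputStream(configPath)) {
                props.load(in);
            }
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            return new KafkaConsumer<>(props);
        }
    }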

Replication

Replication Tooling

  • Cluster Linking
    • If you are migrating from Confluent Platform or another Confluent Cloud instance, use Cluster Linking.
    • Moving from one Confluent instance to another also means there is no serde concern, making things easier.
    • Schema Registry migration still has complexities and needs to be addressed.
    • For the sake of this migration checklist, it is assumed that Cluster Linking is not an available option.
  • Replicator
    • How to handle schema migration?
    • How to handle offset cutover without loss of messages?
  • Mirror Maker 2
    • How to handle schema migration?
    • How to handle offset cutover without loss of messages?
  • Change Data Capture (CDC)
    • Instead of replicating data from one Kafka cluster to another, replicate it from the source system itself, for example via CDC sources.
    • schema migration is handled in the configuration of the CDC tool
  • Custom Applications
    • Consume from one cluster and write to another with full control over any serde changes or other data conversions necessary; see the sketch after this list.
    • schema migration is handled in the application code
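
A minimal sketch of the custom-application option: consume raw bytes from the source cluster and produce them to the destination, with serde conversion happening between poll() and send(); client construction and error handling are omitted:

    import java.time.Duration;
    import java.util.List;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Copy loop: read byte[] records from the source cluster and write them
    // to the destination. Any serde change (e.g., rewriting schema IDs)
    // would happen between poll() and send().
    public final class CustomReplicator {
        public static void replicate(KafkaConsumer<byte[], byte[]> source,
                                     KafkaProducer<byte[], byte[]> destination,
                                     String topic) {
            source.subscribe(List.of(topic));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = source.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> r : records) {
                    // Preserve key and value bytes; convert here if serdes differ.
                    destination.send(new ProducerRecord<>(topic, r.key(), r.value()));
                }
                source.commitSync(); // at-least-once: downstream must tolerate duplicates
            }
        }
    }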

Replication Client Security

The system replicating the topics needs authentication and authorization to both clusters; see the ACL sketch after the list below.

  • source cluster credentials
  • source cluster authorization
  • destination cluster credentials
  • destination cluster authorization
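
For the authorization items, ACLs can be granted programmatically with the Kafka Admin API; a sketch granting a hypothetical "replicator" principal READ on the source topic and WRITE on the destination topic (principal and topic names are placeholders, and the replicator also needs group and describe permissions not shown here):

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.common.acl.AccessControlEntry;
    import org.apache.kafka.common.acl.AclBinding;
    import org.apache.kafka.common.acl.AclOperation;
    import org.apache.kafka.common.acl.AclPermissionType;
    import org.apache.kafka.common.resource.PatternType;
    import org.apache.kafka.common.resource.ResourcePattern;
    import org.apache.kafka.common.resource.ResourceType;

    // Run grant(...) with AclOperation.READ against the source cluster's admin
    // endpoint and AclOperation.WRITE against the destination's.
    public final class ReplicatorAcls {
        static AclBinding binding(String topic, AclOperation op) {
            return new AclBinding(
                    new ResourcePattern(ResourceType.TOPIC, topic, PatternType.LITERAL),
                    new AccessControlEntry("User:replicator", "*", op, AclPermissionType.ALLOW));
        }

        public static void grant(Properties adminProps, String topic, AclOperation op) throws Exception {
            try (Admin admin = Admin.create(adminProps)) {
                admin.createAcls(List.of(binding(topic, op))).all().get();
            }
        }
    }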

Replication Infrastructure

How is the replication code going to be deployed?

  • Existing Connect cluster
  • New Connect cluster
  • Standalone deployment
  • Custom code

Inventory

Inventory the Kafka resources to be migrated and group them into units that can be migrated together. The goal is to minimize the complexity of any single migration and to avoid big-bang migrations, which can be challenging to roll back.

  • For each Business Context
    • Topics
    • Schemas
      • Will schemas be created by the producer’s serde, or through CI/CD (REST API)? (see the registration sketch after this list)
    • Consumer Groups
    • ACLs
    • Producers
    • Consumers
    • Connectors
      • Managed
      • Self Hosted
    • Stream Processing
      • Kafka Streams
      • Flink
      • KSQL
      • Other
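
If schemas are created through CI/CD rather than by the producer’s serde, a pipeline step can register them explicitly (with auto.register.schemas=false on producers). A sketch using the Confluent Schema Registry Java client; the URL, subject, and schema are placeholders, and Confluent Cloud basic-auth settings are omitted:

    import io.confluent.kafka.schemaregistry.avro.AvroSchema;
    import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
    import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

    // Register a schema from a CI/CD job so schema changes are explicit and
    // reviewable rather than a side effect of the first produce call.
    public final class RegisterSchema {
        public static void main(String[] args) throws Exception {
            SchemaRegistryClient client =
                    new CachedSchemaRegistryClient("https://<schema-registry-endpoint>", 100);
            AvroSchema schema = new AvroSchema(
                    "{\"type\":\"record\",\"name\":\"Order\",\"fields\":"
                    + "[{\"name\":\"id\",\"type\":\"string\"}]}");
            int id = client.register("orders-value", schema); // subject is a placeholder
            System.out.println("registered schema id " + id);
        }
    }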

Data Migration

Data migration is typically done by a specific business context or domain. Orchestration depends on the complexity of your streaming platform.

Key considerations

  • idempotency
    • if end systems are not idempotent, how will deduplication be handled? (see the sketch after this list)
  • compacted topics (kTables)
    • migration of the data vs. the application recreating it
    • each application’s use of compacted topics is different; understand the business domain of each compacted topic.
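
If end systems are not idempotent, one option is a consumer-side filter keyed on a stable business identifier; a minimal in-memory sketch, assuming records carry a hypothetical unique eventId (a durable store is safer in production):

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    // Rolling window of recently seen event IDs. During migration, replication
    // overlap and offset resets can deliver the same event twice; a
    // non-idempotent sink needs a filter like this in front of it.
    public final class Deduplicator {
        private final Set<String> seen;

        public Deduplicator(int maxEntries) {
            this.seen = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > maxEntries; // evict the oldest ID once full
                }
            });
        }

        /** Returns true the first time an eventId is seen, false for duplicates. */
        public boolean firstTime(String eventId) {
            return seen.add(eventId);
        }
    }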

Steps

These steps are a starting point. Once a set of topics, applications, and consumer groups is identified as a unit to be migrated, review and adjust accordingly.

  • Replicate Source Topics
  • Migrate Consumer Groups
    • evaluate offset migration
      • start at earliest, start at latest, set offsets by timestamp (see the sketch after this list), etc.
  • Migrate Consumers
  • Migrate Sink Connectors
  • Migrate Streams
    • while streams applications are both consumers and producers, they tend to consume first and then produce, so they are treated as consumer applications for migration; additional analysis is still required.
  • Migrate Source Connectors
  • Disable Original Producers
  • Disable Original Source Connectors
  • Disable Replication
  • Migrate Producers
    • can rollback be achieved by resetting producers?
    • if not, consider dual-producing to both clusters (hence why replication is disabled first)
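
For the offset-by-timestamp option, the consumer API can look up the destination offsets corresponding to a source-side cutover time and commit them for the migrated group; a sketch (the consumer is assumed to be configured with the migrated group.id against the destination cluster):

    import java.time.Instant;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
    import org.apache.kafka.common.TopicPartition;

    // Offsets differ between clusters after replication, so timestamps are the
    // bridge: find the first destination offset at/after the cutover time per
    // partition and commit it, letting consumers resume near where they stopped.
    public final class OffsetByTimestamp {
        public static void seed(KafkaConsumer<byte[], byte[]> consumer,
                                List<TopicPartition> partitions, Instant cutover) {
            Map<TopicPartition, Long> query = new HashMap<>();
            partitions.forEach(tp -> query.put(tp, cutover.toEpochMilli()));

            Map<TopicPartition, OffsetAndTimestamp> found = consumer.offsetsForTimes(query);

            Map<TopicPartition, OffsetAndMetadata> commits = new HashMap<>();
            found.forEach((tp, oat) -> {
                if (oat != null) { // null when no record exists at/after the timestamp
                    commits.put(tp, new OffsetAndMetadata(oat.offset()));
                }
            });
            consumer.assign(partitions);
            consumer.commitSync(commits);
        }
    }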

Migration “Gotchas”

Some additional considerations and trouble scenarios that can arise during a migration.

  • Remove Application Cluster Connection Assumptions, if possible.
    • Kafka clients can be configured dynamically, allowing authentication to be changed through configuration; ensure your development team hasn’t made configuration assumptions.
    • For example, an application assumes mTLS authentication and fails if a keystore is not provided, but you now want to connect to a cluster with SASL.
  • Timestamp Preservation
    • If a topic in the source cluster uses LogAppendTime and the destination topic is also configured with LogAppendTime, timestamps will change when records are replicated. Consider using producer (CreateTime) timestamps on the destination until replication is complete, then switch to LogAppendTime once producers are migrated to the new cluster.
  • Schema Preservation
    • Before Schema Registry 7.0, schema IDs were assigned sequentially as schemas were registered, so the destination cluster generally would not have the same IDs. This makes migration of serialized data between clusters quite challenging.
    • Leverage schema contexts and the schema import functionality to preserve IDs.
    • Consider byte-for-byte copy implications when it comes to schema migration.
    • Compacted topics can retain old data (and old schema IDs).
  • Kafka Streams Migration
    • state stores (changelog topics)
      • how to hydrate changelog topics
      • will windowed state stores need to be migrated?
  • CDC Migration
    • cutover
  • retention.ms and old data.
    • example:
      • source topic has 7-day retention
      • intermediate stream topics have 5-day retention
      • replication of data starting from 7 days ago could lead to records being dropped during stream processing, since records older than 5 days already exceed the intermediate topics’ retention.
  • increasing partitions at time of migration
    • replication methods cannot preserve each message’s original partition number when the partition count changes
    • could lead to out of order events
      • Kafka Streams processing with time semantics and grace periods can help with this.
    • verifying hashing algorithm (see below)
  • key hashing
    • kafka-clients (Java) defaults to the murmur2 hashing algorithm.
    • librdkafka defaults to a CRC32-based hashing algorithm.
    • when using migration tools on topics where partitioning is being preserved, force partition matching.
    • otherwise, verify hashing algorithms to avoid the replication tool using a different algorithm than your applications; see the sketch below.
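
To verify hashing end to end, recompute the default Java-client partition for a sample of keys and compare it with where replicated records actually landed; a sketch using the murmur2 logic from kafka-clients (key and partition count are illustrative):

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.common.utils.Utils;

    // The Java default partitioner maps a keyed record to
    // toPositive(murmur2(serializedKey)) % numPartitions. librdkafka's default
    // partitioner is CRC32-based, so mixed-language producers can disagree.
    public final class PartitionCheck {
        static int defaultJavaPartition(String key, int numPartitions) {
            byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }

        public static void main(String[] args) {
            System.out.println(defaultJavaPartition("order-42", 12)); // prints the partition index
        }
    }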