For the overall strategy on how we handle DLQs, please refer to DLQ Management & Error Handling in Kafka.

Here is the original plan for implementing the overall DLQ strategy:

  1. Build DLQ common lib to be used by each micro-service:

    • Design a common library that provides functionality for sending messages to a DLQ and performing automatic retries.

    • Integrate the DLQ library with each micro-service to enable seamless handling of failed messages.

    • Provide a flexible configuration interface to allow developers to customize the behavior of the DLQ library.

  2. Build DLQ topic creation scripts:

    • Develop scripts to create DLQ topics in the messaging system used by the micro-services.

    • Ensure that the topics are configured appropriately for DLQ functionality, such as message retention policies, message size limits, and access controls.

    • Integrate the topic creation scripts into the CI pipeline to ensure that DLQ topics are created automatically.

  3. Set up monitoring for DLQ topics:

    • Configure monitoring tools, such as Prometheus or Confluent Cloud native monitoring, to collect metrics for each KPI.

    • Create alerting rules that trigger notifications to the appropriate team members when new DLQ messages are sent.

  4. Set up the DLQ message handling process
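
As an illustration of step 1, the retry-then-dead-letter behaviour of the common library might be sketched as follows. This is only a sketch under assumptions: `DlqConfig`, `process_with_dlq`, the header names, and the exponential backoff policy are hypothetical and not taken from the actual library. The `send` callable stands in for whatever Kafka producer wrapper each micro-service uses.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class DlqConfig:
    """Hypothetical per-service settings; field names are illustrative only."""
    dlq_topic: str
    max_retries: int = 3          # retries after the first attempt
    backoff_seconds: float = 1.0  # base delay, doubled on each retry


def process_with_dlq(
    message: bytes,
    handler: Callable[[bytes], None],
    send: Callable[[str, bytes, Dict[str, str]], None],
    config: DlqConfig,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run the handler, retry on failure, then forward the message to the DLQ.

    Returns True if the handler succeeded, False if the message was dead-lettered.
    """
    last_error: Optional[Exception] = None
    attempts = config.max_retries + 1
    for attempt in range(attempts):
        try:
            handler(message)
            return True
        except Exception as exc:  # a real library would likely filter retryable errors
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff between attempts (assumed policy).
                sleep(config.backoff_seconds * (2 ** attempt))
    # All attempts failed: attach failure context and hand off to the DLQ topic.
    headers = {"error": repr(last_error), "attempts": str(attempts)}
    send(config.dlq_topic, message, headers)
    return False
```

Passing `send` and `sleep` in as parameters keeps the sketch independent of any particular Kafka client and makes the retry logic easy to unit test.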
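
For step 2, a topic creation script could be as small as a helper that assembles a `kafka-topics.sh` invocation with the retention and message size settings. The sketch below assumes a self-managed Kafka cluster with the stock CLI available. The `.dlq` suffix, the default values, and the function names are placeholders. On Confluent Cloud the equivalent would go through its own CLI or API, and access controls are not covered here.

```python
import shlex
from typing import Dict, List


def dlq_topic_name(source_topic: str, suffix: str = ".dlq") -> str:
    """Derive the DLQ topic name; the suffix convention is an assumption."""
    return f"{source_topic}{suffix}"


def build_create_command(
    source_topic: str,
    bootstrap_server: str,
    partitions: int = 1,
    replication_factor: int = 3,
    retention_ms: int = 14 * 24 * 60 * 60 * 1000,  # 14 days, placeholder value
    max_message_bytes: int = 1_048_576,            # 1 MiB, placeholder value
) -> List[str]:
    """Build the argument list for creating a DLQ topic with kafka-topics.sh."""
    configs: Dict[str, int] = {
        "retention.ms": retention_ms,
        "max.message.bytes": max_message_bytes,
    }
    cmd = [
        "kafka-topics.sh",
        "--bootstrap-server", bootstrap_server,
        "--create",
        "--if-not-exists",  # makes the script safe to re-run from CI
        "--topic", dlq_topic_name(source_topic),
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]
    for key, value in configs.items():
        cmd += ["--config", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_create_command("orders", "localhost:9092")))
```

A CI job could pass the returned list to `subprocess.run` once per source topic.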
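
For step 3, the alert condition "new DLQ messages were sent" can be expressed as growth in a DLQ topic's end offsets between two polls. In practice this would more likely be an alerting rule in Prometheus or Confluent Cloud. The Python below only illustrates the logic. The function names and the threshold are assumptions, and end-offset growth is only an approximation of the message count.

```python
from typing import Dict


def new_dlq_messages(previous: Dict[int, int], current: Dict[int, int]) -> Dict[int, int]:
    """Return the per-partition number of messages added since the previous poll.

    Both arguments map partition id to end offset. A partition missing from
    `previous` is assumed to have started at offset 0.
    """
    growth: Dict[int, int] = {}
    for partition, end_offset in current.items():
        delta = end_offset - previous.get(partition, 0)
        if delta > 0:
            growth[partition] = delta
    return growth


def should_alert(previous: Dict[int, int], current: Dict[int, int], threshold: int = 1) -> bool:
    """Alert when at least `threshold` new messages arrived across all partitions."""
    return sum(new_dlq_messages(previous, current).values()) >= threshold
```

With the default threshold of 1, any new DLQ message triggers a notification, which matches the alerting rule described in step 3.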