For the overall strategy on how we handle dead-letter queues (DLQs), please refer to DLQ Management & Error Handling in Kafka.
Here is the original plan for implementing the overall DLQ strategy:
Build a common DLQ library to be used by each micro-service:
Design a common library that provides functionality for sending messages to a DLQ and for retrying failed messages automatically.
Integrate the DLQ library with each micro-service to enable seamless handling of failed messages.
Provide a flexible configuration interface to allow developers to customize the behavior of the DLQ library.
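As a rough illustration of how the library's retry-then-DLQ behaviour and its configuration interface could fit together, here is a minimal sketch. It assumes the Kafka producer and the business handler are injected as plain callables, so it runs without a broker. `DlqConfig`, `process_with_retry`, the `.dlq` topic suffix, and the `dlq.*` header names are all hypothetical choices for illustration, not an existing API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DlqConfig:
    """Hypothetical settings a micro-service could override."""
    max_retries: int = 3          # retries after the first failed attempt
    backoff_seconds: float = 0.5  # base delay, doubled after each failed attempt
    dlq_suffix: str = ".dlq"      # assumed naming: DLQ topic = source topic + suffix


def process_with_retry(
    topic: str,
    message: bytes,
    handler: Callable[[bytes], None],
    send_to_dlq: Callable[[str, bytes, Dict[str, str]], None],
    config: DlqConfig = DlqConfig(),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run handler with retries; dead-letter the message if every attempt fails.

    Returns True if the message was processed, False if it was sent to the DLQ.
    """
    last_error = None
    attempts = config.max_retries + 1
    for attempt in range(attempts):
        try:
            handler(message)
            return True
        except Exception as exc:  # any handler failure counts as retryable here
            last_error = exc
            if attempt + 1 < attempts:
                # Exponential backoff: base, 2*base, 4*base, ...
                sleep(config.backoff_seconds * 2 ** attempt)
    # Hypothetical headers that give the DLQ consumer context about the failure.
    headers = {
        "dlq.original.topic": topic,
        "dlq.error": repr(last_error),
        "dlq.attempts": str(attempts),
    }
    send_to_dlq(topic + config.dlq_suffix, message, headers)
    return False


if __name__ == "__main__":
    dead_letters = []

    def always_fails(msg: bytes) -> None:
        raise ValueError("cannot parse payload")

    ok = process_with_retry(
        "orders", b"{}", always_fails,
        lambda t, m, h: dead_letters.append((t, m, h)),
        DlqConfig(max_retries=2, backoff_seconds=0.0),
    )
    print(ok, dead_letters[0][0], dead_letters[0][2]["dlq.attempts"])
```

In a real service, `send_to_dlq` would wrap the service's Kafka producer (converting header values to bytes), and `sleep` is injectable mainly so retries can be tested without waiting. Whether every exception should be retried, or only transient ones, is a policy decision the configuration interface would need to expose.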
Build DLQ topic creation scripts:
Develop scripts to create DLQ topics in the messaging system used by the micro-services.
Ensure that the topics are configured appropriately for DLQ use, including message retention policies, message size limits, and access controls.
Integrate the topic creation scripts into the CI pipeline so that DLQ topics are created automatically.
Set up monitoring for DLQ topics:
Configure monitoring tools, such as Prometheus or Confluent Cloud's native monitoring, to collect metrics for each KPI.
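One way a topic creation script could encode retention, size limits, and access controls is to generate the standard Kafka CLI commands from a per-topic spec, which a CI job could then execute. The sketch below assumes the `kafka-topics.sh` and `kafka-acls.sh` tools are available; `DlqTopicSpec`, `build_commands`, the `.dlq` suffix, the principal name, and all numeric defaults are placeholders, not agreed values.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DlqTopicSpec:
    """Hypothetical per-topic settings; every default is a placeholder."""
    source_topic: str
    partitions: int = 3
    replication_factor: int = 3
    retention_days: int = 14                    # retention policy
    max_message_bytes: int = 1048576            # message size limit (1 MiB)
    reader_principal: str = "User:dlq-handler"  # hypothetical consumer identity
    suffix: str = ".dlq"

    @property
    def name(self) -> str:
        return self.source_topic + self.suffix


def build_commands(spec: DlqTopicSpec, bootstrap: str) -> List[List[str]]:
    """Return argv lists: create the DLQ topic, then grant read access to it."""
    retention_ms = spec.retention_days * 24 * 60 * 60 * 1000
    create = [
        "kafka-topics.sh", "--bootstrap-server", bootstrap,
        "--create", "--if-not-exists", "--topic", spec.name,
        "--partitions", str(spec.partitions),
        "--replication-factor", str(spec.replication_factor),
        "--config", f"retention.ms={retention_ms}",
        "--config", f"max.message.bytes={spec.max_message_bytes}",
    ]
    # Grants Read on the topic only; a consumer also needs consumer-group ACLs.
    acl = [
        "kafka-acls.sh", "--bootstrap-server", bootstrap,
        "--add", "--allow-principal", spec.reader_principal,
        "--operation", "Read", "--topic", spec.name,
    ]
    return [create, acl]


if __name__ == "__main__":
    # Print the commands; a CI job could run them with subprocess.run(cmd, check=True).
    for source in ["orders", "payments"]:
        for cmd in build_commands(DlqTopicSpec(source), "localhost:9092"):
            print(shlex.join(cmd))
```

Using `--if-not-exists` keeps the script idempotent, so it can run on every CI build without failing on topics that already exist. Changing the config of an existing topic would need a separate `kafka-configs.sh --alter` step, which this sketch does not cover.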
Create alerting rules that trigger notifications to the appropriate team members when new DLQ messages are sent.
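The alerting condition "new DLQ messages were sent" can be expressed as an increase in a DLQ topic's end offsets between two observations; in Prometheus this would typically be an alerting rule on the increase of whatever offset metric the exporter provides. The sketch below shows the same logic in plain code. It assumes end offsets per `(topic, partition)` are supplied by the caller from the chosen monitoring source, and `DlqAlerter` and the `notify` callback are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple

# (topic, partition) -> end offset (offset of the next message to be written)
Offsets = Dict[Tuple[str, int], int]


class DlqAlerter:
    """Notify when a DLQ topic's end offsets grow between observations."""

    def __init__(self, notify: Callable[[str, int], None]) -> None:
        self._notify = notify
        self._last: Optional[Offsets] = None

    def observe(self, current: Offsets) -> Dict[str, int]:
        """Return new-message counts per topic and notify for each one."""
        if self._last is None:
            # First observation only establishes a baseline; nothing to compare.
            self._last = dict(current)
            return {}
        counts: Dict[str, int] = {}
        for (topic, partition), end in current.items():
            # A partition not seen before starts at offset 0.
            delta = end - self._last.get((topic, partition), 0)
            if delta > 0:  # ignore decreases (e.g. a topic was recreated)
                counts[topic] = counts.get(topic, 0) + delta
        for topic, new_messages in counts.items():
            self._notify(topic, new_messages)
        self._last = dict(current)
        return counts


if __name__ == "__main__":
    alerter = DlqAlerter(lambda t, n: print(f"ALERT: {n} new message(s) in {t}"))
    alerter.observe({("orders.dlq", 0): 5})
    alerter.observe({("orders.dlq", 0): 7, ("orders.dlq", 1): 1})
```

End offsets are used rather than message counts because retention-based deletion does not lower them, so expired messages never mask newly arrived ones.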
Set up the DLQ message handling process