What is distributed tracing?
Micro-services are a typical distributed system and bring all the benefits of a distributed design. However, troubleshooting a request can be challenging, because its journey may involve a sequence of calls across multiple services.
Distributed tracing is a method of tracking application requests as they flow from front-end devices to back-end micro-services (including databases, message queues, etc.). We can use distributed tracing to troubleshoot requests that exhibit high latency or errors and to pinpoint failures or performance bottlenecks that occurred along the way.
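To make the idea concrete, here is a minimal sketch of how a trace can be modeled as a set of spans linked by parent IDs, and how that structure helps locate a bottleneck. The SpanRecord type, the service names, and the durations are hypothetical and much simpler than real OpenTelemetry spans. The self-time calculation assumes that child spans run sequentially.

```kotlin
// Hypothetical, simplified model of a span; real OpenTelemetry spans carry more fields.
data class SpanRecord(
    val spanId: String,
    val parentSpanId: String?, // null for the root span
    val service: String,
    val name: String,
    val durationMs: Long,
)

// Time spent in the span itself, excluding its direct children.
// Assumes children run one after another; clamped at zero otherwise.
fun selfTimeMs(span: SpanRecord, all: List<SpanRecord>): Long {
    val childrenMs = all.filter { it.parentSpanId == span.spanId }.sumOf { it.durationMs }
    return (span.durationMs - childrenMs).coerceAtLeast(0)
}

// The span with the largest self time is a first candidate for the bottleneck.
fun findBottleneck(trace: List<SpanRecord>): SpanRecord? =
    trace.maxByOrNull { selfTimeMs(it, trace) }

fun main() {
    // Made-up trace: one front-end request fanning out to two downstream calls.
    val trace = listOf(
        SpanRecord("a", null, "frontend", "POST /transfer", 480),
        SpanRecord("b", "a", "transfer-service", "createTransfer", 450),
        SpanRecord("c", "b", "account-service", "checkBalance", 40),
        SpanRecord("d", "b", "postgres", "INSERT transfer", 380),
    )
    val slowest = findBottleneck(trace)
    println("Bottleneck: ${slowest?.service} / ${slowest?.name}")
}
```

In practice the tracing backend performs this kind of analysis for us; the sketch only illustrates why parent/child links between spans matter.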
Overview of OpenTelemetry
OpenTelemetry is an open-source observability framework. It offers vendor-neutral APIs and software development kits (SDKs) that you can use to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you understand a distributed system's performance and behavior.
You can read more here:
https://opentelemetry.io/docs/concepts/what-is-opentelemetry/
https://www.dynatrace.com/news/blog/what-is-opentelemetry-2/
Architecture of our solution
The tracing data flow has three steps:
Gather trace data by instrumenting the application
Process / transform the trace data (optional)
Store the trace data in the telemetry backend
Instrumenting
Our micro-services are built with Kotlin and run on the JVM. OpenTelemetry provides several options for instrumenting Java-based applications:
opentelemetry-java: Components for manual instrumentation, including the API and SDK as well as extensions and the OpenTracing shim.
opentelemetry-java-instrumentation: Built on top of opentelemetry-java and provides a Java agent JAR that can be attached to any Java 8+ application and dynamically injects bytecode to capture telemetry from a number of popular libraries and frameworks.
opentelemetry-java-contrib: Provides helpful libraries and standalone OpenTelemetry-based utilities that don’t fit the express scope of the OpenTelemetry Java or Java Instrumentation projects. For example, JMX metric gathering.
Our solution uses opentelemetry-java-instrumentation for auto-instrumentation and opentelemetry-java for manual instrumentation where we need custom traces or metrics.
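As an illustration of the manual part, the following sketch creates a custom span with the opentelemetry-java API. The tracer name "transfer-service", the function validateTransfer, and the attribute key are hypothetical. It assumes the opentelemetry-api dependency is on the classpath; when neither the Java agent nor an SDK is configured, GlobalOpenTelemetry falls back to a no-op implementation, so the code still runs but records nothing.

```kotlin
import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.trace.StatusCode

// Hypothetical instrumentation scope name.
private val tracer = GlobalOpenTelemetry.getTracer("transfer-service")

// Hypothetical business function wrapped in a custom span.
fun validateTransfer(amount: Long): Boolean {
    val span = tracer.spanBuilder("validateTransfer").startSpan()
    try {
        // makeCurrent() puts the span into the current context so that
        // spans created further down the call chain become its children.
        span.makeCurrent().use {
            span.setAttribute("transfer.amount", amount)
            return amount > 0
        }
    } catch (e: Exception) {
        // Record the failure on the span before rethrowing.
        span.recordException(e)
        span.setStatus(StatusCode.ERROR)
        throw e
    } finally {
        // A span must always be ended, otherwise it is never exported.
        span.end()
    }
}
```

When the Java agent is attached, spans created this way should join the trace that the auto-instrumentation has already started for the incoming request.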
The core idea is to run the auto-instrumentation inside a Java agent alongside our micro-service application. No code change is needed in the application; the OpenTelemetry Java agent adds the tracing capability dynamically.
Refer to https://www.baeldung.com/java-instrumentation for more details about Java agents.
Store tracing data
We use Grafana Tempo (https://grafana.com/oss/tempo/) to store our tracing data. It natively supports the OpenTelemetry standard and protocol, and we can use Grafana to visualize the tracing data stored in Tempo.
Here is a high-level diagram of the solution architecture:
Implementation details
OpenTelemetry Java Agent setup
We added an option to our base Kotlin Helm chart to turn auto-tracing on or off for a micro-service application.
# -- Java Agent OpenTelemetry Integration
tracing:
  # -- Enable creation of OTEL env variables
  enabled: true
  # -- OTEL trace exporter
  traces_exporter: otlp
  # -- OTEL metrics exporter
  metrics_exporter: none
  # -- OTEL exporter endpoint
  endpoint: https://tempo.monitoring.dev.safibank.online
Note that we have turned off the metrics exporter because we gather metrics with Micronaut Micrometer: https://micronaut-projects.github.io/micronaut-micrometer/latest/guide/
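The chart comments mention that enabling tracing creates OTEL environment variables. The sketch below shows one plausible mapping from the chart values to the standard variables read by the OpenTelemetry Java agent. The TracingValues type and the function toOtelEnv are hypothetical, and the exact mapping used by our chart is an assumption; only the environment variable names come from the OpenTelemetry configuration conventions.

```kotlin
// Hypothetical mirror of the `tracing` values block in the Helm chart.
data class TracingValues(
    val enabled: Boolean,
    val tracesExporter: String,
    val metricsExporter: String,
    val endpoint: String,
)

// Assumed mapping from chart values to the environment variables the agent reads.
// No variables are emitted when tracing is disabled.
fun toOtelEnv(values: TracingValues): Map<String, String> =
    if (!values.enabled) {
        emptyMap()
    } else {
        mapOf(
            "OTEL_TRACES_EXPORTER" to values.tracesExporter,
            "OTEL_METRICS_EXPORTER" to values.metricsExporter,
            "OTEL_EXPORTER_OTLP_ENDPOINT" to values.endpoint,
        )
    }

fun main() {
    val values = TracingValues(
        enabled = true,
        tracesExporter = "otlp",
        metricsExporter = "none",
        endpoint = "https://tempo.monitoring.dev.safibank.online",
    )
    toOtelEnv(values).forEach { (key, value) -> println("$key=$value") }
}
```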
Request headers from the front-end
traceparent
Carries the root traceId and spanId generated by the front-end app. Please refer to https://www.w3.org/TR/trace-context/#traceparent-header for more details about the format.
tracestate
Carries custom data that is propagated through the OpenTelemetry trace context along the way. Please refer to https://www.w3.org/TR/trace-context/#tracestate-header for more details about the format.
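For reference, a traceparent value in the W3C version 00 format consists of four hyphen-separated, lowercase hex fields: version, trace ID (32 characters), parent span ID (16 characters), and trace flags (2 characters). The following is a minimal parsing sketch; the TraceParent type and the function parseTraceParent are hypothetical names, and the sketch accepts only the four-field layout, so it does not implement the forward-compatibility rules for future versions. The OpenTelemetry Java agent does this parsing for us; the sketch is only meant to show what the header contains.

```kotlin
// Hypothetical holder for the parsed header fields.
data class TraceParent(
    val version: String,
    val traceId: String,
    val parentId: String,
    val sampled: Boolean,
)

// version "-" trace-id "-" parent-id "-" trace-flags, all lowercase hex.
private val TRACEPARENT_PATTERN =
    Regex("^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

fun parseTraceParent(header: String): TraceParent? {
    val match = TRACEPARENT_PATTERN.matchEntire(header.trim()) ?: return null
    val (version, traceId, parentId, flags) = match.destructured
    // Version ff and all-zero IDs are invalid according to the specification.
    if (version == "ff" || traceId.all { it == '0' } || parentId.all { it == '0' }) return null
    // The lowest bit of the flags field is the "sampled" flag.
    val sampled = (flags.toInt(16) and 0x01) == 1
    return TraceParent(version, traceId, parentId, sampled)
}

fun main() {
    // Example value taken from the W3C Trace Context specification.
    println(parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
}
```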
Logging with tracing
We have added support for automatically logging tracing-related data in the common logger. We do this using the logger MDC support in the OpenTelemetry Java agent: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/logger-mdc-instrumentation.md
We also automatically log the tracestate data sent by the front-end, if any (such as customerId / accountId).
Implementation details
class GCPConsoleJsonLayout : StackdriverJsonLayout() {
    override fun addCustomDataToJsonMap(map: MutableMap<String, Any>, event: ILoggingEvent) {
        ...
        addTraceStateData(map)
    }

    // Copies the key:value items of every tracestate entry of the current span into the log map.
    private fun addTraceStateData(map: MutableMap<String, Any>) {
        Span.current().spanContext.traceState.forEach { _, value ->
            // Each entry value holds ";"-separated items of the form key:value.
            value.split(";").forEach { item ->
                val splits = item.split(":")
                val itemKey = splits.first()
                val itemValue = splits.last()
                map[itemKey] = itemValue
            }
        }
    }
}
Please refer to the common logger for more details: https://github.com/SafiBank/SaFiMono/tree/main/common/utils
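The value format handled by the logger can be tried out in isolation. The sketch below is a standalone variant, not the common logger code: it assumes a tracestate entry value of the form customerId:123;accountId:456, and the function name parseTraceStateValue is hypothetical. Unlike the logger code, it splits each item only at the first ":" and skips items without a key/value pair, which is a design choice of this sketch.

```kotlin
// Parses an assumed "key:value;key:value" payload from a single tracestate entry value.
fun parseTraceStateValue(value: String): Map<String, String> =
    value.split(";")
        // Split at the first ":" only, so values may themselves contain ":".
        .map { it.split(":", limit = 2) }
        // Skip items that have no ":" or an empty key.
        .filter { it.size == 2 && it[0].isNotBlank() }
        .associate { (key, itemValue) -> key.trim() to itemValue.trim() }

fun main() {
    // Prints {customerId=123, accountId=456}
    println(parseTraceStateValue("customerId:123;accountId:456"))
}
```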
Attachments:
OpenTelemetry Java Agent (application/vnd.jgraph.mxfile)
OpenTelemetry Java Agent.png (image/png)
image-20221112-232151.png (image/png)
Screen Shot 2022-11-14 at 14.52.14-20221114-065222.png (image/png)
Screen Shot 2022-11-14 at 14.55.28-20221114-065534.png (image/png)
Screen Shot 2022-11-14 at 14.58.58-20221114-065904.png (image/png)