
Spring Cloud Sleuth#

What Is Spring Cloud Sleuth?#

  • Spring Cloud Sleuth provides Spring Boot auto-configuration for distributed tracing. Underneath, Spring Cloud Sleuth is a layer over a tracer library named Brave.

  • Sleuth configures everything you need to get started. This includes where trace data (spans) is reported, how many traces to keep (sampling), whether remote fields (baggage) are sent, and which libraries are traced.

  • Spring Cloud Sleuth is able to trace your requests and messages so that you can correlate that communication to corresponding log entries. You can also export the tracing information to an external system to visualize latency. Spring Cloud Sleuth supports OpenZipkin compatible systems directly.

  • More information

Why Spring Cloud Sleuth?#

  • Spring Cloud Sleuth provides Spring Boot auto-configuration for distributed tracing.
  • Specifically, Spring Cloud Sleuth:

    • Adds trace and span IDs to the SLF4J MDC, so we can extract all the logs from a given trace or span in a log aggregator.
    • Instruments common ingress and egress points from Spring applications (servlet filter, rest template, scheduled actions, message channels, feign client).
  • If spring-cloud-sleuth-zipkin is available then the app will generate and report Zipkin-compatible traces via HTTP. By default it sends them to a Zipkin collector service on localhost (port 9411). Configure the location of the service using spring.zipkin.baseUrl.
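
As a minimal configuration sketch, assuming a standard Spring Boot application.properties file (the file location is an assumption, and the URL shown simply repeats the default collector address mentioned above), the collector location could be set like this:

```properties
# Assumed file: src/main/resources/application.properties
# Points the Zipkin reporter at the collector service.
# This value matches the default (localhost, port 9411); change it for a remote collector.
spring.zipkin.baseUrl=http://localhost:9411/
```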

What Is Zipkin?#

  • Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in service architectures. Features include both the collection and lookup of this data.

  • Applications need to be “instrumented” to report trace data to Zipkin. This usually means configuration of a tracer or instrumentation library. The most popular ways to report data to Zipkin are via HTTP or Kafka, though many other options exist, such as Apache ActiveMQ, gRPC and RabbitMQ. The data served to the UI are stored in-memory, or persistently with a supported backend such as Apache Cassandra or Elasticsearch.

  • More information

What Is The Brave Library?#

  • Brave is a distributed tracing instrumentation library. Brave typically intercepts production requests to gather timing data, correlate and propagate trace contexts. While typically trace data is sent to Zipkin server, third-party plugins are available to send to alternate services such as Amazon X-Ray.

How Does Spring Cloud Sleuth Work?#

Context Generation#

  • First, look at the picture below. When service A receives a request from an external system, a TraceContext is generated and then propagated to the other services in the microservice system.
  • At this step, Spring Cloud Sleuth first checks whether the incoming headers contain the X-B3-TraceId header. If the header is absent, Sleuth treats the request as the first one entering the microservice system, so it generates a traceId and a spanId and creates the TraceContext.


  • The TraceContext holds three identifiers (traceId, spanId and parentSpanId) and a sampling state (sampled). Each of them maps to a header of B3 Propagation, which is built into Brave and has implementations in many languages and frameworks. The table below gives the details.

| Field | Header | Type | Definition | Example value |
| --- | --- | --- | --- | --- |
| TraceId | X-B3-TraceId | Identifier | The TraceId is 64 or 128 bits in length and indicates the overall ID of the trace. Every span in a trace shares this ID. | 80f198ee56343ba864fe8b2a57d3eff7 (128 bits), 05e3ac9a4f6e3b90 (64 bits) |
| SpanId | X-B3-SpanId | Identifier | The SpanId is 64 bits in length and indicates the position of the current operation in the trace tree. The value should not be interpreted: it may or may not be derived from the value of the TraceId. | 05e3ac9a4f6e3b90 |
| ParentSpanId | X-B3-ParentSpanId | Identifier | The ParentSpanId is 64 bits in length and indicates the position of the parent operation in the trace tree. When the span is the root of the trace tree, there is no ParentSpanId. | 05e3ac9a4f6e3b90 |
| Sampled | X-B3-Sampled | Sampling state | Sampling is a mechanism to reduce the volume of data that ends up in the tracing system. In B3, sampling applies consistently per trace: once the sampling decision is made, the same value should be consistently sent downstream. This means you will see all spans sharing a trace ID or none. | true |
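
The header check and ID generation described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `TraceContextSketch` and its methods are hypothetical names, not Sleuth or Brave classes, header names are assumed to be already normalized to the casing shown, and every new trace is sampled for simplicity.

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: these names are illustrative, not Sleuth or Brave classes.
final class TraceContextSketch {
    final String traceId;      // shared by every span in the trace
    final String spanId;       // identifies the current operation
    final String parentSpanId; // null when this span is the root of the trace
    final boolean sampled;

    TraceContextSketch(String traceId, String spanId, String parentSpanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.sampled = sampled;
    }

    // Generates a non-zero 64-bit identifier encoded as 16 lower-hex characters.
    static String newId() {
        long value;
        do {
            value = ThreadLocalRandom.current().nextLong();
        } while (value == 0L);
        return String.format("%016x", value);
    }

    // Assumes the map keys already use exactly this header casing.
    static TraceContextSketch fromIncomingHeaders(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        if (traceId == null) {
            // No trace header: first request into the system, so start a new trace.
            // Sampling every new trace is a simplification made for this sketch.
            return new TraceContextSketch(newId(), newId(), null, true);
        }
        // Trace header present: continue the caller's trace.
        // A missing X-B3-Sampled header is treated as "not sampled" here for brevity.
        String sampledHeader = headers.get("X-B3-Sampled");
        boolean sampled = "1".equals(sampledHeader) || "true".equals(sampledHeader);
        return new TraceContextSketch(
                traceId,
                headers.get("X-B3-SpanId"),
                headers.get("X-B3-ParentSpanId"),
                sampled);
    }
}
```

Calling `fromIncomingHeaders` with an empty map yields a root context with fresh IDs and no parent, while a map that already carries X-B3-TraceId yields a context that keeps the caller's traceId.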

Context Propagation#

  • B3 Propagation is a specification for the header "b3" and those that start with "x-b3-". These headers are used for the traceContext propagation across service boundaries.
  • Based on B3, Sleuth propagates the traceContext across service boundaries. When a request enters a service, Sleuth extracts the traceContext from the incoming request headers. It then adds the trace and span information to the outgoing requests, ensuring that the traceId is propagated across multiple services. This allows for end-to-end tracing of requests.
  • For example, when a downstream HTTP call is made, its traceContext is encoded as request headers and sent along with it, as shown in the following images.


spring-cloud-sleuth-trace-context.excalidraw.png
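
The encoding of a traceContext into request headers can be sketched as follows. This is a simplified illustration, not Brave's real API: `B3HeadersSketch` and its method names are hypothetical, and headers are modeled as a plain map. The single-header variant follows the "b3" format of `{TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}`, where the parent is optional.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of B3 propagation; not Brave's real API.
final class B3HeadersSketch {

    // Client side: encode the current context as "x-b3-" style request headers.
    static Map<String, String> inject(String traceId, String spanId,
                                      String parentSpanId, boolean sampled) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) {
            // The root span has no parent, so the header is omitted in that case.
            headers.put("X-B3-ParentSpanId", parentSpanId);
        }
        headers.put("X-B3-Sampled", sampled ? "1" : "0");
        return headers;
    }

    // Client side, alternative: encode the same context as one "b3" header value.
    static String injectSingle(String traceId, String spanId,
                               String parentSpanId, boolean sampled) {
        String value = traceId + "-" + spanId + "-" + (sampled ? "1" : "0");
        return parentSpanId == null ? value : value + "-" + parentSpanId;
    }

    // Server side: read the traceId back so that downstream spans join the same trace.
    static String extractTraceId(Map<String, String> incomingHeaders) {
        return incomingHeaders.get("X-B3-TraceId");
    }
}
```

Because the server reads back exactly the traceId the client wrote, every service on the call path ends up reporting spans under the same trace.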

  • Next, we need to understand when spans are created and recorded, and how the traceContext is propagated.
  • First, there is the concept of an Annotation/Event, which records the existence of an event in time. These events highlight what kind of action took place (it doesn't mean that such an event will physically be set on a span).

| Name | Full name | Description |
| --- | --- | --- |
| cs | Client Send | The client has made a request. This annotation indicates the start of the span. |
| sr | Server Received | The server side got the request and started processing it. Subtracting the cs timestamp from this timestamp reveals the network latency. |
| ss | Server Sent | Annotated upon completion of request processing (when the response got sent back to the client). Subtracting the sr timestamp from this timestamp reveals the time needed by the server side to process the request. |
| cr | Client Received | Signifies the end of the span. The client has successfully received the response from the server side. Subtracting the cs timestamp from this timestamp reveals the whole time needed by the client to receive the response from the server. |
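
The subtractions described in the table can be written out as a small helper. This is a sketch with hypothetical names (`SpanTimingSketch` is not part of Sleuth or Brave), and it assumes all four timestamps use the same unit and that the client and server clocks are synchronized, since cs and sr are recorded on different hosts.

```java
// Hypothetical helper that mirrors the subtractions from the annotation table.
final class SpanTimingSketch {

    // sr - cs: time the request spent on the network before the server saw it.
    static long networkLatency(long cs, long sr) {
        return sr - cs;
    }

    // ss - sr: time the server needed to process the request.
    static long serverProcessingTime(long sr, long ss) {
        return ss - sr;
    }

    // cr - cs: total time the client waited for the response.
    static long clientRoundTrip(long cs, long cr) {
        return cr - cs;
    }
}
```

For example, with cs = 1000, sr = 1010, ss = 1060 and cr = 1075, the network latency is 10, the server processing time is 50, and the client round trip is 75.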
  • Now, let's look at how spanIds are generated and sent between client and server, as in the image below.


  • As the image above shows, a new span is created at each of the server received, client send and server received events, and each span is reported to the distributed tracing system (such as Zipkin) when the matching server sent, client received and server sent event occurs, respectively.
  • In addition, the traceContext is mapped and propagated as in the image below, without shared spanIds. That is, a new spanId is always generated at the server received, client send and server received steps.

spring-cloud-sleuth-trace-context.excalidraw.png

Sharing span IDs between Client and Server#

  • This part is a little complicated, so we will go through it step by step.
  • For TraceContext propagation without sharing spanIds, the flow looks like the image below. At Server Received, the traceContext is extracted from the incoming request headers and then mapped to a new TraceContext with a new spanId.

spring-cloud-sleuth-trace-context.excalidraw.png

  • So on Zipkin, we will see a trace with 3 spans as in the image below.


  • Now, following B3, Spring Cloud Sleuth enables sharing the spanId between client and server by default. At Server Received, the traceContext is extracted from the incoming request headers and joined instead of being mapped to a new one. No new spanId is generated, so the Server Received traceContext reuses all the information from the traceContext extracted from the incoming request.


  • Now, in the image below, only two spans are displayed on the Zipkin server, because the client send span has joined with the server received span. They use the same traceId, spanId and parentId.

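
The difference between the two modes at Server Received can be sketched in plain Java. This is an illustration only: `SpanContextSketch`, `onServerReceived` and the `shareSpanId` flag are hypothetical names chosen for this example, not Sleuth or Brave APIs.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of how a server could build its context from the extracted one.
final class SpanContextSketch {
    final String traceId;
    final String spanId;
    final String parentSpanId; // null for the root span

    SpanContextSketch(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    // Generates a 64-bit identifier encoded as 16 lower-hex characters.
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    static SpanContextSketch onServerReceived(SpanContextSketch extracted, boolean shareSpanId) {
        if (shareSpanId) {
            // Join: reuse the traceId, spanId and parentSpanId sent by the client,
            // so client and server report into the same span.
            return new SpanContextSketch(extracted.traceId, extracted.spanId, extracted.parentSpanId);
        }
        // No sharing: keep the traceId, but create a child span whose parent is
        // the client's span.
        return new SpanContextSketch(extracted.traceId, newId(), extracted.spanId);
    }
}
```

With sharing enabled the server side adds no extra span ID to the trace, which is why the same call chain shows fewer spans on Zipkin than it does without sharing.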

See Also#

References#