
A Detailed Explanation of How Distributed Tracing Is Implemented in Go

2022-06-24 19:00:00 【51CTO】

In distributed, microservice architectures, a single request often passes through multiple distributed services, which creates new challenges for troubleshooting and performance optimization. Distributed tracing is a key technique for making distributed applications observable, and it has increasingly become indispensable infrastructure for them. This article introduces the core concepts of distributed tracing, its architecture, and the related open source standard protocols, and shares some of our practice in building a non-intrusive Go collection SDK.

Why do we need a distributed tracing system

Microservice architecture brings new challenges to operations and troubleshooting

Under a distributed architecture, when a user initiates a request from a browser client, the back-end processing logic often passes through multiple distributed services. Many questions then arise, such as:

The overall request takes a long time. Which service is slow?
An error occurred during the request. Which service reported it?
What is the request volume of a service, and what is the success rate of its interfaces?


These questions are not easy to answer. We need to know not only the interface processing statistics of a single service, but also the invocation dependencies between services. Only by establishing the temporal and spatial order of the entire request across multiple services can we understand and locate the problem, and this is exactly what a distributed tracing system solves.

How a distributed tracing system can help us

The core idea of distributed tracing is this: while a user request is being served by distributed services, record the calling process and the temporal and spatial relationships of the request across all subsystems, then reconstruct them into a call link that is displayed centrally. The information includes the time spent on each service node, which machine the request reached, the request status on each service node, and so on.

[Figure 2: a complete request link reconstructed by distributed tracing]

As shown in the figure above, once the complete request link has been reconstructed through distributed tracing, you can see at a glance which service phase accounts for most of the request time, which helps us focus on the problem more quickly. The collected link data can also be analyzed further to establish the dependencies and traffic flow between the services of the whole system, helping us troubleshoot circular dependencies, hot services, and other issues.


Overview of distributed tracing system architecture

Core concepts

In a distributed tracing system, the core concepts are the data model definitions of tracing, mainly Trace and Span.


Trace is a logical concept. It represents the complete directed acyclic graph formed by all the local operations (Spans) that one (distributed) request passes through, and all Spans in it share the same TraceId. Span is the actual data entity. It represents one step or operation in a (distributed) request, that is, a logical unit of work in the system. Spans establish causal relationships through nesting or ordering. Span data is generated on the collection side and then reported to the server for further processing. A Span contains the following key attributes:

Name: the operation name, such as an RPC method name or a function name
StartTime/EndTime: the start and end times, that is, the life cycle of the operation
ParentSpanId: the ID of the parent Span
Attributes: a set of <K,V> key-value pairs
Event: events that occur during the operation
SpanContext: the Span's context, usually used for propagation between Spans; its core fields include TraceId and SpanId
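The attributes listed above can be written down as a small Go data model. This is only an illustrative sketch: the type and field names (Span, SpanContext, Event, isChildOf) are assumptions chosen for readability and are not the types of the OpenTelemetry SDK.

```go
package main

import (
	"fmt"
	"time"
)

// SpanContext carries the identifiers that are propagated between Spans.
type SpanContext struct {
	TraceID string
	SpanID  string
}

// Event is something that happened at a point in time during the operation.
type Event struct {
	Name string
	Time time.Time
}

// Span is a simplified, illustrative model (not the OpenTelemetry type).
type Span struct {
	Name         string
	StartTime    time.Time
	EndTime      time.Time
	ParentSpanID string            // empty for the root Span of a Trace
	Attributes   map[string]string // a set of <K,V> pairs
	Events       []Event
	Context      SpanContext
}

// Duration returns how long the operation took.
func (s Span) Duration() time.Duration { return s.EndTime.Sub(s.StartTime) }

// isChildOf reports whether child belongs to the same Trace as parent
// and names parent as its ParentSpanId.
func isChildOf(child, parent Span) bool {
	return child.Context.TraceID == parent.Context.TraceID &&
		child.ParentSpanID == parent.Context.SpanID
}

func main() {
	start := time.Now()
	root := Span{
		Name:       "GET /order",
		StartTime:  start,
		EndTime:    start.Add(30 * time.Millisecond),
		Attributes: map[string]string{"http.method": "GET"},
		Context:    SpanContext{TraceID: "trace-1", SpanID: "span-1"},
	}
	child := Span{
		Name:         "GET serverB",
		StartTime:    start.Add(5 * time.Millisecond),
		EndTime:      start.Add(20 * time.Millisecond),
		ParentSpanID: root.Context.SpanID,
		Context:      SpanContext{TraceID: "trace-1", SpanID: "span-2"},
	}
	fmt.Println(root.Duration(), child.Duration(), isChildOf(child, root))
}
```

All Spans of one request share the TraceId, and the ParentSpanId values are what allow a back end to rebuild the call graph.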

General architecture

The core task of a distributed tracing system is to build the system around the generation, propagation, collection, processing, storage, visualization, and analysis of Spans. Its general architecture is as follows:

[Figure 5: general architecture of a distributed tracing system]

On the application side, a Tracing SDK is injected in an intrusive or non-intrusive way to track, generate, propagate, and report the call link data of requests;
The collect agent usually sits in an edge computing layer close to the application. It is mainly used to improve the write performance of the Tracing SDK and to reduce the computing pressure on the back end;
When the collected tracing data is reported to the back end, it first passes through a Gateway for authentication, and then enters an MQ such as Kafka, where messages are buffered and stored;
Before the data is written to the storage layer, we may need to clean and analyze the data in the message queue. Cleaning standardizes and adapts the data reported by different data sources. Analysis usually supports more advanced business functions, such as traffic statistics and error analysis. This part is usually done with a stream processing framework such as Flink;
The storage layer is a key point in the design and technology selection of the server side. The choice should take into account the data volume and the characteristics of the query scenarios. Common choices include open source products such as Elasticsearch, Cassandra, or ClickHouse;
The results of stream processing and analysis are persisted to storage on the one hand, and on the other hand they feed the alarm system, which proactively detects problems and notifies users, for example by sending an alert when the error rate exceeds a specified threshold.

What has been described so far is a general architecture. We have not covered the details of each module, especially on the server side, where explaining each module in detail would take considerable space. Given the limited space, we focus on the Tracing SDK, that is, on how link data is tracked and collected on the application side.

Protocol standards and open source implementations

The Tracing SDK mentioned above is really just a concept. When it comes to concrete implementations there may be many choices, mainly because:

Applications written in different programming languages may use different technical principles to trace the call chain
Different tracing back ends may use different data transmission protocols

At present, the popular tracing back ends, such as Zipkin, Jaeger, Pinpoint, SkyWalking, and Erda, each ship their own integrated SDK, so switching back ends may require large adjustments on the application side. The community has produced several protocols that try to resolve this fragmentation on the collection side, such as OpenTracing and OpenCensus, each backed by some large vendors. In recent years, however, the two have merged into a new standard, OpenTelemetry, which has developed rapidly over the past two years and has gradually become the industry standard.


OpenTelemetry defines a standard API for data collection and provides out-of-the-box SDK implementations for multiple languages. As a result, an application only needs to be tightly coupled to the OpenTelemetry core API package, not to any particular implementation.

Overview of application-side call chain tracing implementations

Core tasks on the application side

Around the Span, the application side has three core tasks to complete:

Generate Span: build a Span and fill in StartTime when the operation starts, and fill in EndTime when it completes; Attributes, Events, and so on can be added in between
Propagate Span: within a process through context.Context, and between processes through request headers, which act as the carrier of the SpanContext; the core information propagated is TraceId and ParentSpanId
Report Span: send the generated Span to the collect agent / back-end server through a tracing exporter
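The three tasks above can be sketched with the standard library alone. The names below (span, memExporter, startSpan, finish, demo) are hypothetical and the exporter only collects Spans in memory; the sketch shows the shape of the logic, not how the OpenTelemetry SDK implements it.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// span is a minimal stand-in for a real Span.
type span struct {
	Name, TraceID, SpanID, ParentSpanID string
	Start, End                          time.Time
	exporter                            *memExporter
}

// memExporter stands in for a tracing exporter. It is not goroutine-safe.
type memExporter struct{ spans []*span }

type ctxKey struct{}

// newID returns n random bytes as a hex string.
func newID(n int) string {
	b := make([]byte, n)
	_, _ = rand.Read(b)
	return hex.EncodeToString(b)
}

// startSpan generates a Span. The parent, if any, is read from ctx, and
// the returned ctx carries the new Span (in-process propagation).
func startSpan(ctx context.Context, exp *memExporter, name string) (context.Context, *span) {
	s := &span{Name: name, SpanID: newID(8), Start: time.Now(), exporter: exp}
	if parent, ok := ctx.Value(ctxKey{}).(*span); ok {
		s.TraceID, s.ParentSpanID = parent.TraceID, parent.SpanID
	} else {
		s.TraceID = newID(16) // no parent: this Span starts a new Trace
	}
	return context.WithValue(ctx, ctxKey{}, s), s
}

// finish fills in EndTime and reports the Span to the exporter.
func (s *span) finish() {
	s.End = time.Now()
	s.exporter.spans = append(s.exporter.spans, s)
}

// demo creates a parent and a child Span and returns what was reported.
func demo() *memExporter {
	exp := &memExporter{}
	ctx, root := startSpan(context.Background(), exp, "handle-request")
	_, child := startSpan(ctx, exp, "call-serverB")
	child.finish()
	root.finish()
	return exp
}

func main() {
	for _, s := range demo().spans {
		fmt.Printf("%s trace=%s span=%s parent=%q\n", s.Name, s.TraceID, s.SpanID, s.ParentSpanID)
	}
}
```

Propagation between processes works the same way, except that the TraceId and SpanId travel in a request header instead of a context.Context.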

Generating and propagating Spans requires us to intercept the key operations (functions) of the application and add Span-related logic. There are many ways to achieve this, but before listing them, let's first look at how the Go SDK provided by OpenTelemetry does it.

Call interception based on the OTEL library

The basic idea behind call chain interception in OpenTelemetry's Go SDK is as follows: following the idea of AOP, use the decorator pattern to wrap and replace the core interface or component of the target package (such as net/http), so that Span-related logic is added before and after the core call. This approach is somewhat intrusive, because code that uses the original interface implementation must be manually changed to use the wrapped one. We use an HTTP server as an example to illustrate how this is done in Go.

Suppose there are two services, serverA and serverB. After serverA's interface receives a request, it internally uses an HTTP client to send a further request to serverB. The core code of serverA may then look like the following figure:

[Figure 7: core code of serverA]
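A serverA handler of the kind described here might look roughly like the following minimal sketch, written without any tracing. The handler name handleA, the variable serverBURL, and the response texts are assumptions made for illustration. So that the example terminates, it starts a throwaway serverB with net/http/httptest instead of listening on a real port.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// serverBURL is a hypothetical address of serverB. It is overwritten in callA.
var serverBURL = "http://localhost:8091/hello"

// handleA receives a request and, while processing it, calls serverB.
func handleA(w http.ResponseWriter, r *http.Request) {
	// Reuse the incoming request's context for the outgoing request.
	req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, serverBURL, nil)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	fmt.Fprintf(w, "serverB said: %s", body)
}

// callA starts a temporary serverB, sends one request through handleA,
// and returns the body that serverA produced.
func callA() string {
	b := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from B")
	}))
	defer b.Close()
	serverBURL = b.URL

	rec := httptest.NewRecorder()
	handleA(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Body.String()
}

func main() {
	// A real service would instead register handleA on a mux and call
	// http.ListenAndServe(":8090", http.DefaultServeMux).
	fmt.Println(callA())
}
```

The two places where tracing logic has to be added are visible here: around handleA as a whole, and around the outgoing call made through the HTTP client.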

Taking the serverA node as an example, at least two Spans should be produced on it:

Span1 records the time spent by the HTTP server on the overall internal processing after it receives a request
Span2 records the time spent on the HTTP request that the HTTP server sends to serverB while processing the request
Span1 should be the ParentSpan of Span2

We can use the SDK provided by OpenTelemetry to generate, propagate, and report Spans. Due to space limitations we will not elaborate on the reporting logic. Instead, we focus on how to generate these two Spans and establish the relationship between them, that is, on Span generation and propagation.

How the HttpServer Handler generates a Span

For an HTTP server, the core is the http.Handler interface. We can therefore implement an interceptor for the http.Handler interface that is responsible for generating and propagating Spans.

       
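    package http

    // Handler is the core interface of the HTTP server.
    type Handler interface {
        ServeHTTP(ResponseWriter, *Request)
    }

    // Application code: start the server with the default mux as the handler.
    http.ListenAndServe(":8090", http.DefaultServeMux)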

To use the http.Handler decorator provided by the OpenTelemetry SDK, the http.ListenAndServe call needs to be adjusted as follows:

       
    import (
        "net/http"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/sdk/trace"
        "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    )

    // Wrap the original handler so that a Span is created for every request.
    wrappedHttpHandler := otelhttp.NewHandler(http.DefaultServeMux, ...)
    http.ListenAndServe(":8090", wrappedHttpHandler)

[Figure 8: processing logic of wrappedHttpHandler]

As shown in the figure, wrappedHttpHandler mainly implements the following logic (simplified; this part is pseudocode):

① ctx := tracer.Extract(r.ctx, r.Header): read the traceparent header from the request headers and parse it to extract the TraceId and SpanId, then build a SpanContext object and store it in ctx;

② ctx, span := tracer.Start(ctx, genOperation(r)): generate the Span that tracks the processing of the current request (Span1 above) and record its start time. It reads the SpanContext from ctx, uses SpanContext.TraceId as the TraceId of the current Span and SpanContext.SpanId as its ParentSpanId, and then writes itself into the returned ctx as the new SpanContext;

③ r.WithContext(ctx): add the newly created SpanContext to the context of request r, so that the wrapped handler can obtain Span1's SpanId from r.ctx during processing and use it as the ParentSpanId of its own Spans, thereby establishing the parent-child relationship between Spans;

④ span.End(): after innerHttpHandler.ServeHTTP(w, r) returns, record the completion time of Span1 and send it to the exporter to be reported to the server.
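Steps ① to ④ can be made concrete with a standard-library-only middleware. This is a simplified sketch under stated assumptions, not otelhttp's implementation: the names traced, parseTraceparent, reported, and serveOnce are hypothetical, the exporter is a plain slice, and the header parsing only checks the field lengths of the W3C traceparent format (version-traceid-parentid-flags).

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

type spanContext struct{ TraceID, SpanID string }

type finishedSpan struct {
	Name, TraceID, SpanID, ParentSpanID string
	Duration                            time.Duration
}

type ctxKey struct{}

// reported stands in for an exporter. It is not goroutine-safe.
var reported []finishedSpan

func newID(n int) string {
	b := make([]byte, n)
	_, _ = rand.Read(b)
	return hex.EncodeToString(b)
}

// parseTraceparent extracts TraceId and SpanId from a traceparent value
// such as 00-<32 hex chars>-<16 hex chars>-01.
func parseTraceparent(h string) (spanContext, bool) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return spanContext{}, false
	}
	return spanContext{TraceID: parts[1], SpanID: parts[2]}, true
}

// traced decorates next with Span generation and propagation.
func traced(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ① Extract the upstream SpanContext from the request header.
		parent, ok := parseTraceparent(r.Header.Get("traceparent"))
		// ② Start Span1: keep the TraceId, remember the upstream SpanId as parent.
		cur := spanContext{TraceID: parent.TraceID, SpanID: newID(8)}
		if !ok {
			cur.TraceID = newID(16) // no upstream: start a new Trace
		}
		start := time.Now()
		// ③ Hand the new SpanContext to the inner handler through the context.
		ctx := context.WithValue(r.Context(), ctxKey{}, cur)
		next.ServeHTTP(w, r.WithContext(ctx))
		// ④ End the Span and pass it to the exporter.
		reported = append(reported, finishedSpan{
			Name: r.Method + " " + r.URL.Path, TraceID: cur.TraceID, SpanID: cur.SpanID,
			ParentSpanID: parent.SpanID, Duration: time.Since(start),
		})
	})
}

// serveOnce sends one request through the middleware and returns the
// SpanContext seen by the inner handler and the Span that was reported.
func serveOnce(traceparent string) (spanContext, finishedSpan) {
	var seen spanContext
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen, _ = r.Context().Value(ctxKey{}).(spanContext)
	})
	req := httptest.NewRequest(http.MethodGet, "/hello", nil)
	if traceparent != "" {
		req.Header.Set("traceparent", traceparent)
	}
	traced(inner).ServeHTTP(httptest.NewRecorder(), req)
	return seen, reported[len(reported)-1]
}

func main() {
	seen, rep := serveOnce("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(rep.Name, seen.TraceID == rep.TraceID, rep.ParentSpanID)
}
```

With the real SDK, the extraction and injection are handled by the configured propagator and the Span bookkeeping by the tracer, but the order of operations is the same as in this sketch.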

How an HttpClient request generates a Span

Next, let's look at how the HTTP client request that serverA sends to serverB generates a Span (Span2 above). The key operation in sending an HTTP client request is the http.RoundTripper interface:

       
    package http

    // RoundTripper executes a single HTTP transaction.
    type RoundTripper interface {
        RoundTrip(*Request) (*Response, error)
    }

OpenTelemetry provides an interceptor implementation based on this interface. We need to use it to wrap the RoundTripper implementation that the HTTP client originally uses. The code is adjusted as follows:

       
    import (
        "net/http"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/sdk/trace"
        "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    )

    // Wrap the default transport so that a Span is created for every outgoing request.
    wrappedTransport := otelhttp.NewTransport(http.DefaultTransport)
    client := http.Client{Transport: wrappedTransport}

[Figure 9: processing logic of wrappedTransport]

As shown in the figure, wrappedTransport mainly completes the following tasks (simplified; this part is pseudocode):

① req, _ := http.NewRequestWithContext(r.ctx, "GET", url, nil): pass the ctx of the request received by http.Handler in the previous step into the request that the HTTP client is about to send, so that Span1's information can later be extracted from request.Context() to establish the relationship between the Spans;

② ctx, span := tracer.Start(r.Context(), url): after client.Do() is called, execution first enters the wrappedTransport.RoundTrip() method, where a new Span (Span2) is created and starts recording the time spent on the HTTP client request. As before, Start extracts Span1's SpanContext from r.Context() and uses its SpanId as the ParentSpanId of the current Span (Span2), which establishes the nesting relationship between the Spans. The SpanContext stored in the returned ctx now carries the information of the newly generated Span (Span2);

③ tracer.Inject(ctx, r.Header): write the TraceId, SpanId, and other information from the current SpanContext into r.Header, so that it travels with the HTTP request to serverB, where it can be associated with the current Span;

④ span.End(): once the HTTP client request to serverB has received its response, mark the current Span as finished, set its EndTime, and submit it to the exporter to be reported to the server.
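The client-side steps can be sketched in the same spirit. The wrapper below is an illustration only, not otelhttp's transport: tracingTransport, roundTripFunc, and sendWithParent are hypothetical names, the base transport is a fake that never touches the network, and the Span ID generator is injected so that the result is deterministic.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

type spanContext struct{ TraceID, SpanID string }

type ctxKey struct{}

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// tracingTransport wraps another RoundTripper and adds propagation logic.
type tracingTransport struct {
	base      http.RoundTripper
	newSpanID func() string
}

func (t tracingTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	// ② Start Span2 as a child of the SpanContext found in the request context.
	parent, ok := r.Context().Value(ctxKey{}).(spanContext)
	if !ok {
		return t.base.RoundTrip(r) // nothing to propagate in this sketch
	}
	cur := spanContext{TraceID: parent.TraceID, SpanID: t.newSpanID()}
	// ③ Inject TraceId and SpanId into the outgoing headers. A RoundTripper
	// should not modify the caller's request, so work on a clone.
	out := r.Clone(r.Context())
	out.Header.Set("traceparent", fmt.Sprintf("00-%s-%s-01", cur.TraceID, cur.SpanID))
	// ④ A real implementation would end and export Span2 after the response.
	return t.base.RoundTrip(out)
}

// sendWithParent sends one request whose context carries parent and returns
// the traceparent header that the base transport received.
func sendWithParent(parent spanContext) string {
	var seen string
	base := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get("traceparent")
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody}, nil
	})
	client := &http.Client{Transport: tracingTransport{
		base:      base,
		newSpanID: func() string { return "b7ad6b7169203331" },
	}}
	// ① Build the outgoing request from a context that carries Span1.
	ctx := context.WithValue(context.Background(), ctxKey{}, parent)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://serverb.invalid/", nil)
	if err != nil {
		panic(err)
	}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return seen
}

func main() {
	span1 := spanContext{TraceID: "4bf92f3577b34da6a3ce929d0e0e4736", SpanID: "00f067aa0ba902b7"}
	fmt.Println(sendWithParent(span1))
}
```

The header that reaches serverB keeps Span1's TraceId but carries Span2's SpanId, which is what serverB then records as the ParentSpanId of its own Span.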

Summary of call chain tracing based on the OTEL library

We have described how, with the OpenTelemetry library, the key information of a link (TraceId, SpanId) is propagated between and within processes. Let's briefly summarize this tracing implementation:

[Figure 10: summary of call chain tracing based on the OTEL library]

As the analysis above shows, this approach is still somewhat intrusive to the code, and it places one more requirement on the code: the context.Context object must be passed along between operations. For example, when we created the HTTP client request in serverA, we used http.NewRequestWithContext(r.ctx, ...) rather than http.NewRequest(...). In addition, asynchronous scenarios that start a goroutine also need to take care to pass ctx along.
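The effect of this requirement can be demonstrated with a few lines of standard-library Go. In the sketch below a plain string stored in the context stands in for the SpanContext; the names traceIDFrom and outgoingTraceIDs and the URL are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

type ctxKey struct{}

// traceIDFrom returns the trace ID stored in ctx, or "" if there is none.
func traceIDFrom(ctx context.Context) string {
	id, _ := ctx.Value(ctxKey{}).(string)
	return id
}

// outgoingTraceIDs shows what each way of continuing the work can see.
func outgoingTraceIDs(traceID string) (withCtx, withoutCtx, inGoroutine string) {
	ctx := context.WithValue(context.Background(), ctxKey{}, traceID)

	// The context is passed explicitly, so the outgoing request carries the trace.
	// Errors are ignored because the method and URL are fixed and valid.
	r1, _ := http.NewRequestWithContext(ctx, http.MethodGet, "http://serverb.invalid/", nil)

	// http.NewRequest uses the background context, so the trace is lost.
	r2, _ := http.NewRequest(http.MethodGet, "http://serverb.invalid/", nil)

	// A goroutine inherits nothing implicitly. ctx has to be handed over.
	done := make(chan string)
	go func(ctx context.Context) { done <- traceIDFrom(ctx) }(ctx)

	return traceIDFrom(r1.Context()), traceIDFrom(r2.Context()), <-done
}

func main() {
	a, b, c := outgoingTraceIDs("trace-1")
	fmt.Printf("NewRequestWithContext=%q NewRequest=%q goroutine=%q\n", a, b, c)
}
```

A request built with http.NewRequest therefore starts a new, unrelated Trace on the client side, which is why the convention has to be followed consistently throughout the code base.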


Ideas for non-intrusive call chain tracing

We have just shown in detail a convention-based implementation that is somewhat intrusive. Its intrusiveness shows mainly in that we have to add code explicitly and by hand to wrap the original code with components that have tracing capabilities. This in turn means the application code must explicitly reference a specific version of the OpenTelemetry instrumentation packages, which makes it harder to maintain and upgrade the observability code independently. Are there options for tracing the call chain non-intrusively? Non-intrusive integration really differs only in how the integration is done. The goal is similar: in the end, the key function calls must be intercepted in some way and special logic added. The point of being non-intrusive is that the code needs no modification, or very little.

[Figure 12: possible approaches to non-intrusive integration]

The figure above lists some possible approaches to non-intrusive integration. Unlike languages such as .NET and Java, which compile to an intermediate language (IL), Go compiles directly to machine code, which makes non-intrusive solutions relatively hard to implement. The specific ideas are as follows:

Compile-time injection: extend the compiler to modify the AST and insert tracing code. This needs to be adapted to different compiler versions.
Startup-time injection: modify the compiled machine code to insert tracing code. This needs to be adapted to different CPU architectures. Examples include monkey and gohook.
Runtime injection: use the eBPF capability provided by the kernel to monitor the execution of the program's key functions and insert tracing code. This approach has a promising future. Examples include tcpdump and bpftrace.

How non-intrusive tracing is implemented in Go

The core code of the Erda project is written mainly in Go. Based on the OpenTelemetry SDK described above, we implemented a non-intrusive tracing approach by modifying machine code. As mentioned earlier, using the OpenTelemetry SDK requires some adjustments to the code. Let's see how these adjustments can be made automatically in a non-intrusive way:

[Figure 13: applying the OpenTelemetry SDK adjustments in a non-intrusive way]

We use the HTTP client as an example to explain briefly. The signature of the hook interface provided by the gohook framework is as follows:

       
    // target: the function to hook
    // replacement: the function to replace it with
    // trampoline: where the entry of the original function is copied, so that
    //             replacement can jump back to the original target
    func Hook(target, replacement, trampoline interface{}) error

For http.Client, we can choose to hook the DefaultTransport.RoundTrip() method. When the method is executed, we wrap the original DefaultTransport object with otelhttp.NewTransport(). Note, however, that we cannot pass DefaultTransport directly as the argument of otelhttp.NewTransport(), because its RoundTrip() method has already been replaced by us, and the original method has been written into the trampoline. We therefore need an intermediate layer that connects DefaultTransport with its original RoundTrip method. The specific code is as follows:

       
    // RoundTrip is linked to net/http.(*Transport).RoundTrip; it is the hook target.
    //go:linkname RoundTrip net/http.(*Transport).RoundTrip
    //go:noinline
    func RoundTrip(t *http.Transport, req *http.Request) (*http.Response, error)

    // originalRoundTrip is the trampoline: after hooking, it reaches the original method.
    //go:noinline
    func originalRoundTrip(t *http.Transport, req *http.Request) (*http.Response, error) {
        return RoundTrip(t, req)
    }

    // wrappedTransport is the intermediate layer that connects a Transport
    // with its original RoundTrip method.
    type wrappedTransport struct {
        t *http.Transport
    }

    //go:noinline
    func (t *wrappedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
        return originalRoundTrip(t.t, req)
    }

    // tracedRoundTrip is the replacement: it adds tracing around the original call.
    //go:noinline
    func tracedRoundTrip(t *http.Transport, req *http.Request) (*http.Response, error) {
        req = contextWithSpan(req)
        return otelhttp.NewTransport(&wrappedTransport{t: t}).RoundTrip(req)
    }

    // contextWithSpan makes sure the request context carries a valid Span,
    // falling back to the context kept for the current goroutine.
    //go:noinline
    func contextWithSpan(req *http.Request) *http.Request {
        ctx := req.Context()
        if span := trace.SpanFromContext(ctx); !span.SpanContext().IsValid() {
            pctx := injectcontext.GetContext()
            if pctx != nil {
                if span := trace.SpanFromContext(pctx); span.SpanContext().IsValid() {
                    ctx = trace.ContextWithSpan(ctx, span)
                    req = req.WithContext(ctx)
                }
            }
        }
        return req
    }

    func init() {
        gohook.Hook(RoundTrip, tracedRoundTrip, originalRoundTrip)
    }
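The relationship between target, replacement, and trampoline can be modeled without patching any machine code, by treating the target as a function variable. This is only an analogy under that assumption: the names roundTrip, trampoline, hook, and traced are hypothetical, and gohook itself rewrites the function's entry instructions rather than reassigning a variable.

```go
package main

import "fmt"

// roundTrip models the hook target.
var roundTrip = func(req string) string { return "response to " + req }

// trampoline keeps a way to reach the original implementation after hooking.
var trampoline func(string) string

// hook models Hook(target, replacement, trampoline) for this one variable.
// It is idempotent so that hooking twice cannot chain the replacement to itself.
func hook(replacement func(string) string) {
	if trampoline != nil {
		return
	}
	trampoline = roundTrip
	roundTrip = replacement
}

// traced is the replacement. It has to go through trampoline: calling
// roundTrip here would reach traced again and recurse without end.
func traced(req string) string {
	return "[span] " + trampoline(req)
}

func main() {
	hook(traced)
	fmt.Println(roundTrip("GET /"))
}
```

This is the reason for the intermediate wrappedTransport layer: anything that calls the hooked RoundTrip directly ends up in the replacement again, so the wrapper has to call the trampoline instead.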

We use the init() function to add the hook automatically, so the user program only needs to import this package in its main file to achieve non-intrusive integration.

It is worth mentioning the req = contextWithSpan(req) call. Internally it checks in turn whether req.Context() or the goroutineContext map that we maintain contains a SpanContext, and assigns it to req. This removes the requirement to write http.NewRequestWithContext(...).
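One way such a per-goroutine context map could work is sketched below. This is an assumption about the general technique, not Erda's code: goroutineID, setContext, getContext, clearContext, and demo are hypothetical names, and the goroutine ID is parsed from the stack header because Go offers no official API for it.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"strconv"
	"strings"
	"sync"
)

// goroutineID parses the current goroutine's ID from the first line of its
// stack trace ("goroutine 12 [running]:"). It is a hack for illustration.
func goroutineID() uint64 {
	buf := make([]byte, 64)
	n := runtime.Stack(buf, false)
	fields := strings.Fields(string(buf[:n]))
	if len(fields) < 2 {
		return 0
	}
	id, _ := strconv.ParseUint(fields[1], 10, 64)
	return id
}

var (
	mu   sync.RWMutex
	ctxs = map[uint64]context.Context{}
)

// setContext remembers ctx for the calling goroutine.
func setContext(ctx context.Context) {
	mu.Lock()
	defer mu.Unlock()
	ctxs[goroutineID()] = ctx
}

// getContext returns the context remembered for the calling goroutine, or nil.
func getContext() context.Context {
	mu.RLock()
	defer mu.RUnlock()
	return ctxs[goroutineID()]
}

// clearContext removes the entry so that the map does not grow without bound.
func clearContext() {
	mu.Lock()
	defer mu.Unlock()
	delete(ctxs, goroutineID())
}

type ctxKey struct{}

// demo stores a context for the current goroutine, then reads it back from
// the same goroutine and from a different one.
func demo() (sameGoroutine, otherGoroutine string) {
	setContext(context.WithValue(context.Background(), ctxKey{}, "trace-1"))
	defer clearContext()

	read := func() string {
		ctx := getContext()
		if ctx == nil {
			return ""
		}
		id, _ := ctx.Value(ctxKey{}).(string)
		return id
	}
	done := make(chan string)
	go func() { done <- read() }()
	return read(), <-done
}

func main() {
	same, other := demo()
	fmt.Printf("same goroutine=%q other goroutine=%q\n", same, other)
}
```

The lookup only succeeds in the goroutine that stored the context, so work handed to a new goroutine still needs the context to be set there, and entries have to be cleared when request handling ends.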

The detailed code can be found in the Erda repository: https://github.com/erda-project/erda-infra/tree/master/pkg/trace