
A Detailed Explanation of How Distributed Tracing Is Implemented in Go

2022-06-24 19:00:00 51CTO

In distributed, microservice architectures, a single request often passes through multiple distributed services, which brings new challenges for troubleshooting and performance optimization. Distributed tracing is an important technique for addressing the observability of distributed applications, and it has increasingly become indispensable infrastructure for them. This article introduces the core concepts of distributed tracing, its architecture, and the related open source standard protocols in detail, and shares some of our practice in implementing a non-intrusive Go collection SDK.



Why we need a distributed tracing system

Microservice architecture brings new challenges to operations and troubleshooting

In a distributed architecture, when a user initiates a request from a browser client, the back-end processing logic often passes through multiple distributed services. Many questions then arise, such as:

  1. The overall request takes a long time. Which service is slow?
  2. An error occurred during the request. Which service reported it?
  3. How many requests does a service receive, and what is the success rate of its interfaces?


These questions are not easy to answer. We need to know not only the interface statistics of a single service but also the call dependencies between services. Only by establishing the temporal and spatial order of the whole request across multiple services can we understand and locate the problem. This is exactly what a distributed tracing system solves.

How a distributed tracing system helps us

The core idea of distributed tracing: while a user request is being processed across the distributed system, record the call process and the temporal and spatial relationships of the request among all subsystems, then restore them into a centralized view of the call chain. The information includes the time spent on each service node, which machine the request reached, the request status on each service node, and so on.

[Figure: the complete call chain of a request, built through distributed tracing]

As the figure above shows, once the complete request chain has been built through distributed tracing, you can see at a glance in which service stage the request time is mainly spent, which helps us focus on the problem faster. The collected trace data can also be analyzed further to establish the dependencies and traffic flow between the services of the whole system, helping us troubleshoot circular dependencies, hot services, and similar issues.


Overview of distributed tracing system architecture

Core concepts

In a distributed tracing system, the core concepts are the data model definitions of tracing, mainly Trace and Span.


A Trace is a logical concept: it represents the complete directed acyclic graph formed by all the local operations (Spans) that one (distributed) request passes through. All Spans in a trace share the same TraceId. A Span is the actual data entity: it represents one step or operation in the course of a (distributed) request, a logical unit of work in the system. Spans establish causal relationships through nesting or ordering. Span data is generated on the collection side and then reported to the server for further processing. A Span contains the following key attributes:


  • Name: the operation name, such as an RPC method name or a function name
  • StartTime/EndTime: the start and end time, that is, the life cycle of the operation
  • ParentSpanId: the ID of the parent Span
  • Attributes: a set of <K,V> key-value pairs
  • Event: events that occur during the operation
  • SpanContext: the Span's context, usually used for propagation between Spans; its core fields include TraceId and SpanId
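The attribute list above can be sketched as a small Go data model. This is only an illustration of the concepts: the type and field names below are assumptions made for this example and are not the OpenTelemetry types.

```go
package main

import (
	"fmt"
	"time"
)

// SpanContext is the part of a span that is propagated between spans and services.
type SpanContext struct {
	TraceID string // shared by every span of one request
	SpanID  string // unique within the trace
}

// Event is something that happened during the operation.
type Event struct {
	Name string
	Time time.Time
}

// Span is an illustrative model of one operation (not the OpenTelemetry type).
type Span struct {
	Name         string
	StartTime    time.Time
	EndTime      time.Time
	ParentSpanID string // empty for the root span
	Attributes   map[string]string
	Events       []Event
	Context      SpanContext
}

// Duration is the lifetime of the operation.
func (s Span) Duration() time.Duration { return s.EndTime.Sub(s.StartTime) }

// IsRoot reports whether the span has no parent.
func (s Span) IsRoot() bool { return s.ParentSpanID == "" }

// NewChild starts a child span that shares the parent's TraceID
// and records the parent's SpanID as its ParentSpanID.
func NewChild(parent Span, name, spanID string, start time.Time) Span {
	return Span{
		Name:         name,
		StartTime:    start,
		ParentSpanID: parent.Context.SpanID,
		Attributes:   map[string]string{},
		Context:      SpanContext{TraceID: parent.Context.TraceID, SpanID: spanID},
	}
}

func main() {
	t0 := time.Date(2022, 6, 24, 19, 0, 0, 0, time.UTC)
	root := Span{Name: "GET /order", StartTime: t0, Context: SpanContext{TraceID: "trace-1", SpanID: "span-1"}}
	child := NewChild(root, "GET serverB", "span-2", t0.Add(10*time.Millisecond))
	child.EndTime = child.StartTime.Add(30 * time.Millisecond)
	root.EndTime = t0.Add(50 * time.Millisecond)
	// prints: true span-1 trace-1 30ms
	fmt.Println(root.IsRoot(), child.ParentSpanID, child.Context.TraceID, child.Duration())
}
```

The shared TraceID groups spans into one trace, and ParentSpanID gives the edges of the directed acyclic graph.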

General architecture

The core task of a distributed tracing system is to build the system around Spans: their generation, propagation, collection, processing, storage, visualization, and analysis. The general architecture is as follows:

[Figure: general architecture of a distributed tracing system]


  • On the application side, a Tracing SDK must be injected, intrusively or non-intrusively, to track, generate, propagate, and report the call chain data of requests;
  • The collect agent usually sits at an edge computing layer close to the application. It is mainly used to improve the write performance of the Tracing SDK and to reduce the computing pressure on the back end;
  • When collected trace data is reported to the back end, it first passes through a Gateway for authentication and then enters an MQ such as Kafka, which buffers and stores the messages;
  • Before data is written to the storage layer, we may need to clean and analyze the data in the message queue. Cleaning standardizes and adapts the data reported by different sources. Analysis usually supports more advanced business functions, such as traffic statistics and error analysis. This part is usually done with a stream processing framework such as Flink;
  • The storage layer is a key point in the design and selection of the server side. The data volume and the characteristics of the query scenarios must be considered. Common choices include open source products such as Elasticsearch, Cassandra, or ClickHouse;
  • The results of stream processing are persisted to storage on the one hand and fed into the alerting system on the other, so that problems are discovered proactively and users are notified, for example by sending an alert when the error rate exceeds a specified threshold.

What we have just described is a general architecture. We have not covered the details of each module, especially on the server side, where explaining each module in detail would take a lot of space. Given the limited space, we focus on the application-side Tracing SDK and on how trace data is tracked and collected on the application side.


Protocol standards and open source implementations

We just mentioned the Tracing SDK. This is in fact only a concept; when it comes to concrete implementations, there are many choices, mainly because:


  1. Applications written in different programming languages may use different technical principles to track the call chain
  2. Different tracing back ends may use different data transmission protocols

At present, popular tracing back ends such as Zipkin, Jaeger, PinPoint, SkyWalking, and Erda each come with their own integrated SDK, so switching back ends may require large adjustments on the application side. The community has produced different protocols that try to resolve this confusion on the collection side, such as OpenTracing and OpenCensus, which are also followed and supported by some large vendors. In recent years the two have moved towards unification, and a new standard, OpenTelemetry, has emerged. It has developed rapidly over the past two years and has gradually become the industry standard.



OpenTelemetry defines a standard API for data collection and provides a set of out-of-the-box SDK implementations for multiple languages. As a result, an application only needs to be strongly coupled to the core OpenTelemetry API package, not to any particular implementation.


Overview of application-side call chain tracing implementations

Core tasks on the application side

On the application side, there are three core tasks around Spans:


  1. Generate Spans: build a Span and fill in StartTime when the operation starts, and fill in EndTime when it completes. Attributes, Events, and so on can be added in between
  2. Propagate Spans: within a process through context.Context, and between processes through request headers, which act as the carrier of the SpanContext. The core information propagated is the TraceId and ParentSpanId
  3. Report Spans: generated Spans are sent through a tracing exporter to the collect agent or back-end server
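Since the reporting step is not elaborated later in the article, here is a minimal sketch of what "report through an exporter" can look like. It assumes a simple batching strategy; `Exporter`, `BatchReporter`, and `memoryExporter` are hypothetical names for this example and are not the OpenTelemetry SDK types.

```go
package main

import "fmt"

// Span is a stripped-down stand-in for a finished span.
type Span struct{ Name string }

// Exporter delivers finished spans to a collect agent or back end.
type Exporter interface {
	Export(batch []Span) error
}

// BatchReporter buffers spans and hands them to the Exporter in batches.
// It is not safe for concurrent use; a real reporter would need locking.
type BatchReporter struct {
	exp  Exporter
	size int
	buf  []Span
}

// Report queues a span and flushes once the batch is full.
func (r *BatchReporter) Report(s Span) error {
	r.buf = append(r.buf, s)
	if len(r.buf) < r.size {
		return nil
	}
	return r.Flush()
}

// Flush exports whatever is currently buffered.
func (r *BatchReporter) Flush() error {
	if len(r.buf) == 0 {
		return nil
	}
	batch := r.buf
	r.buf = nil
	return r.exp.Export(batch)
}

// memoryExporter keeps batches in memory so the sketch needs no network.
type memoryExporter struct{ batches [][]Span }

func (m *memoryExporter) Export(batch []Span) error {
	m.batches = append(m.batches, batch)
	return nil
}

// batchSizes reports n spans with the given batch size and
// returns the size of each exported batch.
func batchSizes(n, size int) []int {
	exp := &memoryExporter{}
	r := &BatchReporter{exp: exp, size: size}
	for i := 0; i < n; i++ {
		_ = r.Report(Span{Name: fmt.Sprintf("op-%d", i)})
	}
	_ = r.Flush() // flush the remainder
	sizes := make([]int, 0, len(exp.batches))
	for _, b := range exp.batches {
		sizes = append(sizes, len(b))
	}
	return sizes
}

func main() {
	fmt.Println(batchSizes(5, 2)) // prints [2 2 1]
}
```

Batching keeps the per-span cost on the request path low, which matches the role of the collect agent described earlier: the expensive work happens away from the application's hot path.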

To generate and propagate Spans, we need to be able to intercept the key operations (functions) of the application and add Span-related logic. There are many ways to achieve this. Before listing them, let's first look at how the Go SDK provided by OpenTelemetry does it.


Call interception based on the OTEL library

The basic idea behind call chain interception in the OpenTelemetry Go SDK is: following the AOP idea, use the decorator pattern to replace the core interface or component of the target package (such as net/http) with a wrapper, so that Span-related logic is added before and after the core call. This approach is of course somewhat intrusive: code that calls the original interface implementation must be manually changed to call the wrapped implementation. We use an HTTP server to illustrate how this is done in Go:

Suppose there are two services, serverA and serverB. After serverA's interface receives a request, it internally issues a further request to serverB through an HTTP client. The core code of serverA may then look like the following figure:

[Figure: core code of serverA]
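As a rough idea of the serverA scenario just described, the following sketch uses only the standard library. It is not the article's original listing: the `/hello` route, the `newServerA` and `callA` helpers, and the in-process test servers are assumptions made so the example can run on its own.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newServerA returns serverA's handler. On each request it calls serverB
// at serverBURL and relays the answer.
func newServerA(serverBURL string, client *http.Client) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		// Reuse the incoming request's context for the outgoing call, so that
		// anything stored in it (such as a SpanContext) travels along.
		req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, serverBURL, nil)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		resp, err := client.Do(req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		body, _ := io.ReadAll(resp.Body)
		fmt.Fprintf(w, "serverA got: %s", body)
	})
	return mux
}

// callA starts in-process test servers for serverB and serverA,
// sends one request to serverA, and returns its reply.
func callA() (string, error) {
	serverB := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from serverB")
	}))
	defer serverB.Close()
	serverA := httptest.NewServer(newServerA(serverB.URL, http.DefaultClient))
	defer serverA.Close()

	resp, err := http.Get(serverA.URL + "/hello")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	out, err := callA()
	fmt.Println(out, err)
}
```

The two places that tracing needs to intercept are visible here: the handler registered on the mux (server side) and `client.Do` (client side).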


Taking the serverA node as an example, it should produce at least two Spans:

  1. Span1 records the time spent in the overall internal processing after the HTTP server receives a request
  2. Span2 records the time spent on the HTTP request to serverB that is issued while the HTTP server processes the request
  3. Span1 should be the ParentSpan of Span2

We can use the SDK provided by OpenTelemetry to generate, propagate, and report Spans. Due to limited space we will not elaborate on the reporting logic. Let's focus on how these two Spans are generated and how they are linked, that is, on Span generation and propagation.


How the HttpServer Handler generates a Span

For an HTTP server, the core is the http.Handler interface. Span generation and propagation can therefore be handled by an interceptor that implements the http.Handler interface.

      
      
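package http

// Handler is the core interface of the HTTP server, defined in net/http.
type Handler interface {
	ServeHTTP(ResponseWriter, *Request)
}

// Application code: start the server with the default handler.
http.ListenAndServe(":8090", http.DefaultServeMux)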

To use the http.Handler decorator provided by the OpenTelemetry SDK, the http.ListenAndServe call needs to be adjusted as follows:

      
      
import (
	"net/http"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

wrappedHttpHandler := otelhttp.NewHandler(http.DefaultServeMux, ...)
http.ListenAndServe(":8090", wrappedHttpHandler)


[Figure: processing flow of wrappedHttpHandler]

As shown in the figure, wrappedHttpHandler mainly implements the following logic (simplified; this part is pseudocode):

① ctx := tracer.Extract(r.ctx, r.Header): extracts and parses the traceparent header from the request headers to obtain the TraceId and SpanId, builds a SpanContext object from them, and finally stores it in ctx;

② ctx, span := tracer.Start(ctx, genOperation(r)): creates a Span that tracks the processing of the current request (Span1 above) and records its start time. It reads the SpanContext from ctx, uses SpanContext.TraceId as the TraceId of the current Span and SpanContext.SpanId as its ParentSpanId, and then writes itself into the returned ctx as the new SpanContext;

③ r.WithContext(ctx): adds the newly created SpanContext to the context of request r, so that during processing the intercepted handler can obtain Span1's SpanId from r.ctx and use it as the ParentSpanId of its own Spans, establishing the parent-child relationship between Spans;

④ span.End(): after innerHttpHandler.ServeHTTP(w, r) has finished, records the completion time of Span1 and then sends it to the exporter to be reported to the server.
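The four steps can be sketched with the standard library alone. This is a simplified stand-in, not the otelhttp implementation: the `span`, `spanContext`, `extract`, `start`, and `traced` names are hypothetical, and only the `version-traceid-spanid-flags` layout of the W3C traceparent header is taken as given.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

type spanContext struct{ TraceID, SpanID string }

type span struct {
	Name         string
	Ctx          spanContext
	ParentSpanID string
	Start, End   time.Time
}

type ctxKey struct{}

// newID returns n random bytes as a hex string.
func newID(n int) string {
	b := make([]byte, n)
	rand.Read(b)
	return hex.EncodeToString(b)
}

// extract parses a traceparent header: version-traceid-spanid-flags.
func extract(h http.Header) (spanContext, bool) {
	parts := strings.Split(h.Get("traceparent"), "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return spanContext{}, false
	}
	return spanContext{TraceID: parts[1], SpanID: parts[2]}, true
}

// start creates a span whose parent is the span context found in ctx, if any,
// and returns a ctx that carries the new span's own context.
func start(ctx context.Context, name string) (context.Context, *span) {
	s := &span{Name: name, Start: time.Now()}
	if parent, ok := ctx.Value(ctxKey{}).(spanContext); ok {
		s.Ctx.TraceID, s.ParentSpanID = parent.TraceID, parent.SpanID
	} else {
		s.Ctx.TraceID = newID(16) // no incoming trace: this span starts one
	}
	s.Ctx.SpanID = newID(8)
	return context.WithValue(ctx, ctxKey{}, s.Ctx), s
}

// traced wraps next; export receives every finished span.
func traced(next http.Handler, export func(span)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		if sc, ok := extract(r.Header); ok { // step 1: extract
			ctx = context.WithValue(ctx, ctxKey{}, sc)
		}
		ctx, s := start(ctx, r.Method+" "+r.URL.Path) // step 2: start Span1
		defer func() { // step 4: end and export
			s.End = time.Now()
			export(*s)
		}()
		next.ServeHTTP(w, r.WithContext(ctx)) // step 3: pass ctx on
	})
}

// demo sends one request with a traceparent header through the wrapper.
func demo() (got span, inner spanContext) {
	h := traced(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		inner, _ = r.Context().Value(ctxKey{}).(spanContext)
	}), func(s span) { got = s })
	req := httptest.NewRequest(http.MethodGet, "/hello", nil)
	req.Header.Set("traceparent", "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
	h.ServeHTTP(httptest.NewRecorder(), req)
	return got, inner
}

func main() {
	s, inner := demo()
	fmt.Println(s.Name, s.Ctx.TraceID, s.ParentSpanID, inner.SpanID == s.Ctx.SpanID)
}
```

The inner handler sees the new Span's context rather than the caller's, which is what lets any Span it creates become a child of Span1.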


How HttpClient requests generate a Span

Next, let's see how the HTTP client request that serverA issues to serverB generates its Span (Span2 above). The key operation for an HTTP client sending a request is the http.RoundTripper interface:

      
      
package http

type RoundTripper interface {
	RoundTrip(*Request) (*Response, error)
}

OpenTelemetry provides an interceptor implementation based on this interface. We need to use it to wrap the RoundTripper implementation originally used by the HTTP client. The code is adjusted as follows:

      
      
import (
	"net/http"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

wrappedTransport := otelhttp.NewTransport(http.DefaultTransport)
client := http.Client{Transport: wrappedTransport}


[Figure: processing flow of wrappedTransport]

As shown in the figure, wrappedTransport completes the following tasks (simplified; this part is pseudocode):

① req, _ := http.NewRequestWithContext(r.ctx, "GET", url, nil): here the ctx of the request from the previous http.Handler step is passed to the request that the HTTP client is about to send, so that Span1's information can later be extracted from request.Context() and the relationship between the Spans established;

② ctx, span := tracer.Start(r.Context(), url): after client.Do() is called, WrappedTransport.RoundTrip() is entered first. A new Span (Span2) is created here to start recording the time spent on the HTTP client request. As before, Start extracts Span1's SpanContext from r.Context() and uses its SpanId as the ParentSpanId of the current Span (Span2), which establishes the nesting between the Spans. The SpanContext stored in the returned ctx holds the information of the newly created Span (Span2);

③ tracer.Inject(ctx, r.Header): writes the TraceId, SpanId, and other information from the current SpanContext into r.Header, so that it travels with the HTTP request to serverB, where it can be associated with the current Span;

④ span.End(): after the HTTP client request has been sent to serverB and the response received, marks the current Span as finished, sets EndTime, and hands it to the exporter to be reported to the server.
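The client side can be sketched in the same spirit. The `tracingTransport` type below is a hypothetical stand-in for the otelhttp transport and covers only steps ② and ③ (creating Span2's context and injecting it into the headers); timing and export are left out, and `nextID` is injected so the example stays deterministic.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type spanContext struct{ TraceID, SpanID string }

type ctxKey struct{}

// tracingTransport wraps another RoundTripper and injects a traceparent header.
type tracingTransport struct {
	base   http.RoundTripper
	nextID func() string // supplies the SpanID of each new client span
}

func (t tracingTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	parent, ok := r.Context().Value(ctxKey{}).(spanContext) // Span1, if present
	if !ok {
		return t.base.RoundTrip(r) // nothing to propagate
	}
	// Span2 shares Span1's TraceID and gets a SpanID of its own.
	child := spanContext{TraceID: parent.TraceID, SpanID: t.nextID()}
	// A RoundTripper must not modify the caller's request, so work on a clone.
	out := r.Clone(context.WithValue(r.Context(), ctxKey{}, child))
	out.Header.Set("traceparent", "00-"+child.TraceID+"-"+child.SpanID+"-01")
	return t.base.RoundTrip(out)
}

// demo calls an in-process serverB and returns the traceparent header it received.
func demo() (string, error) {
	seen := make(chan string, 1)
	serverB := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("traceparent")
	}))
	defer serverB.Close()

	client := http.Client{Transport: tracingTransport{
		base:   http.DefaultTransport,
		nextID: func() string { return "00f067aa0ba902b7" },
	}}
	span1 := spanContext{TraceID: "4bf92f3577b34da6a3ce929d0e6e4736", SpanID: "b7ad6b7169203331"}
	ctx := context.WithValue(context.Background(), ctxKey{}, span1)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, serverB.URL, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	fmt.Println(demo())
}
```

serverB receives Span2's SpanID, not Span1's, so the Span it creates for handling the request becomes a child of the client Span.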


Summary of call chain tracing based on the OTEL library

We have described how, with the OpenTelemetry library, the key trace information (TraceId, SpanId) is propagated between and within processes. Let's briefly summarize this tracing implementation:

[Figure: summary of call chain tracing based on the OTEL library]

As the analysis above shows, this approach is still somewhat intrusive to the code. It also places another requirement on the code: the context.Context object must be passed along between operations. For example, when we created the HTTP client request in serverA just now, we used http.NewRequestWithContext(r.ctx, ...) rather than http.NewRequest(...). Asynchronous scenarios that start new goroutines also need to take care to pass ctx along.
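The goroutine case can be illustrated with a short sketch. The trace ID is stored as a plain string under a private key here purely for illustration; `withTraceID`, `traceID`, and `runAsync` are names made up for this example.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type ctxKey struct{}

func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, ctxKey{}, id)
}

func traceID(ctx context.Context) string {
	id, _ := ctx.Value(ctxKey{}).(string)
	return id
}

// runAsync returns the trace ID seen by a goroutine that was handed ctx
// and by one that started from a fresh context.
func runAsync(ctx context.Context) (with, without string) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func(ctx context.Context) { // ctx passed explicitly: the trace continues
		defer wg.Done()
		with = traceID(ctx)
	}(ctx)
	go func() { // fresh context: the link to the trace is lost
		defer wg.Done()
		without = traceID(context.Background())
	}()
	wg.Wait()
	return with, without
}

// demo runs the comparison with a fixed trace ID.
func demo() (string, string) {
	return runAsync(withTraceID(context.Background(), "trace-1"))
}

func main() {
	with, without := demo()
	// prints: with ctx: "trace-1", without ctx: ""
	fmt.Printf("with ctx: %q, without ctx: %q\n", with, without)
}
```

Any Span started in the second goroutine would begin a new, unrelated trace, which is why the chain breaks when ctx is not passed.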


Ideas for non-intrusive call chain tracing

We have just walked through a convention-based and somewhat intrusive implementation in detail. Its intrusiveness shows mainly in this: we must explicitly add code by hand that wraps the original code with components that have tracing capability. This in turn forces the application code to explicitly reference a specific version of the OpenTelemetry instrumentation package, which hinders independent maintenance and upgrading of the observability code. Are there options for tracing call chains non-intrusively? Non-intrusive integration really just means a different way of integrating; the goal is the same. In the end, the key function calls still have to be intercepted in some way and special logic added. The point of being non-intrusive is that the code needs little or no modification.

[Figure: possible approaches to non-intrusive integration]

The figure above lists some possible approaches to non-intrusive integration. Unlike languages such as .NET and Java that have an intermediate language (IL), Go compiles directly to machine code, which makes a non-intrusive scheme relatively troublesome to implement. The specific ideas are as follows:


  1. Compile-time injection: extend the compiler to modify the AST and insert tracing code. This needs to be adapted to different compiler versions.
  2. Startup-time injection: modify the compiled machine code to insert tracing code. This needs to be adapted to different CPU architectures. Examples: monkey, gohook.
  3. Run-time injection: use the eBPF capability provided by the kernel to monitor the execution of the program's key functions and insert tracing code. This is a promising direction. Examples: tcpdump, bpftrace.

How non-intrusive tracing is implemented in Go

The core code of the Erda project is mainly written in Go. Building on the OpenTelemetry SDK described above, we implemented non-intrusive tracing by modifying machine code. As mentioned earlier, using the OpenTelemetry SDK requires some adjustments to the code. Let's see how these adjustments can be made automatically in a non-intrusive way:

[Figure: making the OpenTelemetry SDK adjustments automatically in a non-intrusive way]


We take the HTTP client as an example for a brief explanation. The signature of the hook interface provided by the gohook framework is as follows:

      
      
// target: the function to hook
// replacement: the function to replace it with
// trampoline: where the entry of the original function is copied to; it can be used to jump back from replacement to the original target

func Hook(target, replacement, trampoline interface{}) error

For http.Client, we can choose to hook the DefaultTransport.RoundTrip() method. When the method executes, we wrap the original DefaultTransport object with otelhttp.NewTransport(). Note, however, that DefaultTransport cannot be passed directly as the argument of otelhttp.NewTransport(), because its RoundTrip() method has already been replaced by us and the real original method has been written to the trampoline. So we need an intermediate layer that connects DefaultTransport with its original RoundTrip method. The specific code is as follows:


      
      
// RoundTrip is linked to the real net/http.(*Transport).RoundTrip; it is the hook target.
//go:linkname RoundTrip net/http.(*Transport).RoundTrip
//go:noinline
func RoundTrip(t *http.Transport, req *http.Request) (*http.Response, error)

// originalRoundTrip is passed to gohook as the trampoline: once the hook is
// installed, calling it runs the original RoundTrip.
//go:noinline
func originalRoundTrip(t *http.Transport, req *http.Request) (*http.Response, error) {
	return RoundTrip(t, req)
}

// wrappedTransport is the intermediate layer that connects the Transport
// with its original RoundTrip method.
type wrappedTransport struct {
	t *http.Transport
}

//go:noinline
func (t *wrappedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return originalRoundTrip(t.t, req)
}

// tracedRoundTrip is the replacement that runs instead of the hooked method.
//go:noinline
func tracedRoundTrip(t *http.Transport, req *http.Request) (*http.Response, error) {
	req = contextWithSpan(req)
	return otelhttp.NewTransport(&wrappedTransport{t: t}).RoundTrip(req)
}

// contextWithSpan attaches the current goroutine's span to the request
// when the request context does not already carry a valid one.
//go:noinline
func contextWithSpan(req *http.Request) *http.Request {
	ctx := req.Context()
	if span := trace.SpanFromContext(ctx); !span.SpanContext().IsValid() {
		pctx := injectcontext.GetContext()
		if pctx != nil {
			if span := trace.SpanFromContext(pctx); span.SpanContext().IsValid() {
				ctx = trace.ContextWithSpan(ctx, span)
				req = req.WithContext(ctx)
			}
		}
	}
	return req
}

func init() {
	gohook.Hook(RoundTrip, tracedRoundTrip, originalRoundTrip)
}

We use the init() function to add the hook automatically, so the user program only needs to import the package in its main file to achieve non-intrusive integration.

It is worth mentioning the req = contextWithSpan(req) function. Internally it checks, in turn, req.Context() and the goroutineContext map that we maintain for a SpanContext, and assigns what it finds to req. This removes the requirement that requests be written with http.NewRequestWithContext(...).

The detailed code can be found in the Erda repository: https://github.com/erda-project/erda-infra/tree/master/pkg/trace
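The article does not show how the goroutineContext map is maintained. One possible approach, given here only as an assumption, is to key a map by goroutine ID, with the ID parsed from the first line of the goroutine's stack trace. `goroutineID`, `SetContext`, `GetContext`, and `ClearContext` below are hypothetical and are not the Erda injectcontext API.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"runtime"
	"strconv"
	"sync"
)

// goroutineID parses the ID from the first line of the stack trace,
// which looks like "goroutine 18 [running]:".
func goroutineID() uint64 {
	buf := make([]byte, 64)
	buf = buf[:runtime.Stack(buf, false)]
	fields := bytes.Fields(buf)
	if len(fields) < 2 {
		return 0
	}
	id, _ := strconv.ParseUint(string(fields[1]), 10, 64)
	return id
}

var contexts sync.Map // goroutine ID -> context.Context

// SetContext remembers ctx for the calling goroutine.
func SetContext(ctx context.Context) { contexts.Store(goroutineID(), ctx) }

// GetContext returns the calling goroutine's context, or nil if none was set.
func GetContext() context.Context {
	if v, ok := contexts.Load(goroutineID()); ok {
		return v.(context.Context)
	}
	return nil
}

// ClearContext must be called when the goroutine is done, or the map leaks.
func ClearContext() { contexts.Delete(goroutineID()) }

// demo reports whether the context is visible in the goroutine that set it
// and in a different goroutine.
func demo() (sameGoroutine, otherGoroutine bool) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		SetContext(context.Background())
		defer ClearContext()
		sameGoroutine = GetContext() != nil
		inner := make(chan struct{})
		go func() {
			defer close(inner)
			otherGoroutine = GetContext() != nil
		}()
		<-inner
	}()
	<-done
	return sameGoroutine, otherGoroutine
}

func main() {
	fmt.Println(demo()) // prints true false
}
```

A lookup like this lets a hooked function find the current trace without a ctx parameter, at the cost of a stack capture on every call. As the demo shows, it still does not follow work handed to a new goroutine on its own.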


Copyright notice
This article was created by [51CTO]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/175/202206241833131813.html