RPC internals you must know (worth bookmarking)!!!

2022-06-23 09:46:00 【58 Shen Jian】

We have talked a lot about layered microservice architecture before. Microservices are inseparable from an RPC framework, so today let's talk about the principles, practice and details of RPC frameworks.
The article is fairly long, so it is recommended to bookmark it in advance.

What are the benefits of servitization?

One benefit of servitization is that it does not constrain which technology a service provider uses, so a large company can decouple technology choices across teams, as shown in the figure below:

(1) Service A: maintained by the European team, whose technical background is Java;

(2) Service B: maintained by the American team and implemented in C++;

(3) Service C: maintained by the Chinese team, whose technology stack is Go;

An upstream caller can invoke a remote service as long as it follows the interface and the protocol.

In practice, however, most Internet companies have a limited R&D team, and most of them build their services on a single technology stack:

In that case, without a unified service framework, the service providers of every team each have to implement their own serialization, deserialization, network framework, connection pool, send/receive threads, timeout handling, state machine and other repetitive technical work "outside the business", which makes the whole organization inefficient.

Therefore, the first problem servitization has to solve is to move this "outside the business" work into a unified service framework.

What is RPC?

Remote Procedure Call Protocol, that is, remote procedure call.

What does "remote" mean, and why "far"?

Let's first look at what "near" means, namely a local function call.

When we write:

int result = Add(1, 2);

what happens in this line of code?

(1) Two input parameters are passed in;

(2) The function in the local code segment is called and its logic is executed;

(3) An output value is returned;

All three actions happen in the same process space. This is a local function call.

Is there a way to call a function in another process?

Typically, that process is deployed on another server.

The most obvious approach is for the two processes to agree on a protocol format and communicate over a socket, transmitting:

(1) The input parameters;

(2) Which function to call;

(3) The output parameters;

If this can be done, it is a "remote" procedure call.

Socket communication can only carry a continuous byte stream. How do we put the input parameters and the function name into a continuous byte stream?

Suppose we design an 11-byte request message:

(1) The first 3 bytes hold the function name "add";

(2) The middle 4 bytes hold the first parameter "1";

(3) The last 4 bytes hold the second parameter "2";

Similarly, we can design a 4-byte response message:

(1) The 4 bytes hold the processing result "3";
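
The message layout just described can be sketched in C++ roughly as follows. The helper names (MakeRequest, MakeResponse, ReadResponse) and the big-endian byte order are assumptions made for illustration; the article only fixes the field sizes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Layout from the text: 3-byte function name + two 4-byte integers.
using RequestPacket = std::array<unsigned char, 11>;
using ResponsePacket = std::array<unsigned char, 4>;

// Write a 32-bit integer in big-endian order (byte order is an assumption).
inline void PutInt32(unsigned char* dst, int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    dst[0] = static_cast<unsigned char>(u >> 24);
    dst[1] = static_cast<unsigned char>(u >> 16);
    dst[2] = static_cast<unsigned char>(u >> 8);
    dst[3] = static_cast<unsigned char>(u);
}

// Read a big-endian 32-bit integer.
inline int32_t GetInt32(const unsigned char* src) {
    uint32_t u = (static_cast<uint32_t>(src[0]) << 24) |
                 (static_cast<uint32_t>(src[1]) << 16) |
                 (static_cast<uint32_t>(src[2]) << 8) |
                 static_cast<uint32_t>(src[3]);
    return static_cast<int32_t>(u);
}

// Build the 11-byte request: name (at most 3 bytes), then a, then b.
inline RequestPacket MakeRequest(const std::string& name, int32_t a, int32_t b) {
    RequestPacket p{};
    std::size_t n = name.size() < 3 ? name.size() : 3;
    std::memcpy(p.data(), name.data(), n);
    PutInt32(p.data() + 3, a);
    PutInt32(p.data() + 7, b);
    return p;
}

// Build the 4-byte response that carries only the result.
inline ResponsePacket MakeResponse(int32_t result) {
    ResponsePacket p{};
    PutInt32(p.data(), result);
    return p;
}

// Extract the result from a response packet.
inline int32_t ReadResponse(const ResponsePacket& p) {
    return GetInt32(p.data());
}
```

With this sketch, MakeRequest("add", 1, 2) yields exactly 11 bytes and ReadResponse(MakeResponse(3)) gives back 3. A fixed 3-byte name field is obviously too rigid for real use; it only mirrors the toy format in the text.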

The caller's code might then become:

request = MakePacket("add", 1, 2);
SendRequest_ToService_B(request);
response = ReceiveResponse_FromService_B();
int result = unMakePacket(response);

These 4 steps are:

(1) Turn the input parameters into a byte stream;

(2) Send the byte stream to service B;

(3) Receive the returned byte stream from service B;

(4) Turn the returned byte stream into the output parameter;

The server's code might become:

request = ReceiveRequest();
args/function = unMakePacket(request);
result = Add(args);   // here args are (1, 2)
response = MakePacket(result);
SendResponse(response);

These 5 steps are easy to understand:

(1) The server receives the byte stream;

(2) Turn the byte stream into the function name and parameters;

(3) Call the function locally to get the result;

(4) Turn the result into a byte stream;

(5) Send the byte stream back to the caller;
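
Steps (2) to (4) on the server side can be sketched as a single function that maps request bytes to response bytes. This is only an illustration: the dispatch table, the names HandleRequest, ReadInt32 and AppendInt32, and the big-endian encoding are assumptions, and the socket receive/send of steps (1) and (5) is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;
using Handler = std::function<int32_t(int32_t, int32_t)>;

// Read a big-endian 32-bit integer starting at offset off.
int32_t ReadInt32(const Bytes& b, std::size_t off) {
    uint32_t u = (static_cast<uint32_t>(b[off]) << 24) |
                 (static_cast<uint32_t>(b[off + 1]) << 16) |
                 (static_cast<uint32_t>(b[off + 2]) << 8) |
                 static_cast<uint32_t>(b[off + 3]);
    return static_cast<int32_t>(u);
}

// Append a 32-bit integer in big-endian order.
void AppendInt32(Bytes& b, int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    for (int shift = 24; shift >= 0; shift -= 8) {
        b.push_back(static_cast<uint8_t>(u >> shift));
    }
}

// The local function that actually does the work.
int32_t Add(int32_t a, int32_t b) { return a + b; }

// Deserialize the request, call the local function, serialize the result.
Bytes HandleRequest(const Bytes& request) {
    static const std::map<std::string, Handler> handlers = {{"add", Add}};
    if (request.size() != 11) {
        throw std::invalid_argument("request must be 11 bytes");
    }
    std::string name(request.begin(), request.begin() + 3);
    auto it = handlers.find(name);
    if (it == handlers.end()) {
        throw std::invalid_argument("unknown function: " + name);
    }
    int32_t result = it->second(ReadInt32(request, 3), ReadInt32(request, 7));
    Bytes response;
    AppendInt32(response, result);
    return response;
}
```

Keeping the handler free of any socket code makes it easy to exercise without a network: feeding it the 11 bytes for add(1, 2) should produce a 4-byte response that decodes to 3.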

This process is illustrated in the following figure:

The processing steps on the caller and on the server are very clear.

What is the biggest problem with this process?

The caller has too much to do and must deal with many low-level details on every call:

(1) Converting input parameters to a byte stream, that is, the serialization details of the application-layer protocol;

(2) Sending over the socket, that is, the details of the network transport protocol;

(3) Receiving from the socket;

(4) Converting the byte stream to output parameters, that is, the deserialization details of the application-layer protocol;

Can the calling layer ignore these details?

Yes. This is exactly the problem an RPC framework solves: it lets the caller "call a remote function (service) as if it were a local function".
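
One way to picture "calling a remote function like a local one" is a client stub that hides the four steps behind an ordinary method. The sketch below is an assumption-laden illustration, not any particular framework's API: AddClient, the Transport callback type and the byte layout are all hypothetical, and the transport is injected so that a real socket or a fake can be plugged in.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;
// Hypothetical transport: sends a request and blocks until the response arrives.
using Transport = std::function<Bytes(const Bytes&)>;

// Append a 32-bit integer in big-endian order.
void AppendInt32(Bytes& b, int32_t v) {
    uint32_t u = static_cast<uint32_t>(v);
    for (int shift = 24; shift >= 0; shift -= 8) {
        b.push_back(static_cast<uint8_t>(u >> shift));
    }
}

// Read a big-endian 32-bit integer starting at offset off.
int32_t ReadInt32(const Bytes& b, std::size_t off) {
    uint32_t u = (static_cast<uint32_t>(b[off]) << 24) |
                 (static_cast<uint32_t>(b[off + 1]) << 16) |
                 (static_cast<uint32_t>(b[off + 2]) << 8) |
                 static_cast<uint32_t>(b[off + 3]);
    return static_cast<int32_t>(u);
}

// Client stub: the caller sees a plain Add(a, b) method.
class AddClient {
public:
    explicit AddClient(Transport transport) : transport_(std::move(transport)) {}

    int32_t Add(int32_t a, int32_t b) const {
        Bytes request = {'a', 'd', 'd'};     // (1) serialize the call
        AppendInt32(request, a);
        AppendInt32(request, b);
        Bytes response = transport_(request); // (2)+(3) send and receive
        if (response.size() != 4) {
            throw std::runtime_error("malformed response");
        }
        return ReadInt32(response, 0);        // (4) deserialize the result
    }

private:
    Transport transport_;
};
```

The business code then only writes client.Add(1, 2); everything about bytes and sockets stays inside the stub and the transport.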

At this point, does it feel as if RPC is little more than serialization? Read on, there are more low-level details.

What are the responsibilities of an RPC framework?

An RPC framework has to shield the caller from all kinds of complexity, and it has to shield the service provider from them as well:

(1) The service caller (client) invokes a service as if it were calling a local function;

(2) The service provider (server) implements a service as if it were implementing a local function;

So the whole RPC framework is divided into a client part and a server part. Achieving the goals above and shielding the complexity is the responsibility of the RPC framework.

As shown in the figure above, the responsibilities of the business side are:

(1) Caller A passes in the parameters, makes the call and gets the result;

(2) Service provider B receives the parameters, executes the logic and returns the result;

The responsibilities of the RPC framework are the large blue box in the middle:

(1) Client side: serialization, deserialization, connection pool management, load balancing, failover, queue management, timeout management, asynchronous management and so on;

(2) Server side: server components, server send/receive queues, IO threads, worker threads, serialization and deserialization, and so on;

Server-side techniques are already familiar to most readers, so next let's focus on the technical details of the client side.

First, let's look at the "serialization and deserialization" part of the RPC-client.

Why serialization?

Engineers usually manipulate data as "objects":

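#include <cstdint>
#include <string>
#include <utility>
class User {
public:
    explicit User(std::string name) : user_name(std::move(name)) {}
    void setUid(uint64_t id) { user_id = id; }
    void setAge(uint32_t age) { user_age = age; }
private:
    std::string user_name;
    uint64_t user_id = 0;
    uint32_t user_age = 0;
};
User u("shenjian");
u.setUid(123);
u.setAge(35);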

But when data has to be stored or transmitted, "objects" are not so convenient, and the data often has to be converted into a "binary byte stream" in contiguous space. Some typical scenarios are:

(1) Disk storage of database indexes: a database index is a B+ tree in memory, but that format cannot be stored directly on disk, so the B+ tree has to be converted into a binary byte stream in contiguous space before it can be stored on disk;

(2) KV storage in caches: redis/memcache are KV caches, and the value they store must be a binary byte stream in contiguous space, not a User object;

(3) Network transmission of data: the data sent over a socket must be a binary byte stream in contiguous space, not an object;

So-called serialization is the process of converting data in "object" form into data in "contiguous binary byte stream" form. The reverse process is called deserialization.

How to serialize?

This is a very detailed question. If you were asked to convert an "object" into a byte stream, what would you do? One approach that comes to mind easily is a self-describing markup language such as xml (or json):

<class name="User">
<element name="user_name" type="std::string" value="shenjian" />
<element name="user_id" type="uint64_t" value="123" />
<element name="user_age" type="uint32_t" value="35" />
</class>

Once the conversion rules are fixed, the sender can easily serialize an object of class User into xml, and after the service receives the xml byte stream it can just as easily deserialize it back into a User object.

Aside: when the language supports reflection, this job is easy.

The second approach is to implement a custom binary protocol for serialization. Taking the User object above as an example again, a general-purpose protocol could be designed like this:

(1) The first 4 bytes are the sequence number;

(2) The 4 bytes after the sequence number give the length m of the key;

(3) The next m bytes are the key;

(4) The next 4 bytes give the length n of the value;

(5) The next n bytes are the value;

(6) Continue recursively, as with xml, until the whole object is described;

Under this protocol, the User object above might be described as follows:

(1) First row: sequence number, 4 bytes (let 0 denote the class name); class-name length, 4 bytes (the length is 4); the next 4 bytes are the class name ("User"); 12 bytes in total;

(2) Second row: sequence number, 4 bytes (1 denotes the first attribute); attribute-name length, 4 bytes (the length is 9); the next 9 bytes are the attribute name ("user_name"); attribute-value length, 4 bytes (the length is 8); attribute value, 8 bytes (the value is "shenjian"); 29 bytes in total;

(3) Third row: sequence number, 4 bytes (2 denotes the second attribute); attribute-name length, 4 bytes (the length is 7); the next 7 bytes are the attribute name ("user_id"); attribute-value length, 4 bytes (the length is 8); attribute value, 8 bytes (the value is 123); 27 bytes in total;

(4) Fourth row: sequence number, 4 bytes (3 denotes the third attribute); attribute-name length, 4 bytes (the length is 8); the next 8 bytes are the attribute name ("user_age"); attribute-value length, 4 bytes (the length is 4); attribute value, 4 bytes (the value is 35); 24 bytes in total;

The whole binary byte stream is 12+29+27+24=92 bytes.
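
The row-by-row arithmetic above can be checked with a small sketch that writes the User object in this key-length-value layout. The function names (AppendU32, AppendU64, AppendString, Serialize) and the big-endian integer encoding are assumptions for illustration; only the field order and sizes come from the text.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

struct User {
    std::string user_name;
    uint64_t user_id;
    uint32_t user_age;
};

// Append a 4-byte big-endian integer.
void AppendU32(Bytes& out, uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<uint8_t>(v >> shift));
    }
}

// Append an 8-byte big-endian integer.
void AppendU64(Bytes& out, uint64_t v) {
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<uint8_t>(v >> shift));
    }
}

// Append a 4-byte length followed by the raw characters.
void AppendString(Bytes& out, const std::string& s) {
    AppendU32(out, static_cast<uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

Bytes Serialize(const User& u) {
    Bytes out;
    // Row 0: sequence number 0, class name. 4 + 4 + 4 = 12 bytes.
    AppendU32(out, 0);
    AppendString(out, "User");
    // Row 1: user_name. 4 + (4 + 9) + (4 + 8) = 29 bytes for "shenjian".
    AppendU32(out, 1);
    AppendString(out, "user_name");
    AppendString(out, u.user_name);
    // Row 2: user_id. 4 + (4 + 7) + 4 + 8 = 27 bytes.
    AppendU32(out, 2);
    AppendString(out, "user_id");
    AppendU32(out, 8);
    AppendU64(out, u.user_id);
    // Row 3: user_age. 4 + (4 + 8) + 4 + 4 = 24 bytes.
    AppendU32(out, 3);
    AppendString(out, "user_age");
    AppendU32(out, 4);
    AppendU32(out, u.user_age);
    return out;
}
```

For User{"shenjian", 123, 35} the resulting buffer has 92 bytes, matching the total in the text. Note that this sketch does not encode attribute types, which a real protocol would also have to carry.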

A real serialization protocol has many more details to consider. For example, a strongly typed language has to restore not only attribute names and attribute values but also attribute types, and complex objects involve not only primitive types but also nested object types. In any case, the idea behind serialization is the same.

What factors should a serialization protocol take into account?

Whether you use a mature protocol such as xml/json or a custom binary protocol to serialize objects, the following factors need to be considered when designing the serialization protocol.

(1) Parsing efficiency: this should be the primary consideration. Parsing xml/json is time-consuming because a DOM tree has to be built, whereas parsing a custom binary protocol is very efficient;

(2) Compression ratio and transmission efficiency: for the same object, xml/json carries many tags, so the share of useful information is low, while a custom binary protocol takes up much less space;

(3) Extensibility and compatibility: whether fields can be added conveniently, and whether old clients must be force-upgraded after fields are added, are both questions to consider. xml/json and the binary protocol above can both be extended easily;

(4) Readability and debuggability: this is easy to understand; xml/json is far more readable than a binary protocol;

(5) Cross-language support: both protocols above are cross-language, but some serialization protocols are tightly bound to a development language; for example, dubbo's serialization protocol can only support RPC calls from Java;

(6) Generality: xml/json is very general, with good third-party parsing libraries, and is easy to parse in every language. The custom binary protocol above can be cross-language, but a simple protocol client has to be written for each language;

What are the common serialization methods?

(1) xml/json: poor parsing efficiency and compression ratio; good extensibility, readability and generality;

(2) thrift;

(3) protobuf: made by Google and excellent in every respect, strongly recommended. It is a binary protocol, so readability is somewhat poor, but it has a to-string style text format that helps with debugging;

(4) Avro;

(5) CORBA;

(6) mc_pack: those who know it will know; those who don't, won't. It was in use in 2009 and was rumored to outperform protobuf in every respect; readers who know its current status are welcome to share;

(7) …

In addition to:

(1) the serialization and deserialization parts (1 and 4 in the figure above)

the RPC-client also contains:

(2) sending and receiving the byte stream (2 and 3 in the figure above)

This part can be divided into synchronous calls and asynchronous calls, which are discussed in turn below.

Aside: it turns out that building an RPC-client is not easy.

The code fragment of a synchronous call is:

Result = Add(Obj1, Obj2); // blocks until Result is obtained

The code fragment of an asynchronous call is:

Add(Obj1, Obj2, callback); // returns immediately without waiting for the result

The result is handled in the callback:

callback(Result){ // called once the processing result is available
…
}

In the RPC-client, these two kinds of calls are implemented in completely different ways.

What does the synchronous-call architecture of the RPC-client look like?

With a so-called synchronous call, the caller blocks until the result is obtained and occupies a worker thread the whole time. The figure above briefly illustrates the components, interactions and process steps:

The large box on the left represents one worker thread of the caller
The pink middle box on the left represents the RPC-client component
The orange box on the right represents the RPC-server
The two small blue boxes represent the two core components of the synchronous RPC-client: the serialization component and the connection pool component
The white flow boxes and the arrow numbers 1-10 represent the serial execution steps of the whole worker thread:

1) The business code initiates an RPC call:

Result=Add(Obj1,Obj2)

2) The serialization component serializes the object call into a binary byte stream, which can be understood as a packet to be sent, packet1;

3) An available connection is obtained through the connection pool component;

4) packet1 is sent to the RPC-server over the connection;

5) The request packet travels over the network to the RPC-server;

6) The response packet travels over the network back to the RPC-client;

7) The response packet packet2 is received from the RPC-server over the connection;

8) The connection is put back into the connection pool through the connection pool component;

9) The serialization component deserializes packet2 into a Result object and returns it to the caller;

10) The business code gets the Result, and the worker thread continues;

Aside: please read steps 1-10 alongside the architecture diagram.

What does the connection pool component do?

The load balancing, failover, send timeout and other features supported by the RPC framework are all implemented through the connection pool component.

A typical connection pool component provides the following interface:

int ConnectionPool::init(…);
Connection ConnectionPool::getConnection();
int ConnectionPool::putConnection(Connection t);

What does init do?

It establishes N long-lived TCP connections to the downstream RPC-server (usually a cluster), which form the so-called connection "pool".

What does getConnection do?

It takes one connection from the "pool", locks it (sets a flag) and returns it to the caller.

What does putConnection do?

It puts an allocated connection back into the "pool" and unlocks it (again by setting a flag).
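
A minimal in-memory sketch of such a pool is shown below. It keeps the three method names from the interface above but invents everything else: the init parameters, the Connection fields and the "id of -1 means no free connection" convention are assumptions, and no real TCP connection is opened.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical connection handle; a real one would wrap a socket.
struct Connection {
    int id;              // index inside the pool, -1 means "none available"
    std::string server;  // address of the RPC-server this connection targets
};

class ConnectionPool {
public:
    // Create per_server slots for each server. A real pool would open
    // long-lived TCP connections here. Returns the number of slots.
    int init(const std::vector<std::string>& servers, int per_server) {
        std::lock_guard<std::mutex> lock(mu_);
        for (const auto& server : servers) {
            for (int i = 0; i < per_server; ++i) {
                int id = static_cast<int>(slots_.size());
                slots_.push_back(Slot{Connection{id, server}, false});
            }
        }
        return static_cast<int>(slots_.size());
    }

    // Take a free connection and mark it as in use (the "lock" flag).
    Connection getConnection() {
        std::lock_guard<std::mutex> lock(mu_);
        for (auto& slot : slots_) {
            if (!slot.in_use) {
                slot.in_use = true;
                return slot.conn;
            }
        }
        return Connection{-1, ""};
    }

    // Return a connection to the pool and clear the flag.
    // Returns 0 on success, -1 if the connection was not checked out.
    int putConnection(const Connection& c) {
        std::lock_guard<std::mutex> lock(mu_);
        if (c.id < 0 || static_cast<std::size_t>(c.id) >= slots_.size() ||
            !slots_[static_cast<std::size_t>(c.id)].in_use) {
            return -1;
        }
        slots_[static_cast<std::size_t>(c.id)].in_use = false;
        return 0;
    }

private:
    struct Slot {
        Connection conn;
        bool in_use;
    };
    std::mutex mu_;
    std::vector<Slot> slots_;
};
```

The mutex is there because several worker threads share one pool. This sketch always hands out the first free slot; the random selection discussed next is deliberately left out.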

How is load balancing implemented?

The connection pool holds connections to a whole RPC-server cluster, so it needs to pick randomly when it hands out a connection.

How is failover implemented?

The connection pool holds connections to a whole RPC-server cluster. When the pool finds that the connections to some machine are abnormal, it has to exclude that machine's connections and hand out only healthy ones; after the machine recovers, its connections are added back.
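
Random selection and failover can be sketched together as "pick uniformly among the servers currently marked healthy". The class below is a hypothetical helper (ServerSelector, markDown, markUp and pick are names invented here); how a machine is detected as abnormal or recovered is outside the sketch.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

class ServerSelector {
public:
    explicit ServerSelector(std::vector<std::string> servers,
                            unsigned seed = std::random_device{}())
        : servers_(std::move(servers)),
          healthy_(servers_.size(), true),
          rng_(seed) {}

    // Failover: exclude a machine whose connections look abnormal.
    void markDown(std::size_t index) { healthy_.at(index) = false; }

    // Recovery: add the machine back once it is healthy again.
    void markUp(std::size_t index) { healthy_.at(index) = true; }

    // Load balancing: return the index of a random healthy server,
    // or -1 when no server is available.
    int pick() {
        std::vector<int> candidates;
        for (std::size_t i = 0; i < servers_.size(); ++i) {
            if (healthy_[i]) candidates.push_back(static_cast<int>(i));
        }
        if (candidates.empty()) return -1;
        std::uniform_int_distribution<std::size_t> dist(0, candidates.size() - 1);
        return candidates[dist(rng_)];
    }

    const std::string& address(std::size_t index) const { return servers_.at(index); }

private:
    std::vector<std::string> servers_;
    std::vector<bool> healthy_;
    std::mt19937 rng_;
};
```

Uniform random choice is the simplest policy that matches the text; weighted or least-connections policies would fit behind the same pick() call.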

How is the send timeout implemented?

Because the call is synchronous and blocking, once a connection has been obtained, using send/recv with a timeout gives sending and receiving with a timeout.

In general, a synchronous RPC-client is relatively easy to implement: a serialization component and a connection pool component, combined with multiple worker threads, are enough.

What does the asynchronous-callback architecture of the RPC-client look like?

With a so-called asynchronous callback, the caller is not blocked before the result is obtained. In theory no thread is blocked at any time, so the asynchronous callback model needs, in theory, only a few worker threads and a few service connections to achieve high throughput, as shown in the figure above:

The box on the left is a small number of worker threads (just a few) that make the calls and run the callbacks
The pink box in the middle represents the RPC-client component
The orange box on the right represents the RPC-server
The six small blue boxes are the six core components of the asynchronous RPC-client: context manager, timeout manager, serialization component, downstream send/receive queues, downstream send/receive threads, and connection pool component
The white flow boxes and the arrow numbers 1-17 represent the serial execution steps of the whole worker thread:

1) The business code initiates an asynchronous RPC call;

Add(Obj1,Obj2, callback)

2) The context manager stores the request, the callback and the context;

3) The serialization component serializes the object call into a binary byte stream, which can be understood as a packet to be sent, packet1;

4) The downstream send/receive queue: the packet is put into the "to-be-sent queue". At this point the call returns and the worker thread is not blocked;

5) The downstream send/receive thread takes the packet out of the "to-be-sent queue" and obtains an available connection through the connection pool component;

6) packet1 is sent to the RPC-server over the connection;

7) The request packet travels over the network to the RPC-server;

8) The response packet travels over the network back to the RPC-client;

9) The response packet packet2 is received from the RPC-server over the connection;

10) The downstream send/receive thread puts the packet into the "received queue" and puts the connection back into the connection pool through the connection pool component;

11) The packet is taken out of the downstream send/receive queue and the callback begins; the worker thread is not blocked;

12) The serialization component deserializes packet2 into a Result object;

13) The context manager retrieves the result, the callback and the context;

14) The business code is called back through callback with the Result, and the worker thread continues;

If the request does not return for a long time, the process is:

15) The context manager finds that the request has not returned for a long time;

16) The timeout manager gets the timed-out context;

17) The business code is called back through timeout_cb, and the worker thread continues;

Aside: please go through this process a few times alongside the architecture diagram.
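
The "to-be-sent queue" and "received queue" in steps 4, 5, 10 and 11 are essentially thread-safe producer/consumer queues. A minimal sketch, assuming a mutex plus condition variable design (the class name BlockingQueue and its methods are invented here; real frameworks may use lock-free or bounded queues instead):

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class BlockingQueue {
public:
    // Called by the producer, e.g. a worker thread enqueuing packet1.
    // Returns as soon as the item is queued, so the producer is not blocked.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }

    // Called by the consumer, e.g. the downstream send/receive thread.
    // Blocks only the consumer while the queue is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(mu_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return items_.size();
    }

private:
    mutable std::mutex mu_;
    std::condition_variable ready_;
    std::deque<T> items_;
};
```

In the asynchronous architecture, one such queue would sit between the worker threads and the send/receive threads in each direction, which is what lets the call in step 4 return immediately.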

The serialization component and the connection pool component have been described above, and the send/receive queues and send/receive threads are easy to understand. The following focuses on two general-purpose components: the context manager and the timeout manager.

Why is a context manager needed?

Because sending the request packet and running the callback for the response packet are asynchronous, and may not even happen in the same worker thread, a component is needed to record the context of each request and to match up the request, the response and the callback.

How are the request, the response and the callback matched up?

This is an interesting question. Suppose three request packets a, b and c are sent over one connection to the downstream service, and three response packets x, y and z are received asynchronously:

How do we know which request packet corresponds to which response packet?

How do we know which response packet corresponds to which callback function?

A "request id" can be used to tie the request, the response and the callback together.

The whole process is as follows; through the request id, the context manager maintains the mapping between request, response and callback:

1) Generate a request id;

2) Generate the request context, which contains the send time, the callback function and other information;

3) The context manager records the mapping between req-id and context;

4) Put the req-id into the request packet and send it to the RPC-server;

5) The RPC-server puts the req-id into the response packet and returns it;

6) Using the req-id in the response packet, look up the original context in the context manager;

7) Get the callback function from the context;

8) The callback carries the Result back and drives the business logic forward;
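
The client-side part of these steps (1 to 3 and 6 to 8) can be sketched as a map from request id to context. This is only an illustration: ContextManager, add and complete are names invented here, the result type is simplified to int32_t, and putting the id into the packets (steps 4 and 5) is not shown.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>

using Clock = std::chrono::steady_clock;
using Callback = std::function<void(int32_t)>;

// What is remembered for each in-flight request.
struct Context {
    Clock::time_point send_time;
    Callback callback;
};

class ContextManager {
public:
    // Steps 1-3: generate a request id, build the context, record the mapping.
    uint64_t add(Callback callback) {
        std::lock_guard<std::mutex> lock(mu_);
        uint64_t req_id = next_id_++;
        contexts_[req_id] = Context{Clock::now(), std::move(callback)};
        return req_id;
    }

    // Steps 6-8: find the context by the req-id carried in the response,
    // take out the callback and invoke it with the result.
    // Returns false if no context exists for this id.
    bool complete(uint64_t req_id, int32_t result) {
        Callback callback;
        {
            std::lock_guard<std::mutex> lock(mu_);
            auto it = contexts_.find(req_id);
            if (it == contexts_.end()) return false;
            callback = std::move(it->second.callback);
            contexts_.erase(it);
        }
        if (callback) callback(result);  // run outside the lock
        return true;
    }

private:
    std::mutex mu_;
    uint64_t next_id_ = 1;
    std::unordered_map<uint64_t, Context> contexts_;
};
```

Because lookup is by id, responses may arrive in any order: completing request b before request a still routes each result to its own callback.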

How are load balancing and failover implemented?

The idea is similar to the synchronous connection pool. The differences are:

(1) A synchronous connection pool sends and receives in blocking mode, so it needs to establish multiple connections to each IP of a service;

(2) With asynchronous sending and receiving, only a few connections (for example, one TCP connection) are needed for each IP of a service;

How are send/receive timeouts implemented?

Timeouts are implemented differently from synchronous blocking send/receive:

(1) For synchronous blocking calls, the timeout can be implemented directly with send/recv with a timeout;

(2) With asynchronous non-blocking nio network send/receive, the connection does not wait for the response packet, so the timeout is implemented by the timeout manager;

How does the timeout manager implement timeout management?

The timeout manager implements the callback handling for requests whose response packet times out.

For every request sent to the downstream RPC-server, the context manager stores the req-id together with its context. The context holds a lot of information about the request, for example the req-id, the response callback, the timeout callback and the send time.

The timeout manager starts a timer that scans the contexts in the context manager and checks whether any request was sent too long ago. If so, it stops waiting for the response packet, invokes the timeout callback directly to drive the business process forward, and deletes the context.

If the normal response packet arrives after the timeout callback has been executed, its req-id will not be found in the context manager, and the response is simply dropped.

Aside: because the request has already timed out, its context cannot be recovered.
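
The scan-and-expire behaviour, including the dropped late response, can be sketched as below. To stay short the sketch folds the context table and the timeout scan into one single-threaded class and takes the current time as a parameter instead of running a real timer; PendingTable, sweep and onResponse are names invented for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Context of one in-flight request.
struct PendingRequest {
    Clock::time_point send_time;
    std::function<void(int32_t)> callback;  // normal response callback
    std::function<void()> timeout_cb;       // timeout callback
};

class PendingTable {
public:
    explicit PendingTable(std::chrono::milliseconds timeout) : timeout_(timeout) {}

    void add(uint64_t req_id, PendingRequest request) {
        pending_[req_id] = std::move(request);
    }

    // Called periodically by a timer: expire every request sent at least
    // timeout_ ago, delete its context, then run its timeout callback.
    // Returns the number of requests that timed out.
    std::size_t sweep(Clock::time_point now) {
        std::vector<std::function<void()>> expired;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (now - it->second.send_time >= timeout_) {
                expired.push_back(std::move(it->second.timeout_cb));
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
        for (auto& cb : expired) {
            if (cb) cb();
        }
        return expired.size();
    }

    // Called when a response arrives. A response whose context has already
    // been removed by sweep() is dropped and false is returned.
    bool onResponse(uint64_t req_id, int32_t result) {
        auto it = pending_.find(req_id);
        if (it == pending_.end()) return false;
        auto callback = std::move(it->second.callback);
        pending_.erase(it);
        if (callback) callback(result);
        return true;
    }

private:
    std::chrono::milliseconds timeout_;
    std::map<uint64_t, PendingRequest> pending_;
};
```

A full scan costs time proportional to the number of in-flight requests; that is fine for a sketch, while a production client would more likely keep requests ordered by deadline or use a timing wheel. A multi-threaded version would also need locking around the table.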

In any case, compared with synchronous calls, asynchronous callbacks need, in addition to the serialization component and the connection pool component, a context manager, a timeout manager, downstream send/receive queues, downstream send/receive threads and other components, and they also affect the calling habits of the caller.

Aside: the programming habit changes from synchronous calls to callbacks.

Asynchronous callbacks can improve the overall throughput of the system. Which way to implement the RPC-client can be chosen according to the business scenario.

Summary

What is an RPC call?

Calling a remote service just like calling a local function.

Why is an RPC framework needed?

An RPC framework shields the serialization, network transmission and other technical details of an RPC call, so that the caller focuses only on making the call and the service provider focuses only on implementing it.

What is serialization? Why is serialization needed?

The process of converting an object into a contiguous binary stream is called serialization. Disk storage, cache storage and network transmission can only operate on binary streams, so serialization is required.

What are the core components of a synchronous RPC-client?

The core components of a synchronous RPC-client are the serialization component and the connection pool component. It achieves load balancing and failover through the connection pool, and implements timeout handling through blocking send/receive.

What are the core components of an asynchronous RPC-client?

The core components of an asynchronous RPC-client are the serialization component, the connection pool component, the send/receive queues, the send/receive threads, the context manager and the timeout manager. It uses a "request id" to associate the request packet, the response packet and the callback function, uses the context manager to manage contexts, and uses a timer in the timeout manager to trigger the timeout callback and move the business process forward when a timeout occurs.

Ideas are more important than conclusions.

Architect's Way - sharing technical ideas

Survey:

Which RPC framework's source code have you read?

Original site

Copyright notice
This article was created by [58 Shen Jian]. Please include the link to the original when reposting. Thanks.
https://yzsam.com/2022/174/202206230933483340.html