RPC internals you must know (worth bookmarking)

2022-06-23 09:46:00 58 Shen Jian

We have talked a lot about layered microservice architecture before. Microservices are inseparable from RPC frameworks, so today let's talk about the principles, practice, and details of RPC frameworks.

This article is fairly long, about 10,000 characters, so you may want to bookmark it first.

What are the benefits of service-orientation?

One benefit of service-orientation is that it does not restrict which technology a service provider uses, so a large company can decouple technology across teams, as shown in the figure below:

[Figure: services A, B, and C maintained by different teams with different technology stacks]

(1) Service A: maintained by the European team, whose technical background is Java;

(2) Service B: maintained by the American team and implemented in C++;

(3) Service C: maintained by the Chinese team, whose technology stack is Go;

An upstream caller can call a remote service as long as it follows the interface and the protocol.

In practice, however, most Internet companies have limited R&D teams, and most of them implement their services on a single technology stack:

[Figure: services implemented on one shared technology stack]

In this case, without a unified service framework, each team's service provider has to implement its own serialization, deserialization, network framework, connection pool, send/receive threads, timeout handling, state machine, and other repetitive technical work that lies "outside the business", which makes the whole organization inefficient.

Therefore, having a unified service framework take over this "outside the business" work is the first problem service-orientation has to solve.

What is RPC?

RPC stands for Remote Procedure Call Protocol, that is, remote procedure call.

What does "remote" mean, and why is it "far"?

Let's first look at what "near" means, namely a local function call.

When we write:

int result = Add(1, 2);

what happens in this line of code?

[Figure: local function call]

(1) Two input parameters are passed in;

(2) The function in the local code segment is called and its logic is executed;

(3) One output parameter is returned;

These three actions all happen in the same process space. This is a local function call.

Is there a way to call a function in another process?

Typically, that process is deployed on another server.

[Figure: a caller invoking a function in a process on another server]

The most obvious approach is for the two processes to agree on a protocol format and use socket communication to transmit:

(1) The input parameters;

(2) Which function to call;

(3) The output parameters;

If this can be done, it is a "remote" procedure call.

Socket communication can only pass a continuous byte stream. How do we put the input parameters and the function into a continuous byte stream?

Suppose we design an 11-byte request message:

[Figure: layout of the 11-byte request message]

(1) The first 3 bytes hold the function name "add";

(2) The middle 4 bytes hold the first parameter "1";

(3) The last 4 bytes hold the second parameter "2";

Similarly, we can design a 4-byte response message:

[Figure: layout of the 4-byte response message]

(1) The 4 bytes hold the processing result "3";
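The two message layouts above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: the names `MakeRequest`, `ParseRequest`, `MakeResponse`, and `ParseResponse` are hypothetical, and big-endian byte order is an assumption, since the article does not specify one.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Fixed layouts from the text: 3-byte function name + two 4-byte integers,
// and a 4-byte result. All names below are hypothetical.
using RequestPacket = std::array<unsigned char, 11>;
using ResponsePacket = std::array<unsigned char, 4>;

struct Request {
    std::string function;
    std::int32_t a;
    std::int32_t b;
};

// Write a 32-bit value in big-endian order (the byte order is an assumption).
void PutU32(unsigned char* out, std::uint32_t v) {
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}

// Read a 32-bit big-endian value.
std::uint32_t GetU32(const unsigned char* in) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
           static_cast<std::uint32_t>(in[3]);
}

// Build the 11-byte request: bytes 0-2 name, 3-6 first arg, 7-10 second arg.
RequestPacket MakeRequest(const std::string& function, std::int32_t a, std::int32_t b) {
    RequestPacket p{};  // zero-filled
    std::size_t n = std::min<std::size_t>(function.size(), 3);
    std::copy(function.begin(), function.begin() + n, p.begin());
    PutU32(p.data() + 3, static_cast<std::uint32_t>(a));
    PutU32(p.data() + 7, static_cast<std::uint32_t>(b));
    return p;
}

// Turn the 11 bytes back into a function name and two arguments.
Request ParseRequest(const RequestPacket& p) {
    Request r;
    r.function.assign(p.begin(), p.begin() + 3);
    r.a = static_cast<std::int32_t>(GetU32(p.data() + 3));
    r.b = static_cast<std::int32_t>(GetU32(p.data() + 7));
    return r;
}

// Build and parse the 4-byte response.
ResponsePacket MakeResponse(std::int32_t result) {
    ResponsePacket p{};
    PutU32(p.data(), static_cast<std::uint32_t>(result));
    return p;
}

std::int32_t ParseResponse(const ResponsePacket& p) {
    return static_cast<std::int32_t>(GetU32(p.data()));
}
```

A fixed layout like this only works for one function signature; the more general key/length/value protocol discussed later in the article removes that limitation.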

The caller's code may then become:

request = MakePacket("add", 1, 2);

SendRequest_ToService_B(request);

response = ReceiveResponse_FromService_B();

int result = unMakePacket(response);

These 4 steps are:

(1) Turn the input parameters into a byte stream;

(2) Send the byte stream to service B;

(3) Receive the returned byte stream from service B;

(4) Turn the returned byte stream into the output parameter;

The server's code may become:

request = ReceiveRequest();

args/function = unMakePacket(request);

result = Add(1, 2);

response = MakePacket(result);

SendResponse(response);

These 5 steps are also easy to understand:

(1) The server receives the byte stream;

(2) It turns the byte stream into a function name and parameters;

(3) It calls the function locally to get the result;

(4) It turns the result into a byte stream;

(5) It sends the byte stream back to the caller;

The process can be pictured as follows:

[Figure: the caller's and the server's processing steps]

The processing steps of both the caller and the server are very clear.

What is the biggest problem with this process?

It is too much trouble for the caller, who has to deal with many low-level details on every call:

(1) Converting input parameters to a byte stream, that is, serialization and application-layer protocol details;

(2) Socket send, that is, network transport protocol details;

(3) Socket receive;

(4) Converting the byte stream to output parameters, that is, deserialization and application-layer protocol details;

Can the calling layer avoid dealing with these details?

Yes. An RPC framework exists to solve exactly this problem: it lets the caller "call a remote function (service) as if it were calling a local function".
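To make "call a remote function as if it were local" concrete, here is a minimal sketch of a client-side stub. Everything in it is an assumption for illustration: `AddClient`, `Transport`, and `FakeServiceB` are hypothetical names, the transport is an in-process function standing in for the socket, and big-endian encoding is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

using Bytes = std::vector<unsigned char>;
// Hypothetical transport: takes request bytes, returns response bytes.
// In a real framework this would be the socket send/receive path.
using Transport = std::function<Bytes(const Bytes&)>;

// Append a 32-bit value in big-endian order (byte order is an assumption).
void AppendU32(Bytes& out, std::uint32_t v) {
    out.push_back(static_cast<unsigned char>(v >> 24));
    out.push_back(static_cast<unsigned char>(v >> 16));
    out.push_back(static_cast<unsigned char>(v >> 8));
    out.push_back(static_cast<unsigned char>(v));
}

// Read a 32-bit big-endian value at the given offset, with a bounds check.
std::uint32_t ReadU32(const Bytes& in, std::size_t offset) {
    if (in.size() < offset + 4) throw std::runtime_error("packet too short");
    return (static_cast<std::uint32_t>(in[offset]) << 24) |
           (static_cast<std::uint32_t>(in[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(in[offset + 2]) << 8) |
           static_cast<std::uint32_t>(in[offset + 3]);
}

// Client stub: the business code only sees Add(a, b).
class AddClient {
public:
    explicit AddClient(Transport transport) : transport_(std::move(transport)) {}

    std::int32_t Add(std::int32_t a, std::int32_t b) const {
        Bytes request = {'a', 'd', 'd'};                  // (1) serialize
        AppendU32(request, static_cast<std::uint32_t>(a));
        AppendU32(request, static_cast<std::uint32_t>(b));
        Bytes response = transport_(request);             // (2)(3) send, receive
        return static_cast<std::int32_t>(ReadU32(response, 0));  // (4) deserialize
    }

private:
    Transport transport_;
};

// In-process stand-in for service B: parse, compute, and reply.
Bytes FakeServiceB(const Bytes& request) {
    std::int32_t a = static_cast<std::int32_t>(ReadU32(request, 3));
    std::int32_t b = static_cast<std::int32_t>(ReadU32(request, 7));
    Bytes response;
    AppendU32(response, static_cast<std::uint32_t>(a + b));
    return response;
}
```

With this shape, the business code reduces to `AddClient client(FakeServiceB); client.Add(1, 2);`, and the four low-level steps stay inside the stub.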

By now you probably have some feeling for RPC and serialization. Read on; there are more low-level details.

What are the responsibilities of an RPC framework?

An RPC framework must shield the caller from complexity, and it must shield the service provider from complexity as well:

(1) The service caller (client) feels as if it is calling a local function when it invokes the service;

(2) The service provider (server) feels as if it is implementing a local function when it implements the service;

So the whole RPC framework is divided into a client part and a server part. Achieving the goals above and shielding complexity is the responsibility of the RPC framework.

[Figure: responsibilities of the business side and of the RPC framework]

As shown in the figure above, the business side's responsibilities are:

(1) Caller A passes in parameters, makes the call, and gets the result;

(2) Service provider B receives the parameters, executes the logic, and returns the result;

The RPC framework's responsibilities are the big blue box in the middle:

(1) Client side: serialization, deserialization, connection pool management, load balancing, failover, queue management, timeout management, asynchronous management, and so on;

(2) Server side: server components, server send/receive queues, IO threads, worker threads, serialization and deserialization, and so on;

Server-side technology is already well known, so next we focus on the technical details of the client side.

First, let's look at the "serialization and deserialization" part of the RPC-client.

Why serialize?

Engineers usually manipulate data as "objects":

class User {

public:

         std::string user_name;

         uint64_t user_id;

         uint32_t user_age;

};

User u;

u.user_name = "shenjian";

u.user_id = 123;

u.user_age = 35;

But when data needs to be stored or transmitted, "objects" are not so convenient, and the data often has to be converted into a "binary byte stream" in contiguous space. Some typical scenarios are:

(1) Disk storage of database indexes: a database index is a B+ tree in memory, but that format cannot be stored on disk directly, so the B+ tree has to be converted into a binary byte stream in contiguous space before it can be stored on disk;

(2) Cache KV storage: redis/memcache are KV caches, and the value stored in the cache must be a binary byte stream in contiguous space, not a User object;

(3) Network transmission of data: the data sent over a socket must be a binary byte stream in contiguous space, not an object;

Serialization is the process of converting data in "object" form into data in "contiguous binary byte stream" form. The reverse process is called deserialization.

How to serialize?

This is a very detailed question. If you had to convert an "object" into a byte stream, how would you do it? One approach that comes to mind easily is a self-describing markup language such as xml (or json):

<class name="User">

<element name="user_name" type="std::string" value="shenjian" />

<element name="user_id" type="uint64_t" value="123" />

<element name="user_age" type="uint32_t" value="35" />

</class>

Once the conversion rules are fixed, the sender can easily serialize an object of class User into xml, and after the service receives the xml binary stream it can just as easily deserialize it into a User object.

Aside: when the language supports reflection, this job is easy.

The second approach is to implement your own binary protocol for serialization. Taking the User object above as an example again, a general-purpose protocol could be designed like this:

[Figure: layout of the general-purpose binary protocol]

(1) The first 4 bytes are the sequence number;

(2) The 4 bytes after the sequence number are the key length m;

(3) The next m bytes are the key;

(4) The next 4 bytes are the value length n;

(5) The next n bytes are the value;

(6) Continue recursively, as with xml, until the whole object is described;

The User object above might be described in this protocol as follows:

[Figure: the User object encoded with this protocol]

(1) First row: sequence number, 4 bytes (0 stands for the class name); class name length, 4 bytes (the length is 4); the next 4 bytes are the class name ("User"); 12 bytes in total;

(2) Second row: sequence number, 4 bytes (1 stands for the first attribute); attribute name length, 4 bytes (the length is 9); the next 9 bytes are the attribute name ("user_name"); attribute value length, 4 bytes (the length is 8); attribute value, 8 bytes (the value is "shenjian"); 29 bytes in total;

(3) Third row: sequence number, 4 bytes (2 stands for the second attribute); attribute name length, 4 bytes (the length is 7); the next 7 bytes are the attribute name ("user_id"); attribute value length, 4 bytes (the length is 8); attribute value, 8 bytes (the value is 123); 27 bytes in total;

(4) Fourth row: sequence number, 4 bytes (3 stands for the third attribute); attribute name length, 4 bytes (the length is 8); the next 8 bytes are the attribute name ("user_age"); attribute value length, 4 bytes (the length is 4); attribute value, 4 bytes (the value is 35); 24 bytes in total;

The whole binary byte stream is 12+29+27+24=92 bytes.
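The 92-byte layout above can be checked with a short sketch. The helper names (`PutU32`, `PutU64`, `PutKey`, `SerializeUser`) are hypothetical, and big-endian integers are an assumption; the article only fixes the field widths.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

struct User {
    std::string user_name;
    std::uint64_t user_id;
    std::uint32_t user_age;
};

// Append integers in big-endian order (the byte order is an assumption).
void PutU32(Bytes& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>(v >> shift));
}

void PutU64(Bytes& out, std::uint64_t v) {
    for (int shift = 56; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>(v >> shift));
}

// Sequence number (4 bytes) + key length (4 bytes) + key (m bytes).
void PutKey(Bytes& out, std::uint32_t seq, const std::string& key) {
    PutU32(out, seq);
    PutU32(out, static_cast<std::uint32_t>(key.size()));
    out.insert(out.end(), key.begin(), key.end());
}

Bytes SerializeUser(const User& u) {
    Bytes out;
    PutKey(out, 0, "User");                                   // row 1: 12 bytes
    PutKey(out, 1, "user_name");                              // row 2: 17 bytes so far
    PutU32(out, static_cast<std::uint32_t>(u.user_name.size()));
    out.insert(out.end(), u.user_name.begin(), u.user_name.end());
    PutKey(out, 2, "user_id");                                // row 3: 27 bytes
    PutU32(out, 8);
    PutU64(out, u.user_id);
    PutKey(out, 3, "user_age");                               // row 4: 24 bytes
    PutU32(out, 4);
    PutU32(out, u.user_age);
    return out;
}
```

For `User{"shenjian", 123, 35}` the row sizes are 12, 29, 27, and 24 bytes, which matches the total in the text. A deserializer would walk the same fields in the same order.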

A real serialization protocol has many more details to consider. For example, a strongly typed language must restore not only attribute names and attribute values but also attribute types, and complex objects involve not only ordinary types but also nested object types. In any case, the idea behind serialization is similar.

What factors should a serialization protocol consider?

Whether you use a mature protocol such as xml/json or a custom binary protocol to serialize objects, the following factors need to be considered when designing the serialization protocol.

(1) Parsing efficiency: this should be the first consideration for a serialization protocol. Formats like xml/json are time-consuming to parse because a DOM tree has to be built, while a custom binary protocol is very efficient to parse;

(2) Compression ratio and transmission efficiency: for the same object, xml/json carries a large number of tags, so the proportion of useful information is low, while a custom binary protocol takes far less space;

(3) Extensibility and compatibility: whether fields can be added easily, and whether old clients are forced to upgrade after fields are added, are both questions to consider. xml/json and the binary protocol above can both be extended easily;

(4) Readability and debuggability: this one is easy to understand; xml/json is far more readable than a binary protocol;

(5) Cross-language support: both protocols above are cross-language, but some serialization protocols are tightly bound to a development language. For example, dubbo's serialization protocol can only support RPC calls in Java;

(6) Generality: xml/json are very general, with good third-party parsing libraries that make parsing easy in every language. The custom binary protocol above can work across languages, but a simple protocol client has to be written for each language;

What are the common serialization methods?

(1) xml/json: poor parsing efficiency and compression ratio; good extensibility, readability, and generality;

(2) thrift;

(3) protobuf: produced by Google, so it is bound to be high quality. It is very good in every respect and strongly recommended. It is a binary protocol, so readability is somewhat poor, but it has a to-string-like facility that helps with debugging;

(4) Avro;

(5) CORBA;

(6) mc_pack: those who know it, know it; those who don't, don't. I used it in 2009, and legend has it that it surpasses protobuf across the board. Readers who know it are welcome to describe its current state;

(7) …

[Figure: RPC call flow with numbered steps]

Besides:

(1) The serialization and deserialization part (1 and 4 in the figure above)

the RPC-client also contains:

(2) Sending and receiving the byte stream (2 and 3 in the figure above)

This part can be divided into synchronous calls and asynchronous calls, which are discussed below.

Aside: getting an RPC-client right is not easy.

The code fragment for a synchronous call is:

Result = Add(Obj1, Obj2);// blocks until Result is obtained

The code fragment for an asynchronous call is:

Add(Obj1, Obj2, callback);// returns right after the call, without waiting for the result

The result is handled in the callback:

callback(Result){// the callback is invoked once the result is obtained

         …

}

These two kinds of calls are implemented in completely different ways in the RPC-client.

What does the RPC-client synchronous call architecture look like?

[Figure: synchronous RPC-client architecture]

In a synchronous call, the caller is blocked until the result is obtained and occupies a worker thread the whole time. The figure above briefly shows the components, interactions, and steps:

  • The big box on the left represents one worker thread of the caller

  • The pink box inside it represents the RPC-client component

  • The orange box on the right represents the RPC-server

  • The two small blue boxes represent the two core components of a synchronous RPC-client: the serialization component and the connection pool component

  • The white flow boxes and the arrows numbered 1-10 represent the serial execution steps of the whole worker thread:

1) The business code initiates an RPC call:

Result=Add(Obj1,Obj2)

2) The serialization component serializes the call into a binary byte stream, which can be understood as the packet to be sent, packet1;

3) An available connection is obtained from the connection pool component;

4) packet1 is sent to the RPC-server over the connection;

5) The packet travels over the network to the RPC-server;

6) The response packet travels over the network back to the RPC-client;

7) The response packet, packet2, is received from the RPC-server over the connection;

8) The connection is put back into the connection pool through the connection pool component;

9) The serialization component deserializes packet2 into a Result object and returns it to the caller;

10) The business code gets the Result, and the worker thread continues;

Aside: please read steps 1-10 alongside the architecture diagram.

What is the role of the connection pool component?

Features that an RPC framework supports, such as load balancing, failover, and send timeout, are all implemented through the connection pool component.

[Figure: connection pool component]

A typical connection pool component provides these interfaces:

int ConnectionPool::init(…);

Connection ConnectionPool::getConnection();

int ConnectionPool::putConnection(Connection t);

What does init do?

It establishes N long-lived tcp connections to the downstream RPC-server (usually a cluster). This is the connection "pool".

What does getConnection do?

It takes a connection from the "pool", locks it (sets a flag), and returns it to the caller.

What does putConnection do?

It puts an allocated connection back into the "pool" and unlocks it (again by setting a flag).
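The three interfaces can be sketched as below. This is a simplified illustration, not the article's implementation: `Connection` is a hypothetical handle with no real socket behind it, `init` takes an assumed host list and per-host count, and `getConnection`/`putConnection` use pointers instead of the by-value signatures shown above.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical connection handle; a real one would wrap a tcp socket.
struct Connection {
    int id;
    std::string host;
    bool in_use;  // the "lock" flag described in the text
};

class ConnectionPool {
public:
    // Record per_host connections for every host. A real pool would
    // call connect() here; this sketch only tracks bookkeeping.
    int init(const std::vector<std::string>& hosts, int per_host) {
        std::lock_guard<std::mutex> lock(mu_);
        conns_.clear();
        for (const std::string& host : hosts) {
            for (int i = 0; i < per_host; ++i) {
                conns_.push_back(Connection{static_cast<int>(conns_.size()), host, false});
            }
        }
        return static_cast<int>(conns_.size());
    }

    // Take a free connection and set its flag; nullptr if none is free.
    Connection* getConnection() {
        std::lock_guard<std::mutex> lock(mu_);
        for (Connection& c : conns_) {
            if (!c.in_use) {
                c.in_use = true;
                return &c;
            }
        }
        return nullptr;
    }

    // Clear the flag so the connection can be handed out again.
    int putConnection(Connection* c) {
        std::lock_guard<std::mutex> lock(mu_);
        if (c == nullptr || !c->in_use) return -1;
        c->in_use = false;
        return 0;
    }

private:
    std::mutex mu_;
    std::vector<Connection> conns_;  // not resized after init, so pointers stay valid
};
```

This version hands out the first free connection; a pool that needs load balancing would choose among free connections differently.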

How is load balancing implemented?

The connection pool holds connections to the whole RPC-server cluster. When the pool hands out a connection, it needs to choose randomly.

How is failover implemented?

The connection pool holds connections to the whole RPC-server cluster. When the pool detects that the connections to some machine are abnormal, it must exclude that machine's connections and return only healthy ones. Once the machine recovers, its connections are added back.
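Random selection and failover can be sketched together. `ServerSelector`, `markDown`, `markUp`, and `pick` are hypothetical names, and how a machine is detected as abnormal (for example, a failed send or a heartbeat) is left out as an assumption.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Picks a random healthy host; unhealthy hosts are skipped until restored.
class ServerSelector {
public:
    explicit ServerSelector(std::vector<std::string> hosts,
                            unsigned seed = std::random_device{}())
        : hosts_(std::move(hosts)), healthy_(hosts_.size(), true), rng_(seed) {}

    // Failover: exclude a host whose connections are abnormal.
    void markDown(const std::string& host) { setHealth(host, false); }

    // Recovery: add the host back once it is healthy again.
    void markUp(const std::string& host) { setHealth(host, true); }

    // Load balancing: uniform random choice among healthy hosts.
    // Returns an empty string when no host is healthy.
    std::string pick() {
        std::vector<std::size_t> candidates;
        for (std::size_t i = 0; i < hosts_.size(); ++i) {
            if (healthy_[i]) candidates.push_back(i);
        }
        if (candidates.empty()) return std::string();
        std::uniform_int_distribution<std::size_t> dist(0, candidates.size() - 1);
        return hosts_[candidates[dist(rng_)]];
    }

private:
    void setHealth(const std::string& host, bool value) {
        for (std::size_t i = 0; i < hosts_.size(); ++i) {
            if (hosts_[i] == host) healthy_[i] = value;
        }
    }

    std::vector<std::string> hosts_;
    std::vector<bool> healthy_;
    std::mt19937 rng_;
};
```

Uniform random choice is the simplest policy and is what the text describes; weighted or least-connections policies would slot into `pick()` the same way.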

How is send timeout implemented?

Because this is a synchronous blocking call, once a connection has been obtained, using send/recv with a timeout is enough to send and receive with a timeout.
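One common way to get send/recv with a timeout on POSIX systems is to set the `SO_RCVTIMEO` and `SO_SNDTIMEO` socket options. The sketch below assumes a POSIX platform; the helper names `SetSocketTimeouts` and `RecvWithTimeout` are hypothetical, and the article does not say which mechanism its framework uses.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

// Set both the send and the receive timeout (in milliseconds) on a socket.
// Returns true when both options were applied.
bool SetSocketTimeouts(int fd, long timeout_ms) {
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0) return false;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == 0;
}

// Blocking receive that reports whether it gave up because of the timeout.
// Returns the byte count, 0 if the peer closed, or -1 on error or timeout.
ssize_t RecvWithTimeout(int fd, void* buf, std::size_t len, bool* timed_out) {
    ssize_t n = recv(fd, buf, len, 0);
    *timed_out = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
    return n;
}
```

After a timeout the worker thread can put the connection back, or discard it, and report the failure to the business code instead of blocking indefinitely.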

In general, a synchronous RPC-client is relatively easy to implement: a serialization component and a connection pool component, combined with multiple worker threads, are enough.

What does the RPC-client asynchronous callback architecture look like?

[Figure: asynchronous callback RPC-client architecture]

In an asynchronous callback, the caller is not blocked before the result is obtained, and in theory no thread is blocked at any time. So, in theory, the asynchronous callback model needs only a few worker threads and service connections to reach high throughput, as shown in the figure above:

  • The box on the left is a small number of worker threads (just a few) that make calls and run callbacks

  • The pink box in the middle represents the RPC-client component

  • The orange box on the right represents the RPC-server

  • The six small blue boxes are the six core components of an asynchronous RPC-client: context manager, timeout manager, serialization component, downstream send/receive queues, downstream send/receive threads, and connection pool component

  • The white flow boxes and the arrows numbered 1-17 represent the serial execution steps of the whole worker thread:

1) The business code initiates an asynchronous RPC call;

Add(Obj1,Obj2, callback)

2) The context manager stores the request, the callback, and the context;

3) The serialization component serializes the call into a binary byte stream, which can be understood as the packet to be sent, packet1;

4) The downstream send/receive queue puts the message into the "to-be-sent queue". The call returns at this point and does not block the worker thread;

5) The downstream send/receive thread takes the message out of the "to-be-sent queue" and obtains an available connection from the connection pool component;

6) packet1 is sent to the RPC-server over the connection;

7) The packet travels over the network to the RPC-server;

8) The response packet travels over the network back to the RPC-client;

9) The response packet, packet2, is received from the RPC-server over the connection;

10) The downstream send/receive thread puts the message into the "received queue" and puts the connection back into the connection pool through the connection pool component;

11) The message is taken out of the downstream send/receive queue and the callback begins, without blocking the worker thread;

12) The serialization component deserializes packet2 into a Result object;

13) The context manager takes out the result, the callback, and the context;

14) The business code is called back through callback with the Result, and the worker thread continues;

If the request does not return for a long time, the process is:

15) In the context manager, the request has not returned for a long time;

16) The timeout manager obtains the timed-out context;

17) The business code is called back through timeout_cb, and the worker thread continues;

Aside: please walk through this process a few times alongside the architecture diagram.
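Steps 4 and 5 (enqueue and return immediately; a separate thread dequeues and sends) can be sketched with a small thread-safe queue. `Message` and `MessageQueue` are hypothetical names, and this is only one possible design: the article does not describe how its queues are implemented.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical queued item: a request id plus the serialized packet.
struct Message {
    std::uint64_t req_id;
    std::string payload;
};

class MessageQueue {
public:
    // Step 4: a worker thread enqueues and returns at once.
    void push(Message m) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            items_.push_back(std::move(m));
        }
        cv_.notify_one();
    }

    // Step 5: the send/receive thread blocks here until a message exists.
    Message pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        Message m = std::move(items_.front());
        items_.pop_front();
        return m;
    }

    // Non-blocking variant: returns false when the queue is empty.
    bool tryPop(Message& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<Message> items_;
};
```

The same class could serve as both the "to-be-sent queue" and the "received queue", with worker threads and send/receive threads swapping the producer and consumer roles.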

The serialization component and the connection pool component have been described above, and the send/receive queues and send/receive threads are easy to understand. The following focuses on two general-purpose components: the context manager and the timeout manager.

Why is a context manager needed?

Because sending the request package and running the callback for the response package are asynchronous, and may not even happen in the same worker thread, a component is needed to record the context of each request and match up the request, response, and callback information.

How are the request, response, and callback matched up?

This is an interesting question. Suppose three request packages a, b, and c are sent to the downstream service over one connection, and three response packages x, y, and z are received asynchronously:

[Figure: requests a, b, c and responses x, y, z on one connection]

How do we know which request package corresponds to which response package?

How do we know which response package corresponds to which callback function?

A "request id" can be used to tie the request, response, and callback together.

[Figure: request id linking request, response, and callback]

The whole process is as follows. Through the request id, the context manager maintains the mapping between request, response, and callback:

1) Generate a request id;

2) Generate the request context, which contains the send time, the callback function, and other information;

3) The context manager records the mapping between req-id and context;

4) Put the req-id into the request package and send it to the RPC-server;

5) The RPC-server puts the req-id into the response package and returns it;

6) Using the req-id in the response package, look up the original context in the context manager;

7) Get the callback function from the context;

8) The callback carries the Result back and drives the business logic forward;
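The eight steps above can be sketched as a small map keyed by request id. `Context`, `ContextManager`, `add`, and `complete` are hypothetical names, the result is modelled as a plain string, and the network round trip (steps 4 and 5) is left out.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Per-request context: send time and callback, as listed in step 2.
struct Context {
    std::int64_t send_time_ms;
    std::function<void(const std::string&)> callback;
};

class ContextManager {
public:
    // Steps 1-3: generate a req-id, build the context, record the mapping.
    std::uint64_t add(std::int64_t now_ms, std::function<void(const std::string&)> cb) {
        std::lock_guard<std::mutex> lock(mu_);
        std::uint64_t id = next_id_++;
        contexts_.emplace(id, Context{now_ms, std::move(cb)});
        return id;
    }

    // Steps 6-8: find the context by the req-id carried in the response,
    // remove it, and run its callback. Returns false for an unknown id.
    bool complete(std::uint64_t req_id, const std::string& result) {
        std::function<void(const std::string&)> cb;
        {
            std::lock_guard<std::mutex> lock(mu_);
            auto it = contexts_.find(req_id);
            if (it == contexts_.end()) return false;
            cb = std::move(it->second.callback);
            contexts_.erase(it);
        }
        cb(result);  // run outside the lock so callbacks may issue new calls
        return true;
    }

private:
    std::mutex mu_;
    std::uint64_t next_id_ = 1;
    std::unordered_map<std::uint64_t, Context> contexts_;
};
```

Because lookup is by id, responses x, y, z may arrive in any order and still reach the right callback.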

How are load balancing and failover implemented?

The idea is similar to the synchronous connection pool. The differences are:

(1) A synchronous connection pool sends and receives in blocking mode, so multiple connections must be established to each IP of a service;

(2) With asynchronous sending and receiving, only a few connections are needed to each IP of a service (for example, one tcp connection);

How is send/receive timeout implemented?

Timeout is implemented differently from synchronous blocking send/receive:

(1) For synchronous blocking, timeout can be implemented directly with send/recv that carry a timeout;

(2) For asynchronous non-blocking (nio) network messaging, the connection does not keep waiting for the response package, so timeout is implemented by the timeout manager;

How does the timeout manager implement timeout management?

[Figure: timeout manager]

The timeout manager implements the callback handling for requests whose response packages time out.

For every request sent to the downstream RPC-server, the context manager stores the mapping from req-id to context. The context holds a lot of information about the request, such as the req-id, the response callback, the timeout callback, and the send time.

The timeout manager starts a timer that scans the contexts in the context manager and checks whether any request was sent too long ago. If so, it stops waiting for the response package, invokes the timeout callback directly to drive the business flow forward, and deletes the context.

If the normal response package arrives after the timeout callback has run, its req-id finds no context in the context manager, and the package is simply dropped.

Aside: because the request has already timed out, the context cannot be recovered.
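The scan-and-expire behaviour can be sketched as follows. This is a single-threaded illustration with hypothetical names (`PendingRequest`, `PendingTable`, `scan`, `onResponse`); the caller passes the current time in, so the timer itself is an assumption left outside the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Context fields named in the text: send time, response callback, timeout callback.
struct PendingRequest {
    std::int64_t send_time_ms;
    std::function<void()> on_result;
    std::function<void()> timeout_cb;
};

class PendingTable {
public:
    void add(std::uint64_t req_id, PendingRequest r) {
        pending_.emplace(req_id, std::move(r));
    }

    // Called from a periodic timer: expire every request older than
    // timeout_ms, delete its context, then run its timeout callback.
    std::size_t scan(std::int64_t now_ms, std::int64_t timeout_ms) {
        std::vector<std::function<void()>> expired;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (now_ms - it->second.send_time_ms > timeout_ms) {
                expired.push_back(std::move(it->second.timeout_cb));
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
        for (auto& cb : expired) cb();
        return expired.size();
    }

    // Called when a response arrives. A late response finds no context
    // and is dropped (returns false).
    bool onResponse(std::uint64_t req_id) {
        auto it = pending_.find(req_id);
        if (it == pending_.end()) return false;
        std::function<void()> cb = std::move(it->second.on_result);
        pending_.erase(it);
        cb();
        return true;
    }

private:
    std::unordered_map<std::uint64_t, PendingRequest> pending_;
};
```

A full scan costs time proportional to the number of in-flight requests; keeping contexts ordered by send time would let the scan stop at the first request that has not yet expired.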

In any case, compared with synchronous calls, asynchronous callbacks need more than the serialization and connection pool components: they add a context manager, a timeout manager, downstream send/receive queues, downstream send/receive threads, and other components, and they also affect the caller's calling habits.

Aside: the programming habit changes from synchronous calls to callbacks.

Asynchronous callbacks can improve the overall throughput of the system. Which way to implement the RPC-client can be decided according to the business scenario.

Summary

What is an RPC call?

Calling a remote service just like calling a local function.

Why is an RPC framework needed?

An RPC framework shields the technical details of an RPC call, such as serialization and network transmission. The caller only has to focus on making the call, and the service side only has to focus on implementing it.

What is serialization? Why is it needed?

The process of converting an object into a contiguous binary stream is called serialization. Disk storage, cache storage, and network transmission can only operate on binary streams, so serialization is required.

What are the core components of a synchronous RPC-client?

The core components of a synchronous RPC-client are the serialization component and the connection pool component. It implements load balancing and failover through the connection pool, and implements timeout handling through blocking send/receive.

What are the core components of an asynchronous RPC-client?

The core components of an asynchronous RPC-client are the serialization component, connection pool component, send/receive queues, send/receive threads, context manager, and timeout manager. It uses a "request id" to associate the request package, response package, and callback function, uses the context manager to manage contexts, and uses a timer in the timeout manager to trigger timeout callbacks that drive the business flow forward when a request times out.

The thinking matters more than the conclusions.

The Architect's Road: sharing technical ideas

Survey

Which RPC frameworks' source code have you read?

Copyright notice
This article was written by [58 Shen Jian]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/174/202206230933483340.html