O&M: Why a Unified Gateway Is Necessary

2022-06-28 16:17:00 【dj1540225203】

Preface

Suppose you are building an e-commerce website. It will involve many back-end microservices, such as member, product, and recommendation services.

This raises a question: how do the app and the browser reach these back-end services? If the business is simple, each service can be given its own domain name (https://service.api.company.com), but this approach has several problems:

Every service needs authentication, rate limiting, permission checks, and similar logic. If each team builds its own version and reinvents the wheel, the effort is wasted. This logic can be extracted and handled in one place.
While the business is small, this approach causes no trouble. As it grows more complex, though, opening a single page on a site like Taobao or Amazon may involve hundreds of microservices working together. If every microservice has its own domain name, the client code has to deal with hundreds of domain names and becomes hard to maintain, and the number of connections becomes a bottleneck. Imagine opening an app and seeing hundreds of remote calls in a packet capture; on mobile this is very inefficient.
Every time a new service launches, operations staff have to get involved to request a domain name, configure Nginx, and so on. They are also needed whenever servers are brought online or taken offline. Domain names also make environment isolation awkward, because the caller has to work out which environment it is talking to from the domain name.
The back-end microservices may be written in different languages and use different protocols, such as HTTP, Dubbo, and gRPC. You cannot ask the client to adapt to all of them; that would be very challenging work, and the project would become complex and hard to maintain.
Refactoring microservices later becomes painful because the client has to change with you. Take the product service: as the business grows, it may need to be split into several microservices. The externally exposed service then has to be split too, and the client has to be modified to match.

API Gateway

A better approach is to introduce an API gateway that takes over all incoming traffic. Like Nginx, it forwards every user request to the back-end servers, but it does more than forward: it also adds functions on top of the traffic, such as authentication, rate limiting, permission checks, circuit breaking, protocol conversion, unified error codes, caching, logging, monitoring, and alerting. The common logic is extracted and handled by the gateway, so business teams can focus on business logic and iterate faster. With an API gateway, the client only talks to the gateway instead of communicating with each business team's interfaces separately. The cost is that one more component means one more potential point of failure, so building a high-performance, stable gateway involves many considerations.
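The idea of pulling shared logic into the gateway can be sketched as a chain of filters that every request passes through before it is routed. This is only a minimal illustration under assumptions: the names `Request`, `Response`, `auth_filter`, `ROUTES`, and the token value are hypothetical, and a real gateway would forward the request over the network instead of returning a string.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str


# A filter returns a Response to stop the chain early, or None to continue.
Filter = Callable[[Request], Optional[Response]]

# Hypothetical routing table: path prefix -> back-end service name.
ROUTES = {"/member": "member-service", "/goods": "goods-service"}


def auth_filter(req: Request) -> Optional[Response]:
    # Assumption: a fixed bearer token stands in for real authentication.
    if req.headers.get("Authorization") != "Bearer demo-token":
        return Response(401, "unauthorized")
    return None


def route(req: Request) -> Response:
    for prefix, service in ROUTES.items():
        if req.path.startswith(prefix):
            # A real gateway would forward the request to the service here.
            return Response(200, f"forwarded to {service}")
    return Response(404, "no route")


def handle(req: Request, filters: List[Filter]) -> Response:
    # Shared logic runs once, in the gateway, before any business service.
    for f in filters:
        early = f(req)
        if early is not None:
            return early
    return route(req)
```

Rate limiting, logging, or caching would be further entries in the `filters` list, which is why business services no longer need their own copies of that logic.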

API registration

How does a business team connect its APIs to the gateway? There are generally several ways.

Plug-in scanning. A plug-in scans the business side's APIs, for example through Spring MVC annotations combined with Swagger annotations, which also enables parameter validation and document and SDK generation. After scanning, the results are reported to the gateway's storage service.
Manual entry. The interface path, request parameters, response parameters, call method, and other information are typed in by hand. This is tedious, and with many parameters the initial entry is time-consuming.
Configuration file import, for example from a Swagger or OpenAPI definition. Alibaba Cloud's gateway is one example.

Protocol conversion

Internal APIs may be implemented over many different protocols, such as HTTP, Dubbo, and gRPC. Many of these are unfriendly to external users or cannot be exposed directly, Dubbo services for example. The gateway layer therefore converts the user's HTTP request into the protocol of the underlying service, for example HTTP -> Dubbo. There are pitfalls here, parameter types in particular: if a type is wrong, the conversion fails, and if the logs are not detailed enough, the problem is hard to pinpoint.
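The parameter-type pitfall can be made concrete with a small sketch. It assumes the gateway has stored a signature for the target RPC method at registration time; `SIGNATURE`, `ConversionError`, and `to_rpc_args` are hypothetical names, and the actual Dubbo or gRPC call is left out. The point is that a failed conversion should name the parameter, the raw value, and the expected type.

```python
# Hypothetical registered signature of a back-end RPC method:
# ordered (parameter name, expected type) pairs.
SIGNATURE = [("userId", int), ("keyword", str), ("pageSize", int)]


class ConversionError(ValueError):
    """Raised when an HTTP parameter cannot be converted for the RPC call."""


def to_rpc_args(query: dict, signature=SIGNATURE) -> list:
    """Convert string-valued HTTP query parameters into typed positional args."""
    args = []
    for name, expected in signature:
        if name not in query:
            raise ConversionError(
                f"missing parameter '{name}' (expected {expected.__name__})"
            )
        raw = query[name]
        try:
            args.append(expected(raw))
        except (TypeError, ValueError):
            # Include enough detail to locate the problem from the log alone.
            raise ConversionError(
                f"parameter '{name}': cannot convert {raw!r} to {expected.__name__}"
            ) from None
    return args
```

For example, `to_rpc_args({"userId": "42", "keyword": "phone", "pageSize": "10"})` yields typed arguments, while a value such as `"abc"` for `userId` produces an error message that identifies the offending parameter.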

Service discovery

As the traffic entry point, the gateway is responsible for forwarding requests, but first it has to know where to forward them, that is, how to locate the target. There are several ways:

Hard-coding addresses in code or in a configuration file. This is crude but workable, for example when the services still run on physical machines whose IPs rarely change. Scaling out or in and bringing applications online or offline become troublesome, however, and the gateway may even need to implement its own health-check mechanism.
Domain names. Domain names are also a reasonable solution and work for every language. For internal services, though, they are inefficient, and they make environment isolation awkward. For example, staging and production usually share one database, so both gateways may read the same domain name, and the staging gateway ends up calling the production service.
Registry. A registry avoids these problems. Even in a container environment where node IPs change frequently, the registry maintains the node list in real time, transparently to the gateway. Normal application starts and stops, as well as unexpected crashes, are detected by the registry's health checks and reported to the gateway in real time. A registry also adds no extra performance cost. With domain names, a request goes through DNS resolution, Nginx forwarding, and other intermediate hops, which hurts performance noticeably; with a registry, the gateway communicates with the business service point to point.
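The registry option can be sketched as follows. This is an in-memory stand-in written for illustration, not the API of any real registry product: instances send heartbeats, entries that miss heartbeats for longer than a TTL are treated as unhealthy, and the gateway picks a healthy instance round-robin. The class name `Registry` and the injected `clock` are assumptions made so the example is deterministic.

```python
import itertools


class Registry:
    """In-memory registry: instances expire unless they keep sending heartbeats."""

    def __init__(self, ttl_seconds, clock):
        self.ttl = ttl_seconds
        self.clock = clock        # injected time source (assumption, for testing)
        self.last_seen = {}       # (service, address) -> time of last heartbeat
        self.counters = {}        # service -> round-robin counter

    def heartbeat(self, service, address):
        # Registers the instance on first call and refreshes it afterwards.
        self.last_seen[(service, address)] = self.clock()

    def deregister(self, service, address):
        # Explicit removal, e.g. during a planned shutdown.
        self.last_seen.pop((service, address), None)

    def healthy(self, service):
        now = self.clock()
        return sorted(
            addr
            for (svc, addr), seen in self.last_seen.items()
            if svc == service and now - seen <= self.ttl
        )

    def pick(self, service):
        nodes = self.healthy(service)
        if not nodes:
            raise LookupError(f"no healthy instance of {service}")
        counter = self.counters.setdefault(service, itertools.count())
        return nodes[next(counter) % len(nodes)]
```

The gateway calls `pick("goods")` for each request and connects to the returned address directly, so a crashed node drops out of rotation once its heartbeats stop, without any configuration change.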

Service invocation

Because the gateway connects to services over many protocols, it may need to implement several kinds of calls, such as HTTP and Dubbo. For performance, the calls are best made asynchronously, and both HTTP and Dubbo support this; Apache, for example, provides an asynchronous HTTP client based on NIO. The gateway involves many asynchronous calls (interceptors, the HTTP client, Dubbo, Redis, and so on), so the style of asynchronous code matters. With callbacks or futures, the code can nest deeply and become hard to read. The reactive approach used by Zuul and Spring Cloud Gateway is worth considering.
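The readability point can be illustrated with coroutines. The sketch below uses Python's `asyncio` as a stand-in for the reactive style, not the Zuul or Spring Cloud Gateway API; `call_backend` simulates a non-blocking client with `asyncio.sleep`, and the service names and delays are made up. Two calls run concurrently, each with its own timeout, and the code still reads top to bottom without nested callbacks.

```python
import asyncio


async def call_backend(name: str, delay: float) -> str:
    # Stand-in for a non-blocking HTTP or RPC client call.
    await asyncio.sleep(delay)
    return f"{name}:ok"


async def call_with_timeout(name: str, delay: float, timeout: float) -> str:
    try:
        return await asyncio.wait_for(call_backend(name, delay), timeout)
    except asyncio.TimeoutError:
        # A slow back end yields a fallback value instead of holding a thread.
        return f"{name}:timeout"


async def aggregate() -> list:
    # Both calls are in flight at the same time on a single thread.
    return await asyncio.gather(
        call_with_timeout("member", 0.01, timeout=0.2),
        call_with_timeout("goods", 0.5, timeout=0.05),
    )
```

Running `asyncio.run(aggregate())` returns the results in the order the calls were listed, with the slow `goods` call replaced by its timeout marker.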

Graceful shutdown

Graceful shutdown is another issue the gateway has to handle. The gateway works with many protocols underneath, such as HTTP and Dubbo, and HTTP can be subdivided further by how services are addressed, for example by domain name or through a registry. Some of these support graceful shutdown. Nginx, for example, supports health checks: if it detects that a node is down, it removes the node. For a normal application shutdown, this has to be coordinated with the release system: first take the node logically offline, then make subsequent Nginx health-check requests fail directly (for example by returning 500), wait for a while (how long depends on the Nginx configuration), and only then stop the application. Registries are similar. A registry generally supports only manual deregistration, so the node can be deregistered by calling the registry's interface during the logical-offline phase. Some registries do not support active deregistration; in that case the application has to delay its shutdown according to the cache configuration. Other protocols such as Dubbo work on similar principles.
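The shutdown sequence described above (mark the node logically offline, fail health checks, wait, then stop) can be sketched as a small state holder. `Instance` and its method names are hypothetical, and the number of failed probes required before removal is an assumption standing in for whatever the load balancer is configured with.

```python
class Instance:
    """Tracks a logical online/offline flag and the number of in-flight requests."""

    def __init__(self):
        self.online = True
        self.in_flight = 0

    def health_status(self) -> int:
        # The load balancer probes this; 500 tells it to stop routing here.
        return 200 if self.online else 500

    def begin_request(self) -> None:
        # Requests are still served while draining, because the load
        # balancer may not have noticed the failed health checks yet.
        self.in_flight += 1

    def end_request(self) -> None:
        self.in_flight -= 1

    def take_offline(self) -> None:
        # Logical offline: the process keeps running, only the flag changes.
        self.online = False

    def safe_to_stop(self, probes_failed: int, required_failures: int) -> bool:
        # Stop only after the balancer has seen enough failed probes
        # and every request that was already accepted has finished.
        return (
            not self.online
            and probes_failed >= required_failures
            and self.in_flight == 0
        )
```

A release script would call `take_offline()`, poll `safe_to_stop(...)`, and terminate the process only once it returns `True`.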

Performance

As the entry point for all traffic, the gateway must treat performance as a top priority. Most early gateways, such as Zuul 1.x, were built on a synchronous blocking model. In this model each request or connection occupies one thread, and threads are a heavyweight resource in the JVM; Tomcat, for example, defaults to 200 threads. If isolation in the gateway is poor, then network delays, Full GC pauses, or slow third-party services that delay upstream calls can easily fill the thread pool, and new requests are rejected. Yet at that moment the threads are all blocked on I/O, so system resources are not fully used. The model is also vulnerable to network and disk I/O latency. Timeouts need to be set carefully: if they are set badly and service isolation is incomplete, a single slow interface can drag the gateway down.

The asynchronous model is completely different. Usually one thread is started per CPU core to handle all requests and responses. The life cycle of a request is no longer tied to one thread; it is split into stages handled by different thread pools, so system resources are used more fully. Because a thread no longer belongs exclusively to one connection, a connection consumes far fewer resources: just a file descriptor plus a few listeners. In the blocking model, by contrast, every connection holds a thread, and threads are expensive. Delays in upstream services are also much less harmful. In the blocking model a slow request monopolizes a thread; in the asynchronous model the cost of a single connection is so low that the system can handle a large number of requests at once. On the JVM, Zuul 2 and Spring Cloud Gateway are good choices for an asynchronous gateway. You can also build your own on Netty, Spring Boot 2.x WebFlux, Vert.x, or the asynchronous support in Servlet 3.1.

Caching

For idempotent GET requests, the gateway can add a cache layer driven by the cache headers the business side specifies, storing responses in a second-level cache such as Redis. Repeated requests are then answered at the gateway without calling the business service, which reduces the load on it. If the business nodes go down, the gateway can also return its cached copy.
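A minimal sketch of this behaviour follows. It assumes the business side sends a `Cache-Control: max-age=N` header and uses an in-memory dict where the text suggests Redis; `GatewayCache`, the `fetch` callback, and the injected `clock` are hypothetical names. A fresh entry is served without calling the back end, and an expired entry is still served if the back end fails.

```python
import re


class GatewayCache:
    """Caches GET responses by URL, honouring the max-age the back end returns."""

    def __init__(self, clock):
        self.clock = clock
        self.entries = {}  # url -> (body, expires_at); a dict stands in for Redis

    @staticmethod
    def max_age(cache_control) -> int:
        match = re.search(r"max-age=(\d+)", cache_control or "")
        return int(match.group(1)) if match else 0

    def get(self, url, fetch):
        """fetch(url) returns (body, cache_control) and may raise if the back end is down."""
        now = self.clock()
        cached = self.entries.get(url)
        if cached and cached[1] > now:
            return cached[0]          # fresh hit: the back end is not called
        try:
            body, cache_control = fetch(url)
        except Exception:
            if cached:
                return cached[0]      # back end down: serve the stale copy
            raise
        ttl = self.max_age(cache_control)
        if ttl > 0:
            self.entries[url] = (body, now + ttl)
        return body
```

Serving stale data on failure is a design choice that suits read-mostly content such as product pages; for data that must be current, the gateway would return an error instead.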

Rate limiting

Rate limiting is essential for every business component. Without it, a surge in requests can easily bring down the business service. During big promotions such as Double 11 and Double 12, an interface may receive several times its usual traffic; if capacity has not been evaluated and there is no rate limiting, the whole service can easily become unavailable. The limiting strategy therefore has to be based on the processing capacity of the business interface. Most people have seen the degraded pages that Taobao and Baidu show during red-envelope events. The access layer must handle this well: non-core interfaces can be degraded directly to guarantee the availability of core services, and core interfaces get limits based on the capacity measured in load tests. There are several kinds of rate limiting:

Single-machine. Single-machine limiting performs well because no remote call is involved; it is only a local count, so the impact on interface RT is minimal. You do need to decide what the configured limit applies to: a single gateway node or the whole gateway cluster. If it applies to the whole cluster, the per-node limit has to be adjusted whenever the gateway scales in or out.
Distributed. Distributed limiting needs a storage node, such as Redis or Sentinel, to maintain the current call count of each interface. This involves a remote call and so costs some performance. You also have to plan for the storage failing: if Redis goes down, the gateway needs a fallback, either dropping to local limiting or disabling rate limiting altogether. There are also different algorithms, such as simple counting and token bucket. Simple counting is enough in most scenarios; if you need to support burst traffic, use a token bucket or a similar scheme. You also have to decide the dimension to limit on, for example IP, interface, user, or certain values in the request parameters. Expressions can be used here for flexibility.
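The token bucket mentioned above can be sketched in a few lines. This is a single-machine version for illustration; `TokenBucket`, `KeyedLimiter`, and the injected `clock` are hypothetical names. Tokens refill at a fixed rate up to a capacity, so short bursts up to the capacity are allowed while the long-run rate stays bounded. The keyed wrapper shows limiting per dimension, such as per IP or per interface.

```python
class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` of requests per second."""

    def __init__(self, rate, capacity, clock):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start full so an initial burst passes
        self.updated = clock()

    def allow(self, cost=1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class KeyedLimiter:
    """One bucket per key, where the key encodes the limiting dimension."""

    def __init__(self, rate, capacity, clock):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, key) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow()
```

A key such as `"ip:203.0.113.7"` or `"api:/goods/detail"` selects the dimension. A distributed variant would keep the token count in shared storage instead of in process memory.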

Stability

Stability is critical for a gateway. Monitoring and alerting need to be thorough: interface call volume, response time, exceptions, error codes, success rate, and so on; thread pool metrics such as active thread count and queue backlog; and system-level basics such as CPU, memory, and Full GC. Because the gateway is the entry point to all services, its stability requirements are higher than those of other services. Ideally it runs continuously with as few restarts as possible. But adding features, or adding logs to investigate a problem, inevitably requires a redeploy. Zuul's approach is a useful reference: all core functions are implemented as interceptors written in Groovy and stored in a database, with support for dynamic loading, compilation, and execution. When a problem occurs it can be located and fixed immediately, and new features only require adding a new interceptor to the gateway dynamically, with no redeploy.

Circuit breaking and degradation

Circuit breaking is also very important. If a service goes down or an interface responds with severe timeouts, the whole gateway may be dragged down by that one interface. Circuit breaking and degradation are therefore needed: when specific exceptions occur, the interface is degraded and the gateway returns a response directly. This can be implemented with Hystrix or Resilience4j.
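The mechanism can be sketched without either library. The class below is a simplified illustration, not the Hystrix or Resilience4j API: it counts consecutive failures, opens after a threshold, returns the fallback immediately while open, and lets one trial call through after a reset timeout. `CircuitBreaker`, its parameters, and the injected `clock` are hypothetical.

```python
class CircuitBreaker:
    """Fails fast after repeated errors and retries once the reset timeout passes."""

    def __init__(self, failure_threshold, reset_timeout, clock):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None     # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: do not touch the back end at all
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The gateway would keep one breaker per back-end interface, so a failing interface is degraded on its own while the others keep working.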

Logging

Because all requests pass through the gateway, its logs need to be fairly complete: interface latency, request method, request IP, request parameters, response parameters (with sensitive data masked), and so on. A request may also touch many microservices, so a unified traceId is needed to tie all the logs together. Putting the traceId in the response header makes troubleshooting easier.
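A sketch of such a log record is shown below. The field names, the `X-Trace-Id` header name, and the set of sensitive keys are assumptions chosen for illustration. The function masks sensitive parameters, reuses an incoming traceId when one exists, and returns the header that should be attached to the response.

```python
import json
import uuid

# Hypothetical list of parameter names that must never appear in logs.
SENSITIVE_KEYS = {"password", "phone", "idCard"}


def mask(params: dict) -> dict:
    # Replace sensitive values before the record is written anywhere.
    return {k: ("***" if k in SENSITIVE_KEYS else v) for k, v in params.items()}


def access_log(method, path, client_ip, params, elapsed_ms, trace_id=None):
    # Reuse the caller's traceId if present so logs across services line up.
    trace_id = trace_id or uuid.uuid4().hex
    line = json.dumps(
        {
            "traceId": trace_id,
            "method": method,
            "path": path,
            "clientIp": client_ip,
            "params": mask(params),
            "elapsedMs": elapsed_ms,
        },
        sort_keys=True,
    )
    # Returning the header lets the caller quote the traceId when reporting issues.
    response_headers = {"X-Trace-Id": trace_id}
    return line, response_headers
```

The same traceId would also be forwarded to the back-end services in a request header, which is what makes the logs of one request searchable across all of them.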

Isolation

Isolation can be applied at the application level, for example to thread pools, HTTP connection pools, and Redis. It can also follow business scenarios: deploy core business on a dedicated gateway cluster, separate from non-core business.
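Application-level isolation is often implemented as a bulkhead: each back end gets its own bounded pool of slots, so a slow service can exhaust only its own share. The sketch below uses semaphores as a stand-in for separate thread or connection pools; `Bulkhead` and the limits shown in the usage note are hypothetical.

```python
import threading


class Bulkhead:
    """Gives each back-end service its own bounded number of concurrent calls."""

    def __init__(self, limits: dict):
        # One semaphore per service; the value is its concurrency budget.
        self.slots = {
            name: threading.BoundedSemaphore(limit) for name, limit in limits.items()
        }

    def try_enter(self, name: str) -> bool:
        # Non-blocking: when the budget is used up, reject instead of queueing.
        return self.slots[name].acquire(blocking=False)

    def leave(self, name: str) -> None:
        self.slots[name].release()
```

With `Bulkhead({"goods": 2, "recommend": 1})`, a stalled recommendation service can hold at most one slot, and calls to the product service continue to be admitted.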

Gateway management platform

This part is also very important, and the user experience of the whole process needs thought. Can the process of connecting to the gateway be made as simple and intelligent as possible? For a Dubbo interface, for example, the platform can fetch the source code from the Git repository, parse the corresponding classes and methods, and fill in the form automatically, sparing users as much work as possible. Interfaces also usually move from test -> staging -> production, and filling out the form each time is tedious; can this be automated? And if the gateway is deployed in multiple availability zones or even different countries, interface data has to be synchronized, otherwise users must repeat the operation in each console. My suggestion is to look at the gateway services offered by Alibaba Cloud and AWS, which are very comprehensive.

Other

There are other points to consider, such as interface mocking, document generation, SDK code generation, unified error codes, and service governance. I will not go into them here.

Summary

Today's gateway is still a centralized architecture: every request passes through it. During a big promotion or a sudden traffic spike, the gateway can become the performance bottleneck. When the gateway fronts a large number of interfaces, traffic estimation is hard to do well. Before each promotion you have to load-test the interfaces together with the business teams, estimate the capacity, and scale the gateway accordingly; and because all requests go through the gateway, evaluating capacity accurately is complicated. The popular Service Mesh approach is worth a look. It is decentralized: the gateway logic is pushed down into a sidecar that is deployed on the same node as the application and takes over the application's inbound and outbound traffic. For a big promotion, only the relevant services then need load testing and targeted scaling. Upgrades are also smoother. With a centralized gateway, even with a canary release, traffic from all business teams can in theory flow into the new gateway version, and a problem affects every business. With the decentralized approach, you can upgrade non-core business first, observe for a while, and then roll out to everything. Service Mesh is also friendlier to multi-language support.

Personal summary: As a company takes on more projects and offers more services, gateway management has to be unified. A unified entry point removes the need to rebuild the same wheel, such as authentication, in every service; authentication and other common functions only need to be provided in one place. A unified gateway also helps with system security. Like Nginx, a single entry point can be used to monitor security issues and to provide interface rate limiting, service limiting, routing and forwarding, and load balancing after forwarding. A configuration center controls bringing services online and offline smoothly; new services can be opened up and tested in a stable way; critical services stay available under high concurrency while non-critical ones can be degraded; and services can be written in different languages without affecting each other.

Copyright notice: This article was written by [dj1540225203]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/179/202206281551059791.html