How to build a coupon system for 100,000-level QPS, heavy traffic, and high concurrency from scratch
2022-06-28 15:09:00 【Floating across the sea】
Background
During the Spring Festival campaign, many business teams need to issue coupons, and they have explicit requirements for the issuing QPS. Issuing, redeeming (writing off), and querying all of these coupons requires a new system to host them. We therefore need to design and develop a coupon system that can sustain QPS on the order of 100,000 and manage the complete life cycle of a coupon.
Requirements breakdown and technology selection
Requirements breakdown
We can break the requirements down into two parts:
- Configuring coupons involves creating coupon batches (coupon templates), together with each template's validity period and inventory.
- Issuing coupons involves creating and managing coupon records (expiration time, status).
In addition, both coupon templates and coupon records need an open query interface that supports querying templates and records.
System selection and middleware
With the basic requirements identified, we next analyze the middleware that may be needed and how the system is organized overall.
Storage
Coupon templates and coupon records are data that must be persisted and must also support conditional queries, so we choose MySQL, a general-purpose structured store, as the storage middleware.
Cache
- Issuing a coupon requires the coupon template information. Under heavy traffic it is not feasible to read the template from MySQL on every request, so we introduce a cache.
- Likewise, coupon inventory management, that is, inventory deduction, is a high-frequency, real-time operation, so we also keep it in the cache.
Redis, the mainstream cache, meets these needs, so we use Redis as the caching middleware.
Message queue
Coupon templates and coupon records both need to show an expired status, and business logic depends on that status, so we introduce a delayed message queue to process the status of templates and records. RocketMQ supports delayed messages, so we use RocketMQ as the message queue.
System framework
The coupon issuing system is a downstream service that is called by upstream services. Services inside the company call each other over RPC, and the development language is Go, so we write the service with kitex, a Go RPC framework.
In summary, we implement the coupon issuing system with kitex + MySQL + Redis + RocketMQ, and the RPC service is deployed in the company's Docker containers.
System development and practice
System design and implementation
Overall system architecture
The requirements breakdown gives a general picture of the system to be built. The following is the overall system architecture, which includes some specific functions.
Data structure ER diagram
Corresponding to the system architecture, we need to create the matching MySQL tables.
Core logic implementation
Issuing coupons:
- The issuing flow has three parts: parameter validation, idempotency check, and inventory deduction.
The idempotency check ensures that when an issuing request fails and the business party retries or re-requests as compensation, the user still ends up with only one coupon, which prevents financial loss.
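The idempotency check can be sketched as follows. This is only an illustration under stated assumptions: `IssueStore`, its in-memory map, and the key format are hypothetical stand-ins for the MySQL coupon-record table, where a unique index on the idempotency key would play the same role.

```go
package main

import (
	"fmt"
	"sync"
)

// IssueStore is a hypothetical in-memory stand-in for the MySQL coupon-record
// table. In MySQL, a unique index on the idempotency key would do this job.
type IssueStore struct {
	mu      sync.Mutex
	records map[string]string // idempotency key -> coupon ID
	nextID  int
}

// NewIssueStore returns an empty store.
func NewIssueStore() *IssueStore {
	return &IssueStore{records: make(map[string]string)}
}

// Issue returns the coupon ID for idemKey. A repeated key returns the coupon
// created by the first call instead of issuing a second one.
func (s *IssueStore) Issue(idemKey string) (couponID string, created bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if id, ok := s.records[idemKey]; ok {
		return id, false // retry or compensation request: reuse the record
	}
	s.nextID++
	id := fmt.Sprintf("coupon-%d", s.nextID)
	s.records[idemKey] = id
	return id, true
}

func main() {
	store := NewIssueStore()
	first, created := store.Issue("order-42:user-1001")
	retry, createdAgain := store.Issue("order-42:user-1001")
	fmt.Println(first, created)      // coupon-1 true
	fmt.Println(retry, createdAgain) // coupon-1 false
}
```

However many times the upstream service retries with the same key, the user holds a single coupon.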
Coupon expiration:
Coupon expiration is a state transition, which we implement with RocketMQ.
- RocketMQ caps the delay it supports for delayed messages, while a coupon's validity period is not fixed and may exceed that cap. We therefore re-deliver the expiration message in a loop until the coupon actually expires.
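The looped delivery can be sketched as a pure calculation. The two-hour `maxDelay` is an assumed cap chosen for illustration, and `NextDelay` and `HopsUntilExpiry` are hypothetical helpers, not RocketMQ APIs. A real consumer would send a new delayed message wherever this sketch subtracts the delay.

```go
package main

import (
	"fmt"
	"time"
)

// maxDelay is an assumed cap on a single delayed message; the real cap
// depends on the message queue and its configuration.
const maxDelay = 2 * time.Hour

// NextDelay returns the delay for the next expiration message, given the time
// remaining until the coupon expires, and whether that message is the last one.
func NextDelay(remaining time.Duration) (delay time.Duration, final bool) {
	if remaining <= maxDelay {
		if remaining < 0 {
			remaining = 0
		}
		return remaining, true
	}
	return maxDelay, false
}

// HopsUntilExpiry simulates the loop and counts the messages sent. Each
// consumed message either marks the coupon expired (final) or schedules
// another delayed message.
func HopsUntilExpiry(validity time.Duration) int {
	hops := 0
	remaining := validity
	for {
		delay, final := NextDelay(remaining)
		hops++
		remaining -= delay
		if final {
			return hops
		}
	}
}

func main() {
	fmt.Println(HopsUntilExpiry(5 * time.Hour))    // 3
	fmt.Println(HopsUntilExpiry(90 * time.Minute)) // 1
}
```

With a five-hour validity period and a two-hour cap, the message is delivered three times: after 2 h, 4 h, and 5 h.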
Problems and solutions in heavy-traffic, high-concurrency scenarios
With the basic functions in place, let's discuss some problems the system may hit under heavy traffic and high concurrency, and how to solve them.
Storage bottlenecks and solutions
Bottlenecks:
The system architecture uses MySQL and Redis as storage components. The I/O capacity of a single server is limited, and our tests produced the following figures:
- A single MySQL instance handles about 4,000 writes per second. Beyond that, MySQL I/O latency rises sharply.
- Once a single MySQL table reaches tens of millions of rows, query efficiency drops considerably; at hundreds of millions of rows, querying the data becomes a problem.
- A single Redis shard tops out at about 20,000 writes per second and about 100,000 reads per second.
Solutions:
- Read/write separation. For scenarios such as querying coupon templates and coupon records, we separate MySQL reads from writes and route this query traffic to the MySQL read replicas, which reduces query pressure on the primary.
- Divide and conquer. Software design has the idea of partitioning. For storage bottlenecks, the common industry solution is divide and conquer: spread the traffic and spread the storage, that is, split databases and tables (sharding).
- Issuing a coupon ultimately means persisting the user's coupon-claim record. To get past MySQL's own I/O bottleneck, we deploy different MySQL shards on different servers and scale MySQL horizontally. Write requests are then spread across different MySQL hosts, which greatly increases overall MySQL throughput.
- After coupons are issued, users will certainly query their own coupons. Following this logic, we use the last four digits of user_id as the shard key and horizontally split the user's coupon-claim record table, so that coupon records can be queried by user.
- Each coupon template has a fixed quantity. While issuing coupons we record the issued count in Redis, and under heavy traffic we also need to scale Redis horizontally to reduce the pressure on any single Redis machine.
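The user-dimension sharding described above can be sketched as follows. Only the shard key (the last four digits of user_id) comes from the text. The modulo layout and the `dbCount` and `tablesPerDB` parameters are assumptions made for illustration.

```go
package main

import "fmt"

// ShardKey returns the last four decimal digits of userID, the shard key
// described in the text.
func ShardKey(userID uint64) int {
	return int(userID % 10000)
}

// Locate maps a user to a database index and a table index. The modulo layout
// and both parameters are illustrative assumptions; both must be positive.
func Locate(userID uint64, dbCount, tablesPerDB int) (db, table int) {
	key := ShardKey(userID)
	return key % dbCount, (key / dbCount) % tablesPerDB
}

func main() {
	db, table := Locate(123456789, 32, 8)
	fmt.Printf("shard key %d -> db %d, table %d\n", ShardKey(123456789), db, table)
}
```

Because the routing depends only on user_id, every record of a given user lands in the same table, so a per-user query touches one shard.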
Capacity estimate:
Based on the above, we estimate the storage resources needed to meet a coupon-issuing demand of 120,000 QPS.
a. MySQL resources
In our tests, issuing one coupon performs one non-transactional write to MySQL, and a single MySQL instance tops out at 4,000 writes per second. The number of MySQL primary instances we need is therefore:
120000/4000 = 30
b. Redis resources
Assume all 120,000 QPS of issuing requests use the same coupon template. With a single-shard write ceiling of 20,000, the minimum number of Redis shards is:
120000/20000 = 6
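The same estimate can be written as a small helper that rounds up, which matters when the target is not an exact multiple of the per-node ceiling. `NodesNeeded` is a hypothetical name used only for this sketch.

```go
package main

import "fmt"

// NodesNeeded returns the minimum number of nodes required to absorb
// targetQPS when each node sustains at most perNodeQPS, rounding up.
func NodesNeeded(targetQPS, perNodeQPS int) int {
	return (targetQPS + perNodeQPS - 1) / perNodeQPS
}

func main() {
	fmt.Println("MySQL primaries:", NodesNeeded(120000, 4000)) // 30
	fmt.Println("Redis shards:", NodesNeeded(120000, 20000))   // 6
}
```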
Hot inventory problem and solution
Problem
When issuing coupons under heavy traffic, if all requests use one coupon template, every inventory deduction hits one specific Redis shard. That shard reaches its write ceiling, and in the worst case the whole Redis cluster may become unavailable.
Solution
The industry has a common solution to the hot inventory problem: do not concentrate the inventory-deduction key on one shard. To keep a template's key from concentrating on one shard, we simply split the key, that is, split the inventory.
In the business logic, when we create a coupon template we split a hot template's inventory into sub-inventories, and later deductions are made against the corresponding sub-inventory.
Creating coupons
Deducting inventory
One question remains: if every deduction starts from sub-inventory 1, the pressure on the corresponding Redis shard is not reduced. So for each request we need to poll the sub-inventories in a random, non-repeating order. The concrete approach in this project is:
The last digits of each Redis sub-inventory key are the shard number, for example xxx_stock_key1, xxx_stock_key2, and so on. When deducting, we first generate a random, non-repeating array of shard numbers covering all shards: the first request may get [1,2,3], the second [3,1,2]. Each deduction request is thus spread across different Redis shards, which relieves single-shard pressure and supports a higher deduction QPS.
One problem with this approach is that when inventory is nearly exhausted, polling many of the shards becomes pointless. So on each request we record the remaining quantity of each sub-inventory, and once a sub-inventory of a template is used up, the random non-repeating polling skips that shard. This improves response time when inventory is about to run out.
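The random, non-repeating polling with skipping of exhausted sub-inventories can be sketched as follows. `SubStocks` is a hypothetical in-memory stand-in for the Redis sub-inventory keys. In the real system each deduction would be an atomic Redis operation on the corresponding shard, not a counter behind a mutex.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// SubStocks is a hypothetical in-memory stand-in for the Redis sub-inventory
// keys (xxx_stock_key1, xxx_stock_key2, ...) of one coupon template.
type SubStocks struct {
	mu        sync.Mutex
	rng       *rand.Rand
	remaining []int // remaining[i] is the stock left in sub-inventory i
}

// NewSubStocks splits total units across parts sub-inventories as evenly as
// possible. The seed only makes the random polling order reproducible.
func NewSubStocks(total, parts int, seed int64) *SubStocks {
	s := &SubStocks{
		rng:       rand.New(rand.NewSource(seed)),
		remaining: make([]int, parts),
	}
	for i := range s.remaining {
		s.remaining[i] = total / parts
		if i < total%parts {
			s.remaining[i]++
		}
	}
	return s
}

// Deduct visits the sub-inventories in a random, non-repeating order, skips
// the exhausted ones, and takes one unit from the first that has stock left.
func (s *SubStocks) Deduct() (part int, ok bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, i := range s.rng.Perm(len(s.remaining)) {
		if s.remaining[i] == 0 {
			continue // recorded as exhausted: skip without touching it
		}
		s.remaining[i]--
		return i, true
	}
	return -1, false // every sub-inventory is empty
}

// Remaining returns the total stock left across all sub-inventories.
func (s *SubStocks) Remaining() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	sum := 0
	for _, n := range s.remaining {
		sum += n
	}
	return sum
}

func main() {
	stock := NewSubStocks(10, 3, 1)
	for n := 0; n < 11; n++ {
		part, ok := stock.Deduct()
		fmt.Println("request", n+1, "sub-inventory", part, "ok", ok)
	}
}
```

The first ten requests succeed on varying sub-inventories, and the eleventh fails because all of them are empty.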
Besides splitting keys, the industry has another way to handle Redis hot keys: key backup. The same key is replicated to different Redis shards by some strategy, which breaks up the hot spot. This approach suits read-heavy, write-light scenarios and is not suited to heavy write traffic such as coupon issuing. For a specific business scenario, choose the solution that fits the business need.
Coupon template fetch failures and solution
Problem
In high-QPS, high-concurrency scenarios, even a 0.01% improvement in the interface success rate is significant in absolute terms. Look back at the whole issuing flow: look up coupon template (Redis) -> validate -> idempotency check (MySQL) -> issue coupon (MySQL). Looking up the template queries Redis, which is a hard dependency. In practice we observe that Redis times out for roughly 2 to 3 requests in every 10,000, so that portion of issuing requests is bound to fail.
Solution
To improve the success rate of these requests, we have two options.
The first is to retry internally when fetching the coupon template from Redis fails. The second is to cache the coupon template information in the local memory of each instance, that is, to introduce a second-level cache.
Internal retries improve the success rate of some requests, but they cannot fundamentally solve the Redis timeout problem, and the interface response time grows in proportion to the number of retries. A second-level cache fundamentally avoids issuing failures caused by Redis timeouts, so we choose the second-level cache scheme.
Of course, once a local cache is introduced, each service instance also needs to start a scheduled task that refreshes the latest coupon template information into the local cache and Redis. When writing template information to Redis, take a distributed lock so that multiple instances do not write to Redis at the same time and put unnecessary pressure on it.
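The local (second-level) cache can be sketched as follows. `Template`, `Loader`, `LocalCache`, and `ErrRemoteTimeout` are hypothetical names. The loader stands in for whatever the scheduled task reads the latest templates from, and the sketch leaves out the distributed lock and the write to Redis. In a real service, `Refresh` would be called periodically, for example from a `time.Ticker` loop.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Template is a hypothetical, simplified coupon template.
type Template struct {
	ID   string
	Name string
}

// ErrRemoteTimeout simulates a timeout from the remote store.
var ErrRemoteTimeout = errors.New("remote cache timeout")

// Loader fetches the latest templates; it is an assumed abstraction over the
// store that the scheduled refresh task reads from.
type Loader func() (map[string]Template, error)

// LocalCache keeps a snapshot of all templates in process memory.
type LocalCache struct {
	mu   sync.RWMutex
	data map[string]Template
}

// NewLocalCache returns an empty cache.
func NewLocalCache() *LocalCache {
	return &LocalCache{data: make(map[string]Template)}
}

// Refresh replaces the snapshot with freshly loaded data. If loading fails,
// the previous snapshot is kept, so readers are unaffected.
func (c *LocalCache) Refresh(load Loader) error {
	fresh, err := load()
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.data = fresh
	c.mu.Unlock()
	return nil
}

// Get serves a template from local memory without any network call.
func (c *LocalCache) Get(id string) (Template, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	t, ok := c.data[id]
	return t, ok
}

func main() {
	cache := NewLocalCache()
	healthy := func() (map[string]Template, error) {
		return map[string]Template{"t1": {ID: "t1", Name: "Spring Festival coupon"}}, nil
	}
	timeout := func() (map[string]Template, error) { return nil, ErrRemoteTimeout }

	fmt.Println("refresh:", cache.Refresh(healthy))
	fmt.Println("refresh:", cache.Refresh(timeout))
	t, ok := cache.Get("t1")
	fmt.Println(t.Name, ok) // still served after the failed refresh
}
```

The issuing path reads only from local memory, so a remote timeout affects the background refresh and not the request itself.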
Service governance
Once the system is developed, a series of operational measures is still needed to keep it running reliably.
- Timeouts. The coupon system is an RPC service, so we need to set reasonable RPC timeouts to ensure the system is not dragged down by failures in upstream systems. For the coupon issuing interface, for example, our internal execution time does not exceed 100 ms, so the interface timeout can be set to 500 ms. An abnormal request is rejected after 500 ms, which keeps our service running stably.
- Monitoring and alerting. For core interfaces, stability, important data, and system CPU and memory, we build the corresponding visual dashboards in Grafana. During the Spring Festival campaign we watch the Grafana dashboards in real time so that system anomalies are noticed as early as possible. We also have a thorough alerting mechanism for certain exceptions, so that we learn of system anomalies immediately.
- Rate limiting. The coupon system is an underlying service that multiple upstream services call in real business scenarios, so reasonably rate limiting these upstream services is also essential to the stability of the coupon system itself.
- Resource isolation. Our service is deployed in a Docker cluster, so to keep it highly available, the cluster resources it is deployed on should be spread across different physical regions as much as possible, to avoid unavailability caused by the failure of a single cluster.
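The rate limiting of upstream callers can be sketched with a simple fixed-window counter per caller. This is one common technique chosen for illustration, not necessarily the mechanism the system uses. `Limiter`, the caller name, and the limits are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Limiter is a hypothetical fixed-window limiter: each upstream caller gets
// its own request budget per one-second window.
type Limiter struct {
	mu     sync.Mutex
	limits map[string]int   // caller -> max requests per second
	window map[string]int64 // caller -> window (Unix second) being counted
	count  map[string]int   // caller -> requests seen in that window
}

// NewLimiter builds a limiter from per-caller limits.
func NewLimiter(limits map[string]int) *Limiter {
	return &Limiter{
		limits: limits,
		window: make(map[string]int64),
		count:  make(map[string]int),
	}
}

// Allow reports whether a request from caller at Unix second nowSec fits in
// the caller's budget. Callers without a configured limit are rejected.
func (l *Limiter) Allow(caller string, nowSec int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	limit, known := l.limits[caller]
	if !known {
		return false
	}
	if l.window[caller] != nowSec {
		l.window[caller] = nowSec // a new second starts a fresh budget
		l.count[caller] = 0
	}
	if l.count[caller] >= limit {
		return false
	}
	l.count[caller]++
	return true
}

func main() {
	limiter := NewLimiter(map[string]int{"activity-service": 2})
	now := time.Now().Unix()
	for i := 0; i < 3; i++ {
		fmt.Println(limiter.Allow("activity-service", now))
	}
}
```

Within one second the third request from the same caller is rejected, while other callers keep their own budgets.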
System load testing and actual performance
With the work above complete, it is time to check how the service performs in production. Before a new service goes live, of course, it must be load tested first. Below is a summary of points to watch during load testing, followed by the load-test conclusion.
Points to note
1. First is the load-testing approach. At the start we did not know the bottleneck of a Docker instance or of the storage components, so our general approach was:
- Find the single-instance bottleneck
- Find the write and read bottlenecks of a single MySQL primary
- Find the write and read bottlenecks of a single Redis shard
With these figures we could roughly estimate the resources required and then run a full load test of the service.
2. Load-testing resources also matter. Apply for enough of them in advance so that a reasonable load-testing plan can be made.
3. During load testing, watch the monitoring of services and resources, think carefully about anything that does not meet expectations, and optimize the code.
4. Record the load-test data promptly so that it can be reviewed properly afterwards.
5. The resources actually provisioned are generally 1.5 times what the load-test data indicates. We need some resource redundancy online to cope with sudden traffic growth.
Conclusion
Under 130,000 QPS of coupon-issuing requests, the request success rate was above 99.9% and system monitoring was normal. During the Spring Festival red-envelope rain, the coupon system carried all the traffic of two red-envelope rains with no exceptions and completed the coupon-issuing task successfully.
Business thinking about the system
- The current system only supports high-concurrency coupon issuing, and its exploration of the coupon business is not deep enough. Going forward we need to work with the business on features such as issuing coupons in bulk (coupon packs) and batch redemption.
- The coupon issuing system is only an underlying business platform that can adapt to many scenarios, and we can explore supporting more businesses in the future.
Summary
To build a high-traffic, high-concurrency coupon system from scratch, first understand the business requirements thoroughly, then break them down, and select middleware that fits the resulting requirements. This article is mainly about building a coupon system, so it uses several storage components and a message queue to store coupons, query them, and expire them.
For system development and implementation, the article describes the core flows of issuing and expiring coupons, and proposes solutions to the problems that heavy traffic and high concurrency may bring: storage bottlenecks, hot inventory, and timeouts when fetching the coupon template from the cache. We use the idea of partitioning and scale the storage middleware horizontally to solve the storage bottleneck, split inventory into sub-inventories to solve the hot inventory problem, and introduce a local cache to solve timeouts when fetching the coupon template from Redis. Together these keep the coupon system stable and available under heavy traffic and high concurrency.
Beyond the service itself, we also apply service governance through timeout settings, monitoring and alerting, rate limiting, and resource isolation to support the high availability of the service.
Load testing is an unavoidable step for a new service. It gives a clear picture of the overall state of the service, and the problems it exposes are the ones likely to appear online. With that picture, we can be more confident when the service goes live and formally enters operation.
This is a ByteDance article.
That said, I think the main takeaway of the article is adding machines.