Application of MongoDB in Tencent Retail Youma
CSIG Tencent Youma team / Tencent Cloud MongoDB team
This article describes how the Tencent Smart Retail team uses MongoDB in the Youma business. Using MongoDB as the primary storage service has brought the business high performance, fast DDL operations, low storage cost, and very large storage capacity. This greatly reduces storage costs and speeds up iterative development.
1. Business scenario
Tencent Youma extends from connecting consumers to connecting channel terminals, and drives the digital upgrading of enterprises on the basis of digitized goods, including upgrades to their marketing and sales capabilities. Tencent Youma consists of three sub-products: Genuine Pintong, Store Link, and Member Link. For more information, please visit the official website of Tencent Youma:
Overall view of Tencent Youma:

1.1. Genuine Pintong
Tencent Youma Genuine Pintong provides anti-counterfeiting and authentication capabilities. It traces genuine products through the whole process with one code per item, and stores the full-link data on a blockchain to guarantee authenticity. It gives consumers more direct access to the brand's private domain, which helps convert traffic further. Genuine Pintong also provides brand protection within the WeChat ecosystem by blocking the spread of counterfeit brand websites and helping consumers identify counterfeit goods.
The product mainly contains the following core features:

1.2. Store Link
Tencent Youma Store Link serves the four core roles in the retail chain: brands, distributors, sales representatives, and terminal stores. Built around terminal sales stores, it upgrades both sales management and sales promotion.
The product mainly contains the following core features:

1.3. Member Link
Tencent Youma Member Link offers retail brands a SaaS product plus customized services. Taking code scanning as the entry point, it connects online and offline scenarios. It provides a rich set of code-scanning and interactive activity templates, together with an activity evaluation system, to help brands connect with consumers.
The product mainly contains the following core features:

2. Code storage
The Youma business of Tencent Smart Retail stores the QR code information of retail goods. This information is the core data of smart retail, and it underpins the services behind "from connecting consumers to connecting channel terminals, realizing the digital upgrading of enterprises on the basis of digitized goods". Code data storage is therefore the core problem of the project.
2.1. Needs and solutions
To solve the code storage problem, we first analyzed its characteristics. The main ones are:
Ø Huge data volume: Tencent Youma generates the QR codes for goods. As more and more goods adopt Tencent Youma, the volume of QR code data has started to grow exponentially.
Ø Associative storage: Codes are related to each other by 1:1 and 1:N:N relationships. These relationships must be stored and must support the corresponding association queries.
Ø Multidimensional query: Different application scenarios require conditional queries along different dimensions.
With these characteristics established, and after several rounds of investigation, we shortlisted 2 storage schemes:
1) MySQL + ES: MySQL, split into multiple databases and tables, stores the code data and serves the read/write scenarios that demand high performance. Part of the data is then synchronized to ES as needed to handle the various complex query scenarios.
2) MongoDB: MongoDB is one of the most widely used distributed document databases. Its core features are a flexible schema (No Schema), high availability, and distribution, which make it well suited to distributed storage.
2.2. Scheme analysis
2.2.1. MySQL + ES Scheme analysis
MySQL + ES is a common storage solution that is widely used in many fields, for example for storing member or commodity information. Its advantage is that it offers many query methods with different performance guarantees, so it can handle all kinds of complex business queries.
In the usual MySQL + ES architecture, writes go directly to MySQL, and data changes are then synchronized to ES through canal + Kafka. Queries are served from either MySQL or ES depending on the scenario. The following figure shows a possible architecture for the Tencent Youma business scenario:

As the architecture diagram shows, this scheme has several problems:
1) Data synchronization and consistency: This is not a problem when the data volume is small, but it becomes very serious at 10 billion or even 100 billion records.
2) Data capacity: In general, a single MySQL table is best kept below one million rows; beyond that, both reads and writes become a problem. Storing hundreds of billions of records therefore requires thousands upon thousands of tables. When the business has to maintain that many sharded tables itself, development and operations become almost unmanageable.
3) Cost: Redundant data storage adds storage cost. In addition, ES needs more machines and memory to guarantee data reliability and query performance. ES also suffers from data inflation: for the same data, it needs considerably more disk than MySQL.
4) DDL operations: Once MySQL is split into multiple databases and tables, a DDL statement has to be applied to a large number of them, which is time-consuming and error-prone. In our previous project experience, with hundreds of tables and hundreds of thousands of rows per table, even a simple DDL statement that adds a field took 1 hour or more to complete.
5) Development cost: This scheme requires the business to maintain its own database and table sharding and its own data synchronization, and to choose a query engine per requirement. The architecture as a whole is complex, and every business requirement needs careful thought, because picking the wrong storage engine through a moment's inattention can cause performance problems.
6) Horizontal expansion: To expand sharded MySQL, the business must manually rehash and migrate data, which is very costly. It is also hard to handle reads and writes correctly while the expansion is in progress.
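The capacity problem in item 2) can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper name `tablesNeeded` is hypothetical, and the inputs are the round figures from the text (100 billion records, roughly one million rows per table) plus a looser ten-million-row cap for comparison. They are not measurements from the Youma system.

```javascript
// Hypothetical helper: number of MySQL tables needed to hold `totalRows`
// when each table is capped at `rowsPerTable` rows.
function tablesNeeded(totalRows, rowsPerTable) {
  return Math.ceil(totalRows / rowsPerTable);
}

// Round figures from the text: 100 billion codes, 1 million rows per table.
console.log(tablesNeeded(100e9, 1e6)); // 100000
// Even with a looser (assumed) cap of 10 million rows per table:
console.log(tablesNeeded(100e9, 1e7)); // 10000
```

Either way the business would have to create, route to, and alter tens of thousands of tables by itself, which is the maintenance burden described above.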
2.2.2. MongoDB Scheme analysis
MongoDB is a well-known distributed storage engine with a flexible schema, high availability, distribution, data compression, and other advantages. Although MongoDB is a NoSQL store, its WiredTiger storage engine, like MySQL's InnoDB, is built on B+ trees underneath. MongoDB can therefore offer most of the query methods that MySQL supports while also providing distributed storage. With MongoDB, we do not need redundant MySQL tables or ES to support most distributed queries. In the Tencent Youma scenario, the MongoDB-based storage architecture is shown in the figure below:

As the diagram shows, MongoDB avoids the problems that redundant storage brings: data synchronization and consistency issues, storage cost, and resource, operations, and development cost. After further testing and analyzing MongoDB's functionality and performance, we found that it also has the following advantages:
Ø No DDL problem: Because MongoDB is schema-free (No Schema), it avoids MySQL's DDL problem.
Ø Automatic data balancing: MongoDB has an automatic rebalance function. When data is unevenly distributed, it migrates data automatically to keep the load on each shard even.
Ø Lower cost: MongoDB has built-in data compression, so it needs less disk for the same data.
Ø Higher performance: MongoDB makes maximum use of memory and, in most scenarios, performs close to an in-memory database. In our tests, the read performance of a single MongoDB shard was about 30,000 QPS.
Ø More ways to read and write: MongoDB has no inverted index like ES, so it supports slightly fewer query methods than ES. However, MongoDB covers most of the ES query capabilities at much higher performance than ES. Compared with MySQL, MongoDB field types support embedded documents and arrays, so it can satisfy more read and write requirements.
2.3. Scheme comparison
From the analysis above, our preliminary judgment was that MongoDB is the better fit. To confirm its advantages, we compared MySQL + ES and MongoDB in depth across several aspects.
2.3.1. Storage cost comparison
MongoDB's storage advantage comes mainly from two things: data compression and the absence of redundant storage.
To see disk usage more intuitively, we simulated the Tencent Youma business scenario and measured actual storage under MySQL + ES and under MongoDB.
On the one hand, the MySQL + ES scheme needs redundant ES data and redundant MySQL tables to meet the requirements. The core code data stored in MySQL accounts for only 38.1% of total disk usage. As noted earlier, the MongoDB scheme needs no redundant storage, so using MongoDB removes the other 61.9% of total data capacity.

On the other hand, our tests on the same code data showed a compression ratio of about 3x for MongoDB's snappy algorithm and about 6x for zlib. So even though the business chose snappy to keep the system stable, MongoDB still needs only about one third of the disk that MySQL uses.
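The two savings combine multiplicatively. As a back-of-the-envelope sketch: the function name `mongoDiskEstimate` and the normalized baseline of 100 units are assumptions for illustration, while the 38.1% share and the 3x and 6x compression ratios are the figures quoted above. The sketch also assumes that the uncompressed MongoDB data is about the same size as the core MySQL data.

```javascript
// Estimate MongoDB disk usage relative to a MySQL + ES deployment.
// coreShare: fraction of total MySQL + ES disk held by the core MySQL data.
// compressionRatio: uncompressed size divided by compressed size.
function mongoDiskEstimate(totalMysqlEsDisk, coreShare, compressionRatio) {
  const coreData = totalMysqlEsDisk * coreShare; // no redundant tables or ES copies
  return coreData / compressionRatio;            // then apply block compression
}

const baseline = 100; // hypothetical: MySQL + ES total normalized to 100 units
console.log(mongoDiskEstimate(baseline, 0.381, 3).toFixed(1)); // snappy: "12.7"
console.log(mongoDiskEstimate(baseline, 0.381, 6).toFixed(1)); // zlib
```

Under these assumptions, the snappy configuration would need roughly an eighth of the disk used by the full MySQL + ES deployment.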

2.3.2. Development and operation cost
Ø No data synchronization link: With MongoDB no data synchronization is needed, so there is no canal service or Kafka queue to maintain. This greatly reduces the difficulty of development and operations.
Ø Labor cost: Under the MySQL + ES architecture, every field addition on the MySQL cluster costs operations staff a certain number of person-days and carries a risk of business jitter. It also delays business iteration, which makes each release slow and risky.
Ø Development and maintenance cost: The MongoDB storage architecture is simple: a single store, with no data consistency pressure.
Ø Dynamic capacity: MongoDB supports dynamic expansion at any time, so there is essentially no capacity ceiling. With MySQL, the business must manually rehash and migrate data during expansion while guaranteeing data consistency and integrity.
2.3.3. Performance comparison
In stress tests on the same 4C8G machine configuration, MySQL and MongoDB had essentially the same write performance under large data volumes. The read performance of a single MySQL shard was about 6,000 QPS, and ES reached only about 800 QPS. The read performance of a single MongoDB shard was about 30,000 QPS, far above both MySQL and ES.
2.3.4. Summary
The analysis and comparison above show that MongoDB has the advantage in every aspect. To make the differences between the schemes easier to see, the comparison below covers 5 aspects: functionality, performance, cost, scalability, and maintainability:

In summary, MongoDB fully meets the business requirements and outperforms the MySQL + ES scheme in performance, cost, maintainability, and other aspects. Tencent Youma therefore chose MongoDB as the storage scheme for the core code data of the business.
3. MongoDB sharded cluster optimization process
The retail Youma business is cost-sensitive and has a large data volume, while real online read/write traffic is not very high (the read requirement is 30,000 QPS). The cluster is therefore deployed in sharded mode on low-specification 4C8G nodes (the specification of a single node).
3.1. Shard key selection and pre-splitting
Retail Youma code data is queried by code id, so the code id was chosen as the shard key. This maximizes query performance, because every such query can be served from a single shard. To avoid moveChunk operations caused by data imbalance between shards, hashed sharding was chosen and the collection is pre-split in advance. MongoDB supports pre-splitting for hashed sharding by default. Taking the Youma code details collection as an example, pre-splitting is done as follows:
use db_code_xx
sh.enableSharding("db_code_xx")
// n is the actual number of shards
sh.shardCollection("db_code_xx.t_code_xx", {"id": "hashed"}, false, {numInitialChunks: 8192*n})
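To see why a hashed shard key spreads ids evenly even when they are issued sequentially, the toy simulation below buckets 100,000 consecutive integers by a hash. It is only a sketch. `mix32` is the 32-bit MurmurHash3 finalizer, used here as a stand-in (MongoDB's real hashed index uses a different hash function), and the bucket count of 16 is arbitrary.

```javascript
// 32-bit integer mixer (MurmurHash3 finalizer); a stand-in for a real
// shard-key hash, not MongoDB's actual implementation.
function mix32(x) {
  let h = x >>> 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Count how many of the ids 0..count-1 land in each of `buckets` hash buckets.
function distribute(count, buckets) {
  const counts = new Array(buckets).fill(0);
  for (let id = 0; id < count; id++) {
    counts[mix32(id) % buckets] += 1;
  }
  return counts;
}

const bucketCounts = distribute(100000, 16);
console.log(Math.min(...bucketCounts), Math.max(...bucketCounts));
```

With range-based sharding, consecutive ids would all fall into the same chunk. With hashing, the smallest and largest buckets should end up close to the mean of 6,250, which is why pre-split hashed chunks rarely need to be moved later.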
3.2. Off-peak balancing window
Because the MongoDB instance nodes are low-specification (4C8G), automatic balancing is triggered whenever chunk data between shards becomes unbalanced. On such small instances, the balance process causes the following problems:
Ø CPU consumption is too high; the migration process can even consume about 90% of CPU
Ø Business access jitters and latency increases
Ø Slow logs increase
Ø Abnormal alarms increase
These problems arise because the balance process performs moveChunk data migration. To move data from one shard to another quickly, MongoDB continuously copies data between shards internally. This consumes a lot of CPU and causes business jitter.
The MongoDB kernel takes this impact into account and supports a balance window setting out of the box. Balancing can therefore be kept away from business peaks, which avoids jitter caused by data migration as far as possible. For example, to restrict balancing to the off-peak period from 0:00 to 6:00 in the early morning, use the following commands:
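use config
// Restrict the balancer to 00:00-06:00; upsert creates the settings document if it is missing
db.settings.updateOne({"_id": "balancer"}, {"$set": {"activeWindow": {"start": "00:00", "stop": "06:00"}}}, {"upsert": true})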
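The balancer only runs while the current time falls inside `activeWindow`. The check can be sketched as follows, purely to illustrate the semantics, including a window that wraps past midnight. The helper names `toMinutes` and `inWindow` are hypothetical, this is not MongoDB's internal code, and the exact inclusiveness of the stop time is an assumption.

```javascript
// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// Return true if `now` ("HH:MM") lies in [start, stop). Also handles
// windows such as 23:00-06:00 that cross midnight.
function inWindow(now, start, stop) {
  const t = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  return s <= e ? (t >= s && t < e) : (t >= s || t < e);
}

console.log(inWindow("03:30", "00:00", "06:00")); // true
console.log(inWindow("12:00", "00:00", "06:00")); // false
console.log(inWindow("23:30", "23:00", "06:00")); // true
```

As far as we know from the MongoDB documentation, the window times are interpreted in the time zone of the config server primary, so it is worth confirming that this matches the business's off-peak hours.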
3.3. Write majority optimization
Because the QR code data is core data, the client is configured with writeConcern={w: "majority"} to avoid the risk of data loss or rollback in extreme cases. This ensures that data is written to a majority of replica set members before an acknowledgment is sent to the client.
The concept of chained replication: assume node A (primary), node B (secondary), and node C (secondary). If node B synchronizes data from node A and node C synchronizes data from node B, then A->B->C forms a chained synchronization structure, as shown in the figure below:


MongoDB multi-node replica sets support chained replication. The following command shows whether chained replication is allowed in the current replica set:
cmgo-xx:SECONDARY> rs.conf().settings.chainingAllowed
true
cmgo-xx:SECONDARY>
In addition, you can tell whether chained replication is actually occurring by checking the synchronization source of each node in the replica set. If a node's synchronization source is a secondary, chained replication exists in the replica set. See the following replica set status field:
cmgo-xx:SECONDARY> rs.status().syncSourceHost
xx.xx.xx.xx:7021
cmgo-xx:SECONDARY>
Since the business is configured to write majority, chained replication can be turned off for performance reasons. In MongoDB it can be disabled with the following commands:
cfg = rs.config()
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)
Benefit of chained replication:
It can greatly reduce the oplog synchronization pressure on the primary node.
Drawback of chained replication:
When the write concern is majority, write requests take longer.
For write performance reasons, when the business adopts the "write majority" strategy, chained replication is simply turned off to avoid the write performance degradation caused by a long replication chain.
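The effect of chain length on majority writes can be illustrated with a very simplified model. Everything in it is an assumption made for illustration: the function name `majorityAckMs`, the uniform per-hop delay of 10 ms, and the five-member topology. It also ignores oplog batching, journaling, and node lag.

```javascript
// syncSource[i] is the index of the node that node i replicates from.
// The primary (index 0) has source null. hopMs is a hypothetical,
// uniform replication delay per hop.
function majorityAckMs(syncSource, hopMs) {
  const arrival = syncSource.map(() => null);
  const arrivalOf = (i) => {
    if (arrival[i] !== null) return arrival[i];
    arrival[i] = syncSource[i] === null ? 0 : arrivalOf(syncSource[i]) + hopMs;
    return arrival[i];
  };
  // Time at which each member holds the write, earliest first.
  const times = syncSource.map((_, i) => arrivalOf(i)).sort((a, b) => a - b);
  const majority = Math.floor(syncSource.length / 2) + 1;
  return times[majority - 1];
}

// Five-member replica set, 10 ms per hop (illustrative numbers).
const star = [null, 0, 0, 0, 0];  // every secondary syncs from the primary
const chain = [null, 0, 1, 2, 3]; // A -> B -> C -> D -> E
console.log(majorityAckMs(star, 10));  // 10
console.log(majorityAckMs(chain, 10)); // 20
```

In this model the penalty appears when the member that completes the majority sits more than one hop from the primary. In a three-member chain such as A->B->C, node B still syncs directly from the primary, so the extra latency there would come mainly from a lagging intermediate node, which this sketch does not capture.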
About the authors
CSIG Tencent Youma team:
The Tencent Youma team has been deeply involved in the retail industry for many years. We are committed to extending from connecting consumers to connecting channel terminals, and to realizing the digital upgrading of enterprises on the basis of digitized goods. Tencent Youma has built a "three links plus one code platform" offering: Member Link, Store Link, Genuine Pintong, and the code middle platform. At present, we have relatively complete digital solutions for the drinking water, beverage, and food industries, serving 70+ enterprises, connecting 150 Billion+ goods, with 60 Billion+ code scans.
Tencent Cloud MongoDB team:
Tencent Cloud MongoDB currently serves industries such as gaming, e-commerce, social networking, education, news and information, finance, the Internet of Things, and software services. The MongoDB team (CMongo for short) is committed to in-depth research on and continuous optimization of the open source MongoDB kernel (such as support for millions of databases and collections, physical backup, password-free access, and auditing), and provides users with high-performance, low-cost, highly available, and secure database storage services. The team continues to share typical MongoDB application scenarios inside and outside Tencent, pitfall cases, performance optimization, and module-by-module kernel analysis.
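Copyright notice from the original site
This article was created by [InfoQ]. Please include the original link when reposting. Thank you.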
https://yzsam.com/2022/173/202206221416173171.html