
Application of MongoDB in Tencent Retail Youma

2022-06-28 15:18:00 【Yang Yaya focuses on mongodb and high-performance Middleware】

CSIG Tencent Youma team / Tencent Cloud MongoDB team

This article describes how the Youma business of the Tencent Smart Retail team uses MongoDB. Using MongoDB as the primary storage service has brought the business substantial benefits, mainly high performance, fast DDL operations, low storage cost and very large storage capacity. It has greatly reduced storage costs and made iterative development more efficient.

1. Business scenario

Tencent Youma extends from connecting consumers to connecting channel terminals, and uses the digitization of goods to drive the digital upgrade of enterprises, covering both marketing capability and sales-activation capability. Tencent Youma consists of three sub-products: Genuine Product Link, Store Link and Member Link. For more information, see the official Tencent Youma website: https://uma.qq.com/

[Figure: overall view of Tencent Youma]

1.1. Genuine Product Link

Tencent Youma Genuine Product Link provides anti-counterfeiting and authenticity verification. It traces genuine products through the whole process with one code per item, and stores the full-link data on a blockchain to guarantee authenticity. It gives consumers a more direct path into the brand's private domain, enabling further traffic conversion. It also provides brand protection within the WeChat ecosystem, blocking the spread of counterfeit brand websites and helping consumers identify counterfeit goods.

[Figure: core features of Genuine Product Link]

1.2. Store Link

Tencent Youma Store Link serves the four core roles of the retail chain: brands, distributors, sales representatives and terminal stores. It upgrades sales management and sales promotion around the terminal stores where goods are sold.

[Figure: core features of Store Link]

1.3. Member Link

Tencent Youma Member Link offers retail brands a SaaS product plus customized services. Taking code scanning as the entry point, it connects online and offline scenarios. It provides a rich set of code-scanning and interactive campaign templates, along with a campaign evaluation system, to help brands connect with consumers.

[Figure: core features of Member Link]

2. Code storage

The Tencent Smart Retail Youma business stores the QR code information of retail goods. This information is the core data of smart retail and underpins the services behind "connecting consumers through to channel terminals, and driving the digital upgrade of enterprises through the digitization of goods". Code data storage is therefore the core issue of the project.

2.1. Requirements and solutions

To solve the code storage problem, we first analyzed its characteristics. The main ones are:

Huge data volume: as more and more goods adopt Tencent Youma, the volume of product QR code data it generates has started to grow exponentially.
Associated storage: codes are related to each other 1:1 and 1:N:N. These relationships must be stored, and the corresponding association queries must be supported.
Multidimensional queries: different application scenarios require conditional queries along different dimensions.

With these characteristics in hand, and after several rounds of investigation, we shortlisted two storage schemes:

MySQL + ES: MySQL with sharded databases and tables stores the code data and serves the read/write scenarios that demand high performance. Part of the data is then synchronized to ES as needed to handle complex query scenarios.
MongoDB: MongoDB is one of the highest-ranked distributed databases. Its core features are a schema-free data model, high availability and distribution, which make it well suited to distributed storage.

2.2. Scheme analysis

2.2.1. MySQL + ES scheme analysis

MySQL + ES is a common storage solution that is widely used in many fields, for example for storing member or product information. Its advantage is that it offers many query methods with different performance guarantees, so it can handle all kinds of complex business query requirements.

In the usual MySQL + ES architecture, writes go directly to MySQL, data changes are then synchronized to ES through canal + Kafka, and queries are served from MySQL or ES depending on the scenario. A possible architecture for the Tencent Youma scenario is shown in the figure below:

[Figure: possible MySQL + ES architecture for Tencent Youma]

As the architecture diagram shows, this scheme has several problems:

Data synchronization and consistency: this is not a concern at small data volumes, but with tens or hundreds of billions of records it becomes a very serious problem.
Data capacity: a single MySQL table is generally best kept below one million rows; beyond that, reads and writes become a problem. Storing hundreds of billions of records therefore requires thousands of tables, and when the business itself has to maintain that many sharded tables, development and operations become almost impossible.
Cost: redundant data storage adds storage cost. ES also needs more machines and more memory to guarantee data reliability and query performance. In addition, ES suffers from data bloat: the same data takes considerably more disk space in ES than in MySQL.
DDL operations: once MySQL is sharded into multiple databases and tables, a DDL statement has to be applied to a large number of tables, which is both time-consuming and error-prone. In our previous projects, with hundreds of tables holding hundreds of thousands of rows each, a simple DDL statement that adds a field took an hour or more to complete.
Development cost: the business has to maintain the sharding itself, keep the data synchronized, and choose the right query engine for each requirement. The overall architecture is complex, and every new requirement needs careful thought, because picking the wrong storage engine in a moment of inattention leads to performance problems.
Horizontal scaling: expanding sharded MySQL databases and tables requires the business to rehash and relocate data manually, which is very costly. Handling reads and writes while the expansion is in progress is also difficult.
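
The cost of manual rehashing can be made concrete with a small sketch. It assumes that the business routes each code to a table with `id % tableCount`, a common sharding rule that the article does not confirm; `tableFor` and `movedFraction` are hypothetical helper names. The sketch counts what share of ids would have to be relocated when the number of tables changes.

```javascript
// Hypothetical routing rule (an assumption, not the article's actual scheme):
// each code id is stored in table number id % tableCount.
function tableFor(id, tableCount) {
  return id % tableCount;
}

// Fraction of ids in [0, idCount) whose table changes when the number of
// tables grows from oldCount to newCount.
function movedFraction(idCount, oldCount, newCount) {
  let moved = 0;
  for (let id = 0; id < idCount; id++) {
    if (tableFor(id, oldCount) !== tableFor(id, newCount)) {
      moved++;
    }
  }
  return moved / idCount;
}

console.log(movedFraction(10000, 4, 5)); // 0.8
console.log(movedFraction(10000, 4, 8)); // 0.5
```

Under this assumed rule, going from four tables to five relocates 80% of the rows, and even doubling the table count relocates half of them, which is why manual expansion becomes so expensive at tens of billions of rows.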

2.2.2. MongoDB scheme analysis

MongoDB is a well-known distributed storage engine with advantages such as a schema-free model, high availability, distribution and data compression. Although MongoDB is a NoSQL store, its WiredTiger storage engine, like InnoDB, uses B+ trees underneath, so MongoDB can provide distributed storage while still supporting most of the query methods that MySQL supports. With MongoDB we therefore need neither redundant MySQL tables nor ES to support most distributed queries. The MongoDB-based storage architecture for the Tencent Youma scenario is shown in the figure below:

[Figure: MongoDB-based storage architecture for Tencent Youma]

As the diagram shows, MongoDB avoids the data synchronization and consistency problems caused by redundant storage, as well as the associated storage cost and the resource, operations and development costs. After further testing and analysis of MongoDB's functionality and performance, we found that it also has the following advantages:

No DDL problem: MongoDB is schema-free, so it avoids MySQL's DDL problem.
Automatic data balancing: MongoDB has an automatic rebalance function. When data is unevenly distributed, it relocates data automatically so that the load stays even across shards.
Lower cost: MongoDB has built-in data compression, so it needs less disk for the same data.
Higher performance: MongoDB makes maximal use of memory and in most scenarios performs close to an in-memory database. In our tests, MongoDB's single-shard read performance was about 30,000 QPS.
More ways to read and write: MongoDB has no inverted index like ES, so the query methods it supports are slightly fewer than those of ES. However, MongoDB covers most of ES's query capabilities while performing much better than ES. Compared with MySQL, MongoDB field types support embedded objects and arrays, so it can meet more read/write needs.

2.3. Scheme comparison

From the analysis above, our preliminary judgment was that MongoDB is the better fit. To confirm MongoDB's advantages, we compared MySQL + ES and MongoDB in depth across all aspects.

2.3.1. Storage cost comparison

MongoDB's storage advantages come from two things: data compression and non-redundant storage.

To see disk usage more directly, we simulated the actual storage of the Tencent Youma business scenario under MySQL + ES and under MongoDB.

On the one hand, the MySQL + ES scheme needs redundant ES data and redundant MySQL tables to meet the requirements. The core code data stored in MySQL accounts for only 38.1% of total disk usage. As noted earlier, the MongoDB scheme needs no redundant storage, so using MongoDB removes the remaining 61.9% of the total data volume.

On the other hand, tests on the same code data show a compression ratio of about 3x for MongoDB's snappy compression algorithm and about 6x for zlib. So even though the business chose snappy for the sake of system stability, MongoDB needs only about one third of the disk that MySQL does.
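
The two savings can be combined in a short back-of-the-envelope calculation. The sketch below uses only the figures quoted above and assumes that removing the redundant copies and compressing the core data are independent effects that multiply; `relativeFootprint` is a hypothetical helper name, and the result is an estimate rather than a measured value.

```javascript
// Figures taken from the text above.
const coreDataShare = 0.381; // core code data as a share of the MySQL + ES footprint
const snappyRatio = 3;       // approximate snappy compression ratio
const zlibRatio = 6;         // approximate zlib compression ratio

// Assumption: dropping the redundant copies and compressing the core data
// are independent savings, so their effects multiply.
function relativeFootprint(coreShare, compressionRatio) {
  return coreShare / compressionRatio;
}

const withSnappy = relativeFootprint(coreDataShare, snappyRatio);
const withZlib = relativeFootprint(coreDataShare, zlibRatio);

console.log("snappy: " + (withSnappy * 100).toFixed(1) + "% of the MySQL + ES footprint");
console.log("zlib:   " + (withZlib * 100).toFixed(2) + "% of the MySQL + ES footprint");
```

Under these assumptions, the snappy configuration would need roughly 13% of the disk used by the MySQL + ES scheme.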

2.3.2. Development and operation cost

No data synchronization link: MongoDB needs no data synchronization, so there is no canal service or Kafka queue to maintain, which greatly reduces development and operations effort.
Labor cost: under the MySQL + ES architecture, every field addition on the MySQL cluster requires a certain number of operations person-days and carries a risk of business jitter. It also delays the release schedule, making iterative releases slow and risky.
Development and maintenance cost: the MongoDB storage architecture is simple, with a single store and no data consistency pressure.
Dynamic scaling: MongoDB supports dynamic scale-out at any time, so there is essentially no capacity ceiling. Scaling MySQL, by contrast, requires the business to rehash and relocate data manually while guaranteeing data consistency and integrity.

2.3.3. Performance comparison

In stress tests on the same 4C8G machine configuration, MySQL and MongoDB showed essentially the same write performance at large data volumes. Single-shard read performance was about 6,000 QPS for MySQL and only about 800 QPS for ES, whereas MongoDB reached about 30,000 QPS per shard, far above both MySQL and ES.
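
To put these numbers in perspective, the sketch below estimates how many shards each engine would need to serve the 30,000 QPS read requirement stated later in this article. It assumes that read throughput scales linearly with the number of shards, which real systems only approximate; `shardsNeeded` is a hypothetical helper name.

```javascript
// Approximate single-shard read throughput from the stress test above (QPS).
const readQpsPerShard = { MySQL: 6000, ES: 800, MongoDB: 30000 };

// Assumption: read throughput scales linearly with the number of shards.
function shardsNeeded(targetQps, qpsPerShard) {
  return Math.ceil(targetQps / qpsPerShard);
}

const targetQps = 30000; // read requirement of the business
for (const [engine, qps] of Object.entries(readQpsPerShard)) {
  console.log(engine + ": " + shardsNeeded(targetQps, qps) + " shard(s)");
}
```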

2.3.4. Summary

The analysis and comparison above show that MongoDB has advantages in all aspects. To make the differences between the schemes easier to see, they were compared on five aspects: functionality, performance, cost, scalability and maintainability.

[Table: comparison of the schemes on functionality, performance, cost, scalability and maintainability]

In summary, MongoDB fully meets the business requirements and also outperforms the alternatives in performance, cost, maintainability and other respects. Tencent Youma therefore chose MongoDB as the storage scheme for its core code data.

3. MongoDB sharded cluster optimization

The retail Youma business is highly cost-sensitive and has a large data volume, while real online read/write traffic is not very high (the read requirement is 30,000 QPS). The cluster is therefore deployed as a sharded cluster on low-specification 4C8G nodes (per-node specification).

3.1. Shard key selection and pre-splitting

Retail Youma data is queried by code id, so the code id is chosen as the shard key. This maximizes query performance, because every lookup by id can be served from a single shard. In addition, to avoid moveChunk operations caused by data imbalance between shards, hashed sharding is used and the collection is pre-split in advance. MongoDB supports pre-splitting for hashed sharding by default. Taking the Youma code detail table as an example, pre-splitting is done as follows:

    use db_code_xx
    sh.enableSharding("db_code_xx")
    // n is the actual number of shards
    sh.shardCollection("db_code_xx.t_code_xx", {"id": "hashed"}, false, {numInitialChunks: 8192*n})
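
The reason hashed sharding keeps shards balanced even for sequentially issued ids can be illustrated with a small simulation. This is only a sketch: `mix32` is the MurmurHash3 finalizer, used here as a stand-in because MongoDB's real hashed-index function is different, and MongoDB assigns ranges of hash values to chunks rather than taking a modulo. `shardForId` and `distribution` are hypothetical helper names.

```javascript
// Illustrative 32-bit mixing function (the MurmurHash3 finalizer).
// Assumption: this stands in for MongoDB's real hashed-index function,
// which is different; only the spreading behaviour is being shown.
function mix32(x) {
  let h = x >>> 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Hypothetical placement rule: hash the id, then pick a shard by modulo.
function shardForId(id, shardCount) {
  return mix32(id) % shardCount;
}

// Count how many of the sequential ids 0..idCount-1 land on each shard.
function distribution(idCount, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (let id = 0; id < idCount; id++) {
    counts[shardForId(id, shardCount)]++;
  }
  return counts;
}

console.log(distribution(10000, 4));
```

Each shard should receive roughly a quarter of the ids, so new writes spread across all shards instead of piling onto one, and the balancer has little reason to move chunks afterwards.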

3.2. Off-peak balancer window

Because the MongoDB instance nodes are low-specification (4C8G), automatic balancing is triggered whenever chunk data becomes unbalanced between shards. On such low-specification instances, the balance process causes the following problems:

CPU consumption is too high; the migration process can consume around 90% of the CPU
Business access jitters and latency increases
Slow logs increase
Abnormal alarms increase

These problems are caused by the moveChunk data relocation that the balance process performs. To migrate data quickly from one shard to another, MongoDB keeps moving data between shards internally, which consumes a lot of CPU and in turn causes business jitter.

The MongoDB kernel takes this impact into account and supports a balancer window setting by default, so the balance process can be shifted away from business peak hours, which minimizes the jitter caused by data migration. For example, to restrict balancing to the off-peak period from 00:00 to 06:00, use the following commands:

    use config
    db.settings.update({"_id": "balancer"}, {"$set": {"activeWindow": {"start": "00:00", "stop": "06:00"}}}, true)
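
The window semantics can be sketched with a small helper, for example for a monitoring script that wants to know whether migrations are currently allowed. This is an illustration only: MongoDB evaluates `activeWindow` on the server side, and the boundary handling below (start inclusive, stop exclusive, windows that cross midnight) is an assumption. `toMinutes` and `isInsideWindow` are hypothetical helper names.

```javascript
// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(":").map(Number);
  return hours * 60 + minutes;
}

// Returns true when `now` falls inside [start, stop).
// A window whose start is later than its stop is treated as crossing midnight.
function isInsideWindow(now, start, stop) {
  const t = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  if (s <= e) {
    return t >= s && t < e;
  }
  return t >= s || t < e;
}

console.log(isInsideWindow("03:30", "00:00", "06:00")); // true
console.log(isInsideWindow("14:00", "00:00", "06:00")); // false
console.log(isInsideWindow("23:30", "22:00", "06:00")); // true
```

The stored setting itself can be inspected in the shell with `db.settings.find({"_id": "balancer"})` on the `config` database.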

3.3. Write majority optimization

Because the QR code data is core data, the client uses writeConcern={w: "majority"} to avoid the risk of data loss and data rollback in extreme cases. This ensures that data is written to a majority of the replica set members before an acknowledgement is sent to the client.
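
The following sketch shows what this setting means in practice. It computes how many acknowledgements a majority write needs for a given number of voting members (ignoring arbiters, which complicate the calculation) and builds an options object in the shape accepted by write methods such as `insertOne`. The `wtimeout` value and both helper names are assumptions chosen for illustration.

```javascript
// Number of acknowledgements a majority write needs for a replica set
// with the given number of voting, data-bearing members.
function majoritySize(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Options object in the shape accepted by insertOne/updateOne.
// wtimeoutMs is a hypothetical value chosen for illustration.
function majorityWriteOptions(wtimeoutMs) {
  return { writeConcern: { w: "majority", wtimeout: wtimeoutMs } };
}

console.log(majoritySize(3)); // 2
console.log(majoritySize(5)); // 3
console.log(JSON.stringify(majorityWriteOptions(5000)));
```

In the shell, such an options object would be passed as the second argument of a write, for example `db.t_code_xx.insertOne(doc, majorityWriteOptions(5000))`, where `doc` is the document to store.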

The concept of chained replication: assume node A (primary), node B (secondary) and node C (secondary). If node B synchronizes data from node A and node C synchronizes data from node B, then A->B->C forms a chained synchronization structure, as shown in the figure below:

[Figure: chained replication A -> B -> C]

Multi-node MongoDB replica sets support chained replication. The following command shows whether chained replication is allowed in the current replica set:

    cmgo-xx:SECONDARY> rs.conf().settings.chainingAllowed
    true
    cmgo-xx:SECONDARY>

In addition, you can tell whether chained replication is actually taking place by checking the sync source of each node in the replica set. If a node's sync source is a secondary node, chained replication exists in the replica set. See the following replica set status field:

    cmgo-xx:SECONDARY> rs.status().syncSourceHost
    xx.xx.xx.xx:7021
    cmgo-xx:SECONDARY>
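
The same check can be applied to every member at once. The sketch below takes the `members` array of an `rs.status()` result and lists the secondaries whose sync source is not the primary. It relies on the `name`, `stateStr` and `syncSourceHost` fields of each member; the sample hosts are made up, and `findChainedMembers` is a hypothetical helper name.

```javascript
// Given the `members` array of an rs.status() result, list the secondaries
// whose sync source is another secondary (chained replication in use).
function findChainedMembers(members) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  const primaryName = primary ? primary.name : null;
  return members
    .filter((m) => m.stateStr === "SECONDARY")
    .filter((m) => m.syncSourceHost && m.syncSourceHost !== primaryName)
    .map((m) => m.name + " <- " + m.syncSourceHost);
}

// Hypothetical sample shaped like rs.status().members (hosts are made up).
const sampleMembers = [
  { name: "10.0.0.1:7021", stateStr: "PRIMARY", syncSourceHost: "" },
  { name: "10.0.0.2:7021", stateStr: "SECONDARY", syncSourceHost: "10.0.0.1:7021" },
  { name: "10.0.0.3:7021", stateStr: "SECONDARY", syncSourceHost: "10.0.0.2:7021" },
];

// Only 10.0.0.3:7021 syncs from another secondary in this sample.
console.log(findChainedMembers(sampleMembers));
```

In the shell, the helper would be called as `findChainedMembers(rs.status().members)`.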

Since the business is configured with write majority, chained replication can be turned off for performance reasons. MongoDB lets you disable it with the following commands:

    // Run on the primary node
    cfg = rs.config()
    cfg.settings.chainingAllowed = false
    rs.reconfig(cfg)

Benefit of chained replication: it greatly reduces the oplog synchronization pressure on the primary node.

Drawback of chained replication: when the write concern is majority, write requests take longer.

For write performance reasons, when the business uses the write-majority strategy, chained replication is simply turned off, which avoids the write performance degradation caused by a long replication chain.

About the authors

CSIG Tencent Youma team:

The Tencent Youma team has worked in the retail industry for many years. We are committed to connecting consumers through to channel terminals, and to driving the digital upgrade of enterprises through the digitization of goods. Tencent Youma has built Member Link, Store Link, Genuine Product Link and a code middle platform, that is, three Links and one code platform. We currently offer fairly complete digital solutions for the drinking water, beverage and food industries, serving 70+ enterprises, connecting 150 billion+ goods, with 60 billion+ code scans.

Tencent Cloud MongoDB team:

Tencent Cloud MongoDB currently serves industries including games, e-commerce, social networking, education, news and information, finance, the Internet of Things and software services. The MongoDB team (CMongo for short) is committed to in-depth research on and continuous optimization of the open-source MongoDB kernel (such as support for millions of databases and tables, physical backup, password-free access and auditing), providing users with high-performance, low-cost, highly available and secure database storage services. The team continues to share typical MongoDB application scenarios inside and outside Tencent, pitfall cases, performance optimization and modular kernel analysis.

Copyright notice
This article was written by [Yang Yaya focuses on mongodb and high-performance Middleware]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/179/202206281510367968.html