
Application of MongoDB in Tencent Retail Youma

2022-06-22 15:43:00 【InfoQ】


CSIG Tencent Youma team / Tencent Cloud MongoDB team

    This article describes how the Tencent Smart Retail team's Youma business uses MongoDB. Using MongoDB as the primary storage service has brought the business substantial benefits, mainly high performance, fast DDL operations, low storage cost and very large storage capacity. It has greatly reduced storage cost and improved the efficiency of iterative development.

1.   Business scenario

Tencent Youma spans the path from connecting consumers to connecting channel terminals, and drives the digital upgrade of enterprises through the digitization of goods, including upgrades to marketing and sell-through capabilities. Tencent Youma consists of three sub-products: Genuine Pintong, Store Link and Member Link. For more information, see the official Tencent Youma website:

https://uma.qq.com/

Overall view of Tencent Youma:

1.1.  Genuine Pintong

Tencent Youma Genuine Pintong provides anti-counterfeiting and authentication capabilities. It traces genuine products through the whole process on a one-item-one-code basis, and stores the full-link data on a blockchain to ensure authenticity. It gives brands more direct access to their private domain and helps convert traffic further. Genuine Pintong also provides brand protection within the WeChat ecosystem: it blocks the spread of counterfeit brand websites and helps consumers identify counterfeit goods.

The product mainly contains the following core features:

1.2.   Store link

Tencent Youma Store Link serves the four core roles in the retail chain: brands, distributors, sales representatives and terminal stores. It upgrades sales management and sales promotion around terminal stores.

The product mainly contains the following core features:

1.3.   Member Link

Tencent Youma Member Link provides retail brands with a SaaS product plus customized services. Taking code scanning as the starting point, it connects online and offline scenarios, and offers a rich set of code-scanning and interactive campaign templates together with a campaign evaluation system to help brands connect with consumers.

The product mainly contains the following core features:

2.   Code storage

The Tencent Smart Retail Youma business stores the QR code information of retail goods. This is the core data of smart retail, and it underpins the services that go "from connecting consumers to connecting channel terminals, realizing the digital upgrade of enterprises through the digitization of goods". Code data storage is therefore the core problem of the project.

2.1.   Needs and solutions

To solve the code storage problem, we first analyzed its characteristics. The main ones are:

Huge amounts of data:

Tencent Youma generates the QR codes for goods. As more and more goods adopt Tencent Youma, the volume of QR code data has started to grow exponentially.

Associative storage:

Codes are related to each other in 1:1 and 1:N:N relationships. These relationships must be stored, and the corresponding association queries must be supported.

Multidimensional query:

Different application scenarios require conditional queries along different dimensions.

With these characteristics identified, and after several rounds of investigation, we shortlisted two storage schemes:

1)  MySQL + ES: MySQL, sharded across databases and tables, stores the code data and serves the read and write paths that need high performance. Part of the data is then synchronized to ES on demand to handle the various complex query scenarios.

2)  MongoDB: MongoDB is among the top-ranked distributed storage engines in the world. Its core features are no fixed schema (No Schema), high availability and distribution, which make it well suited to distributed storage.

2.2.  Scheme analysis

2.2.1.  MySQL + ES Scheme analysis

MySQL + ES is a common storage solution that is widely used in many areas, for example for storing member or product information. Its advantage is that it offers many query methods with different performance guarantees, so it can handle all kinds of complex business queries.

In the usual MySQL + ES architecture, writes go directly to MySQL, data changes are synchronized to ES through canal + Kafka, and queries are served from MySQL or ES depending on the scenario. The following figure shows a possible architecture for the Tencent Youma business scenario:

As the architecture diagram shows, this scheme has several problems:

Data synchronization and consistency issues:

This is not a problem when the data volume is small, but at 10 billion or even 100 billion records it becomes very serious.

Data capacity issues:

As a rule of thumb, a single MySQL table is best kept below one million rows; beyond that, reads and writes become a problem. Storing hundreds of billions of records therefore requires thousands of tables, and when the business itself has to maintain that many shards, development and operations become almost unmanageable.

The question of cost:

Redundant storage adds storage cost. In addition, ES needs more machines and more memory to guarantee data reliability and query performance. ES also suffers from data inflation: the same data takes considerably more disk in ES than in MySQL.

DDL operational problems:

After MySQL is split across databases and tables, a DDL statement has to run against a large number of databases and tables, so it is both time-consuming and error-prone. In our previous project experience, with hundreds of tables and hundreds of thousands of rows per table, a simple add-column DDL statement took an hour or more to complete.

Development cost problem:

This scheme requires the business to maintain the sharding, the data synchronization and the choice of query engine for each requirement. The overall architecture is complex, and every business requirement needs careful thought: picking the wrong storage engine through a small oversight can cause performance problems.

Horizontal expansion problem:

Scaling out a sharded MySQL deployment requires the business to manually rehash and relocate data, which is very costly, and it is hard to handle reads and writes correctly while the expansion is in progress.

2.2.2.  MongoDB Scheme analysis

MongoDB is a very well-known distributed storage engine, with advantages such as no fixed schema (No Schema), high availability, distribution and data compression. Although MongoDB is a NoSQL store, its WiredTiger storage engine, like InnoDB, uses B+ trees at the bottom layer, so on top of distributed storage MongoDB can also provide most of the query methods that MySQL supports. With MongoDB we therefore do not need redundant MySQL tables or ES to support most distributed queries. In the Tencent Youma scenario, the MongoDB-based storage architecture is shown in the figure below:

As the figure shows, MongoDB avoids the data synchronization and consistency problems caused by redundant storage, along with the extra storage cost and the resource, operations and development costs. After further testing and analysis of MongoDB's functionality and performance, we found that it also has the following advantages:

No DDL problem:

Because MongoDB has no fixed schema, it avoids MySQL's DDL problem.

Automatic data balancing:

MongoDB has an automatic rebalance function. When data is unevenly distributed, it relocates data automatically so that the load stays even across shards.

Lower cost:

MongoDB has built-in data compression, so it needs less disk for the same data.

Higher performance:

MongoDB makes maximum use of memory and in most scenarios performs close to an in-memory database. In our tests, the read performance of a single MongoDB shard was about 30,000 QPS.

2.3.   Scheme comparison

From the analysis above, our preliminary judgment was that MongoDB is the better fit. To confirm MongoDB's advantages, we compared MySQL + ES and MongoDB in depth across several aspects.

2.3.1.   Storage cost comparison

MongoDB's storage advantages come from two things: data compression and the absence of redundant storage.

To see disk usage more directly, we simulated the Tencent Youma business scenario and measured the actual storage used by MySQL + ES and by MongoDB.

First, under the MySQL + ES scheme, meeting the requirements needs a redundant copy of the data in ES plus redundant tables in MySQL. The core code data stored in MySQL accounts for only 38.1% of total disk usage. As noted earlier, the MongoDB scheme needs no redundant storage, so using MongoDB removes the other 61.9% of the total data volume.

Second, in our tests on the same code data, MongoDB's snappy compression achieved a compression ratio of about 3x and zlib about 6x. So even though the business chose snappy to keep the system stable, MongoDB still needs only one third of the disk that MySQL consumes.
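The two savings described above compound. As a rough worked example, the sketch below estimates the MongoDB footprint relative to the total MySQL + ES footprint. It assumes the quoted compression ratios apply to the 38.1% core data share; the function and variable names are illustrative only.

```javascript
// Figures quoted in the text above.
const coreShare = 0.381;  // core code data as a share of total MySQL + ES disk
const snappyRatio = 3;    // approximate snappy compression ratio
const zlibRatio = 6;      // approximate zlib compression ratio

// Assumption: compression applies to the core data only, because the
// redundant ES copy and redundant MySQL tables are no longer stored.
function relativeFootprint(share, ratio) {
  return share / ratio;
}

const withSnappy = relativeFootprint(coreShare, snappyRatio); // about 0.127
const withZlib = relativeFootprint(coreShare, zlibRatio);     // about 0.0635

console.log("snappy share of MySQL + ES total:", withSnappy);
console.log("zlib share of MySQL + ES total:", withZlib);
```

Under these assumptions, the snappy configuration would occupy roughly an eighth of the disk used by the MySQL + ES layout.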

2.3.2.    Development and operation cost

No data synchronization link:

With MongoDB there is no data synchronization, so there is no canal service or Kafka queue to maintain, which greatly reduces development and operations effort.

Labor costs and benefits:

Under the MySQL + ES architecture, every time a field is added to the MySQL cluster, operations has to invest a certain number of person-days, and there is a risk of business jitter. It also slows down the release schedule, making each iteration time-consuming and risky.

Development and maintenance costs:

The MongoDB storage architecture is simple: a single store, with no data consistency burden.

Dynamic capacity:

MongoDB supports dynamic capacity expansion at any time, so there is essentially no capacity ceiling. With MySQL, the business has to rehash and relocate data manually when expanding, while guaranteeing data consistency and integrity.

2.3.3.    Performance comparison

In stress tests on identical 4C8G machines, MySQL and MongoDB had essentially the same write performance under large data volumes. For reads, a single MySQL shard delivered about 6,000 QPS and ES only about 800 QPS, while a single MongoDB shard delivered about 30,000 QPS, far above both MySQL and ES.
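To put these figures in perspective, the following sketch estimates how many 4C8G shards each engine would need for a given read target. It assumes throughput scales linearly with shard count and ignores headroom and replication, which real capacity planning would not. The 30,000 QPS target is the read requirement mentioned in section 3.

```javascript
// Single-shard read throughput measured on 4C8G machines (from the text).
const readQpsPerShard = { MySQL: 6000, ES: 800, MongoDB: 30000 };

// Assumption: read throughput scales linearly with the number of shards.
function shardsNeeded(targetQps, perShardQps) {
  return Math.ceil(targetQps / perShardQps);
}

const targetQps = 30000;
const estimate = {};
for (const [engine, qps] of Object.entries(readQpsPerShard)) {
  estimate[engine] = shardsNeeded(targetQps, qps);
}

console.log(estimate);
```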

2.3.4.    summary

After the analysis and comparison above, MongoDB clearly has the advantage in every respect. To show the differences between the schemes more directly, the following compares them on five aspects: functionality, performance, cost, scalability and maintainability:

In summary, MongoDB fully meets the business requirements and is also better than the other two schemes in performance, cost and maintainability. Tencent Youma therefore chose MongoDB as the storage scheme for its core code data.

3.  MongoDB sharded cluster optimization process

The retail Youma business is cost-sensitive and has a large data volume, but the real online read and write traffic is not especially high (the read requirement is 30,000 QPS). The sharded cluster is therefore deployed on low-spec nodes, 4C8G per node.

3.1.   Shard key selection and pre-splitting

Retail Youma data is queried by code id, so the code id is chosen as the shard key. This maximizes query performance, because every such query can be served from a single shard. In addition, to avoid moveChunk operations triggered by data imbalance between shards, we chose hashed sharding and pre-split the chunks in advance. MongoDB supports pre-splitting for hashed sharding by default. Taking the Youma code details table as an example, the pre-splitting is done as follows:

use db_code_xx
sh.enableSharding("db_code_xx")
// n is the actual number of shards
sh.shardCollection("db_code_xx.t_code_xx", {"id": "hashed"}, false, {numInitialChunks: 8192*n})
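The reason a hashed shard key keeps shards balanced can be sketched in plain JavaScript. The example below is illustrative only: it uses MD5 from Node's `crypto` module as a stand-in hash and does not claim to reproduce MongoDB's internal hashing or chunk layout. The names `shardFor` and `distribution` and the `code-N` ids are hypothetical.

```javascript
const crypto = require("crypto");

// Illustrative stand-in for a hashed shard key. MD5 is used here only to
// show the idea; it is not claimed to match MongoDB's internal hash.
function shardFor(id, shardCount) {
  const digest = crypto.createHash("md5").update(String(id)).digest();
  // Take the first 4 bytes as an unsigned integer and map it to a shard.
  return digest.readUInt32BE(0) % shardCount;
}

// Counts how many ids land on each shard.
function distribution(ids, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const id of ids) counts[shardFor(id, shardCount)] += 1;
  return counts;
}

// Hypothetical sequential code ids. Under range-based sharding, consecutive
// ids would fall into the same range; hashing scatters them.
const ids = Array.from({ length: 10000 }, (_, i) => `code-${i}`);
const counts = distribution(ids, 4);

console.log(counts);
```

Because consecutive ids hash to unrelated values, each shard receives a roughly equal share from the start, which is why pre-splitting the hashed key space leaves little for the balancer to move later.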

3.2.   Balancer window during off-peak hours

Because the MongoDB instance nodes are low-spec (4C8G), when chunk data becomes unbalanced between shards the automatic balancer kicks in, and on such small instances the balancing process causes the following problems:

-  CPU consumption is too high; the migration process can consume around 90% of the CPU

-  Business access jitters and latency increases

-  Slow logs increase

-  Abnormal alarms increase

All of these are caused by the moveChunk data relocation performed during balancing. To move data from one shard to another quickly, MongoDB keeps copying data between shards internally, which consumes a lot of CPU and causes business jitter.

The MongoDB kernel anticipates that balancing affects the business, so it supports a balancer window setting by default. We can schedule balancing away from business peaks, which avoids most of the jitter caused by data migration. For example, to set the balancer window to the off-peak period from 00:00 to 06:00, use the following commands:

use config
db.settings.updateOne({"_id": "balancer"}, {"$set": {"activeWindow": {"start": "00:00", "stop": "06:00"}}}, {upsert: true})
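To make the window semantics concrete, here is a minimal sketch of how a time of day can be tested against an `activeWindow` document. It is an illustration, not MongoDB's implementation: the helper names are hypothetical, and treating the window as half-open (start inclusive, stop exclusive) is an assumption.

```javascript
// Converts "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// Returns true when `hhmm` falls inside [start, stop).
// Also handles windows that wrap past midnight, e.g. 23:00 to 06:00.
function inBalancerWindow(hhmm, activeWindow) {
  const now = toMinutes(hhmm);
  const start = toMinutes(activeWindow.start);
  const stop = toMinutes(activeWindow.stop);
  if (start <= stop) return now >= start && now < stop;
  return now >= start || now < stop;
}

// Same window as configured in the command above.
const activeWindow = { start: "00:00", stop: "06:00" };

console.log(inBalancerWindow("03:30", activeWindow)); // true
console.log(inBalancerWindow("12:00", activeWindow)); // false
```

A check like this can be handy in monitoring scripts, for example to flag CPU spikes that occur outside the configured window and therefore cannot be attributed to scheduled balancing.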

3.3.   Write majority optimization

Because QR code data is core data, the client uses writeConcern={w: "majority"} to avoid the risk of data loss or rollback in extreme cases. This ensures that data is written to a majority of replica set members before the write is acknowledged to the client.

The concept of chain replication: suppose node A is the primary and nodes B and C are secondaries. If B synchronizes data from A and C synchronizes data from B, then A->B->C forms a chained synchronization structure, as shown in the figure below:

MongoDB multi-node replica sets support chain replication. You can check whether the current replica set allows chain replication with the following command:

cmgo-xx:SECONDARY> rs.conf().settings.chainingAllowed
true
cmgo-xx:SECONDARY>

You can also tell whether chain replication is actually happening by looking at each node's synchronization source. If a node's sync source is a secondary, chain replication is in effect in the replica set. See the following replica set status field:

cmgo-xx:SECONDARY> rs.status().syncSourceHost
xx.xx.xx.xx:7021
cmgo-xx:SECONDARY>
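The check described above can be automated. The following sketch takes an object shaped like the output of `rs.status()` and lists the secondaries whose sync source is not the primary. The function name `findChainedMembers` and the sample hosts are hypothetical; the `members[].name`, `stateStr` and `syncSourceHost` fields are those reported by recent MongoDB versions, so older versions may need a different field name.

```javascript
// Lists secondaries that replicate from another secondary.
// `status` is expected to look like the result of rs.status().
function findChainedMembers(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  const primaryName = primary ? primary.name : null;
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    // A secondary with a sync source other than the primary is chained.
    .filter((m) => m.syncSourceHost && m.syncSourceHost !== primaryName)
    .map((m) => ({ member: m.name, syncsFrom: m.syncSourceHost }));
}

// Hypothetical status for the A -> B -> C example in the text.
const sampleStatus = {
  members: [
    { name: "hostA:7021", stateStr: "PRIMARY", syncSourceHost: "" },
    { name: "hostB:7021", stateStr: "SECONDARY", syncSourceHost: "hostA:7021" },
    { name: "hostC:7021", stateStr: "SECONDARY", syncSourceHost: "hostB:7021" },
  ],
};

const chained = findChainedMembers(sampleStatus);
console.log(chained);
```

In a shell session the same function could be called as `findChainedMembers(rs.status())`; an empty result would suggest that no chaining is currently taking place.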

Since the business is configured with write majority, chain replication can be turned off for performance reasons. In MongoDB this is done with the following commands:

cfg = rs.conf()
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)

Benefits of chain replication:

It greatly reduces the oplog synchronization pressure on the primary node.

Drawbacks of chain replication:

When the write concern is majority, write requests take longer.

For write performance reasons, when the business uses the write-majority strategy, turn off chain replication directly to avoid the write performance degradation caused by a long replication chain.
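The trade-off can be illustrated with a deliberately simplified latency model. This is an assumption made for illustration, not a description of MongoDB internals: each member is taken to acknowledge a write after the sum of the hop latencies on its sync path from the primary, and a majority write completes when a majority of members (primary included) have acknowledged. The five-member topologies and the 2 ms hop latency are hypothetical.

```javascript
// syncSource maps each member to the member it replicates from
// (the primary maps to null). Returns the ack time of every member.
function ackTimes(syncSource, hopMs) {
  const times = {};
  const resolve = (member) => {
    if (times[member] !== undefined) return times[member];
    const source = syncSource[member];
    times[member] = source === null ? 0 : resolve(source) + hopMs;
    return times[member];
  };
  Object.keys(syncSource).forEach(resolve);
  return times;
}

// Time until a majority of members (primary included) hold the write.
function majorityAckMs(syncSource, hopMs) {
  const sorted = Object.values(ackTimes(syncSource, hopMs)).sort((a, b) => a - b);
  const majority = Math.floor(sorted.length / 2) + 1;
  return sorted[majority - 1];
}

// Hypothetical five-member replica sets.
const chainedTopology = { A: null, B: "A", C: "B", D: "C", E: "D" };
const starTopology = { A: null, B: "A", C: "A", D: "A", E: "A" };

const chainedMs = majorityAckMs(chainedTopology, 2);
const starMs = majorityAckMs(starTopology, 2);
console.log({ chainedMs, starMs });
```

In this model the extra latency appears whenever the majority has to include a member that sits more than one hop away from the primary. With chaining disabled, every secondary is one hop away, so the majority acknowledgement time stays at a single hop.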

About the authors

CSIG Tencent Youma team:

The Tencent Youma team has worked in the retail industry for many years. We are committed to going from connecting consumers to connecting channel terminals, and to realizing the digital upgrade of enterprises through the digitization of goods. Tencent Youma has built Member Link, Store Link, Genuine Pintong and a code middle platform, that is, three links plus one code platform. We currently offer relatively complete digital solutions for the drinking water, beverage and food industries, serving 70+ enterprises, connecting 150 billion+ goods, with 60 billion+ code scans.

Tencent Cloud MongoDB team:

Tencent Cloud MongoDB currently serves industries including gaming, e-commerce, social networking, education, news and information, finance, the Internet of Things and software services. The MongoDB team (CMongo for short) is committed to in-depth research and continuous optimization of the open source MongoDB kernel (for example millions of databases and collections, physical backup, password-free access and auditing), providing users with a high-performance, low-cost, highly available and secure database storage service. The team continues to share typical MongoDB application scenarios inside and outside Tencent, pitfall cases, performance optimization and modular kernel analysis.


Copyright notice
This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/173/202206221416173171.html