Application of MongoDB in Tencent Retail Youma
2022-06-28 15:18:00 【Yang Yaya focuses on mongodb and high-performance Middleware】
CSIG Tencent Youma team / Tencent Cloud MongoDB team
This article describes how the Tencent Smart Retail team's Youma business uses MongoDB. Adopting MongoDB as the primary storage service has brought the business clear benefits, mainly high performance, fast DDL-style schema changes, low storage cost, and very large storage capacity. Together these greatly reduce storage costs and speed up iterative development.
1. Business scenario
Tencent Youma extends from connecting consumers to connecting channel terminals, and helps enterprises upgrade digitally on the basis of digitized goods, including upgrades to their marketing and sell-through capabilities. Tencent Youma consists of three sub-products: Genuine Product Link, Store Link, and Member Link. For more information, visit the official Tencent Youma website: https://uma.qq.com/
Overall view of Tencent Youma:
1.1. Genuine Product Link
Genuine Product Link provides anti-counterfeiting and authentication capabilities. It traces genuine products through the whole process with one code per item, and stores the full-link data on a blockchain to ensure authenticity. It gives brands a more direct route into their private-domain traffic, so that traffic can be converted further. It also provides brand protection within the WeChat ecosystem, blocking the spread of counterfeit brand websites and helping consumers identify counterfeit goods.
The product mainly contains the following core features:
1.2. Store Link
Store Link serves the four core roles in the retail chain: brand owners, distributors, sales representatives, and terminal stores. Centered on the terminal stores, it upgrades sales management and boosts sales.
The product mainly contains the following core features:
1.3. Member Link
Member Link offers retail brands a SaaS product plus customized services. Taking code scanning as the entry point, it connects online and offline scenarios. Its rich set of code-scanning and interactive campaign templates, together with its campaign evaluation system, helps brands connect with consumers.
The product mainly contains the following core features:
2. Code storage
The Tencent Smart Retail Youma business stores the QR code information of retail goods. This is the core data of smart retail and underpins the services that go "from connecting consumers to connecting channel terminals, upgrading enterprises digitally on the basis of digitized goods". Code data storage is therefore the core problem of the project.
2.1. Needs and solutions
To solve the code storage problem, we first analyzed its characteristics. The main ones are:
- Huge data volume: as more and more goods adopt Tencent Youma, the volume of QR code data it produces has begun to grow exponentially.
- Associative storage: codes are related to each other in 1:1 and 1:N:N relationships, which must be stored and made available to association queries.
- Multidimensional query: different application scenarios require conditional queries along different dimensions.
With these characteristics established, and after several rounds of investigation, we shortlisted two storage schemes:
- MySQL + ES: MySQL with sharded databases and tables stores the code data and serves the read/write scenarios that demand high performance. Selected data is then synchronized to ES as needed to handle complex queries.
- MongoDB: MongoDB ranks among the most popular distributed databases. Its core features are a schema-free ("No Schema") data model, high availability, and distributed deployment, which make it well suited to distributed storage.
2.2. Scheme analysis
2.2.1. MySQL + ES scheme analysis
MySQL + ES is a common storage solution, widely used in areas such as member and commodity information storage. Its advantage is that it offers many query methods with different performance guarantees, so it can handle all kinds of complex business queries.
In the usual MySQL + ES architecture, writes go directly to MySQL, data changes are synchronized to ES through canal + Kafka, and reads go to either MySQL or ES depending on the query scenario. The following figure shows a possible architecture for the Tencent Youma business scenario:
As the architecture diagram shows, this scheme has several problems:
- Data synchronization and consistency: with small data volumes this has little impact, but at tens or hundreds of billions of records it becomes a very serious problem.
- Data capacity: in general it is best to keep a single MySQL table below the million-row level, because reads and writes degrade when a single table grows too large. Storing hundreds of billions of records would therefore take thousands of tables, and when the business itself has to maintain that many sharded tables, development and operations become almost unmanageable.
- Cost: redundant data storage adds storage cost. ES also needs more machines and memory to ensure data reliability and query performance, and it suffers from data inflation: for the same data it needs considerably more disk than MySQL.
- DDL operations: once MySQL is split into many databases and tables, a DDL statement has to be applied to a large number of tables, which is time-consuming and error-prone. In our previous projects, with hundreds of tables holding hundreds of thousands of rows each, a simple DDL statement that adds a field took an hour or more to complete.
- Development cost: the business has to maintain the database and table sharding and the data synchronization itself, and choose a query engine for each requirement. The architecture is complex, every requirement needs careful thought, and a small oversight in choosing the storage engine can cause performance problems.
- Horizontal scaling: expanding sharded MySQL databases and tables requires the business to rehash and relocate data manually. This is very costly, and reads and writes during the expansion are hard to handle.
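To illustrate why manual rehashing is expensive, here is a minimal sketch that counts how many records change tables when the table count grows. The routing rule (`id % tableCount`) and the function names are assumptions made for illustration; the article does not say which routing rule a MySQL deployment would use.

```javascript
// Hypothetical routing rule: a record with numeric id lives in table (id % tableCount).
function tableFor(id, tableCount) {
  return id % tableCount;
}

// Fraction of ids in [0, sampleSize) whose table changes when the table
// count grows from oldCount to newCount.
function fractionMoved(oldCount, newCount, sampleSize) {
  let moved = 0;
  for (let id = 0; id < sampleSize; id++) {
    if (tableFor(id, oldCount) !== tableFor(id, newCount)) {
      moved++;
    }
  }
  return moved / sampleSize;
}

console.log(fractionMoved(4, 5, 100000)); // 0.8
```

Under this simple modulo rule, growing from 4 to 5 tables relocates 80% of the records, which is the kind of migration the business would have to run and verify by hand.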
2.2.2. MongoDB scheme analysis
MongoDB is a well-known distributed database with a schema-free data model, high availability, distributed deployment, data compression, and other advantages. Although MongoDB is a NoSQL database, its WiredTiger storage engine, like MySQL's InnoDB, is built on B+ trees underneath, so MongoDB can offer most of the query methods MySQL supports while also providing distributed storage. With MongoDB we therefore need neither redundant MySQL tables nor ES to support most distributed queries. In the Tencent Youma scenario, the MongoDB-based storage architecture is shown in the figure below:
As the diagram shows, MongoDB avoids the data synchronization and consistency problems, the extra storage cost, and the resource, operations, and development costs that redundant storage brings. After further testing and analysis of MongoDB's functionality and performance, we found that it also has the following advantages:
- No DDL problem: because MongoDB is schema-free, it avoids MySQL's DDL problem.
- Automatic data balancing: MongoDB rebalances automatically. When data is unevenly distributed, it migrates data so that load stays even across shards.
- Lower cost: MongoDB has built-in data compression, so it needs less disk for the same data.
- Higher performance: MongoDB makes the most of memory, and in most scenarios its performance is close to that of an in-memory database. In our tests, single-shard read performance was about 30,000 QPS.
- More ways to read and write: MongoDB has no inverted index as ES does, so it supports somewhat fewer query methods than ES. However, it covers most of ES's query capabilities while performing much better than ES. And compared with MySQL, MongoDB field types support embedded objects and arrays, so it can meet more read/write needs.
2.3. Scheme comparison
From the preceding analysis, our preliminary judgment was that MongoDB performs better. To confirm MongoDB's advantages, we compared MySQL + ES and MongoDB in depth across all aspects.
2.3.1. Storage cost comparison
MongoDB's storage advantages come from two things: data compression and non-redundant storage.
To see disk usage more concretely, we simulated the Tencent Youma business scenario and measured the actual storage used under MySQL + ES and under MongoDB.
On the one hand, the MySQL + ES scheme needs redundant ES data and redundant MySQL tables to meet the requirements. The core code data stored in MySQL accounts for only 38.1% of total disk usage. As noted earlier, the MongoDB scheme needs no redundant storage, so using MongoDB eliminates the other 61.9% of the total data volume.
On the other hand, in our tests on the same code data, MongoDB's snappy compression achieved a compression ratio of about 3x and zlib about 6x. So even though the business chose snappy for the sake of system stability, MongoDB still needs only about one third of MySQL's disk consumption.
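The two savings compound. As a rough worked example using only the figures quoted above, the sketch below estimates MongoDB's disk footprint relative to the whole MySQL + ES deployment. It assumes that MongoDB stores only the core data and that the compression ratios are measured against the uncompressed MySQL size; the function and variable names are illustrative.

```javascript
// Figures taken from the text above.
const coreShareOfTotal = 0.381; // core code data as a share of MySQL + ES disk usage
const snappyRatio = 3;          // approximate snappy compression ratio
const zlibRatio = 6;            // approximate zlib compression ratio

// Relative disk footprint of MongoDB versus the whole MySQL + ES deployment,
// assuming MongoDB stores only the core data and compresses it by `ratio`.
function relativeFootprint(coreShare, ratio) {
  return coreShare / ratio;
}

const withSnappy = relativeFootprint(coreShareOfTotal, snappyRatio);
const withZlib = relativeFootprint(coreShareOfTotal, zlibRatio);

console.log((withSnappy * 100).toFixed(2) + "%"); // 12.70%
console.log((withZlib * 100).toFixed(2) + "%");   // 6.35%
```

Under these assumptions, the snappy configuration chosen by the business would occupy roughly 13% of the disk that the MySQL + ES scheme needs.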
2.3.2. Development and operations cost
- No data synchronization link: MongoDB needs no data synchronization, so there is no canal service or Kafka queue to maintain, which greatly reduces development and operations effort.
- Labor cost: under the MySQL + ES architecture, every field addition on the MySQL cluster costs operations staff a certain number of person-days and risks business jitter. It also delays business iterations, making each release time-consuming and risky.
- Development and maintenance cost: the MongoDB storage architecture is simple, with a single store and no data consistency pressure.
- Dynamic scaling: MongoDB supports capacity expansion at any time and has essentially no capacity ceiling, whereas with MySQL the business must rehash and migrate data manually during expansion while ensuring data consistency and integrity.
2.3.3. Performance comparison
In stress tests on the same 4C8G machine configuration, MySQL and MongoDB showed essentially the same write performance under large data volumes. Single-shard read performance was about 6,000 QPS for MySQL and only about 800 QPS for ES, while MongoDB reached about 30,000 QPS per shard, roughly 5 times the MySQL figure and about 37 times the ES figure.
2.3.4. Summary
The analysis and comparison above show that MongoDB has advantages on every front. To make the differences between the schemes easier to see, the comparison below covers five aspects: functionality, performance, cost, scalability, and maintainability:
In summary, MongoDB fully meets the business needs and is also better than the alternatives in performance, cost, maintainability, and other respects. Tencent Youma therefore chose MongoDB as the storage scheme for the business's core code data.
3. MongoDB sharded cluster optimization
The retail Youma business is cost-sensitive and stores a large amount of data, but its real online read/write traffic is not very high (the read requirement is about 30,000 QPS). The cluster is therefore deployed in sharded mode on low-specification nodes (4C8G per node).
3.1. Shard key selection and pre-splitting
Retail Youma data is queried by code id, so the code id is chosen as the shard key. This maximizes query performance, because every query on the code id can be served by a single shard. In addition, to avoid moveChunk operations caused by data imbalance between shards, we use hashed sharding and pre-split the chunks in advance. MongoDB supports pre-splitting for hashed shard keys by default. Taking the Youma code details table as an example, pre-splitting is done as follows:
use db_code_xx
sh.enableSharding("db_code_xx")
// n is the actual number of shards
sh.shardCollection("db_code_xx.t_code_xx", {"id": "hashed"}, false, {numInitialChunks: 8192*n})
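The same pre-splitting request can also be expressed as a `shardCollection` admin command document. The sketch below only builds that document, so that the chunk count can be checked before anything is sent to the cluster. The helper name `buildShardCollectionCommand` and the three-shard example are assumptions for illustration; the figure of 8192 chunks per shard is taken from the command above.

```javascript
// Per-shard figure used by the article's command: 8192 initial chunks per shard.
const CHUNKS_PER_SHARD = 8192;

// Build the shardCollection admin command document for a hashed shard key.
// `namespace`, `keyField`, and `shardCount` are supplied by the caller.
function buildShardCollectionCommand(namespace, keyField, shardCount) {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return {
    shardCollection: namespace,
    key: { [keyField]: "hashed" },
    unique: false,
    numInitialChunks: CHUNKS_PER_SHARD * shardCount,
  };
}

// Example with a hypothetical three-shard cluster.
const cmd = buildShardCollectionCommand("db_code_xx.t_code_xx", "id", 3);
console.log(JSON.stringify(cmd));
```

In the shell, connected to a mongos, a document of this shape could be passed to `db.adminCommand(cmd)` in place of the `sh.shardCollection` helper.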
3.2. Balancer window during off-peak hours
Because the MongoDB instance nodes have a low specification (4C8G), an imbalance of chunks between shards triggers automatic balancing, and on such small instances the balancing process causes the following problems:
- CPU consumption is too high; migration can consume about 90% of the CPU
- Business access jitters and latency increases
- Slow-query logs increase
- Abnormal alarms increase
These problems are caused by the moveChunk data migration that balancing performs. To migrate data quickly, MongoDB continuously moves data from one shard to another, which consumes a lot of CPU and causes business jitter.
The MongoDB kernel takes the business impact of balancing into account and supports a balancer window setting, so balancing can be shifted away from business peaks. This avoids, as far as possible, the business jitter caused by data migration. For example, to restrict balancing to the off-peak window from 00:00 to 06:00, use the following commands:
use config
db.settings.updateOne({"_id": "balancer"}, {"$set": {"activeWindow": {"start": "00:00", "stop": "06:00"}}}, {"upsert": true})
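When choosing a window, it can help to check which times of day fall inside it. The following is a minimal sketch of such a check, not MongoDB's own implementation. The function names, the half-open interval, and the handling of windows that wrap past midnight are all assumptions made for illustration.

```javascript
// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(":").map(Number);
  return hours * 60 + minutes;
}

// Return true when `now` ("HH:MM") falls inside [start, stop).
// A window whose start is later than its stop is treated as wrapping midnight.
function inBalancerWindow(now, window) {
  const t = toMinutes(now);
  const start = toMinutes(window.start);
  const stop = toMinutes(window.stop);
  if (start <= stop) {
    return t >= start && t < stop;
  }
  return t >= start || t < stop;
}

const activeWindow = { start: "00:00", stop: "06:00" };
console.log(inBalancerWindow("03:30", activeWindow)); // true
console.log(inBalancerWindow("12:00", activeWindow)); // false
```

A check like this makes it easy to compare a candidate window against the hours when business traffic peaks.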
3.3. Write majority optimization
Because the QR code data is core data, the client is configured with writeConcern={w: "majority"} to avoid data loss and data rollback in extreme cases. This ensures that data has been written to a majority of replica set members before the write is acknowledged to the client.
The concept of chained replication: given node A (primary), node B (secondary), and node C (secondary), if B synchronizes data from A and C synchronizes data from B, then A->B->C forms a chained synchronization structure, as shown in the figure below:
MongoDB multi-node replica sets support chained replication. You can check whether the current replica set allows chained replication with the following command:
cmgo-xx:SECONDARY> rs.conf().settings.chainingAllowed
true
cmgo-xx:SECONDARY>
In addition, you can tell whether chained replication is actually taking place by looking at the synchronization source of each node in the replica set. If a node's synchronization source is a secondary, chained replication exists in the replica set. Check the following field of the replica set status:
cmgo-xx:SECONDARY> rs.status().syncSourceHost
xx.xx.xx.xx:7021
cmgo-xx:SECONDARY>
Since the business is configured with majority writes, chained replication can be turned off for performance reasons. In MongoDB it is disabled with the following commands:
cfg = rs.config()
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)
Benefit of chained replication: it greatly reduces the oplog synchronization pressure on the primary node.
Drawback of chained replication: when the write concern is majority, write requests take longer.
For the sake of write performance, when the business uses the "write majority" strategy, chained replication is simply turned off, which avoids the write performance degradation caused by a long replication chain.
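The effect of chain length on majority writes can be seen in a simplified model that counts replication hops. This is a sketch under stated assumptions, not a description of MongoDB internals: it assumes every hop costs the same, ignores acknowledgement traffic, and uses a hypothetical five-member replica set. All names in it are illustrative.

```javascript
// Number of replication hops from the primary to each member, given a map
// of member -> sync source (the primary maps to null).
function hopsFromPrimary(syncSource) {
  const hops = {};
  for (const member of Object.keys(syncSource)) {
    let count = 0;
    let current = member;
    while (syncSource[current] != null) {
      current = syncSource[current];
      count++;
    }
    hops[member] = count;
  }
  return hops;
}

// Hops needed before a majority of members (primary included) hold a write.
function majorityAckHops(syncSource) {
  const sorted = Object.values(hopsFromPrimary(syncSource)).sort((a, b) => a - b);
  const majority = Math.floor(sorted.length / 2) + 1;
  return sorted[majority - 1];
}

// Hypothetical five-member replica set, fully chained versus syncing directly
// from the primary.
const chained = { A: null, B: "A", C: "B", D: "C", E: "D" };
const direct = { A: null, B: "A", C: "A", D: "A", E: "A" };

console.log(majorityAckHops(chained)); // 2
console.log(majorityAckHops(direct));  // 1
```

In this model the fully chained topology needs two sequential hops before a majority holds the write, while direct synchronization from the primary needs one.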
About the authors
CSIG Tencent Youma team:
The Tencent Youma team has worked in the retail industry for many years. We are committed to going from connecting consumers to connecting channel terminals, and to helping enterprises upgrade digitally on the basis of digitized goods. Tencent Youma has built a platform of three "links" and one code: Member Link, Store Link, Genuine Product Link, and a code middle platform. We currently offer fairly complete digital solutions for the drinking water, beverage, and food industries, serving 70+ enterprises and connecting 150 billion+ goods, with 60 billion+ code scans.
Tencent MongoDB team:
Tencent Cloud MongoDB currently serves gaming, e-commerce, social networking, education, news and information, finance, the Internet of Things, software services, and other industries. The MongoDB team (CMongo for short) is committed to in-depth research on and continuous optimization of the open-source MongoDB kernel (such as support for millions of databases and tables, physical backup, password-free access, and auditing), providing users with high-performance, low-cost, highly available, and secure database storage services. The team continues to share typical MongoDB application scenarios inside and outside Tencent, pitfall case studies, performance optimization, and modular kernel analysis.