Application and Optimization Practice of Tencent Cloud MongoDB in the 100-Million-MAU K Song (全民K歌) Feed Business
K Song backend development group / Tencent MongoDB team
Business background and MongoDB scale
K Song is one of the four product lines of Tencent Music Group, with more than 150 million monthly active users, and it keeps introducing new audio-entertainment features and gameplay, greatly enriching the music entertainment of hundreds of millions of users.
MongoDB natively supports high availability, distribution, high performance, high compression, schema-free documents, and a complete client access balancing strategy. As a core business of Tencent Music Group, the K Song Feed service uses Tencent Cloud MongoDB as its primary storage, which greatly facilitates the rapid iterative development of the K Song business.
This article mainly shares some steps in the evolution of K Song's technology, its scheme design, and its performance optimization, covering the following technical points:
Chapter one: Business-level optimization
1. K Song business characteristics
No social product can do without Feed stream design. In the K Song scenario, the following main problems need to be solved:
1. Some accounts have tens of millions of fans or millions of friends, which makes relationship-chain diffusion a performance challenge.
2. There are many kinds of Feed business, with complex business strategies controlling which important Feeds must be guaranteed exposure.
For Feed stream output there is a wide variety of control strategies, which implement the following functions:
1. Exposure frequency control for big Vs, to avoid traffic-brushing behavior.
2. Feeds jointly published by friends for some interactive games are merged, to avoid flooding the screen.
3. Retrieval of Feeds by different categories is supported.
4. Feeds of users with security or compliance issues need to be filtered out.
5. Real-time recommendation / mixed ranking of the stream.
6. Exposure frequency control for low-quality Feeds and system-generated Feeds.
2. Read/write model selection

There are 3 mainstream Feed implementation models, and each of them is used by large-scale products in the industry:
1. Read diffusion (Qzone)
2. Write diffusion (WeChat Moments)
3. Read diffusion for big Vs + write diffusion for ordinary users (Sina Weibo)
There is no best model, only the most suitable architecture; the key is to weigh your own business model, read/write ratio, historical burden, and implementation cost.
K Song uses the read diffusion model, based on the following considerations:
1. There are many big Vs with millions or tens of millions of fans; write diffusion amplifies writes severely, push latency is high, and storage cost is also high.
2. For low-activity and churned users, pushing wastes computing and storage resources.
3. Security and compliance audits would trigger a large amount of write diffusion.
4. Write-diffusion QPS would be roughly 3x the read-diffusion QPS.
5. For historical reasons in how the K Song relationship chain was built, the early cost of write diffusion was high, and switching to a read/write hybrid later would also be very costly.
However, the read diffusion model has the following obvious disadvantages:
1. Paging has to pull all the data before the current timeline position, so the performance overhead keeps growing and performance gets worse and worse.
2. The number of followees + friends can reach 10,000; implementing global filtering, insertion, merging, and frequency-control strategies is complex and performance is poor.
3. Read diffusion optimization
The data stored in the read diffusion model mainly falls into 3 large categories:
3.1. Optimization background
In the relationship-chain read diffusion model before optimization, every time Feed data is pulled, the relationship chain, timestamps, and Feed index data are read and diffused to build a candidate result set; the Feed details are then pulled by the selected FeedIds to build the final result and return it.
For the first screen, if one page has 10 items, the relationship chain + latest timestamps filter out the latest 20 uids (more are pre-pulled to guard against candidates being removed by the various business filtering and merging strategies), then for each uid the latest 60 brief Feed index entries are pulled to build the candidate set; the various business merging and filtering strategies then pick out at most the 10 latest FeedIds, and the Feed details are pulled to build the response.
When paging, the minimum timestamp of the previously returned data (basetime) is carried over; we then need to pick out the uids that published Feeds before basetime as well as the latest 20 uids that published after basetime, and repeat the candidate-set building process above to produce this page's data. With this implementation, paging gets slower and slower and latency is unstable.
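The following mongo-shell style sketch illustrates the page build described above; the collection names (following, FeedIndex, FeedInfo), field names, and the applyBusinessFilters stub are hypothetical placeholders rather than the production code.
// Hypothetical sketch of one read-diffusion page build (all names are placeholders).
function applyBusinessFilters(candidates) {
    // placeholder for the merge / filter / frequency-control strategies
    return candidates.sort((a, b) => b.publishTime - a.publishTime);
}
function buildPage(userId, basetime, pageSize) {
    // 1. candidate uids: the 20 followees with the newest publish time before basetime
    const uids = db.following.find({ userId: userId, lastPublishTime: { $lt: basetime } })
                   .sort({ lastPublishTime: -1 }).limit(20)
                   .toArray().map(doc => doc.followedUid);
    // 2. for each uid, pull its latest 60 brief Feed index entries as the candidate set
    let candidates = [];
    uids.forEach(uid => {
        candidates = candidates.concat(
            db.FeedIndex.find({ userId: uid, publishTime: { $lt: basetime } })
              .sort({ publishTime: -1 }).limit(60).toArray());
    });
    // 3. business strategies keep at most pageSize FeedIds
    const feedIds = applyBusinessFilters(candidates).slice(0, pageSize).map(c => c.FeedId);
    // 4. pull the Feed details to build the response
    return db.FeedInfo.find({ FeedId: { $in: feedIds } }).toArray();
}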
3.2. Optimization process
To address the above problems, we optimized the read diffusion model. The optimized architecture is shown below:

We adopted a Cache mode for the read-diffusion results, which solves both the "paging gets slower and slower" problem and the complexity of global filtering logic.
Cache advantages
Problems the Timeline Cache needs to solve, and its disadvantages
In addition, we split the Cache handling into a full-generation process, an incremental-update process, and a patching logic to solve these problems:
Because the relationship chain is cached, if the relationship chain changes, or so many cached Feeds are filtered out that the Cache becomes too small, the patching logic is triggered.
Finally, through these strategies, our read-diffusion Feed stream system also gains some of the advantages of write diffusion. The main advantages are as follows:
4. Main table design
4.1. Feed table design
The Feed design here creates 2 tables:
The first table uses the user's userid as the shard key and Feedid as the unique key. The core fields of the table are as follows:

The second table uses uid as both the shard key and the unique key, and has a TTL. The core fields of the table are as follows:

FeedCache is a kv-style document store: the key is uid and the value is the jce-serialized result of CacheFeedData. To prevent TTL deletion from consuming online business performance, an expiration time can be specified when the data is written, and that expiration time is configured to fall within the business low-peak period.
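A minimal sketch of this pattern, assuming an expireAt field and a 04:00 low-peak window (both illustrative): index the date field with expireAfterSeconds: 0 so that the TTL monitor removes each doc at the time stored in the doc itself.
// Illustrative only: the expireAt field name and the 04:00 low-peak window are assumptions.
db.FeedCache.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 });
const lowPeak = new Date();
lowPeak.setDate(lowPeak.getDate() + 1);
lowPeak.setHours(4, 0, 0, 0);                    // next 04:00, assumed low-peak period
db.FeedCache.insertOne({
    _id: "345",                                  // key: uid
    value: BinData(0, ""),                       // jce-serialized CacheFeedData (placeholder)
    expireAt: lowPeak                            // TTL monitor deletes the doc after this time
});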
4.2. Account relationship table design
The follow relationship chain usually involves data in two dimensions:
A following list and a fan list (one follow action generates data in both dimensions).
The following list is usually not large, typically a few thousand at most, and is often pulled in full, so it can be stored in kv form (for high performance, an in-memory database or cache can be considered).
The following list is stored in Redis: each key's value is the jce-serialized result of the RightCache structure above.


The fan list is long (millions or even tens of millions) and is usually shown as a paged list, so it is stored in MongoDB, sharded by the user id, with each fan as a separate doc. Storing it in a memory-type store would cause high memory fragmentation and high memory cost. The following and fan data can be made eventually consistent through a message queue.
Fan data is stored as MongoDB documents and mainly contains the following fields: opuid, fuid, realtiontype, time.
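For illustration, a sketch of how one such fan doc might look and how a user's fan list could be paged (the compound index is an assumption, not necessarily the production schema):
// Sketch: one doc per fan; the compound index for paging a user's fans is an assumption.
db.fans.insertOne({
    opuid: "345",                                // the followed user
    fuid: "3333",                                // the fan
    realtiontype: 3,
    time: ISODate("2017-06-12T11:26:26Z")
});
db.fans.createIndex({ opuid: 1, time: -1 });     // page one user's fans newest-first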
Chapter two: MongoDB usage-layer optimization
The business MongoDB deployment architecture is as follows:

K Song business MongoDB architecture diagram: the client goes through the Tencent Cloud VIP and is forwarded to the mongos proxy layer. After a mongos receives a request, it obtains routing information from the config servers (which store the routing metadata and are not shown in the diagram), derives the forwarding rule from that routing information, and finally forwards the request to the corresponding storage-layer shard.
During online operation we found some unreasonable ways in which MongoDB was being used. By correcting them we improved MongoDB access performance and, ultimately, the user experience of the whole Feed stream system. The main MongoDB optimization points of the K Song business are as follows:
1. Shard key and sharding mode selection
As mentioned earlier, the Feed details and the fan list of the information stream business are stored in MongoDB. Both tables use hashed sharding and are pre-split in advance; the fan list is sharded by userId and the Feed detail table by FeedId:
sh.shardCollection("xx.follower", {userId:"hashed"}, false, { numInitialChunks: 8192* Subdivision number } )
sh.shardCollection("xx.FeedInfo", {FeedId:"hashed"}, false, { numInitialChunks: 8192* Subdivision number } )
The two tables use FeedId and userId respectively as their shard keys, adopt the hashed sharding mode, and are pre-split in advance, mainly for the following reasons:
Pre-splitting combined with hashed sharding writes data evenly across shards, avoids moveChunk operations caused by data imbalance, makes full use of each shard's storage capacity, and maximizes write performance.
Querying a Feed's details by FeedId and querying a user's fan list by userId: with hashed sharding, the hash of a given id value always lands on the same shard, which guarantees the highest query efficiency.
Explanation:
Because all queries specify the id, reads are guaranteed to come from a single shard, maximizing read performance. However, if the query were a range on FeedId, for example db.FeedInfo.find({FeedId:{$gt: 1000, $lt: 2000}}), hashed sharding would not be suitable: many documents may satisfy {$gt: 1000}, and after hashing they would be scattered across multiple shards. Range sharding is better for such scenarios, because data in one range is likely to fall on the same shard. The choice of shard key and sharding mode therefore plays a very important role in the read/write performance of the whole cluster and must be made according to the actual business situation.
The K Song Feed business only queries by feedId and userId and has no range queries, so hashed pre-split sharding was chosen for the shard keys, which maximizes both query and write performance.
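One way to verify that such a shard-key query is routed to a single shard is to check the explain output through mongos (a sketch; the FeedId value is illustrative):
// Targeted shard-key query: the winning plan stage on mongos should be SINGLE_SHARD
// rather than a scatter-gather plan (the value "375" is illustrative).
db.FeedInfo.find({ FeedId: "375" }).explain("queryPlanner").queryPlanner.winningPlan.stage
// expected: "SINGLE_SHARD"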
2. How to optimize queries that do not carry the shard key
As mentioned in the previous section, a query that carries the shard key is guaranteed to hit a single shard, which maximizes read performance. In real business scenarios, however, the same table is accessed by requests that sometimes carry the shard-key field and sometimes do not. Queries without the shard key have to be broadcast to multiple shards and aggregated by mongos before the result is returned to the client, so they perform much worse than queries served from a single shard.
If the cluster has many shards and queries without the shard key are very frequent, an auxiliary index table can be built to avoid this problem and improve query performance. Take the Feed detail table as an example: its shard key is the user's userId, so when a user wants to see all the Feeds they have published, the query only needs to carry userId.
However, fetching a specific Feed by FeedId requires a broadcast query, because the detail table's shard key is userId, and performance suffers. A query without the shard key not only hurts query performance but also increases the system load on every shard, so an auxiliary index table (assume the table name is FeedId_userId_relationship) can be added to solve the problem. Each doc in the auxiliary table mainly contains 2 fields:
FeedId: consistent with the FeedId of the detail table, identifying one specific Feed.
userId: consistent with the userId of the detail table, the user who published the Feed identified by this FeedId.
The FeedId_userId_relationship auxiliary table uses FeedId as its shard key and also adopts the pre-splitting described above. The implicit relationship between this table and the Feed detail table is as follows:

As shown above, to query a specific Feed by FeedId, we first look up the userId corresponding to that FeedId in the auxiliary index table, and then use the returned userId together with the FeedId to fetch the details. The whole query reads two tables; the query statements are as follows:
//FeedId_userId_relationship is sharded by FeedId, hashed and pre-split in advance
db.FeedId_userId_relationship.find({"FeedId": "375"}, {userId: 1})  // suppose it returns userId "3567"
//FeedInfo is sharded by userId, hashed and pre-split in advance
db.FeedInfo.find({"userId": "3567", "FeedId": "375"})
In this way, introducing the auxiliary index table finally solves the cross-shard broadcast problem. The auxiliary table increases storage cost and adds one extra query, so it is generally worthwhile only when the cluster has many shards and queries without the shard key are frequent.
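For completeness, a minimal sketch (not the production write path) of how the auxiliary table can be kept in step with the detail table when a Feed is published; field names follow the examples above:
// Write FeedInfo (sharded by userId) and FeedId_userId_relationship (sharded by FeedId) together.
function publishFeed(userId, feedId, content) {
    db.FeedInfo.insertOne({
        userId: userId,
        FeedId: feedId,
        content: content,
        publishTime: new Date()
    });
    db.FeedId_userId_relationship.insertOne({    // the 2-field mapping used for FeedId lookups
        FeedId: feedId,
        userId: userId
    });
}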
3. Slow count optimization
As mentioned earlier, the fan relationship table is stored in MongoDB; each record contains a few fields, and each fan of a user corresponds to one MongoDB document, like this:
{ "_id" : ObjectId("6176647d2b18266890bb7c63"), "userid" : "345", "follow_userid" : "3333", "realtiontype" : 3, "follow_time" : ISODate("2017-06-12T11:26:26Z") }
Since each fan corresponds to one document, finding out how many fans a user has requires a count query (for example, the total number of fans of the user whose id is "345"):
db.fans.count({"userid" : "345"})
The execution plan of this query is as follows:
{
    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 0,
    "totalKeysExamined" : 156783,
    "totalDocsExamined" : 0,
    "executionStages" : {
        "stage" : "COUNT",
        "nReturned" : 0,
        ......
        "nSkipped" : 0,
        "inputStage" : {
            "stage" : "COUNT_SCAN",
            ......
        }
    },
    "allPlansExecution" : [ ]
}
As with other relational databases (such as MySQL), the execution plan above shows that a conditional count on a table, even with the optimal index, takes time roughly proportional to the number of documents that match the condition: the more fans a user has, the more index keys are examined and the slower the query.
From this analysis it follows that counting the fans of a user with many fans is slow. We therefore maintain an idempotent counter for the total number of fans and followings; since this data is read very frequently, it can be kept in high-performance storage such as Redis, and idempotency can be guaranteed with a Redis Lua script.
Optimization method: the fan count is a Redis key; a Lua script performs an INCRBY on the count key together with a SETNX + EXPIRE on an opuid_touid_op deduplication key, so the whole update is idempotent.
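A minimal sketch of this idempotent counter, assuming the ioredis client and the key naming described above; the 86400-second dedup TTL is illustrative:
// The dedup key opuid_touid_op guards against the same follow event being counted twice.
const Redis = require("ioredis");
const redis = new Redis();
const lua = `
if redis.call('SETNX', KEYS[2], 1) == 1 then
    redis.call('EXPIRE', KEYS[2], ARGV[1])
    return redis.call('INCRBY', KEYS[1], ARGV[2])
end
return redis.call('GET', KEYS[1])
`;
async function addFan(opuid, touid) {
    const countKey = "fans_count:" + opuid;       // total fan count of opuid (name assumed)
    const dedupKey = opuid + "_" + touid + "_op"; // one follow event is counted only once
    return redis.eval(lua, 2, countKey, dedupKey, 86400, 1);
}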
4. Write-majority optimization
When writing data, the writeConcern strategy can be chosen according to the data reliability the service needs:
{w: 0}: the write returns without the client waiting for any acknowledgment. Scenario: very high performance requirements, data integrity not a concern.
{w: 1}: the default writeConcern; acknowledgment is returned to the client once the data is written to the Primary. Scenario: balances performance with a certain degree of data reliability.
{w: "majority"}: acknowledgment is returned to the client after the data has been written to a majority of replica set members. Scenario: high data integrity requirements and scenarios that must avoid data rollback; this option reduces write performance.
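For illustration, a sketch of setting these options on a single write (the collection and document are placeholders; {j: true} is explained next):
// writeConcern can be given per operation; w: "majority" plus j: true trades latency for durability.
db.FeedInfo.insertOne(
    { userId: "3567", FeedId: "375", publishTime: new Date() },
    { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
);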
Scenarios with high data reliability requirements often also use the {j: true} option, so that acknowledgment is returned to the client only after the write has been persisted to the journal. Such settings reduce write performance. In the early stage of the K Song Feed business we found that even for core business the write latency was unstable in most scenarios, jittering between 5ms and 1s. Analysis located the cause: the write-majority setting combined with the chained replication policy.
The concept of chained replication: suppose node A (primary), node B (secondary), and node C (secondary). If node B syncs data from node A and node C syncs data from node B, then A->B->C forms a chained synchronization structure, as shown below:


A MongoDB multi-node replica set supports chained replication. Whether the current replica set allows chained replication can be checked with the following command:
cmgo-xx:SECONDARY> rs.conf().settings.chainingAllowed
true
cmgo-xx:SECONDARY>
In addition, whether chained replication is actually happening can be judged by looking at each node's sync source: if a node's sync source is a secondary, chained replication exists in the replica set. See the following replica set status:
cmgo-xx:SECONDARY> rs.status().syncSourceHost
xx.xx.xx.xx:7021
cmgo-xx:SECONDARY>
Since the business is configured to write with majority, for performance reasons chained replication can be turned off. In MongoDB it is disabled with the following commands:
cfg = rs.config()
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)
Benefit of chained replication:
It greatly reduces the oplog synchronization pressure on the primary node.
Drawback of chained replication:
When the write strategy is majority, write requests take longer.
When a business adopts the write-majority strategy, chained replication should therefore be turned off to avoid longer write requests. After we turned off chained replication, the overall write latency stayed within 10ms.
5. High-QPS business jitter optimization
In some core clusters we found that there were sometimes more slow queries and service jitter during peak periods, and the jitter appeared to be caused by individual CPUs spiking. By analyzing the specific high-CPU threads and using perf to profile the hot functions, we found two main problems:
1. The number of connections surged during peak hours, and the connection authentication overhead was too high, which caused the CPU spikes.
2. The WT storage engine's cache utilization and dirty-data ratio were too high, so MongoDB user threads were blocked doing dirty-data eviction, which ultimately caused the business-side jitter.
To solve these two problems, we tuned MongoDB as follows (a configuration sketch follows this list):
1. Set the MongoDB connection pool's upper and lower limits to the same value, reducing the overhead of establishing connections.
2. Trigger memory eviction earlier (eviction_target=60), raise the threshold at which user threads participate in eviction to 97% (eviction_trigger=97), and add more eviction threads (evict.threads_max: 20). This reduced slow queries during peak periods from 150k/min to 20k/min and improved service stability.
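A sketch of how these two adjustments can be applied; the connection-string pool parameters and the runtime WiredTiger configuration string are illustrative and should be tuned to your own cluster:
// Connection pool: keep min and max pool sizes equal so connections are pre-built instead of
// being created in a burst at peak time (standard driver URI options).
// mongodb://user:pwd@vip:27017/?minPoolSize=200&maxPoolSize=200
// WiredTiger eviction: start background eviction earlier, let user threads join eviction later,
// and allow more eviction threads (applied at runtime via setParameter).
db.adminCommand({
    setParameter: 1,
    wiredTigerEngineRuntimeConfig:
        "eviction_target=60,eviction_trigger=97,eviction=(threads_max=20)"
});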
The optimized effect is shown in the figure:

6. Business jitter optimization during data backup

Tencent Cloud MongoDB by default performs full and incremental backups of cluster data regularly in the early morning, and by default supports point-in-time restore within the last 7 days. However, as the cluster's data volume grew (this cluster now holds quite a large amount of data), the cluster started to jitter regularly in the early morning, with the following main symptoms:
1. Increased access latency
2. More slow-query logs
3. Increased CPU usage
Analysis showed that the problem window coincided with the backup window: both physical backup and logical backup read the whole instance, which raises the system resource load and ultimately affects business queries.
Optimization: run the data backup on a hidden node, so that the node is not visible to clients.
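A sketch of configuring such a hidden backup node (the member index 2 is illustrative; a hidden member must also have priority 0):
// Hide one secondary so backups run on it without it serving client reads.
cfg = rs.config();
cfg.members[2].priority = 0;
cfg.members[2].hidden = true;
rs.reconfig(cfg);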
About the authors
K Song backend development group:
ctychen, ianxiong
Tencent Cloud MongoDB team:
Tencent Cloud MongoDB currently serves the gaming, e-commerce, social, education, news, finance, IoT, software-services and other industries. The MongoDB team (CMongo for short) is committed to in-depth research on and continuous optimization of the open-source MongoDB kernel (such as support for millions of databases and collections, physical backup, password-free access, auditing, etc.), providing users with high-performance, low-cost, highly available and secure database storage services, and continues to share typical MongoDB application scenarios inside and outside Tencent, pitfall cases, performance optimizations, and kernel module analysis.