
Application of TiDB in NetEase Games

2022-06-24 08:37:00 PingCAP

About the author: Li Wenjie is a Senior Database Engineer at NetEase Interactive Entertainment and a TUG MVA for 2019 and 2020. He is mainly responsible for big data R&D and data analysis, providing refined operations guidance for products. He also promotes the adoption of TiDB, accumulating experience and exploring best practices for distributed database business in the cloud, and currently heads the TiDB management team.

This article is compiled from a TUG NetEase online event, where Li Wenjie, Senior Database Engineer at NetEase Games, shared the team's hands-on experience applying the distributed database TiDB at NetEase Games.

NetEase Games first adopted TiDB from an AP (analytical processing) perspective. When we first used TiDB, we moved computation-heavy batch jobs onto it. During the migration, whenever a batch was large, we hit the "transaction too large" error that many users will recognize.

TiDB Transaction Limits

After some investigation we found the cause: because distributed transactions are committed in two phases, and the storage layer must perform Raft replication, a very large transaction makes the commit process extremely slow and can block the underlying Raft replication. To avoid stalling the system, TiDB limits transaction size, restricting among other things the number of SQL statements per transaction, the number and total size of KV key-value pairs, and the size of a single KV pair.

Once we understood the limits, the solution followed: split each large transaction into several small transactions according to business needs and execute them in batches. The SQL jobs that previously failed now ran to completion, and the batch programs that had been running on MySQL/Oracle were all migrated to TiDB.
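The batching pattern described above can be sketched in a few lines. The sketch below uses an in-memory SQLite database purely so that it is self-contained; against TiDB the same loop would use a MySQL-protocol driver, and the table name and the 1,000-row batch size are illustrative choices, not values from the article.

```python
import sqlite3

def batched(rows, size):
    """Yield successive chunks of `rows`, each at most `size` items."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Demo target: an in-memory SQLite database (stand-in for TiDB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_result (id INTEGER PRIMARY KEY, val INTEGER)")

rows = [(i, i * 2) for i in range(10_000)]
for chunk in batched(rows, 1_000):   # each 1,000-row chunk is one transaction
    with conn:                       # commits (or rolls back) per chunk
        conn.executemany("INSERT INTO batch_result VALUES (?, ?)", chunk)

count = conn.execute("SELECT COUNT(*) FROM batch_result").fetchone()[0]
print(count)  # 10000
```

Each chunk commits independently, which is exactly what keeps every individual transaction under the size limit.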

At the same time, we had to consider failure modes. When nothing goes wrong the program runs smoothly, but when the machine room has a network problem, or something else fails, part of the data ends up written to TiDB while the rest does not. In that scenario the execution of a transaction no longer guarantees atomicity: only part of it is carried out, with some batches succeeding and some failing.

Investigation showed that this is a consequence of splitting transactions manually: the atomicity of the original large transaction can no longer be guaranteed, only the atomicity of each small batch. Viewed across the whole task, the data is inconsistent.
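One standard mitigation for this partial-failure problem (a generic pattern, not the solution the article settles on) is to commit a checkpoint in the same transaction as each batch, so an interrupted or repeated run resumes after the last committed batch instead of double-writing. A minimal sketch, again using SQLite only for self-containment; the table names and job id are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target   (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE progress (job TEXT PRIMARY KEY, last_batch INTEGER);
""")
conn.execute("INSERT INTO progress VALUES ('demo', -1)")
conn.commit()

def run_batches(conn, rows, size, job="demo"):
    """Replay-safe batch loader: data and checkpoint commit together."""
    done = conn.execute(
        "SELECT last_batch FROM progress WHERE job = ?", (job,)
    ).fetchone()[0]
    for n, i in enumerate(range(0, len(rows), size)):
        if n <= done:                # this batch already committed; skip it
            continue
        with conn:                   # batch + checkpoint in one transaction
            conn.executemany("INSERT INTO target VALUES (?, ?)",
                             rows[i:i + size])
            conn.execute("UPDATE progress SET last_batch = ? WHERE job = ?",
                         (n, job))

rows = [(i, i) for i in range(5_000)]
run_batches(conn, rows, 1_000)
run_batches(conn, rows, 1_000)       # re-running is a no-op: nothing is doubled
total = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(total)  # 5000
```

This does not restore the atomicity of the whole task, but it makes a failed run resumable and idempotent, which is usually what batch jobs need in practice.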

So how do we solve this problem?

TiDB's Large-Transaction Optimization

After we fed this back to PingCAP, TiDB 4.0 optimized large transactions deeply: it not only lifted some of the restrictions, but also relaxed the single-transaction size limit from 100 MB to 10 GB, a straight 100x improvement. This, however, exposed another problem. For T+1 batch jobs, the previous day can produce millions or even tens of millions of rows; processing them with a JDBC + TiDB program is inefficient, and the jobs often took several hours, sometimes dozens of hours.

So how do we improve the overall throughput of these computing tasks? The answer is TiSpark.

TiSpark: Handling Complex OLAP Computation Efficiently

TiSpark is a plug-in developed on top of Spark that can read data from TiKV efficiently. It supports index lookups and computation push-down, so query performance is high. In practice we found that, reading TiKV through TiSpark, 200 million rows can be read in 5 minutes. Its write performance is also high. With TiSpark we can access TiKV data directly through Spark tooling. Time has proven that TiSpark performs very well for both reading and writing TiKV, and with it we can handle more complex operations over much larger volumes of data.
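For reference, enabling TiSpark on an existing Spark deployment is mostly configuration. A minimal `spark-defaults.conf` fragment, assuming a TiSpark 2.5+ style deployment; the PD addresses are placeholders for your own cluster:

```properties
# Load the TiSpark extension into the Spark SQL session
spark.sql.extensions        org.apache.spark.sql.TiExtensions
# Placement Driver endpoints of the target TiDB cluster (placeholders)
spark.tispark.pd.addresses  pd0.example.com:2379,pd1.example.com:2379
```

With this in place, TiDB tables become queryable from `spark.sql` without any ETL step.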

TiSpark in Practice

In practice, we use TiSpark mainly in two modes:

  • Mode 1: TiSpark + JDBC writes

TiSpark + JDBC writing can split large transactions automatically, but it does not guarantee transaction atomicity or isolation, and fault recovery requires manual intervention. This mode reaches a write speed of about 1.8 million rows/min: the SQL is processed through TiDB and then written to TiKV, so the speed is only average.

  • Mode 2: TiSpark batch writes to TiKV

TiSpark batch writing to TiKV does not split large transactions automatically. Reading and writing TiKV directly through TiSpark is equivalent to accessing the data in TiKV with one large transaction, which guarantees the transaction's atomicity and isolation while delivering good write performance, up to about 3 million rows/min. Adopting TiSpark solved our massive batch-processing problem, but it also carried some risks. When TiSpark reads and writes TiKV, since TiKV is the storage engine of the whole TiDB architecture, heavy read/write pressure on the storage layer has a significant impact on other online business. Moreover, if IO is not throttled while TiSpark accesses TiKV, performance jitter and increased access latency are easy to trigger, which again affects online business.

How, then, can we achieve effective isolation? The TiFlash columnar storage engine may provide the answer.

TiFlash: A Columnar Storage Engine

TiFlash complements the row-based TiKV storage engine. It holds a Raft replica of TiKV's data in columnar form, with the Raft protocol guaranteeing the consistency and completeness of data synchronization. In this way the same data is stored in two engines: TiKV keeps the row form, TiFlash the columnar form.

When doing computation and analysis in Spark, we can read directly from the TiFlash cluster, and computation becomes very efficient. For AP analysis, columnar data gives an overwhelming advantage over row data.
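Inside TiDB itself, which replica a session reads from can also be steered explicitly. A hedged sketch, assuming TiDB 4.0+ where the `tidb_isolation_read_engines` session variable is available; the table name is hypothetical:

```sql
-- Route this session's reads to the columnar TiFlash replica (TiDB 4.0+).
-- 'tidb' stays in the list so internal/system-table reads still work.
SET SESSION tidb_isolation_read_engines = 'tiflash,tidb';

-- A typical analytical scan now hits TiFlash; `user_profile` is hypothetical.
SELECT region, COUNT(*) AS players
FROM user_profile
GROUP BY region;
```

This is how analytical queries can be pinned to the columnar replica without touching the TiKV row store at all.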

TiFlash: TPC-H Performance Analysis

The TiSpark + TiFlash combination improves computational efficiency both quantitatively and qualitatively. TPC-H benchmarking shows that, compared side by side with TiKV, TiFlash executes faster in almost every query scenario, and in some scenarios far faster. With TiFlash in place, analytical workloads no longer affect the performance of the TiKV cluster or other online business, while offline big-data analysis still maintains good performance and throughput.

Practice has proven that TiFlash solves many of our problems; it is an excellent tool.

TiFlash in Application: More Efficient Computation

In the computation of user-profile metrics for NetEase games, after switching to TiSpark + TiFlash, SQL across different business workloads processed at least 4x faster than with TiSpark + TiKV. With TiFlash, the efficiency of offline batch processing improved qualitatively.

JSpark: Cross-Source Offline Computing

As business scale and application scenarios grow, different data ends up in different storage engines: log data, for example, is stored in Hive, while database data is stored in TiDB. Access across data sources then requires a lot of data migration, which is time-consuming and laborious. Could we connect the different data sources directly and achieve cross-source access? To solve this problem, NetEase Games built the JSpark tool. JSpark is an offline computing tool that bridges the underlying storage to achieve cross-source access. Its core is TiSpark + Spark, with Spark acting as the bridge that can reach the different data sources.

JSpark is a wrapper over TiSpark and JDBC: it can read and write data in TiKV, run AP computation in TiFlash, and run regular SQL computation in TiDB. So far we have implemented mutual reads and writes between TiDB and Hive; later, JSpark will also support mutual access between TiDB and ES, realizing multi-source data access across TiDB, Hive, and ES.

Currently, the JSpark tool mainly implements the following functions:

  • Supports reading and writing both TiDB and Hive via TiSpark + JDBC; the efficiency of this mode is average.
    • Scenario: operating on only the columns the business needs in a wide TiDB table.
  • Supports reading TiDB table data and writing Spark computation results into a Hive target table. For reads we recommend TiSpark against TiKV or TiFlash; for writes we recommend TiSpark into TiKV, which is more efficient.
    • Scenario: periodically rotating expired partitions of TiDB partitioned tables, backing them up permanently to Hive so that TiDB tables do not grow too large.
  • Supports reading Hive table data and writing Spark computation results into a TiDB target table. TiSpark writing into TiKV is recommended for efficiency.
    • Scenario: analyzing Hive data to produce user-profile metrics and writing them to online TiDB to serve online TP queries. Another practical scenario is restoring a Hive backup into TiDB.
  • Supports front-end web pages or business services issuing HTTP requests that remotely start Spark jobs, completing joint TiDB-and-Hive queries underneath.
    • Scenario: clicking the query button on the front-end management platform returns the joint aggregation of a player's Hive link logs and TiDB data, extracting the relevant behavior data.
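The HTTP-triggered job in the last scenario can be illustrated with a small sketch. Apache Livy's `POST /batches` API is one common way to start a Spark job remotely; whether JSpark actually uses Livy is not stated in the article, and the jar path, class name, and arguments below are all hypothetical:

```python
import json

def spark_job_request(jar, main_class, args):
    """Build a JSON body for submitting a Spark batch job over HTTP,
    in the style of Apache Livy's POST /batches API. All values passed
    in by the caller below are hypothetical, not JSpark's real entry
    points."""
    return json.dumps({"file": jar, "className": main_class, "args": args})

payload = spark_job_request(
    "hdfs:///jobs/jspark-assembly.jar",        # hypothetical jar location
    "com.example.jspark.JointQueryJob",        # hypothetical entry class
    ["--player-id", "42", "--sources", "hive,tidb"],
)
body = json.loads(payload)
print(body["className"])  # com.example.jspark.JointQueryJob
```

The built payload would be sent as the body of an HTTP POST to the job server, after which the front end polls the batch's state until the joint query result is ready.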

While developing and using JSpark, we also found a point where TiSpark could be improved.

Currently, when TiSpark accesses multiple TiDB data sources, only one TiDB cluster can be registered at runtime; multiple clusters cannot be registered, which makes computation across TiDB clusters inconvenient. We hope that in the future TiSpark can access multiple TiDB clusters simultaneously.

TiDB in Application: An HTAP Data System

JSpark is currently the core framework of our offline computing. Alongside it sits the JFlink real-time computing framework, and together they form our big-data processing capability: JSpark is responsible for offline big-data computation, JFlink for real-time computation, and together they make up an HTAP data system.

HTAP Computing Capability: JSpark + JFlink

First, online data is aggregated into the TiDB cluster; then JSpark + JFlink run offline and real-time computation over TiDB and the other data sources, outputting user profiles and other metric analyses that feed back into online business queries.

TiDB in Application: An HTAP Data System

Today, after three years of development, NetEase Games runs a total of 170 cluster instances, with data at the 100+ TB level. Our business covers user profiling, anti-addiction, operations, reporting, business monitoring, and more, and both business scale and cluster scale continue to expand.

The above is the evolution of AP computing in NetEase Games' use of TiDB. I hope today's sharing has been inspiring.
