Using consistent hash algorithm in Presto to enhance the data cache locality of dynamic clusters
2022-06-24 17:19:00 · Alluxio Official
Author: Rongrong Zhong, Presto TSC member / Committer
Running Alluxio together with Presto is becoming increasingly popular in the community. Caching hot datasets on solid-state drives or in memory keeps data close to the Presto workers, avoiding the high latency of remote reads. Presto supports hash-based soft affinity scheduling, which ensures that the same data is cached only once or twice across the whole cluster, so more hot data fits in local caches and cache efficiency improves. The existing hash algorithm, however, performs poorly when the cluster size changes. To address this, this article introduces a new hash algorithm for soft affinity scheduling: consistent hashing.
Soft affinity scheduling
Presto uses a scheduling strategy called soft affinity scheduling that tries to dispatch a split (the smallest unit of data processing) to the same Presto worker (its preferred node) every time. The mapping from splits to workers is computed by a hash function over the split, ensuring that the same split always hashes to the same worker.
When a split is processed for the first time, its data is cached on the preferred worker node. When subsequent queries process the same split, their requests are dispatched to the same worker again; since the data is already cached locally, no remote read is needed.
To improve load balancing and better tolerate unresponsive workers, two preferred nodes are selected for each split. If the first is busy or unresponsive, the second is used. As a result, a split's data may be physically cached on two workers.
For more on soft affinity scheduling, see "Reduce Presto latency with Alluxio data caching" (prestodb.io/blog/2020/0…).
The hash algorithm
Soft affinity scheduling relies on a hash algorithm to compute the mapping between splits and workers. Previously, a simple modulo function was used:
worker = hash(split) mod workerCount
This strategy is simple and effective as long as the cluster is stable and the set of workers does not change. But if a worker becomes temporarily unavailable or drops out, the number of workers changes, the entire split-to-worker mapping must be reassigned, and the cache hit rate drops significantly. If the failed worker later comes back online, the mapping has to be reshuffled yet again.
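To make the problem concrete, here is a small sketch (plain Python, not Presto's actual code) of modulo-based assignment. When the worker count drops from 3 to 2, a large fraction of the splits land on a different worker, wasting their caches:

```python
import hashlib

def stable_hash(key: str) -> int:
    # A deterministic hash; Python's built-in hash() is salted per process.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def assign_mod(split: str, num_workers: int) -> int:
    # Modulo-based mapping: split -> worker index.
    return stable_hash(split) % num_workers

splits = [f"split-{i}" for i in range(100)]
before = {s: assign_mod(s, 3) for s in splits}  # 3 workers running
after = {s: assign_mod(s, 2) for s in splits}   # one worker dropped out
moved = sum(1 for s in splits if before[s] != after[s])
print(f"{moved} of {len(splits)} splits changed workers")
```

With a uniform hash, roughly two thirds of the splits are expected to move here, even though only one worker left.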
To mitigate this, when computing which worker a given split is assigned to, Presto takes the modulus over the total worker count rather than the number of running workers. This, however, only alleviates the reshuffling caused by workers dropping out temporarily. Sometimes adding or removing workers is a deliberate response to workload fluctuations. In those cases, can a reasonable cache hit rate be maintained without large-scale reassignment?
Our solution is consistent hashing.
Consistent Hashing
Consistent hashing was first introduced by David Karger in 1997 as an algorithm for distributing requests among a set of servers whose number changes frequently. The technique is widely used in load balancing, distributed hash tables, and similar systems.
How consistent hashing works
Picture the output range of the hash function, [0, MAX_VALUE], mapped onto a ring (with MAX_VALUE wrapping around to 0). To illustrate how consistent hashing works, suppose a Presto cluster consists of 3 Presto workers, and 10 splits are queried repeatedly.
First, the workers are hashed onto the ring. Each split is then assigned to the worker adjacent to the split's hash value (note: "adjacent" here means the first worker found clockwise from the split's position on the ring).
In this example, the splits are distributed as follows:
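This ring assignment can be sketched in a few lines (hypothetical Python, not the Presto implementation; the worker names and hash function are illustrative):

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Place keys on a ring of size 2**128 using a deterministic hash.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, workers):
        # Sort workers by their position on the ring.
        self._points = sorted((ring_hash(w), w) for w in workers)
        self._keys = [p for p, _ in self._points]

    def assign(self, split: str) -> str:
        # First worker clockwise from the split's position (wrapping around).
        i = bisect.bisect_right(self._keys, ring_hash(split)) % len(self._points)
        return self._points[i][1]

ring = HashRing(["worker1", "worker2", "worker3"])
for s in ("split-0", "split-5", "split-7"):
    print(s, "->", ring.assign(s))
```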
Removing a worker
Now, if worker2 goes offline for some reason, then according to the algorithm splits 0, 5, and 7 are rescheduled to the worker with the next hash value, namely worker1.
Only the splits that were assigned to the offline worker (here worker2) need to be re-mapped to a new worker; nothing else is affected. If worker2 comes back online later, splits 0, 5, and 7 are reassigned back to it without disturbing any other worker's hit rate.
Adding a worker
If the workload grows and another worker node, worker4, needs to be added to the cluster, worker4's hash value places it on the ring as shown in the figure below:
In this case, split 8 falls into worker4's range while all other split assignments are unaffected, so the cache hit rates of those splits are preserved. The reassignment result is as follows:
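The key property — adding a worker only pulls some splits onto the new worker and never shuffles splits among the existing ones — can be checked directly with a compact ring sketch (hypothetical Python, not Presto's code):

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def assign(split, workers):
    # Each split goes to the first worker clockwise on the hash ring.
    points = sorted((ring_hash(w), w) for w in workers)
    keys = [p for p, _ in points]
    return points[bisect.bisect_right(keys, ring_hash(split)) % len(points)][1]

splits = [f"split-{i}" for i in range(100)]
old = {s: assign(s, ["worker1", "worker2", "worker3"]) for s in splits}
new = {s: assign(s, ["worker1", "worker2", "worker3", "worker4"]) for s in splits}

# Every split either keeps its worker or moves to the new worker4;
# splits are never reshuffled among the pre-existing workers.
assert all(new[s] == old[s] or new[s] == "worker4" for s in splits)
moved = sum(1 for s in splits if new[s] != old[s])
print(f"{moved} of {len(splits)} splits moved, all of them to worker4")
```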
Virtual node
As shown above, consistent hashing guarantees that when the set of nodes changes, on average only about k/n of the splits need to be reassigned (k splits spread over n workers). However, because the workers are placed on the ring with limited randomness, splits may end up unevenly distributed across them. To address this, we introduce "virtual nodes". With virtual nodes, the load of a node that disconnects is redistributed across multiple remaining nodes, reducing the load fluctuation caused by cluster churn.
Each physical worker is mapped to multiple virtual nodes, and it is the virtual nodes, rather than the physical ones, that are placed on the hash ring. Each split is first assigned to its adjacent (clockwise-nearest) virtual node and then routed to the physical worker that the virtual node maps to. The figure below shows one possible scenario, in which every physical worker corresponds to 3 virtual nodes.
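A sketch of the virtual-node variant (hypothetical Python; the "worker#vnodeN" naming is an illustration, not Presto's actual scheme):

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(workers, vnodes_per_worker=3):
    # Each physical worker contributes several virtual nodes to the ring,
    # each tagged with the physical worker it maps back to.
    points = []
    for w in workers:
        for i in range(vnodes_per_worker):
            points.append((ring_hash(f"{w}#vnode{i}"), w))
    return sorted(points)

def assign(ring, split):
    # Nearest virtual node clockwise, then route to its physical worker.
    keys = [p for p, _ in ring]
    return ring[bisect.bisect_right(keys, ring_hash(split)) % len(ring)][1]

ring = build_ring(["worker1", "worker2", "worker3"])
print(len(ring), "points on the ring")  # 3 workers x 3 virtual nodes = 9 points
print("split-2 ->", assign(ring, "split-2"))
```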
As the number of points on the hash ring grows, the hash space is divided more evenly.
When a physical node goes down, all of its virtual nodes are removed from the ring at once. The splits it was serving are then reassigned not to a single node but to several virtual nodes, and hence to several physical nodes, which gives better load balancing.
As shown in the figure below, when worker3 is removed, splits 2 and 3 are re-hashed to worker2, while split 8 is re-hashed to worker1.
How to use consistent hashing in Presto
This is an experimental feature we recently contributed to Presto. If you are interested in testing it or collaborating, please contact us.
Before using this feature, enable Presto's cache by following the guide (prestodb.io/docs/curren…) or the tutorial (docs.alluxio.io/os/user/sta…). Make sure SOFT_AFFINITY is selected as the scheduling policy: in the /catalog/hive.properties file, add hive.node-selection-strategy=SOFT_AFFINITY.
Then enable consistent hashing by adding node-scheduler.node-selection-hash-strategy=CONSISTENT_HASHING to config.properties.
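Putting the two settings from the steps above together (file paths as described in this article; check your own deployment's configuration layout):

```properties
# catalog/hive.properties -- select soft affinity scheduling
hive.node-selection-strategy=SOFT_AFFINITY

# config.properties -- switch the soft-affinity hash to consistent hashing
node-scheduler.node-selection-hash-strategy=CONSISTENT_HASHING
```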
Conclusion
As described above, consistent hashing minimizes the impact of workload reassignment when nodes are added or removed. When the cluster's workers change, distributing work across them with consistent hashing minimizes the impact on the cache hit rate of the existing nodes. Consistent hashing is therefore the better choice when a Presto cluster is scaled up or down with the workload, or when the deployment hardware is not fully under your control and workers may be replaced or adjusted at any time.
In the Alluxio community, we are constantly improving the integration of Alluxio with compute engines such as Presto, in both functionality and usability. With consistent hashing introduced into Presto's scheduling, Alluxio can leverage Presto's soft affinity to achieve better data locality and cache efficiency, ultimately improving performance and reducing cost. We will keep contributing to the broader data ecosystem and continuously improving our products.