当前位置:网站首页>What is a consistent hash? What scenarios can it be applied to?
What is a consistent hash? What scenarios can it be applied to?
2022-06-28 06:56:00 【Alluxio】
The author of this article : Zhongrongrong Presto TSC member/Commiter
take Alluxio And Presto Combined operation is becoming more and more popular in the community , Use solid state drives or memory to cache hot data sets , Can achieve near Presto worker Local rows of data , This avoids the high latency caused by remote data reading .Presto Support hash based soft affinity scheduling (soft affinity scheduling), In this way, the same data in the whole cluster is cached only once 、 Two copies , More hot data can be cached locally , Improve cache efficiency . The existing hash algorithm is not ideal when the cluster size changes . To address this issue , This paper introduces a new hash algorithm for soft affinity scheduling —— Consistent Hashing (consistent hashing).
Soft affinity scheduling
Presto Use a method called soft affinity scheduling (soft affinity scheduling) The scheduling strategy of , Divide a piece (split, The smallest unit of data processing ) Dispatch to the same Presto worker( Preferred node ) On . Slice and Presto worker The mapping relationship is calculated by the hash function on the partition , Ensure that the same fragment is always hashed to the same worker On .
In the first processing of fragmentation , Data will be cached in the preferred worker Node . When subsequent queries process the same fragment , These requests will be dispatched to the same... Again worker Node . here , Because the data has been cached locally , There is no need to read data remotely .
To improve load balancing , Better handling worker The problem of unstable node response , Two preferred nodes will be selected . If the first node is busy or unresponsive , The second node is used . Data may be physically cached in two worker Node .
More about soft affinity scheduling , Please check out “ adopt Alluxio Data cache reduction Presto Delay ”(https://prestodb.io/blog/2020/06/16/alluxio-datacaching#soft-affinity-scheduling)
The hash algorithm
Soft affinity scheduling relies on a hash algorithm to compute fragmentation and worker Mapping between nodes . We used to use modular functions :
The hash strategy is simple , And the cluster is stable 、worker The effect is very good when the nodes do not change . however , If a worker The node is temporarily unavailable or disconnected ,worker The number of nodes may change , Split to worker The mapping of nodes will all need to be reassigned , As a result, the cache hit rate decreases significantly . If something goes wrong worker Go online again later , You need to reallocate again .
In response to this question ,Presto Which one is calculated by taking a mold worker When assigned to a specific partition , Would be right worker Total mold , Instead of running worker Number . However , This can only alleviate worker Reallocation problems caused by temporary drop of nodes . Sometimes because of workload fluctuations , increase / Delete worker Is reasonable operation . In these cases , Is it possible to maintain a reasonable cache hit rate without large-scale reallocation ?
Our solution is a consistent hash algorithm .
Consistent Hashing
The consistent hash is defined by David Karger stay 1997 For the first time in , It is an allocation algorithm that distributes network access requests to a group of network servers whose number often changes . This technology is widely used in load balancing 、 Distributed hash table, etc .
How consistent hashing works
such as , Output the hash to the range [0, MAX_VALUE] Map to a torus ( take MAX_VALUE Connect to 0). To illustrate how consistent hashing works , Let's suppose that Presto Clusters are composed of 3 individual Presto worker Node composition , Among them is 10 Pieces are queried repeatedly .
First ,worker Nodes are hashed to the hash ring . Each partition is assigned to a hash ring adjacent to the hash value of the partition worker( notes : here “ adjacent ” Defined as the location of the hash value from the partition , The first one found clockwise worker node ).
In the above case , The partition distribution is as follows :
Delete one worker
Now? , If worker2 Offline for some reason , So according to the algorithm , Fragmentation 0、5 and 7 Will be scheduled to the corresponding next hash value worker, That is to say worker1 On .
Only those assigned to have been offline worker( Here it is. worker2) The partition of needs to be re determined to which worker. Other data is not affected . If worker32 Go online later , Fragmentation 0、5 and 7 Will be reassigned to worker2, It won't affect anything else worker shooting .
Add one more worker
If the workload increases , You need to add another in the cluster worker node ——worker4, worker4 The position of the hash value of on the hash ring is shown in the following figure :
under these circumstances , Fragmentation 8 Will fall worker4 The range of , The allocation of all other partitions is unaffected , Therefore, the cache hit rate of these shards will not be affected . The result of the reallocation is as follows :
Virtual node
As can be seen from the above , Consistent hashing ensures that when nodes change , On average, only
The partition of needs to be reallocated . However , because worker The distribution lacks randomness , Slices may not be evenly distributed across all worker Node . To address this issue , We introduced “ Virtual node ” The concept of . A virtual node can redistribute its load to multiple nodes when a node disconnects , So as to reduce the load fluctuation caused by cluster instability .
Put each physical worker Nodes are mapped to multiple virtual nodes . The virtual node replaces the original physical node , On the hash ring . And then , Each slice will first be assigned to the adjacent ( Clockwise nearest ) The virtual node of , Then it is routed to the physical node mapped by the virtual node . The example below is a possible scenario , That is, every physical worker All nodes correspond 3 Virtual nodes .
As the number of nodes on the hash ring increases , The hash space will be divided more evenly .
When a physical node goes down , All virtual nodes corresponding to this physical node will be hashed . Not all partitions are reassigned to the same node , Instead, it is assigned to multiple virtual nodes , So as to map to multiple physical nodes , To achieve better load balancing .
As shown in the figure below , When the delete worker3 when , Fragmentation 2 and 3 Will be hashed back to worker2, And it's divided into pieces 8 Be hashed back to worker1.
How to be in Presto Use consistent hashes in ?
This is our recent contribution to Presto An experimental function of . If you are interested in testing or cooperation , Please contact us .
Before using this function , Please follow the guide first (https://prestodb.io/docs/current/cache/alluxio.html) Or a tutorial (https://docs.alluxio.io/os/user/stable/en/compute/Presto.html) Enable Presto The cache of . Make sure you choose SOFT_AFFINITY As a configuration item of the scheduling policy . stay /catalog/hive.properties In file , add to hive.node-selection-strategy=SOFT_AFFINITY.
Need to pass in config.properties Add node-scheduler.node-selection-hash-strategy=CONSISTENT_HASHING To enable consistent hashing .
Conclusion
As mentioned above , When adding or deleting nodes , Consistent hashing minimizes the impact of workload reallocation . When clustered worker When a node changes , Based on the consistent hash algorithm, the workload in worker Allocation between nodes , It can minimize the impact on cache hit rate on existing nodes . therefore , stay Presto In the scenario where the cluster size is expanded or reduced according to the workload , Or the hardware devices in the deployment environment are not fully controlled, resulting in worker Nodes may be reassigned and adjusted at any time , Consistent hashing strategy will be a better choice .
stay Alluxio Community , We are constantly improving Alluxio And various computing engines ( For example Presto) Integration in functionality and availability . With in Presto Consistent hash is introduced into scheduling ,Alluxio You can use Presto The soft affinity of , Achieve better data locality and cache efficiency , Ultimately improve processing performance and reduce costs . We will continue to contribute to the entire data ecosystem , Continuously improve and optimize our products .
边栏推荐
- 全方位透析真实企业软件测试流程
- Rn7302 three-phase electric quantity detection (based on STM32 single chip microcomputer)
- Mise en œuvre de l'actionneur asynchrone d'exécution à partir de zéro
- LeetCode+ 66 - 70 高精度、二分专题
- Libuv framework echo server C source code explanation (TCP part)
- RN7302三相电量检测(基于STM32单片机)
- 我的MVVM开源项目《出行防疫App》已发布
- eyebeam高级设置
- Comprehensive analysis of real enterprise software testing process
- Wechat applets - basics takes you to understand the life cycle of applets (I)
猜你喜欢
FPGA - 7 Series FPGA selectio -07- iserdese2 of advanced logic resources
Linked list (III) - reverse linked list
Linked list (I) - remove linked list elements
强化学习——格子世界
浮动与定位
职场IT老鸟的几点小习惯
Paper recommendation: efficientnetv2 - get smaller models and faster training speed through NAS, scaling and fused mbconv
推荐10个好用到爆的Jupyter Notebook插件,让你效率飞起
Error reporting - resolve core JS / modules / es error. cause. JS error
Niubi 666, this project makes web page making as simple as building blocks
随机推荐
CRC32概述以及实现和使用
图片按日期批量导入WPS表格
我的MVVM开源项目《出行防疫App》已发布
实现这个 issue 得700块钱人民币,有人做嘛?
KMP string
Linked list (I) - remove linked list elements
选拔赛题目代码
The code is correct, and the rendering page does not display the reason
MySQL (I) - Installation
[c #] [reprint]furion frame address and tutorial address
Puge -- understanding of getordefault() method
Eyebeam advanced settings
浮动与定位
微信小程序分页功能,下拉刷新功能,直接干货拿来就用
[rust translation] implement rust asynchronous actuator from scratch
Teach you how to use UCOS
Causes of wechat applet compilation page blank bug
什么是一致性哈希?可以应用在哪些场景?
Paper recommendation: efficientnetv2 - get smaller models and faster training speed through NAS, scaling and fused mbconv
FPGA - 7 Series FPGA selectio -08- oserdese2 of advanced logic resources