Using consistent hash algorithm in Presto to enhance the data cache locality of dynamic clusters
2022-06-24 17:19:00 · Alluxio Official
Author: Rongrong Zhong, Presto TSC member / Committer
Running Alluxio together with Presto is becoming increasingly popular in the community. Caching hot datasets on solid-state drives or in memory keeps data close to the Presto workers, avoiding the high latency of remote reads. Presto supports hash-based soft affinity scheduling, which ensures that the same data is cached only once or twice across the whole cluster, so more hot data fits in local caches and cache efficiency improves. The existing hash algorithm, however, performs poorly when the cluster size changes. To address this, this article introduces a new hash algorithm for soft affinity scheduling: consistent hashing.
Soft affinity scheduling
Presto uses a scheduling strategy called soft affinity scheduling that tries to dispatch a split (the smallest unit of data processing) to the same Presto worker (its preferred node) every time. The mapping from splits to workers is computed by a hash function over the split, ensuring that the same split always hashes to the same worker.
When a split is processed for the first time, its data is cached on the preferred worker node. When subsequent queries process the same split, their requests are dispatched to the same worker again; since the data is already cached locally, no remote read is needed.
To improve load balancing and better tolerate unresponsive workers, two preferred nodes are selected for each split. If the first is busy or unresponsive, the second is used. As a result, a split's data may be physically cached on two workers.
For more on soft affinity scheduling, see "Reduce Presto latency with Alluxio data caching" (prestodb.io/blog/2020/0…).
The hash algorithm
Soft affinity scheduling relies on a hash algorithm to compute the mapping between splits and workers. Previously, a simple modulo function was used:
worker = hash(split) mod workerCount
This strategy is simple and effective as long as the cluster is stable and the set of workers does not change. But if a worker becomes temporarily unavailable or drops out, the number of workers changes, the entire split-to-worker mapping must be reassigned, and the cache hit rate drops significantly. If the failed worker later comes back online, the mapping has to be reshuffled yet again.
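To make the problem concrete, here is a small sketch (plain Python, not Presto's actual code) of modulo-based assignment. When the worker count drops from 3 to 2, a large fraction of the splits land on a different worker, wasting their caches:

```python
import hashlib

def stable_hash(key: str) -> int:
    # A deterministic hash; Python's built-in hash() is salted per process.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def assign_mod(split: str, num_workers: int) -> int:
    # Modulo-based mapping: split -> worker index.
    return stable_hash(split) % num_workers

splits = [f"split-{i}" for i in range(100)]
before = {s: assign_mod(s, 3) for s in splits}  # 3 workers running
after = {s: assign_mod(s, 2) for s in splits}   # one worker dropped out
moved = sum(1 for s in splits if before[s] != after[s])
print(f"{moved} of {len(splits)} splits changed workers")
```

With a uniform hash, roughly two thirds of the splits are expected to move here, even though only one worker left.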
To mitigate this, when computing which worker a given split is assigned to, Presto takes the modulus over the total worker count rather than the number of running workers. This, however, only alleviates the reshuffling caused by workers dropping out temporarily. Sometimes adding or removing workers is a deliberate response to workload fluctuations. In those cases, can a reasonable cache hit rate be maintained without large-scale reassignment?
Our solution is consistent hashing.
Consistent Hashing
Consistent hashing was first introduced by David Karger in 1997 as an algorithm for distributing requests among a set of servers whose number changes frequently. The technique is widely used in load balancing, distributed hash tables, and similar systems.
How consistent hashing works
Picture the output range of the hash function, [0, MAX_VALUE], mapped onto a ring (with MAX_VALUE wrapping around to 0). To illustrate how consistent hashing works, suppose a Presto cluster consists of 3 Presto workers, and 10 splits are queried repeatedly.
First, the workers are hashed onto the ring. Each split is then assigned to the worker adjacent to the split's hash value (note: "adjacent" here means the first worker found clockwise from the split's position on the ring).
In this example, the splits are distributed as follows:
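This ring assignment can be sketched in a few lines (hypothetical Python, not the Presto implementation; the worker names and hash function are illustrative):

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Place keys on a ring of size 2**128 using a deterministic hash.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, workers):
        # Sort workers by their position on the ring.
        self._points = sorted((ring_hash(w), w) for w in workers)
        self._keys = [p for p, _ in self._points]

    def assign(self, split: str) -> str:
        # First worker clockwise from the split's position (wrapping around).
        i = bisect.bisect_right(self._keys, ring_hash(split)) % len(self._points)
        return self._points[i][1]

ring = HashRing(["worker1", "worker2", "worker3"])
for s in ("split-0", "split-5", "split-7"):
    print(s, "->", ring.assign(s))
```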
Removing a worker
Now, if worker2 goes offline for some reason, then according to the algorithm splits 0, 5, and 7 are rescheduled to the worker with the next hash value, namely worker1.
Only the splits that were assigned to the offline worker (here worker2) need to be re-mapped to a new worker; nothing else is affected. If worker2 comes back online later, splits 0, 5, and 7 are reassigned back to it without disturbing any other worker's hit rate.
Adding a worker
If the workload grows and another worker node, worker4, needs to be added to the cluster, worker4's hash value places it on the ring as shown in the figure below:
In this case, split 8 falls into worker4's range while all other split assignments are unaffected, so the cache hit rates of those splits are preserved. The reassignment result is as follows:
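The key property — adding a worker only pulls some splits onto the new worker and never shuffles splits among the existing ones — can be checked directly with a compact ring sketch (hypothetical Python, not Presto's code):

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def assign(split, workers):
    # Each split goes to the first worker clockwise on the hash ring.
    points = sorted((ring_hash(w), w) for w in workers)
    keys = [p for p, _ in points]
    return points[bisect.bisect_right(keys, ring_hash(split)) % len(points)][1]

splits = [f"split-{i}" for i in range(100)]
old = {s: assign(s, ["worker1", "worker2", "worker3"]) for s in splits}
new = {s: assign(s, ["worker1", "worker2", "worker3", "worker4"]) for s in splits}

# Every split either keeps its worker or moves to the new worker4;
# splits are never reshuffled among the pre-existing workers.
assert all(new[s] == old[s] or new[s] == "worker4" for s in splits)
moved = sum(1 for s in splits if new[s] != old[s])
print(f"{moved} of {len(splits)} splits moved, all of them to worker4")
```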
Virtual node
As shown above, consistent hashing guarantees that when the set of nodes changes, on average only about k/n of the splits need to be reassigned (k splits spread over n workers). However, because the workers are placed on the ring with limited randomness, splits may end up unevenly distributed across them. To address this, we introduce "virtual nodes". With virtual nodes, the load of a node that disconnects is redistributed across multiple remaining nodes, reducing the load fluctuation caused by cluster churn.
Each physical worker is mapped to multiple virtual nodes, and it is the virtual nodes, rather than the physical ones, that are placed on the hash ring. Each split is first assigned to its adjacent (clockwise-nearest) virtual node and then routed to the physical worker that the virtual node maps to. The figure below shows one possible scenario, in which every physical worker corresponds to 3 virtual nodes.
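A sketch of the virtual-node variant (hypothetical Python; the "worker#vnodeN" naming is an illustration, not Presto's actual scheme):

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(workers, vnodes_per_worker=3):
    # Each physical worker contributes several virtual nodes to the ring,
    # each tagged with the physical worker it maps back to.
    points = []
    for w in workers:
        for i in range(vnodes_per_worker):
            points.append((ring_hash(f"{w}#vnode{i}"), w))
    return sorted(points)

def assign(ring, split):
    # Nearest virtual node clockwise, then route to its physical worker.
    keys = [p for p, _ in ring]
    return ring[bisect.bisect_right(keys, ring_hash(split)) % len(ring)][1]

ring = build_ring(["worker1", "worker2", "worker3"])
print(len(ring), "points on the ring")  # 3 workers x 3 virtual nodes = 9 points
print("split-2 ->", assign(ring, "split-2"))
```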
As the number of points on the hash ring grows, the hash space is divided more evenly.
When a physical node goes down, all of its virtual nodes are removed from the ring at once. The splits it was serving are then reassigned not to a single node but to several virtual nodes, and hence to several physical nodes, which gives better load balancing.
As shown in the figure below, when worker3 is removed, splits 2 and 3 are re-hashed to worker2, while split 8 is re-hashed to worker1.
How to use consistent hashing in Presto
This is an experimental feature we recently contributed to Presto. If you are interested in testing it or collaborating, please contact us.
Before using this feature, enable Presto's cache by following the guide (prestodb.io/docs/curren…) or the tutorial (docs.alluxio.io/os/user/sta…). Make sure SOFT_AFFINITY is selected as the scheduling policy: in the /catalog/hive.properties file, add hive.node-selection-strategy=SOFT_AFFINITY.
Then enable consistent hashing by adding node-scheduler.node-selection-hash-strategy=CONSISTENT_HASHING to config.properties.
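Putting the two settings from the steps above together (file paths as described in this article; check your own deployment's configuration layout):

```properties
# catalog/hive.properties -- select soft affinity scheduling
hive.node-selection-strategy=SOFT_AFFINITY

# config.properties -- switch the soft-affinity hash to consistent hashing
node-scheduler.node-selection-hash-strategy=CONSISTENT_HASHING
```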
Conclusion
As described above, consistent hashing minimizes the impact of workload reassignment when nodes are added or removed. When the cluster's workers change, distributing work across them with consistent hashing minimizes the impact on the cache hit rate of the existing nodes. Consistent hashing is therefore the better choice when a Presto cluster is scaled up or down with the workload, or when the deployment hardware is not fully under your control and workers may be replaced or adjusted at any time.
In the Alluxio community, we are constantly improving the integration of Alluxio with compute engines such as Presto, in both functionality and usability. With consistent hashing introduced into Presto's scheduling, Alluxio can leverage Presto's soft affinity to achieve better data locality and cache efficiency, ultimately improving performance and reducing cost. We will keep contributing to the broader data ecosystem and continuously improving our products.