当前位置:网站首页>Uncover ges super large scale graph computing engine hyg: Graph Segmentation
Uncover ges super large scale graph computing engine hyg: Graph Segmentation
2022-06-25 18:51:00 【Huawei cloud developer Alliance】
Abstract :GES Large scale graph computing engine HyG By implementing different point edge partition algorithms , Users can flexibly choose a variety of segmentation strategies , So as to achieve better computing performance .
This article is shared from Huawei cloud community 《GES Very large scale graph computing engine HyG Secret map segmentation 》, author :π .
For very large scale map data , spot 、 The number of edges can reach tens or even hundreds of billions 、 One trillion , It often takes several hundred to store graph data of this size GB Even a few TB Storage space , For efficient graph calculation and analysis , It is often necessary to store graph data in memory , therefore , For very large scale graph data , There is no way to store the whole picture on a single machine , This requires the graph computing framework to segment the original graph , Cut a large graph into several subgraphs , Then run these subgraphs on a cluster , And then complete all kinds of graph analysis 、 Figure query task .GES Large scale graph computing engine HyG By implementing different point edge partition algorithms , Users can flexibly choose a variety of segmentation strategies , So as to achieve better computing performance .
A good graph segmentation algorithm is very important to the performance of graph computing engine :
- Reasonable segmentation can make the load of each subgraph more balanced , And then accelerate the iteration speed of each round
- Reasonable segmentation can greatly reduce the communication overhead between subgraphs
There are two main branches of graph segmentation algorithm :
- Offline segmentation : Need to load complete graph data , According to the overall topology of the graph , Do several iterations 、 Clustering and other methods , To get high-quality partition boundaries
- Stream slicing : At the right point / During the division of edges , The data of the divided object is streamed to the graph division module , The partition policy of the current object depends on the partition state of the previous object at most , It has nothing to do with the information of the object transmitted in the subsequent streaming mode
The advantages of streaming graph partitioning algorithm
1) Memory usage is small . Streaming graph partition algorithm each partition only needs to load part of the graph data to partition , The synchronization of partition states can be completed with only a small communication cost between partitions . The offline graph partitioning algorithm needs to load the complete graph data , And for example XtraPupl、Metis Iterate the entire graph several times , Refine its partitions , A lot of memory overhead .
2) High mapping efficiency . In the flow graph partition algorithm , Objects are streamed in , The partition result can be determined immediately , You only need to traverse each partition object once . And support multi-threaded multi object parallel streaming input 、 Partition , After the partition is completed, the state of the partition is synchronized between threads , Achieve efficient partitioning . But similar XtraPupl、Metis Offline partition algorithm , Because it requires multiple iterations , The partition of an object needs to be traversed more than once , And because its partition algorithm is highly dependent on the overall topology of the graph , As a result, its exploitable parallelism is not high , The efficiency of graph partition is not as good as that of flow graph partition algorithm .
3) The quality of the plot is quite , Better in some cases . In the test of many graph calculation algorithms and multiple data sets , Each graph has its own advantages and disadvantages , However, there are always several specific partitioning algorithms in flow graph partitioning , The time cost of reasoning in graph computation is better than that of offline graph partitioning algorithm XtraPupl, The flow graph partition algorithm is more flexible for users , It can be used for different algorithms 、 Data sets 、 Zoning quantity , Select the appropriate partition algorithm , Get better graph computing reasoning performance .
our HyG Graph computing engine uses flow segmentation algorithm to segment graph .
HyG How graph computing engine implements Graph Segmentation ?
The core of graph segmentation is two steps .1、 How to assign points to each host ;2、 How to assign edges to hosts .HyG These two steps are abstracted , The following two methods are defined :
getMaster(nodeID) // Given a point ID, Return to the host to which this point should be assigned
getEdgeOwner(edgeSrcID, edgeDstID) // Given an edge , Return to which host this side needs to be assigned
Once the above two functions are given ,HyG Can complete the segmentation of large graphs . There are a large number of Graph Segmentation Algorithms in the industry and academia , The essential difference is getMaster and getEdgeOwner Different implementations of , Almost all graph segmentation algorithms can be rewritten into this pattern . Now let's take an example of edge cutting (OEC) Example , have a look HyG By redefining getMaster and getEdgeOwner Two functions complete graph segmentation .

First we define
GetMaster(nodeID)
return floor(nodeID / hostNum)This function is very simple , That's the point. ID Divide by the number of hosts and round down , chart 1 We have 4 vertices , Suppose we have 4 Host computer , In this case, a point is assigned to each host .
Then we define
GetEdgeOwner(edgeSrcID, edgeDstID)
return masterOf(edgeSrcID)This function is also very simple , Is the host where the source vertex of the edge is located ID, because GetMaster The division of points has been completed , Therefore, this step of direct query can get the results . In this way, the slicer will determine which host this edge should be assigned to .
At the completion of GetMaster and GetEdgeOwner After the definition of ,HyG The original graph can be cut into 4 Subtext , Pictured 3 Shown .

chart 3
The segmentation example mentioned above is the simplest method of segmentation , In real application scenarios , More complex segmentation strategies are often needed to achieve effectiveness 、 Balanced segmentation effect .
HyG The implemented point partition algorithms are mainly divided into two categories :ContiguousEB and FennelEB.
1) ContiguousEB Directly assign the point to the partition that reads the point in the picture reading phase , No additional partition operations .
2) FennelEB The current partition state will be maintained during partition , Stream read to the vertex , According to the allocated quantity of each partition 、 The locality of the vertex in each partition , Generate a score for each partition , Assign vertices to the highest rated partition .
HyG The implemented edge partition algorithms are mainly divided into three categories :Source,Hybrid and Cartesian.
1) Source Assign an edge directly to the zone where its corresponding vertex is located , If it is an algorithm divided by out edges , Divide it into the partition where the source vertex is located , If it is an algorithm divided by the incoming edge , Divide it into the zone where the target vertex is located .
2) Hybrid Set a threshold threshold, Vertices whose degree is lower than the threshold are called low degree vertices , Those above the threshold are regarded as height vertices . This edge partition algorithm guarantees the locality of low degree vertices , At the same time, the height vertices are scattered , It avoids the load imbalance problem caused by a few height vertices .
3) Cartesian The adjacency matrix is divided into several two-dimensional meshes , Each grid corresponds to a partition , Mesh edges , This partitioning algorithm preserves a certain degree of locality , At the same time, better load balancing can be achieved .
HyG By implementing different point partition algorithms (GetMaster) And different edge partition algorithms (GetEdgeOwner), Various segmentation strategies can be implemented , Very flexible and easy to use , Users can choose different segmentation strategies according to the characteristics of their graph data , So as to achieve better computing performance .
at present HyG Online to Huawei cloud GES service , Welcome to experience ~

Click to follow , The first time to learn about Huawei's new cloud technology ~
边栏推荐
- RMAN backup database_ Skip offline, read-only, and inaccessible files
- On location and scale in CNN
- El table highly adaptive
- 广州华锐互动VR全景为各行各业带来发展
- Huawei cloud SRE deterministic operation and maintenance special issue (the first issue)
- 一晚上做了一个xpath终结者:xpath-helper-plus
- [in depth understanding of tcapulusdb technology] tcapulusdb operation and maintenance doc
- 利尔达蓝牙空调接收器方案助力打造更舒适的公路生活
- [in depth understanding of tcapulusdb technology] tcapulusdb operation and maintenance
- Electronic basic project construction & communication between main thread and rendering thread
猜你喜欢

《痞子衡嵌入式半月刊》 第 57 期
![[deeply understand tcapulusdb technology] create a game zone](/img/91/cf4eae9a4336ca407c0da805b9d909.png)
[deeply understand tcapulusdb technology] create a game zone

Analysis on the market scale and pattern of contrast agents in China in 2021: Jiangsu Hengrui pharmaceutical, general electric, Yangzijiang Pharmaceutical Group, Bayer and bleco account for more than
![[deeply understand tcapulusdb technology] tmonitor module architecture](/img/82/24a8502604fccb89fea9963c3f3495.png)
[deeply understand tcapulusdb technology] tmonitor module architecture

如何快速关闭8080端口

Kwai 616 war report was launched, and the essence was thrown away for the second time to lead the new wave. Fast brand jumped to the top 3 of the hot list
![Overview and trend analysis of China's foreign direct investment industry in 2020 [figure]](/img/b3/73e01601885eddcd05b68a20f83ca8.jpg)
Overview and trend analysis of China's foreign direct investment industry in 2020 [figure]

Sorting out the latest data mining competition scheme!

利尔达蓝牙空调接收器方案助力打造更舒适的公路生活
![[in depth understanding of tcapulusdb technology] tcapulusdb operation and maintenance](/img/7b/8c4f1549054ee8c0184495d9e8e378.png)
[in depth understanding of tcapulusdb technology] tcapulusdb operation and maintenance
随机推荐
[C language practice - print the upper triangle and its deformation (with blank version)]
IDEA常用插件
Move graph explorer to jupyterab: use ges4jupyter to connect ges and explore graphs
Command records of common data types for redis cli operations
Tiger DAO VC产品正式上线,Seektiger生态的有力补充
In 2021, China's private equity market is growing, and the scale of private equity fund management reaches 19.78 trillion yuan [figure]
Training of long and difficult sentences in postgraduate entrance examination day81
Analysis on the development trend of China's intense pulsed light equipment industry in 2021: the market scale is growing, and the proportion of imported brands is large [figure]
Class 02 loader subsystem
Apifox简单了解——WEB端测试的集大成者
Current situation and trend analysis of China's glass packaging containers in 2021: the revenue of glass packaging containers increases year by year [figure]
Kwai 616 war report was launched, and the essence was thrown away for the second time to lead the new wave. Fast brand jumped to the top 3 of the hot list
QQ机器人:群成员自我禁言管理【最新beta2版本】
C generic class case
想知道新股民怎样炒股票开户?在线开户安全么?
中金财富安全吗? 开户需要多久
Comparison rules of strings in JS
Training of long and difficult sentences in postgraduate entrance examination day90
Kotlin Compose 终结toDo项目 点击可以编辑修改todo
Detailed explanation of route add command