当前位置:网站首页>Figure operation flow of HAMA BSP Model
Figure operation flow of HAMA BSP Model
2022-06-22 16:57:00 【ZH519080】
Hama-architecture:

Apache-hama Cluster is based on BSP Based on a framework BSPMaster、( Multiple ) Unrelated GroomServer node controler 、 It can run independently Zookpeer The cluster consists of .BSPMaster use “ fifo ” The principle is right GroomServer monitor 、job Submission processing of 、 Task allocation and record the whole running dynamics ,BSPMaster call BSP Class setup Method 、bsp Methods and cleanup Method pair superstep Control .GroomServer adopt “HeartBeat” towards BSPMaster Send heartbeat message , towards BSPMaster Report current GroomServer Node cluster status . The status information includes the maximum task quantity and available memory capacity of the node cluster .BSPMaster Start according to the heartbeat information BSP Task handle job Divided into... One by one task, And then task Assigned to GroomServer Calculate the node group ,GroomServer start-up BSPPeer perform GroomServer Assigned task.Zookpeer management BSPPeers The barrier synchronization of , Realization BarrierSynchronisation Mechanism .Zookpeer mainly BSPPeer.sync() Method 、enterBarrier() Methods and leaveBarrier() Methods control BSP Barrier synchronization phase of the mission . Transmission of information 、 The receiving is completed in the barrier synchronization stage .
1、job Submission and distribution of :

stay GraphJob Class waitForCompetition() Call in method submit() Method to job Submit to BSPMaster, The main contents submitted are :VertexClass In the implementation of 、VertexInputReader In the implementation of 、Edge The properties and Vertex Of ID、value etc. .
according to BSPMaster monitor GroomServer The running state of the cluster ,BSPMaster For submitted job According to the number of tasks that can be run GroomServer The cluster of computing nodes assigns tasks . By default GroomServer The number of running tasks is 1, If task The running task quantity of is not 1, be BSPMaster The maximum task amount allocated is the difference between the maximum task amount borne by the cluster and the current task amount . according to BSPMaster The amount of tasks assigned to complete is right job The processed data is divided , After data segmentation BSPMaster control GroomServer Assign data to each BSPPeer in .
HAMA-Graph-setup and bsp Operation diagram
Hama-Graph The operation of the system also follows BSP-HAMA Framework of the , It's all in BSP.setup() Method 、BSP.bsp() Method 、BSP.cleanup() Methods and BSP.clear() Method . use BSP.setup() Methods and BSP.cleanup() Methods are used to execute the start calculation and end calculation, and output the calculation results .superstep The calculation of is mainly in BSP.bsp() Method , therefore BSP.bsp() It mainly controls the core part of the whole parallel computing .BSP.clear() The method is used to clear this superstep And prepare for the next step superstep Calculation .

2、 Data loading and superstep initialization
Data loading and superstep The initialization of is mainly started by controlling superstep Calculated BSP.setup() Method .

The allocated data passes through BSP.setup() Methods GraphJobRunner.loadVertices() Method to load data into memory .Job After publishing, use VertexInputReader.parseVertex() Method to parse the data , After parsing the data, use hashMap Of put(getNumPeers,GraphJobMessage) Method to temporarily store data in hashMap in . obtain hashMap Properties of ( That is, the parsed value value ) use BSPPeer.send(peerName,GraphJobMessage) Method in peers Information transfer between , The main contents to be delivered are peer Address and GraphJobMessage Information . But there is no guarantee that messages are sent and received in the same order , So in the barrier synchronization phase , The same message sent may not arrive BSPPeer On , also BSPPeer There is information transmission between them . Wait until the information transmission is completed in the barrier grid synchronization phase MessageQueue.poll() Method BSPPeer Get the message .
Wait until all the data is loaded into memory superstep The initialization ( The whole program is executed only once ).Superstep When initializing, the GraphJobRunner.doInitialSuperstep() To perform the . When all BSPPeer The work to be done after receiving the corresponding message is to superstep Initialization of calculation , This work is regarded as the first... After the drama data is loaded supertep Calculation , Although there are VerticeInfo.startSuperstep() Methods and VerticesInfo.finishstartstep() Method runs but does not really superstep Calculation , Because the vertex is still inactive , Just to set BSPPeer Cluster computing threads .
3、 Barrier synchronization
But how does the information work in the barrier synchronization stage ?

In the process of synchronization, information transmission mainly uses outgoingMessageManager( Output information manager ) and localQueue( Local message queue ), adopt outgoingMessageManager( Output information manager ) hold peer Address and GraphJobMessage Package to outgoingBundles( Output package ) in .
All the information is packaged in outgoingBundle The preparation for information transmission is completed . Then all the information is in the barrier synchronization phase (sync()) To summarize . hold peer Address and GraphJobMessage Information from outgoingBundle After taking it out of the , adopt LocalBSPRunner.tansfer() Method store in hashMap The information in is loaded into localQueueForNextIteration(SynchronizedQueue object ) in , And use MessageQueue.addBundle() Method to package .
When information is loaded into localQueue( Local message queue ) In the after , All the information starts enterBarrier Stage . stay enterBarrier Phases wait by scheduling threads (wait()) Others have not yet entered the barrier synchronization stage . When the last thread that controls the message enters the barrier synchronization phase , A non empty barrier is provided in the structure , Then the current message thread is executed and other message threads are still in the waiting phase , If the last message thread enters the barrier synchronization phase, there is no non empty barrier in the structure , Then each thread enters the preemptive mode , The thread that grabs the execution right will execute the thread first . It is not until all the message threads enter the barrier synchronization phase that they really start to BSPPeer Information transfer between . During the barrier synchronization phase BSPPeer Transmission between is used in AbstractMessageManager.clearOutgoingMessages() Methods localQueueForNextIteration.getMessageQueue() Method uses localQueue The replacement is completed . In the barrier synchronization phase, when all the taskID The associated information for the identification is passed through localQueue Send it out and go to leaveBarrier Stage .
4、superstep Calculation


In a superstep During the calculation BSPPeer Only messages can be sent or the last one can be processed superstep Message received in .
Really start superstep The calculation is in doSuperstep(GpprootraphJobMessage,BSPPeer) Implemented in the method of . When BSPPeer Cluster start superstep Before calculation, you need to use class AtomitInteger Activate the vertex to make it active , Traverse all the information and the arrangement order of vertices , With the same ID The iteration information of starts as the first vertex superstep The calculation of . Method of use startSuperstep() perform superstep The beginning of , Next, customize compute() Method implementation , When no vertex is activated or the number of iterations set by the user is reached, it means superstep end .
边栏推荐
- Spark Streaming checkpoint的问题与恢复
- Bidirectional data binding V-model and v-decorator
- Make the code elegant (learn debugging + code style)
- Task scheduling design of collection system
- [wechat applet custom bottom tabbar]
- jsp学习之(一)---------jsp概述
- Basic application of scala for
- uniapp微信小程序获取页面二维码(带有参数)
- 每秒处理10万高并发订单的乐视集团支付系统架构分享
- 双向数据绑定v-model与v-decorator
猜你喜欢

Parts beyond the text are indicated by ellipsis

LETV group payment system architecture sharing for processing 100000 high concurrent orders per second

【微信小程序封装底部弹出框】一

面对默认导入失败的情况

新手必会的静态站点生成器——Gridsome
![[pop up box 2 at the bottom of wechat applet package]](/img/31/266e6a1f4200347c9324ea37b78562.png)
[pop up box 2 at the bottom of wechat applet package]

How to add a "security lock" to the mobile office of government and enterprises?

【微信小程序获取自定义tabbar的高度】绝对可用!!!

每秒处理10万高并发订单的乐视集团支付系统架构分享

web技术分享| 【高德地图】实现自定义的轨迹回放
随机推荐
使用IDM让百度云加速的方法
Scala for derivation: the ability to define a value in the first part of a for expression and use it in subsequent (outer) expressions
spark-cache的源码分析
mysql 字符串字段转浮点型字段
mysql账号增删改、数据导入导出命令举例
scala之闭包函数浅知
每秒处理10万高并发订单的乐视集团支付系统架构分享
Interface (optimization type annotation)
Summary of Changan chain usage skills
Summary of spark common operators
[deep anatomy of C language] keywords if & else & bool type
Unable to connect after win10 WiFi is disconnected
代码扫描工具扫出的 Arrays.asList 使用BUG
Machine learning notes - Hagrid - Introduction to gesture recognition image data set
Gridhome, a must-have static site generator for beginners
Vs2017 solution to not displaying qstring value in debugging status
Why buy increased life insurance? Is increased life insurance safe and reliable?
接口(优化类型注解)
Spark性能调优之道——解决Spark数据倾斜(Data Skew)的N种姿势
The way to optimize spark performance -- solving N poses of spark data skew