当前位置:网站首页>Figure operation flow of HAMA BSP Model
Figure operation flow of HAMA BSP Model
2022-06-22 16:57:00 【ZH519080】
Hama-architecture:

Apache-hama Cluster is based on BSP Based on a framework BSPMaster、( Multiple ) Unrelated GroomServer node controler 、 It can run independently Zookpeer The cluster consists of .BSPMaster use “ fifo ” The principle is right GroomServer monitor 、job Submission processing of 、 Task allocation and record the whole running dynamics ,BSPMaster call BSP Class setup Method 、bsp Methods and cleanup Method pair superstep Control .GroomServer adopt “HeartBeat” towards BSPMaster Send heartbeat message , towards BSPMaster Report current GroomServer Node cluster status . The status information includes the maximum task quantity and available memory capacity of the node cluster .BSPMaster Start according to the heartbeat information BSP Task handle job Divided into... One by one task, And then task Assigned to GroomServer Calculate the node group ,GroomServer start-up BSPPeer perform GroomServer Assigned task.Zookpeer management BSPPeers The barrier synchronization of , Realization BarrierSynchronisation Mechanism .Zookpeer mainly BSPPeer.sync() Method 、enterBarrier() Methods and leaveBarrier() Methods control BSP Barrier synchronization phase of the mission . Transmission of information 、 The receiving is completed in the barrier synchronization stage .
1、job Submission and distribution of :

stay GraphJob Class waitForCompetition() Call in method submit() Method to job Submit to BSPMaster, The main contents submitted are :VertexClass In the implementation of 、VertexInputReader In the implementation of 、Edge The properties and Vertex Of ID、value etc. .
according to BSPMaster monitor GroomServer The running state of the cluster ,BSPMaster For submitted job According to the number of tasks that can be run GroomServer The cluster of computing nodes assigns tasks . By default GroomServer The number of running tasks is 1, If task The running task quantity of is not 1, be BSPMaster The maximum task amount allocated is the difference between the maximum task amount borne by the cluster and the current task amount . according to BSPMaster The amount of tasks assigned to complete is right job The processed data is divided , After data segmentation BSPMaster control GroomServer Assign data to each BSPPeer in .
HAMA-Graph-setup and bsp Operation diagram
Hama-Graph The operation of the system also follows BSP-HAMA Framework of the , It's all in BSP.setup() Method 、BSP.bsp() Method 、BSP.cleanup() Methods and BSP.clear() Method . use BSP.setup() Methods and BSP.cleanup() Methods are used to execute the start calculation and end calculation, and output the calculation results .superstep The calculation of is mainly in BSP.bsp() Method , therefore BSP.bsp() It mainly controls the core part of the whole parallel computing .BSP.clear() The method is used to clear this superstep And prepare for the next step superstep Calculation .

2、 Data loading and superstep initialization
Data loading and superstep The initialization of is mainly started by controlling superstep Calculated BSP.setup() Method .

The allocated data passes through BSP.setup() Methods GraphJobRunner.loadVertices() Method to load data into memory .Job After publishing, use VertexInputReader.parseVertex() Method to parse the data , After parsing the data, use hashMap Of put(getNumPeers,GraphJobMessage) Method to temporarily store data in hashMap in . obtain hashMap Properties of ( That is, the parsed value value ) use BSPPeer.send(peerName,GraphJobMessage) Method in peers Information transfer between , The main contents to be delivered are peer Address and GraphJobMessage Information . But there is no guarantee that messages are sent and received in the same order , So in the barrier synchronization phase , The same message sent may not arrive BSPPeer On , also BSPPeer There is information transmission between them . Wait until the information transmission is completed in the barrier grid synchronization phase MessageQueue.poll() Method BSPPeer Get the message .
Wait until all the data is loaded into memory superstep The initialization ( The whole program is executed only once ).Superstep When initializing, the GraphJobRunner.doInitialSuperstep() To perform the . When all BSPPeer The work to be done after receiving the corresponding message is to superstep Initialization of calculation , This work is regarded as the first... After the drama data is loaded supertep Calculation , Although there are VerticeInfo.startSuperstep() Methods and VerticesInfo.finishstartstep() Method runs but does not really superstep Calculation , Because the vertex is still inactive , Just to set BSPPeer Cluster computing threads .
3、 Barrier synchronization
But how does the information work in the barrier synchronization stage ?

In the process of synchronization, information transmission mainly uses outgoingMessageManager( Output information manager ) and localQueue( Local message queue ), adopt outgoingMessageManager( Output information manager ) hold peer Address and GraphJobMessage Package to outgoingBundles( Output package ) in .
All the information is packaged in outgoingBundle The preparation for information transmission is completed . Then all the information is in the barrier synchronization phase (sync()) To summarize . hold peer Address and GraphJobMessage Information from outgoingBundle After taking it out of the , adopt LocalBSPRunner.tansfer() Method store in hashMap The information in is loaded into localQueueForNextIteration(SynchronizedQueue object ) in , And use MessageQueue.addBundle() Method to package .
When information is loaded into localQueue( Local message queue ) In the after , All the information starts enterBarrier Stage . stay enterBarrier Phases wait by scheduling threads (wait()) Others have not yet entered the barrier synchronization stage . When the last thread that controls the message enters the barrier synchronization phase , A non empty barrier is provided in the structure , Then the current message thread is executed and other message threads are still in the waiting phase , If the last message thread enters the barrier synchronization phase, there is no non empty barrier in the structure , Then each thread enters the preemptive mode , The thread that grabs the execution right will execute the thread first . It is not until all the message threads enter the barrier synchronization phase that they really start to BSPPeer Information transfer between . During the barrier synchronization phase BSPPeer Transmission between is used in AbstractMessageManager.clearOutgoingMessages() Methods localQueueForNextIteration.getMessageQueue() Method uses localQueue The replacement is completed . In the barrier synchronization phase, when all the taskID The associated information for the identification is passed through localQueue Send it out and go to leaveBarrier Stage .
4、superstep Calculation


In a superstep During the calculation BSPPeer Only messages can be sent or the last one can be processed superstep Message received in .
Really start superstep The calculation is in doSuperstep(GpprootraphJobMessage,BSPPeer) Implemented in the method of . When BSPPeer Cluster start superstep Before calculation, you need to use class AtomitInteger Activate the vertex to make it active , Traverse all the information and the arrangement order of vertices , With the same ID The iteration information of starts as the first vertex superstep The calculation of . Method of use startSuperstep() perform superstep The beginning of , Next, customize compute() Method implementation , When no vertex is activated or the number of iterations set by the user is reached, it means superstep end .
边栏推荐
- 每秒處理10萬高並發訂單的樂視集團支付系統架構分享
- mysql账号增删改、数据导入导出命令举例
- 面试知识点
- JSP learning (2) -- JSP script elements and instructions
- 迭代器与生成器
- Add a millennial sign to a number (amount in millennia)
- 交互电子白板有哪些特点?电子白板功能介绍
- Why buy increased life insurance? Is increased life insurance safe and reliable?
- web技术分享| 【高德地图】实现自定义的轨迹回放
- spark的NaiveBayes中文文本分类
猜你喜欢
![Web technology sharing | [Gaode map] to realize customized track playback](/img/0b/25fc8967f5cc2cea626e0b3f2b7594.png)
Web technology sharing | [Gaode map] to realize customized track playback

【微信小程序封装底部弹出框】一

LETV group payment system architecture sharing for processing 100000 high concurrent orders per second

社会担当 广汽本田“梦想童行”倡导儿童道路交通安全

【C语言】深度剖析指针和数组的关系

JSP learning (2) -- JSP script elements and instructions
![[wechat applet custom bottom tabbar]](/img/04/2ea4ab3fd8571499190a9b3c9990b2.png)
[wechat applet custom bottom tabbar]

linux系统维护篇:mysql8.0.13源码下载及安装之“傻瓜式”操作步骤(linux-centos6.8)亲测可用系列
![[C language] use of library function qsort](/img/b0/6e86e31243164479b0f3d960d039ef.png)
[C language] use of library function qsort

Linux system maintenance: mysql8.0.13 source code download and installation "fool" operation steps (Linux centos6.8) test available series
随机推荐
Test for API
面试题之JS判断数据类型的方法
hydra安装及使用
Spark性能调优之道——解决Spark数据倾斜(Data Skew)的N种姿势
spark-cache的源码分析
Spark on data skew
Idea installation summary
spark关于数据倾斜问题
NiO file and folder operation examples
In case of default import failure
Summary of Changan chain usage skills
Add a millennial sign to a number (amount in millennia)
Oracle database and table
mysql 字符串字段转浮点型字段
[wechat applet to obtain the height of custom tabbar] is absolutely available!!!
NiO service multithreaded version
spark-shuffle的读数据源码分析
Shell learning
ABAP query tutorial in sap: sq01, sq02, sq03-017
Redis实现延迟队列的正确姿势