
What nodes and edges mean in each of Flink's graphs

2022-06-25 11:39:00 Direction_ Wind

StreamGraph: the initial graph generated from the code the user writes with the Stream API. It represents the topology of the program.
JobGraph: the StreamGraph is optimized into a JobGraph, the data structure that is submitted to the JobManager. The main optimization is chaining: multiple nodes that satisfy the chaining conditions are merged into a single node, which reduces the serialization, deserialization, and transfer cost of moving data between nodes.
ExecutionGraph: the JobManager generates the ExecutionGraph from the JobGraph. The ExecutionGraph is the parallelized version of the JobGraph and the core data structure of the scheduling layer.
Physical execution graph: the "graph" that takes shape once the JobManager has scheduled the job according to the ExecutionGraph and the Tasks have been deployed on the TaskManagers. It is not a concrete data structure.
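
To make the chaining step concrete, here is a minimal Python sketch. It is not Flink's implementation. It assumes a simplified chaining rule (forward partitioning, equal parallelism, and a downstream node with exactly one input); Flink checks further conditions that are left out here. The `StreamNode` and `StreamEdge` dataclasses, the `build_job_vertices` helper, and the sample topology are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamNode:
    id: int
    name: str
    parallelism: int


@dataclass(frozen=True)
class StreamEdge:
    source: int       # id of the upstream StreamNode
    target: int       # id of the downstream StreamNode
    partitioner: str  # e.g. "forward", "hash", "rebalance"


def is_chainable(edge, nodes, in_degree):
    """Simplified rule (an assumption): forward edge, same parallelism, single input."""
    up, down = nodes[edge.source], nodes[edge.target]
    return (
        edge.partitioner == "forward"
        and up.parallelism == down.parallelism
        and in_degree[down.id] == 1
    )


def build_job_vertices(nodes, edges):
    """Group StreamNodes into chains; each chain stands in for one JobVertex."""
    in_degree = {node_id: 0 for node_id in nodes}
    for edge in edges:
        in_degree[edge.target] += 1

    # A chained node points at the upstream node it is merged into.
    parent = {}
    for edge in edges:
        if is_chainable(edge, nodes, in_degree):
            parent[edge.target] = edge.source

    def chain_head(node_id):
        # Follow parent links until reaching the first node of the chain.
        while node_id in parent:
            node_id = parent[node_id]
        return node_id

    chains = {}
    for node_id in sorted(nodes):
        chains.setdefault(chain_head(node_id), []).append(nodes[node_id].name)
    return list(chains.values())


# Hypothetical topology: source -> map -> (hash) -> window -> sink
NODES = {
    1: StreamNode(1, "source", 2),
    2: StreamNode(2, "map", 2),
    3: StreamNode(3, "window", 4),
    4: StreamNode(4, "sink", 4),
}
EDGES = [
    StreamEdge(1, 2, "forward"),
    StreamEdge(2, 3, "hash"),
    StreamEdge(3, 4, "forward"),
]

print(build_job_vertices(NODES, EDGES))
```

In this sample the four StreamNodes collapse into two chains, because the hash edge between map and window forces a real data exchange and so cannot be chained.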

The StreamGraph is a direct mapping of the user's logic. The JobGraph applies some optimizations on top of it, such as chaining some operators together to improve efficiency. The ExecutionGraph exists for scheduling and adds the notion of parallelism. What actually runs on top of that is the Task and its related structures.
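
The step from JobGraph to ExecutionGraph can be pictured as unrolling each vertex into its parallel subtasks. The following Python sketch is only an illustration under that assumption: `JobVertex` here is a hypothetical two-field dataclass, `expand` is a made-up helper, and the `name (i/n)` label format is chosen for readability rather than taken from Flink's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobVertex:
    name: str
    parallelism: int


def expand(job_vertices):
    """Return {vertex name: [subtask labels]}, one label per parallel subtask."""
    return {
        v.name: [f"{v.name} ({i + 1}/{v.parallelism})" for i in range(v.parallelism)]
        for v in job_vertices
    }


# Hypothetical JobGraph with two chained vertices.
JOB_GRAPH = [JobVertex("source -> map", 2), JobVertex("window -> sink", 4)]

for name, subtasks in expand(JOB_GRAPH).items():
    print(name, "=>", subtasks)
```

Two vertices in the logical graph become six schedulable units here, which is why the parallel graph, not the JobGraph, is what the scheduler works with.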
Here is a brief explanation of the terms used at each layer.
StreamGraph: the initial graph generated from the code the user writes with the Stream API.
StreamNode: the class that represents an operator and holds all of its properties, such as its parallelism and its incoming and outgoing edges.
StreamEdge: the edge that connects two StreamNodes.
JobGraph: the StreamGraph is optimized into a JobGraph, the data structure that is submitted to the JobManager.
JobVertex: after optimization, several StreamNodes that satisfy the chaining conditions may be chained together into one JobVertex; that is, a JobVertex contains one or more operators. Its input is a JobEdge and its output is an IntermediateDataSet.
IntermediateDataSet: represents the output of a JobVertex, that is, the data set produced by its operators. Its producer is a JobVertex and its consumer is a JobEdge.
JobEdge: represents a data transfer channel in the job graph. Its source is an IntermediateDataSet and its target is a JobVertex; data flows through the JobEdge from the IntermediateDataSet to the target JobVertex.
ExecutionGraph: the JobManager generates the ExecutionGraph from the JobGraph. The ExecutionGraph is the parallelized version of the JobGraph and the core data structure of the scheduling layer.
ExecutionJobVertex: corresponds one-to-one with a JobVertex in the JobGraph. Each ExecutionJobVertex has as many ExecutionVertex instances as its parallelism.
ExecutionVertex: represents one parallel subtask of an ExecutionJobVertex. Its input is an ExecutionEdge and its output is an IntermediateResultPartition.
IntermediateResult: corresponds one-to-one with an IntermediateDataSet in the JobGraph. One IntermediateResult contains multiple IntermediateResultPartitions; their number equals the parallelism of the producing operator.
IntermediateResultPartition: represents one output partition of an ExecutionVertex. Its producer is an ExecutionVertex and its consumers are one or more ExecutionEdges.
ExecutionEdge: represents an input of an ExecutionVertex. Its source is an IntermediateResultPartition and its target is an ExecutionVertex. An ExecutionEdge has exactly one source and exactly one target.
Execution: one attempt to execute an ExecutionVertex. After a failure, or when data has to be recomputed, an ExecutionVertex may have several Executions, each uniquely identified by its ExecutionAttemptID. Task deployment and task status updates between the JM and the TM use the ExecutionAttemptID to determine the recipient of a message.
Physical execution graph: the "graph" that takes shape once the JobManager has scheduled the job according to the ExecutionGraph and the Tasks have been deployed on the TaskManagers. It is not a concrete data structure.
Task: once an Execution has been scheduled, the TaskManager it is assigned to starts the corresponding Task. A Task wraps the operators that carry the user's execution logic.
ResultPartition: represents the data produced by one Task. It corresponds one-to-one with an IntermediateResultPartition in the ExecutionGraph.
ResultSubpartition: a sub-partition of a ResultPartition. Each ResultPartition contains multiple ResultSubpartitions; their number is determined by the number of downstream consuming Tasks and by the DistributionPattern.
InputGate: encapsulates the input of a Task and corresponds one-to-one with a JobEdge in the JobGraph. Each InputGate consumes one or more ResultPartitions.
InputChannel: each InputGate contains one or more InputChannels. An InputChannel corresponds one-to-one with an ExecutionEdge in the ExecutionGraph and is connected one-to-one to a ResultSubpartition; that is, one InputChannel receives the output of one ResultSubpartition.
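
The counting rules above (subpartitions per ResultPartition, channels per InputGate) can be checked with a small Python sketch. This is an illustration, not Flink code: `execution_edges` and the two counting helpers are hypothetical, edges are modeled as `(producer_subtask, consumer_subtask)` index pairs, and the `POINTWISE` case is deliberately limited to equal parallelism because the general mapping is not described in this article.

```python
from collections import Counter


def execution_edges(producers, consumers, pattern):
    """Return (producer_subtask, consumer_subtask) pairs for one JobEdge."""
    if pattern == "ALL_TO_ALL":
        # Every producer partition is read by every consumer subtask.
        return [(p, c) for p in range(producers) for c in range(consumers)]
    if pattern == "POINTWISE":
        # Assumption of this sketch: only the equal-parallelism case is handled.
        if producers != consumers:
            raise ValueError("sketch only handles POINTWISE with equal parallelism")
        return [(i, i) for i in range(producers)]
    raise ValueError(f"unknown pattern: {pattern}")


def subpartitions_per_partition(edges):
    """One ResultSubpartition per consumer that reads a given partition."""
    return Counter(p for p, _ in edges)


def channels_per_gate(edges):
    """One InputChannel per edge arriving at a given consumer's InputGate."""
    return Counter(c for _, c in edges)


edges = execution_edges(2, 4, "ALL_TO_ALL")
print(len(edges))                             # number of edges / channels overall
print(subpartitions_per_partition(edges)[0])  # subpartitions of producer 0
print(channels_per_gate(edges)[3])            # channels in consumer 3's gate
```

With 2 producers and 4 consumers in the all-to-all case, the sketch yields 8 edges, 4 subpartitions per producer partition, and 2 channels per consumer gate, which matches the one-to-one pairing of InputChannel and ResultSubpartition described above.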


Copyright notice
This article was written by [Direction_ Wind]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202200537152838.html