Wide and narrow dependencies in Spark
2022-06-24 23:27:00 【Sunzhongming】
Let's talk about wide and narrow dependencies.
The key to telling the two apart is whether a partition of the parent RDD maps to multiple partitions of the child RDD (a one-to-many relationship).
If it does, data from multiple parent RDD partitions has to go through a shuffle before it is gathered into a single child RDD partition. That is a wide dependency, and the DAGScheduler splits the job into stages at that point.
Narrow dependency: NarrowDependency
Each partition of the parent RDD is used by at most one partition of the child RDD. The typical case is one-to-one, as in map and filter.
Wide dependency: ShuffleDependency
In essence this is a shuffle, as in reduceByKey and groupByKey: the data in one parent RDD partition is distributed to multiple child RDD partitions.
If there is a shuffle, the dependency is wide; otherwise it is narrow.
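The difference can be sketched with a toy model in plain Python. This is not Spark's API: an "RDD" here is just a list of partitions, and `map_partitions`, `reduce_by_key` and `toy_partition` are hypothetical helpers written for this illustration.

```python
# Toy model (an assumption for illustration, not Spark code):
# an "RDD" is a list of partitions, each partition a list of records.
parent = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]


def toy_partition(key, num_partitions):
    """Hypothetical deterministic partitioner: first character modulo n."""
    return ord(key[0]) % num_partitions


def map_partitions(rdd, fn):
    """Narrow: child partition i is computed from parent partition i only."""
    return [[fn(record) for record in part] for part in rdd]


def reduce_by_key(rdd, fn, num_partitions):
    """Wide: each child partition may receive records from every parent partition."""
    buckets = [dict() for _ in range(num_partitions)]
    for part in rdd:  # every parent partition is read (the "shuffle")
        for key, value in part:
            target = buckets[toy_partition(key, num_partitions)]
            target[key] = fn(target[key], value) if key in target else value
    return [sorted(bucket.items()) for bucket in buckets]


doubled = map_partitions(parent, lambda kv: (kv[0], kv[1] * 2))
summed = reduce_by_key(parent, lambda x, y: x + y, num_partitions=2)
print(doubled)
print(summed)
```

In the narrow case the partition boundaries of the parent carry over unchanged. In the wide case the two records with key "a" start in different parent partitions and end up in the same child partition, which is only possible by moving data between partitions.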
As a data structure, an RDD is essentially a read-only, partitioned collection of records. One RDD can contain multiple partitions, and each partition is a slice of the dataset.
First, narrow dependencies allow several operations to be pipelined on the same node (that is, to run within the same stage), for example a map followed immediately by a filter. By contrast, a wide dependency requires the data from all parent partitions to be available, and that data may have to be transferred across nodes in a MapReduce-like shuffle.
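The stage boundary can be illustrated with a small sketch. This is a simplified plain-Python model, not the DAGScheduler itself: the lineage is assumed to be a linear list of operations, each labeled by hand as "narrow" or "wide", and `split_into_stages` is a hypothetical helper.

```python
def split_into_stages(lineage):
    """Group a linear lineage into stages, starting a new stage at each wide dependency."""
    stages = [[]]
    for op_name, kind in lineage:
        if kind == "wide" and stages[-1]:
            stages.append([])  # a shuffle ends the pipeline of the current stage
        stages[-1].append(op_name)
    return stages


# Hypothetical job: the narrow/wide labels are supplied by hand for the example.
lineage = [
    ("textFile", "narrow"),
    ("map", "narrow"),
    ("filter", "narrow"),
    ("reduceByKey", "wide"),
    ("mapValues", "narrow"),
]

stages = split_into_stages(lineage)
print(stages)
```

The operations inside one stage can be applied record by record on the same node, while the step from one stage to the next has to wait until the shuffle data is available.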
Second, consider failure recovery. Recovery is more efficient with a narrow dependency, because only the lost parent partitions need to be recomputed, and they can be recomputed in parallel on different nodes (a task running on a slow machine can also be rescheduled on other nodes). With a wide dependency, a lost child partition may depend on data from every parent partition, so much more has to be recomputed.
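A rough way to see the recovery cost is to ask which parent partitions are needed to rebuild one lost child partition. The sketch below is a plain-Python toy under simplifying assumptions: the narrow case is taken to be strictly one-to-one (as with map or filter), the wide case is taken to read every parent partition, and `parents_needed` is a hypothetical helper.

```python
def parents_needed(kind, lost_child, num_parent_partitions):
    """Return the indices of parent partitions required to rebuild one child partition."""
    if kind == "narrow":
        # One-to-one case: only the matching parent partition is needed.
        return [lost_child]
    # Wide case: any parent partition may have contributed records.
    return list(range(num_parent_partitions))


narrow_cost = parents_needed("narrow", lost_child=2, num_parent_partitions=4)
wide_cost = parents_needed("wide", lost_child=2, num_parent_partitions=4)
print(narrow_cost)
print(wide_cost)
```

Under these assumptions, losing one partition costs one parent partition in the narrow case and all four in the wide case.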