Spark's wide and narrow dependencies
2022-06-24 23:27:00 【Sunzhongming】
Let's talk about wide dependencies and narrow dependencies.
The key to telling a wide dependency from a narrow one is whether the relationship between the partitions of the parent RDD and the partitions of the child RDD is one-to-many, that is, whether a single parent partition is used by more than one child partition.
If it is, each child partition has to gather its data from multiple parent partitions through a shuffle. This is a wide dependency, and the DAGScheduler splits the job into separate stages at that boundary.
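The rule above can be checked mechanically. The Python sketch below is a toy model, not Spark's API: a dependency is described as a hypothetical mapping from each child partition index to the parent partition indices it reads, and it counts as wide when some parent partition is read by more than one child partition. A second hypothetical helper cuts a linear chain of operations into stages at every wide dependency, a simplified picture of what the DAGScheduler does.

```python
from collections import Counter

def is_wide(child_to_parents):
    """Toy check: wide if any parent partition feeds more than one child partition."""
    usage = Counter(p for parents in child_to_parents.values() for p in parents)
    return any(count > 1 for count in usage.values())

def split_stages(ops):
    """Cut a linear lineage into stages, starting a new stage at each wide op.

    `ops` is a hypothetical list of (name, is_wide) pairs in execution order.
    Simplification: the whole wide op is placed in the new stage.
    """
    stages, current = [], []
    for name, wide in ops:
        if wide and current:
            stages.append(current)  # the shuffle boundary closes the current stage
            current = []
        current.append(name)
    if current:
        stages.append(current)
    return stages

print(is_wide({0: [0], 1: [1], 2: [2]}))        # False: one-to-one, like map
print(is_wide({0: [0, 1, 2], 1: [0, 1, 2]}))    # True: every parent feeds both children
print(split_stages([("textFile", False), ("map", False), ("filter", False),
                    ("reduceByKey", True), ("map", False)]))
# [['textFile', 'map', 'filter'], ['reduceByKey', 'map']]
```

Note that a many-to-one mapping such as `{0: [0, 1], 1: [2]}` is still narrow under this rule, because no parent partition is shared between children.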
Narrow dependency: NarrowDependency
Each partition of the parent RDD is used by at most one partition of the child RDD; the typical case is a one-to-one dependency, as with map or filter.
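To make the one-to-one relationship concrete, here is a toy Python model in which plain lists stand in for partitions; `narrow_map` and `narrow_filter` are hypothetical helpers, not Spark functions. Each output partition is computed only from the input partition with the same index, so no data moves between partitions.

```python
def narrow_map(partitions, f):
    # Output partition i is built only from input partition i (one-to-one).
    return [[f(x) for x in part] for part in partitions]

def narrow_filter(partitions, pred):
    # Filtering may shrink a partition, but the partition count and index mapping stay the same.
    return [[x for x in part if pred(x)] for part in partitions]

parent = [[1, 2, 3], [4, 5], [6]]
child = narrow_filter(narrow_map(parent, lambda x: x * 10), lambda x: x > 20)
print(child)  # [[30], [40, 50], [60]]
```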

Wide dependency: ShuffleDependency
In essence it is a shuffle. Examples include reduceByKey and groupByKey, where the data in one partition of the parent RDD is sent to multiple partitions of the child RDD.
Put simply: if there is a shuffle, the dependency is wide; otherwise it is narrow.
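A rough Python simulation of a reduceByKey-style shuffle follows. It is an illustration, not Spark's internals: `shuffle_reduce_by_key` is a hypothetical helper, keys are integers, and `key % num_out` stands in for hash partitioning. It also records which parent partitions each child partition received data from, which shows one parent partition feeding several child partitions.

```python
from collections import defaultdict
from functools import reduce
from operator import add

def shuffle_reduce_by_key(partitions, func, num_out):
    """Toy reduceByKey: any parent partition may send records to any child partition."""
    buckets = [defaultdict(list) for _ in range(num_out)]
    sources = [set() for _ in range(num_out)]  # parent partitions that fed each child
    # Shuffle write: route each (key, value) to a child partition by its key.
    for src, part in enumerate(partitions):
        for key, value in part:
            dest = key % num_out  # stand-in for hash partitioning
            buckets[dest][key].append(value)
            sources[dest].add(src)
    # Shuffle read: each child partition reduces the values it received, per key.
    result = [sorted((k, reduce(func, vs)) for k, vs in b.items()) for b in buckets]
    return result, sources

parent = [[(1, 1), (2, 1)], [(1, 1), (3, 1)], [(2, 1), (4, 1)]]
result, sources = shuffle_reduce_by_key(parent, add, num_out=2)
print(result)   # [[(2, 2), (4, 1)], [(1, 2), (3, 1)]]
print(sources)  # [{0, 2}, {0, 1}]
```

Parent partition 0 appears in the sources of both child partitions, which is exactly the one-to-many relationship that makes this dependency wide.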
As a data structure, an RDD is essentially a read-only, partitioned collection of records. One RDD can contain multiple partitions, and each partition holds one slice of the dataset.
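A minimal Python model of that definition, assuming a hypothetical `ToyRDD` class (not Spark's API): partitions are stored as tuples inside a frozen dataclass, so a transformation cannot modify an existing RDD and instead returns a new one.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Callable, Tuple

@dataclass(frozen=True)
class ToyRDD:
    # Each inner tuple is one partition; tuples plus frozen=True make the object read-only.
    partitions: Tuple[tuple, ...]

    def map(self, f: Callable) -> "ToyRDD":
        # Transformations never mutate self; they build a new RDD partition by partition.
        return ToyRDD(tuple(tuple(f(x) for x in p) for p in self.partitions))

    def num_partitions(self) -> int:
        return len(self.partitions)

rdd = ToyRDD(((1, 2), (3, 4, 5)))
doubled = rdd.map(lambda x: 2 * x)
print(rdd.partitions, doubled.partitions)  # ((1, 2), (3, 4, 5)) ((2, 4), (6, 8, 10))
try:
    rdd.partitions = ()  # overwriting a field of a frozen dataclass raises
except FrozenInstanceError:
    print("read-only")
```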
This distinction matters for two reasons. First, narrow dependencies allow several operations to be pipelined on the same node (that is, executed within the same stage); for example, a map followed by a filter. A wide dependency, by contrast, requires data from all parent partitions to be available, and may need a MapReduce-like shuffle to transfer data across nodes.
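To illustrate pipelining within one stage, the Python sketch below chains a map and a filter over a single partition using generators, so each record passes through both functions before the next record is read and no intermediate partition is materialized. It is a simplified model of the idea, not Spark's executor code; `run_pipelined` is a hypothetical name, and the `trace` list exists only to show the evaluation order.

```python
def run_pipelined(partition, f, pred, trace):
    # Generators make the map and the filter lazy, so they are fused into one pass.
    def mapped():
        for x in partition:
            trace.append(f"map {x}")
            yield f(x)

    def filtered(records):
        for y in records:
            trace.append(f"filter {y}")
            if pred(y):
                yield y

    return list(filtered(mapped()))

trace = []
out = run_pipelined([1, 2, 3], lambda x: x + 1, lambda y: y % 2 == 0, trace)
print(out)    # [2, 4]
print(trace)  # ['map 1', 'filter 2', 'map 2', 'filter 3', 'map 3', 'filter 4']
```

The interleaved trace is only possible because each output record depends on a single input record from the same partition; a shuffle breaks this chain, since a child partition cannot start until every parent partition has been written.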
Second, consider failure recovery. Recovery is more efficient with narrow dependencies, because only the lost parent partitions need to be recomputed, and they can be recomputed in parallel on different nodes (if one machine is too slow, its work can be rescheduled onto multiple nodes). With a wide dependency, a single lost partition may require recomputing data from many or all of its parent partitions.
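A toy Python model of lineage-based recovery, assuming a parent RDD whose partition `i` can be rebuilt by a hypothetical `compute_parent(i)` call that logs each invocation. Rebuilding one lost child partition under a narrow (one-to-one) dependency touches a single parent partition; under a shuffle dependency, every parent partition may hold records for the lost child, so all of them are recomputed.

```python
from collections import defaultdict

calls = []  # logs every parent partition that had to be recomputed

def compute_parent(i):
    # Stand-in for recomputing parent partition i from its own lineage.
    calls.append(i)
    return [(k, 1) for k in range(i, i + 3)]

def recover_narrow(lost, f):
    # Narrow: the lost child partition depends only on the parent partition with the same index.
    return [f(r) for r in compute_parent(lost)]

def recover_wide(lost, num_parents, num_children):
    # Wide: any parent partition may hold keys routed to the lost child, so all are recomputed.
    totals = defaultdict(int)
    for i in range(num_parents):
        for k, v in compute_parent(i):
            if k % num_children == lost:  # stand-in for hash partitioning
                totals[k] += v
    return sorted(totals.items())

recover_narrow(1, lambda r: r)
print(calls)  # [1]
calls.clear()
recover_wide(0, num_parents=3, num_children=2)
print(calls)  # [0, 1, 2]
```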