当前位置:网站首页>Spark accumulators and broadcast variables
Spark accumulators and broadcast variables
2022-06-24 07:00:00 【Angryshark_ one hundred and twenty-eight】
accumulator
Accumulators are somewhat similar Redis The counter of , But it's more powerful than a counter , Not only can it be used to count , It can also be used to accumulate and sum 、 Accumulate and merge elements, etc .
Suppose we have a word.txt Text , We want to count the words in the text “sheep” The number of rows , We can read the text directly filter Filter and count .
sc.textFile("word.txt").filter(_.contains("sheep")).count()
Suppose we want to count the words in the text separately "sheep""wolf" The number of rows , If it needs to be calculated twice according to the above method
sc.textFile("word.txt").filter(_.contains("sheep")).count()
sc.textFile("word.txt").filter(_.contains("wolf")).count()
If you want to make statistics separately 100 Lines of words , Then calculate 100 Time
If an accumulator is used , You only need to read it once
val count1=sc.acccumlator(0)
val count2=sc.acccumlator(0)
...
def processLine(line:String):Unit{
if(line.contains("sheep")){
count1+=1
}
if(line.contains("wolf")){
count2+=1
}
...
}
sc.textFile("word.txt").foreach(processLine(_))
Not only Int Types can be accumulated ,Long、Double、Collection You can also add up , You can also customize , And this variable can be in Spark Of WebUI See the interface .
Be careful : Accumulator can only be Driver End definition and reading , Can't be in Executor End read .
Broadcast variables
Broadcast variables allow caching of a read-only variable on each machine (worker) above , Not every task (task) Save a backup . Using broadcast variables, a copy of a large data input set can be allocated to each node in a more efficient way .
Broadcast variables improve data sharing efficiency in two ways :
(1) Each node in the cluster ( Physical machines ) There is only one copy , The default closure is a copy of each task ;
(2) Broadcast transmission is through BT Download mode , That is to say P2P download , When there are many clusters , It can greatly improve the data transmission rate . After the broadcast variable is modified , No feedback to other nodes .
val list=sc.parallize(0 to 10)
val brdList=sc.broadcast(list)
sc.textFile("test.txt").filter(brdList.value.contains(_.toInt)).foreach(println)
When using , Attention should be paid to :
(1) Apply to Small variable distribution , For motion, there are dozens of M The variable of , Each task is sent once, which consumes memory , It's a waste of time
(2) Broadcast variables can only be driver End definition , stay Executor End read ,Executor Do not modify
边栏推荐
- Let's talk about BOM and DOM (5): dom of all large Rovers and the pits in BOM compatibility
- 网吧管理系统与数据库
- CloudCompare&PCL 点云裁剪(基于裁剪盒)
- The data synchronization tool dataX has officially supported reading and writing tdengine
- Command ‘[‘where‘, ‘cl‘]‘ returned non-zero exit status 1.
- With a goal of 50million days' living, pwnk wants to build a "Disneyland" for the next generation of young people
- [binary tree] - middle order traversal of binary tree
- On BOM and DOM (3): DOM node operation - element style modification and DOM content addition, deletion, modification and query
- Do you want to research programming? I got six!
- 开源与创新
猜你喜欢
![Command ‘[‘where‘, ‘cl‘]‘ returned non-zero exit status 1.](/img/2c/d04f5dfbacb62de9cf673359791aa9.png)
Command ‘[‘where‘, ‘cl‘]‘ returned non-zero exit status 1.

C language student management system - can check the legitimacy of user input, two-way leading circular linked list

潞晨科技获邀加入NVIDIA初创加速计划
![[binary number learning] - Introduction to trees](/img/7d/c01bb58bc7ec9c88f857d21ed07a2d.png)
[binary number learning] - Introduction to trees

File system notes

Leetcode: Sword finger offer 26: judge whether T1 contains all topologies of T2

oracle sql综合运用 习题
![[JUC series] completionfuture of executor framework](/img/d0/c26c9b85d1c1b0da4f1a6acc6d33e3.png)
[JUC series] completionfuture of executor framework

缓存操作rockscache原理图

Challenges brought by maker education to teacher development
随机推荐
SAP实施项目上的内部顾问与外部顾问,相互为难还是相互成就?【英文版】
Interpreting top-level design of AI robot industry development
Kubernets traifik proxy WS WSS application
Easy car Interviewer: talk about MySQL memory structure, index, cluster and underlying principle!
Do you want to research programming? I got six!
Do you know about Statistics?
How long will it take for us to have praise for Shopify's market value of 100 billion yuan?
基于三维GIS系统的智慧水库管理应用
go 断点续传
Cloudcompare & PCL point cloud clipping (based on clipping box)
Open source and innovation
When the VPC main network card has multiple intranet IP addresses, the server cannot access the network internally, but the server can be accessed externally. How to solve this problem
记录--关于JSP前台传参数到后台出现乱码的问题
【二叉树】——二叉树中序遍历
数据库 存储过程 begin end
Basic knowledge of wechat applet cloud development literacy chapter (I) document structure
Command ‘[‘where‘, ‘cl‘]‘ returned non-zero exit status 1.
智能视觉组A4纸识别样例
On BOM and DOM (3): DOM node operation - element style modification and DOM content addition, deletion, modification and query
应用配置管理,基础原理分析