Spark Accumulators and Broadcast Variables
2022-06-24 06:39:00 【Angryshark_128】
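Accumulators
An accumulator is somewhat like a Redis counter but more powerful: besides counting, it can also sum values, merge elements into a collection, and so on.
Suppose we have a text file word.txt and want to count the lines that contain the word "sheep". We can read the file, filter it, and count: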
sc.textFile("word.txt").filter(_.contains("sheep")).count()
Now suppose we want separate line counts for the words "sheep" and "wolf". With the approach above, the file has to be processed twice:
sc.textFile("word.txt").filter(_.contains("sheep")).count()
sc.textFile("word.txt").filter(_.contains("wolf")).count()
Getting separate line counts for 100 words this way would take 100 passes.
With accumulators, the file only needs to be read once:
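// Created on the driver. longAccumulator is the Spark 2.0+ API;
// the older sc.accumulator(0) has been deprecated since Spark 2.0.
val count1 = sc.longAccumulator("sheep")
val count2 = sc.longAccumulator("wolf")
...
// Runs on the executors: each line updates every accumulator it matches.
def processLine(line: String): Unit = {
  if (line.contains("sheep")) {
    count1.add(1)
  }
  if (line.contains("wolf")) {
    count2.add(1)
  }
  ...
}
sc.textFile("word.txt").foreach(processLine(_))
// Read the totals on the driver once the action has finished.
println(s"sheep=${count1.value}, wolf=${count2.value}")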
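Accumulators are not limited to Int: Long, Double, and collection values can be accumulated as well, and custom accumulator types can be defined. A named accumulator also shows up in the Spark web UI.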
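As a rough illustration of the non-integer cases, the sketch below uses Spark's built-in `doubleAccumulator` and `collectionAccumulator`. The object name `BuiltInAccumulators`, the accumulator names, and the sample lines are made up for the example, and it assumes a local Spark 2.x/3.x setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.JavaConverters._

object BuiltInAccumulators {
  // Sums the line lengths and collects the lines that mention "sheep" in one pass.
  def run(sc: SparkContext, lines: Seq[String]): (Double, List[String]) = {
    val totalLength = sc.doubleAccumulator("totalLength")            // Double sum
    val sheepLines  = sc.collectionAccumulator[String]("sheepLines") // element list

    sc.parallelize(lines).foreach { line =>
      totalLength.add(line.length.toDouble)
      if (line.contains("sheep")) sheepLines.add(line)
    }

    // Both values are read on the driver after the action has completed.
    // The collection is sorted because the merge order across tasks is not fixed.
    (totalLength.sum, sheepLines.value.asScala.toList.sorted)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("builtin-accumulators").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (total, sheep) = run(sc, Seq("a sheep", "a wolf", "two sheep"))
      println(s"total length = $total, sheep lines = $sheep")
    } finally {
      sc.stop()
    }
  }
}
```

Giving each accumulator a name is what makes it visible in the web UI.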
Note: an accumulator can only be created and read on the driver. Executors can add to it but cannot read its value.
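A custom accumulator could cover the "100 words" case with a single object instead of 100 counters. The following is only a sketch built on Spark's `AccumulatorV2` base class. `KeywordLineCounter` and `KeywordLineCounterDemo` are hypothetical names introduced here, not something from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.AccumulatorV2
import scala.collection.mutable

// Hypothetical accumulator: for each keyword, counts the lines that contain it.
class KeywordLineCounter(keywords: Seq[String])
    extends AccumulatorV2[String, Map[String, Long]] {

  private val counts = mutable.Map.empty[String, Long]

  override def isZero: Boolean = counts.isEmpty

  override def copy(): KeywordLineCounter = {
    val acc = new KeywordLineCounter(keywords)
    acc.counts ++= counts
    acc
  }

  override def reset(): Unit = counts.clear()

  // Called on executors, once per input line.
  override def add(line: String): Unit =
    keywords.foreach { k =>
      if (line.contains(k)) counts(k) = counts.getOrElse(k, 0L) + 1L
    }

  // Called on the driver to fold one task's partial counts into the total.
  override def merge(other: AccumulatorV2[String, Map[String, Long]]): Unit =
    other.value.foreach { case (k, n) =>
      counts(k) = counts.getOrElse(k, 0L) + n
    }

  override def value: Map[String, Long] = counts.toMap
}

object KeywordLineCounterDemo {
  def countLines(sc: SparkContext, lines: Seq[String], keywords: Seq[String]): Map[String, Long] = {
    val acc = new KeywordLineCounter(keywords)
    sc.register(acc, "keywordLineCounter")                  // register on the driver
    sc.parallelize(lines).foreach(line => acc.add(line))    // executors only add
    acc.value                                               // driver reads the result
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("keyword-counter").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(countLines(sc, Seq("a sheep", "a wolf", "sheep and wolf"), Seq("sheep", "wolf")))
    } finally {
      sc.stop()
    }
  }
}
```

The driver registers the accumulator and reads `value` after the action, while the tasks only ever call `add`, which matches the note above.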
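Broadcast Variables
A broadcast variable lets Spark cache a read-only variable once on each machine (worker) instead of keeping a copy for every task. It is a more efficient way to hand every node a copy of a large input dataset.
Broadcast variables make data sharing more efficient in two ways:
(1) Each node (physical machine) in the cluster holds only one copy, whereas a plain closure carries one copy per task;
(2) Broadcast data is transferred in a BitTorrent-like (P2P) fashion, which can greatly speed up distribution when the cluster has many nodes. Changes made to a broadcast value on one node are not propagated to the other nodes.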
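// Broadcast a local collection; Spark refuses to broadcast an RDD directly.
val list = (0 to 10).toList
val brdList = sc.broadcast(list)
// Assumes every line of test.txt holds a single integer.
sc.textFile("test.txt").filter(line => brdList.value.contains(line.toInt)).foreach(println)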
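Points to note when using them:
(1) Broadcasting suits read-only data that fits comfortably in memory on each node. For a variable that easily runs to tens of MB, sending a copy with every task costs both memory and time, which is the overhead broadcasting avoids.
(2) A broadcast variable can only be created on the driver and read on the executors; executors cannot modify it.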
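A common use of the points above is a read-only lookup table shared by all tasks. The sketch below assumes a small in-memory map standing in for a larger dataset. `BroadcastLookup`, `animalKinds`, and `label` are illustrative names, not part of the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastLookup {
  // Hypothetical lookup table; in practice this would be the larger read-only dataset.
  val animalKinds: Map[String, String] =
    Map("sheep" -> "herbivore", "wolf" -> "carnivore")

  def label(sc: SparkContext, words: Seq[String]): Seq[(String, String)] = {
    // Created on the driver; each executor fetches it once and reuses it across tasks.
    val kinds = sc.broadcast(animalKinds)

    val labelled = sc.parallelize(words)
      .map(w => (w, kinds.value.getOrElse(w, "unknown"))) // read-only access on executors
      .collect()
      .toSeq

    kinds.destroy() // release the broadcast data once it is no longer needed
    labelled
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("broadcast-lookup").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      label(sc, Seq("sheep", "wolf", "fox")).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The closure passed to `map` captures only the lightweight `Broadcast` handle, so the table itself is not serialized again for every task.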