A Spark foreachPartition OOM Incident
2022-07-25 15:16:00 【The south wind knows what I mean】
Problem description
A Spark Streaming program failed online; the error log is as follows:
org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.spark.broadcast.TorrentBroadcast$.$anonfun$blockifyObject$1(TorrentBroadcast.scala:306)
at org.apache.spark.broadcast.TorrentBroadcast$.$anonfun$blockifyObject$1$adapted(TorrentBroadcast.scala:306)
at org.apache.spark.broadcast.TorrentBroadcast$$$Lambda$2411/66155661.apply(Unknown Source)
at org.apache.spark.util.io.ChunkedByteBufferOutputStream.toChunkedByteBuffer(ChunkedByteBufferOutputStream.scala:114)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:315)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:137)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:91)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:35)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:77)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1479)
at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1223)
at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1118)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1061)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2196)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Cause analysis:
From the log, the failure was traced to this call in the code:
dataFrame.foreachPartition
1. Introduction to foreachPartition

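foreachPartition is an action: Spark calls the function you pass once for each partition, handing it an iterator over all rows in that partition, and the function body runs on the executors rather than the driver. A minimal, hedged sketch of the call shape (the local master, toy DataFrame, and class name below are placeholders for illustration, not the production job):

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Minimal self-contained sketch; local master and toy data are for illustration only.
object ForeachPartitionIntro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("foreachPartition-intro")
      .master("local[2]")
      .getOrCreate()

    // A tiny DataFrame with two partitions, standing in for real data.
    val df = spark.range(10).toDF("id").repartition(2)

    // Bind the handler to an explicitly typed val so the
    // Iterator[Row] => Unit overload of Dataset.foreachPartition is selected.
    val handlePartition: Iterator[Row] => Unit = { rows =>
      var count = 0L
      rows.foreach(_ => count += 1) // the iterator streams this partition's rows
      println(s"processed $count rows in this partition")
    }

    // The handler runs once per partition on the executors.
    df.foreachPartition(handlePartition)
    spark.stop()
  }
}
```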
2. What are the benefits of using the foreachPartition operator?
1. The function we write is invoked only once per partition, receiving all of that partition's data in a single call.
2. We only need to create or obtain a database connection once per partition.
3. We only need to send the SQL statement to the database once, with multiple sets of parameters (batched).
In real production environments, the clear and uniform practice is to use foreachPartition for this kind of write. It carries the same caveat as mapPartitions, though: if a single partition is genuinely large, say one million records, it is basically unreliable, because pulling the whole partition in at once is very likely to trigger an OOM (out-of-memory error). A sketch of the per-partition write pattern follows.
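This sketch assumes the job's dataFrame has columns (id BIGINT, name STRING); the JDBC URL, credentials, table name, and batch size are placeholders:

```scala
import java.sql.DriverManager
import org.apache.spark.sql.Row

// Per-partition JDBC write: one connection and one prepared statement per
// partition, parameters added in bounded batches. All connection details
// below are placeholders.
val writePartition: Iterator[Row] => Unit = { rows =>
  val conn = DriverManager.getConnection(
    "jdbc:mysql://db-host:3306/demo_db", "demo_user", "demo_password")
  conn.setAutoCommit(false)
  val ps = conn.prepareStatement("INSERT INTO demo_table (id, name) VALUES (?, ?)")
  try {
    var pending = 0
    rows.foreach { row =>                        // stream the iterator; never materialize it
      ps.setLong(1, row.getAs[Long]("id"))
      ps.setString(2, row.getAs[String]("name"))
      ps.addBatch()
      pending += 1
      if (pending % 1000 == 0) ps.executeBatch() // flush so memory stays bounded
    }
    ps.executeBatch()                            // flush the final partial batch
    conn.commit()
  } finally {
    ps.close()
    conn.close()
  }
}

dataFrame.foreachPartition(writePartition)
```

The handler above streams the iterator and flushes in bounded batches; the remaining risk, as described above, is a partition that is simply too large to process in one pass, which is what the memory increase below is meant to absorb.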
Solution:
1. Increase the memory allocated to the program.
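The write-up does not say which setting was raised; with spark-submit the usual knobs are the driver and executor memory flags. The class name, resource sizes, and jar path below are illustrative, not the original job's values:

```bash
# Illustrative spark-submit with larger driver and executor memory.
spark-submit \
  --class com.example.StreamingJob \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 8g \
  --num-executors 10 \
  --executor-cores 2 \
  streaming-job.jar
```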