A Spark error I ran into: Failed to allocate a page (67108864 bytes), try again
2022-07-25 15:15:00 【The south wind knows what I mean】
Project scenario:
The business side needs two tables joined: a small table (48 million rows) and a large table (26 billion rows). This is a typical small-table/large-table join, so the first thing that came to mind was a Broadcast Join.
Problem description
1, Straight to the point: the query.
-- sc is the small table
select /*+ BROADCASTJOIN(sc) */
sc.courseid,
csc.courseid
from sale_course sc join course_shopping_cart csc
on sc.courseid=csc.courseid
2, Package it, run it on the cluster, and the errors start.
2022-06-22 19:36:56 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:36:57 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:36:59 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:00 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:00 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:01 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:01 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:01 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:03 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:03 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:04 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:05 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:05 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:05 WARN spark.HeartbeatReceiver: Removing executor 2 with no recent heartbeats: 139818 ms exceeds timeout 120000 ms
2022-06-22 19:37:05 WARN spark.HeartbeatReceiver: Removing executor 5 with no recent heartbeats: 178273 ms exceeds timeout 120000 ms
2022-06-22 19:37:05 WARN spark.HeartbeatReceiver: Removing executor 7 with no recent heartbeats: 162256 ms exceeds timeout 120000 ms
2022-06-22 19:37:05 WARN spark.HeartbeatReceiver: Removing executor 3 with no recent heartbeats: 154289 ms exceeds timeout 120000 ms
2022-06-22 19:37:05 INFO cluster.YarnClusterSchedulerBackend: Requesting to kill executor(s) 2
3, From this, memory looked insufficient, so I enabled GC logging and looked at the logs again.
2022-06-22T19:32:04.731+0800: [GC (Allocation Failure) [PSYoungGen: 994157K->47291K(1377280K)] 1061069K->240591K(4076032K), 0.2125657 secs] [Times: user=4.51 sys=0.35, real=0.21 secs]
2022-06-22T19:32:12.667+0800: [GC (Allocation Failure) [PSYoungGen: 1298524K->69107K(1380352K)] 1491823K->776885K(4079104K), 0.4118997 secs] [Times: user=12.93 sys=1.20, real=0.41 secs]
2022-06-22T19:32:30.661+0800: [GC (Allocation Failure) [PSYoungGen: 1363073K->305779K(1643520K)] 2070852K->1248436K(4342272K), 0.2067380 secs] [Times: user=6.53 sys=0.68, real=0.21 secs]
2022-06-22T19:32:49.327+0800: [GC (Allocation Failure) [PSYoungGen: 1583420K->380843K(1685504K)] 2526077K->1558689K(4384256K), 0.2134726 secs] [Times: user=6.50 sys=1.14, real=0.21 secs]
2022-06-22T19:32:57.628+0800: [GC (Allocation Failure) [PSYoungGen: 1677943K->386985K(1469440K)] 2855790K->1938110K(4168192K), 0.1938505 secs] [Times: user=6.17 sys=0.87, real=0.19 secs]
2022-06-22T19:33:10.943+0800: [GC (Allocation Failure) [PSYoungGen: 1424669K->489773K(1547776K)] 2975793K->2158027K(4246528K), 0.1824065 secs] [Times: user=6.34 sys=0.27, real=0.19 secs]
2022-06-22T19:33:18.556+0800: [GC (Allocation Failure) [PSYoungGen: 1523628K->501866K(1313280K)] 4240457K->3578994K(5061120K), 0.1838270 secs] [Times: user=5.74 sys=0.84, real=0.18 secs]
2022-06-22T19:33:19.956+0800: [GC (Allocation Failure) [PSYoungGen: 1214502K->632842K(1397248K)] 4291630K->3972122K(5145088K), 0.2161871 secs] [Times: user=7.20 sys=0.64, real=0.21 secs]
2022-06-22T19:33:20.172+0800: [Full GC (Ergonomics) [PSYoungGen: 632842K->0K(1397248K)] [ParOldGen: 3339280K->3514303K(4194304K)] 3972122K->3514303K(5591552K), [Metaspace: 136487K->136476K(1177600K)], 0.6284626 secs] [Times: user=6.74 sys=3.98, real=0.63 secs]
2022-06-22T19:33:22.153+0800: [GC (Allocation Failure) [PSYoungGen: 726892K->459232K(1398272K)] 4241195K->3973535K(5592576K), 0.0348947 secs] [Times: user=0.96 sys=0.00, real=0.04 secs]
2022-06-22T19:33:23.347+0800: [GC (Allocation Failure) [PSYoungGen: 1158624K->656153K(1398272K)] 4672927K->4367065K(5592576K), 0.1967581 secs] [Times: user=6.70 sys=0.44, real=0.19 secs]
2022-06-22T19:33:23.544+0800: [Full GC (Ergonomics) [PSYoungGen: 656153K->131072K(1398272K)] [ParOldGen: 3710911K->4169346K(4194304K)] 4367065K->4300418K(5592576K), [Metaspace: 136485K->136485K(1177600K)], 1.7445365 secs] [Times: user=46.91 sys=10.81, real=1.75 secs]
2022-06-22T19:33:26.442+0800: [Full GC (Ergonomics) [PSYoungGen: 830464K->524355K(1398272K)] [ParOldGen: 4169346K->4169283K(4194304K)] 4999810K->4693638K(5592576K), [Metaspace: 136485K->136485K(1177600K)], 0.5643075 secs] [Times: user=14.75 sys=0.14, real=0.57 secs]
2022-06-22T19:33:27.323+0800: [Full GC (Ergonomics) [PSYoungGen: 664059K->589892K(1398272K)] [ParOldGen: 4169283K->4169282K(4194304K)] 4833342K->4759175K(5592576K), [Metaspace: 136485K->136485K(1177600K)], 0.3743719 secs] [Times: user=10.16 sys=0.05, real=0.38 secs]
2022-06-22T19:33:27.909+0800: [Full GC (Ergonomics) [PSYoungGen: 699392K->655430K(1398272K)] [ParOldGen: 4169282K->4169282K(4194304K)] 4868674K->4824713K(5592576K), [Metaspace: 136485K->136485K(1177600K)], 0.4272478 secs] [Times: user=11.16 sys=0.05, real=0.43 secs]
2022-06-22T19:33:28.382+0800: [Full GC (Ergonomics) [PSYoungGen: 668779K->655430K(1398272K)] [ParOldGen: 4169282K->4169282K(4194304K)] 4838062K->4824713K(5592576K), [Metaspace: 136486K->136486K(1177600K)], 0.2751700 secs] [Times: user=6.67 sys=0.03, real=0.28 secs]
2022-06-22T19:33:28.657+0800: [Full GC (Allocation Failure) [PSYoungGen: 655430K->655430K(1398272K)] [ParOldGen: 4169282K->4162677K(4194304K)] 4824713K->4818107K(5592576K), [Metaspace: 136486K->135746K(1177600K)], 0.6008903 secs] [Times: user=17.76 sys=0.08, real=0.60 secs]
2022-06-22T19:33:29.260+0800: [Full GC (Ergonomics) [PSYoungGen: 659800K->655438K(1398272K)] [ParOldGen: 4162677K->4162674K(4194304K)] 4822477K->4818112K(5592576K), [Metaspace: 135746K->135746K(1177600K)], 1.4037111 secs] [Times: user=46.99 sys=0.27, real=1.40 secs]
2022-06-22T19:33:30.664+0800: [Full GC (Allocation Failure) [PSYoungGen: 655438K->655431K(1398272K)] [ParOldGen: 4162674K->4162674K(4194304K)] 4818112K->4818105K(5592576K), [Metaspace: 135746K->135746K(1177600K)], 0.1268273 secs] [Times: user=1.35 sys=0.02, real=0.13 secs]
2022-06-22T19:33:30.792+0800: [Full GC (Ergonomics) [PSYoungGen: 658317K->655447K(1398272K)] [ParOldGen: 4162674K->4162674K(4194304K)] 4820992K->4818121K(5592576K), [Metaspace: 135746K->135746K(1177600K)], 1.2769239 secs] [Times: user=42.48 sys=0.27, real=1.28 secs]
2022-06-22T19:33:32.069+0800: [Full GC (Allocation Failure) [PSYoungGen: 655447K->655440K(1398272K)] [ParOldGen: 4162674K->4162674K(4194304K)] 4818121K->4818114K(5592576K), [Metaspace: 135746K->135746K(1177600K)], 0.2098295 secs] [Times: user=2.81 sys=0.02, real=0.21 secs]
2022-06-22T19:33:32.282+0800: [Full GC (Ergonomics) [PSYoungGen: 657391K->655457K(1398272K)] [ParOldGen: 4162674K->4162673
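One way to read these GC lines is to look at how full the old generation still is after each Full GC. The sketch below is only an illustration, not part of the original job: it assumes the ParallelGC log format shown above, and the helper name old_gen_after_gc is made up for this example. It parses one of the lines from the log and computes the old-generation occupancy.

```python
import re

# Matches the "[ParOldGen: before->after(capacity)]" part of a ParallelGC Full GC line.
OLD_GEN = re.compile(r"\[ParOldGen: (\d+)K->(\d+)K\((\d+)K\)\]")


def old_gen_after_gc(line):
    """Return (used_kb_after_gc, capacity_kb) for a Full GC line, or None if absent."""
    match = OLD_GEN.search(line)
    if match is None:
        return None
    _before, after, capacity = map(int, match.groups())
    return after, capacity


# One Full GC line copied from the log above (19:33:27.909).
line = (
    "2022-06-22T19:33:27.909+0800: [Full GC (Ergonomics) "
    "[PSYoungGen: 699392K->655430K(1398272K)] "
    "[ParOldGen: 4169282K->4169282K(4194304K)] "
    "4868674K->4824713K(5592576K), "
    "[Metaspace: 136485K->136485K(1177600K)], 0.4272478 secs]"
)

used, cap = old_gen_after_gc(line)
occupancy = used / cap
print(f"old gen after Full GC: {used}K of {cap}K ({occupancy:.1%})")
```

For this line the old generation is still about 99.4% full after the collection, and the before and after values are identical. Back-to-back Full GCs that free nothing are consistent with the missed heartbeats in the earlier log.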
Cause analysis:
At this point the problem is clear: the job is out of memory. Raising executor memory and driver memory usually solves it.
Still, it is worth reviewing how broadcast join works.
1. How broadcast join works
Among Spark's join strategies, if one table is small enough to be cached in memory, Spark can use a Broadcast Hash Join. The small table is first collected to the driver and then broadcast to every executor, so each partition of the large table is joined against the small table locally, which avoids a shuffle.
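The mechanism can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's implementation: the rows are dictionaries, the "name" and "user" columns are invented for illustration, and the two lists in cart_partitions stand in for partitions of the large table.

```python
from collections import defaultdict


def build_hash_table(small_rows, key):
    """Build the in-memory lookup that would be broadcast to every executor."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def join_partition(partition, broadcast_table, key):
    """Join one partition of the large table against the broadcast copy, locally."""
    for big_row in partition:
        for small_row in broadcast_table.get(big_row[key], []):
            yield (small_row, big_row)


# Toy stand-ins for sale_course (small) and course_shopping_cart (large, 2 partitions).
sale_course = [{"courseid": 1, "name": "a"}, {"courseid": 2, "name": "b"}]
cart_partitions = [
    [{"courseid": 1, "user": "u1"}, {"courseid": 3, "user": "u2"}],
    [{"courseid": 2, "user": "u3"}, {"courseid": 1, "user": "u4"}],
]

broadcast_table = build_hash_table(sale_course, "courseid")
joined = [
    pair
    for part in cart_partitions
    for pair in join_partition(part, broadcast_table, "courseid")
]
matched_users = [big["user"] for _small, big in joined]
print(matched_users)
```

The large table is never moved between partitions, but the whole small table has to fit in memory wherever the lookup is built and wherever it is used.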
#1, Enable automatic broadcast through a parameter
The broadcast join threshold defaults to 10 MB and is controlled by spark.sql.autoBroadcastJoinThreshold.
new SparkConf().set("spark.sql.autoBroadcastJoinThreshold", "10485760") // enable: broadcast tables up to 10 MB (value in bytes)
new SparkConf().set("spark.sql.autoBroadcastJoinThreshold", "-1") // disable automatic broadcast
#2, Force a broadcast join
# Using a SQL hint
# sc must be the small table in the join
select /*+ BROADCASTJOIN(sc) */   -- /*+ BROADCAST(sc) */ and /*+ MAPJOIN(sc) */ are equivalent
2. My problem
As described above, a broadcast join pulls the small table's data to the driver, so driver memory cannot be too small; with too little, the job fails.
However, the problem remained after I increased driver memory.
My small table simply has too much data, and the cluster cannot spare that much memory.
Solution:
So what to do?
Give up on the broadcast join and use an ordinary join. It is slower, but with the hardware resources we have there is no other way.
In the end, joining the two tables took two hours QAQ
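To see why the ordinary join trades speed for memory, here is a rough pure-Python sketch of a shuffle-based join. It is a simplification under stated assumptions: for large tables Spark's ordinary join is normally a sort-merge join, while this sketch hashes within each bucket for brevity. The "item" column and the bucket count of 4 are invented for illustration.

```python
from collections import defaultdict


def shuffle(rows, key, num_buckets):
    """Route each row to a bucket by the hash of its join key (the shuffle step)."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(row[key]) % num_buckets].append(row)
    return buckets


def join_bucket(left_rows, right_rows, key):
    """Join one pair of co-located buckets; only this bucket's rows are held in memory."""
    lookup = defaultdict(list)
    for row in left_rows:
        lookup[row[key]].append(row)
    return [
        (left_row, right_row)
        for right_row in right_rows
        for left_row in lookup.get(right_row[key], [])
    ]


# Toy data: 6 courses on the small side, 16 cart rows spread over course ids 0..7.
sale_course = [{"courseid": i} for i in range(6)]
shopping_cart = [{"courseid": n % 8, "item": n} for n in range(16)]

NUM_BUCKETS = 4
left = shuffle(sale_course, "courseid", NUM_BUCKETS)
right = shuffle(shopping_cart, "courseid", NUM_BUCKETS)
joined = [
    pair
    for b in range(NUM_BUCKETS)
    for pair in join_bucket(left[b], right[b], "courseid")
]
largest_left_bucket = max(len(bucket) for bucket in left)
print(len(joined), largest_left_bucket)
```

Both tables are moved across the cluster, which is the slow part, but each task only needs the rows of its own bucket rather than the full 48-million-row table.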