Spark Bug Practice (bugs covered: ClassCastException, ConnectException, NoClassDefFoundError, RuntimeException, etc.)
2022-06-27 22:51:00 【wr456wr】
Environment
Scala version: 2.11.8
JDK version: 1.8
Spark version: 2.1.0
Hadoop version: 2.7.1
Ubuntu version: 18.04
Windows version: Win10
The Scala code is written on the Windows side; Ubuntu runs in a virtual machine, and Scala, the JDK, Spark, and Hadoop are all installed on the Ubuntu side.
Problem 1
Description: downloading Scala with wget fails with "Unable to establish SSL connection".

Solution:
Add the --no-check-certificate flag to skip certificate verification.
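A minimal example (the download URL here is an assumption; use whichever mirror you actually download from):

wget --no-check-certificate https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz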
Problem 2
Description: running the wordCount test program fails with the following error
(the Scala program runs on the host machine; Spark is installed in the VM):
22/06/20 22:35:38 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://192.168.78.128:7077...
22/06/20 22:35:41 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 192.168.78.128:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
...
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: /192.168.78.128:7077
Caused by: java.net.ConnectException: Connection refused: no further information

Solution:
Configure spark-env.sh under Spark's conf directory as follows.
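The original post showed the settings only as a screenshot that has not survived; a minimal sketch, assuming the master should bind to the VM address seen in the logs above (the JDK path and values are assumptions):

# conf/spark-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # adjust to the actual JDK path
export SPARK_MASTER_HOST=192.168.78.128              # bind the master to the VM's IP so the Windows host can reach it
export SPARK_MASTER_PORT=7077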

Once configured, start the master and worker from the Spark directory:
sbin/start-all.sh
Running the wordCount program again then produced the following error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/06/20 22:44:09 INFO SparkContext: Running Spark version 2.4.8
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration$DeprecationDelta
at org.apache.hadoop.mapreduce.util.ConfigUtil.addDeprecatedKeys(ConfigUtil.java:54)
at org.apache.hadoop.mapreduce.util.ConfigUtil.loadResources(ConfigUtil.java:42)
at org.apache.hadoop.mapred.JobConf.<clinit>(JobConf.java:123)
Add the Hadoop dependencies to the pom file:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>3.3.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.3.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
After refreshing the dependencies and running the program again:
22/06/20 22:50:31 INFO spark.SparkContext: Running Spark version 2.4.8
22/06/20 22:50:31 INFO spark.SparkContext: Submitted application: wordCount
22/06/20 22:50:31 INFO spark.SecurityManager: Changing view acls to: Administrator
22/06/20 22:50:31 INFO spark.SecurityManager: Changing modify acls to: Administrator
22/06/20 22:50:31 INFO spark.SecurityManager: Changing view acls groups to:
22/06/20 22:50:31 INFO spark.SecurityManager: Changing modify acls groups to:
22/06/20 22:50:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); groups with view permissions: Set(); users with modify permissions: Set(Administrator); groups with modify permissions: Set()
Exception in thread "main" java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.metric()Lio/netty/buffer/PooledByteBufAllocatorMetric;
at org.apache.spark.network.util.NettyMemoryMetrics.registerMetrics(NettyMemoryMetrics.java:80)
at org.apache.spark.network.util.NettyMemoryMetrics.<init>(NettyMemoryMetrics.java:76)

Update the spark-core dependency version from 2.1.0 to 2.3.0; the spark-core entry in the pom is now:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
After refreshing the dependencies and running again, the connection problem reappeared.
Change the pom dependency to:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mesos_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
After refreshing, the run failed again with: java.lang.RuntimeException: java.lang.NoSuchFieldException: DEFAULT_TINY_CACHE_SIZE
Add the io.netty dependency:
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.0.52.Final</version>
</dependency>
After running, the connection problem appeared yet again.
Check whether the Spark master on the VM actually started.
[screenshot: master and worker process status on the VM]
Everything had indeed started successfully, which rules out a failed startup as the cause.
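One way to verify this (the screenshot itself has not survived, so the exact check shown is a guess) is to run jps on the VM and confirm the Master and Worker JVMs are listed:

jps
# expected to include lines like:
# 2312 Master
# 2480 Worker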
Modify the spark-env.sh file under Spark's conf directory.
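The exact edit exists only in the lost screenshot; presumably it pins the bind address to the VM's IP, along the lines of (an assumption):

export SPARK_MASTER_HOST=192.168.78.128
export SPARK_LOCAL_IP=192.168.78.128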
Restart Spark:
sbin/start-all.sh
After starting the program, it successfully connected to the Spark master on the VM.
Problem 3
Description:
Running the Scala wordCount fails with: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.13.0
This is caused by inconsistent versions of the Jackson library. Solution: first exclude Jackson from the Kafka dependency, which stops Maven from automatically pulling in the higher-version library, then manually add a dependency on a lower Jackson version and re-import.
Add the dependencies:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>1.1.1</version>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.6</version>
</dependency>
After importing and rerunning the program, another problem appears:
NoClassDefFoundError: com/fasterxml/jackson/core/util/JacksonFeature
The cause is an incomplete set of Jackson dependencies; add the missing Jackson modules:
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.6.7</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.7</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.6.7</version>
</dependency>
Running again produces: Exception in thread "main" java.net.ConnectException: Call From WIN-P2FQSL3EP74/192.168.78.1 to 192.168.78.128:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

Port 9000 suggests this is Hadoop (HDFS) refusing the connection, not a problem connecting to the Spark master.
Modify the core-site.xml file under Hadoop's etc directory.
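The modified file only appeared as a screenshot that has not survived; the usual fix (an assumption here) is to point fs.defaultFS at the VM's IP instead of localhost so the Windows host can reach HDFS:

<configuration>
    <property>
        <!-- bind to the VM's address, not localhost, so external hosts can connect -->
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.78.128:9000</value>
    </property>
</configuration>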

After restarting Hadoop and running the program again, a new problem appears:
WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 0.0.0.0, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2411)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
From the stack trace, the connection itself is evidently fine and the intended wordCount job had already started, so this time the problem is probably at the code level.
Full Scala code of wordCount:
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(arg: Array[String]): Unit = {
    val ip = "192.168.78.128"
    val inputFile = "hdfs://" + ip + ":9000/hadoop/README.txt"
    val conf = new SparkConf().setMaster("spark://" + ip + ":7077").setAppName("wordCount")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    val wordCount = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
  }
}
This ClassCastException typically means the executors cannot load the application's compiled classes. The fix is to package the project into a jar and hand that jar to Spark.
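The IDE packaging screenshots have not survived; with Maven, for example, packaging is simply:

mvn clean package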
The packaged project ends up under the target directory; find the jar there and copy its path.
Add the copied path to the configuration through the setJars method. The complete Scala WordCount code:
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(arg: Array[String]): Unit = {
    // path of the jar produced by packaging
    val jar = Array[String]("D:\\IDEA_CODE_F\\com\\BigData\\Proj\\target\\Proj-1.0-SNAPSHOT.jar")
    // address of the Spark VM
    val ip = "192.168.78.129"
    val inputFile = "hdfs://" + ip + ":9000/hadoop/README.txt"
    val conf = new SparkConf()
      .setMaster("spark://" + ip + ":7077") // master node address
      .setAppName("wordCount") // Spark application name
      .setSparkHome("/root/spark") // Spark install location (probably unnecessary)
      .setIfMissing("spark.driver.host", "192.168.1.112") // host machine's IP, so executors can reach the driver
      .setJars(jar) // ship the packaged jar to the executors
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    val wordCount = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1)).reduceByKey((a, b) => a + b)
    val str1 = textFile.first()
    println("str: " + str1)
    val l = wordCount.count()
    println(l)
    println("------------------")
    val tuples = wordCount.collect()
    tuples.foreach(println)
    sc.stop()
  }
}
Approximate result of the run:
[screenshot: word-count program output]
Ugh, CSDN, when will you support importing a complete markdown file directly? Every time I finish writing locally, the images fail to import, and I have to screenshot and paste them in one by one.