Using the Spark submission parameter --files
2022-06-24 10:31:00 【The south wind knows what I mean】
Project scenario:
We have two clusters (a compute cluster and a storage cluster). The requirement: a Spark job running on the compute cluster reads data from Kafka and writes it into Hive on the storage cluster.
Problem description:
Reading and writing data across clusters generally works: we tested writing to HBase, and data written from the compute cluster does land in the storage cluster.
But as soon as we write to Hive, the data never reaches the storage cluster's Hive; every run writes into the compute cluster's Hive instead.
This was hard for me to understand. When I tested from IDEA, the data could be written to the storage cluster's Hive, but once the job was packaged and run on the cluster via Dolphin (DolphinScheduler), it went off course and wrote into the compute cluster's Hive. My resources folder contains the storage cluster's core-site.xml, hdfs-site.xml and hive-site.xml, and the code also includes a method to switch the NameNode (changeHDFSConf, shown below), yet at runtime the program still did not switch over to the storage cluster's NameNodes.
/**
 * @Author: lzx
 * @Description: switch the HDFS configuration of a SparkSession over to another HA cluster
 * @Date: 2022/5/27
 * @Param session: an already-built SparkSession
 * @Param nameSpace: the nameservice (namespace) of the target cluster
 * @Param nn1: ID of NameNode 1
 * @Param nn1Addr: IP:port of NameNode 1
 * @Param nn2: ID of NameNode 2
 * @Param nn2Addr: IP:port of NameNode 2
 * @return: void
 **/
def changeHDFSConf(session: SparkSession, nameSpace: String, nn1: String, nn1Addr: String, nn2: String, nn2Addr: String): Unit = {
  val sc: SparkContext = session.sparkContext
  sc.hadoopConfiguration.set("fs.defaultFS", s"hdfs://$nameSpace")
  sc.hadoopConfiguration.set("dfs.nameservices", nameSpace)
  sc.hadoopConfiguration.set(s"dfs.ha.namenodes.$nameSpace", s"$nn1,$nn2")
  sc.hadoopConfiguration.set(s"dfs.namenode.rpc-address.$nameSpace.$nn1", nn1Addr)
  sc.hadoopConfiguration.set(s"dfs.namenode.rpc-address.$nameSpace.$nn2", nn2Addr)
  sc.hadoopConfiguration.set(s"dfs.client.failover.proxy.provider.$nameSpace", "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
}
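For context, a minimal sketch of how this method might be called. The nameservice "storage-ns" and the nn2 address are placeholders I am assuming for the example, not values from this post (only node118 is mentioned here as a storage-cluster node):

import org.apache.spark.sql.SparkSession

// Hypothetical call site: nameservice and NameNode addresses are placeholders
// standing in for the storage cluster's real HA settings.
val spark: SparkSession = SparkSession.builder()
  .appName("kafka-to-storage-hive")
  .enableHiveSupport()
  .getOrCreate()

changeHDFSConf(
  spark,
  nameSpace = "storage-ns",
  nn1 = "nn1", nn1Addr = "node118:8020",
  nn2 = "nn2", nn2Addr = "node119:8020"
)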
Cause analysis:
1. First I opened the Environment tab of the Spark UI and looked through the Hadoop parameters, searching for nn1 to see whether my changeHDFSConf method had actually taken effect.
2. The result: the value of dfs.namenode.http-address.hr-hadoop.nn1 was still node03 (compute cluster) rather than node118 (storage cluster), which shows the method did not take effect.
Why didn't it take effect?
val conf: Configuration = new Configuration()
When a Configuration object is created, its constructor loads Hadoop's two default configuration files, hdfs-site.xml and core-site.xml, which contain the parameters needed to access HDFS.
I have this in my code, so why weren't my files loaded?
3. After some analysis I found that when the code is submitted to the cluster for execution, it loads the core-site.xml/hdfs-site.xml files from the cluster and simply ignores the configuration files packaged with the code.
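One way to confirm this diagnosis at runtime is to ask the Hadoop Configuration where a property's effective value came from. This is a diagnostic sketch I am adding here, not part of the original code; it assumes the SparkSession is in scope as spark:

import org.apache.hadoop.conf.Configuration

// Diagnostic sketch: print the effective value of fs.defaultFS and the resources it was loaded from.
// Assumes `spark` is the already-built SparkSession.
val hadoopConf: Configuration = spark.sparkContext.hadoopConfiguration
val value = hadoopConf.get("fs.defaultFS")
val sources = Option(hadoopConf.getPropertySources("fs.defaultFS")).getOrElse(Array.empty[String])
println(s"fs.defaultFS = $value (loaded from: ${sources.mkString(", ")})")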
Solution:
1. In the code, load your own configuration files instead of relying on the cluster's, so that the storage cluster's information can be found:
val hadoopConf: Configuration = new Configuration()
hadoopConf.addResource("hdfs-site.xml")
hadoopConf.addResource("core-site.xml")
If two configuration resources contain the same configuration item and the item in the earlier resource is not marked as final, the later configuration overrides the earlier one. In the example above, the configuration in core-site.xml overrides any item of the same name in core-default.xml. If an item in the first resource (core-default.xml) is marked as final, a warning is issued when the second resource tries to override it.
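To illustrate the final flag, here is a hypothetical property entry as it might appear in the earlier resource; the value shown is made up for the example:

<!-- hypothetical entry in the earlier resource: final=true means later resources
     cannot override this value; an attempted override only logs a warning -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://compute-ns</value>
  <final>true</final>
</property>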
2. The step above is not enough on its own. As mentioned earlier, once the job is packaged and run on the cluster, the core-site.xml/hdfs-site.xml files under the resources folder are discarded, so .addResource("hdfs-site.xml") cannot find my own files and falls back to the cluster's configuration files.
3. Place the two configuration files in a directory on the submitting machine, and when submitting the Spark job, point to them in the submission parameters:
--files /srv/udp/2.0.0.0/spark/userconf/hdfs-site.xml,/srv/udp/2.0.0.0/spark/userconf/core-site.xml \
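For reference, a sketch of where this flag sits in a full spark-submit command; the master, deploy mode, main class and jar name are placeholders for your own job, not values from this post:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /srv/udp/2.0.0.0/spark/userconf/hdfs-site.xml,/srv/udp/2.0.0.0/spark/userconf/core-site.xml \
  --class com.example.KafkaToStorageHive \
  kafka-to-hive.jar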
4. Further notes:
Files distributed with --files:
If the file already lives on the same file system that the submission stages to, the client reports that the source and destination file systems are the same, and no copy is triggered:
INFO Client: Source and destination file systems are the same. Not copying
If the file is on a different file system from the one the submission stages to, it is uploaded from the source path to that file system:
INFO Client: Uploading resource
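A follow-up note of my own, not from the original post: on YARN (at least in cluster mode), files shipped with --files end up in each container's working directory, which is presumably why the .addResource("hdfs-site.xml") calls from step 1 can then find them. If you would rather not rely on the classpath, one alternative sketch is to resolve the shipped files' local paths explicitly via SparkFiles and add them as Path resources:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkFiles

// Sketch: load the files shipped with --files by their resolved local paths
// instead of looking them up as classpath resources.
val hadoopConf: Configuration = new Configuration()
hadoopConf.addResource(new Path(SparkFiles.get("core-site.xml")))
hadoopConf.addResource(new Path(SparkFiles.get("hdfs-site.xml")))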