Usage of the Spark submit parameter --files
2022-06-24 10:31:00 【The south wind knows what I mean】
Project scenario:
We have two clusters, a compute cluster and a storage cluster. The requirement: a Spark job running on the compute cluster reads data from Kafka and writes it into Hive on the storage cluster.
Problem description:
For cross-cluster reads and writes, we first tested HBase: data written from the compute cluster landed in the storage cluster's HBase as expected.
But writing to Hive never reached the storage cluster's Hive; every run wrote to the compute cluster's Hive instead.
This was hard to understand. When testing from IDEA, the data could be written into the storage cluster's Hive, but once the job was packaged and run on the cluster (scheduled through DolphinScheduler) it went astray and wrote into the compute cluster's Hive. The resources folder already contained the storage cluster's core-site.xml, hdfs-site.xml, and hive-site.xml, and the code included a changeHDFSConf method, yet at runtime the program still could not switch over to the storage cluster's NameNode.
/**
 * @Author: lzx
 * @Description: switch the job's Hadoop configuration to another HA HDFS cluster
 * @Date: 2022/5/27
 * @Param session: an already-built SparkSession
 * @Param nameSpace: the nameservice of the target cluster
 * @Param nn1: ID of NameNode 1
 * @Param nn1Addr: RPC address (host:port) of nn1
 * @Param nn2: ID of NameNode 2
 * @Param nn2Addr: RPC address (host:port) of nn2
 * @return: Unit
 **/
def changeHDFSConf(session: SparkSession, nameSpace: String, nn1: String, nn1Addr: String, nn2: String, nn2Addr: String): Unit = {
  val sc: SparkContext = session.sparkContext
  // Point the default filesystem and the HA nameservice at the target cluster
  sc.hadoopConfiguration.set("fs.defaultFS", s"hdfs://$nameSpace")
  sc.hadoopConfiguration.set("dfs.nameservices", nameSpace)
  sc.hadoopConfiguration.set(s"dfs.ha.namenodes.$nameSpace", s"$nn1,$nn2")
  sc.hadoopConfiguration.set(s"dfs.namenode.rpc-address.$nameSpace.$nn1", nn1Addr)
  sc.hadoopConfiguration.set(s"dfs.namenode.rpc-address.$nameSpace.$nn2", nn2Addr)
  // Standard client-side failover proxy provider for an HA HDFS nameservice
  sc.hadoopConfiguration.set(s"dfs.client.failover.proxy.provider.$nameSpace",
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
}
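For context, here is a minimal sketch of how the method might be invoked; the nameservice name "storage-ns", the hosts node118/node119, and the port 8020 are placeholders I am assuming, not values confirmed by the original setup:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Hypothetical invocation of changeHDFSConf; all values below are placeholders
val session: SparkSession = SparkSession.builder()
  .appName("kafka-to-storage-hive")
  .enableHiveSupport()
  .getOrCreate()

changeHDFSConf(
  session,
  nameSpace = "storage-ns",             // HA nameservice of the storage cluster
  nn1 = "nn1", nn1Addr = "node118:8020",
  nn2 = "nn2", nn2Addr = "node119:8020"
)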
Cause analysis:
1. I first went to the Environment tab of the Spark UI and searched the Hadoop parameters for nn1, to check whether my changeHDFSConf method had taken effect.
2. The result: dfs.namenode.http-address.hr-hadoop.nn1 still pointed at node03 (the compute cluster), not node118 (the storage cluster), so the method had not taken effect.
Why did it not take effect???
Configuration conf = new Configuration();
When a Configuration object is created, its constructor loads Hadoop's two default configuration files, hdfs-site.xml and core-site.xml, which hold the parameter values needed to access HDFS.
I have exactly this in my code, so why were my files not loaded??
3. After some digging I found the cause: when the code is submitted to the cluster for execution, it loads the cluster's own core-site.xml/hdfs-site.xml and simply discards the configuration files bundled with the code.
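One quick way to confirm this behavior, as a rough sketch: print the effective values from the running job's Hadoop configuration; if fs.defaultFS still resolves to the compute cluster's nameservice, the files bundled in the jar were never picked up.

// Sketch: inspect the effective Hadoop configuration inside the running job
val hconf = session.sparkContext.hadoopConfiguration
println(s"fs.defaultFS = ${hconf.get("fs.defaultFS")}")
println(s"dfs.nameservices = ${hconf.get("dfs.nameservices")}")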
Solution:
1. In the code, override the cluster's configuration files with your own, so that the job can resolve the storage cluster:
val hadoopConf: Configuration = new Configuration()
hadoopConf.addResource("hdfs-site.xml")  // the String overload loads from the classpath
hadoopConf.addResource("core-site.xml")
If both configuration resources contain the same configuration item, and the item in the earlier resource is not marked final, then the value from the later resource overrides the earlier one. In the example above, the configuration in core-site.xml overrides any same-named configuration in core-default.xml. If a configuration item in the first resource (core-default.xml) is marked final, a warning is printed when the second resource is loaded.
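A self-contained sketch of that override rule; the property name demo.key and the two temporary files are made up purely for illustration:

import java.io.File
import java.nio.file.Files
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Write a throwaway XML resource defining the same made-up property
def writeXml(value: String): File = {
  val f = File.createTempFile("conf", ".xml")
  Files.write(f.toPath,
    s"<configuration><property><name>demo.key</name><value>$value</value></property></configuration>".getBytes("UTF-8"))
  f
}

val conf = new Configuration(false) // false: do not load the default resources
conf.addResource(new Path(writeXml("from-first").getAbsolutePath))
conf.addResource(new Path(writeXml("from-second").getAbsolutePath))
println(conf.get("demo.key")) // prints "from-second": the later resource wins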
2. The step above is not enough on its own: as mentioned, once the job is packaged and run on the cluster, the core-site.xml/hdfs-site.xml under the resources folder are discarded, so .addResource("hdfs-site.xml") cannot find the bundled files and falls back to the cluster's own configuration files.
3. Instead, place the two configuration files in a directory on the submitting machine and, when submitting the Spark job, specify them in the submit parameters:
--files /srv/udp/2.0.0.0/spark/userconf/hdfs-site.xml,/srv/udp/2.0.0.0/spark/userconf/core-site.xml \
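For completeness, a hedged spark-submit sketch; the master, deploy mode, main class, and jar name are placeholders, and only the --files paths come from the setup above:

# Sketch only: --class and the jar name are hypothetical
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.KafkaToHive \
  --files /srv/udp/2.0.0.0/spark/userconf/hdfs-site.xml,/srv/udp/2.0.0.0/spark/userconf/core-site.xml \
  your-spark-job.jar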
4. Further notes:
Files shipped with --files:
If the files are on the same file system as the cluster you are submitting to, the client reports that the source and destination file systems are the same, and no copy is triggered:
INFO Client: Source and destination file systems are the same. Not copying
If they are on a different file system from the submitting cluster, the source files are uploaded from the source path to the current file storage system:
INFO Client: Uploading resource
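On YARN, files distributed with --files are localized into each container's working directory, which is why the .addResource("hdfs-site.xml") call can now find them. As an extra hedged sketch, the distributed copies can also be resolved explicitly through SparkFiles:

import org.apache.spark.SparkFiles
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Sketch: load the distributed copies by absolute path instead of the
// classpath; the names must match what was passed to --files
val hadoopConf = new Configuration()
hadoopConf.addResource(new Path(SparkFiles.get("hdfs-site.xml")))
hadoopConf.addResource(new Path(SparkFiles.get("core-site.xml")))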