Flink on YARN HA mode setup problem
2022-07-16 07:33:00 【Vision-wang】
1. Problem description
Hadoop version: 2.6.5; Flink version: 1.11.6
Previously, the Standalone cluster was built and started without any problem, and building plain Flink on YARN also worked fine. But building Flink on YARN in HA mode failed.
2. Flink on YARN HA configuration
1. Add the following configuration to the yarn-site.xml file:
<property>
<name>yarn.resourcemanager.am.max-attempts</name>
<value>10</value>
</property>
This is the number of times YARN will automatically retry the ApplicationMaster after a task submitted to the ResourceManager fails.
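As a quick local sanity check (the /tmp path below is a hypothetical stand-in, not the real Hadoop config directory), you can write the fragment to a file and confirm the value reads back:

```shell
# Hypothetical stand-in file; the real property lives in $HADOOP_HOME/etc/hadoop/yarn-site.xml.
cat > /tmp/yarn-site-check.xml <<'EOF'
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>10</value>
</property>
EOF
# Print the property name line plus the value line that follows it.
grep -A1 'yarn.resourcemanager.am.max-attempts' /tmp/yarn-site-check.xml
```

After editing the real yarn-site.xml, the ResourceManager generally needs a restart for the new attempt limit to take effect.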
2. Modify the flink-conf.yaml file under FLINK_HOME/conf:
Be sure to read through to the end: this configuration turns out to have a problem!
high-availability: zookeeper
high-availability.storageDir: hdfs://node02:9000/flink/ha/    # node02 is what I'm using here
high-availability.zookeeper.quorum: node02:2181,node03:2181,node04:2181
3. Download the Hadoop support plugin and copy it to the FLINK_HOME/lib directory on each node. This version of Flink no longer ships the jar for interacting with Hadoop by default, so you have to add it yourself.
Download address:
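The copy step might look like the sketch below. The paths are hypothetical stand-ins (/tmp instead of the real download location and FLINK_HOME); on a real cluster you would scp the jar to FLINK_HOME/lib on every node.

```shell
# Hypothetical stand-in paths; the jar name matches the one that appears in the stack traces.
JAR=flink-shaded-hadoop-2-uber-2.6.5-10.0.jar
FLINK_HOME=/tmp/flink-demo
mkdir -p "$FLINK_HOME/lib"
touch "/tmp/$JAR"                  # stands in for the downloaded plugin jar
cp "/tmp/$JAR" "$FLINK_HOME/lib/"  # on a real cluster: scp "/tmp/$JAR" nodeXX:"$FLINK_HOME/lib/"
ls "$FLINK_HOME/lib"
```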
3. Walkthrough of the troubleshooting process
First, check the command-line output:
2022-07-05 20:43:01,918 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli [] - Error while running the Flink session.
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:392) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:636) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$4(FlinkYarnSessionCli.java:895) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_152]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_152]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:895) [flink-dist_2.11-1.11.6.jar:1.11.6]
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1657020068308_0004 failed 2 times due to AM Container for appattempt_1657020068308_0004_000004 exited with exitCode: 1
For more detailed output, check application tracking page:http://node05:8088/proxy/application_1657020068308_0004/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1657020068308_0004_04_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This is the error from the command line. It roughly says that the container failed to start, but it does not clearly describe why; it only says that detailed information can be found at http://node05:8088/proxy/application_1657020068308_0004/
View the container log
That page gave no useful information either, just the same as the command line. But you can see that the container was retried several times on startup, so the log of each attempt can show why it failed each time.
Check the retry log

There is an ERROR here, and I instinctively checked this log first:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1657020068308_0004/filecache/15/log4j-slf4j-impl-2.16.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
What this describes is an SLF4J binding version conflict. I located the two jars and went through a series of replace and rename operations, after which this error was gone, but the cluster still did not come up!
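The replace and rename step can be sketched as follows. This is a hypothetical local sketch: /tmp/demo-hadoop-lib stands in for $HADOOP_HOME/share/hadoop/common/lib, where one of the two bindings reported above lives. Renaming the jar so it no longer ends in .jar takes it off the classpath.

```shell
# Hypothetical stand-in for $HADOOP_HOME/share/hadoop/common/lib.
HADOOP_LIB=/tmp/demo-hadoop-lib
mkdir -p "$HADOOP_LIB"
touch "$HADOOP_LIB/slf4j-log4j12-1.7.5.jar"   # the conflicting binding named in the log above
# Rename it out of the classpath; keep the .bak so the change is easy to revert.
mv "$HADOOP_LIB/slf4j-log4j12-1.7.5.jar" "$HADOOP_LIB/slf4j-log4j12-1.7.5.jar.bak"
ls "$HADOOP_LIB"
```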
It took me a long time to realize that I had missed a log. That was careless: instinctive reactions can easily lead us astray.
View the full jobmanager.log
The full log is too long to show here, so only the key part is quoted; on the surface nothing looks wrong:
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnSessionClusterEntrypoint.
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:200) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:577) [flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint.main(YarnSessionClusterEntrypoint.java:82) [flink-dist_2.11-1.11.6.jar:1.11.6]
Caused by: java.net.ConnectException: Call From node04/192.168.76.96 to node03:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_152]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_152]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_152]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_152]
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.ipc.Client.call(Client.java:1474) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.ipc.Client.call(Client.java:1401) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at com.sun.proxy.$Proxy26.mkdirs(Unknown Source) ~[?:?]
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:539) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_152]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_152]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_152]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_152]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at com.sun.proxy.$Proxy27.mkdirs(Unknown Source) ~[?:?]
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2742) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2713) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1819) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.mkdirs(HadoopFileSystem.java:172) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.runtime.blob.FileSystemBlobStore.<init>(FileSystemBlobStore.java:64) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:98) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:76) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:115) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:335) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:293) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:223) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:177) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_152]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_152]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692) ~[flink-shaded-hadoop-2-uber-2.6.5-10.0.jar:2.6.5-10.0]
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:174) ~[flink-dist_2.11-1.11.6.jar:1.11.6]
... 2 more
Caused by: java.net.ConnectException: Connection refused
Found the problem!! It is caused by a failure to connect to HDFS, which traces back to the HDFS address (along with the ZK address, etc.) that we set in the HA configuration.
3. Solution
high-availability: zookeeper
high-availability.storageDir: hdfs://mycluster/flink/ha/
Note this line: with HDFS in HA mode, you cannot reach HDFS through a fixed IP + port.
high-availability.zookeeper.quorum: node02:2181,node03:2181,node04:2181
Because the following configuration has already been made in hdfs-site.xml, the address must be written as mycluster; the cluster then automatically resolves the address of the active NameNode in HDFS for us.
<!-- Logical nameservice for the NameNodes, resolved via key-value mapping -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<!-- Concrete addresses of the NameNodes -->
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node02:8020</value>
</property>
4. Summary
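As a local sanity check on the corrected keys (the /tmp path is a hypothetical stand-in for FLINK_HOME/conf/flink-conf.yaml), confirm all three HA keys are present and that the storage directory uses the mycluster nameservice rather than a fixed host and port:

```shell
# Hypothetical stand-in for FLINK_HOME/conf/flink-conf.yaml.
cat > /tmp/flink-conf-ha-check.yaml <<'EOF'
high-availability: zookeeper
high-availability.storageDir: hdfs://mycluster/flink/ha/
high-availability.zookeeper.quorum: node02:2181,node03:2181,node04:2181
EOF
grep -c '^high-availability' /tmp/flink-conf-ha-check.yaml   # count of HA keys
grep 'storageDir' /tmp/flink-conf-ha-check.yaml              # should show mycluster, not host:9000
```

On the real cluster, a command such as hdfs dfs -ls hdfs://mycluster/ succeeding is a good sign that the nameservice resolves correctly from Flink's side.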
There are many cluster configuration files and many logs, so troubleshooting is complicated. Don't give up: read slowly and you will always find the cause. This is especially true when learning to build clusters yourself, since many configuration guides online are wrong, or may simply conflict with the HDFS and YARN cluster configuration you built earlier. Learning to troubleshoot all the components as a whole is itself great progress!!!