High availability ResourceManager
2022-06-22 16:57:00 【ZH519080】
[Figure: YARN architecture]

As the figure shows, the ResourceManager (RM) is critical to the whole cluster. It can fail for many reasons, and the ResourceManager in the cluster at my workplace has run into problems too, so today I will analyze ResourceManager High Availability (HA).
Role of the ResourceManager: it coordinates the allocation of computing resources across the cluster and interacts with components such as NodeManager and MRApplicationMaster, for example through heartbeats.
ResourceManager high availability: before Hadoop 2.4, the ResourceManager was a single point of failure in the cluster. ResourceManager HA adds node redundancy in the form of an Active/Standby pair and relies on a ZooKeeper cluster: the state of the Active ResourceManager is written to ZooKeeper and used to bring up a Standby ResourceManager, which removes the single point of failure. As shown in the figure below.

ResourceManager HA is implemented with an Active/Standby architecture. One ResourceManager is Active, and one or more ResourceManagers stay in Standby, ready to take over if anything happens to the Active one. When the Active ResourceManager fails and a Standby has to be promoted automatically, the promotion is handled by the failover controller.
Official website :
http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
RM Failover controller
When automatic failover is not enabled, the administrator must manually transition one of the ResourceManagers from Standby to Active. To fail over from one ResourceManager to another, first transition the Active ResourceManager to Standby, and only then transition the Standby ResourceManager to Active. Both steps are performed with the "yarn rmadmin" CLI.
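The ordering rule for manual failover can be sketched as a small toy model in Python. It does not talk to Hadoop at all: the `Cluster` class, its methods, and the IDs `rm1`/`rm2` are hypothetical names chosen for illustration, loosely mirroring the `-transitionToStandby` and `-transitionToActive` options of `yarn rmadmin`.

```python
class ManualFailoverError(Exception):
    """Raised when a transition would leave two Active ResourceManagers."""


class Cluster:
    """Toy model of RM states; 'rm1'/'rm2' are illustrative IDs."""

    def __init__(self, rm_ids):
        # Every ResourceManager starts in Standby.
        self.state = {rm_id: "standby" for rm_id in rm_ids}

    def transition_to_active(self, rm_id):
        # Refuse if another RM is still Active: it must be demoted first.
        others_active = [r for r, s in self.state.items()
                         if s == "active" and r != rm_id]
        if others_active:
            raise ManualFailoverError(
                f"{others_active[0]} is still active; move it to standby first")
        self.state[rm_id] = "active"

    def transition_to_standby(self, rm_id):
        self.state[rm_id] = "standby"


cluster = Cluster(["rm1", "rm2"])
cluster.transition_to_active("rm1")

# Manual failover from rm1 to rm2: demote first, then promote.
cluster.transition_to_standby("rm1")
cluster.transition_to_active("rm2")
print(cluster.state)
```

Promoting `rm1` again while `rm2` is still Active raises `ManualFailoverError` in this model, which is the situation the demote-then-promote order is meant to avoid.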
In HDFS HA, initialize the ZKFC state in ZooKeeper from any node in the cluster:
sudo -u hdfs hdfs zkfc -formatZK
With automatic failover enabled, the start-dfs.sh script automatically starts a ZKFC daemon on every host that runs a NameNode. Once the ZKFC daemons have started, one of the Standby NameNodes is automatically elected as the Active NameNode. If you manage the services in the cluster manually, run the following command on each Standby NameNode host:
sudo -u hdfs hadoop-daemon.sh start zkfc
Or switch states manually:
sudo -u hdfs hdfs haadmin -transitionToActive <serviceId>
sudo -u hdfs hdfs haadmin -transitionToStandby <serviceId>
Here haadmin is the client tool for administering HDFS HA.
For the ResourceManager, you can instead use the embedded, ZooKeeper-based ActiveStandbyElector to decide which RM should be Active. When the Active RM stops or becomes unresponsive, another RM is automatically elected Active, and the new Active RM takes over resource allocation for the whole cluster. Unlike HDFS, RM HA does not need a separate ZKFC daemon, because the ActiveStandbyElector embedded in the RM acts as both failure detector and leader elector.
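The election idea can be illustrated with a minimal Python simulation. This is only a sketch under stated assumptions: `FakeZooKeeper` is an in-memory stand-in for an ephemeral lock node, not a real ZooKeeper client, and the `ResourceManager` class here is a hypothetical toy, not Hadoop code.

```python
class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper ephemeral lock node (not a real client)."""

    def __init__(self):
        self.lock_holder = None  # ID of the RM that currently holds the lock

    def try_acquire(self, rm_id):
        # Only one RM can hold the lock; later callers fail until it is released.
        if self.lock_holder is None:
            self.lock_holder = rm_id
        return self.lock_holder == rm_id

    def session_expired(self, rm_id):
        # An ephemeral node disappears when its owner's session ends.
        if self.lock_holder == rm_id:
            self.lock_holder = None


class ResourceManager:
    """Toy RM that is Active only while it holds the lock."""

    def __init__(self, rm_id, zk):
        self.rm_id = rm_id
        self.zk = zk
        self.state = "standby"

    def run_election(self):
        self.state = "active" if self.zk.try_acquire(self.rm_id) else "standby"
        return self.state


zk = FakeZooKeeper()
rm1, rm2 = ResourceManager("rm1", zk), ResourceManager("rm2", zk)

first_round = (rm1.run_election(), rm2.run_election())
print(first_round)

# rm1 dies: its lock is released and rm2 wins the next election round.
zk.session_expired("rm1")
rm1.state = "stopped"
second_round = rm2.run_election()
print(second_round)
```

Because the lock holder doubles as the liveness signal, no separate controller process is needed in this model, which is the property the paragraph above attributes to the embedded elector.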
When there is more than one RM, the yarn-site.xml configuration file on every node in the cluster must list the host names or IP addresses of all RMs. MRApplicationMasters and NodeManagers try the RMs in round-robin fashion until they reach the Active RM. If the Active RM stops or becomes unresponsive, they keep polling until they find the new Active RM.
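A minimal Python sketch of this round-robin polling follows. The function `find_active_rm`, the `is_active` probe, and the `max_attempts` limit are all assumptions made for illustration; a real client would attempt an RPC connection and wait between retries rather than call a probe function in a tight loop.

```python
import itertools


def find_active_rm(rm_addresses, is_active, max_attempts=10):
    """Cycle through the configured RMs until one reports Active.

    Returns the address of the Active RM, or None after max_attempts probes.
    """
    for attempt, address in enumerate(itertools.cycle(rm_addresses), start=1):
        if is_active(address):
            return address
        if attempt >= max_attempts:
            return None
    return None  # reached only when rm_addresses is empty


# Simulated cluster: standbymaster001 becomes Active only from the 4th probe on.
probes = {"count": 0}


def fake_probe(address):
    probes["count"] += 1
    return address == "standbymaster001" and probes["count"] >= 4


rms = ["standbymaster000", "standbymaster001"]
active = find_active_rm(rms, fake_probe)
print(active)
```

The client does not need to know in advance which RM is Active; it only needs the full list, which is why every RM has to appear in the configuration.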
Part of the yarn-site.xml configuration for ResourceManager high availability:
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Automatic Active/Standby switching between RMs -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Automatic RM state recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>zysdmaster000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>standbymaster000</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>standbymaster001</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>standbymaster000:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>standbymaster001:8088</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>zysdslave001:2181,zysdslave002:2181,zysdslave003:2181,zysdslave004:2181</value>
</property>
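A common mistake in this kind of configuration is a mismatch between the IDs listed in yarn.resourcemanager.ha.rm-ids and the suffixes used in the per-RM properties. The following Python sketch checks for that. `SAMPLE` is a trimmed, illustrative fragment wrapped in a `<configuration>` root element, and the helper names `load_properties` and `missing_hostnames` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed sample for illustration; a real yarn-site.xml has many more properties.
SAMPLE = """
<configuration>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>standbymaster000</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>standbymaster001</value></property>
</configuration>
"""


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def missing_hostnames(props):
    """List RM IDs from ha.rm-ids that have no matching hostname.<id> property."""
    raw = props.get("yarn.resourcemanager.ha.rm-ids", "")
    rm_ids = [i.strip() for i in raw.split(",") if i.strip()]
    return [i for i in rm_ids
            if f"yarn.resourcemanager.hostname.{i}" not in props]


props = load_properties(SAMPLE)
print(missing_hostnames(props))
```

An empty list means every ID has a hostname entry. The same check could be extended to yarn.resourcemanager.webapp.address.<id>.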