
High availability ResourceManager

2022-06-22 16:57:00 ZH519080

YARN architecture diagram

As the diagram shows, the ResourceManager (RM) is critical to the whole cluster, yet it can fail for many reasons. Since the ResourceManager in our own cluster has also run into problems, today I will analyze ResourceManager High Availability (HA).

Role of the ResourceManager: it coordinates the allocation of computing resources across the cluster and handles communication, such as heartbeats, with the NodeManagers and the MRApplicationMaster.

ResourceManager high availability: before Hadoop 2.4, the ResourceManager was a single point of failure in the cluster. RM high availability adds redundancy in the form of an "Active/Standby" pair of nodes and relies on a ZooKeeper cluster: the active ResourceManager writes its state to ZooKeeper, and a standby ResourceManager loads that state when it takes over, which removes the single point of failure. See the figure below.

ResourceManager HA is implemented through an "Active/Standby" architecture. At any time one ResourceManager is active, and one or more ResourceManagers stay in standby, ready to take over if anything happens to the active one. When the active ResourceManager fails and a standby must be promoted automatically, the promotion is carried out by the failover controller.
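For the active RM's state to be kept in ZooKeeper, the RM state store normally has to point at the ZooKeeper-backed implementation. The fragment below is a minimal sketch for yarn-site.xml, not taken from the original configuration; it assumes recovery is enabled (yarn.resourcemanager.recovery.enabled) and that the ZooKeeper quorum is set elsewhere, and the parent path shown is simply the documented default.

```xml
<!-- Sketch only: store RM state in ZooKeeper so a standby can load it on takeover -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- Znode under which the RM state is saved (default value shown) -->
<property>
  <name>yarn.resourcemanager.zk-state-store.parent-path</name>
  <value>/rmstore</value>
</property>
```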

Official documentation:

http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

RM failover controller

When automatic failover is not enabled, the administrator must manually transition one of the ResourceManagers from Standby to Active. To fail over from one ResourceManager to another, first transition the active ResourceManager to Standby, and only then transition a standby ResourceManager to Active. All of these operations are performed with the "yarn rmadmin" CLI.
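A manual-failover setup can be sketched as follows. This fragment is an illustration rather than part of the original configuration: it keeps HA enabled but turns automatic failover off, so that state changes only happen when the administrator requests them.

```xml
<!-- Sketch only: HA with manual failover -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>false</value>
</property>
```

With this in place, and assuming the RM IDs are rm1 and rm2 as in the hostname keys used in this article, the sequence described above would be `yarn rmadmin -getServiceState rm1` to check the current state, then `yarn rmadmin -transitionToStandby rm1`, then `yarn rmadmin -transitionToActive rm2`.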

For HDFS NameNode HA, initialize the ZKFC state in ZooKeeper from any one node in the cluster:

sudo -u hdfs hdfs zkfc -formatZK

To start automatic failover: the start-dfs.sh script automatically starts a ZKFC daemon on every host that runs a NameNode. Once the ZKFC daemons are up, one of the NameNodes is automatically elected as the active NameNode. If you manage the services in the cluster manually, run the following command on each host that runs a NameNode:

sudo -u hdfs hadoop-daemon.sh start zkfc

Or switch states manually, where <serviceId> is the ID of the target NameNode:

sudo -u hdfs hdfs haadmin -transitionToActive <serviceId>
sudo -u hdfs hdfs haadmin -transitionToStandby <serviceId>

Here haadmin is the client tool used to administer HDFS HA.

You can optionally use the embedded, ZooKeeper-based ActiveStandbyElector to decide which RM should be active. When the active RM goes down or stops responding, another RM is automatically elected as the active RM, and the new active RM takes over resource management for the entire cluster. Unlike HDFS, however, RM HA does not need a separate ZKFC daemon, because the ActiveStandbyElector embedded in the RM acts as both failure detector and leader elector.
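The embedded elector is controlled from yarn-site.xml. The fragment below is a sketch, not part of the original configuration; both values shown are the documented defaults once automatic failover is enabled, so they usually only need to be written out when you want to change them.

```xml
<!-- Sketch only: use the elector embedded in the RM instead of a separate ZKFC -->
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
  <value>true</value>
</property>
<!-- Base znode used for leader election (default value shown) -->
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
  <value>/yarn-leader-election</value>
</property>
```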

When there is more than one RM, every yarn-site.xml in the cluster must list the host names or IP addresses of all RMs. MRApplicationMasters and NodeManagers try the RMs in round-robin order until they reach the active RM. If the active ResourceManager stops or becomes unresponsive, they keep polling until they find the new active RM.
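The round-robin behaviour of clients can be tuned on the client side. The following is a hedged sketch: the property names come from the YARN HA documentation, the proxy provider class is the default one, and the numeric values are illustrative assumptions rather than recommendations.

```xml
<!-- Sketch only: client, ApplicationMaster and NodeManager failover settings -->
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<!-- Illustrative value: maximum number of failover attempts -->
<property>
  <name>yarn.client.failover-max-attempts</name>
  <value>15</value>
</property>
<!-- Illustrative value: base sleep in milliseconds between failover attempts -->
<property>
  <name>yarn.client.failover-sleep-base-ms</name>
  <value>500</value>
</property>
```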

Partial yarn-site.xml configuration for ResourceManager high availability:

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<!-- Automatic Active/Standby failover between RMs -->
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<!-- RM state recovery after a restart or failover -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>zysdmaster000</value>
</property>
<!-- Logical RM IDs; they must match the suffixes of the hostname.* and webapp.address.* keys below -->
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>standbymaster000</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>standbymaster001</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>standbymaster000:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>standbymaster001:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zysdslave001:2181,zysdslave002:2181,zysdslave003:2181,zysdslave004:2181</value>
</property>

 
