
How to Build a "Preemptive" Remote Control System (Memory Chapter)

2022-06-24 01:11:00 nonhalt_ 001

The COVID-19 pandemic triggered a global health crisis, forcing people around the world to work, study, socialize, shop, and seek entertainment from home, and even to meet with healthcare providers remotely. As Microsoft CEO Satya Nadella famously said about 60 days into the crisis, "We've seen two years' worth of digital transformation in two months."

Today, social media, videoconferencing, cloud collaboration platforms, e-commerce, telemedicine, online education, and online entertainment all rely on highly available data centers and reliable server hardware. Governments around the world now rightly classify data centers as critical infrastructure. Data centers and the hardware they house need to stay online more than ever so that the digital economy can keep running.

Business continuity

Since the outbreak, much of work and daily life has moved from offline to online compared with previous years. According to the Uptime Institute's 2020 annual data center survey, "outages occur with disturbing frequency, and larger outages are becoming more disruptive and costly."

Shanghai Hongji is committed to providing business continuity solutions for industry customers and to making operations and maintenance simpler and more intelligent through innovation.

The new-generation edgeCentral MX agile remote control system integrates Intel's Memory Failure Prediction (MFP) solution into the management system. Its centralized, unified visual interface enables remote control and predictive maintenance of servers across widely connected data centers and edge computing scenarios.

Memory failure is one of the three major types of hardware failure in the data center, and it directly affects server reliability. It can also have a devastating effect. How can data center operators be warned of future outages early enough to take preemptive action? This is an urgent problem today.

By using machine learning to analyze real-time memory health data, such failures can be predicted in advance. Machine learning is a data analysis method that builds analytical models automatically. Its algorithms learn iteratively from data, so the computer can find hidden insights without being explicitly programmed where to look for them.

The ability to analyze real-time memory health data and avoid memory failures ultimately leads to a better experience for customers. This is especially true for organizations such as online service platforms and cloud service providers, which rely heavily on the reliability, availability, and serviceability of their server hardware. These are exactly the kinds of businesses that are seeing demand soar today.

By deploying a memory failure prediction solution in their data centers and integrating it into existing management systems, IT staff can analyze server memory failures, reduce downtime, and improve their current dual in-line memory module (DIMM) replacement strategy.

This memory failure prediction solution uses machine learning to analyze server memory errors down to the DIMM, bank, column, row, and cell level, and generates a memory health score for each DIMM. Over time, changes in the health score can signal problems before they have an impact, providing enough lead time to migrate workloads and/or take other action.
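As a rough illustration of how a management system might act on changing health scores, here is a minimal Python sketch. Everything in it is an assumption made for the example: the article does not specify the score scale, the thresholds, or any API, so the 0 to 100 scale (higher is healthier), the class `DimmHealthHistory`, and the `threshold` and `max_drop` parameters are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DimmHealthHistory:
    """Hypothetical per-DIMM record of health scores (assumed 0-100, higher is healthier)."""
    dimm_id: str
    scores: List[float] = field(default_factory=list)

    def record(self, score: float) -> None:
        """Append the latest health score reported for this DIMM."""
        self.scores.append(score)

    def needs_attention(self, threshold: float = 60.0, max_drop: float = 15.0) -> bool:
        """Flag the DIMM if the latest score is low or has fallen sharply.

        Both limits are illustrative values, not figures from the article.
        """
        if not self.scores:
            return False
        latest = self.scores[-1]
        if latest < threshold:
            return True
        # Compare the latest score with the best score in the last five samples.
        window = self.scores[-5:]
        return max(window) - latest > max_drop


history = DimmHealthHistory("DIMM_A1")
for score in (98, 97, 96):
    history.record(score)
stable = history.needs_attention()      # small drift only

history.record(78)                      # sudden drop of 20 points within the window
degrading = history.needs_attention()
```

In a real deployment, a flagged DIMM would be the cue to migrate workloads or schedule a replacement; the sketch only shows the trend check itself.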

To better understand how the memory health score is generated, note that the memory failure prediction engine resides in the BIOS firmware and is notified whenever a memory error occurs. When the server encounters a burst of errors in a specific memory region, the DIMM Health Assessment Model (DHAM) is consulted to assess whether the health score of the affected DIMM should change. If so, the score is updated accordingly and passed to the baseboard management controller (BMC), and finally delivered via IPMI over LAN to the edgeCentral MX agile remote control system.
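The flow described above (memory error, model check, score update, hand-off to the BMC) can be sketched in Python as follows. This is not Intel's DHAM or any real firmware or IPMI interface: `ToyHealthModel` replaces the actual model with a simple counting rule, `FakeBmc` merely collects updates in a list, and all names, the burst size of 3, and the 10-point penalty are assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MemoryErrorEvent:
    """Hypothetical description of one memory error location."""
    dimm: str
    bank: int
    row: int
    column: int


class ToyHealthModel:
    """Stand-in for the DHAM: a toy counting rule, NOT Intel's actual model."""

    def __init__(self, start_score: int = 100, burst_size: int = 3, penalty: int = 10):
        self.scores: Dict[str, int] = {}
        self.row_errors: Counter = Counter()
        self.start_score = start_score
        self.burst_size = burst_size
        self.penalty = penalty

    def on_error(self, event: MemoryErrorEvent) -> Optional[int]:
        """Return the new score if this error changes it, otherwise None."""
        key = (event.dimm, event.bank, event.row)
        self.row_errors[key] += 1
        # Assumed rule: every `burst_size` errors in the same row count as a burst.
        if self.row_errors[key] % self.burst_size != 0:
            return None
        current = self.scores.get(event.dimm, self.start_score)
        new_score = max(0, current - self.penalty)
        self.scores[event.dimm] = new_score
        return new_score


class FakeBmc:
    """Placeholder for the BMC hand-off; it only records what it is given."""

    def __init__(self) -> None:
        self.outbox: List[Tuple[str, int]] = []

    def publish(self, dimm: str, score: int) -> None:
        self.outbox.append((dimm, score))


def handle_error(event: MemoryErrorEvent, model: ToyHealthModel, bmc: FakeBmc) -> None:
    """Consult the model and forward the score only when it has changed."""
    new_score = model.on_error(event)
    if new_score is not None:
        bmc.publish(event.dimm, new_score)


model, bmc = ToyHealthModel(), FakeBmc()
for column in (4, 5, 6):  # three errors in the same row form one "burst"
    handle_error(MemoryErrorEvent("DIMM_A1", bank=0, row=17, column=column), model, bmc)
```

In the system the article describes, the management console would then receive the updated score from the BMC via IPMI over LAN; that transport step is deliberately left out of the sketch.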

Some user test deployments show that deploying the edgeCentral MX agile remote control system together with Intel's Memory Failure Prediction (MFP) solution across an entire server fleet can reduce server crashes caused by hardware failures by more than 50%.

Original site

Copyright notice
This article was written by [nonhalt_ 001]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2021/11/20211120164931647g.html