
Nearly 90% of servers can be saved while anti-fraud efficiency greatly increases: why is PayPal's plan to break the "AI memory wall" so cost-effective?

2022-06-21 23:56:00 Zhiyuan community

People often say that the new generation of artificial intelligence is driven by data, algorithms, and computing power. In recent years, the explosive growth of model parameters has demonstrated the fundamental role of compute.


To meet enterprise users' strong demand for computing power, much of today's AI hardware (such as GPUs) strives to improve peak compute, but this improvement usually comes at the cost of simplifying or removing other parts, such as the hierarchical memory architecture [1]. As a result, the growth of AI hardware memory lags far behind the growth of compute.

Comparison of growth trends between SOTA Transformer model parameters (red dots) and AI hardware memory size (green dots).

Image source: https://github.com/amirgholami/ai_and_memory_wall/blob/main/imgs/pdfs/model_size_scaling.pdf

Therefore, when training or serving large models, users constantly find that there is not enough GPU memory or RAM. This is the so-called "memory wall" problem.


To break the memory wall, people have devised many approaches. For example, the recently popular Colossal-AI project targets the training stage: by making efficient use of a "GPU + CPU heterogeneous memory" strategy, its developers enabled a single consumer graphics card to train a large model with 18 billion parameters.


In the inference stage, the model's main requirement of the hardware is to hold all of its parameters, so the demand for compute is relatively low. For compute-intensive models, we can use strategies such as INT8 quantization or model parallelism, using multiple GPUs and their memory to serve a single model. In fact, many machine learning and deep learning models in industrial scenarios can also run inference on CPUs and main memory, for example recommendation systems and click-through-rate estimation.
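To make the INT8 idea concrete, here is a minimal sketch of symmetric INT8 weight quantization in plain Python. The helper names are hypothetical; real frameworks such as PyTorch provide this functionality out of the box.

```python
# Minimal sketch of symmetric INT8 quantization: float weights are mapped
# into the int8 range [-127, 127] with a single per-tensor scale factor,
# shrinking memory use roughly 4x versus float32.

def quantize_int8(weights):
    """Map float weights to int8 values plus one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q_values, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q_values]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
# Every value fits in int8, and each recovered weight is within one
# quantization step of the original.
assert all(-127 <= v <= 127 for v in q)
assert all(abs(a - b) <= scale for a, b in zip(weights, approx))
```

In practice the trade-off is the quantization error bounded by the scale step, which is usually acceptable for inference workloads.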


For these models, besides memory capacity, we may also need to consider data recovery time under abnormal conditions, hardware cost, maintenance cost, and other issues. This places new requirements on the choice of solutions for breaking the memory wall.


The obstacle to industrial inference: the memory wall
In industrial scenarios, massive data and high-dimensional models do bring better results, but the high dimensionality and sparsity of the features pose great challenges to computing and storage. After all, in models like recommendation systems, the hidden-layer size can be on the order of millions and the total parameter count can even reach tens of trillions, a hundred times the size of GPT-3. Their users therefore often need a particularly powerful memory subsystem to achieve good online inference capability.


Since there isn't enough memory, why not simply stack more memory modules (such as DRAM)? This is feasible in principle, but on the one hand DRAM is not cheap, and such models don't stop at a few hundred GB of memory; they go straight to tens of TB, while a single DRAM module is usually only a few tens of GB, rarely exceeding 128 GB. So, whether measured by cost or by capacity scalability, this plan is hard for most to accept.
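A quick back-of-the-envelope calculation illustrates the scaling problem. The figures below (a 10 TB model footprint, 128 GB DIMMs, 24 DIMM slots per two-socket server) are illustrative assumptions, not vendor specifications.

```python
# Rough capacity arithmetic: how many 128 GB DRAM modules a 10 TB
# model footprint would need, and how many servers that implies.
model_tb = 10
dimm_gb = 128
slots_per_server = 24  # assumed two-socket server slot count

dimms = model_tb * 1024 // dimm_gb           # 80 modules
servers = -(-dimms // slots_per_server)      # ceiling division: 4 servers
# Capacity alone forces the model to span several machines, before
# considering redundancy, fail-over, or DRAM pricing.
```

This is why the article argues the pure-DRAM route fails on both cost and expansion capability.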


Besides, DRAM has another problem: it is volatile, which is to say that data is lost as soon as power is cut. When the model is restarted or a fault is being recovered, the weights can only be reloaded into memory from a slower storage device, such as an SSD or a mechanical hard disk. This is very slow, and hard to tolerate for an online inference business.


Breaking the inference memory wall: if not DRAM, then what?
So, aside from buying more DRAM, which is not cost-effective, what other options do enterprises that provide or rely on online inference services have for breaking the memory wall?


If you carefully compare the capacity and latency figures of different storage tiers, you will find a large gap between DRAM and solid-state or hard-disk storage. If a new storage component or device could be developed to fill this gap, the memory-wall problem might be alleviated.
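The size of that gap can be sketched with order-of-magnitude access latencies. The numbers below are commonly cited ballpark figures, not measurements, and actual values vary widely by device and workload.

```python
# Ballpark access latencies per storage tier, in nanoseconds
# (illustrative order-of-magnitude figures, not benchmarks).
latency_ns = {
    "DRAM": 100,
    "persistent memory": 350,
    "NVMe SSD": 100_000,
    "HDD": 10_000_000,
}

# The DRAM-to-SSD jump is about three orders of magnitude; a device
# that lands between them narrows the cliff considerably.
gap = latency_ns["NVMe SSD"] // latency_ns["DRAM"]  # 1000x
```

It is precisely this three-orders-of-magnitude cliff that a new intermediate tier is meant to smooth over.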

This is the background against which Intel Optane Persistent Memory (PMem) was born. Its unique Optane storage medium, combined with an advanced memory controller and other hardware and software technologies, gives it performance close to DRAM while multiplying capacity several times over (a single module can reach 512 GB). On a two-socket platform with 3rd Gen Intel Xeon Scalable processors, the total memory capacity can theoretically reach 12 TB (4 TB DRAM + 8 TB persistent memory). By comparison, a pure-DRAM scheme is far inferior in capacity expansion, and its cost is also unbearable.


In addition, Optane Persistent Memory has two important features, byte addressability and data persistence, which combine the respective advantages of memory and storage. Traditional storage requires block-based read/write addressing: it is like going to the library to borrow a book but having to carry home the entire shelf containing the target book to sort through. Byte addressing, as in memory, is equivalent to precisely locating the target book and lending out only that one.
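Byte addressability is what lets applications treat persistent memory like ordinary memory. In App Direct mode, PMem is typically exposed as a file on a DAX-mounted filesystem that the application maps into its address space. The sketch below uses an ordinary temporary file as a stand-in for such a PMem-backed file; the path is hypothetical.

```python
import mmap
import os
import tempfile

# Sketch: byte-granular load/store access via mmap. On real hardware the
# file would live on a DAX filesystem backed by persistent memory; here
# an ordinary temp file stands in for it.
path = os.path.join(tempfile.mkdtemp(), "pmem_stand_in.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)  # pre-size the mapped region

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 4096)
    mm[128:133] = b"hello"   # store 5 bytes at an arbitrary byte offset
    mm.flush()               # on real PMem: flush cache lines to media
    assert mm[128:133] == b"hello"
    mm.close()
```

The key point is the update of five bytes in place, without reading or rewriting a whole storage block, which is exactly the "lend out only the target book" behavior described above.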

The position and role of Optane Persistent Memory in the storage hierarchy.

Using the storage and retrieval of books as an analogy for the characteristics of different storage tiers.


Data persistence makes up for DRAM's innate shortcoming: data is retained even after power loss. For large-scale in-memory databases, this greatly accelerates the recovery of data and services after planned or unplanned downtime, since it eliminates the time spent reading hundreds of GB, or even TB, of data back into memory from solid-state or mechanical disks.
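How much recovery time does that save? A back-of-the-envelope estimate makes it tangible. The throughput figure below is an illustrative assumption for a typical NVMe SSD sequential read, not a benchmark.

```python
# Rough recovery-time estimate: reloading a 1 TB in-memory dataset
# from SSD versus resuming in place on persistent memory.
data_gb = 1024           # 1 TB working set
ssd_read_gbps = 2        # assumed sequential read throughput, GB/s

reload_from_ssd_s = data_gb / ssd_read_gbps  # 512 s, over 8 minutes
# With persistent memory the data survives the restart in place, so the
# reload step disappears entirely; recovery time is dominated by the
# service restart itself.
```

Even with generous SSD throughput assumptions, the reload window is minutes of downtime that persistence simply removes.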


Copyright notice
This article was created by [Zhiyuan community]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/172/202206212204169545.html