NetFC summary

2022-06-23 19:07:00 【Bachuan Xiaoxiaosheng】

NetFC: Enabling Accurate Floating-point Arithmetic on Programmable Switches

Why NetFC matters

In modern data centers, many data-intensive applications (such as big data analytics, distributed deep learning, image processing, and real-time streaming) exchange data frequently, so their performance can degrade under heavy network communication overhead. Reducing network communication has therefore become a key factor in accelerating data-intensive applications. At the same time, the network itself can now provide computing power, so some computing tasks traditionally performed on the host can be moved to network devices. In this model, network traffic is intercepted and processed by network devices in real time before it reaches the host. In-network computing is attractive for two reasons: 1. Packets can be consumed and processed while in transit, which greatly reduces network overhead (such as network queuing delay and I/O cost). 2. Moving computation into the network reduces the CPU load on servers (for example, in-network gradient aggregation and network telemetry systems).

Challenge

However, the computing power of the network is very limited. Even the most advanced programmable switches support only simple integer arithmetic (such as addition and subtraction).

Traditional floating-point methods cannot be deployed directly on a programmable switch, for three reasons:

Limited computing power: Programmable switches support only a few simple integer operations. Floating-point numbers, and arithmetic operations such as multiplication and division, are beyond what the switch can do.
Scarce on-chip memory: The on-chip memory of a switch is very small, so it cannot set aside a large amount of memory for floating-point operations. Part of that memory must also be reserved for storing and looking up forwarding rules, which makes the problem worse.
Limited pipeline stages: The switch data plane usually consists of multiple stages, each of which is a packet-processing unit with some compute and storage resources. However, the number of stages is small, and two packet-processing operations that depend on each other cannot be assigned to the same stage.

This is an obstacle to accelerating applications in the network, because many applications need to handle floating-point data and more complex arithmetic (such as multiplication and division). Previous work mainly used two indirect ways of supporting floating-point operations. One converts floating-point numbers to integers through a complex negotiation mechanism on the server side, but it does not support floating-point multiplication or division. The other offloads the computation to the switch's local CPU, which introduces significant delay. What has been missing is a scheme that performs floating-point arithmetic on a programmable switch in real time with almost no loss of accuracy.

A lookup-table method can be used to support floating-point operations on programmable switches. The simplest approach is to use one table that lists every possible calculation: for a given arithmetic operation, the two operands together form the lookup key, and the stored value is the result.

However, such a table is far too large to install on a programmable switch, because it has to enumerate every combination of the two operands (for two 16-bit floating-point operands, about 8 GB of memory is needed).
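The 8 GB figure can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes that each table entry stores only a 16-bit result and that the key is implied by the entry's position, so no key storage is counted. The function name `naive_table_bytes` is made up for this illustration.

```python
def naive_table_bytes(operand_bits: int, result_bits: int) -> int:
    """Bytes needed for a table with one result per (a, b) operand pair.

    Assumption: keys are implied by position, only results are stored.
    """
    entries = 2 ** (2 * operand_bits)   # every combination of two operands
    return entries * result_bits // 8   # result_bits per entry, in bytes


size = naive_table_bytes(operand_bits=16, result_bits=16)
print(f"{size / 2**30:.1f} GiB")  # 2**32 entries * 2 bytes = 8.0 GiB
```

Under the same assumptions, two 32-bit operands would need 2^64 entries, which shows how quickly the naive table grows with operand width.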

Solution

To keep the table from growing too large, NetFC adopts a divide-and-conquer approach.

Specifically, it uses logarithmic projection and transformation to convert the original large table into several much smaller tables that work with the switch's built-in integer operations (that is, addition and subtraction).

NetFC further adopts a scaling-factor mechanism to improve accuracy. Because NetFC uses $\lfloor \log_2(x) \rfloor$ to approximate $\log_2(x)$, the fractional part of $\log_2(x)$ is discarded, which inevitably loses precision. To address this, NetFC multiplies $\log_2(x)$ by a scaling factor $k$, which amplifies the fractional part so that it is not discarded. NetFC then divides the scaling factor back out in a later step to keep the floating-point result correct.
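The effect of the scaling factor can be illustrated with multiplication, using the identity $x \cdot y = 2^{\log_2 x + \log_2 y}$. The sketch below is not NetFC's actual table layout, which this summary does not describe. It assumes positive operands, computes the logarithm and the power of two directly where a switch would use lookup tables, and uses the hypothetical helper names `log_key` and `approx_mul`.

```python
import math


def log_key(x: float, k: int) -> int:
    """Integer stand-in for log2(x): floor(k * log2(x)). Assumes x > 0."""
    return math.floor(k * math.log2(x))


def approx_mul(a: float, b: float, k: int) -> float:
    """Approximate a * b with one integer addition in the log domain."""
    s = log_key(a, k) + log_key(b, k)  # integer add, the only arithmetic needed
    return 2.0 ** (s / k)              # divide k back out; a table lookup on a switch


a, b = 3.0, 5.0
for k in (1, 16, 256):
    est = approx_mul(a, b, k)
    rel_err = abs(est - a * b) / (a * b)
    print(f"k={k:3d}  estimate={est:.4f}  relative error={rel_err:.4f}")
```

With `k = 1` this reduces to the plain $\lfloor \log_2(x) \rfloor$ approximation, and the estimate for 3 × 5 is 8. Larger values of `k` keep more of the fractional part and bring the estimate closer to 15, at the cost of a wider range of integer keys.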

NetFC also uses a prefix-based lossless compression method to reduce on-chip memory usage. Specifically, a NetFC table may contain many consecutive entries with the same value, so the keys of those entries can be merged.
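One generic way to merge consecutive keys is to replace an aligned run of equal values with a single key prefix, much like a longest-prefix-match rule. The sketch below illustrates that idea only. It is not NetFC's compression algorithm, whose details are not given here, and the names `compress` and `lookup` are made up for the example.

```python
def compress(values, key_bits):
    """Merge aligned runs of equal values into (prefix, prefix_len, value).

    values[i] is the table value for key i; len(values) == 2 ** key_bits.
    """
    entries = []

    def visit(base, size, prefix_len):
        block = values[base:base + size]
        if all(v == block[0] for v in block):
            # The whole aligned block shares one value: emit a single prefix.
            entries.append((base >> (key_bits - prefix_len), prefix_len, block[0]))
            return
        half = size // 2
        visit(base, half, prefix_len + 1)         # keys whose next bit is 0
        visit(base + half, half, prefix_len + 1)  # keys whose next bit is 1

    visit(0, 2 ** key_bits, 0)
    return entries


def lookup(entries, key, key_bits):
    """Return the value whose prefix matches the top bits of key."""
    for prefix, prefix_len, value in entries:
        if key >> (key_bits - prefix_len) == prefix:
            return value
    raise KeyError(key)


table = [0, 0, 0, 0, 1, 1, 2, 3]          # 3-bit keys, 8 entries
entries = compress(table, key_bits=3)
print(entries)                            # 4 prefix entries instead of 8
assert all(lookup(entries, key, 3) == table[key] for key in range(8))
```

The compression is lossless because every key still resolves to its original value. In this sketch only runs aligned to a power-of-two boundary collapse into one entry.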

Open questions

Multiple floating-point operations

NetFC can support multiple floating-point operations by deploying lookup tables for different operation types one after another. For example, deploying the addition tables followed by the multiplication tables implements an addition followed by a multiplication. This consumes more pipeline stages, so the number of floating-point operations NetFC can apply to each packet depends on how many stages are available in the data plane. In addition, the recirculation feature of Barefoot Tofino switches can be used to change the order in which the different floating-point operations are applied.

32-bit floating-point operations

Because of the on-chip memory limit, the current NetFC implementation does not support 32-bit floating-point numbers. In theory, an approximation based on Taylor series could reduce memory consumption enough to support 32-bit floating-point operations. This is left for future work.

Conclusion

In-network computing is an emerging way to reduce network overhead by moving some tasks onto programmable switches. However, it is constrained by the limited computing power of those switches (for example, the lack of floating-point operations). To address this, the authors designed NetFC, a table-lookup approach that performs floating-point arithmetic on the fly in the network with little loss of precision. NetFC uses prefix-based lossless compression to reduce memory consumption. Experimental results show that NetFC's average accuracy exceeds 99.94% while its memory consumption is only 448 KB. The authors also integrated NetFC into Sonata to detect Slowloris attacks, which significantly reduced detection delay. NetFC is expected to become a cornerstone of in-network computing.
