NetFC summary
2022-06-23 19:07:00 【Bachuan Xiaoxiaosheng】
NetFC: Enabling Accurate Floating-point Arithmetic on Programmable Switches
Why NetFC matters
In modern data centers, many data-intensive applications (such as big data analytics, distributed deep learning, image processing, and real-time streaming) exchange data frequently, so their performance can degrade under heavy network communication overhead. Reducing network communication has therefore become a key factor in accelerating these applications. At the same time, the network itself can now provide computing power, so some tasks traditionally performed on the host can be moved to network devices, which intercept and process traffic in real time before it reaches the host. In-network computing is attractive for two reasons: (1) packets can be consumed and processed while in transit, which greatly reduces network overhead (such as queuing delay and I/O cost); (2) offloading computation to the network reduces the CPU load on servers (for example, in-network gradient aggregation and network telemetry systems).
Challenges
However, the computing power of the network is very limited: even state-of-the-art programmable switches support only simple integer arithmetic (such as addition and subtraction).
Traditional floating-point methods cannot be deployed directly on a programmable switch, because:
- Limited computing power: programmable switches support only a few simple integer operations. Floating-point numbers, and arithmetic such as multiplication and division, are beyond what the switch can compute.
- Scarce on-chip memory: the switch's on-chip memory is very small, so it cannot dedicate a large amount of memory to floating-point operations. Part of that memory must also be reserved for storing and looking up forwarding rules, which makes the problem worse.
- Limited pipeline stages: the switch data plane consists of multiple stages, each a packet-processing unit with its own compute and storage resources. The number of stages is small, and two dependent packet-processing operations cannot be placed in the same stage.
These limits are an obstacle to in-network application acceleration, because many applications need to handle floating-point data and complex arithmetic (such as multiplication and division). Previous work has supported floating-point operations only indirectly, in one of two ways. The first converts floating-point numbers to integers using a complex negotiation mechanism on the server side, and does not support floating-point multiplication or division. The second offloads the computation to the switch's local CPU, which introduces significant delay. What has been missing is a scheme that performs real-time floating-point arithmetic on the programmable switch itself with almost no loss of accuracy.
A lookup table is one way to support floating-point operations on a programmable switch. The simplest approach is to list every possible calculation in a table: for a given arithmetic operation, the two operands form the lookup key and the corresponding value is the result.
However, such a table is far too large to install on a programmable switch, because it must enumerate every combination of operands (about 8 GB of memory for two 16-bit floating-point operands).
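The 8 GB figure can be checked with a little arithmetic. The sketch below assumes each table entry stores a 16-bit (2-byte) result; the summary does not state the entry width, so `RESULT_BYTES` is an assumption chosen for illustration, and the function name is hypothetical.

```python
# Size of a naive lookup table that enumerates every operand pair.
OPERAND_BITS = 16   # width of each floating-point operand
RESULT_BYTES = 2    # assumption: each stored result is also 16 bits wide


def naive_table_bytes(operand_bits: int, result_bytes: int) -> int:
    """Bytes needed to store one result for every (a, b) operand pair."""
    entries = (2 ** operand_bits) ** 2  # all combinations of two operands
    return entries * result_bytes


size = naive_table_bytes(OPERAND_BITS, RESULT_BYTES)
print(f"{size / 2**30:.0f} GiB")  # prints "8 GiB"
```

Under that assumption, the table holds 2^32 entries of 2 bytes each, far more than the on-chip memory of a switch can hold.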
Solution
To solve the problem of the oversized table, NetFC adopts a divide-and-conquer approach.

Specifically, it uses logarithmic projection and transformation to convert the original large table into several much smaller tables, which are then combined using the switch's built-in integer operations (addition and subtraction).
NetFC further adopts a scaling factor mechanism to improve accuracy. Because NetFC uses $\lfloor \log_2(x) \rfloor$ to approximate $\log_2(x)$, the fractional part of $\log_2(x)$ is discarded, which inevitably loses precision. To address this, NetFC multiplies $\log_2(x)$ by a scaling factor $k$, which amplifies the fractional part so that it is not discarded. NetFC then divides the scaling factor back out in a later step to keep the floating-point result correct.
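The logarithm trick and the effect of the scaling factor can be sketched in a few lines of Python. This is an illustration of the idea under stated assumptions, not NetFC's actual table layout: it handles only positive operands (sign and zero are ignored), and the helper names `build_log_table`, `build_exp_table`, `multiply`, and `worst_relative_error` are hypothetical. It relies on the identity $x \cdot y = 2^{\log_2 x + \log_2 y}$, so a multiplication becomes two lookups, one integer addition, and a final lookup that also divides $k$ back out.

```python
import math


def build_log_table(values, k):
    """Map each operand x to floor(k * log2(x)), an integer a switch can add."""
    return {x: math.floor(k * math.log2(x)) for x in values}


def build_exp_table(lo, hi, k):
    """Map each scaled integer logarithm s back to 2 ** (s / k)."""
    return {s: 2.0 ** (s / k) for s in range(lo, hi + 1)}


def multiply(x, y, log_table, exp_table):
    # Two lookups and one integer addition replace the multiplication.
    return exp_table[log_table[x] + log_table[y]]


def worst_relative_error(k, operands):
    """Largest relative error of the table-based product over all pairs."""
    log_table = build_log_table(operands, k)
    lo = 2 * min(log_table.values())
    hi = 2 * max(log_table.values())
    exp_table = build_exp_table(lo, hi, k)
    worst = 0.0
    for x in operands:
        for y in operands:
            approx = multiply(x, y, log_table, exp_table)
            worst = max(worst, abs(approx - x * y) / (x * y))
    return worst


operands = [1 + i / 8 for i in range(57)]  # 1.0, 1.125, ..., 8.0
for k in (1, 64):
    print(k, worst_relative_error(k, operands))
```

In this sketch, `k = 1` corresponds to the plain floor of the logarithm, and a larger `k` keeps more of the fractional part at the cost of a larger exponent table, since that table has one entry per possible integer sum.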
NetFC also uses prefix-based lossless compression to reduce on-chip memory usage. Specifically, a NetFC table may contain many consecutive entries with the same value, so their keys can be merged.

Open questions
Multiple floating-point operations
NetFC can support multiple floating-point operations by deploying lookup tables for different operation types one after another. For example, deploying an addition table followed by a multiplication table implements an addition followed by a multiplication. This consumes more stages, so the number of floating-point operations NetFC can perform on each packet depends on the stages available in the data plane. In addition, the recirculation operation provided by Barefoot Tofino switches can be used to change the order of different floating-point operations.
32-bit floating-point operations
Because of the limited on-chip memory, the current NetFC implementation does not support 32-bit floating point. In theory, an approximation based on Taylor series could reduce memory consumption enough to support 32-bit floating-point operations. This is left for future work.
Conclusion
In-network computing is an emerging trend that reduces network overhead by moving some tasks onto programmable switches. However, it is constrained by the limited computing capability of those switches, floating-point arithmetic being one example. To address this, the authors designed NetFC, a table-lookup method that performs on-the-fly floating-point arithmetic in the network with little loss of precision. NetFC uses prefix-based lossless compression to reduce memory consumption. Experimental results show that NetFC's average accuracy exceeds 99.94% while consuming only 448 KB of memory. The authors also integrated NetFC into Sonata to detect Slowloris attacks, which significantly reduced the detection delay. NetFC is expected to become a cornerstone of in-network computing.