SmartNIC Architecture Design: MP VS FPGA VS ASIC

From the perspective of the core processor, there are currently three main types of SmartNIC architecture, based on FPGAs, MPs (multi-core processors), and ASICs. Microsoft Research is an important representative of the approach that uses an FPGA as the core programmable processor of the smart NIC. The three diagrams below show how Microsoft's design evolved over a series of architectures.


In 2014, Microsoft proposed a reconfigurable acceleration solution for data center cloud services based on a high-end FPGA, the Altera Stratix V D5, organized as a Shell (general logic) plus a Role (reconfigurable processing logic). It was intended to address three problems: commodity servers could not keep up with the rapidly growing business needs of the data center, customized accelerators were expensive, and they lacked flexibility.

Figure (a): Microsoft Catapult architecture
Figure (b): Microsoft SmartNIC V1 architecture
Figure (c): Microsoft SmartNIC V2 architecture

1. FPGA-based architecture

As shown in Figure (a), the Shell is reusable general logic for communication, management, configuration, and so on. It includes two DRAM controllers (managing the two DRAMs attached to the FPGA), four 10 Gbps SerialLite 3 interfaces (lightweight serial links for inter-FPGA communication), a PCIe core that manages DMA communication, routing logic (for managing data from PCIe, the Role, and SerialLite 3), reconfiguration logic (for reading, writing, and configuring Flash), and single-event upset (event flip) logic. The Role is located in a fixed area of the FPGA chip and is closely tied to the user's acceleration application logic; for example, the Bing search ranking logic can be mapped to the Role for acceleration.

In the Catapult design, to simplify the management and use of FPGAs, all FPGAs in the same rack are connected by a dedicated secondary network with a 6×8 two-dimensional torus topology, so that every FPGA in the rack can be used as an acceleration resource. However, this secondary network has drawbacks. On the one hand, it increases network overhead and the burden of fault-tolerance management; on the other hand, it can provide only limited acceleration for network flows, storage flows, and distributed applications. In addition, because the 2D torus directly connects only FPGAs within a rack, users cannot make efficient use of FPGA resources across racks.
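To make the topology concrete, here is a minimal Python sketch of a 6×8 two-dimensional torus. It assumes a (row, column) addressing scheme with 6 rows and 8 columns; that orientation, and the function names, are illustrative choices rather than details from Microsoft's design. Each node has four direct neighbours, which is consistent with the four inter-FPGA serial links in the Shell.

```python
ROWS, COLS = 6, 8  # 6x8 torus: 48 FPGAs per rack (orientation is an assumption)

def torus_neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the four directly connected nodes (up, down, left, right).

    Links wrap around at the edges, which is what makes the grid a torus.
    """
    return [
        ((row - 1) % rows, col),
        ((row + 1) % rows, col),
        (row, (col - 1) % cols),
        (row, (col + 1) % cols),
    ]

def torus_hops(a, b, rows=ROWS, cols=COLS):
    """Minimum number of link hops between nodes a and b on the torus."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # On each axis, traffic can go either way around the ring.
    return min(dr, rows - dr) + min(dc, cols - dc)

if __name__ == "__main__":
    print(torus_neighbors(0, 0))
    print(torus_hops((0, 0), (3, 4)))
```

Under these assumptions the farthest pair of FPGAs is 3 + 4 = 7 hops apart, and no link leaves the rack, which illustrates why the design cannot reach FPGA resources in other racks.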

Microsoft improved on Catapult in research published in 2016, integrating the FPGA network with the data center network and proposing a new cloud acceleration architecture. As shown in Figure (b), the Stratix V D5 FPGA board has two 40 Gbps QSFP ports, connected respectively to the host's existing conventional network card and to the top-of-rack (ToR) switch. Correspondingly, in the new Shell design, the original Catapult's four-port SerialLite 3 was replaced with a Lightweight Transport Layer (LTL) engine to handle the two 40 Gbps ports.

In a 2018 study, Microsoft offloaded the software-defined networking (SDN) stack to its second-generation SmartNICs to better support SR-IOV. As shown in Figure (c), the second-generation smart NIC integrates a general-purpose network card and a high-end Intel Arria 10 FPGA on one board, and the external ToR port has reached 50 Gbps. There is no substantive architectural change, however: the FPGA is still placed on the data path between the general-purpose network card and the ToR switch, where it efficiently processes the data flow, provides on-path network functions, and accelerates specific applications. Microsoft pointed out in later research that, given the hardware support now offered by programmable network cards and programmable switches, making full use of programmable network devices to form an efficient, network-wide programmable cloud will become a trend.

Besides Microsoft, vendors such as Mellanox, Intel, and Xilinx have also launched FPGA-based smart NIC products:

1) Mellanox launched the Innova series of smart network cards based on Xilinx Kintex UltraScale high-end FPGAs, including two generations of Innova and Innova-2 Flex products.

2) Intel has launched two types of programmable PCIe accelerator cards. The Intel FPGA PAC N3000, based on the Arria 10/Arria 10 GX FPGA, is used to accelerate protocol stack processing, NFV, and other applications [1]. The Intel FPGA PAC D5005, based on the Stratix 10 SX, targets data stream analysis, video encoding and transcoding, finance, artificial intelligence, genetic analysis, and other fields.

3) Xilinx offers two series of Ethernet cards, XtremeScale X2 and 8000. The X2 series is designed for data centers, with bandwidths of 10/25/40/100 Gbps. Combined with the Cloud Onload kernel-bypass technology and TCP-Direct technology, X2 cards reduce operating system overhead and improve performance in load balancing, database caching, container applications, and web services.

2. MP-based architecture

Another smart NIC design approach recognized by the industry is to use on-chip multi-core processors for programmable acceleration of network data. Most such designs are implemented as a system on chip (SoC) built around either a dedicated network processor (NP), such as the Netronome NFP series or Cavium Octeon series, or a general-purpose processor (GP), such as ARM. The following sections introduce both network processors and general-purpose processors.

A. NP-SoC based smart NIC

Early in 2016, Netronome launched the NFE-3240 series of smart NICs for network security applications, which are programmable in C and can process packets at a line rate of 20 Gbps. In 2018 and 2019, Netronome successively launched three series of Agilio smart NICs:


① Agilio CX for compute nodes, based on the NFP-4000 or NFP-5000 network processor, can fully offload the virtual switch's data plane processing in network functions, as well as typical compute-intensive tasks;


② Agilio FX for bare-metal servers, based on the NFP-4000 network processor and a 4-core ARM v8 Cortex-A72 CPU (which can run a Linux OS);


③ Agilio LX for service nodes, based on the NFP-6000 network processor, is mainly used for virtualized and non-virtualized x86 service nodes and WAN gateways. Agilio series products support flexible packet parsing and Match-Action processing, and can be programmed in eBPF, C, and P4.
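As a rough illustration of what Match-Action processing means, the Python sketch below models a single table: each rule matches on a few parsed header fields and, on a hit, runs an action. This is a conceptual model only, not Netronome's API; the class names, the field names (`ip_proto`, `dst_port`), and the actions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Packet = Dict[str, object]  # parsed header fields, e.g. {"dst_port": 80}

@dataclass
class Rule:
    match: Dict[str, object]            # field -> required value; other fields are wildcards
    action: Callable[[Packet], Packet]  # applied when every match field agrees
    priority: int = 0                   # higher priority is checked first

class MatchActionTable:
    def __init__(self, default_action: Callable[[Packet], Packet]):
        self.rules: List[Rule] = []
        self.default_action = default_action

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def process(self, pkt: Packet) -> Packet:
        for rule in self.rules:
            if all(pkt.get(k) == v for k, v in rule.match.items()):
                return rule.action(pkt)
        return self.default_action(pkt)  # table miss

def drop(pkt: Packet) -> Packet:
    return {**pkt, "verdict": "drop"}

def forward_to(port: int) -> Callable[[Packet], Packet]:
    def action(pkt: Packet) -> Packet:
        return {**pkt, "verdict": "forward", "out_port": port}
    return action

if __name__ == "__main__":
    table = MatchActionTable(default_action=drop)
    table.add_rule(Rule({"ip_proto": 6, "dst_port": 80}, forward_to(1), priority=10))
    print(table.process({"ip_proto": 6, "dst_port": 80}))
    print(table.process({"ip_proto": 17, "dst_port": 53}))
```

On real hardware the lookup is done by dedicated table structures rather than a linear scan; the scan here only keeps the sketch short.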


Cavium introduced the LiquidIO series of smart NICs based on the cnMIPS III network processor. cnMIPS III is the third generation of Cavium's Octeon series implementation of the MIPS64 instruction set architecture (ISA). The Octeon series also includes ARM-based products.

B. GP-SoC based smart NIC

In addition to the FPGA-based Innova series of programmable smart NICs, Mellanox also launched the BlueField IPU (I/O processing unit) series of programmable smart NICs, which support Ubuntu and CentOS. The first generation of BlueField products integrates a ConnectX-5 controller, an ARM v8 A72 processor array (up to 16 cores, 0.8 GHz), and an 8/16 GBps DDR4 memory controller, and supports dual-port 25/50/100 Gbps Ethernet or InfiniBand network connections.

BlueField-2 (also a DPU) integrates the newer ConnectX-6 controller, still uses an ARM processor array, and supports a single-port 200 Gbps Ethernet or InfiniBand network connection. This series of smart NICs can be used in data centers or supercomputing to offload and accelerate security, storage, and network protocols and functions.

Design Framework of Smart NIC Based on MP

The MP-based smart NIC design framework is shown in the figure, and contains the following key modules:


① A variety of mature acceleration components, such as hash calculation and encryption and decryption (Crypto);


② PCIe interface for communication with the host, most of which support SR-IOV;


③ A variety of interfaces to communicate with peripherals, such as I2C, JTAG, etc.;


④ A controller for accessing the memory on the smart NIC;


⑤ On-chip NP or GP cores, used for OVS, RSS (receive side scaling), and other network functions, as well as user-defined functions. The on-chip layout of the NP or GP cores differs between designs. Most use a mesh, but there are exceptions: MPPA, for example, groups cores into multiple clusters, and the memory inside each cluster is shared by its cores.
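To illustrate one of the functions named above, the following Python sketch shows how RSS is commonly implemented: a Toeplitz hash over the flow's addresses and ports selects an entry in an indirection table, which names the receive queue (and thus the core) for that flow. The key below is an arbitrary example value, and the table size, function names, and queue count are assumptions for illustration, not values from any particular NIC.

```python
import ipaddress
import struct

# Arbitrary 40-byte example key (assumption); real NICs are programmed
# with a key supplied by the driver.
EXAMPLE_KEY = bytes(range(1, 41))

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR a sliding 32-bit key window per set input bit."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 31:
        raise ValueError("key too short for this input length")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):  # input bits are consumed MSB first
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result

def flow_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Concatenate the IPv4/TCP 4-tuple in network byte order."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))

def select_queue(flow: bytes, num_queues: int, table_size: int = 128) -> int:
    """Map a flow to a receive queue through a round-robin indirection table."""
    indirection = [i % num_queues for i in range(table_size)]
    return indirection[toeplitz_hash(EXAMPLE_KEY, flow) % table_size]

if __name__ == "__main__":
    flow = flow_bytes("10.0.0.1", "10.0.0.2", 12345, 80)
    print(hex(toeplitz_hash(EXAMPLE_KEY, flow)), select_queue(flow, num_queues=4))
```

Because the hash depends only on the flow's 4-tuple, every packet of a flow lands on the same queue, which preserves per-flow ordering while spreading different flows across cores.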

In addition, some NPs contain more than one kind of processor core. For example, the Netronome NFP series NPs have two categories of cores, packet processor cores and flow processor cores, which are used for packet parsing, classification, and data flow processing.

3. ASIC-based architecture

At present, there are not many ASIC-based smart NICs. ASIC chips mainly appear in smart NICs as network controllers, such as Mellanox's ConnectX series, Broadcom's NetXtreme series, and Cavium's FastLinQ series. In addition to meeting the processing requirements of traditional network protocols (such as TCP and RoCE), such ASIC network chips also offer some ability to offload processing from the CPU and some programmability.

Take Mellanox's ConnectX-6 as an example. It provides a degree of programmable processing and hardware acceleration in the data plane, supports virtualization and SDN, and can offload network virtualization protocols such as VxLAN and NVGRE to hardware. It also offloads some encryption and decryption operations for network security, supports storage protocol processing for storage scenarios such as NVMe-oF, and supports low-latency, zero-copy data communication in machine learning scenarios such as GPU-Direct.
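To show the kind of per-packet work that VxLAN offload moves from the CPU into the NIC, here is a minimal Python sketch that builds and parses the 8-byte VXLAN header defined in RFC 7348 (a flags byte with the "VNI valid" bit, a 24-bit VXLAN network identifier, and reserved bits). The function names are hypothetical, and the outer Ethernet/IP/UDP headers that a real encapsulation also adds are omitted.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348
VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit network identifier."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits); word 1: VNI (24) + reserved (8).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the VXLAN header to an inner Ethernet frame (outer headers omitted)."""
    return vxlan_header(vni) + inner_frame

def decapsulate(payload: bytes):
    """Return (vni, inner_frame) from a VXLAN UDP payload."""
    if len(payload) < 8:
        raise ValueError("payload shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", payload[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word1 >> 8, payload[8:]

if __name__ == "__main__":
    packet = encapsulate(5001, b"inner-frame")
    print(packet[:8].hex(), decapsulate(packet))
```

With hardware offload, the NIC performs this header handling (together with the outer headers and checksums) for every packet at line rate, so the host CPU sees only the inner frames.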

4. Comparison of three architectures

The comparison of the above three main architectures is as follows:


(1) In terms of cost performance, ASIC-based smart NICs can meet the needs of most general-purpose network processing scenarios, can perform programmable processing on the data plane within a predefined range, and provide hardware acceleration within a limited range. When deployed in volume, they have a clear cost-performance advantage.


(2) In terms of programming complexity, although ASIC-based smart NICs are not as simple as MP-based smart NICs, they are far easier than FPGA-based smart NICs.


(3) In terms of flexibility, ASIC-based smart NICs are the least flexible of the three, and they fall short in more complex application scenarios. More precisely, ASIC-based smart NICs should be called offload NICs, because their programmability is incomplete.

From a long-term perspective, the customized logic of an ASIC can provide significant performance improvements for mature application scenarios, but over time new application scenarios will place new functional requirements on smart NICs. At present, many manufacturers address this problem with an ASIC+GP design, as in Mellanox's BlueField products (which integrate ConnectX-5 and ARM). At the same time, vendors continue to update their ASIC products and build mature technologies into the network card; the ConnectX series, for example, has reached its sixth generation. The competition between flexibility and performance in the architecture evidently continues.
