Network Protocol Accelerator Platform (NPAP) Demo on Xilinx ZCU102 Evaluation Board

Abstract

This demo provides a pre-built environment for evaluating a network protocol accelerator platform fully implemented in hardware on a Xilinx ZCU102 evaluation board. The test environment runs a standard network performance test on top of a TCP/IP implementation.

Network Protocol Accelerator Platform

The Network Protocol Accelerator Platform (NPAP) allows building customizable solutions for TCP/IP and UDP protocol acceleration. It is based on patent-pending technology from the German Fraunhofer Heinrich Hertz Institute (HHI) and supports full acceleration in the form of stream processing of TCP and/or UDP at 1/10/25/50 GigE line rates.

This NPAP Example Design demonstrates how the Xilinx ZCU102 evaluation board can be connected to an Ethernet network to provide TCP/IP based services. As an example, this design includes a hardware implementation of most features of the netperf v2.6 network performance test tool; see the corresponding netperf GitHub tag.

Note

Please keep in mind that our FPGA implementation is compatible with netperf v2.6, and only with this version.

ZCU102 Evaluation Board with Xilinx Zynq UltraScale+ ZU9EG MPSoC

The Quickstart Guide walks you through the board bring-up and describes how to run the NPAP Example Design on the Xilinx ZCU102 platform.

The following sections provide a quick start to MLE’s NPAP Example Design on the Xilinx ZCU102; see Fig. 1.

_images/zcu102.png

Fig. 1 ZCU102 Board Components.

Board Bringup

Please follow the steps to connect and power up the ZCU102 board. This section assumes that the following prerequisites have been satisfied:

  1. You have a Xilinx ZCU102 evaluation board, preferably a Rev. 1 with production silicon.
  2. You have a host PC equipped with a 10GbE Network Interface Card (NIC), e.g. an Intel X520-2.
  3. The host PC used in the demo runs Ubuntu Linux 14.04.
  4. The host must have ethtool and netperf v2.6 installed (see the example after this list).
  5. You have a 10GbE SFP+ cable, e.g. a passive direct attached copper cable (DAC).
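
For prerequisite 4, ethtool is typically available from the distribution's package manager, while netperf v2.6 usually has to be built from source. The following is only a sketch of one possible way to do this; the GitHub URL and the netperf-2.6.0 tag name are assumptions and may need to be adapted to the netperf release you actually obtain:

$ sudo apt-get install ethtool
$ git clone https://github.com/HewlettPackard/netperf.git
$ cd netperf
$ git checkout netperf-2.6.0      # tag name assumed; verify in the repository
$ ./configure && make && sudo make install   # run 'autoreconf -i' first if no configure script is present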

Steps to bring up the board:

  1. Download the reference project from MLE. Please contact MLE via the “Request Information…” button for download instructions.
  2. Extract the tarball on your disk and enter the extracted folder.
  3. Prepare an SD card with an MBR partition table comprising a single, bootable FAT32 partition (one possible way is sketched after this list).
  4. Copy the pre-built files BOOT.BIN and image.ub from within the previously extracted folder onto the SD card.
  5. Insert the SD card into the ZCU102 SD card socket.
  6. Set the bootup mode DIP switch SW6 to SD card (off, on, off, on for REVB or on, off, off, off for REV 1.0) as shown in Fig. 2.
  7. Connect the top-right SFP+ cage to the host NIC with an SFP+ cable as shown in Fig. 3 and Fig. 4.
  8. Connect 12V Power to the ZCU102 6-Pin Molex connector.
  9. The board can now be powered up by turning on the power supply. Make sure that all power rail status LEDs shown in Fig. 5 turn green. PS INIT should turn from red to green and PS DONE should light up to indicate that the FPGA bitstream has been loaded by the FSBL.
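
For step 3 (and the file copy in step 4), one possible way to prepare the SD card on a Linux host is sketched below. Here /dev/sdX is a placeholder for the SD card device node and must be replaced with the actual device (check with lsblk first, as all data on the device will be lost):

$ sudo parted --script /dev/sdX mklabel msdos mkpart primary fat32 1MiB 100% set 1 boot on
$ sudo mkfs.vfat -F 32 -n BOOT /dev/sdX1
$ sudo mount /dev/sdX1 /mnt
$ sudo cp BOOT.BIN image.ub /mnt/
$ sudo umount /mnt
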
_images/boot_mode_sw.png

Fig. 2 Boot Mode DIP switch set to SD card, Rev. 1 on the left, Rev. BCDE on the right.

_images/zcu102_sfp.svg

Fig. 3 Top right SFP+ connector to attach the SFP+ cable to is shaded dark grey.

_images/sfp_conn.png

Fig. 4 SFP+ cable attached to the ZCU102 board.

_images/status_leds.png

Fig. 5 Status LEDs.

NPAP Example Design

Please follow these steps to run the Network Protocol Accelerator Platform (NPAP) Example Design.

Prerequisites:

This section assumes that the steps in section Board Bringup have been accomplished successfully.

The NPAP Example Design demonstrates how the ZCU102 board can be connected to an Ethernet network to provide TCP/IP based services. As an example, this design includes a hardware implementation of most features of the netperf v2.6 network performance test tool; see the corresponding netperf GitHub tag.

This design provides three ways of exploring the network performance of a hardware TCP/IP implementation:

  1. ping the board
  2. use the TCP loopback via telnet or nc
  3. use netperf to do some benchmarking on UDP and TCP layers

To set up the host PC so that it can communicate with the ZCU102 Example Design, please set the IP address of the connected network interface as shown below:

$ sudo ifconfig eth2 192.168.2.105

eth2      Link encap:Ethernet  HWaddr 90:e2:ba:4a:d9:ad
          inet addr:192.168.2.105  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::92e2:baff:fe4a:d9ad/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:608 errors:2 dropped:0 overruns:0 frame:2
          TX packets:3215 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:135015 (135.0 KB)  TX bytes:352322 (352.3 KB)
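
On newer Linux distributions where ifconfig is no longer installed by default, the same address can be assigned with the ip tool from iproute2 (eth2 is only an example interface name and depends on the host system):

$ sudo ip addr add 192.168.2.105/24 dev eth2
$ sudo ip link set eth2 up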

Then, to check whether the corresponding link is up, use the following command:

$ sudo ethtool eth2

Settings for eth2:
      Supported ports: [ FIBRE ]
      Supported link modes:   10000baseT/Full
      Supported pause frame use: No
      Supports auto-negotiation: No
      Advertised link modes:  10000baseT/Full
      Advertised pause frame use: Symmetric
      Advertised auto-negotiation: No
      Speed: 10000Mb/s
      Duplex: Full
      Port: Direct Attach Copper
      PHYAD: 0
      Transceiver: external
      Auto-negotiation: off
      Supports Wake-on: d
      Wake-on: d
      Current message level: 0x00000007 (7)
                             drv probe link
      Link detected: yes

Ping

The simplest connectivity test uses the ICMP echo request/reply mechanism, widely known as ping and implemented by the ping program. It already gives an impression of the short and deterministic latency offered by NPAP:

$ ping -c 10 192.168.2.101
PING 192.168.2.101 (192.168.2.101) 56(84) bytes of data.
64 bytes from 192.168.2.101: icmp_seq=1 ttl=255 time=0.035 ms
64 bytes from 192.168.2.101: icmp_seq=2 ttl=255 time=0.027 ms
64 bytes from 192.168.2.101: icmp_seq=3 ttl=255 time=0.029 ms
64 bytes from 192.168.2.101: icmp_seq=4 ttl=255 time=0.027 ms
64 bytes from 192.168.2.101: icmp_seq=5 ttl=255 time=0.027 ms
64 bytes from 192.168.2.101: icmp_seq=6 ttl=255 time=0.026 ms
64 bytes from 192.168.2.101: icmp_seq=7 ttl=255 time=0.027 ms
64 bytes from 192.168.2.101: icmp_seq=8 ttl=255 time=0.031 ms
64 bytes from 192.168.2.101: icmp_seq=9 ttl=255 time=0.028 ms
64 bytes from 192.168.2.101: icmp_seq=10 ttl=255 time=0.026 ms

--- 192.168.2.101 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8997ms
rtt min/avg/max/mdev = 0.026/0.028/0.035/0.004 ms

Telnet

The TCP loopback implements a TCP echo server listening on TCP port 50001, which mirrors any incoming data back to the sender. The loopback can be tested interactively with telnet, which connects to the server and allows data to be sent and received interactively. Telnet sends out the typed data when the return key is pressed:

$ telnet 192.168.2.101 50001
Trying 192.168.2.101...
Connected to 192.168.2.101.
Escape character is '^]'.
Hi MLE TCP Loopback on ZCU102
Hi MLE TCP Loopback on ZCU102

On Linux the local telnet client now has to be terminated manually: usually a server closes the connection in response to a command sent by the client, but since the loopback server is not a telnet server, it does not recognize such a command, so the connection stays open and the telnet session keeps running.
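
The same loopback can also be exercised non-interactively with nc, which can be more convenient for scripting. The -q 1 option tells the netcat variants shipped with most Linux distributions to exit one second after the end of standard input; drop or adjust it if your nc does not support this flag. The sent line is expected to be echoed back:

$ echo "Hi MLE TCP Loopback on ZCU102" | nc -q 1 192.168.2.101 50001
Hi MLE TCP Loopback on ZCU102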

Netperf

Netperf tests the throughput of a network path, which in this case is a point-to-point connection between the host with its NIC and the ZCU102 board. Netperf comprises a server (netserver) and a client (netperf). The hardware provided by the NPAP package is a hybrid implementation, which may function as a server or a client, but not both at the same time. For now the netperf server (equivalent to the netserver tool) is used, as it is already set up to listen on the default netserver port (12865) for incoming test requests.

The netperf tool provides multiple test modes, some of which are supported by the hardware implementation:

  1. TCP_STREAM
  2. TCP_MAERTS
  3. UDP_STREAM

The TCP_STREAM test implements a bandwidth performance test for a stream from the client to the server, in our case from the host PC to the ZCU102 board:

$ netperf -t TCP_STREAM -H 192.168.2.101 -l 5
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.101 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

87380  16384  16384    5.00     9416.79
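
The TCP_MAERTS test measures the opposite direction (from the ZCU102 board to the host PC), and UDP_STREAM measures UDP throughput from the host to the board. Both are invoked analogously to TCP_STREAM; the -m 1472 message size in the UDP example below is an assumption chosen so that each datagram fits into one 1500-byte Ethernet frame (1500 bytes minus 20 bytes IP header and 8 bytes UDP header):

$ netperf -t TCP_MAERTS -H 192.168.2.101 -l 5
$ netperf -t UDP_STREAM -H 192.168.2.101 -l 5 -- -m 1472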

Note

The performance results of the netperf tests highly depend on the configuration of the host PC. For more information about how to configure the host Linux system, see, for example, the Red Hat Network Performance Tuning Guide.

Note

The final netperf handshake after a TCP_STREAM test sometimes does not finish correctly, so that a subsequent TCP_STREAM test is no longer possible even though the other tests still work. However, a reboot resolves this issue.