Myth-Busting Latency Numbers for TCP Offload Engines

February 24, 2026 | Application Notes, Networking | AMD Ultrascale+, Intel/Altera Agilex 7, Network Acceleration, NPAP, TCP/IP

TB20260224

When you shop around for a TCP IP core or a TCP offload engine, don’t you ask yourself what the quoted latency numbers really mean?

This Technical Brief sheds light on TCP/IP processing in an FPGA, and in particular busts
some myths about latency numbers. Here, we will describe proper ways of measuring
latency numbers, and why latency numbers do matter when implementing TCP/IP in
FPGAs.

Based on MLE’s experience from past projects, it all comes down to technical and economic
feasibility, which is determined by FPGA resource costs!

A TCP/IP Stack for High-Performance Chip-to-Chip

April 3, 2025 | Application Notes, Networking | Aurora Protocol, AXI4-Stream, High-speed Data, Long-Distance Distributed System, Network Acceleration, NPAP, TCP/IP

TB20250403

Stream processing enables parallel data handling by creating pipelines of Stream Processing Elements (SPE) between a data source (typically a sensor) and a data sink (typically a decision maker), synchronized through back pressure. For multi-FPGA systems, the Aurora protocol is frequently used in industrial, medical and scientific applications. However, TCP/IP over multi-Gigabit Ethernet can offer a more reliable and resource-efficient solution for distributed systems, in particular when the FPGA devices are farther apart.
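To illustrate what "synchronized through back pressure" means, here is a minimal software sketch, not MLE's implementation. It models a source, one SPE and a sink as threads connected by bounded FIFOs. The function name `run_pipeline`, the FIFO depth and the use of Python threads are all assumptions made for illustration; in an FPGA the same role is typically played by a ready/valid handshake on the stream interface.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (so samples must not be None)


def run_pipeline(samples, stage_fn, depth=2):
    """Hypothetical source -> SPE -> sink pipeline with bounded FIFOs."""
    q_in = queue.Queue(maxsize=depth)   # source -> SPE
    q_out = queue.Queue(maxsize=depth)  # SPE -> sink
    results = []

    def source():
        for sample in samples:
            # put() blocks while the FIFO is full: this is the back pressure
            q_in.put(sample)
        q_in.put(SENTINEL)

    def spe():
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)  # forward end-of-stream downstream
                return
            q_out.put(stage_fn(item))

    def sink():
        while True:
            item = q_out.get()
            if item is SENTINEL:
                return
            results.append(item)

    threads = [threading.Thread(target=f) for f in (source, spe, sink)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every FIFO is bounded, a slow sink automatically throttles the SPE and, in turn, the source, and no data is dropped. For example, `run_pipeline(range(5), lambda x: x * 2)` returns the doubled samples in their original order.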

This Technical Brief shares our experimental data, comparing the reliability and resource efficiency of the Aurora protocol with those of TCP/IP, implemented as an MLE NPAP Full Accelerator running TCP/IP inside the programmable logic.

Table of Contents

A TCP/IP Stack for High-Performance Chip-to-Chip
- Stream Processing Pipelines with TCP
- References

Analyzing Network Impairment and Signal Integrity in High-Speed TCP/UDP/IP Ethernet Networks

March 27, 2025 | Application Notes, Networking | High-speed Data, Integrated Bit Error Rate Testing (iBERT), Latency Optimization, Netperf, Network Acceleration, Network Impairment Emulation, NPAP, RTL Simulation, TCP/IP, TCP/UDP/IP

TB20250327

Summary

MLE has been supporting customers in implementing distributed systems. MLE NPAP, the TCP/UDP/IP Stack originating from Fraunhofer HHI, has been one of the underlying technologies to connect FPGAs with each other as well as with servers and sensors reliably.

Customer implementations range from systems using a purpose-built LAN to systems exposed to high levels of electromagnetic interference (EMI), rotating nodes connected via slip rings, and wireless links.

Engineering teams often focus on delivering high throughput (close to linerate) and very low latency (i.e. a small Bandwidth-Delay-Product). Network impairment can quickly derail those engineering efforts. Therefore, MLE NPAP has built-in means to diagnose the effects of real-life network impairment.
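As a small worked example of the Bandwidth-Delay-Product mentioned above, the sketch below computes how many bytes are "in flight" on a link for a given line rate and round-trip time. The function name and the numbers used (10 Gbps, 100 µs RTT) are illustrative assumptions, not measurements from this Technical Brief.

```python
def bandwidth_delay_product_bytes(link_rate_bps: int, rtt_ns: int) -> int:
    """Bytes in flight = link rate x round-trip time (integer arithmetic)."""
    bits_in_flight = link_rate_bps * rtt_ns // 1_000_000_000
    return bits_in_flight // 8


# Illustrative values only: a 10 Gbps link with 100 microseconds RTT
print(bandwidth_delay_product_bytes(10_000_000_000, 100_000))  # 125000
```

A TCP sender needs roughly this much buffer to keep the link busy while waiting for acknowledgements. Impairments that increase the round-trip time or trigger retransmissions therefore translate directly into larger buffer requirements or lower throughput.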

This Technical Brief expands on analyzing network impairments and explains how these can be emulated when using Field-Programmable Gate Arrays (FPGAs), leading to technical insights allowing you to deliver better systems, faster.

MLE’s Novel Approach for Microsecond-level Time Synchronization Between 5G V2X Sidelink User Equipments (UEs)

January 14, 2025 | Application Notes, Networking | 5G, AMD Kintex 7, GNSS, Micro-second-level Time Synchronization, Precision Time Protocol, PTP v2.1, Radar, Sidelink, White Rabbit

TB20250114

GNSS is widely used for data transmission and synchronization, particularly in applications that merge recorded radar data and coordinate multiple nodes over the air. Accurate timestamping is crucial for ensuring data integrity, requiring a reliable and common time source. However, GNSS connectivity can be limited or entirely unavailable in certain environments, such as tunnels or urban canyons, posing a challenge for precise synchronization.

To address this, MLE is developing an innovative approach to synchronize two 5G User Equipment (UE) devices with microsecond-level accuracy. This solution leverages the PTPv2 High Accuracy (HA) profile, based on CERN’s White Rabbit technology, and builds upon the OpenAirInterface project.

High-Speed Data Acquisition Systems

August 16, 2023 | Application Notes, Networking | 10/25/50/100G Ethernet, AMD, Intel, Network Acceleration, Network Protocol Accelerator, NPAP, Storage Acceleration, TCP/UDP/IP, Trenz

TB20230713

This Technical Brief covers the challenges and design choices involved in networking FPGAs and servers for high-speed data acquisition or retrieval. We discuss TCP/IP as a very fast transport when TCP/IP full accelerators are used both in the sensor-side FPGA and in the server.

Table of Contents

High-Speed Data Acquisition Systems
- 1 Challenges in High Speed Data Acquisition
- 2 Design Choices for High Speed Data Acquisition

Put a TCP/UDP/IP Turbo Into Your FPGA-SmartNIC

May 23, 2023 | Application Notes, Networking | 10/25/50/100G Ethernet, Intel, Network Acceleration, Network Protocol Accelerator, NPAP, SmartNIC, TCP/IP

TB20230523

How MLE and Fraunhofer HHI are breaking the 500 MHz fMax barrier in network protocol acceleration (TCP/IP stack) by using Intel Agilex FPGAs as an FPGA SmartNIC.

Summary

MLE provides full system stacks including FPGAs, with a focus on networking for HPC, datacenter, or telecommunications. Often, we implement so-called full accelerators where almost all protocol processing runs efficiently within the FPGA fabric.

Within this Technical Brief we elaborate on two key aspects for FPGA-based SmartNICs:

- How to implement rapid, reliable connectivity from edge to cloud using FPGA-based SmartNICs with TCP/IP stack acceleration, and
- how these implementations can benefit from modern FPGA technology, namely Intel HyperFlex, to deliver better performance, cost and power.

Our design choices for FPGA SmartNICs include the Corundum project, an open-source, high-performance FPGA-based Network Interface Card (NIC) platform. Corundum supports In-Network Processing for which we have integrated MLE’s Network Protocol Accelerator Platform (NPAP) based on the TCP/UDP/IP Full Accelerator from Fraunhofer HHI.

To enable network protocol processing at linerates of 100 Gbps, or faster, we have optimized this implementation for Intel HyperFlex architecture. The result is a “turbo charged” FPGA SmartNIC which combines several advantages:

- NPAP with high throughput for those “heavy” TCP data streams which make up most of the network traffic.
- NPAP for those latency-sensitive TCP connections where TCP round-trip time (RTT) may dominate the entire system’s response time.
- Corundum processing in open-source Linux software for the rest, i.e. all those administrative and control TCP connections which hardly use any bandwidth and which are not latency sensitive.
- Performance optimizations utilizing Intel HyperFlex to break the 500 MHz fMax barrier and to avoid FPGA resource “bloat”¹.
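One way to see why clock frequency and FPGA resource usage are linked is the following back-of-the-envelope sketch. It estimates the minimum clock frequency a datapath of a given width needs in order to sustain a given line rate. The function name and the datapath widths are assumptions chosen for illustration, not a description of the NPAP implementation, and the estimate ignores protocol overhead, idle cycles and alignment losses.

```python
def required_clock_mhz(line_rate_gbps: float, datapath_bits: int) -> float:
    """Minimum clock (MHz) so that datapath_bits per cycle covers the line rate."""
    return line_rate_gbps * 1000 / datapath_bits


# Hypothetical datapath widths for a 100 Gbps line rate
for width in (512, 256, 128):
    print(width, "bits:", required_clock_mhz(100, width), "MHz")
# 512 bits: 195.3125 MHz
# 256 bits: 390.625 MHz
# 128 bits: 781.25 MHz
```

Under these simplifying assumptions, a faster clock allows a narrower datapath for the same line rate, which is one reason a higher fMax can help keep resource usage in check.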

Latency Measurement of 10G/25G/50G/100G TCP-Cores using RTL Simulation

March 5, 2022 | Application Notes, Networking | 10/25/50/100G Ethernet, Network Acceleration, NPAP, TCP/UDP/IP

TB20220305

Distributed Systems-of-Systems which, for example, connect smart sensor hubs with centralized processing via Ethernet require very low transport latencies in order to deliver short response times. Such latencies are difficult for system designers to evaluate, and things get worse if the measurement setup and methodology are not clearly explained or cannot be reproduced. Therefore, in this Technical Brief we describe how we use the Questa Advanced Simulator from Siemens EDA to measure and analyze latency in a network protocol processing system. We also provide the most recent latency values for NPAP, the TCP/IP Stack from Fraunhofer HHI, which, as it turns out, is very competitive with other solutions. Being integrators ourselves, we believe we owe this to the FPGA ecosystem!

Latency Analysis for NVMe/TSN

May 29, 2021 | Application Notes, Networking | NVMe, Time-Sensitive Networking (TSN)

NVMe/TSN Latency Analysis

TB20210529

Version 1.0

NVMe/TSN is a range extension for NVM Express (NVMe) done by tunneling PCI Express (PCIe) over TCP/IP over Time Sensitive Networking (TSN). This MLE Technical Brief gives a quantitative analysis of the latency when tunneling NVMe over TCP/IP over TSN.

Table of Contents

NVMe/TSN Latency Analysis
5. Conclusion & Backgrounder

Deterministic Networking with TSN-10/25/50/100G

December 3, 2020 | Application Notes, Networking | 10/25/50/100G Ethernet, Network Acceleration, TCP/IP, TCP/UDP/IP, Time-Sensitive Networking (TSN)

TB20201203

Growing Demand for Deterministic Networking

We all observe a growing need to connect computers with each other with shorter delays (i.e. lower latencies) and higher bandwidth, in particular for High-Performance Computing (HPC) in the data center and in embedded systems such as advanced industrial robotics or autonomous vehicles, which require so-called deterministic networking. Processing TCP/IP-based network protocols at speeds of 10 Gbps and beyond demands kernel bypass solutions (such as Intel’s DPDK, Solarflare’s/Xilinx’s Onload or Mellanox/NVIDIA VMA) and/or so-called TOEs (TCP Offload Engines).

Domain-Specific Architectures (DSA) use so-called heterogeneous computing elements, also known as Cores, with the objective of putting the compute burden where it belongs. This is a well-established approach going back to the early days when an x86 CPU was partnered with an x87 for better floating-point processing. Today, it is common to deploy various flavors of Cores, for example:

- DSP Cores for digital signal processing in telecommunications
- Shader Cores optimized for image processing, as they can be found in modern Graphics Processing Units (GPU)
- Tensor Processing Units (TPU) Cores which are optimized for Artificial Intelligence and Deep Learning

This is because such (special purpose) fixed-function or programmable function accelerator Cores are optimized for a particular domain and, when properly used, not only take processing load off the (general purpose) CPU but also deliver better overall performance (which is data processed per time) and better efficiency (which is performance per Watt).

Over the following pages we will make a case for processing TCP/IP over TSN over 10/25/50/100 Gigabit Ethernet on dedicated Cores which has significant advantages in particular for real-time Ethernet and Deterministic Networking. These so-called TCP-TSN-Cores can be integrated either in FPGAs or in SoCs (ASIC and ASSP). As we will show, TCP-TSN-Cores are more than just a TOE – the commonly used approach for network protocol acceleration. By running the entire network protocol stack from OSI Layer 2 to at least Layer 4 in a dedicated integrated circuit – a so-called Full Accelerator – we can remove (general purpose) CPUs entirely from the datapath.

Hence, TCP-TSN-Cores can deliver very low bounded and deterministic latency with predictable scalability needed for 10/25/50/100 Gigabit Deterministic Networking.
