High-Speed Data Acquisition Systems

High-Speed Data Acquisition Systems

Challenges and design choices to network FPGAs and servers for high-speed data acquisition or retrieval. We discuss TCP/IP as a very fast transport when using TCP/IP full accelerators, in the sensor-side FPGA and in the server. 

Increase Speed and Save Resources with Simple Coding Style Changes

Increase Speed and Save Resources with Simple Coding Style Changes Our Mission: If It Is Packets, We Make It Go Faster! And with packets we mean: Networking using TCP/UDP/IP over 10G/25G/50G/100G Ethernet; PCI Express (PCIe), CXL, OpenCAPI; data storage using SATA, SAS, USB, NVMe; video image processing using HDMI, DisplayPort, SDI, FPD-III. Over the last decade, we have become experts in accelerating software-rich system stacks via offloading CPUs using so-called Domain-Specific Architectures for computing. For implementation, we make heavy use of heterogeneous processing devices such as FPGAs which we program using C++/C/SystemC as well as VHDL and Verilog HDL. ASIC vs. FPGA in Process Acceleration Compared to ASICs, FPGAs are a much more versatile option when it comes to accelerating processes with hardware, as an FPGA can be reconfigured as often as needed. However, one large benefit to ASICs is the possible maximum clock speed that can be reached. As its circuit is optimized for its specific function, it has a smaller footprint, resulting in a faster maximum clock speed. So one aspect of accelerating a process with FPGAs is not only just to redesign that process in hardware and hoping for faster results, but to smartly redesign that process to use as little hardware space as possible, resulting in a higher

Latency Measurement of 10G/25G/50G/100G TCP-Cores using RTL Simulation

Latency Measurement of 10G/25G/50G/100G TCP-Cores using RTL Simulation

Distributed Systems-of-Systems which, for example, connect smart sensor hubs with centralized processing via Ethernet, require very low transport latencies in order to deliver short response times. This makes it difficult for system designers to evaluate. And, things get worse if the measurement setup and methodology is not clearly explained, neither can be reproduced. Therefore, in this Technical Brief we describe how we use the Questa Advanced Simulator from Siemens EDA to measure network latency and analyze latency in a network protocol processing system. And, we also provide the most recent latency values for NPAP, the TCP/IP Stack from Fraunhofer HHI which is, as it turns out, very competitive with other solutions. Being integrators ourselves, we believe we owe this to the FPGA ecosystem!

Deterministic Networking with TSN-10/25/50/100G

Deterministic Networking with TSN-10/25/50/100G

Growing Demand for Deterministic Networking

We all observe a growing need to connect computers with each other with shorter delays (i.e. lower latencies) and higher bandwidth, in particular for High-Performance Computing (HPC) in the data center and in embedded systems such as advanced industrial robotics or autonomous vehicles, requiring the so-called deterministic networking. Processing of TCP/IP based network protocols at speeds of 10 Gbps and beyond demand kernel bypass solutions (such as Intel’s DPDK or Solarflare’s/Xilinx’ Onload or Mellanox/NVida VMA) and/or so-called TOEs (TCP Offload Engines). 

Domain-Specific Architectures (DSA) use so-called heterogeneous computing elements, also known as Cores with the objective to put the compute burden where it belongs. This is a well established approach going back to the early days when an x86 CPU was partnered with an x87 for better floating-point processing. Today, it is common to deploy various flavors of Cores, for example:

  • DSP Cores for digital signal processing in telecommunications
  • Shader Cores optimized for image processing, as they can be found in modern Graphics Processing Units (GPU) 
  • Tensor Processing Units (TPU) Cores which are optimized for Artificial Intelligence and Deep Learning

This is because such (special purpose) fixed-function or programmable function accelerator Cores are optimized for a particular domain and, when properly used, not only take processing load off the (general purpose) CPU but also deliver better overall performance (which is data processed per time) and better efficiency (which is performance per Watt).

Over the following pages we will make a case for processing TCP/IP over TSN over 10/25/50/100 Gigabit Ethernet on dedicated Cores which has significant advantages in particular for real-time Ethernet and Deterministic Networking. These so-called TCP-TSN-Cores can be integrated either in FPGAs or in SoCs (ASIC and ASSP). As we will show, TCP-TSN-Cores are more than just a TOE – the commonly used approach for network protocol acceleration. By running the entire network protocol stack from OSI Layer 2 to at least Layer 4 in a dedicated integrated circuit – a so-called Full Accelerator – we can remove (general purpose) CPUs entirely from the datapath. 

Hence, TCP-TSN-Cores can deliver very low bounded and deterministic latency with predictable scalability needed for 10/25/50/100 Gigabit Deterministic Networking. 

CONTACT MLE

Please fill in the form below, so we can give you access to the Remote Evaluation System.


    NPAP-10G Remote Eval.NPAP-25G Remote Eval.


    X
    CONTACT MLE