Scope

Motivation

As network links keep getting faster, sustaining incoming packet flow and saturating outgoing bandwidth keeps getting harder. This is especially true for UDP communications, where untimely incoming traffic processing and improper outgoing traffic pacing can easily cause data loss.

Hardware and operating systems have organically grown a large number of features to help with this challenge. But making the most of these features requires…

  • Careful operating system configuration (buffer sizes at several layers of the network stack, number and kind of hardware RX/TX queues + associated load balancing policies, careful CPU pinning of hardware IRQs and associated software post-processing, all with due attention paid to NUMA and cache locality concerns…)
  • Explicit software support, i.e. a simple POSIX server that does just a socket() + bind() + recv() loop will either not make the most of these features or not benefit at all. For optimal performance, use of special APIs (like io_uring on Linux) and configuration (like the many setsockopt() tunables from man 7 socket, man 7 ip and man 7 udp) is often required.
  • Willingness to either cut off support for older OS/hardware or add complexity in the form of fallback code paths that handle the absence of newer features.

Finally, the associated knowledge is not neatly collected in a nice centralized place, like a One True Networking book that every expert should read and keep at hand. It is rather scattered across many articles on the Internet, each of which is laser-focused on a particular fragment of the networking stack, and many of which are outdated with respect to the latest hardware and OS developments.

When all of this is combined, it is no wonder that building efficient UDP networking applications keeps getting harder and harder. The goal of udipe is to make this easier, at least for some categories of applications.

Target audience

As mentioned above, udipe is not a general-purpose networking library. It is focused on UDP communication [1], and many of its design choices are biased towards the needs of physics data acquisition systems. In those systems…

  • UDP is mainly used for the purpose of sending data out of electronics cards because TCP is too costly to implement on FPGAs. Therefore acquisition computers are mainly concerned with handling large volumes of incoming UDP traffic.
    • Some UDP control messages do get sent sometimes, however, and having a fast send path allows udipe benchmarks not to depend on e.g. iperf. So the send path should still be pretty fast, if not optimally so.
  • CPU processing of incoming packets is a common bottleneck, i.e. often the packet correctly gets from FPGA A to the NIC of computer B then gets lost because one buffer filled up on the path from NIC to OS kernel to user processing thread.
  • The number of peers that are sending data to a particular server can be small, reducing the effectiveness of hardware/OS automatic load balancing and parallelization.
  • The main production platform is dedicated Linux servers, which are managed by the acquisition system’s development team. This means that…
    • Recent versions of Linux distributions and packages can easily be used.
    • System settings can easily be tuned for optimal application performance.
    • Virtualization layers, which add overhead and make performance harder to control and reason about, can easily be avoided.
    • Shared resources like networked filesystems can easily be taken off the application’s hot path, further increasing control on system performance.
    • Reserving a large amount of system resources (e.g. several network interfaces, many CPU cores…) for the nearly exclusive use of a particular application is fair game.
    • Supporting Windows or macOS is not a strong requirement, merely a nice-to-have convenience for local testing on developer machines.
  • The developer audience presents a relatively high willingness to use unusual network APIs in the pursuit of optimal performance.

To the developers and maintainers of such systems, the udipe project wants to provide two things:

  • A C11 [2] library called libudipe that, given some configuration (from sane defaults to very detailed manual tweaking for a specific workload), sets up a high-performance UDP network pipeline in your application.
  • A Linux system administration tool called udipe-setup that ingests the same configuration as libudipe and automatically configures the underlying system optimally for the intended network workload. This tool is supplemented by a udipe-setup.service systemd service to easily enable automatic boot-time machine configuration.

System support

As high-performance networking primitives are OS-specific and evolve fast even for a particular OS, supporting many OSes and OS versions is costly. As a resource-constrained project, udipe opts to provide multiple tiers of support:

  1. The main production platform is the latest Ubuntu LTS release, 24.04 at the time of writing. On this Linux distribution, udipe aims to achieve peak performance, with an installation that is as easy as possible (minimal deviation from the standard distribution package configuration). All udipe components are frequently built, tested and benchmarked on this platform.
  2. R&D is carried out using newer rolling release Linux distributions like Arch Linux, to evaluate new kernel/software features and assess the benefits of mid-cycle production platform updates like -hwe kernels or liburing updates. udipe is regularly tested on these platforms, but less extensively than on the production platform. On these distributions udipe may not yet leverage all the latest kernel and library network performance features.
  3. The minimal support tier is systems that provide the basic POSIX UDP interface (socket() + bind() + connect() + send()/sendto()/sendmsg() + recv()/recvfrom()/recvmsg()). The goal here is that it should be possible to build and run libudipe-based applications on these systems and they should behave correctly, but…
    • No effort is made to keep the build process easy, so e.g. some toolchain and library upgrades may be required.
    • The resulting application may not perform optimally because old network performance optimizations that have been superseded by newer ones (e.g. sendmmsg()/recvmmsg() which is mostly replaced by io_uring) may not be supported by udipe.
    • The udipe-setup system configuration assistant may not be usable (especially on non-Linux platforms like macOS).

  1. Non-UDP primitives, e.g. filesystem I/O or signal handling helpers, may be provided to expose OS features like sendfile() or ppoll() that cannot be used without a hook into the low-level OS I/O primitives that udipe abstracts away. But these operations are not the core focus of udipe.

  2. …with occasional use of GNU extensions supported by GCC and clang in situations where standard C cannot express the desired semantics. MSVC compatibility is planned eventually, and will happen by either removing usage of these extensions or replacing them with MS equivalents.

libudipe design

In the following sections, we will describe some high-level architectural decisions that permeate the entire libudipe API, as complementary reading to the API reference documentation, which focuses on a fine-grained description of each individual entry point (struct, function, etc).

Logging & errors

Limits of error codes

C error codes, whether provided via negative integer return values or the errno thread-local variable, are meant to let applications that call a library understand what kind of error occurred and take appropriate action that is tailored to the error at hand. But in practice…

  • Most errors are fatal/unrecoverable at the application level, in the sense that the only thing a typical application can do when the error is encountered is to log what happened somewhere (typically stderr) in the hope of easing debugging, then cleanly terminate.
  • That single error log, assuming the caller does remember to print it out, is often not very helpful in later debugging. It provides little context about which function was called, why it was called, or what kind of parameters were passed to it. To answer such questions without complex debugging tools, the developer needs extended logging that provides context about what the application was doing at the time the error occurred.
  • Part of the reason why perror()-style logs are unhelpful is that error codes as a function return value only provide a very imprecise description of what happened. errno is even worse as its standard codes are not adjusted to the needs of individual functions, and it easily gets overwritten on the error propagation path. More precise programmatic error descriptions are possible, but used sparingly as they expose implementation details and either…
    1. Increase the size of a function’s return value, thus pessimizing the performance in the common case where no error occurred.
    2. Require awkward APIs such as out parameters that only get filled on the error path, thread-local state with reentrance support, POSIX signals & other callbacks…
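
To make the errno overwriting hazard concrete, here is a minimal sketch for POSIX systems. The function `demo_errno_clobbering()`, the `errno_pair_t` type and the file path are hypothetical names chosen for illustration; only `open()`, `close()` and `errno` are real.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper type: errno at two points of the error path
typedef struct errno_pair_s {
    int original;  // errno right after the call that actually failed
    int observed;  // errno as a caller further up the stack would see it
} errno_pair_t;

errno_pair_t demo_errno_clobbering(void) {
    errno_pair_t result;

    // First failure: the directory does not exist, so open() returns -1
    // and sets errno to ENOENT.
    int fd = open("/nonexistent-dir-for-demo/file", O_RDONLY);
    result.original = errno;

    // Careless cleanup on the error path: closing the invalid descriptor
    // fails too and sets errno to EBADF, hiding the original cause.
    close(fd);
    result.observed = errno;

    return result;
}
```

A caller that only inspects errno after the cleanup step would report "bad file descriptor" rather than "no such file or directory", which is the kind of misleading diagnostic that the bullet above warns about.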

Coming from this perspective, libudipe takes the following stance:

  • Unrecoverable errors require abnormal termination, abnormal termination requires debugging, and debugging requires precise information beyond what integer error codes can provide. Logging is the sanest way to provide the required context for debugging, avoiding extended error descriptors that require complex APIs and expose too many library internals that are subjected to future change.
  • Given the necessity of logging, traditional error codes become redundant with it, inferior to it due to their reduced precision, and a usability hazard as a caller may forget to check them. They should thus only be kept when (1) some errors are recoverable and (2) there are multiple logically distinct recovery paths which require matching program control flow. Otherwise, there are simpler and safer strategies, as detailed below.

Error reporting policy

Due to the above, libudipe embraces the following logging and error reporting policy:

  • The most verbose logging level provided by udipe (TRACE) should allow someone to figure out the exact control flow path that was taken through libudipe code. Less verbose levels exist…
    • To reduce the performance overhead of logging until the INFO point, at and beyond which the logging overhead should be negligible in all expected usage scenarios.
    • To visually distinguish normal conditions (INFO), suspicious conditions (WARNING) and clearly bogus conditions (ERROR) so that the latter stand out in the logs.
    • To make logs easier to read in the common case where the application runs fine and precise control flow tracing is unnecessary. When an unexplained error appears, the application can be re-run at a more verbose logging level until the error is encountered again and understood, at which point logging can be tuned back down.
  • We distinguish fatal and non-fatal errors. Fatal errors are those that a realistic application cannot recover from, which should result in application termination. All errors are fatal unless proven non-fatal by virtue of describing a realistic recovery scenario. Fatal errors are handled via exit(EXIT_FAILURE) after sending an appropriate ERROR log to explain what went wrong.
  • Non-fatal errors are further classified according to how many non-fatal error cases could require qualitatively different handling on the application side.
    • If from the application side there are only two handling scenarios, error and non-error, then the error is handled by returning a sentinel value, e.g. a null pointer for a function that returns a pointer. If the return type does not provide a sentinel value that is idiomatic to the seasoned C programmer (a null pointer or the negative signed integer -1), then this pattern will be used…

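      #include <stdbool.h>

      // T stands for the type of the value produced on success
      typedef struct foo_result_s {
          bool valid;  // true if success holds a usable value
          T success;   // Only meaningful when valid is true
      } foo_result_t;

      foo_result_t foo(void);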

      …to highlight the need to check for errors before examining the output value.

    • If from the application side there are multiple error cases that require qualitatively different handling with different control flow paths, then if the function naturally returns a positive integer that can safely be turned into a signed integer, the negative values are used to enumerate the various error conditions. Otherwise this pattern will be used…

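      typedef enum bar_status_e {
          BAR_SUCCESS = 0,
          BAR_ERROR_OOPS,
          BAR_ERROR_MYBAD,
          BAR_ERROR_WHATEVER,
      } bar_status_t;

      // T stands for the type of the value produced on success
      typedef struct bar_result_s {
          bar_status_t status;
          T success;  // Only meaningful when status is BAR_SUCCESS
      } bar_result_t;

      bar_result_t bar(void);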

      …to highlight the need to check for errors before examining the output value.
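
As an illustration of the two result patterns above, here is a minimal sketch with concrete types. All names (`parse_port()`, `port_result_t`, `read_slot()`, `slot_result_t` and the `SLOT_*` codes) are hypothetical and are not part of the libudipe API; they only show how the patterns could be instantiated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Pattern 1: only two outcomes (valid / invalid). There is no idiomatic
// sentinel because every uint16_t value is a legitimate port number.
typedef struct port_result_s {
    bool valid;
    uint16_t success;  // Only meaningful when valid is true
} port_result_t;

// Hypothetical helper: parse a decimal port number such as "8080".
port_result_t parse_port(const char *text) {
    port_result_t result = { .valid = false, .success = 0 };
    // Reject NULL, empty strings, signs and leading whitespace
    if (text == NULL || *text < '0' || *text > '9') return result;
    char *end = NULL;
    unsigned long value = strtoul(text, &end, 10);
    // Reject trailing garbage and values that do not fit in 16 bits
    if (*end != '\0' || value > UINT16_MAX) return result;
    result.valid = true;
    result.success = (uint16_t)value;
    return result;
}

// Pattern 2: several error cases that call for different recovery paths.
typedef enum slot_status_e {
    SLOT_SUCCESS = 0,
    SLOT_ERROR_EMPTY,         // Caller could wait and retry
    SLOT_ERROR_OUT_OF_RANGE,  // Caller should fix its index
} slot_status_t;

typedef struct slot_result_s {
    slot_status_t status;
    uint32_t success;  // Only meaningful when status is SLOT_SUCCESS
} slot_result_t;

// Hypothetical helper: read entry `index` of a table where 0 marks "empty".
slot_result_t read_slot(const uint32_t *table, size_t len, size_t index) {
    slot_result_t result = { .status = SLOT_SUCCESS, .success = 0 };
    if (index >= len) {
        result.status = SLOT_ERROR_OUT_OF_RANGE;
    } else if (table[index] == 0) {
        result.status = SLOT_ERROR_EMPTY;
    } else {
        result.success = table[index];
    }
    return result;
}
```

In both cases the caller has to look at `valid` or `status` before it can name the `success` field, and a `switch` over `slot_status_t` lets the compiler warn about unhandled error cases.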

Aside from this strong stance on fatal errors and the role of logging, libudipe otherwise tries to follow traditional C library design conventions, and its logging system features the usual log levels (TRACE, DEBUG, INFO, WARNING and ERROR) along with a way to redirect log messages to a sink of your choosing. When this is not done, or when the user-defined log sink is not available (e.g. during logger initialization), log messages go to stderr per Unix convention.
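
The following sketch shows one way such a logger could be structured: level filtering, an optional user-defined sink with a stderr fallback, and a fatal error helper that logs at ERROR level before calling exit(EXIT_FAILURE). Every `demo_*` and `DEMO_*` name is a hypothetical placeholder; the actual libudipe logging API is described in the API reference and may differ.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

// Log levels, from most to least verbose
typedef enum demo_log_level_e {
    DEMO_TRACE = 0, DEMO_DEBUG, DEMO_INFO, DEMO_WARNING, DEMO_ERROR
} demo_log_level_t;

// User-defined sink: receives every message that passes the level filter
typedef void (*demo_log_sink_t)(void *context, demo_log_level_t level,
                                const char *message);

typedef struct demo_logger_s {
    demo_log_level_t min_level;  // Messages below this level are dropped
    demo_log_sink_t sink;        // NULL means "write to stderr"
    void *context;               // Passed through to the sink unchanged
} demo_logger_t;

static const char *demo_level_name(demo_log_level_t level) {
    switch (level) {
        case DEMO_TRACE:   return "TRACE";
        case DEMO_DEBUG:   return "DEBUG";
        case DEMO_INFO:    return "INFO";
        case DEMO_WARNING: return "WARNING";
        case DEMO_ERROR:   return "ERROR";
    }
    return "UNKNOWN";
}

void demo_log(const demo_logger_t *logger, demo_log_level_t level,
              const char *message) {
    if (level < logger->min_level) return;
    if (logger->sink != NULL) {
        logger->sink(logger->context, level, message);
    } else {
        fprintf(stderr, "[%s] %s\n", demo_level_name(level), message);
    }
}

// Fatal error policy: explain what went wrong at ERROR level, then terminate
_Noreturn void demo_fatal(const demo_logger_t *logger, const char *message) {
    demo_log(logger, DEMO_ERROR, message);
    exit(EXIT_FAILURE);
}

// Example sink: count messages per level in a caller-provided array of
// five counters, one per log level
void demo_counting_sink(void *context, demo_log_level_t level,
                        const char *message) {
    (void)message;  // This sink ignores the message text
    unsigned *counts = context;
    counts[level] += 1;
}
```

A logger configured with `min_level = DEMO_INFO` drops TRACE and DEBUG messages before they reach the sink, which is how less verbose levels keep logging overhead low.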

Input validation policy

Beyond the question of how errors should be reported lies the question of when error conditions should be checked for. Indeed, it is not uncommon for the implementation of a particular library function to transitively call other library functions, and repeatedly validating user inputs and system call outputs every step down the pipeline gets expensive from a performance perspective.

The libudipe policy here is that…

  • Error avoidance is the preferred strategy, so if there is a way to redesign a function such that it has fewer invalid inputs (e.g. by using unsigned integer types instead of signed ones when a positive value is expected), then that path should be preferred.
  • Every user-visible function (with a declaration in include/) where some parameter values are invalid (e.g. null pointer, out-of-range integer index…) must document invalid parameter values in its documentation, check for them, and handle them appropriately as described above.
  • Functions that are not user-visible and consume parameters which should have been pre-validated by user-visible frontend functions must still document invalid parameters in internal documentation. But they are allowed to only check for such invalid values in Debug builds, i.e. when the NDEBUG define is not set, because such values should only be present when libudipe has an input validation bug. In Release builds with NDEBUG, conditions that should only result from libudipe validation bugs are allowed to have arbitrarily bad outcomes, including undefined behavior, when performance considerations justify it.
  • Every part of libudipe that calls into the operating system must handle all documented error cases from the underlying system call. Such error cases must be described in their internal documentation (referring to the original system call(s) documentation is fine for internal functions), recursively across callers, all the way to user-visible entry points where an easy-to-understand description of error cases is preferred.
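
The split between user-visible and internal validation could look like the sketch below. The functions `demo_checksum()` and `sum_bytes_unchecked()` and the `DEMO_MAX_LEN` limit are hypothetical and exist only to illustrate the policy; a real entry point would also emit a log message before reporting the error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical upper bound on the input length accepted by the public API
#define DEMO_MAX_LEN ((size_t)65536)

// Internal helper: its parameters were pre-validated by the user-visible
// entry point, so they are only re-checked in Debug builds (NDEBUG unset).
// Invalid: data == NULL, len > DEMO_MAX_LEN.
static uint32_t sum_bytes_unchecked(const uint8_t *data, size_t len) {
    assert(data != NULL);
    assert(len <= DEMO_MAX_LEN);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i) sum += data[i];
    return sum;
}

// User-visible entry point: documents, checks and reports invalid inputs.
// Using size_t for len removes "negative length" from the invalid inputs.
// Returns the sum of the bytes, or -1 if data is NULL or len > DEMO_MAX_LEN.
int64_t demo_checksum(const uint8_t *data, size_t len) {
    if (data == NULL || len > DEMO_MAX_LEN) return -1;
    return (int64_t)sum_bytes_unchecked(data, len);
}
```

Because the result is naturally non-negative and fits in a signed 64-bit integer, -1 can serve as the sentinel here. When the code is compiled with NDEBUG, the assert() calls in the internal helper expand to nothing, so only the checks in the user-visible function remain.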

API reference

The API reference documentation is available here.