
EP-4120665-B1 - SCALABLE SOCKETS FOR QUIC


Inventors

  • BALASUBRAMANIAN, Praveen
  • OLSON, Matthew A.
  • BANKS, Nicholas A.
  • DAS, Sourav
  • GRIFKA, Nicholas J.

Dates

Publication Date
2026-05-06
Application Date
2019-06-18

Claims (15)

  1. A method for performing batched user datagram protocol (UDP) processing, the method comprising: receiving (302) a plurality of UDP packets; combining (304) the plurality of UDP packets into a packet batch based on at least a size of the packet batch dynamically determined by measuring system usage; attaching a UDP header to the packet batch; conveying the packet batch to a network stack at least by performing (306) a down call for an indicated socket based on at least the size of the packet batch; and sending (308) the packet batch from the network stack to a network adapter for transmission over a network.
  2. The method of claim 1, wherein the down call is a single down call, and further comprising performing the single down call for the packet batch.
  3. The method of claim 1, further comprising determining the size of the packet batch based at least on system usage.
  4. The method of claim 1, further comprising executing a lookup operation in a data path once for the packet batch.
  5. The method of claim 4, wherein the lookup operation comprises one of the following: network security inspection, address resolution, or finding a data route.
  6. The method of claim 1, wherein the size of the packet batch is smaller than a size of a maximum transmission unit, MTU.
  7. The method of claim 1, further comprising attaching an Internet Protocol header to the packet batch.
  8. A system for performing batched user datagram protocol, UDP, processing, the system comprising a processor programmed to: receive (302) a plurality of UDP packets; combine (304) the plurality of UDP packets into a packet batch based on at least a size of the packet batch dynamically determined by measuring system usage; attach a UDP header to the packet batch; convey the packet batch to a network stack at least by performing (306) a down call for an indicated socket based on at least the size of the packet batch; and send (308) the packet batch from the network stack to a network adapter for transmission over a network.
  9. The system of claim 8, wherein the down call is a single down call, and wherein the processor is further programmed to perform the single down call for the packet batch.
  10. The system of claim 8, wherein the processor is further programmed to determine the size of the packet batch based at least on system usage.
  11. The system of claim 8, wherein the processor is further programmed to execute a lookup operation in a data path once for the packet batch.
  12. The system of claim 11, wherein the lookup operation comprises one of the following: network security inspection, address resolution, or finding a data route.
  13. The system of claim 8, wherein the size of the packet batch is smaller than a size of a maximum transmission unit, MTU.
  14. The system of claim 8, wherein the processor is further programmed to attach an Internet Protocol header to the packet batch.
  15. A computer-readable storage medium comprising computer-executable instructions that, when executed by a processor, cause the processor to perform the method of one of claims 1 to 14.
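Outside the claim language, the send-side idea of claims 1, 2, and 6 can be sketched in ordinary socket code: coalesce many small application messages into one datagram that stays under the MTU, so a single send call (the "down call") covers the whole batch. Everything here — the 2-byte length framing, the MTU and headroom constants, and the function names — is an illustrative assumption, not the patented implementation.

```python
import socket
import struct

MTU = 1500        # assumed link MTU; a real stack would use the discovered path MTU
HEADROOM = 28     # IPv4 (20) + UDP (8) header bytes
MAX_PAYLOAD = MTU - HEADROOM

def frame(msg: bytes) -> bytes:
    """Hypothetical 2-byte length prefix so the receiver can restore
    message boundaries inside a coalesced datagram."""
    return struct.pack("!H", len(msg)) + msg

def unframe(payload: bytes):
    """Split a coalesced datagram back into the original messages."""
    msgs, off = [], 0
    while off < len(payload):
        (n,) = struct.unpack_from("!H", payload, off)
        off += 2
        msgs.append(payload[off:off + n])
        off += n
    return msgs

def send_batched(sock: socket.socket, addr, messages) -> int:
    """Coalesce small sends into datagrams that stay under the MTU,
    issuing one send call per batch rather than one per message.
    Returns the number of send calls made."""
    batch, size, calls = [], 0, 0
    for msg in messages:
        f = frame(msg)
        if batch and size + len(f) > MAX_PAYLOAD:
            sock.sendmsg(batch, [], 0, addr)   # one gather-send for the batch
            calls += 1
            batch, size = [], 0
        batch.append(f)
        size += len(f)
    if batch:
        sock.sendmsg(batch, [], 0, addr)
        calls += 1
    return calls
```

With ten 100-byte messages, all framed payloads fit in one sub-MTU datagram, so a single send call replaces ten, while the length framing preserves the message boundaries that a plain byte-stream coalescing would lose.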

Description

BACKGROUND

Communication protocols define the end-to-end connection requirements across a network. QUIC is a recently developed transport layer network protocol that is an alternative to the Transmission Control Protocol (TCP). QUIC supports a set of multiplexed connections over the User Datagram Protocol (UDP) and attempts to improve the perceived performance of connection-oriented web applications that currently use TCP. For example, QUIC connections seek to reduce the number of round trips required when establishing a new connection, including the handshake step, encryption setup, and initial data requests, thereby attempting to reduce latency. QUIC also seeks to improve support for stream multiplexing.

Traditionally, all UDP applications are message oriented. As a result, the message boundary needs to be preserved across packetization on send and reconstructed on receive. Also, Internet Protocol (IP) fragmentation has a large performance overhead on both the host and the network, so to avoid IP fragmentation, applications typically post sends that are smaller than a maximum transmission unit (MTU), such as one packet at a time, which results in very poor performance. The performance is poor because the entire data path from the application to the network interface card (NIC) is executed for each small packet (or send down call). Similarly, on the receive side, although the NIC can indicate multiple packets, each packet is indicated one at a time from the network stack to the application (in a receive up call).

Thus, UDP suffers performance problems because applications post one small send at a time to avoid fragmentation, and received packets are likewise indicated one at a time. In comparison, TCP permits batched operations because its data is configured as a byte stream. However, current UDP application programming interfaces (APIs) do not allow an application to take advantage of batch processing of packets.
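The per-packet cost described above can be made concrete with a toy model. A `lookup` callable stands in for the data-path work (route lookup, security inspection, address resolution); batching runs it once per batch instead of once per packet, in the spirit of claims 4 and 11. All names here are hypothetical, chosen only for illustration.

```python
def per_packet_send(packets, dest, lookup):
    """Naive data path: the full lookup runs for every small packet."""
    return [lookup(dest) for _ in packets]   # one lookup per packet

def per_batch_send(packets, dest, lookup):
    """Batched data path: the lookup runs once and its result is
    reused for every packet in the batch."""
    route = lookup(dest)                     # single lookup for the batch
    return [route for _ in packets]
```

For a batch of N packets, the amortized cost of the lookup drops from N invocations to one, which is the whole performance argument: the fixed per-call overhead of the data path dominates when each send carries only a small payload.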
Additionally, UDP is a message oriented transport protocol, and the socket APIs on various operating systems (including the Windows® operating system) expose use of UDP as datagram sockets, while use of TCP is exposed as stream sockets. One of the main differences between the APIs is that for a TCP stream socket, the server (listening) socket has a notion of an accept API for an incoming connection, which results in a new socket object for the child connection. In comparison, a UDP datagram socket has no notion of a listen or accept API. Hence, all incoming connection requests use the same socket object. This causes problems: receive packet processing does not scale well, and there is fate sharing among all child connections because of the shared receive buffers and locks. Thus, any UDP server implementation hits scale bottlenecks because all incoming connection requests share the same socket. This configuration can cause performance issues due to locking or other synchronization. It can also cause performance issues due to fate sharing, where processing for one connection can stall others, or one connection uses up all the receive buffers, causing packet drops for the other connections.

EP 2 768 200 A1 discloses a method for receiving data packets. At least one connection is defined between protocol entities hosted by network nodes in a packet data communication network. Data packets that include information identifying the communicating protocol entities are received. The received data packets are stored into batches corresponding to the defined connections, on the basis of the information identifying the communicating protocol entities.

US 2016/173238 A1 discloses a method for redelivering a subset of messages in a packet to a receiver application. A partially received packet delivered to the application layer allows the application layer to decide whether it has to request full or partial packet retransmission. The application layer of the receiver generates a PNACK (partial negative acknowledgement) based on the subset of the messages consumed from within the partial packet.

Tudor Marian et al.: "NetSlices: Scalable multi-core packet processing in user-space", 2012 ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ACM, 29 October 2012, discloses an operating system abstraction called NetSlice that tightly couples the hardware and software packet processing resources and provides an application with control over these resources. NetSlice performs domain specific, coarse-grained, spatial partitioning of CPU cores, memory, and NICs. Moreover, it provides a streamlined communication channel between NICs and user space. The NetSlice API also provides batched (multi-) send/receive operations to amortize the cost of protection domain crossings.

It is the object of the present invention to improve the performance of UDP processing systems. This object is solved by the subject matter of the independent claims.
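The fate-sharing problem described above can be illustrated with a small user-space sketch: give each remote peer its own bounded receive queue, so that one peer exhausting its buffers drops only its own packets rather than starving every connection on the shared socket. The class name and queue depth are illustrative assumptions, not the invention's mechanism.

```python
from collections import defaultdict

class PerConnectionDemux:
    """User-space demultiplexer over a single UDP socket: received
    datagrams are queued per remote peer, and each queue is bounded
    independently, so one peer flooding its queue cannot cause packet
    drops for the others."""

    def __init__(self, depth=128):
        self.depth = depth                # illustrative per-peer queue bound
        self.queues = defaultdict(list)   # peer address -> pending datagrams
        self.drops = defaultdict(int)     # peer address -> dropped count

    def deliver(self, peer, datagram):
        """Route one received datagram to its peer's queue.  Returns
        True if queued, False if dropped (only this peer is affected)."""
        q = self.queues[peer]
        if len(q) >= self.depth:
            self.drops[peer] += 1
            return False
        q.append(datagram)
        return True
```

With a shared unbounded buffer, a burst from one peer would consume space needed by all; here the burst overflows only that peer's queue, which is the isolation property the shared-socket design lacks.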