CN-122027605-A - PCIe host for NTB cross-host address translation proxy
Abstract
The invention discloses a PCIe host for an NTB cross-host address translation proxy. The non-transparent bridge (NTB) of the PCIe host is additionally provided with a virtual root complex (VRC) unit and a virtual device (VD) unit. Facing the local host, the VRC unit emulates a VRC with ATS response capability and can receive ATS requests initiated by the VFs of local devices; facing the remote host, the VD unit emulates a VD with SR-IOV capability and provides corresponding VFs for interaction with the remote RC. When a VF of the local host initiates an ATS request, the VRC at the NT port receives the request, converts it into an ATS request from the corresponding VF in the remote VD, and submits it to the remote RC, which passes it to the remote IOMMU for permission verification and the actual virtual-to-physical address translation; the translation result is finally returned to the VF of the local device. The invention thereby aims to achieve low-latency, high-throughput, and efficient cross-domain access.
Inventors
- LUAN GUIPENG
- Zhi Boxin
Assignees
- 上海芯力基半导体有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20251009
Claims (10)
- 1. A PCIe host, comprising a PCIe switch, a plurality of communication devices, and a root complex configured with an input/output memory management unit, characterized in that the PCIe switch is configured with a non-transparent bridge module, wherein the non-transparent bridge module is configured with a non-transparent port, a virtual root complex unit, and a virtual device unit; the PCIe host is configured to communicate with a remote host; the virtual device unit is configured to receive an address translation service request from the remote host and send a virtual address of the PCIe host to the input/output memory management unit; the input/output memory management unit is configured to translate the virtual address of the PCIe host into a physical address and return the translation result to the virtual device unit; the virtual device unit is further configured to return the translated physical address to the remote host through the non-transparent port.
- 2. The PCIe host of claim 1, wherein the remote host comprises a PCIe switch, a plurality of communication devices, and a root complex configured with an input/output memory management unit; the PCIe switch of the remote host is configured with a non-transparent bridge module, and the non-transparent bridge module is configured with a non-transparent port, a virtual root complex unit, and a virtual device unit; the communication device of the remote host is configured to send an address translation service request to the virtual root complex unit of the remote host, and the virtual root complex unit of the remote host forwards the address translation service request to the virtual device unit of the PCIe host through the non-transparent port; the virtual root complex unit of the remote host is configured to receive the translated physical address returned by the PCIe host and forward it to the communication device that issued the address translation service request.
- 3. The PCIe host of claim 2, wherein the communication device of the PCIe host is configured to send an address translation service request to the virtual root complex unit of the PCIe host, and the virtual root complex unit of the PCIe host forwards the address translation service request to the virtual device unit of the remote host through the non-transparent port; the virtual root complex unit of the PCIe host is configured to receive the translated physical address returned by the remote host and forward it to the communication device that issued the address translation service request.
- 4. The PCIe host of claim 2 or 3, wherein each communication device of the PCIe host and the remote host is configured with a first virtual function module and an address translation cache module, wherein the virtual root complex unit of the PCIe host and the remote host is configured with a number of second virtual function modules, and wherein the virtual device unit of the PCIe host and the remote host is configured with a number of third virtual function modules.
- 5. The PCIe host of claim 4, wherein the number of second virtual function modules of the virtual root complex unit of the remote host is the same as the number of third virtual function modules of the virtual device unit of the PCIe host, and the PCIe host or the remote host is preconfigured with a mapping table in which the second virtual function modules and the third virtual function modules are in one-to-one correspondence.
- 6. The PCIe host of claim 4, wherein the first virtual function module of the communication device of the PCIe host is configured to send an address translation service request to the virtual root complex unit of the PCIe host, and the address translation cache module of the PCIe host is configured to receive and store the translated physical address; preferably, each communication device of the PCIe host is configured with a plurality of first virtual function modules and a plurality of address translation cache modules in one-to-one correspondence, and the address translation cache module corresponding to the first virtual function module that sent the address translation service request to the virtual root complex unit of the PCIe host is configured to receive and store the translated physical address.
- 7. The PCIe host of claim 6, wherein the virtual root complex unit of the PCIe host forwards the address translation service request to the virtual device unit of the remote host as follows: the virtual root complex unit of the PCIe host receives the address translation service request using a target second virtual function module; the mapping table is queried to determine a target third virtual function module mapped to the target second virtual function module; and the target second virtual function module forwards the address translation service request to the target third virtual function module.
- 8. The PCIe host of claim 4, wherein the PCIe switch of the PCIe host is further configured with upstream ports, downstream ports, a crossbar matrix, virtual upstream ports, and virtual downstream ports; the upstream port is configured to establish a communication connection between the root complex and the crossbar matrix; the downstream port is configured to establish a communication connection between the communication device and the crossbar matrix; the virtual upstream port is configured to establish a communication connection between the crossbar matrix and the virtual root complex unit; the virtual downstream port is configured to establish a communication connection between the crossbar matrix and the virtual device unit.
- 9. The PCIe host of claim 4, wherein the input/output memory management unit of the PCIe host is further configured to return address translation service memory attribute information of the PCIe host to the virtual device unit of the PCIe host; and the virtual device unit of the PCIe host sends the address translation service memory attribute information to the virtual root complex unit of the remote host through the non-transparent port.
- 10. The PCIe host of claim 8, wherein the PCIe switch of the PCIe host is further configured with an expansion port, and/or wherein the PCIe host further comprises a memory communicatively coupled to a root complex of the PCIe host.
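The request path recited in claims 1, 5, and 7 can be illustrated with a minimal Python sketch. It is not the claimed hardware and makes several assumptions that do not come from the claims: a 4 KiB page size, an IOMMU modeled as a dictionary keyed by (VF id, virtual page), a fixed binding of each first virtual function to one second virtual function, and hypothetical class names (`Iommu`, `VirtualDeviceUnit`, `VirtualRootComplexUnit`, `DeviceVf`).

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page granularity for this sketch


class TranslationFault(Exception):
    """Raised when the IOMMU has no mapping or the access is not permitted."""


@dataclass
class Iommu:
    # (vf_id, virtual page number) -> (physical page number, writable flag)
    page_table: dict = field(default_factory=dict)

    def translate(self, vf_id, virt_addr, write):
        entry = self.page_table.get((vf_id, virt_addr // PAGE_SIZE))
        if entry is None:
            raise TranslationFault("no mapping")
        phys_page, writable = entry
        if write and not writable:
            raise TranslationFault("write not permitted")
        return phys_page * PAGE_SIZE + virt_addr % PAGE_SIZE


@dataclass
class VirtualDeviceUnit:
    """VD side: presents third VFs to the host whose memory is accessed."""
    iommu: Iommu

    def handle_ats(self, third_vf, virt_addr, write):
        # The owning host's IOMMU checks permissions and translates.
        return self.iommu.translate(third_vf, virt_addr, write)


@dataclass
class VirtualRootComplexUnit:
    """VRC side: answers ATS requests from VFs of the requesting host."""
    peer_vd: VirtualDeviceUnit
    vf_map: dict  # second VF id -> third VF id (one-to-one mapping table)

    def handle_ats(self, second_vf, virt_addr, write=False):
        third_vf = self.vf_map[second_vf]  # mapping-table lookup
        return self.peer_vd.handle_ats(third_vf, virt_addr, write)


@dataclass
class DeviceVf:
    """First VF of a communication device, with its own ATC."""
    second_vf: int  # assumed fixed binding to one second VF of the VRC
    vrc: VirtualRootComplexUnit
    atc: dict = field(default_factory=dict)

    def translate(self, virt_addr, write=False):
        key = (virt_addr // PAGE_SIZE, write)
        if key not in self.atc:
            page_base = virt_addr - virt_addr % PAGE_SIZE
            self.atc[key] = self.vrc.handle_ats(self.second_vf, page_base, write)
        return self.atc[key] + virt_addr % PAGE_SIZE
```

In this sketch a single ATS round trip through the VRC and the peer VD fills the ATC entry, and later accesses to the same page are served from the ATC without another request.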
Description
PCIe host for NTB cross-host address translation proxy
This application is a divisional application of patent application No. 2025114295567, filed on October 9, 2025 and entitled 'a multi-host PCIe system and an NTB cross-host address translation proxy method'.
Technical Field
The invention relates to the technical field of computers and communication, in particular to a multi-host PCIe system and a non-transparent bridge (NTB, Non-Transparent Bridge) cross-host address translation proxy method based on an address translation cache (ATC, Address Translation Cache).
Background
With the growth of data centers and high-performance computing clusters, multi-host system architectures are becoming widespread, and NTBs integrated in PCI Express (PCIe) switch chips are widely used for interconnection and resource sharing between different host domains. However, in an SR-IOV (Single Root I/O Virtualization) environment, when a VF (Virtual Function) of a local device performs direct memory access (DMA, Direct Memory Access) across domains, the address spaces of the host domains are independent of one another, so the NTB and the IOMMU (Input/Output Memory Management Unit) must jointly complete a multi-level address translation: the VF initiates an ATS (Address Translation Service) request to the RC (Root Complex) of the local host, the local IOMMU maps the virtual address to a local physical address, the NTB then converts that physical address into a remote address, and finally the remote IOMMU performs permission verification and the final mapping.
Although this chain makes cross-domain access possible, it brings a lengthy translation path, high latency, and excessive load on the non-transparent bridge and the input/output memory management units; PASID (Process Address Space ID) is either not supported or requires additional PASID translation; and the remote IOMMU cannot directly use AMA (ATS Memory Attributes). It is therefore difficult to meet the low-latency, high-throughput requirements of a multi-host system.
In existing multi-host systems, cross-domain memory access relies primarily on the NTB within the PCIe switch chip to perform address translation. When a VF of a device needs to perform DMA to the memory of another host domain, an ATS-based multi-level address translation mechanism is typically employed. The typical flow is as follows. The VF first initiates an ATS request to the RC of the local host, which invokes the local IOMMU to translate the virtual address into a physical address of the local host; that address falls within the shared window exposed by the NTB. The VF then stores the address returned by the IOMMU in its ATC (address translation cache), constructs a DMA transaction from the cached address, and sends it to the NTB. On detecting that the target is a cross-domain address, the NTB translates the local physical address into a physical address of the remote host and forwards the request to the remote RC. The remote RC then passes the request to the remote IOMMU for permission verification, after which the access to the remote memory is completed.
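The conventional flow described above can be sketched in Python to show which stages each access traverses. This is only an illustration under stated assumptions: a 4 KiB page size, IOMMUs modeled as dictionaries of the form {page: (target page, writable)}, a single linear NTB window, and hypothetical names (`iommu_translate`, `NtbWindow`, `legacy_dma`) that do not appear in the source.

```python
PAGE = 4096  # assumed page size


def iommu_translate(page_table, addr, write=False):
    """Translate one address through a {page: (target_page, writable)} table."""
    entry = page_table.get(addr // PAGE)
    if entry is None:
        raise PermissionError(f"no mapping for {addr:#x}")
    target_page, writable = entry
    if write and not writable:
        raise PermissionError(f"write to {addr:#x} not permitted")
    return target_page * PAGE + addr % PAGE


class NtbWindow:
    """Shared window: a local physical range mapped onto a remote range."""

    def __init__(self, local_base, size, remote_base):
        self.local_base = local_base
        self.size = size
        self.remote_base = remote_base

    def translate(self, local_phys):
        offset = local_phys - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError(f"{local_phys:#x} is outside the NTB window")
        return self.remote_base + offset


def legacy_dma(virt_addr, atc, local_iommu, ntb, remote_iommu, write=False):
    """Return (final remote physical address, list of stages traversed)."""
    stages = []
    page = virt_addr // PAGE
    if page not in atc:
        # ATS request to the local RC: the local IOMMU maps VA -> local PA.
        atc[page] = iommu_translate(local_iommu, page * PAGE)
        stages.append("local IOMMU (ATS)")
    local_phys = atc[page] + virt_addr % PAGE
    # Every DMA TLP is translated by the NTB ...
    remote_addr = ntb.translate(local_phys)
    stages.append("NTB")
    # ... and checked again by the remote IOMMU.
    final = iommu_translate(remote_iommu, remote_addr, write)
    stages.append("remote IOMMU")
    return final, stages
```

Even when the ATC already holds the local translation, the sketch still passes through the NTB and the remote IOMMU on every access, which corresponds to the per-TLP load the text attributes to the conventional flow.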
When an existing multi-host PCIe system implements cross-domain DMA access through a non-transparent bridge, address isolation and access security between host domains are ensured, but the overall flow has clear drawbacks. First, every DMA request must pass in sequence through translation and verification by the local IOMMU, the NTB, and the remote IOMMU, so the translation path is long and access latency increases significantly. Second, the NTB must translate the address of every TLP (Transaction Layer Packet) and the IOMMU must verify permissions for each one, which greatly increases the processing load on these key modules. Third, where PASID is not supported, the non-transparent bridge must also take on additional PASID translation, which complicates the system implementation. Finally, the remote IOMMU cannot perform the conversion from a cross-domain virtual address to a physical address, so the existing mechanism cannot use the address translation service memory attributes (AMA) for optimization, which limits further performance improvement. In short, the prior art struggles to meet the demands of multi-host scenarios for low latency, high throughput, and efficient cross-domain access.
Disclosure of Invention
The invention aims to provide a multi-host PCIe system which can realize low delay, high throughput and high efficien