CN-122019310-A - Memory access delay acquisition method, device, storage medium and program product
Abstract
Embodiments of the application provide a memory access delay acquisition method, a device, a storage medium and a program product. In a multi-core processor, each processor core is connected through a corresponding interconnection node in an on-chip interconnection network to a shared memory controller in order to access physical memory. A core-level memory access monitor is placed on the interconnection node and monitors the delay of memory access requests that are issued by the core and routed through the interconnection node to the memory controller to access the physical memory. The operating system kernel reads the memory access monitor corresponding to each core through a driver module to obtain monitoring data. Because the monitoring point (the memory access monitor) sits on the routing path behind the last-level cache, time spent on the last-level cache hit path is excluded, and the delay of requests that miss the last-level cache is counted accurately at per-core granularity, which makes it easier to locate the memory access performance bottleneck of a single core.
Inventors
- ZHANG JING
- XUE SHUAI
- CHEN JIANKANG
- SONG ZHUO
Assignees
- 阿里云计算有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260413
Claims (11)
- 1. A memory access delay acquisition method, characterized in that it is applied to a driver module in the kernel mode of an operating system, wherein the operating system runs on a server platform, the server platform comprises at least one processor, each processor integrates a memory controller and is directly connected to local physical memory, the processor comprises a plurality of cores, and the cores are connected to a shared last-level cache and the memory controller through corresponding interconnection nodes in an on-chip interconnection network, the method comprising: obtaining a kernel virtual address of a memory access monitor corresponding to any core, wherein the memory access monitor is located on the interconnection node corresponding to the core in the on-chip interconnection network and is used for monitoring the number and response delay of memory access requests issued by the core, a memory access request being a request that is issued by the core and routed through the interconnection node to the memory controller to access the physical memory; reading, according to the kernel virtual address, monitoring data of the memory access monitor corresponding to the core; and calculating, according to the monitoring data, the memory access delay of the memory access requests issued by the core.
- 2. The method of claim 1, wherein obtaining the kernel virtual address of the memory access monitor corresponding to any core comprises: calling an address query interface provided by firmware of the server platform to obtain the system physical address of the memory access monitor corresponding to the core, wherein the address query interface takes a core identifier as an input parameter, returns the system physical address of the corresponding memory access monitor, and is predefined in a system description table of the firmware; and mapping the system physical address of the memory access monitor corresponding to the core to a kernel virtual address.
- 3. The method of claim 1, further comprising: determining, in a performance monitoring unit of the interconnection node corresponding to the core, a counting register allocated to the core and a configuration register corresponding to the counting register; receiving a configuration instruction for the core sent by a user-mode application program; and writing a corresponding value to at least one field of the configuration register according to the configuration instruction, so as to configure the counting register as the memory access monitor corresponding to the core.
- 4. The method of claim 3, wherein the at least one field comprises at least one of: a start enable field for starting the memory access monitor, a clear trigger field for clearing the count value of the memory access monitor, a filter activation field for activating the filtering function of the memory access monitor, and a filter condition configuration field for configuring filter conditions, the filter condition configuration field comprising at least one of a partition identification configuration field, a security attribute configuration field, a request type configuration field, and a data source configuration field.
- 5. The method of claim 4, wherein writing a corresponding value to at least one field of the configuration register according to the configuration instruction comprises: setting the filter activation field according to an enable-filtering instruction in the configuration instruction, so as to activate the request filtering function of the memory access monitor; and configuring the partition identification configuration field according to a first partition identifier in the configuration instruction, so that the memory access monitor monitors memory access requests that are issued by the core and correspond to the first partition identifier, wherein the first partition identifier corresponds to a first control group.
- 6. The method of claim 5, wherein the first partition identifier is assigned to the first control group by the operating system, the method further comprising: when a process corresponding to the first control group is scheduled to run on the core, writing the first partition identifier into a target register of the core, so that the core carries the first partition identifier when issuing a memory access request.
- 7. The method of claim 4, further comprising: during operation of the performance monitoring unit, if an update instruction for the filter condition is received, setting the start enable field to 0 to stop the memory access monitor and setting the clear trigger field to 1 to clear the memory access monitor; updating the partition identification configuration field according to a second partition identifier in the update instruction; and setting the start enable field to 1 to restart the memory access monitor, so that the memory access monitor monitors memory access requests corresponding to the second partition identifier, wherein the second partition identifier corresponds to a second control group.
- 8. The method of claim 4, wherein the monitoring data comprises the number of memory access requests matching the filter condition in the configuration register and the total number of clock cycles elapsed between issuance and receipt of a response for those matching requests; and calculating, according to the monitoring data, the memory access delay incurred when a memory access request issued by the core is routed through the interconnection node to the memory controller to access the physical memory comprises: calculating the memory access delay of a single memory access request issued by the core and routed through the interconnection node to the memory controller to access the physical memory as the ratio of the total number of clock cycles to the number of memory access requests.
- 9. A server, characterized by comprising a memory and a processor, wherein the memory is used for storing one or more computer instructions, and the processor is configured to execute the one or more computer instructions to perform the steps of the method of any of claims 1-8.
- 10. A computer readable storage medium storing a computer program, which, when executed by a processor, is capable of carrying out the steps of the method according to any one of claims 1-8.
- 11. A computer program product comprising computer programs/instructions which, when executed by a processor, are able to implement the steps of the method of any of claims 1-8.
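The stop, clear, update and restart order described in claim 7 can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the claims name the fields but give no bit positions, so the layout below (`ENABLE_BIT`, `CLEAR_BIT`, `FILTER_EN_BIT`, `PARTID_SHIFT`), the self-resetting behaviour of the clear trigger, and the `MonitorRegs` class are all hypothetical stand-ins for real hardware registers.

```python
from dataclasses import dataclass

# Hypothetical bit layout: the source names these fields but not their positions.
ENABLE_BIT = 1 << 0       # start enable field
CLEAR_BIT = 1 << 1        # clear trigger field
FILTER_EN_BIT = 1 << 2    # filter activation field
PARTID_SHIFT = 8          # partition identification configuration field
PARTID_MASK = 0xFFFF << PARTID_SHIFT


@dataclass
class MonitorRegs:
    """In-memory stand-in for one monitor's configuration and counting registers."""
    config: int = 0
    request_count: int = 0
    total_cycles: int = 0

    def write_config(self, value: int) -> None:
        # Assumption: writing 1 to the clear trigger zeroes the counters
        # and the trigger bit then reads back as 0.
        if value & CLEAR_BIT:
            self.request_count = 0
            self.total_cycles = 0
            value &= ~CLEAR_BIT
        self.config = value


def switch_partition(regs: MonitorRegs, new_partid: int) -> None:
    """Retarget a running monitor to another partition identifier."""
    # 1. Stop the monitor by setting the start enable field to 0.
    regs.write_config(regs.config & ~ENABLE_BIT)
    # 2. Clear the accumulated counts by setting the clear trigger field to 1.
    regs.write_config(regs.config | CLEAR_BIT)
    # 3. Update the partition identifier and keep filtering active.
    partid_bits = (new_partid << PARTID_SHIFT) & PARTID_MASK
    regs.write_config((regs.config & ~PARTID_MASK) | partid_bits | FILTER_EN_BIT)
    # 4. Restart the monitor by setting the start enable field to 1.
    regs.write_config(regs.config | ENABLE_BIT)


if __name__ == "__main__":
    regs = MonitorRegs(config=ENABLE_BIT | FILTER_EN_BIT | (3 << PARTID_SHIFT),
                       request_count=10, total_cycles=500)
    switch_partition(regs, new_partid=7)
    print(hex(regs.config), regs.request_count, regs.total_cycles)
```

Stopping before clearing keeps counts from the old partition from leaking into the new measurement window, which is presumably why the claim fixes this order.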
Description
Memory access delay acquisition method, device, storage medium and program product
Technical Field
The present application relates to the field of computer technologies, and in particular to a memory access delay acquisition method, device, storage medium, and program product.
Background
In some multi-task deployment scenarios, guaranteeing the Service Level Objectives (SLO) of different tasks often requires a resource scheduling agent (SLO-agent) to manage and dynamically adjust the resource usage of those tasks. For example, when online and offline tasks are co-located (Mixed Workload Scheduling), the SLO-agent often has to manage and dynamically adjust the resource usage of offline tasks in order to guarantee the SLO of online tasks. In this process, real-time monitoring and accurate measurement of the server's key performance indicators (metrics) are particularly important, and memory access delay (Memory Access Latency) is an especially critical indicator.
x86 architectures (e.g., Intel or AMD) provide Performance Monitoring Unit (PMU) events such as mem_load_response and OFFCORE_response that can be used to indirectly estimate the access delay from a CPU core to memory. These events reflect the aggregate behavior of requests traversing the L1 cache, L2 cache, L3 cache, the memory controller, and even remote Non-Uniform Memory Access (NUMA) nodes reached through Ultra Path Interconnect (UPI) or Infinity Fabric, but they do not separately isolate the memory access delay incurred by requests that are routed from the core egress through the on-chip interconnection network to the DDR controller to access physical memory. Furthermore, the general-purpose core PMU of the ARM architecture typically does not provide comparable core-level memory latency events.
Some schemes attempt to measure memory access delay with the PMU of the DDR controller (DDRC), but the DDRC PMU can only observe delay inside the DDR physical layer. If a request is congested or queued on the path between an on-chip interconnection node and the DDR controller, for example on the SNF (Slave Node-Full) or the TZC (TrustZone Address Space Controller) shown in FIG. 1, the actual end-to-end delay may still be high even when the DDR itself is idle, while the delay counted by the DDRC PMU may be significantly lower, which distorts the memory access delay monitoring result. A new solution is therefore needed.
Disclosure of Invention
Embodiments of the application provide a memory access delay acquisition method, device, storage medium and program product for accurately acquiring the memory access delay of memory access requests that are issued by a processor core and access physical memory through the on-chip interconnection network.
An embodiment of the application provides a memory access delay acquisition method applied to a driver module in the kernel mode of an operating system. The operating system runs on a server platform that comprises at least one processor; each processor integrates a memory controller, is directly connected to local physical memory, and comprises a plurality of cores connected to a shared last-level cache and the memory controller through corresponding interconnection nodes in an on-chip interconnection network. The method comprises: obtaining a kernel virtual address of the memory access monitor corresponding to any core, where the memory access monitor is located on the core's interconnection node in the on-chip interconnection network and monitors the number and response delay of memory access requests issued by the core, a memory access request being a request issued by the core and routed through the interconnection node to the memory controller to access the physical memory; reading the monitoring data of the memory access monitor corresponding to the core according to the kernel virtual address; and calculating the memory access delay of the memory access requests issued by the core according to the monitoring data.
Optionally, obtaining the kernel virtual address of the memory access monitor corresponding to any core comprises: calling an address query interface provided by firmware of the server platform to obtain the system physical address of the memory access monitor corresponding to the core, where the address query interface takes a core identifier as an input parameter and returns the system physical address of the corresponding memory access monitor; and mapping that system physical address to the kernel virtual address.
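The overall flow above (look up the monitor address by core identifier, read the monitoring data, divide total cycles by request count) can be illustrated with a small Python sketch. Everything here is a simulation under assumptions: `FIRMWARE_TABLE`, `FAKE_MMIO`, the addresses, the counter values and the 2 GHz clock are hypothetical, and the step of mapping a physical address to a kernel virtual address is represented only by a dictionary lookup, not by a real driver mapping.

```python
from typing import Dict, NamedTuple, Optional


class MonitorSample(NamedTuple):
    request_count: int   # memory access requests observed by the monitor
    total_cycles: int    # summed clock cycles from issue to response


# Hypothetical stand-in for the firmware address query interface:
# core identifier -> system physical address of that core's monitor.
FIRMWARE_TABLE: Dict[int, int] = {0: 0x4000_0000, 1: 0x4001_0000}

# Hypothetical stand-in for the mapped monitor registers, keyed by address.
FAKE_MMIO: Dict[int, MonitorSample] = {
    0x4000_0000: MonitorSample(request_count=2_000, total_cycles=500_000),
    0x4001_0000: MonitorSample(request_count=0, total_cycles=0),
}


def query_monitor_address(core_id: int) -> int:
    """Simulated firmware lookup: core identifier in, monitor address out."""
    return FIRMWARE_TABLE[core_id]


def read_monitor(address: int) -> MonitorSample:
    """Simulated register read through the mapped address."""
    return FAKE_MMIO[address]


def average_latency_cycles(sample: MonitorSample) -> Optional[float]:
    """Per-request delay as total cycles divided by request count."""
    if sample.request_count == 0:
        return None  # no requests observed, so no meaningful average
    return sample.total_cycles / sample.request_count


def core_latency_ns(core_id: int, clock_hz: int) -> Optional[float]:
    """Convert a core's average delay from cycles to nanoseconds."""
    cycles = average_latency_cycles(read_monitor(query_monitor_address(core_id)))
    if cycles is None:
        return None
    return cycles * 1_000_000_000 / clock_hz


if __name__ == "__main__":
    for core in sorted(FIRMWARE_TABLE):
        print(core, core_latency_ns(core, clock_hz=2_000_000_000))
```

Returning `None` for an idle core avoids a division by zero and lets a caller tell "no traffic" apart from "zero delay"; the conversion to nanoseconds is an addition for readability and assumes the monitor counts cycles of a known, fixed clock.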
Optionally, the method further comprises the steps of determining a counting register allocated to the core and a configuration register corresponding to the counting register in a performance monitoring unit of an inte