CN-122019155-A - Memory replacement policy determining method and electronic equipment
Abstract
The application provides a method for determining a memory replacement policy and an electronic device. The method comprises: obtaining a memory error log and read-write behavior data corresponding to each memory module in the electronic device, wherein the read-write behavior data comprise a current data storage amount, an accumulated read-write data amount, and a plurality of system available bandwidths collected at a preset interval within a preset time period; determining a corresponding migration duration based on the current data storage amount and the plurality of system available bandwidths corresponding to each memory module; extracting, based on the migration duration corresponding to each memory module, spatio-temporal distribution feature information of correctable errors from the corresponding memory error log; and generating a memory replacement policy corresponding to all memory modules in the electronic device based on the spatio-temporal distribution feature information and the accumulated read-write data amount corresponding to each memory module.
Inventors
- ZHANG XINYI
- ZHANG CHUANG
- CHEN HAO
Assignees
- 联想(北京)有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260122
Claims (10)
- 1. A method for determining a memory replacement policy, the method comprising: obtaining a memory error log and read-write behavior data corresponding to each memory module in an electronic device, wherein the read-write behavior data comprise a current data storage amount, an accumulated read-write data amount, and a plurality of system available bandwidths acquired at a preset interval within a preset time period; determining a corresponding migration duration based on the current data storage amount and the plurality of system available bandwidths corresponding to each memory module; extracting, based on the migration duration corresponding to each memory module, spatio-temporal distribution feature information of correctable errors from the corresponding memory error log; and generating a memory replacement policy corresponding to all memory modules in the electronic device based on the spatio-temporal distribution feature information and the accumulated read-write data amount corresponding to each memory module.
- 2. The method for determining a memory replacement policy according to claim 1, wherein the memory error log comprises at least one of: a timestamp of an error occurrence, an error type, a physical address and a hierarchical location of the error occurrence, error bit information, and an error count.
- 3. The method for determining a memory replacement policy according to claim 1, wherein determining the corresponding migration duration based on the current data storage amount and the plurality of system available bandwidths corresponding to each memory module comprises: determining a corresponding target available bandwidth based on the plurality of system available bandwidths corresponding to each memory module; and determining, as the migration duration, the time required to migrate the current data storage amount at the target available bandwidth.
- 4. The method for determining a memory replacement policy according to claim 1, wherein the spatio-temporal distribution feature information comprises at least one of temporal feature information, spatial feature information, and bit feature information.
- 5. The method for determining a memory replacement policy according to claim 4, wherein extracting the spatio-temporal distribution feature information of correctable errors from the corresponding memory error log based on the migration duration corresponding to each memory module comprises: determining, based on the migration duration, a target time window for extracting the spatio-temporal distribution feature information; dividing the target time window into a plurality of time windows according to a first time length, wherein the first time length is any one of a plurality of time lengths; determining, using the corresponding memory error log within each time window and for each of different address levels, the number of times correctable errors recur at the same spatial location in the plurality of memory chips included in each memory module, wherein the address levels comprise at least one of memory cell, row, column, and memory bank; for each address level, determining the largest repetition count among the plurality of time windows as the corresponding maximum repetition count, and determining the maximum repetition counts corresponding to the different address levels as repetition spatial feature information corresponding to the first time length; and determining the repetition spatial feature information corresponding to the plurality of time lengths as the spatial feature information.
- 6. The method for determining a memory replacement policy according to claim 4, wherein extracting the spatio-temporal distribution feature information of correctable errors from the corresponding memory error log based on the migration duration corresponding to each memory module comprises: determining, based on the migration duration, a target time window for extracting the spatio-temporal distribution feature information; dividing the target time window into a plurality of time windows according to a first time length, wherein the first time length is any one of a plurality of second time lengths; determining the number of correctable errors occurring in each of the plurality of time windows, and determining the largest such number among the plurality of time windows as error burstiness feature information corresponding to the first time length; determining the average time interval between adjacent correctable errors in each of the plurality of time windows, and determining the smallest average time interval among the plurality of time windows as error interval feature information corresponding to the first time length; and determining the error burstiness feature information and the error interval feature information corresponding to the different second time lengths as the temporal feature information.
- 7. The method for determining a memory replacement policy according to claim 4, wherein extracting the spatio-temporal distribution feature information of correctable errors from the corresponding memory error log based on the migration duration corresponding to each memory module comprises: determining, based on the migration duration, a target time window for extracting the spatio-temporal distribution feature information; determining, based on the memory error log corresponding to each memory module, a bit distribution type for each correctable error occurring within the target time window; determining, based on the at least one bit distribution type of the correctable errors occurring in each memory module, the number of error occurrences corresponding to each bit distribution type; and determining the numbers of error occurrences corresponding to the different bit distribution types as the bit feature information corresponding to each memory module.
- 8. The method for determining a memory replacement policy according to any one of claims 1 to 7, wherein generating the memory replacement policy corresponding to all memory modules in the electronic device based on the spatio-temporal distribution feature information and the accumulated read-write data amount corresponding to each memory module comprises: determining a corresponding first evaluation factor for the occurrence of correctable errors based on the spatio-temporal distribution feature information corresponding to each memory module; determining a corresponding second evaluation factor for the occurrence of correctable errors based on the accumulated read-write data amount corresponding to each memory module and the total number of accesses corresponding to the memory modules in the electronic device; and generating the memory replacement policy based on the first evaluation factor and the second evaluation factor corresponding to each memory module.
- 9. The method for determining a memory replacement policy according to any one of claims 1 to 7, wherein, after generating the memory replacement policy corresponding to all memory modules in the electronic device, the method further comprises: generating, based on the memory replacement policy, a replacement instruction corresponding to a memory module to be replaced; and outputting the replacement instruction corresponding to the memory module to be replaced, so that the memory module is replaced based on the replacement instruction.
- 10. An electronic device, comprising: a memory configured to store a computer program executable on a processor; and the processor, configured, when executing the computer program, to: obtain a memory error log and read-write behavior data corresponding to each memory module in the electronic device, wherein the read-write behavior data comprise a current data storage amount, an accumulated read-write data amount, and a plurality of system available bandwidths acquired at a preset interval within a preset time period; determine a corresponding migration duration based on the current data storage amount and the plurality of system available bandwidths corresponding to each memory module; extract, based on the migration duration corresponding to each memory module, spatio-temporal distribution feature information of correctable errors from the corresponding memory error log; and generate a memory replacement policy corresponding to all memory modules in the electronic device based on the spatio-temporal distribution feature information and the accumulated read-write data amount corresponding to each memory module.
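As an illustration of the windowed feature extraction recited in claims 5 and 6, the following Python sketch splits a target time window into sub-windows of one time length and derives the burstiness, interval, and repetition features. It is a minimal sketch under stated assumptions, not the applicant's implementation: the `CE` record, the per-level location keys in `LEVEL_KEYS`, and the reading of "repetition count" as the number of correctable errors seen at one location are all hypothetical choices that the claims leave open.

```python
import math
from collections import Counter
from typing import NamedTuple


class CE(NamedTuple):
    """One correctable-error log entry (hypothetical record layout)."""
    t: float   # timestamp, seconds
    bank: int
    row: int
    col: int


# Assumed location key for each address level; the claims do not fix this.
LEVEL_KEYS = {
    "bank": lambda e: (e.bank,),
    "row": lambda e: (e.bank, e.row),
    "column": lambda e: (e.bank, e.col),
    "cell": lambda e: (e.bank, e.row, e.col),
}


def split_windows(errors, start, end, sub_len):
    """Bucket errors in [start, end) into consecutive sub-windows of sub_len."""
    if sub_len <= 0 or end <= start:
        raise ValueError("need sub_len > 0 and end > start")
    n = math.ceil((end - start) / sub_len)
    windows = [[] for _ in range(n)]
    for e in sorted(errors, key=lambda e: e.t):
        if start <= e.t < end:
            idx = min(int((e.t - start) // sub_len), n - 1)
            windows[idx].append(e)
    return windows


def temporal_features(windows):
    """Return (max CE count in any window, min average gap between adjacent CEs)."""
    max_count = max((len(w) for w in windows), default=0)
    # The mean of adjacent gaps telescopes to (last - first) / (count - 1).
    gaps = [(w[-1].t - w[0].t) / (len(w) - 1) for w in windows if len(w) >= 2]
    return max_count, (min(gaps) if gaps else None)


def spatial_features(windows):
    """Per address level, the largest same-location CE count in any window."""
    return {
        level: max(
            (max(Counter(key(e) for e in w).values()) for w in windows if w),
            default=0,
        )
        for level, key in LEVEL_KEYS.items()
    }


log = [CE(1, 0, 5, 7), CE(2, 0, 5, 7), CE(3, 0, 5, 9),
       CE(15, 1, 2, 2), CE(25, 0, 5, 7), CE(27, 1, 5, 7)]
wins = split_windows(log, start=0, end=30, sub_len=10)
burst, min_gap = temporal_features(wins)
repeats = spatial_features(wins)
```

Running the same functions once per time length and collecting the results would give the multi-scale feature sets the claims describe.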
Description
Memory replacement policy determining method and electronic equipment
Technical Field
The present application relates to the field of memory management technologies, and in particular to a method for determining a memory replacement policy and an electronic device.
Background
In a computer system, memory is one of the key hardware components, and its reliability directly affects the stability and data processing capability of the system. As server scale grows and applications become more complex, memory failures such as uncorrectable errors (Uncorrectable Errors, UE) have become an important cause of node failures, service interruptions, and even cluster instability. Predicting memory failures in advance and formulating a reasonable replacement policy is therefore key to improving system availability and reducing operation and maintenance costs.
A typical approach in the related art divides historical error data into time windows and extracts statistical features for prediction, counting the bitmap patterns of correctable errors (Correctable Errors, CE) and applying expert rules to judge whether a UE will occur in the future. Although this approach can identify potential fault risks to some extent, it has clear limitations. On one hand, it adapts poorly to large-scale heterogeneous platforms and struggles to reflect the real fault trend on different platforms. On the other hand, the predicted fault risk differs considerably from the risk that actually materializes, so prediction accuracy is poor.
Disclosure of Invention
The embodiments of the application provide a method for determining a memory replacement policy and an electronic device.
The technical solution of the embodiments of the application is implemented as follows. An embodiment of the application provides a method for determining a memory replacement policy, comprising: obtaining a memory error log and read-write behavior data corresponding to each memory module in an electronic device, wherein the read-write behavior data comprise a current data storage amount, an accumulated read-write data amount, and a plurality of system available bandwidths acquired at a preset interval within a preset time period; determining a corresponding migration duration based on the current data storage amount and the plurality of system available bandwidths corresponding to each memory module; extracting, based on the migration duration corresponding to each memory module, spatio-temporal distribution feature information of correctable errors from the corresponding memory error log; and generating a memory replacement policy corresponding to all memory modules in the electronic device based on the spatio-temporal distribution feature information and the accumulated read-write data amount corresponding to each memory module.
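The migration-duration step above can be sketched in Python as follows. The text does not say how the target available bandwidth is derived from the sampled bandwidths, so this sketch assumes the minimum sample (a conservative choice) and lets the caller pass a different reducer. The function name, parameter names, and units (bytes and bytes per second) are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence


def migration_duration_s(
    stored_bytes: float,
    bandwidth_samples: Sequence[float],
    pick: Callable[[Sequence[float]], float] = min,
) -> float:
    """Seconds needed to migrate stored_bytes at the target available bandwidth.

    bandwidth_samples are the periodically sampled system available
    bandwidths in bytes/second; pick reduces them to one target value
    (assumption: min by default, since the source does not specify).
    """
    if not bandwidth_samples:
        raise ValueError("need at least one bandwidth sample")
    target = pick(bandwidth_samples)
    if target <= 0:
        raise ValueError("target available bandwidth must be positive")
    return stored_bytes / target


# Example: 64 GiB resident on a module, three bandwidth samples in bytes/s.
samples = [4e9, 2e9, 8e9]
worst_case = migration_duration_s(64 * 2**30, samples)           # min sample
typical = migration_duration_s(64 * 2**30, samples, pick=mean)   # mean sample
```

The resulting duration is what the method then uses to size the target time window for feature extraction, so a pessimistic bandwidth choice yields a longer look-ahead window.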
An embodiment of the application provides an electronic device, comprising: a memory configured to store a computer program executable on a processor; and the processor, configured, when executing the computer program, to: obtain a memory error log and read-write behavior data corresponding to each memory module in the electronic device, wherein the read-write behavior data comprise a current data storage amount, an accumulated read-write data amount, and a plurality of system available bandwidths acquired at a preset interval within a preset time period; determine a corresponding migration duration based on the current data storage amount and the plurality of system available bandwidths corresponding to each memory module; extract, based on the migration duration corresponding to each memory module, spatio-temporal distribution feature information of correctable errors from the corresponding memory error log; and generate a memory replacement policy corresponding to all memory modules in the electronic device based on the spatio-temporal distribution feature information and the accumulated read-write data amount corresponding to each memory module.
An embodiment of the application provides a computer-readable storage medium storing a computer program or computer-executable instructions which, when executed by a processor, implement the method for determining a memory replacement policy provided by the embodiments of the application.
An embodiment of the application provides a computer program product comprising a computer program or computer-executable instructions which, when executed by a processor, implement the method for determining a memory replacement policy provided by the embodiments of the application.
Drawings
FIG. 1 is a flow chart of a method for determining a memory replacement policy according to an embodiment of the present application;
FIG. 2 is a schematic