Search

CN-121996713-A - Method, device, equipment and storage medium for mining modes of critical path data

CN121996713ACN 121996713 ACN121996713 ACN 121996713ACN-121996713-A

Abstract

The application provides a mode mining method, device, equipment and storage medium of critical path data, and relates to the data mining technology. The method comprises the steps of generating a candidate mode set, determining a candidate mode which is larger than or equal to a preset minimum support threshold value in a plurality of candidate modes in the candidate mode set based on a plurality of sequences in a sequence database, taking the candidate mode which is larger than or equal to the preset minimum support threshold value as a high support mode, generating the high support mode set, and stopping high support mode mining if the high support mode set is empty or the second length is larger than the preset maximum mode length. The method solves the technical problem of lower accuracy of pattern mining of the key path of the processor microstructure related diagram.

Inventors

  • ZHENG YAWEN
  • Han Chenji
  • WANG WENXIANG

Assignees

  • 龙芯中科技术股份有限公司

Dates

Publication Date
20260508
Application Date
20251224

Claims (20)

  1. 1. A method for pattern mining of critical path data, comprising: the method comprises the steps of obtaining an analysis processing request, wherein the analysis processing request comprises a sequence database identifier, and reading a sequence database corresponding to the sequence database identifier according to the analysis processing request; Generating a candidate mode set, wherein the candidate mode set comprises a plurality of candidate modes, and the candidate mode set is of a preset first length; Determining a candidate mode which is larger than or equal to a preset minimum support threshold value in a plurality of candidate modes in the candidate mode set based on a plurality of sequences in the sequence database, and taking the candidate mode which is larger than or equal to the preset minimum support threshold value as a high support mode to generate a high support mode set, wherein the second length of the high support mode set is equal to the first length of the candidate mode set; and when the second length is greater than the preset maximum mode length, outputting a high-support mode with the mode length in the high-support mode set being in a preset mode length interval, wherein the high-support mode set represents critical path data of a processor microstructure related graph.
  2. 2. The method of claim 1, wherein the determining a candidate pattern that is greater than or equal to a preset minimum support threshold among the plurality of candidate patterns in the candidate pattern set based on the plurality of sequences in the sequence database, and the generating a high support pattern set using the candidate pattern that is greater than or equal to the preset minimum support threshold as a high support pattern, comprises: Sequentially reading a plurality of candidate modes in the candidate mode set; For each candidate mode, searching for an occurrence corresponding to the candidate mode in each sequence in a plurality of sequences in the sequence database according to preset mining condition information; determining the corresponding database support of the candidate mode according to the occurrence corresponding to the candidate mode; If the database support degree of the corresponding candidate mode is larger than or equal to a preset minimum support degree threshold, determining that the corresponding candidate mode is a high support degree mode, and generating a high support degree mode set according to the high support degree mode.
  3. 3. The method of claim 2, wherein the first length of the candidate pattern set is i+1, the third length of each candidate pattern is m, wherein i has an initial value of 1, the third length m = i+1, and the fourth length of the sequences in the sequence database is n.
  4. 4. A method according to claim 3, wherein for each candidate pattern, searching each sequence of the plurality of sequences in the sequence database for an occurrence corresponding to the candidate pattern according to preset mining condition information, comprising: For each candidate pattern with the third length of m, taking a first character of each sequence in a plurality of sequences in the sequence database as a searching starting point, taking an n-m+1 character of each sequence as an end point, and respectively matching each character from the first character to the n-m+1 character with a corresponding sequence of characters in the candidate pattern to obtain matching result information; If the characters of the matching result information representing the corresponding sequence are the same, determining that a character string consisting of the first character to the n-m+1th character in the sequence is the occurrence corresponding to the candidate mode; And according to preset mining condition information, continuing to search the sequence for the occurrence of the matching with the candidate pattern until the last character in the sequence is searched, and stopping searching.
  5. 5. The method of claim 4, wherein the predetermined mining condition information comprises a one-time condition, or a non-overlapping condition, wherein, The sequence comprises a plurality of edges for indicating the critical path, the one-time condition is used for indicating that any edge in the critical path represented by the sequence can only be used once at most, and the non-overlapping condition is used for indicating that the edge at the same position in the critical path represented by the sequence cannot be reused at the same position in the candidate mode and can be reused at different positions in the candidate mode.
  6. 6. The method of claim 2, wherein determining the database support for the respective candidate pattern based on the occurrence corresponding to the candidate pattern comprises: determining the average utility value of the candidate mode in each occurrence for each sequence, and accumulating the average utility value of each occurrence to obtain the sequence support of the sequence; And multiplying the sequence support of each sequence by a preset longitudinal weight value to obtain a plurality of products, accumulating the products, and generating the corresponding database support of the candidate mode.
  7. 7. The method of claim 4, wherein if the database support of the corresponding candidate pattern is determined to be greater than or equal to a preset minimum support threshold, determining that the corresponding candidate pattern is a high support pattern, and generating a high support pattern set according to the high support pattern, comprises: If the database support degree of the corresponding candidate mode is larger than or equal to a preset minimum support degree threshold, determining that the corresponding candidate mode is a high support degree mode, and adding the high support degree mode into a preset high support degree mode set with a second length of i+1.
  8. 8. The method of claim 7, wherein the method further comprises: if the high-support-degree mode set is not empty or the second length is smaller than or equal to the preset maximum mode length, repeating the following steps until a high-support-degree mode with the mode length in the high-support-degree mode set being in a preset mode length interval is output; Sequentially taking out a plurality of high-support modes in the high-support mode set; Determining a prefix and a suffix of each of a plurality of the high support modes; Traversing and searching other high-support modes except the current high-support mode according to a preset sequence from left to right, and if prefixes of other high-support modes which are equal to the suffixes of the current high-support mode are searched, splicing the current high-support mode and the other high-support modes to obtain a candidate mode with a third length of i+2; And a candidate pattern step of adding the third candidate pattern with the length of i+2 to the first candidate pattern set with the length of i+2, and determining a candidate pattern which is greater than or equal to a preset minimum support threshold value in a plurality of candidate patterns in the candidate pattern set based on a plurality of sequences in the sequence database.
  9. 9. The method according to any one of claims 1-8, further comprising: and sending out prompt information, wherein the prompt information is used for indicating that the high-support mode mining is completed.
  10. 10. A pattern mining apparatus for critical path data, comprising: the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring an analysis processing request, and the analysis processing request comprises a sequence database identifier; The reading unit is used for reading a sequence database corresponding to the sequence database identifier according to the analysis processing request; The device comprises a first generation unit, a second generation unit and a first generation unit, wherein the first generation unit is used for generating a candidate mode set, the candidate mode set comprises a plurality of candidate modes, and the candidate mode set is of a preset first length; A second generating unit, configured to determine, based on a plurality of sequences in the sequence database, a candidate pattern that is greater than or equal to a preset minimum support threshold among a plurality of candidate patterns in the candidate pattern set, and generate a high support pattern set with the candidate pattern that is greater than or equal to the preset minimum support threshold as a high support pattern, where a second length of the high support pattern set is equal to a first length of the candidate pattern set; And the output unit is used for stopping the high-support mode mining if the high-support mode set is empty or the second length is larger than the preset maximum mode length, and outputting the high-support mode with the mode length in the high-support mode set being in the preset mode length interval when the second length is larger than the preset maximum mode length, wherein the high-support mode set represents the critical path data of the processor microstructure related graph.
  11. 11. The apparatus of claim 10, wherein the second generating unit comprises: the reading-in module is used for sequentially reading in a plurality of candidate modes in the candidate mode set; The searching module is used for searching the occurrence corresponding to each candidate mode in a plurality of sequences in the sequence database according to preset mining condition information; The determining module is used for determining the corresponding database support degree of the candidate mode according to the occurrence corresponding to the candidate mode; And the generation module is used for determining the corresponding candidate mode as a high-support mode if the database support degree of the corresponding candidate mode is greater than or equal to a preset minimum support degree threshold value, and generating a high-support degree mode set according to the high-support degree mode.
  12. 12. The apparatus of claim 11, wherein the first length of the set of candidate patterns is i+1, the third length of each candidate pattern is m, wherein i has an initial value of 1, the third length m = i+1, and the fourth length of sequences in the sequence database is n.
  13. 13. The apparatus of claim 12, wherein the lookup module comprises: the matching sub-module is used for matching each character from the first character to the n-m+1 character with each character in the corresponding sequence in the candidate mode by taking the first character of each sequence in a plurality of sequences in the sequence database as a searching starting point and taking the n-m+1 character of each sequence as an ending point to obtain matching result information; A determining submodule, configured to determine that a string consisting of the first to n-m+1th characters in the sequence is an occurrence corresponding to the candidate pattern if it is determined that the characters of the matching result information representing the corresponding sequence are all the same; And the searching sub-module is used for continuously searching the occurrence of the matching with the candidate pattern in the sequence according to the preset mining condition information until the last character in the sequence is searched, and stopping searching.
  14. 14. The apparatus of claim 13, wherein the predetermined mining condition information comprises a one-time condition, or a non-overlapping condition, wherein, The sequence comprises a plurality of edges for indicating the critical path, the one-time condition is used for indicating that any edge in the critical path represented by the sequence can only be used once at most, and the non-overlapping condition is used for indicating that the edge at the same position in the critical path represented by the sequence cannot be reused at the same position in the candidate mode and can be reused at different positions in the candidate mode.
  15. 15. The apparatus of claim 11, wherein the determining module comprises: The first accumulation sub-module is used for determining the average utility value of the candidate mode in each occurrence for each sequence, and accumulating the average utility value in each occurrence to obtain the sequence support of the sequence; and the second accumulation sub-module is used for multiplying the sequence support of each sequence by a preset longitudinal weight value to obtain a plurality of products, accumulating the products and generating the corresponding database support of the candidate mode.
  16. 16. The apparatus of claim 13, wherein the generating module is specifically configured to: If the database support degree of the corresponding candidate mode is larger than or equal to a preset minimum support degree threshold, determining that the corresponding candidate mode is a high support degree mode, and adding the high support degree mode into a preset high support degree mode set with a second length of i+1.
  17. 17. The apparatus of claim 16, wherein the apparatus further comprises: A first determining unit, configured to repeat the following steps if it is determined that the high support mode set is not empty, or the second length is less than or equal to a preset maximum mode length, until a high support mode in which the mode length in the high support mode set is located in a preset mode length interval is output; a retrieving unit, configured to sequentially retrieve a plurality of high-support modes in the high-support mode set; A second determining unit configured to determine a prefix and a suffix of each of the plurality of high support modes; the splicing unit is used for traversing and searching other high-support modes except the current high-support mode according to a preset sequence from left to right, and splicing the current high-support mode and the other high-support modes if prefixes of the other high-support modes which are equal to suffixes of the current high-support mode are searched, so that a candidate mode with a third length of i+2 is obtained; An adding unit, configured to add the third candidate pattern with the length i+2 to the first candidate pattern set with the length i+2, and perform a candidate pattern step of determining, based on a plurality of sequences in the sequence database, a minimum support threshold value that is greater than or equal to a preset minimum support threshold value among a plurality of candidate patterns in the candidate pattern set.
  18. 18. The apparatus according to any one of claims 10-17, wherein the apparatus further comprises: the prompt unit is used for sending prompt information, wherein the prompt information is used for indicating that the high-support mode mining is completed.
  19. 19. An electronic device comprising a memory, a processor, the memory having stored therein a computer program executable on the processor, the processor implementing the method of any of the preceding claims 1-9 when the computer program is executed.
  20. 20. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1-9.

Description

Method, device, equipment and storage medium for mining modes of critical path data Technical Field The present application relates to data mining technologies, and in particular, to a method, an apparatus, a device, and a storage medium for pattern mining of critical path data. Background At present, as the scale of a program operated by a processor is continuously improved, the scale of a benchmark test set for the performance analysis of the microstructure of the processor is also continuously increased, and the performance bottleneck analysis is challenged. In the prior art, a processor microstructure is modeled as a correlation diagram model, and key path data of dynamic running of a program is acquired for analysis. During analysis, known performance events which are designated in advance are counted from the critical path data, and the information such as the duty ratio (the most critical event is found), the interrelation (whether a plurality of performance events need to be optimized simultaneously or not) and the like of the performance events are analyzed. But techniques for processor microstructure performance optimization include not only optimization for a single performance event, but also a combined optimization scheme of critical events and events on their associated chains. However, in the prior art, since the technology of optimizing the performance of the processor microstructure further includes a combination optimizing scheme of the key event and the event on the related chain thereof, that is, useful knowledge required by reasonable formulation of the combination optimizing scheme is that the existing processor microstructure runs the existing application program and has some significant related chain events, but research usually only carries out related chain analysis on a certain performance event, or the combination optimizing scheme is formulated according to experience of a designer, and statistical analysis is not carried out on the related chain events, so that the key path data used in the pattern mining process is single, and the pattern mining accuracy of the key path of the processor microstructure related graph is lower. Disclosure of Invention The application provides a method, a device, equipment and a storage medium for pattern mining of critical path data, which are used for solving the technical problem of low accuracy of pattern mining of a critical path of a processor microstructure related diagram. In a first aspect, the present application provides a method for pattern mining of critical path data, including: the method comprises the steps of obtaining an analysis processing request, wherein the analysis processing request comprises a sequence database identifier, and reading a sequence database corresponding to the sequence database identifier according to the analysis processing request; Generating a candidate mode set, wherein the candidate mode set comprises a plurality of candidate modes, and the candidate mode set is of a preset first length; Determining a candidate mode which is larger than or equal to a preset minimum support threshold value in a plurality of candidate modes in the candidate mode set based on a plurality of sequences in the sequence database, and taking the candidate mode which is larger than or equal to the preset minimum support threshold value as a high support mode to generate a high support mode set, wherein the second length of the high support mode set is equal to the first length of the candidate mode set; and when the second length is greater than the preset maximum mode length, outputting a high-support mode with the mode length in the high-support mode set being in a preset mode length interval, wherein the high-support mode set represents critical path data of a processor microstructure related graph. Further, the method for generating a high support degree mode set by using the candidate modes which are larger than or equal to the preset minimum support degree threshold value as the high support degree modes comprises the following steps: Sequentially reading a plurality of candidate modes in the candidate mode set; For each candidate mode, searching for an occurrence corresponding to the candidate mode in each sequence in a plurality of sequences in the sequence database according to preset mining condition information; determining the corresponding database support of the candidate mode according to the occurrence corresponding to the candidate mode; If the database support degree of the corresponding candidate mode is larger than or equal to a preset minimum support degree threshold, determining that the corresponding candidate mode is a high support degree mode, and generating a high support degree mode set according to the high support degree mode. Further, the first length of the candidate pattern set is i+1, the third length of each candidate pattern is m, wherein the initial value of i is 1, the third length m=i+1, and