CN-116756662-B - Yield prediction method and system for optimizing random forest based on Harris eagle algorithm


Abstract

The invention discloses a yield prediction method and system for optimizing a random forest based on the Harris eagle algorithm. The method comprises: acquiring a historical wafer acceptance test data set; preprocessing the acceptance test data in the historical wafer acceptance test data set to obtain an acceptance test sample set; iteratively optimizing the key framework parameters of a random forest model through the Harris eagle algorithm and modifying the random forest model based on those parameters to construct a wafer yield prediction pre-training model; training the wafer yield prediction pre-training model on the acceptance test sample set to obtain a wafer yield prediction model; and inputting the acceptance test data to be tested into the wafer yield prediction model to obtain a prediction result. The invention can reduce the cost of yield prediction and improve its accuracy.

Inventors

CHEN YINING
WANG SHIQI
CAI YU
GAO DAWEI

Assignees

浙江大学 (Zhejiang University)

Dates

Publication Date: 20260512
Application Date: 20230625

Claims (10)

1. A yield prediction method for optimizing a random forest based on the Harris eagle algorithm, characterized by comprising the following steps: acquiring a historical wafer acceptance test data set; preprocessing the acceptance test data in the historical wafer acceptance test data set to obtain an acceptance test sample set; iteratively optimizing the key framework parameters of a random forest model through the Harris eagle algorithm, modifying the random forest model based on the key framework parameters, and constructing a wafer yield prediction pre-training model; training the wafer yield prediction pre-training model on the acceptance test sample set to obtain a wafer yield prediction model; and inputting the acceptance test data to be tested into the wafer yield prediction model to obtain a prediction result; wherein iteratively optimizing the key framework parameters of the random forest model through the Harris eagle algorithm and modifying the random forest model based on the key framework parameters comprises the following steps: presetting the parameters of the Harris eagle algorithm, including the population size, the maximum number of iterations, and the variation ranges and dimensions of the eagle-group and prey positions; setting the key framework parameters of the random forest algorithm to be iteratively optimized by the Harris eagle algorithm, the key framework parameters comprising at least the maximum number of weak learners, the maximum depth of the decision trees, the maximum number of features, the minimum number of samples required to split an internal node, and the minimum number of samples at a leaf node; initializing the positions of the eagle group and of the prey in the Harris eagle algorithm, taking the classification error of the random forest algorithm as the fitness function, and, in each iteration, updating the position vector with a position update strategy selected according to whether the prey has been found, the energy state of the prey, and the value of the fitness function, the optimization being completed when the number of iterations reaches the preset maximum, whereupon the position vector is the optimized set of key framework parameters; wherein the positions of the eagle group and of the prey are each expressed as a 1 × dim position vector, dim being the dimension, i.e. the number of parameters to be optimized; the position vector, which is updated in every iteration, is formed from the key framework parameters and is expressed as follows:
$$X_{rb}(t) = \left[X[0],\ X[1],\ X[2],\ X[3],\ X[4]\right]$$
wherein $X_{rb}(t)$ represents the position vector at the t-th iteration, t denotes the t-th iteration, X[0] represents the maximum number of weak learners, X[1] represents the maximum depth of the decision trees, X[2] represents the maximum number of features, X[3] represents the minimum number of samples required to split an internal node, and X[4] represents the minimum number of samples at a leaf node.
2. The yield prediction method for optimizing a random forest based on the Harris eagle algorithm according to claim 1, wherein the preprocessing comprises one or more of outlier detection and processing, missing value processing, and data normalization.
3. The yield prediction method for optimizing a random forest based on the Harris eagle algorithm according to claim 2, wherein the missing value processing comprises missing value deletion or missing value filling, the missing value filling comprising: filling the missing values with the mean, the mode, or the median, or building a prediction model to impute the missing values.
4. The yield prediction method for optimizing a random forest based on the Harris eagle algorithm according to claim 1, wherein the data normalization comprises standardization or min-max normalization.
5. The yield prediction method for optimizing a random forest based on the Harris eagle algorithm according to claim 1, wherein the fitness function is expressed as follows: $F = 1 - \mathrm{aucscore}$, wherein aucscore denotes the AUC (area under the ROC curve) evaluation metric.
6. The yield prediction method for optimizing a random forest based on the Harris eagle algorithm according to claim 1, wherein the Harris eagle algorithm comprises the following steps: before the pursuit, the escape energy of the prey is determined according to the following formula:
$$E = 2E_0\left(1 - \frac{t}{T}\right)$$
wherein E represents the escape energy of the prey, t denotes the t-th iteration, T denotes the preset maximum number of iterations, and $E_0$ is a random number in (-1, 1). When the escape energy satisfies $|E| \ge 1$, the prey is considered to have high escape energy: the Harris eagle algorithm regards the prey as being at full strength, and the eagle group searches for it over a wide range. A random number q is generated to distinguish between the prey having been found and not found, and the position X(t+1) for the next iteration is updated with a strategy selected according to q. When $q \ge 0.5$, no individual in the eagle group has found the prey, and the position is updated from the position of a randomly selected individual in the eagle group according to the following formula:
$$X(t+1) = X_{rd}(t) - r_1\left|X_{rd}(t) - 2r_2X(t)\right|$$
wherein $X_{rd}(t)$ represents the position of an eagle randomly selected from the eagle group at the t-th iteration, and $r_1$ and $r_2$ are random numbers in (0, 1). When $q < 0.5$, the eagle group has found the prey; its individuals circle around the prey and update their positions according to the following formula:
$$X(t+1) = \left(X_{rb}(t) - X_m(t)\right) - r_3\left(lb + r_4\,(ub - lb)\right)$$
wherein $X_{rb}(t)$ represents the position of the prey at the t-th iteration, $X_m(t)$ represents the average position of the eagle population at the t-th iteration, $r_3$ and $r_4$ are random numbers in (0, 1), and lb and ub represent the lower and upper bounds of the position, respectively; $X_m(t)$ is calculated according to the following formula:
$$X_m(t) = \frac{1}{N}\sum_{i=1}^{N} X_i(t)$$
wherein N represents the population size and $X_i(t)$ represents the position of the i-th individual at the t-th iteration. When the escape energy of the prey satisfies $|E| < 1$, the algorithm transitions from the exploration stage to the exploitation stage and selects different attack strategies according to the escape energy of the prey. In the exploitation stage, the eagle group attacks the prey with one of four strategies, where R denotes the chance that the prey successfully escapes. When $|E| \ge 0.5$ and $R \ge 0.5$, the eagle group launches a soft besiege on the prey, and the position update is calculated according to the following formula:
$$X(t+1) = \Delta X(t) - E\left|J X_{rb}(t) - X(t)\right|, \qquad \Delta X(t) = X_{rb}(t) - X(t)$$
wherein $X_{rb}(t)$ represents the position of the prey at the t-th iteration, $X(t)$ represents the position of the current individual in the eagle group at the t-th iteration, and J represents the random jump strength of the prey, calculated according to the following formula:
$$J = 2\left(1 - r_5\right)$$
wherein $r_5$ is a random number in (0, 1). When $|E| < 0.5$ and $R \ge 0.5$, the eagle group launches a hard besiege, and the position update is calculated according to the following formula:
$$X(t+1) = X_{rb}(t) - E\left|\Delta X(t)\right|$$
wherein $X_{rb}(t)$ represents the position of the prey at the t-th iteration and $X(t)$ represents the position of the current individual in the eagle group at the t-th iteration. When $|E| \ge 0.5$ and $R < 0.5$, the eagle group launches a soft besiege with progressive rapid dives, and the position update is calculated according to the following formulas:
$$Y = X_{rb}(t) - E\left|J X_{rb}(t) - X(t)\right|$$
$$Z = Y + S \times LF(D)$$
$$X(t+1) = \begin{cases} Y, & F(Y) < F(X(t)) \\ Z, & F(Z) < F(X(t)) \end{cases}$$
wherein $X_{rb}(t)$ represents the position of the prey at the t-th iteration, $X(t)$ represents the position of the current individual in the eagle group at the t-th iteration, J represents the random jump strength of the prey, F represents the fitness function, S is a 1 × D random vector, and LF(D) represents the Levy flight function. When $|E| < 0.5$ and $R < 0.5$, the eagle group launches a hard besiege with progressive rapid dives, and the position update is calculated according to the following formulas:
$$Y = X_{rb}(t) - E\left|J X_{rb}(t) - X_m(t)\right|$$
$$Z = Y + S \times LF(D)$$
$$X(t+1) = \begin{cases} Y, & F(Y) < F(X(t)) \\ Z, & F(Z) < F(X(t)) \end{cases}$$
wherein $X_{rb}(t)$ represents the position of the prey at the t-th iteration, $X_m(t)$ represents the average position of the eagle population at the t-th iteration, J represents the random jump strength of the prey, F represents the fitness function, S is a 1 × D random vector, and LF(D) represents the Levy flight function.
7. The yield prediction method for optimizing a random forest based on the Harris eagle algorithm according to claim 1, wherein training the wafer yield prediction pre-training model on the acceptance test sample set comprises the following steps: dividing the acceptance test sample set into a training data set, a test data set, and a validation data set; training the wafer yield prediction pre-training model on the training data set and evaluating the model on the validation data set after each round of training, finally obtaining the wafer yield prediction model; and testing the wafer yield prediction model on the test data set and outputting the wafer yield prediction results for the samples in the test data set.
8. A yield prediction system for optimizing a random forest based on the Harris eagle algorithm, characterized by comprising a data processing module, a model training module, and a prediction evaluation module; the data processing module is configured to acquire a historical wafer acceptance test data set and preprocess the acceptance test data in the historical wafer acceptance test data set to obtain an acceptance test sample set; the model training module is configured to iteratively optimize the key framework parameters of a random forest model through the Harris eagle algorithm, modify the random forest model based on the key framework parameters, and construct a wafer yield prediction pre-training model; the prediction evaluation module is configured to input the acceptance test data to be tested into the wafer yield prediction model to obtain a prediction result; wherein iteratively optimizing the key framework parameters of the random forest model through the Harris eagle algorithm and modifying the random forest model based on the key framework parameters comprises the following steps: presetting the parameters of the Harris eagle algorithm, including the population size, the maximum number of iterations, and the variation ranges and dimensions of the eagle-group and prey positions; setting the key framework parameters of the random forest algorithm to be iteratively optimized by the Harris eagle algorithm, the key framework parameters comprising at least the maximum number of weak learners, the maximum depth of the decision trees, the maximum number of features, the minimum number of samples required to split an internal node, and the minimum number of samples at a leaf node; initializing the positions of the eagle group and of the prey in the Harris eagle algorithm, taking the classification error of the random forest algorithm as the fitness function, and, in each iteration, updating the position vector with a position update strategy selected according to whether the prey has been found, the energy state of the prey, and the value of the fitness function, the optimization being completed when the number of iterations reaches the preset maximum, whereupon the position vector is the optimized set of key framework parameters; wherein the positions of the eagle group and of the prey are each expressed as a 1 × dim position vector, dim being the dimension, i.e. the number of parameters to be optimized; the position vector, which is updated in every iteration, is formed from the key framework parameters and is expressed as follows:
$$X_{rb}(t) = \left[X[0],\ X[1],\ X[2],\ X[3],\ X[4]\right]$$
wherein $X_{rb}(t)$ represents the position vector at the t-th iteration, t denotes the t-th iteration, X[0] represents the maximum number of weak learners, X[1] represents the maximum depth of the decision trees, X[2] represents the maximum number of features, X[3] represents the minimum number of samples required to split an internal node, and X[4] represents the minimum number of samples at a leaf node.
9. A yield prediction device for optimizing a random forest based on the Harris eagle algorithm, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing computer instructions, characterized in that the computer readable storage medium stores instructions for performing the method of any one of claims 1 to 7.
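The position-update rules of claim 6 can be sketched in Python roughly as follows. This is a minimal, stdlib-only sketch assuming the standard Harris hawks optimization (HHO) update rules that claim 6 describes; it is not the patent's implementation. The search bounds `LB`/`UB`, the Mantegna-style Levy step with beta = 1.5, the synchronous population update, the rounding of the final position to integers, and the key names (chosen only to mirror scikit-learn's `RandomForestClassifier` argument names) are all assumptions. The toy fitness is a placeholder for the F = 1 - aucscore of claim 5.

```python
import math
import random

# Hypothetical search ranges for the position vector X[0]..X[4]; the key names
# mirror scikit-learn's RandomForestClassifier arguments for readability only.
KEYS = ["n_estimators", "max_depth", "max_features",
        "min_samples_split", "min_samples_leaf"]
LB = [10, 2, 1, 2, 1]
UB = [500, 30, 20, 20, 10]


def clip(x):
    """Keep every coordinate inside [LB, UB]."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(x, LB, UB)]


def escape_energy(t, T):
    """E = 2 * E0 * (1 - t / T), with E0 drawn uniformly from (-1, 1)."""
    return 2.0 * random.uniform(-1.0, 1.0) * (1.0 - t / T)


def levy(dim, beta=1.5):
    """Levy-flight step LF(D) via Mantegna's algorithm (beta = 1.5 assumed)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return [0.01 * random.gauss(0, 1) * sigma / abs(random.gauss(0, 1)) ** (1 / beta)
            for _ in range(dim)]


def update_hawk(x, pop, rabbit, fitness, t, T):
    """Return the next position of hawk x following the HHO update rules."""
    dim = len(x)
    E = escape_energy(t, T)
    mean = [sum(h[d] for h in pop) / len(pop) for d in range(dim)]
    if abs(E) >= 1:  # exploration: the prey still has plenty of energy
        if random.random() >= 0.5:  # q >= 0.5: move relative to a random hawk
            rd, r1, r2 = random.choice(pop), random.random(), random.random()
            new = [rd[d] - r1 * abs(rd[d] - 2 * r2 * x[d]) for d in range(dim)]
        else:  # q < 0.5: move relative to the prey and the population mean
            r3, r4 = random.random(), random.random()
            new = [(rabbit[d] - mean[d]) - r3 * (LB[d] + r4 * (UB[d] - LB[d]))
                   for d in range(dim)]
        return clip(new)
    R = random.random()              # chance of a successful escape
    J = 2 * (1 - random.random())    # jump strength J = 2(1 - r5)
    if R >= 0.5 and abs(E) >= 0.5:   # soft besiege
        new = [(rabbit[d] - x[d]) - E * abs(J * rabbit[d] - x[d]) for d in range(dim)]
    elif R >= 0.5:                   # hard besiege
        new = [rabbit[d] - E * abs(rabbit[d] - x[d]) for d in range(dim)]
    else:  # progressive rapid dives: soft uses x, hard uses the population mean
        ref = x if abs(E) >= 0.5 else mean
        y = clip([rabbit[d] - E * abs(J * rabbit[d] - ref[d]) for d in range(dim)])
        s, lf = [random.random() for _ in range(dim)], levy(dim)
        z = clip([y[d] + s[d] * lf[d] for d in range(dim)])
        fx = fitness(x)
        if fitness(y) < fx:
            return y
        if fitness(z) < fx:
            return z
        return x  # neither dive improves the fitness, so the hawk stays put
    return clip(new)


def hho(fitness, n_hawks=10, T=30, seed=0):
    """Minimise fitness over the box [LB, UB]; return rounded parameters and score."""
    random.seed(seed)
    pop = [[random.uniform(lo, hi) for lo, hi in zip(LB, UB)] for _ in range(n_hawks)]
    rabbit = min(pop, key=fitness)  # best position found so far plays the prey
    best_f = fitness(rabbit)
    for t in range(T):
        # Simplification: all hawks move using the previous iteration's population.
        pop = [update_hawk(x, pop, rabbit, fitness, t, T) for x in pop]
        cand = min(pop, key=fitness)
        if fitness(cand) < best_f:
            rabbit, best_f = cand, fitness(cand)
    params = {k: int(round(v)) for k, v in zip(KEYS, clip(rabbit))}
    return params, best_f


# Toy usage: a placeholder fitness; in the patent's setting it would be
# F = 1 - aucscore of a random forest built from the rounded position.
target = [(lo + hi) / 2 for lo, hi in zip(LB, UB)]
params, score = hho(lambda x: sum((a - b) ** 2 for a, b in zip(x, target)))
```

To plug in a real model, the hypothetical `fitness` callable would round the position, train a random forest with those hyperparameters on the training split, and return 1 minus its validation AUC; that evaluation is left out here to keep the sketch dependency-free.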

Description

Technical Field

The invention relates to the technical field of big data algorithms, and in particular to a yield prediction method and system for optimizing a random forest based on the Harris eagle algorithm.

Background

In integrated circuit manufacturing, process parameters are key factors affecting chip performance: abnormal process parameters degrade chip performance and cause yield loss. To understand and monitor the influence of manufacturing process parameters comprehensively, a large number of tests are usually performed during wafer fabrication so that the process parameters can be effectively controlled and improved, raising wafer yield and reducing chip cost. The Wafer Acceptance Test (WAT), also known as the electrical test (E-Test), is an important wafer test method: special test structures are fabricated in the scribe-line space of the wafer, the electrical parameters of the chips on the wafer are tested, and the collected electrical characteristic data indirectly reflect the process parameters. By analyzing wafer acceptance test data, poor process parameters in IC manufacturing can be improved in a targeted manner. In the prior art, wafer acceptance testing is a mainstream test mode, but how to process wafer acceptance test data effectively and make reasonable use of it has long been a difficult problem in this field. In addition, existing algorithmic models suffer from problems such as low accuracy and limited data-processing capacity, which need to be overcome.

Disclosure of Invention

In view of the deficiencies of the prior art, the invention provides a yield prediction method and system for optimizing a random forest based on the Harris eagle algorithm.
To solve the above technical problems, the invention adopts the following technical solution: a yield prediction method for optimizing a random forest based on the Harris eagle algorithm comprises the following steps: acquiring a historical wafer acceptance test data set; preprocessing the acceptance test data in the historical wafer acceptance test data set to obtain an acceptance test sample set; iteratively optimizing the key framework parameters of a random forest model through the Harris eagle algorithm, modifying the random forest model based on the key framework parameters, and constructing a wafer yield prediction pre-training model; training the wafer yield prediction pre-training model on the acceptance test sample set to obtain a wafer yield prediction model; and inputting the acceptance test data to be tested into the wafer yield prediction model to obtain a prediction result.

In one embodiment, the preprocessing comprises one or more of outlier detection and processing, missing value processing, and data normalization.

In one embodiment, the missing value processing comprises missing value deletion or missing value filling; the missing value filling comprises filling the missing values with the mean, the mode, or the median, or building a prediction model to impute the missing values.

In one embodiment, the data normalization comprises standardization or min-max normalization.
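The preprocessing options listed above (outlier handling, mean/median/mode filling, standardization or min-max normalization) can be illustrated on a single WAT parameter column. This is a stdlib-only sketch under stated assumptions: a missing value is represented as `None`, and outliers are clipped with the interquartile-range rule (k = 1.5), since the text does not name an outlier detection method. The prediction-model imputation option is not shown, and the function names are hypothetical.

```python
import statistics


def clip_outliers(column, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (the IQR rule is an assumption)."""
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, lo), hi) for v in column]


def fill_missing(column, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in column if v is not None]
    stat = {"mean": statistics.mean, "median": statistics.median,
            "mode": statistics.mode}[strategy]
    value = stat(observed)
    return [value if v is None else v for v in column]


def standardize(column):
    """Z-score standardization: (x - mean) / population standard deviation."""
    mu, sd = statistics.fmean(column), statistics.pstdev(column)
    return [0.0 if sd == 0 else (v - mu) / sd for v in column]


def min_max(column):
    """Min-max normalization to [0, 1]."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]


# Example: one hypothetical WAT parameter column with a gap and an outlier.
raw = [1.0, 2.0, None, 4.0, 100.0]
filled = fill_missing(raw, "median")   # None -> median of [1, 2, 4, 100] = 3.0
cleaned = clip_outliers(filled)        # 100.0 is clipped to Q3 + 1.5 * IQR = 7.0
scaled = min_max(cleaned)
```

Filling runs before outlier clipping here because the quantile computation cannot handle missing entries; the order in a real pipeline is a design choice the text leaves open.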
In one implementation, iteratively optimizing the key framework parameters of the random forest model through the Harris eagle algorithm and modifying the random forest model based on the key framework parameters comprise the following steps: presetting the parameters of the Harris eagle algorithm, including the population size, the maximum number of iterations, and the variation ranges and dimensions of the eagle-group and prey positions; setting the key framework parameters of the random forest algorithm to be iteratively optimized by the Harris eagle algorithm, the key framework parameters comprising at least the maximum number of weak learners, the maximum depth of the decision trees, the maximum number of features, the minimum number of samples required to split an internal node, and the minimum number of samples at a leaf node; initializing the positions of the eagle group and of the prey in the Harris eagle algorithm, taking the classification error of the random forest algorithm as the fitness function, and, in each iteration, updating the position vector with a position update strategy selected according to whether the prey has been found, the energy state of the prey, and the value of the fitness function, the optimization being completed when the number of iterations reaches the preset maximum, whereupon the position vector is the optimized set of key framework parameters; The positions of the eagle group and the prey are expressed as a position vector with the size of 1 x d