CN-117235544-B - Intermittent process mode division method based on density-weighted and similar-label-allocation density peak clustering

Abstract

The invention discloses an intermittent process mode division method based on density-weighted and similar-label-allocation density peak clustering. The method first accounts for the unbalanced density distribution of intermittent process data samples by introducing a weight coefficient that adjusts the local density of data samples in low-density regions, and obtains the mode centers of the intermittent process data; finally, it constructs an allocation strategy for the remaining data sample points of the intermittent process to achieve mode division. The method does not require the initial mode centers or the number of modes of the intermittent process data as input parameters, and it fully accounts for the influence of unbalanced density distribution on mode center selection. The allocation strategy constructed for the remaining data samples avoids misassignment of mode labels, so that mode division of the intermittent process is achieved and the rationality of the mode division result is improved.

Inventors

WANG JIANLIN
ZHOU XINJIE
LI JI
SUI ENGUANG

Assignees

北京化工大学

Dates

Publication Date: 20260512
Application Date: 20231029

Claims (4)

1. An intermittent process mode division method based on density-weighted and similar-label-allocation density peak clustering, characterized in that the method comprises the following steps: Step one, collecting batch intermittent process data, normalizing the process data, adjusting the local density of data samples in low-density regions of the intermittent process by means of an introduced weight coefficient, and calculating a decision value for each candidate mode center; the penicillin fermentation process is a typical multi-mode batch process, and 25 batches of process data are generated under different initial conditions and Gaussian noise using a penicillin fermentation process simulation platform; the duration of each batch is 400 hours, the sampling interval is 1 hour, and the penicillin fermentation process variables comprise aeration rate, agitation power, substrate feed rate, substrate feed temperature, dissolved oxygen concentration, biomass concentration, penicillin concentration, reactor volume, carbon dioxide concentration, pH, reactor temperature, generated heat, acid flow rate, base flow rate, cooling water flow rate and heating water flow rate; Step two, determining the optimal number of modes of the intermittent process through a defined mode evaluation index MEI, and obtaining the mode centers of the intermittent process data using the decision values; Step three, constructing an allocation strategy for the remaining data samples of the intermittent process, obtaining the division result under the optimal number of modes, and completing mode division of the intermittent process; wherein, according to the distance between intermittent process data samples, the density contribution degree r_jh of intermittent process data samples at different distances to the current intermittent process data sample is defined by formula (3), where j and h represent sampling point serial numbers of the intermittent process data and d_jh is the Euclidean distance between intermittent process data sample points x_j and x_h; the mean local density of all intermittent process data is calculated; the ratio of the mean local density of the samples whose ρ_j is greater than this mean to the mean local density of the samples whose ρ_j is less than this mean is taken as the upper limit w_max of the weight coefficient, and the density contribution degree is normalized to the range from 1 to w_max to obtain the weight coefficient w_jh according to formula (4), where r_jh represents the density contribution degree of intermittent process data sample x_h to intermittent process data sample x_j, and the maximum and minimum values of the density contribution degree are used in the normalization; the local density of the intermittent process data samples is recalculated with the weight coefficient according to formula (5); the mean of the local density and the standard deviation of the relative distance of the intermittent process data samples are calculated, and the samples whose local density is greater than the mean or whose relative distance is greater than the standard deviation are selected as the candidate mode center set.
2. The intermittent process mode division method based on density-weighted and similar-label-allocation density peak clustering according to claim 1, characterized in that step one specifically comprises the following: collecting batch process data X (I × J × K) of I batches, where I is the number of batches, J is the number of variables and K is the number of sampling points; averaging the data along the batch direction, and standardizing each variable by subtracting its mean and dividing by its standard deviation, to obtain the intermittent process mode division data set; calculating the local density ρ_j and the relative distance δ_j of each intermittent process data sample according to formulas (1) and (2), where e is the natural base, d_jh is the Euclidean distance between intermittent process data sample points x_j and x_h, d_c is the truncation distance parameter, and ρ_j and ρ_h represent the local densities of intermittent process data sample points x_j and x_h, respectively; calculating the relative distance of the data samples in the set according to formula (6), which uses the distance between the two intermittent process data samples farthest from each other in the set; defining the similarity between two d-dimensional intermittent process data samples by formula (7), which uses the difference between the two samples in dimension n; transforming formula (7) to obtain the distance calculation function for high-dimensional intermittent process data samples, formula (8), which contains a constant; substituting the distance matrix between the intermittent process data samples into formulas (5) and (6) to obtain the local density and relative distance of the intermittent process data samples; and calculating the decision value of each candidate mode center of the intermittent process from the local density and relative distance according to formula (9).
3. The intermittent process mode division method based on density-weighted and similar-label-allocation density peak clustering according to claim 1, characterized in that step two specifically comprises the following: arranging the decision values of the candidate mode centers of the intermittent process obtained in step one in descending order, and taking the first ten values to define the mode evaluation index MEI according to formulas (10), (11) and (12), whose terms are the f-th decision value after sorting in descending order, the number of mode divisions of the intermittent process, the serial number f of the decision value, and the normalized results of two of these quantities; the number of modes at which MEI is minimal is the optimal number of modes, and the data sample points corresponding to that many leading decision values are selected as the mode centers of the intermittent process.
4. The intermittent process mode division method based on density-weighted and similar-label-allocation density peak clustering according to claim 1, characterized in that step three specifically comprises the following: defining a similarity index between intermittent process data samples from the distance and the local density of the samples according to formula (13), where e is the natural base, d_jh is the distance between intermittent process data sample points x_j and x_h, and the remaining term is the local density difference between x_j and x_h; when allocating the remaining data samples of the intermittent process, first assigning the sample points around each mode center obtained in step two to the corresponding mode, and then propagating the mode labels outwards through a relatively small ε-neighborhood; when the number of unassigned sample points no longer changes, calculating the similarity index and assigning each unassigned sample to the mode of the already labeled data sample with which its similarity is largest; to preserve the time order of the mode division, calculating a time constraint for sample points assigned across modes according to formula (14), where e is the natural base and the remaining terms are the sampling time of the sample point assigned across modes, the sampling times of the mode centers before and after that sample point, and the difference between the sampling times of the cross-mode sample point and the mode center point; and reassigning the cross-mode sample point to the mode determined by the time constraint, thereby completing mode division of the intermittent process and ensuring the time order of the mode division result.
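The two-phase allocation described in claim 4 can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the claimed procedure: formula (13) is not reproduced in this text, so the similarity `exp(-(d + |Δρ|))` used in phase 2 is a hypothetical stand-in, and the function name `propagate_labels`, the parameter `eps`, and the toy data are all invented for the example. The time constraint of formula (14) is not modelled.

```python
import numpy as np
from collections import deque


def propagate_labels(d, rho, centers, eps):
    """Assign mode labels from known centers.

    d       : (n, n) pairwise distance matrix
    rho     : (n,) local densities
    centers : indices of the mode centers
    eps     : radius of the neighborhood used to spread labels
    """
    n = d.shape[0]
    labels = np.full(n, -1)  # -1 marks an unassigned sample
    queue = deque()
    for mode, c in enumerate(centers):
        labels[c] = mode
        queue.append(c)

    # Phase 1: breadth-first spread of labels through eps-neighborhoods.
    while queue:
        j = queue.popleft()
        for h in np.flatnonzero((d[j] <= eps) & (labels == -1)):
            labels[h] = labels[j]
            queue.append(h)

    # Phase 2: once propagation stalls, each leftover sample takes the
    # label of the most similar labeled sample.
    # ASSUMPTION: this similarity is a placeholder, not formula (13).
    labeled = np.flatnonzero(labels != -1)
    for j in np.flatnonzero(labels == -1):
        sim = np.exp(-(d[j, labeled] + np.abs(rho[j] - rho[labeled])))
        labels[j] = labels[labeled[np.argmax(sim)]]
    return labels


# Toy one-dimensional data: two tight groups and one isolated point at 2.0.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 2.0])
d = np.abs(x[:, None] - x[None, :])
eps = 0.15
rho = (d <= eps).sum(axis=1) - 1  # neighbor count as a crude density
labels = propagate_labels(d, rho, centers=[1, 4], eps=eps)
```

In this toy case the isolated point is never reached by propagation, so it is the only sample decided by the similarity rule in phase 2.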

Description

Intermittent process mode division method based on density-weighted and similar-label-allocation density peak clustering

Technical Field

The invention belongs to the technical field of intermittent process monitoring, and particularly relates to an intermittent process mode division method based on density-weighted and similar-label-allocation density peak clustering (Weighted Density and Similarity Label Allocation Density Peaks Clustering, WSDPC).

Background

Batch processes are an important production method in modern industry and are widely used in the chemical, pharmaceutical, microelectronics and other fields. Frequent operating changes and complex production procedures give the intermittent production process multi-mode characteristics. Reasonable division of the modes of the intermittent process provides a basis for mode-wise modeling and helps improve the modeling accuracy of multi-mode intermittent processes. The intermittent process mode division method based on density peak clustering selects mode centers by constructing a decision graph and allocates the remaining data samples according to the local densities and relative distances of the intermittent process data samples, thereby achieving mode division of the intermittent process. However, this method does not consider the unbalanced density distribution of intermittent process data samples, which reduces the accuracy of mode center selection, and its allocation strategy for the remaining data samples may propagate erroneous mode labels, which affects the rationality of the mode division result.
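For orientation, the baseline density peak clustering quantities mentioned in the background can be sketched in Python. This is a minimal sketch of the conventional, unweighted method, not of the invention: it assumes the common Gaussian-kernel local density and the product of local density and relative distance as the decision value, and the function name `dpc_quantities` and the synthetic two-cluster data are invented for the example.

```python
import numpy as np


def dpc_quantities(X, d_c):
    """Return local density rho, relative distance delta and decision value.

    X   : (n, p) data matrix, one sample per row
    d_c : truncation (cutoff) distance
    """
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances

    # ASSUMPTION: Gaussian-kernel density; subtract the self term exp(0) = 1.
    rho = np.exp(-(d / d_c) ** 2).sum(axis=1) - 1.0

    n = X.shape[0]
    delta = np.empty(n)
    for j in range(n):
        higher = rho > rho[j]
        if higher.any():
            # Distance to the nearest sample of higher density.
            delta[j] = d[j, higher].min()
        else:
            # The densest sample gets its largest distance by convention.
            delta[j] = d[j].max()

    # ASSUMPTION: decision value taken as the product rho * delta.
    return rho, delta, rho * delta


# Two well separated synthetic clusters of 50 samples each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
rho, delta, gamma = dpc_quantities(X, d_c=0.5)

# The two largest decision values point at one center per cluster.
centers = np.argsort(gamma)[::-1][:2]
```

Samples with both a high local density and a large relative distance stand out in the decision values, which is why sorting them in descending order exposes the candidate centers.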
Therefore, fully considering the density imbalance of intermittent process data samples, an intermittent process mode division method based on density-weighted and similar-label-allocation density peak clustering is invented. A weight coefficient is introduced to adjust the local density of data samples in low-density regions so that mode centers are selected accurately, and an allocation strategy for the remaining data samples of the intermittent process is constructed by combining the relative distance, the local density and ε-neighborhoods, which improves the rationality of the intermittent process mode division result.

Disclosure of Invention

The invention aims to improve the rationality of the intermittent process mode division result, and provides an intermittent process mode division method based on density-weighted and similar-label-allocation density peak clustering, which comprises the following steps:

Step one, collecting batch intermittent process data, normalizing the process data, adjusting the local density of data samples in low-density regions of the intermittent process by means of an introduced weight coefficient, and calculating a decision value for each candidate mode center;

Step two, determining the optimal number of modes of the intermittent process through a defined mode evaluation index (Mode Evaluation Index, MEI), and obtaining the mode centers of the intermittent process data using the decision values;

Step three, constructing an allocation strategy for the remaining data samples of the intermittent process, obtaining the division result under the optimal number of modes, and achieving mode division of the intermittent process.

Step one specifically comprises the following:

Batch process data X (I × J × K) of I batches are collected, where I is the number of batches, J is the number of variables, and K is the number of sampling points.
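The three-way array X (I × J × K) has to be turned into a two-way data set before clustering. The Python sketch below shows one possible reading of the normalization in step one, offered as an assumption rather than the patented procedure: the batches are averaged into a single K × J mean trajectory, and each variable is then z-scored over the sampling points. The function name `normalize_batches`, the zero-variance guard, and the random stand-in data are invented for the example; the array sizes follow the 25 batches, 16 variables and 400 sampling points listed in claim 1.

```python
import numpy as np


def normalize_batches(X):
    """Reduce batch data of shape (I, J, K) to a z-scored (K, J) data set.

    ASSUMPTION: batches are averaged first, then each variable is
    standardized over the sampling points.
    """
    mean_traj = X.mean(axis=0).T           # average over batches -> (K, J)
    mu = mean_traj.mean(axis=0)            # per-variable mean
    sigma = mean_traj.std(axis=0, ddof=1)  # per-variable sample std
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard constant variables
    return (mean_traj - mu) / sigma


# Random stand-in data: 25 batches, 16 variables, 400 sampling points.
rng = np.random.default_rng(1)
X = rng.normal(loc=3.0, scale=2.0, size=(25, 16, 400))
Z = normalize_batches(X)
```

Each row of `Z` is then one sampling point described by J standardized variables, which matches the use of sampling point serial numbers as sample indices in the claims.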
Taking into account the differences between batches of the intermittent process, the data are averaged along the batch direction, and each process variable is then standardized by subtracting its mean and dividing by its standard deviation, giving the intermittent process mode division data set. For this data set, the local density ρ_j and the relative distance δ_j of each intermittent process data sample are calculated according to formula (1) and formula (2), where e is the natural base, d_jh is the Euclidean distance between intermittent process data sample points x_j and x_h, d_c is the truncation distance parameter, and ρ_j and ρ_h represent the local densities of intermittent process data sample points x_j and x_h, respectively. The density contribution degree r_jh of intermittent process data samples at different distances to the current intermittent process data sample is then defined, where j and h represent sampling point serial numbers of the intermittent process data. The mean local density of the intermittent process data is calculated; the ratio of the mean local density of the samples whose ρ_j is greater than this mean to that of the samples whose ρ_j is less than this mean is taken as the upper limit w_max of the weight coefficient, and the weight coefficient w_jh is obtained between the normalized density contribution degree r j