CN-121997006-A - Method, system and storage medium for characterizing drop quantity of end part of cigarette
Abstract
The invention relates to feature screening for tobacco production data, and in particular to a method, a system and a storage medium for characterizing the cigarette end drop quantity based on prior process knowledge. The method comprises: acquiring cigarette production process data and performing dynamic region division; performing multi-dimensional feature extraction on the region-divided data to obtain a multi-dimensional feature set; performing edge feature pre-screening on the multi-dimensional feature set to obtain an effective feature set; obtaining a comprehensive scoring feature set from the effective feature set; performing intra-region feature screening on the comprehensive scoring feature set to obtain an initial feature set and a candidate feature set; and performing improved sequential forward selection fine screening based on the initial feature set and the candidate feature set to obtain an optimal feature subset. According to the embodiments of the invention, the method achieves higher characterization accuracy, stronger model robustness, better process interpretability and wider applicability across production scenarios.
Inventors
- QIU YUCAN
- PAN ZHU
- ZHU LIMING
- ZHU QIANG
- JIA QIAODONG
- FAN HU
- YANG DAOJIAN
- XU XUE
Assignees
- 浙江中烟工业有限责任公司
Dates
- Publication Date: 20260508
- Application Date: 20260129
Claims (10)
- 1. A method of characterizing a cigarette end drop quantity, the method comprising: acquiring cigarette production process data and performing dynamic region division; performing multi-dimensional feature extraction on the region-divided data to obtain a multi-dimensional feature set; performing edge feature pre-screening on the multi-dimensional feature set to obtain an effective feature set; obtaining a comprehensive scoring feature set from the effective feature set; performing intra-region feature screening on the comprehensive scoring feature set to obtain an initial feature set and a candidate feature set; and performing improved sequential forward selection fine screening based on the initial feature set and the candidate feature set to obtain an optimal feature subset.
- 2. The characterization method of claim 1 wherein acquiring the cigarette production process data and performing dynamic region division comprises: acquiring production data of the cigarette drop quantity and multi-segment discrete microwave density data along the cigarette axis; calculating a density gradient from the axial multi-segment discrete microwave density data to obtain gradient values; obtaining potential mutation points from the gradient values; screening effective mutation points from the potential mutation points; and performing region division according to the effective mutation points.
- 3. The characterization method of claim 1 wherein performing multi-dimensional feature extraction on the region-divided data to obtain a multi-dimensional feature set comprises: extracting features from the region-divided data along four dimensions, namely central tendency, dispersion, distribution shape and time-series characteristics, wherein the central tendency comprises the mean, median and mode; the dispersion comprises the standard deviation, variance, range and interquartile range; the distribution shape comprises skewness and kurtosis; and the time-series characteristics comprise the differential mean, differential standard deviation, time trend and zero-crossing rate.
- 4. The characterization method of claim 1 wherein performing edge feature pre-screening on the multi-dimensional feature set to obtain an effective feature set comprises: performing coefficient-of-variation screening on the multi-dimensional feature set to obtain a first screening data set; performing joint monotonicity-significance screening on the multi-dimensional feature set to obtain a second screening data set; and taking the intersection of the first screening data set and the second screening data set to obtain the effective feature set.
- 5. The characterization method of claim 1 wherein obtaining a comprehensive scoring feature set from the effective feature set comprises: calculating a nonlinear correlation value and a linear correlation value, respectively, for the effective feature set; setting dynamic weights according to the nonlinear correlation values and the linear correlation values; calculating a comprehensive score for each effective feature according to the dynamic weights; and obtaining the comprehensive scoring feature set according to the comprehensive score of each effective feature.
- 6. The characterization method of claim 5 wherein calculating a comprehensive score for each effective feature according to the dynamic weights comprises: calculating the comprehensive score of each effective feature according to equation (1) [equation and symbols missing from the source text], wherein the symbols of equation (1) denote, respectively, the comprehensive score, the dynamic weight, the normalized linear correlation value and the normalized nonlinear correlation value.
- 7. The characterization method of claim 1 wherein performing improved sequential forward selection fine screening based on the initial feature set and the candidate feature set to obtain an optimal feature subset comprises: performing preferential addition based on the current initial feature set and the current candidate feature set to obtain a current feature subset; calculating correlation coefficients based on the current feature subset and eliminating redundant features; updating the current initial feature set and the current candidate feature set according to the current feature subset after the redundant features are eliminated; judging whether a stopping condition is met; when the stopping condition is not met, returning to the step of performing preferential addition based on the current initial feature set and the current candidate feature set to obtain the current feature subset; and when the stopping condition is met, selecting the feature subset with the lowest prediction error as the optimal feature subset.
- 8. The characterization method of claim 1 wherein performing preferential addition based on the current initial feature set and the current candidate feature set to obtain the current feature subset comprises: setting an initial feature set and a candidate feature set; for each feature in the candidate feature set, constructing an extended feature set for that feature; training a random forest model on each extended feature set and calculating the corresponding prediction error on a validation set; selecting, from all candidate features, the feature that minimizes the prediction error; and adding the selected feature to the current initial feature set to form the current feature subset.
- 9. A cigarette end drop characterization system, characterized in that it comprises a processor configured to perform the method of any one of claims 1 to 8.
- 10. A computer readable storage medium having instructions stored thereon which, when executed by a processor, implement the method of any of claims 1 to 8.
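For illustration only, the dynamic region division of claim 2 might be sketched as follows. The claim does not state how potential or effective mutation points are determined, so the gradient `threshold`, the `min_gap` merging rule and all function names here are assumptions of this sketch, not the patented procedure.

```python
from typing import List, Sequence, Tuple


def density_gradient(density: Sequence[float]) -> List[float]:
    """First-order difference between adjacent density segments."""
    return [density[i + 1] - density[i] for i in range(len(density) - 1)]


def potential_mutation_points(gradient: Sequence[float], threshold: float) -> List[int]:
    """Indices whose absolute gradient exceeds the (assumed) threshold."""
    return [i for i, g in enumerate(gradient) if abs(g) > threshold]


def effective_mutation_points(
    points: Sequence[int], gradient: Sequence[float], min_gap: int
) -> List[int]:
    """Assumed rule: among points closer than min_gap, keep the steepest one."""
    kept: List[int] = []
    for p in points:
        if kept and p - kept[-1] < min_gap:
            if abs(gradient[p]) > abs(gradient[kept[-1]]):
                kept[-1] = p
        else:
            kept.append(p)
    return kept


def divide_regions(n_segments: int, points: Sequence[int]) -> List[Tuple[int, int]]:
    """Half-open (start, end) segment ranges, split after each mutation point."""
    bounds = [0] + [p + 1 for p in points] + [n_segments]
    return list(zip(bounds[:-1], bounds[1:]))


def dynamic_regions(
    density: Sequence[float], threshold: float, min_gap: int
) -> List[Tuple[int, int]]:
    """Gradient -> potential points -> effective points -> regions."""
    grad = density_gradient(density)
    candidates = potential_mutation_points(grad, threshold)
    points = effective_mutation_points(candidates, grad, min_gap)
    return divide_regions(len(density), points)
```

With a nine-segment toy profile `[1.0, 1.0, 1.0, 2.0, 2.1, 2.0, 2.0, 1.0, 1.0]`, a threshold of 0.5 and a minimum gap of 2, this sketch returns the regions `[(0, 3), (3, 7), (7, 9)]`, that is, two end regions and one middle region.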
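The feature families listed in claim 3 could be computed per region roughly as in the following sketch, which uses only the Python standard library. The exact definitions are assumptions where the claim is silent: population (not sample) variance, excess kurtosis, the time trend taken as a least-squares slope against the sample index, and the zero-crossing rate counted on the mean-centered series.

```python
import statistics as st
from typing import Dict, Sequence


def extract_features(x: Sequence[float]) -> Dict[str, float]:
    """Statistical features of one region's series (needs at least 2 samples)."""
    n = len(x)
    mean = st.fmean(x)
    var = st.pvariance(x, mu=mean)          # population variance (assumed)
    std = var ** 0.5
    q1, _, q3 = st.quantiles(x, n=4)        # quartile cut points
    centered = [v - mean for v in x]
    # Standardized third and fourth moments; defined as 0 for a constant series.
    skew = sum(c ** 3 for c in centered) / (n * std ** 3) if std > 0 else 0.0
    kurt = sum(c ** 4 for c in centered) / (n * std ** 4) - 3.0 if std > 0 else 0.0
    diffs = [x[i + 1] - x[i] for i in range(n - 1)]
    # Time trend: least-squares slope of the series against its index (assumed).
    t_mean = (n - 1) / 2
    slope = sum((i - t_mean) * c for i, c in enumerate(centered)) / sum(
        (i - t_mean) ** 2 for i in range(n)
    )
    # Zero crossings of the mean-centered series (assumed definition).
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return {
        "mean": mean,
        "median": st.median(x),
        "mode": st.mode(x),
        "std": std,
        "variance": var,
        "range": max(x) - min(x),
        "iqr": q3 - q1,
        "skewness": skew,
        "kurtosis": kurt,
        "diff_mean": st.fmean(diffs),
        "diff_std": st.pstdev(diffs),
        "time_trend": slope,
        "zero_crossing_rate": crossings / (n - 1),
    }
```

Applying `extract_features` to each region and each density segment yields one feature vector per cigarette sample; for continuous density values the mode is rarely informative and is included here only because the claim lists it.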
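Equation (1) of claim 6 did not survive in this text, so the following is only one plausible reading of claims 5 and 6 and should not be taken as the claimed formula. It assumes the comprehensive score is a convex combination of the normalized linear and nonlinear correlation values, and that the dynamic weight is the share of the linear value in their sum; both the form of the combination and the weighting rule are assumptions, as are the function names.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Min-max normalize to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def dynamic_weight(linear: float, nonlinear: float) -> float:
    """Assumed rule: weight on the linear term is its share of the total."""
    total = linear + nonlinear
    return 0.5 if total == 0 else linear / total


def composite_score(linear: float, nonlinear: float) -> float:
    """Assumed form: w * linear + (1 - w) * nonlinear, inputs already normalized."""
    w = dynamic_weight(linear, nonlinear)
    return w * linear + (1.0 - w) * nonlinear


def score_features(linear: Sequence[float], nonlinear: Sequence[float]) -> List[float]:
    """Normalize both correlation lists, then score each feature."""
    return [composite_score(a, b) for a, b in zip(min_max(linear), min_max(nonlinear))]
```

Under this assumed rule, a feature whose association is dominated by either the linear or the nonlinear term is scored mainly by that term, which is one way to balance the two contributions that the description calls for.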
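The improved sequential forward selection loop of claims 7 and 8 might look like the sketch below. Several details are assumptions because the claims leave them open: the prediction error is supplied through a hypothetical `error_fn` callback (the claims train a random forest on each extended feature set and score it on a validation set; a callback keeps the sketch dependency-free), a newly chosen feature counts as redundant when its absolute Pearson correlation with any already selected feature exceeds `redundancy_threshold`, a redundant feature is discarded rather than swapped in, and the stopping condition is an exhausted candidate set or `max_features`.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def improved_sfs(
    columns: Dict[str, Sequence[float]],
    initial: List[str],
    candidates: List[str],
    error_fn: Callable[[List[str]], float],
    redundancy_threshold: float = 0.9,
    max_features: int = 10,
) -> Tuple[List[str], float]:
    """Greedy forward selection with a redundancy check after each addition."""
    selected = list(initial)
    remaining = [c for c in candidates if c not in selected]
    history: List[Tuple[List[str], float]] = []
    if selected:
        history.append((list(selected), error_fn(selected)))
    while remaining and len(selected) < max_features:
        # Preferential addition: pick the candidate whose extended set has lowest error.
        errors = {c: error_fn(selected + [c]) for c in remaining}
        chosen = min(errors, key=errors.get)
        remaining.remove(chosen)
        # Redundancy elimination (assumed rule): discard a highly correlated newcomer.
        if any(
            abs(pearson(columns[chosen], columns[s])) > redundancy_threshold
            for s in selected
        ):
            continue
        selected.append(chosen)
        history.append((list(selected), errors[chosen]))
    if not history:
        return [], float("inf")
    # Return the recorded subset with the lowest prediction error.
    return min(history, key=lambda item: item[1])
```

In practice `error_fn` could wrap a random forest regressor fitted on the listed columns and evaluated on held-out data, as the claims describe; that wrapper is deliberately left out of this sketch.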
Description
Method, system and storage medium for characterizing drop quantity of end part of cigarette
Technical Field
The invention relates to feature screening for tobacco production data, and in particular to a method, a system and a storage medium for characterizing the cigarette end drop quantity based on prior process knowledge.
Background
In cigarette production, the quantity of cut tobacco that drops from the cigarette end is a core quality index, and its stability directly affects smoking taste, combustion performance and production cost. Conventional measurement of the end drop quantity relies on offline sampling and weighing, which lags production and cannot support real-time monitoring. The 32-segment microwave density data of a cigarette is an important factor influencing the end drop quantity; because it is measured online in real time, it provides a data basis for digitally characterizing the end drop quantity. However, accurately screening features related to the end drop quantity from the 32-segment density data faces several challenges: noise is interleaved with useful information in the high-dimensional data, the nonlinear association between features and the drop quantity is difficult to characterize, process knowledge is insufficiently fused with data-driven methods, and feature redundancy reduces model generalization. These problems have become technical bottlenecks that restrict accurate characterization of the cigarette end drop quantity.
Existing feature screening methods have significant limitations when applied to industrial quality control.
Conventional methods measure the correlation between a feature and the target variable with a single index (such as the Pearson correlation coefficient), which captures only linear correlation and has difficulty characterizing the nonlinear mapping between the density data and the drop quantity. For example, when relating density-gradient discontinuities to the drop quantity, linear correlation analysis often biases feature screening because the data distribution is asymmetric. In addition, most feature screening algorithms do not systematically integrate process knowledge. On the one hand, a fixed region division (such as dividing the cigarette into uniform sections) ignores the process characteristics of the density distribution (such as the gradient difference between the ends and the middle), so the extracted features do not match actual process requirements. On the other hand, because features are not grouped and managed according to process knowledge, redundancy within a group is difficult to control, which increases model complexity and the risk of overfitting.
The shortcomings of the prior art are more prominent in feature pre-screening and fine screening. Statistics-based screening methods (such as the coefficient of variation and significance tests) generally process each feature independently and ignore the temporal correlation and spatial distribution among features, while search algorithms such as sequential forward selection easily fall into local optima because they lack a dynamic redundancy check. In cigarette production, the 32-segment density data has strong spatial correlation and density fluctuations in adjacent segments influence one another, so conventional methods struggle to reject redundant information while retaining key features.
In addition, the prior art lacks a dynamic weight adjustment mechanism for feature importance and cannot balance the contributions of linear and nonlinear correlation to characterizing the end drop quantity, so the feature screening results are not robust enough. Therefore, how to deeply integrate prior process knowledge with data-driven feature screening, and thereby realize dynamic region division of cigarette density data, nonlinear association measurement of multi-dimensional features and process-knowledge-based redundancy control, has become the core technical problem in improving the characterization accuracy of the drop quantity. The invention aims to solve the problems of linear-correlation dependence, poor process suitability and insufficient redundancy control in the feature screening stage of existing methods by integrating process knowledge with a multi-dimensional feature evaluation system.
Disclosure of Invention
The embodiments of the invention aim to provide a method, a system and a storage medium for characterizing the cigarette end drop quantity based on prior process knowledge, so as to solve the problems of linear correlation dependence, poor process suitabilit