CN-122024892-A - Fully automatic Raman spectrum baseline correction method and system
Abstract
The invention discloses a fully automatic Raman spectrum baseline correction method and system. Building on chemometrics, the method introduces a deep learning algorithm: a deep-learning-based weight output model identifies the peak and background regions of a spectrum and outputs an initial weight vector; the smoothness parameter is then adaptively optimized using a multi-index evaluation strategy; finally, a more robust dynamic iteratively reweighted penalized least squares algorithm performs dynamically weighted iterations to fit the background curve and output the final corrected baseline. The invention addresses the problems of existing Raman spectrum baseline correction methods, namely that parameters need manual adjustment, that it is difficult to balance retaining peak intensity information against removing noise interference, and that processing is slow. It achieves baseline correction while yielding reliable peak positions and intensities, thereby supporting qualitative and quantitative analysis of Raman spectra, and it runs fast enough to process large-scale data.
Inventors
- LIU GUOKUN
- Wang Sanlei
- WU HAOPING
- REN BIN
- TIAN ZHONGQUN
Assignees
- Xiamen University (厦门大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260120
Claims (10)
- 1. A fully automatic Raman spectrum baseline correction method, characterized by comprising the following steps: acquiring original Raman spectrum data, wherein the original Raman spectrum data comprises constructed simulated Raman spectrum data or real experimental data; inputting the original Raman spectrum data into a trained weight output model, and outputting a binarized initial weight vector that identifies peak regions and background regions, wherein the weight output model is built on the ResUnet architecture and its training samples consist of a simulated Raman spectrum data set labeled with peak regions and background regions; based on the initial weight vector, performing a traversal search over a preset smoothness parameter range using the penalized least squares (PLS) method, and fitting an initial baseline for each candidate smoothness parameter; and substituting the original Raman spectrum data, the initial weight vector and the optimal smoothness parameter into the more robust adaptive iteratively reweighted penalized least squares baseline correction algorithm Dr-airPLS for iterative correction, dynamically optimizing the weight vector according to the baseline fit during the iterations until a preset condition is met, and outputting the final corrected baseline.
- 2. The fully automatic Raman spectrum baseline correction method according to claim 1, wherein the process of constructing the simulated Raman spectrum data set comprises: using Gaussian, Lorentzian and Voigt functions as basic line shapes to simulate Raman peaks with different peak shapes; combining them to obtain a Raman spectrum with preset Raman peaks; adding Gaussian noise whose intensity lies within a preset intensity range; and then adding randomly generated exponential and polynomial curves to simulate the Rayleigh scattering background and the fluorescence background respectively, thereby establishing a simulated Raman spectrum data set containing diverse backgrounds.
- 3. The fully automatic Raman spectrum baseline correction method according to claim 1, wherein the process of building and training the weight output model comprises: training the weight output model on the simulated Raman spectrum data set, wherein an AdamW optimizer is used during training, a cosine annealing algorithm automatically adjusts the learning rate, and the optimization target is to minimize the binary cross-entropy loss (BCE loss) between the predicted weights and the labels until the weight output model converges.
- 4. The fully automatic Raman spectrum baseline correction method according to claim 1, wherein said multiple indicators comprise the smoothness of the fitted initial baseline, the proportion of negative-value points in the corrected spectrum, the average value of the negative-value region of the corrected spectrum, and the kurtosis of the corrected spectrum.
- 5. The fully automatic Raman spectrum baseline correction method according to claim 4, wherein the correction quality of each initial baseline is quantitatively evaluated by a multi-index comprehensive evaluation strategy and the optimal smoothness parameter is adaptively selected, specifically comprising: normalizing the scores for the smoothness of the fitted initial baseline, the proportion of negative-value points in the corrected spectrum, the average value of the negative-value region of the corrected spectrum and the kurtosis of the corrected spectrum to the interval [0,1] respectively, and summing them to obtain a comprehensive evaluation index; and selecting the smoothness parameter corresponding to the lowest comprehensive evaluation index as the optimal smoothness parameter.
- 6. The fully automatic Raman spectrum baseline correction method according to claim 5, wherein the comprehensive evaluation index is expressed as the sum of four terms, which respectively represent: the smoothness of the initial baseline after Min-Max normalization; the proportion of negative-value points after Min-Max normalization; the average value of the negative-value region after Min-Max normalization; and the kurtosis after Min-Max normalization.
- 7. The fully automatic Raman spectrum baseline correction method according to claim 1, wherein the dynamically optimized weight vector is expressed as follows: [expression missing from this text]; wherein the quantities appearing in the expression are: the number of data points; the number of iterations; the intensity vector of the original Raman spectrum; the initial weight vector output by the weight output model; the intensity vector of the initial baseline obtained by fitting with the initial weight vector and the optimal smoothness parameter; the weights of the corresponding data points after the current iteration and after the preceding iteration; and the intensity vectors of the baseline after the current iteration and after the preceding iteration.
- 8. The fully automatic Raman spectrum baseline correction method according to claim 1, wherein the preset condition comprises a preset convergence condition or a preset number of iterations.
- 9. The fully automatic Raman spectrum baseline correction method according to claim 8, wherein the preset convergence condition is that the absolute value of the sum of the deviations between the baseline and the original Raman spectrum is smaller than a preset threshold.
- 10. A fully automatic Raman spectrum baseline correction system, comprising: a Raman spectrum data acquisition module for acquiring original Raman spectrum data, wherein the original Raman spectrum data comprises constructed simulated Raman spectrum data or real experimental data; an initial weight vector identification module for inputting the original Raman spectrum data into a trained weight output model and outputting a binarized initial weight vector that identifies peak regions and background regions, wherein the weight output model is built on the ResUnet architecture and its training samples consist of a simulated Raman spectrum data set labeled with peak regions and background regions; an optimal smoothness parameter selection module for performing, based on the initial weight vector, a traversal search over a preset smoothness parameter range using the penalized least squares (PLS) method, and fitting an initial baseline for each candidate smoothness parameter; and a corrected baseline output module for taking the original Raman spectrum data, the initial weight vector and the optimal smoothness parameter as inputs, substituting them into the more robust adaptive iteratively reweighted penalized least squares baseline correction algorithm Dr-airPLS for iterative correction, dynamically optimizing the weight vector according to the baseline fit during the iterations until a preset condition is met, and outputting the final corrected baseline.
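As an illustration of the data set construction recited in claim 2, the following Python sketch generates one simulated spectrum together with its peak/background labels. It is a minimal sketch under stated assumptions, not the patented implementation: the function name `simulate_spectrum`, every parameter range, the normalized wavenumber axis, the labeling threshold, and the convention that weight 1 marks background and weight 0 marks peaks are all hypothetical and do not appear in the source.

```python
import numpy as np
from scipy.special import voigt_profile


def simulate_spectrum(n_points=1024, n_peaks=5, noise_range=(0.01, 0.05),
                      label_threshold=0.05, rng=None):
    """Return (x, spectrum, weights) for one simulated Raman spectrum.

    weights is binary: 0 inside peak regions, 1 in background regions
    (an assumed convention; the source only says the vector is binarized).
    """
    rng = np.random.default_rng(rng)
    x = np.linspace(0.0, 1.0, n_points)  # normalized axis (assumption)
    peaks = np.zeros(n_points)
    for _ in range(n_peaks):
        center = rng.uniform(0.05, 0.95)
        width = rng.uniform(0.002, 0.006)
        height = rng.uniform(0.2, 1.0)
        shape = rng.choice(["gaussian", "lorentzian", "voigt"])
        dx = x - center
        if shape == "gaussian":
            profile = np.exp(-0.5 * (dx / width) ** 2)
        elif shape == "lorentzian":
            profile = width ** 2 / (dx ** 2 + width ** 2)
        else:
            # Voigt profile with equal Gaussian and Lorentzian widths,
            # rescaled so that its maximum on the grid is 1.
            profile = voigt_profile(dx, width, width)
            profile = profile / profile.max()
        peaks += height * profile
    # Gaussian noise whose standard deviation is drawn from a preset range.
    noise = rng.normal(0.0, rng.uniform(*noise_range), n_points)
    # An exponential decay stands in for residual Rayleigh scattering and a
    # random cubic polynomial for the fluorescence background.
    rayleigh = rng.uniform(0.5, 2.0) * np.exp(-rng.uniform(3.0, 10.0) * x)
    fluorescence = np.polynomial.polynomial.polyval(
        x, rng.uniform(-0.5, 0.5, size=4))
    spectrum = peaks + noise + rayleigh + fluorescence
    # Label a point as "peak" where the noise-free peak signal exceeds the
    # threshold; everything else is background.
    weights = (peaks < label_threshold).astype(np.float32)
    return x, spectrum, weights
```

Calling `simulate_spectrum(rng=0)` repeatedly with different seeds would yield labeled spectrum/weight pairs of the kind a segmentation-style network could be trained on; how the actual labels are derived is not stated in this text.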
Description
Fully automatic Raman spectrum baseline correction method and system
Technical Field
The invention relates to the technical field of Raman spectrum post-processing, and in particular to a fully automatic Raman spectrum baseline correction method and system.
Background
Raman spectroscopy, particularly surface-enhanced Raman spectroscopy (SERS), has become an important spectroscopic technique for molecular structure analysis by virtue of its high-resolution fingerprint information and single-molecule-level sensitivity. "Spectrum-structure" analysis based on Raman spectra is widely used in fields such as surface science, life science, food safety and environmental monitoring. In Raman measurements, because of the limited transmittance and bandwidth of the optical filter, some unfiltered Rayleigh scattering remains in the low-wavenumber region of the spectrum and appears as a background resembling an exponential decay. In addition, fluorescence from the sample itself or from impurities appears as broad peaks or a continuous baseline, while photoluminescence from SERS substrates appears as humps over the entire measured spectral range. The spectrum must therefore be baseline corrected before "spectrum-structure" analysis.
Current baseline correction methods for Raman spectral data fall into two main categories. The first is based on polynomial fitting: the background is treated as a polynomial curve of a certain order and fitted by least squares. The core idea is to iteratively remove the "peak points" that lie above the fitted curve and keep only the "background points" for the next fit, until a termination condition is met.
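The iterative polynomial approach described above can be sketched in Python as follows. This is a minimal sketch of one common variant, in which points above the fitted curve are clipped down to the fit rather than discarded; the function name `iterative_polyfit_baseline`, the default order, the tolerance and the stopping rule are assumptions chosen for illustration and are not taken from the source.

```python
import numpy as np


def iterative_polyfit_baseline(x, y, order=3, max_iter=100, tol=1e-3):
    """Estimate a polynomial baseline by repeatedly clipping peak points."""
    x = np.asarray(x, dtype=float)
    work = np.asarray(y, dtype=float).copy()
    # Rescale x to [-1, 1] so the least-squares problem stays well conditioned.
    xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    baseline = work.copy()
    for _ in range(max_iter):
        coeffs = np.polynomial.polynomial.polyfit(xs, work, order)
        baseline = np.polynomial.polynomial.polyval(xs, coeffs)
        # Points above the fitted curve are treated as peak points and are
        # replaced by the fit, so the next pass is driven by background points.
        clipped = np.minimum(work, baseline)
        change = np.linalg.norm(clipped - work) / max(np.linalg.norm(work), 1e-12)
        work = clipped
        if change < tol:  # termination condition (assumed form)
            break
    return baseline
```

The corrected spectrum would then be `y - iterative_polyfit_baseline(x, y, order)`, and the result depends directly on the chosen `order`.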
Although its principle is intuitive, the polynomial approach is extremely sensitive to the choice of polynomial order: too low an order cannot fit a complex curved background, whereas too high an order produces spurious baseline fluctuations in peak-free regions. The second category is based on penalized least squares, which fits the baseline by minimizing an objective function containing a "fidelity term" and a "roughness penalty term". Rather than relying on a local window, it optimizes over the full spectrum, using the penalty term to keep the baseline smooth while preserving the spectral profile. An example is the adaptive iteratively reweighted penalized least squares (airPLS) baseline correction algorithm, which estimates the baseline under a smoothness penalty and adaptively adjusts the weights from one iteration to the next. Although such methods are more robust than polynomial fitting, their results depend strongly on the smoothness parameter and on the setting of the asymmetric weights. The present work therefore seeks to develop a new algorithm suited to baseline correction of complex spectra.
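To make the penalized least squares idea concrete, the sketch below solves the weighted problem of minimizing sum(w_i (y_i - z_i)^2) + lam * sum((second difference of z)^2) and wraps it in a reweighting loop. It is a minimal sketch, not the Dr-airPLS algorithm of this invention: the reweighting rule follows the publicly available airPLS scheme as commonly implemented, while the function names `pls_baseline` and `airpls_style_baseline`, the second-order difference penalty, the value of `lam` and the 0.001 stopping ratio are illustrative assumptions.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def pls_baseline(y, weights, lam):
    """Weighted penalized least squares fit: solve (W + lam * D'D) z = W y."""
    n = y.size
    # D is the second-order difference operator, shape (n - 2, n).
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n), format="csc")
    W = sparse.diags(weights, 0, shape=(n, n), format="csc")
    A = (W + lam * (D.T @ D)).tocsc()
    return spsolve(A, weights * y)


def airpls_style_baseline(y, lam=1e5, max_iter=15):
    """Iteratively reweighted baseline fit in the style of airPLS."""
    y = np.asarray(y, dtype=float)
    w = np.ones(y.size)
    z = y.copy()
    for i in range(1, max_iter + 1):
        z = pls_baseline(y, w, lam)
        d = y - z
        below = d < 0
        # Total magnitude of the points lying below the current baseline.
        dssn = np.abs(d[below].sum())
        if dssn <= 0.001 * np.abs(y).sum():
            break
        # Points above the baseline are treated as peaks (weight 0); points
        # below it get weights that grow with the iteration number.
        w = np.where(below, np.exp(i * np.abs(d) / dssn), 0.0)
        w[0] = w[-1] = np.exp(i * d[below].max() / dssn)
    return z
```

Running `airpls_style_baseline` on the same spectrum with several values of `lam` shows the dependence on the smoothness parameter noted above: a small `lam` lets the baseline follow the peaks, while a large `lam` flattens genuine background curvature.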
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art, namely that parameters in existing Raman spectrum baseline correction methods need manual adjustment, that it is difficult to balance retaining peak intensity information against removing noise interference, and that processing is slow. It provides a fully automatic Raman spectrum baseline correction method and system. By constructing ResUnet, a deep learning network with residual connections and a U-shaped symmetric structure, to identify spectral peak regions intelligently, combining it with a multi-index evaluation strategy that adaptively optimizes the smoothness parameter, and introducing an improved dynamic iterative weighting algorithm, the invention achieves efficient, accurate and automatic correction of Raman spectrum baselines and is suitable for baseline correction of complex spectra.
The technical scheme adopted to solve the technical problems is as follows. In one aspect, a fully automatic Raman spectrum baseline correction method comprises: acquiring original Raman spectrum data, wherein the original Raman spectrum data comprises constructed simulated Raman spectrum data or real experimental data; inputting the original Raman spectrum data into a trained weight output model, and outputting a binarized initial weight vector that identifies peak regions and background regions, wherein the weight output model is built on the ResUnet architecture and its training samples consist of a simulated Raman spectrum data set labeled with peak regions and background regions; performing a traversal search over a preset smoothness parameter range using the penalized least squares (PLS) method based on