CN-122016714-A - Soybean origin tracing detection method based on near infrared spectrum technology

Abstract

The invention belongs to the technical field of food traceability detection and discloses a soybean origin tracing detection method based on near-infrared spectroscopy. The method comprises the following steps: collecting soybean samples from different regions; acquiring near-infrared spectra of the soybeans with two near-infrared instruments (a host and a slave), the host spectra defining the source domain and the slave spectra defining the target domain; applying interpolation alignment, spectral signal correction and other preprocessing to the spectral data; dividing the data sets; and merging different numbers of target-domain spectral samples with the source-domain spectral samples to jointly train an ANN model, thereby achieving model migration. The detection method reduces the cost of building a soybean classification model during migration and is of significance for the rapid evaluation and deployment of soybean origin tracing.
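
The preprocessing named in the abstract (interpolation alignment followed by spectral signal correction) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the clamping at the edges of the measured range, the use of the population standard deviation in SNV and the simple forward gap difference are all assumptions made for illustration, and multiplicative scatter correction is left out for brevity.

```python
from bisect import bisect_left
from statistics import mean, pstdev


def interp_to_grid(wl, values, grid):
    """Linearly interpolate a spectrum (wl ascending) onto a common wavelength grid."""
    out = []
    for g in grid:
        if g <= wl[0]:
            out.append(values[0])        # clamp below the measured range (assumption)
        elif g >= wl[-1]:
            out.append(values[-1])       # clamp above the measured range (assumption)
        else:
            j = bisect_left(wl, g)       # wl[j - 1] < g <= wl[j]
            t = (g - wl[j - 1]) / (wl[j] - wl[j - 1])
            out.append(values[j - 1] + t * (values[j] - values[j - 1]))
    return out


def snv(spectrum):
    """Standard normal variate: centre and scale a spectrum by its own mean and SD."""
    m, s = mean(spectrum), pstdev(spectrum)  # population SD is an assumption
    return [(x - m) / s for x in spectrum]


def gap_derivative(spectrum, gap=1):
    """Simplified first-order gap derivative: slope between points `gap` steps apart."""
    return [(spectrum[i + gap] - spectrum[i]) / gap
            for i in range(len(spectrum) - gap)]


# Example: a slave spectrum measured at 3 points, resampled to a 5-point host grid.
host_grid = [900, 950, 1000, 1050, 1100]     # hypothetical wavelengths in nm
aligned = interp_to_grid([900, 1000, 1100], [1.0, 2.0, 4.0], host_grid)
corrected = gap_derivative(snv(aligned), gap=1)
```

After alignment, host and slave spectra share one wavelength axis, so the same correction chain can be applied to both domains before they are merged for training.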

Inventors

WANG KAIQIANG
YAN WENQIAN
Kang Chuanbin
JU LEI

Assignees

Ocean University of China (中国海洋大学)

Dates

Publication Date: 20260512
Application Date: 20260323

Claims (8)

1. A soybean origin tracing detection method based on near-infrared spectroscopy, characterized by comprising the following steps: S1, collecting soybean samples from different regions and screening out impurities and defective soybean grains; S2, collecting near-infrared spectra of the soybean samples of step S1 with a host; S3, collecting near-infrared spectra of the soybean samples of step S1 with a slave; S4, preprocessing the near-infrared spectra collected by the host and the slave in S2 and S3, wherein the set of spectra collected by the host is called the source domain and the set of spectra collected by the slave is called the target domain; S5, dividing the source-domain and target-domain data sets preprocessed in S4 according to a proportion; and S6, merging the source domain and the target domain of S5 to construct a soybean origin tracing discrimination model.
2. The soybean origin tracing detection method based on near-infrared spectroscopy according to claim 1, wherein, when soybean samples of different origins are collected in step S1, the sample size for each region is not less than 30 groups.
3. The soybean origin tracing detection method based on near-infrared spectroscopy according to claim 1, wherein the spectral acquisition ranges of the host and the slave in steps S2 and S3 cover the short-wave near-infrared region (700 to 1100 nm) and the long-wave near-infrared region (1100 to 2500 nm), not fewer than 10 spectra are collected for each sample, the relative standard deviation of the signal intensity at different wavelengths is less than 7% for the spectra collected in step S2, and the relative standard deviation of the signal intensity at different wavelengths is less than 25% for the spectra collected in step S3.
4. The soybean origin tracing detection method based on the near infrared spectrum technology according to claim 1, wherein the spectrum preprocessing in the step S4 comprises spectrum linear interpolation and spectrum signal correction.
5. The soybean origin tracing detection method based on near-infrared spectroscopy according to claim 4, wherein the spectral signal correction preprocesses the original spectrum using standard normal variate (SNV) transformation and multiplicative scatter correction (MSC), applied in series with a derivative method comprising a first-order gap derivative.
6. The soybean origin tracing detection method based on near-infrared spectroscopy according to claim 1, wherein in step S5 the target domain is randomly divided into a training set and a test set according to the proportion, such that the proportions of NIR spectra of soybeans from different origins are consistent between the training set and the test set.
7. The soybean origin tracing detection method based on near-infrared spectroscopy according to claim 6, wherein in step S6 different numbers of spectral samples are drawn from the target-domain training set and combined with the source-domain spectra to jointly train a model, and a deep learning network is used to identify the "shared" features between the different domains.
8. The soybean origin tracing detection method based on near-infrared spectroscopy according to claim 7, wherein the indexes for evaluating the model performance in step S6 include accuracy and the weighted F1 score, calculated as follows: [formulas not reproduced in the source text].
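
Claims 6 to 8 describe a stratified split of the target domain, the merging of a chosen number of target-domain samples with the source domain, and evaluation by accuracy and weighted F1. A minimal Python sketch of these steps follows. The function names, the 30% test fraction, the fixed random seeds and the toy labels are assumptions for illustration only, and the metric definitions are the conventional textbook ones, not necessarily the formulas of the patent, which the source text does not reproduce.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_frac=0.3, seed=0):
    """Split sample indices so each origin keeps about the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def merge_domains(source, target_train, n_target, seed=0):
    """Add n_target randomly drawn target-domain samples to the full source set."""
    rng = random.Random(seed)
    return source + rng.sample(target_train, n_target)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted origin matches the true origin."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    total = 0.0
    for c, n in Counter(y_true).items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)   # denominator >= n > 0 for observed classes
        total += n / len(y_true) * f1
    return total


# Toy example with two hypothetical origins, "A" and "B".
target_labels = ["A"] * 10 + ["B"] * 10
train_idx, test_idx = stratified_split(target_labels, test_frac=0.3)
source = [("src", i) for i in range(5)]              # stand-ins for source spectra
merged = merge_domains(source, [("tgt", i) for i in train_idx], n_target=4)
acc = accuracy(["A", "A", "B", "B"], ["A", "B", "B", "B"])
wf1 = weighted_f1(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

Varying `n_target` shows how many slave-instrument spectra are needed before the jointly trained model reaches acceptable accuracy on the target-domain test set.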

Description

Soybean origin tracing detection method based on near infrared spectrum technology

Technical Field

The invention relates to the technical field of food traceability detection, in particular to a soybean origin tracing detection method based on near-infrared spectroscopy.

Background

Origin traceability is one of the key attributes for ensuring the quality and safety of agricultural products. Because soybeans from different production areas differ in growth environment, planting conditions and the like, their nutritional composition, quality characteristics and market price also differ markedly, which in turn affects how the soybeans are best processed and used. As China's soybean imports continue to grow, the quality and safety control standards for agricultural products are also rising. Developing efficient and reliable origin tracing technology has therefore become an urgent need in agricultural product trade. In recent years, the combination of near-infrared spectroscopy and chemometric methods has gradually become a powerful tool for tracing the production area of soybeans. Research shows that soybeans are influenced by specific natural conditions and cultivation practices during growth, so that their chemical components and metabolites differ, giving the soybeans distinctive regional characteristics. These characteristics provide an inherent basis for origin identification based on spectral information. Near-infrared spectroscopy is a mature and easily miniaturized analytical technique covering the wavelength range of 780 to 2526 nm. It mainly reflects the overtone and combination-band vibration signals of hydrogen-containing groups in a sample, thereby revealing the overall composition and structural information of the organic compounds in soybeans.
In practice, a sample needs only simple pretreatment before structural information on moisture, protein, lipids and other molecules in the soybean can be obtained rapidly from its spectrum. Chemometric models built on spectral signals have been studied widely, and methods such as the support vector machine (SVM) and the neural network (NN) have been shown to be suitable for regression analysis of key nutritional components in soybeans and for classification of their production area. A neural network has a multilevel, modular structure, so a learning architecture with specific functions can be designed; beyond tasks such as classification and regression, it can integrate further functions such as feature extraction and data augmentation, improving the robustness and generalization ability of the model. For classification models built on NIR spectra, spectral samples are usually collected, and models trained, using only a fixed instrument or instruments of the same model. When a model is deployed in practice, however, the sources and conditions of the spectral instruments are complex. A model trained on one instrument (the host) often suffers degraded performance when applied to other types of instruments. This is because, in a spectral classification model built on a single instrument or batch, the feature signals that contribute most to the decision may be strongly associated with interference signals caused by the instrument or batch rather than with the spectral signals related to the production area, which weakens the robust association between the model features and the true origin information. The problem typically stems from spectral noise, baseline drift and similar effects that differ between instruments or batches.
Transfer learning can improve the adaptability and generalization performance of a model in complex real-world scenarios such as multi-source, multi-task and multi-batch experiments by identifying the features shared across different domains. In transfer learning across different spectral sources, the signal difference between instruments is a key bottleneck restricting large-scale application of the technology. This is because instruments differ in optical configuration and photoelectric signal conversion, resulting in systematic deviations in the intensity and sensitivity of the spectra. Migration methods such as direct standardization (DS) and piecewise direct standardization (PDS) achieve consistent signal intensities across spectral sources through mathematical transformation. However, these methods require the signals before and after migration to be matched one-to-one with the target samples in the spectral dimension, the sensitivity of the data after migration is low, the difference between the migrated data and the target spectrum obtained by actual measurement is large, and the method cannot b