CN-122024812-A - Single-cell Raman spectrum database construction and identification model establishment method for ARC biological agent core strain

Abstract

The invention discloses a method for constructing a single-cell Raman spectrum database and establishing an identification model for the core raw material strains of the ARC biological agent, and belongs to the field of biological detection. Ramanome technology is used to build a characteristic spectrum data set for these strains containing more than 63,000 single-cell Raman spectra. Of the six machine learning prediction models developed and compared for each strain, the linear discriminant analysis (LDA) model performed best, with a classification accuracy exceeding 92.4%. The work provides a unique spectral fingerprint for each ARC biological agent core raw material strain, and can directly support the establishment of a quality control framework and the application of these strains in sustainable agricultural production.

Inventors

  • KANG JIE
  • WEI FENG
  • DOU JINGJING
  • ZHANG LIANGXIAO
  • ZHANG QI
  • LI PEIWU

Assignees

  • Oil Crops Research Institute, Chinese Academy of Agricultural Sciences (中国农业科学院油料作物研究所)

Dates

Publication Date
20260512
Application Date
20260206

Claims (6)

  1. A method for constructing an ARC biological agent Raman spectrum database and establishing an identification model, characterized by comprising the following steps: (1) preparing bacterial suspensions of the four core raw material strains of the ARC biological agent, namely Bacillus amyloliquefaciens, Brevibacillus laterosporus, Paenibacillus mucilaginosus (Bacillus mucilaginosus) and Enterobacter ludwigii; (2) acquiring single-cell Raman spectra of the four strains by microfluidics-based Raman flow cytometry; (3) Raman spectrum data processing, namely cropping the effective spectrum to the 977-3010 cm⁻¹ range, correcting the baseline by a least squares method, and applying maximum-minimum (min-max) normalization; (4) LDA modeling, namely reducing the spectral dimensionality with a masked autoencoder (MAE) combined with K-means, and then constructing a prediction model by linear discriminant analysis (LDA); (5) model verification, namely repeating steps (1) to (3) and substituting the resulting data into the prediction model of step (4) for verification.
  2. The method according to claim 1, wherein in step (1) the bacterial suspension is prepared by inoculating each of the four core raw material strains of the ARC biological agent into LB liquid medium, culturing for 12 hours with shaking at 37 °C and 180 rpm, taking 1 mL of each log-phase bacterial culture, centrifuging for 5 minutes at 8000 × g, washing the cell pellet twice with sterile PBS, washing with sterile deionized water for 10 minutes each time, and finally resuspending the cells in loading buffer and adjusting the cell concentration to 1.0×10^2-1.0×10^5 cells/mL.
  3. The method according to claim 1, wherein in step (2) the sample injection rate of the microfluidic sample injection system is controlled at 50-100 μL/min, the laser wavelength is 532 nm, the laser power is 100 mW, the acquisition time is 0.5 s, and more than 5000 cells are acquired from the bacterial sample liquid of each strain.
  4. The method of claim 1, wherein the Raman spectrum data processing in step (3) comprises the following steps:
     a. Data cleaning:
        1) Signal-to-noise ratio filtering: signal region 2897-2971, signal-to-noise ratio threshold 30, method peak_area;
        2) Outlier removal: outlier ratio 0.05;
        3) Cosmic ray correction: deletion of anomalous spectra.
     b. Data preprocessing:
        1) Band selection: 977-3010;
        2) Baseline correction: spectrum baseline removal based on the least squares method;
        3) Normalization: maximum-minimum (min-max) normalization.
  5. The method according to claim 1, wherein in step (4) the parameters of the spectral dimension reduction by the masked autoencoder (MAE) combined with K-means are set to 2 principal components and 4 clusters.
  6. The method of claim 1, wherein the linear discriminant analysis (LDA) in step (4) is solved by singular value decomposition (svd).
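The preprocessing named in claims 1 and 4 (band selection, least-squares baseline correction, min-max normalization) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the claims only say "least squares method", so the third-order polynomial baseline used here is an assumption, and the function names (`crop_band`, `polyfit_baseline`, `min_max_normalize`, `preprocess`) and the synthetic spectra are hypothetical.

```python
import numpy as np


def crop_band(wavenumbers, spectra, low=977.0, high=3010.0):
    """Keep only the columns whose Raman shift lies in [low, high] cm^-1."""
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], spectra[:, mask]


def polyfit_baseline(wavenumbers, spectra, order=3):
    """Subtract a least-squares polynomial baseline from each spectrum.

    ASSUMPTION: the source does not specify the baseline model; a
    low-order polynomial is used here purely for illustration.
    """
    # Rescale the axis to [-1, 1] so the fit is well conditioned.
    span = wavenumbers.max() - wavenumbers.min()
    x = 2.0 * (wavenumbers - wavenumbers.min()) / span - 1.0
    design = np.vander(x, order + 1)  # shape (n_points, order + 1)
    # Solve one least-squares problem per spectrum in a single call.
    coeffs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    return spectra - (design @ coeffs).T


def min_max_normalize(spectra):
    """Scale each spectrum (row) to the range [0, 1]."""
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against flat spectra
    return (spectra - lo) / span


def preprocess(wavenumbers, spectra):
    """Band selection, then baseline correction, then normalization."""
    wn, cropped = crop_band(wavenumbers, spectra)
    corrected = polyfit_baseline(wn, cropped)
    return wn, min_max_normalize(corrected)


# Synthetic demo data (hypothetical): one peak near 2935 cm^-1 on a
# sloping background, with a little noise, for five "cells".
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400.0, 3400.0, 1500)
peak = np.exp(-0.5 * ((wavenumbers - 2935.0) / 15.0) ** 2)
background = 0.001 * wavenumbers
spectra = peak + background + 0.01 * rng.standard_normal((5, wavenumbers.size))

wn, processed = preprocess(wavenumbers, spectra)
print(processed.shape, wn.min(), wn.max())
```

The resulting matrix has one normalized spectrum per row. For the classifier of claim 6, one option would be scikit-learn's `LinearDiscriminantAnalysis(solver="svd")`; whether the inventors used that library is not stated in the source.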

Description

Single-cell Raman spectrum database construction and identification model establishment method for ARC biological agent core strain

Technical Field

The invention belongs to the field of biological detection, and particularly relates to a method for constructing a single-cell Raman spectrum database of ARC biological agent core raw material strains and a method for establishing an identification model.

Background

The root systems of leguminous crops such as soybean and peanut have the distinctive ability to form root nodules through symbiosis with rhizobia in the soil (Oldroyd, 2013; Stokstad et al., 2016). Each nodule acts like a miniature nitrogen processing plant, converting free nitrogen (N2) in the air into ammonia that the plant can absorb (Poeter et al., 2024). Symbiotic nitrogen fixation can supply approximately 70% of the nitrogen in the total crop biomass and approximately 80% of the nitrogen in the kernels (Herridge et al., 2008). Improving symbiotic nitrogen fixation efficiency therefore not only markedly improves the nitrogen nutrition, yield and quality of leguminous crops, but also reduces the dependence of agricultural production on chemical nitrogen fertilizer and lowers environmental pollution (Zhang et al., 2023).

Under natural conditions, however, legume root systems generally form only a limited number of nodules with low nitrogen fixation efficiency. The traditional approach of inoculating rhizobia suffers from poor ecological adaptability, strong strain specificity and high environmental sensitivity. Developing a green and efficient strategy to enhance nodulation and nitrogen fixation in the legume-rhizobium symbiotic system is therefore both an important direction of current agricultural research and a major challenge still to be overcome.
To address these challenges, we previously developed the ARC biological agent, which significantly increases the number of root nodules and the nitrogen fixation capacity of leguminous crops such as soybean and peanut. It uses plant probiotics, namely Bacillus amyloliquefaciens, Brevibacillus laterosporus, Bacillus mucilaginosus and Enterobacter ludwigii, as core raw material strains and contains no rhizobia (Zhou et al., 2022; Zhang et al., 2025). Field trials conducted at 174 demonstration sites in China's main soybean production areas over three years (2022-2024) showed that the ARC biological agent markedly improved the symbiotic nitrogen fixation capacity of soybean roots: the number of root nodules increased 3.7-fold on average, nitrogenase activity increased 4.8-fold on average, and soybean yield increased by more than 15%. Comprehensive analysis of the fingerprint spectra of the core raw material strains is therefore important for verifying the agent's formula, guiding the discovery of beneficial strains, and helping to raise soybean oilseed productivity.

The recently developed single-cell functional imaging tool, the Ramanome, integrates single-cell Raman spectral data to enable in situ, label-free, non-destructive metabolic analysis of microbial cells, and can be combined with targeted cell sorting techniques (Teng et al., 2016). The technology breaks with the traditional "culture first, then screen" mode of microbial research and opens up a new "screen first, then culture" strategy that requires no fluorescent probe design (Xu et al., 2017).

Disclosure of Invention

The technical problem to be solved by the invention is how to construct a Raman spectrum fingerprint database of the ARC biological agent core raw material strains and establish an identification model.
The technical scheme of the invention is a method for constructing an ARC biological agent Raman spectrum database and establishing an identification model, comprising the following steps: (1) preparing bacterial suspensions of the four core raw material strains of the ARC biological agent, namely Bacillus amyloliquefaciens, Brevibacillus laterosporus, Paenibacillus mucilaginosus (Bacillus mucilaginosus) and Enterobacter ludwigii; (2) acquiring single-cell Raman spectra of the four strains by microfluidics-based Raman flow cytometry; (3) Raman spectrum data processing, namely cropping the effective spectrum to the 977-3010 cm⁻¹ range, correcting the baseline by a least squares method, and applying maximum-minimum (min-max) normalization; (4) modeling analysis, namely first reducing the spectral dimensionality with a masked autoencoder (MAE) combined with K-means, and then constructing a prediction model by linear discriminant analysis (LDA); (5) model verification, namely repeati