CN-120998302-B - Method for determining high-ethanol-tolerance yeast strains based on multi-omics markers

Abstract

The invention discloses a method for determining high-ethanol-tolerance yeast strains based on multi-omics markers, and relates to the technical field of computers. The method comprises: performing Z-score standardization on the gene expression levels, protein expression profiles and metabolite abundances of a plurality of unlabeled wild yeast strains; aligning the three standardized data sets by feature; extracting, from the feature-aligned data, the standardized values of the GRE1 gene, the TPS1 gene, the HSP104 protein, the ADH2 protein, the CTA1 protein, sphingosine and trehalose-6-phosphate; forming a multi-omics biomarker feature combination from the extracted standardized values; inputting the feature combination into a strain screening model for industrial strain screening to obtain a high-ethanol-tolerance score for each of the unlabeled wild yeast strains; and taking each strain whose score is greater than or equal to a preset threshold as a high-ethanol-tolerance yeast strain. The method can reduce the cost of screening yeast strains.
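The preprocessing and thresholding steps summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the nested-dictionary data layout, the function names, the use of the population standard deviation, and alignment of the three data sets by strain identifier are all assumptions made for the example. The threshold has no value in the text, so it is left as a required argument.

```python
from statistics import mean, pstdev

# The seven markers named in the abstract, grouped by omics layer.
# The layer keys ("gene", "protein", "metabolite") are hypothetical names.
MARKERS = {
    "gene": ["GRE1", "TPS1"],
    "protein": ["HSP104", "ADH2", "CTA1"],
    "metabolite": ["sphingosine", "trehalose-6-phosphate"],
}


def zscore_columns(table):
    """Z-score every feature across strains.

    table maps strain -> {feature: raw value}. The population standard
    deviation is an assumption; a constant feature is mapped to 0.0.
    """
    features = list(next(iter(table.values())))
    standardized = {strain: {} for strain in table}
    for feature in features:
        values = [row[feature] for row in table.values()]
        mu, sigma = mean(values), pstdev(values)
        for strain, row in table.items():
            standardized[strain][feature] = (row[feature] - mu) / sigma if sigma else 0.0
    return standardized


def build_feature_vectors(layers):
    """Standardize each layer, keep strains present in all layers, and
    return strain -> list of 7 standardized marker values."""
    standardized = {name: zscore_columns(table) for name, table in layers.items()}
    # Assumed alignment rule: keep only strains measured in every layer.
    strains = sorted(set.intersection(*(set(t) for t in standardized.values())))
    return {
        strain: [
            standardized[layer][strain][marker]
            for layer, names in MARKERS.items()
            for marker in names
        ]
        for strain in strains
    }


def select_strains(scores, threshold):
    """Return strains whose score is greater than or equal to the threshold."""
    return [strain for strain, score in scores.items() if score >= threshold]
```

In this sketch, `build_feature_vectors` would be called once on the measured data, the resulting 7-element vectors passed to whatever screening model is used, and `select_strains` applied to the model's scores.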

Inventors

FENG LI
LI XIANGCHEN
LU YEWEI
MA RUILING

Assignees

Jiyang College of Zhejiang A&F University (浙江农林大学暨阳学院)

Dates

Publication Date: 20260512
Application Date: 20250527

Claims (6)

1. A method for determining a high-ethanol-tolerance yeast strain based on multi-omics markers, comprising: obtaining the gene expression levels, protein expression profiles and metabolite abundances of a plurality of unlabeled wild yeast strains; performing Z-score standardization on the gene expression levels, the protein expression profiles and the metabolite abundances respectively, and aligning the three standardized data sets by feature; extracting, from the three feature-aligned data sets, the standardized values of the GRE1 gene, the TPS1 gene, the HSP104 protein, the ADH2 protein, the CTA1 protein, sphingosine and trehalose-6-phosphate, and forming a multi-omics biomarker feature combination from the extracted standardized values; inputting the multi-omics biomarker feature combination into a strain screening model for industrial strain screening to obtain a high-ethanol-tolerance probability score for each of the plurality of unlabeled wild yeast strains, wherein the strain screening model comprises a random forest model, a support vector machine model, a gradient boosting decision tree model, a naive Bayes model, a neural network model and a generalized linear model; and taking each unlabeled wild yeast strain whose high-ethanol-tolerance probability score is greater than or equal to a preset threshold as a high-ethanol-tolerance yeast strain.
2. The method of claim 1, wherein inputting the multi-omics biomarker feature combination into the strain screening model for industrial strain screening to obtain a high-ethanol-tolerance probability score for each of the plurality of unlabeled wild yeast strains specifically comprises: arranging the Z-score values of the 7 markers in the multi-omics biomarker feature combination in order as an input vector; in the strain screening model, traversing each tree independently, wherein each tree splits from its root node according to feature thresholds, the unlabeled wild yeast strain is assigned to a specific leaf node, and each leaf node corresponds to a classification label; taking the proportion of high-ethanol-tolerance yeast strains in that leaf node during training as the local probability; and computing a weighted average of the local probabilities of all the trees and determining the weighted average as the high-ethanol-tolerance probability score of the unlabeled wild yeast strain, wherein the weight of each tree is the precision of that tree.
3. The method of claim 1, wherein the training process of the strain screening model specifically comprises: acquiring a sample feature matrix, and dividing the sample feature matrix into a training set and a validation set; training a random forest model, a support vector machine model, a gradient boosting decision tree model, a naive Bayes model, a neural network model and a generalized linear model on the training set, and adjusting the parameters of each model until each model is optimal; for each model, inputting the validation set into the model to obtain the screening result of the model, and comparing the screening result with the true classes of the test set to obtain the indexes of the model, wherein the indexes comprise prediction precision, sensitivity and specificity; comparing the indexes of all the models, and determining an optimal model according to the comparison; and determining the optimal model as the strain screening model.
4. The method of claim 3, wherein acquiring the sample feature matrix specifically comprises: obtaining a plurality of high-ethanol-tolerance yeast strains and a plurality of low-ethanol-tolerance yeast strains as a sample training set; obtaining the sample gene expression levels, sample protein expression profiles and sample metabolite abundances of each yeast strain in the sample training set under different ethanol concentrations; performing Z-score standardization on the sample gene expression levels, the sample protein expression profiles and the sample metabolite abundances respectively to obtain three data sets; and performing feature selection on the three standardized data sets to obtain the sample feature matrix, wherein the sample feature matrix comprises the GRE1 gene, the TPS1 gene, the HSP104 protein, the ADH2 protein, the CTA1 protein, sphingosine and trehalose-6-phosphate.
5. The method of claim 4, wherein performing feature selection on the three standardized data sets to obtain the sample feature matrix specifically comprises: aligning the Z-score-standardized data across omics layers by UniProt ID/KEGG ID; performing LASSO regression on the cross-omics-aligned data to screen out a plurality of features; and generating, from the screened features, an n × 7 matrix in which each yeast strain corresponds to one row and each biomarker corresponds to one column, and determining the n × 7 matrix as the sample feature matrix, where n is the number of yeast strains in the sample training set.
6. The method of claim 4, wherein selecting the plurality of high-ethanol-tolerance yeast strains and the plurality of low-ethanol-tolerance yeast strains comprises: selecting candidate strains from a library of wild Saccharomyces cerevisiae strains; preparing an ethanol-containing culture medium; culturing the candidate strains in the ethanol-containing medium; and, after culturing, subjecting the candidate strains to primary screening and secondary screening, and classifying the strains obtained from the secondary screening according to the screening criterion for high-ethanol-tolerance strains and the screening criterion for low-ethanol-tolerance strains, thereby obtaining the plurality of high-ethanol-tolerance yeast strains and the plurality of low-ethanol-tolerance yeast strains.
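
The tree-based scoring described in claim 2 can be illustrated with a short sketch. It is a hedged example, not the claimed implementation: the nested-dictionary tree layout, the field names (`feature`, `threshold`, `left`, `right`, `leaf_prob`), the `<=` split convention and the pairing of each tree with a precision weight are all hypothetical choices made for illustration.

```python
def leaf_probability(tree, x):
    """Walk one tree from its root to a leaf and return the leaf's local
    probability, i.e. the proportion of high-ethanol-tolerance strains
    that reached that leaf during training."""
    node = tree
    while "leaf_prob" not in node:
        # Assumed split rule: go left when the feature value is <= threshold.
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf_prob"]


def tolerance_score(forest, x):
    """Weighted average of the per-tree local probabilities.

    forest is a list of (tree, weight) pairs, where weight is the
    precision of that tree, as stated in claim 2.
    """
    total_weight = sum(weight for _, weight in forest)
    weighted_sum = sum(weight * leaf_probability(tree, x) for tree, weight in forest)
    return weighted_sum / total_weight
```

With a 7-element input vector of marker Z-scores, `tolerance_score` returns a value between 0 and 1 that can then be compared with the preset threshold of claim 1.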

Description

Method for determining high-ethanol-tolerance yeast strains based on multi-omics markers

Technical Field

The invention relates to the technical field of computers, and in particular to a method for determining high-ethanol-tolerance yeast strains based on multi-omics markers.

Background

Saccharomyces cerevisiae is the core strain for industrial ethanol fermentation. During ethanol fermentation, the rising ethanol concentration places the growing yeast cells under stress, and the cells mount a corresponding stress response in order to survive and grow. This coping mechanism is the ethanol tolerance of yeast cells. Ethanol tolerance directly determines the fermentation efficiency of Saccharomyces cerevisiae and the product concentration. However, high concentrations of ethanol can disrupt cell membrane integrity, inhibit enzymatic activity and induce oxidative stress, leading to cell growth arrest and even death. It is generally believed that the effect of ethanol on yeast cells is mainly manifested in 3 aspects, namely inhibition of cell growth, cell survival and fermentation. Ethanol tolerance is therefore often defined by the effect of ethanol on cell growth: the yeast is cultured for 48 h at room temperature in a medium containing 1%-14% ethanol, and the highest ethanol concentration at which it can still grow represents its ethanol tolerance level. Strains able to grow in 3%-6% ethanol have poor ethanol tolerance, those able to grow in 6%-10% have medium tolerance, and those able to grow in 10%-13% have high tolerance. This definition is relatively simple and is commonly used to screen strains for ethanol tolerance. Improving the ethanol tolerance of yeast is of great importance for improving fermentation efficiency, yield and the quality of the final product.
The traditional yeast strain screening method relies on culturing strains in media with different ethanol concentrations and judging their ethanol tolerance by measuring parameters such as the growth curve, the maximum specific growth rate and the final cell density. However, this method requires a large number of experimental operations and is time-consuming and labor-intensive, resulting in high screening costs. Thus, there is a need for a method that reduces the cost of screening yeast strains.

Disclosure of Invention

In view of the above technical problem, it is necessary to provide a method for determining high-ethanol-tolerance yeast strains based on multi-omics markers. The method can reduce the cost of screening yeast strains.

The invention adopts the following technical scheme: the invention provides a method for determining a high-ethanol-tolerance yeast strain based on multi-omics markers, comprising the following steps: obtaining the gene expression levels, protein expression profiles and metabolite abundances of a plurality of unlabeled wild yeast strains; performing Z-score standardization on the gene expression levels, the protein expression profiles and the metabolite abundances respectively, and aligning the three standardized data sets by feature; extracting, from the three feature-aligned data sets, the standardized values of the GRE1 gene, the TPS1 gene, the HSP104 protein, the ADH2 protein, the CTA1 protein, sphingosine and trehalose-6-phosphate, and forming a multi-omics biomarker feature combination from the extracted standardized values; inputting the multi-omics biomarker feature combination into a strain screening model for industrial strain screening to obtain a high-ethanol-tolerance probability score for each of the plurality of unlabeled wild yeast strains; and taking each unlabeled wild yeast strain whose high-ethanol-tolerance probability score is greater than or equal to a preset threshold as a high-ethanol-tolerance yeast strain.

Preferably, inputting the multi-omics biomarker feature combination into the strain screening model for industrial strain screening to obtain a high-ethanol-tolerance probability score for each of the plurality of unlabeled wild yeast strains specifically comprises the following steps: arranging the Z-score values of the 7 markers in the multi-omics biomarker feature combination in order as an input vector; in the strain screening model, traversing each tree independently, wherein each tree splits from its root node according to feature thresholds, the unlabeled wild yeast strain is assigned to a specific leaf node, and each leaf node corresponds to a classification label; taking the proportion of high-ethanol-tolerance yeast strains in that leaf node during training as the local probability; and taking a weighted average of the local probabilities of