CN-122023918-A - Base identification method based on breeding chip, scanning equipment and storage medium
Abstract
The invention discloses a breeding chip-based base identification method, a training method, scanning equipment and a storage medium. The method comprises: acquiring fluorescence images of a breeding chip captured in a plurality of different base channels; obtaining, based on the fluorescence images, the brightness value of each target probe scatter point of each type of probe on each base channel; extracting the scatter-point data features of each type of probe from these brightness values; and using the scatter-point data features of each type of probe as the input of a pre-trained base recognition model, which outputs the base data of the target detection site detected by that type of probe.
Inventors
- CHEN WEI
- WANG GUFENG
- ZHAO LUYANG
Assignees
- 深圳赛陆医疗科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260203
Claims (11)
- 1. A breeding chip-based base recognition method, comprising: acquiring fluorescence images of a breeding chip captured in a plurality of different base channels; acquiring, based on the fluorescence images, the brightness value of each target probe scatter point of each type of probe on each base channel; extracting the scatter-point data features of each type of probe based on the brightness values of its target probe scatter points on each base channel; and forming, from the scatter-point data features of each type of probe, the input of a pre-trained base recognition model, which outputs the base data of the target detection site detected by each type of probe.
- 2. The breeding chip-based base recognition method according to claim 1, wherein acquiring, based on the fluorescence images, the brightness value of each target probe scatter point of each type of probe on each base channel comprises: obtaining preprocessed brightness data of each type of probe based on the fluorescence images; and deleting abnormal brightness data from each type of probe based on its preprocessed brightness data, to obtain the brightness value of each target probe scatter point of each type of probe on each base channel.
- 3. The breeding chip-based base recognition method according to claim 2, wherein deleting the abnormal brightness data from each type of probe based on its preprocessed brightness data, to obtain the brightness value of each target probe scatter point of each type of probe on each base channel, comprises at least one of the following: (a) estimating the brightness distribution of any type of probe based on its preprocessed brightness data, calculating the distance from each probe scatter point of that type of probe to the center of the brightness distribution, treating the brightness data of probe scatter points whose distance is greater than or equal to a preset distance threshold as abnormal brightness data, and deleting the abnormal brightness data to obtain the brightness value of each target probe scatter point of that type of probe on each base channel; and (b) generating a plurality of isolation trees based on the preprocessed brightness data of any type of probe, calculating the average path length of each probe scatter point of that type of probe over all isolation trees, calculating an anomaly score for each probe scatter point from its average path length, treating the brightness data of probe scatter points whose anomaly score is greater than an anomaly threshold as abnormal brightness data, and deleting the abnormal brightness data to obtain the brightness value of each target probe scatter point of that type of probe on each base channel.
- 4. The breeding chip-based base recognition method according to claim 1, wherein the scatter-point data features comprise at least one of an a priori base sequence feature, a base channel brightness principal component feature and a base brightness Gaussian distribution feature, and extracting the scatter-point data features of each type of probe based on the brightness values of its target probe scatter points on each base channel comprises at least one of the following: for any type of probe, performing principal component analysis on the brightness values of its target probe scatter points on each base channel to obtain the base channel brightness principal component feature; and, for any type of probe, performing Gaussian distribution fitting on the brightness values of its target probe scatter points on each base channel to obtain the base brightness Gaussian distribution feature.
- 5. The breeding chip-based base recognition method according to claim 4, wherein the a priori base sequence feature comprises at least one of the ratio of base C to base G in the probe sequence of any type of probe and a preset number of bases at the end of that probe sequence; the base channel brightness principal component feature comprises at least one of: a first principal component direction vector together with its explained variance and explained variance ratio; a second principal component direction vector together with its explained variance and explained variance ratio; and the brightness mean and brightness variance on each base channel, wherein the first principal component direction vector represents the main axis of the four-dimensional distribution formed by the probe scatter points of that type of probe in the four-dimensional base channel space, and the second principal component direction vector represents the secondary axis of that distribution; and the base brightness Gaussian distribution feature comprises at least one of a Gaussian brightness mean vector and a Gaussian covariance matrix.
- 6. The breeding chip-based base recognition method according to claim 5, wherein, for any type of probe, performing principal component analysis on the brightness values of its target probe scatter points on each base channel to obtain the base channel brightness principal component feature comprises: forming a brightness data matrix from the brightness values of all target probe scatter points of that type of probe on all base channels, wherein the first dimension (rows) of the matrix indexes the probe scatter points and the second dimension (columns) holds the brightness on each base channel; calculating a brightness covariance matrix from the brightness data matrix, the brightness covariance matrix representing the linear correlation between the brightnesses of the four base channels; performing eigendecomposition of the brightness covariance matrix to obtain a plurality of eigenvalues, taking the eigenvector corresponding to the largest eigenvalue as the first principal component direction vector and the eigenvector corresponding to the second-largest eigenvalue as the second principal component direction vector; and calculating the explained variance and explained variance ratio of the first principal component direction vector and of the second principal component direction vector.
- 7. The breeding chip-based base recognition method according to claim 5, wherein performing Gaussian distribution fitting on the brightness values of the target probe scatter points on each base channel to obtain the base brightness Gaussian distribution feature comprises: for any type of probe, fitting a multi-dimensional Gaussian distribution to the brightness values of all target probe scatter points of that type of probe on all base channels to obtain the center of the multi-dimensional Gaussian distribution and its covariance matrix, and taking the center as the Gaussian brightness mean vector.
- 8. The breeding chip-based base recognition method according to claim 1, further comprising: acquiring a training data set, wherein each training sample comprises the scatter-point data sample features of one type of probe and the base label of the detection sample site corresponding to that type of probe; and training the base recognition model on the training data set.
- 9. The breeding chip-based base recognition method according to claim 8, further comprising obtaining the base label of the detection sample site, wherein obtaining the base label comprises: screening, from whole-genome sequencing data, sample sites that satisfy preset conditions as detection sample sites, wherein the sequencing data comprise base call data and a base call quality value for each sample site, and the conditions comprise at least one of the following: the base call quality value is greater than a preset quality value; the sequencing depth is greater than a preset sequencing depth; and the combined proportion of the most frequent base and the second most frequent base is greater than a first preset ratio; when the proportion of the most frequent base at the detection sample site exceeds a second preset ratio, determining that the detection sample site is a homozygous sample and labeling it with a homozygous sample label of the base type with the largest proportion; and when the proportion of the most frequent base is less than or equal to the second preset ratio, determining that the detection sample site is a heterozygous sample and labeling it with a heterozygous sample label formed by the base type with the largest proportion and the base type with the second-largest proportion.
- 10. A scanning device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 9.
- 11. A computer readable storage medium storing a computer program, which when executed by a processor causes the processor to perform the steps of the method according to any one of claims 1 to 9.
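The first alternative in claim 3 (distance from each scatter point to the center of the estimated brightness distribution) can be sketched in pure Python as follows. The claim does not name the distance metric; this sketch assumes the distribution is estimated by the sample mean and covariance and uses the Mahalanobis distance. The function names and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def mean_vector(points: Sequence[Sequence[float]]) -> List[float]:
    """Per-channel mean of the scatter points (center of the distribution)."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def covariance_matrix(points: Sequence[Sequence[float]], mu: Sequence[float]) -> Matrix:
    """Sample covariance (divisor n - 1) of the per-channel brightness vectors."""
    n, d = len(points), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [p[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b]
    return [[v / (n - 1) for v in row] for row in cov]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    d = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis(p: Sequence[float], mu: Sequence[float], inv_cov: Matrix) -> float:
    d = len(mu)
    diff = [p[j] - mu[j] for j in range(d)]
    q = sum(diff[a] * inv_cov[a][b] * diff[b] for a in range(d) for b in range(d))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative rounding errors


def remove_distance_outliers(points: Sequence[Sequence[float]], threshold: float) -> List[List[float]]:
    """Keep points whose distance to the center is below the threshold.

    Points at distance >= threshold are treated as abnormal brightness data.
    """
    mu = mean_vector(points)
    inv_cov = invert(covariance_matrix(points, mu))
    return [list(p) for p in points if mahalanobis(p, mu, inv_cov) < threshold]
```

The Mahalanobis distance accounts for correlated, differently scaled channels, which a plain Euclidean distance would not. Because a single extreme point also inflates the estimated covariance, a robust estimate of center and spread may be preferable on heavily contaminated data.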
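For the second alternative in claim 3, a self-contained isolation-forest sketch follows. It assumes the standard isolation forest score s = 2^(-E[h(x)]/c(ψ)), where E[h(x)] is the average path length of a point over all trees and c(ψ) is the expected path length for a subsample of size ψ. The tree count, subsample size, seed and threshold are hypothetical parameters that the source does not give.

```python
import math
import random
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649015329


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # special-cased, as in common implementations
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


class Node:
    def __init__(self, size: int, feature: int = -1, split: float = 0.0,
                 left: Optional["Node"] = None, right: Optional["Node"] = None):
        self.size, self.feature, self.split = size, feature, split
        self.left, self.right = left, right


def build_tree(points: List[Sequence[float]], depth: int, max_depth: int,
               rng: random.Random) -> Node:
    if depth >= max_depth or len(points) <= 1:
        return Node(len(points))  # external (leaf) node
    feature = rng.randrange(len(points[0]))  # random base channel
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return Node(len(points))
    split = rng.uniform(lo, hi)  # random split value within the channel's range
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return Node(len(points), feature, split,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Sequence[float], node: Node, depth: int = 0) -> float:
    if node.left is None or node.right is None:
        # Leaf: add the expected remaining depth for the points it still holds.
        return depth + c_factor(node.size)
    child = node.left if x[node.feature] < node.split else node.right
    return path_length(x, child, depth + 1)


def isolation_scores(points: List[Sequence[float]], n_trees: int = 100,
                     sample_size: int = 256, seed: int = 0) -> List[float]:
    if len(points) < 2:
        raise ValueError("need at least two scatter points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    scores = []
    for x in points:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees  # average path length
        scores.append(2.0 ** (-mean_h / norm))  # anomaly score in (0, 1]
    return scores


def remove_isolation_outliers(points: List[Sequence[float]], threshold: float = 0.6,
                              **kwargs) -> List[List[float]]:
    """Drop points whose anomaly score is greater than the threshold."""
    scores = isolation_scores(points, **kwargs)
    return [list(p) for p, s in zip(points, scores) if s <= threshold]
```

Short average paths mean a point is isolated after few random splits, so its score approaches 1; points inside the dense cluster need many splits and score lower.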
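Claim 5's a priori base sequence features can be illustrated with the sketch below. The translated claim speaks of "a ratio of base C to base G"; this sketch reads it literally as the C count divided by the G count, with a hypothetical pseudo-count of 1 to avoid division by zero, and one-hot encodes the last k characters of the sequence string (k and the A/C/G/T order are hypothetical). The feature layout is an assumption, not the patent's encoding.

```python
from typing import List

BASES = "ACGT"  # hypothetical one-hot order


def prior_sequence_features(probe_seq: str, k: int = 3) -> List[float]:
    """C:G ratio of the probe sequence followed by one-hot codes of its last k bases."""
    seq = probe_seq.upper()
    if len(seq) < k:
        raise ValueError("probe sequence shorter than k")
    # Pseudo-count of 1 (an assumption) keeps the ratio finite when G is absent.
    c_to_g = (seq.count("C") + 1) / (seq.count("G") + 1)
    one_hot: List[float] = []
    for base in seq[len(seq) - k:]:  # last k characters; non-ACGT gives all zeros
        one_hot.extend(1.0 if base == b else 0.0 for b in BASES)
    return [c_to_g] + one_hot
```

For example, `prior_sequence_features("ACGTCCG", k=2)` has three C and two G, so the ratio term is 4/3, followed by the one-hot codes of "C" and "G".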
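The principal component analysis of claim 6 can be sketched as follows: build the brightness covariance matrix, eigendecompose it (here with the cyclic Jacobi method, chosen only so the example needs no third-party library), and read off the first two principal axes, their explained variances (the eigenvalues) and explained variance ratios (each eigenvalue divided by the sum of all eigenvalues). The dictionary keys are hypothetical. Because eigenvector signs are arbitrary, the sketch flips each vector so that its largest-magnitude component is positive.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def covariance(points: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Channel means and the brightness covariance matrix (divisor n - 1)."""
    n, d = len(points), len(points[0])
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in points) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mu, cov


def jacobi_eigen(m: Matrix, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi)."""
    n = len(m)
    a = [row[:] for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        total = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
        if off <= 1e-20 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]  # i-th eigenvector
    return values, vectors


def _canonical_sign(vec: List[float]) -> List[float]:
    """Flip the (arbitrary) eigenvector sign so its largest-magnitude entry is positive."""
    return [-x for x in vec] if max(vec, key=abs) < 0 else vec


def pca_features(points: Sequence[Sequence[float]]) -> Dict[str, object]:
    mu, cov = covariance(points)
    values, vectors = jacobi_eigen(cov)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    total_var = sum(values)  # trace = total variance across channels
    i1, i2 = order[0], order[1]
    return {
        "pc1_direction": _canonical_sign(vectors[i1]),
        "pc1_explained_variance": values[i1],
        "pc1_explained_variance_ratio": values[i1] / total_var,
        "pc2_direction": _canonical_sign(vectors[i2]),
        "pc2_explained_variance": values[i2],
        "pc2_explained_variance_ratio": values[i2] / total_var,
        "channel_mean": mu,
        "channel_variance": [cov[j][j] for j in range(len(mu))],
    }
```

With a library such as NumPy, the Jacobi routine would normally be replaced by a symmetric eigensolver; the returned quantities are the same.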
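For claim 7, the maximum-likelihood fit of a multi-dimensional Gaussian is the sample mean vector together with the covariance matrix computed with a 1/n divisor, so no iterative optimization is needed. The sketch also flattens the result into a fixed-length vector (mean followed by the upper triangle of the symmetric covariance); that layout is a hypothetical choice for building model input, not one stated in the source.

```python
from typing import List, Sequence, Tuple


def gaussian_fit(points: Sequence[Sequence[float]]) -> Tuple[List[float], List[List[float]]]:
    """Maximum-likelihood multivariate Gaussian: center (mean vector) and covariance."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [p[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b] / n  # MLE uses 1/n, not 1/(n - 1)
    return mean, cov


def gaussian_feature_vector(points: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical flat layout: mean vector, then upper triangle of the covariance."""
    mean, cov = gaussian_fit(points)
    d = len(mean)
    upper = [cov[a][b] for a in range(d) for b in range(a, d)]  # covariance is symmetric
    return mean + upper
```

For four base channels the vector has 4 + 10 = 14 entries.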
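The labeling rule of claim 9 can be sketched for a single site as below. The preset quality value, preset depth and both preset ratios are hypothetical defaults; the sketch applies all three screening conditions (the claim requires at least one), takes the sequencing depth as the total base count, writes a homozygous label as the base doubled (e.g. "AA") and writes a heterozygous label as the two leading bases in alphabetical order (e.g. "AG").

```python
from typing import Dict, Optional


def label_site(base_counts: Dict[str, int], quality: float, *,
               min_quality: float = 30.0, min_depth: int = 10,
               first_ratio: float = 0.9, second_ratio: float = 0.8) -> Optional[str]:
    """Return a genotype label for one sample site, or None if it fails screening.

    base_counts maps each base ("A", "C", "G", "T") to its read count at the site.
    All thresholds are hypothetical defaults.
    """
    depth = sum(base_counts.values())
    if quality <= min_quality or depth <= min_depth:
        return None  # quality / depth screening
    ranked = sorted(base_counts.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, top_n), (second_base, second_n) = ranked[0], ranked[1]
    if (top_n + second_n) / depth <= first_ratio:
        return None  # too much signal outside the two leading bases
    if top_n / depth > second_ratio:
        return top_base * 2  # homozygous, e.g. "AA"
    return "".join(sorted(top_base + second_base))  # heterozygous, e.g. "AG"
```

A site where the leading base holds exactly the second preset ratio (for example 40 of 50 reads at 0.8) falls on the heterozygous side, matching the claim's "less than or equal to" wording.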
Description
Base identification method based on breeding chip, scanning equipment and storage medium
Technical Field
The invention relates to the technical field of breeding, and in particular to a breeding chip-based base identification method, scanning equipment and a computer-readable storage medium.
Background
With the rapid development of molecular biology, high-throughput molecular marker detection has become one of the core technologies of modern agricultural breeding. As a key tool, a breeding chip can detect tens of thousands of single nucleotide polymorphism (SNP) loci at a time, greatly accelerating breeding processes such as genotyping, genetic map construction, mapping of genes for important traits, and genomic selection.
In practical applications of a breeding chip, a fluorescence image of the chip after the hybridization reaction must be acquired with dedicated scanning equipment. The image contains fluorescent signal points (hereinafter "fluorescent points"), and the intensity of each fluorescent point corresponds directly to a specific molecular marker and its genotype. Rapid and accurate localization and quantification of each fluorescent point in the image is therefore the primary prerequisite and the key technical step for accurate and reliable genotyping.
However, throughout the preparation and detection of agricultural breeding chips, the probe signal is subject to several instabilities: 1) some probes have a low signal-to-noise ratio; 2) some probes show a certain bias; and 3) the boundary between heterozygous and homozygous signals is blurred. These instabilities reduce the accuracy of base identification at the detection sites.
Disclosure of Invention
To solve the above technical problems, the invention provides a breeding chip-based base identification method, a scanning device and a computer-readable storage medium, which can greatly improve the accuracy, stability and applicability of base identification.
According to a first aspect, a breeding chip-based base identification method is provided, comprising: acquiring fluorescence images of a breeding chip captured in a plurality of different base channels; obtaining, based on the fluorescence images, the brightness value of each target probe scatter point of each type of probe on each base channel; extracting the scatter-point data features of each type of probe from these brightness values; and forming, from the scatter-point data features of each type of probe, the input of a pre-trained base identification model, which outputs the base data of the target detection sites detected by each type of probe.
According to a second aspect, a scanning device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the breeding chip-based base identification method provided by the embodiments of the application.
According to a third aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the breeding chip-based base identification method provided by the embodiments of the application.
The application acquires fluorescence images and obtains from them the brightness value of each target probe scatter point of each type of probe on each base channel. From these brightness values it extracts the scatter-point data features of each type of probe, which characterize the base brightness of the target detection site captured by that type of probe, and uses these features as the input of a pre-trained base recognition model that outputs the base data of the target detection sites.
Drawings
FIG. 1 is a schematic diagram of the scatter distribution of two probes;
FIG. 2 is a diagram of an application environment of a breeding chip-based base recognition method in an embodiment;
FIG. 3 is a flow chart of a breeding chip-based base recognition method in an embodiment;
FIG. 4 is a scatter plot of a probe before and after preprocessing in an embodiment;
FIG. 5 is a schematic diagram comparing the data before and after abnormal data points are removed in an embodiment;
FIG. 6 is a schematic diagram of a breeding chip-based base recognition device in an embodiment;
FIG. 7 is a schematic structural diagram of a scanning device in an embodiment.
Detailed Description
The technical scheme of the invention is further elaborated below by referring to the draw