CN-121983147-A - Abnormal data positioning method of gene sequencing system, computer equipment and computer readable storage medium
Abstract
The disclosure provides an abnormal data positioning method for a gene sequencing system, a computer device, and a computer-readable storage medium. The method comprises: obtaining gene sequencing data produced by the gene sequencing system when sequencing a sample library of one or more samples, wherein the gene sequencing data comprises the gene fragment sequences of the sample library, imaging unit information of each gene fragment sequence, and position information of each gene fragment sequence in the sequencing image corresponding to its imaging unit; mapping and aligning the gene fragment sequences of the sample library against a reference sequence to obtain mapping and alignment results; determining abnormal gene fragment sequences from the gene fragment sequences according to the mapping and alignment results; determining, according to the gene sequencing data, the imaging unit in which each abnormal gene fragment sequence is located and its position in the sequencing image corresponding to that imaging unit; and marking abnormal light spots at those positions to obtain a sequencing image marked with the abnormal light spots.
Inventors
- CAI KEYA
- WANG YUYAO
- YUAN JINGXIAN
- WANG DANYANG
Assignees
- 郑州思昆生物工程有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260123
Claims (10)
- 1. A method for locating abnormal data of a gene sequencing system, comprising: obtaining gene sequencing data produced by sequencing a sample library of one or more samples with the gene sequencing system, wherein the gene sequencing data comprises gene fragment sequences of the sample library, imaging unit information of each gene fragment sequence, and position information of each gene fragment sequence in a sequencing image corresponding to the imaging unit; mapping and aligning the gene fragment sequences of the sample library against a reference sequence to obtain mapping and alignment results of the gene fragment sequences, and determining abnormal gene fragment sequences from the gene fragment sequences according to the mapping and alignment results; and determining, according to the gene sequencing data, the imaging unit in which each abnormal gene fragment sequence is located and the position of the abnormal gene fragment sequence in the sequencing image corresponding to the imaging unit, and marking an abnormal light spot at that position in the sequencing image to obtain a sequencing image marked with the abnormal light spots.
- 2. The method of claim 1, wherein determining abnormal gene fragment sequences from the gene fragment sequences according to the mapping and alignment results comprises at least one of: determining a gene fragment sequence to be an abnormal gene fragment sequence when its mapping and alignment result indicates that the alignment was unsuccessful; and, when the mapping and alignment result of a gene fragment sequence indicates that the alignment was successful, counting the number of bases in the gene fragment sequence that are not aligned to the reference sequence, and determining the gene fragment sequence to be an abnormal gene fragment sequence if the number of unaligned bases is greater than a preset number threshold.
- 3. The method of claim 1 or 2, wherein the gene sequencing data further comprises at least one of the following items of source information related to the gene fragment sequences: the upper or lower surface of the sequencing chip, the flow channel of the sequencing chip, the side column of the flow channel, and the camera used for imaging.
- 4. The method of claim 3, further comprising: before mapping and aligning the gene fragment sequences of the sample library against the reference sequence, grouping the gene fragment sequences using at least one item of the source information as a grouping and summarization dimension to obtain a plurality of groups of gene fragment sequences; wherein mapping and aligning the gene fragment sequences of the sample library against the reference sequence to obtain the mapping and alignment results, and determining the abnormal gene fragment sequences according to the mapping and alignment results, comprises: mapping and aligning the gene fragment sequences in each group against the reference sequence to obtain mapping and alignment results for the gene fragment sequences in each group, and determining the abnormal gene fragment sequences in each group according to those mapping and alignment results.
- 5. The method of claim 3, further comprising: after mapping and aligning the gene fragment sequences of the sample library against the reference sequence, grouping the abnormal gene fragment sequences using at least one item of the source information as a grouping and summarization dimension to obtain a plurality of groups of abnormal gene fragment sequences; wherein determining the imaging unit in which each abnormal gene fragment sequence is located and its position in the sequencing image corresponding to the imaging unit, and marking the abnormal light spot at that position to obtain the sequencing image marked with the abnormal light spots, comprises: for each group, determining, according to the gene sequencing data, the imaging unit in which the abnormal gene fragment sequences of the group are located and their positions in the sequencing image corresponding to the imaging unit, and marking abnormal light spots at those positions in the sequencing image to obtain the sequencing image marked with the abnormal light spots.
- 6. The method of any one of claims 1 to 5, wherein marking the abnormal light spot at the position of the abnormal gene fragment sequence in the sequencing image corresponding to the imaging unit to obtain the sequencing image marked with the abnormal light spots comprises: for an abnormal gene fragment sequence that was not successfully aligned, selecting one sequencing image from the sequencing images of all sequencing cycles corresponding to the imaging unit, and marking the abnormal light spot at the position of the abnormal gene fragment sequence in the selected sequencing image to obtain a sequencing image marked with the abnormal light spots; or, for an abnormal gene fragment sequence that was successfully aligned but whose number of unaligned bases is greater than a preset number threshold, selecting, from the sequencing images of all sequencing cycles corresponding to the imaging unit, the sequencing image of a sequencing cycle in which a base error occurred, and marking the abnormal light spot at the position of the abnormal gene fragment sequence in the selected sequencing image to obtain a sequencing image marked with the abnormal light spots.
- 7. The method of any one of claims 2 to 6, further comprising: partitioning the sequencing image marked with the abnormal light spots to obtain a plurality of sub-images; counting, by abnormality type, the number or proportion of abnormal gene fragment sequences in each sub-image that were not successfully aligned and of those that were successfully aligned but contain unaligned bases; and establishing an abnormality indication heat map based on the number or proportion of abnormal gene fragment sequences of each abnormality type in the sub-images, wherein the abnormality indication heat map comprises abnormality indication information corresponding to each sub-image, and the abnormality indication information uses pixel values to represent the number or proportion, in that sub-image, of abnormal gene fragment sequences that were not successfully aligned and of those that were successfully aligned but contain unaligned bases.
- 8. The method of any one of claims 2 to 6, further comprising: counting, by abnormality type, the number or proportion of abnormal gene fragment sequences in the imaging unit; and comparing the number or proportion of abnormal gene fragment sequences with a set threshold, and determining the overall data quality of the imaging unit according to the comparison result.
- 9. A computer device comprising a processor and a memory, the memory storing machine-readable instructions executable by the processor, wherein the processor is configured to execute the machine-readable instructions stored in the memory, and the machine-readable instructions, when executed by the processor, perform the steps of the method for locating abnormal data of a gene sequencing system according to any one of claims 1 to 8.
- 10. A computer-readable storage medium having stored thereon a computer program which, when executed by a computer device, performs the steps of the method for locating abnormal data of a gene sequencing system according to any one of claims 1 to 8.
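The sub-image statistics of claim 7 and the imaging-unit quality check of claim 8 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names, the square block size, the scaling of counts to 0-255 pixel values, and the fraction-based quality threshold are all hypothetical choices that the claims do not specify. One heat map per abnormality type can be obtained by calling the first function once per type.

```python
def abnormality_heatmap(spots, width, height, block):
    """Count abnormal light spots in each block x block sub-image.

    spots: iterable of (x, y) pixel positions of abnormal light spots.
    Returns a list of rows; each cell holds the count for one sub-image.
    """
    cols = -(-width // block)   # ceiling division
    rows = -(-height // block)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in spots:
        if 0 <= x < width and 0 <= y < height:
            grid[y // block][x // block] += 1
    return grid


def to_pixel_values(grid, max_value=255):
    """Scale counts linearly so the busiest sub-image maps to max_value (assumed scaling)."""
    peak = max((count for row in grid for count in row), default=0)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    return [[count * max_value // peak for count in row] for row in grid]


def unit_quality_ok(n_abnormal, n_total, max_fraction):
    """Hypothetical quality rule: an imaging unit passes if its abnormal fraction is small enough."""
    return n_total > 0 and n_abnormal / n_total <= max_fraction


# Example: an 8 x 8 image split into 4 x 4 sub-images.
example_spots = [(0, 0), (1, 1), (5, 0), (7, 7)]
counts = abnormality_heatmap(example_spots, width=8, height=8, block=4)
pixels = to_pixel_values(counts)
```

Keeping the counting step separate from the pixel scaling means the same counts can also feed the threshold comparison of claim 8.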
Description
Abnormal data positioning method of gene sequencing system, computer equipment and computer readable storage medium
Technical Field
The disclosure relates to the technical field of gene sequencing, and in particular to an abnormal data positioning method for a gene sequencing system, a computer device, and a computer-readable storage medium.
Background
Nucleic acid sequencing technology determines the sequence of genetic material and is widely applied in molecular biology research, genetic breeding, clinical diagnosis, drug research and development, and other fields. Gene sequencing technology can analyze millions or even billions of sample libraries (e.g., nucleic acid fragments) simultaneously to achieve high-throughput sequencing.
A sequencing system that performs gene sequencing is structurally complex and highly integrated. Producing high-quality data requires the sequencing chip, the reagent reactions, the temperature control system, and the optical system to be highly stable, and the system must also be highly sensitive when identifying the gene fragment sequences of the different sample libraries under test. Systematically evaluating the sequencing system and locating its problems based on gene sequencing data is therefore a necessary route to improving its stability. Current evaluation approaches reflect only the overall quality of the sequencing system and cannot locate problems precisely.
Disclosure of Invention
Embodiments of the disclosure provide at least an abnormal data positioning method for gene sequencing data, a computer device, and a storage medium.
In a first aspect, embodiments of the present disclosure provide a method for locating abnormal data of a gene sequencing system, including: obtaining gene sequencing data produced by sequencing a sample library of one or more samples with the gene sequencing system, wherein the gene sequencing data comprises gene fragment sequences of the sample library, imaging unit information of each gene fragment sequence, and position information of each gene fragment sequence in a sequencing image corresponding to the imaging unit; mapping and aligning the gene fragment sequences of the sample library against a reference sequence to obtain mapping and alignment results of the gene fragment sequences, and determining abnormal gene fragment sequences from the gene fragment sequences according to the mapping and alignment results; and determining, according to the gene sequencing data, the imaging unit in which each abnormal gene fragment sequence is located and the position of the abnormal gene fragment sequence in the sequencing image corresponding to the imaging unit, and marking an abnormal light spot at that position in the sequencing image to obtain a sequencing image marked with the abnormal light spots.
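The flow of the first aspect, from flagged reads to marked image positions, can be sketched roughly as follows. This is a minimal Python illustration under assumptions, not the disclosed implementation: the `Read` record, its field names, the list-of-rows image, and the marker value are hypothetical, and the substring test in the example is only a toy stand-in for a real mapping and alignment step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """One gene fragment sequence with its image provenance (hypothetical layout)."""
    name: str
    sequence: str
    imaging_unit: str  # identifier of the imaging unit that captured the spot
    x: int             # column of the light spot in the sequencing image
    y: int             # row of the light spot in the sequencing image


def locate_abnormal_spots(reads, is_abnormal):
    """Group the image positions of abnormal reads by imaging unit."""
    spots = {}
    for read in reads:
        if is_abnormal(read):
            spots.setdefault(read.imaging_unit, []).append((read.x, read.y))
    return spots


def mark_spots(image, positions, marker=255):
    """Return a copy of a 2-D image (list of rows) with each position set to marker."""
    marked = [row[:] for row in image]
    for x, y in positions:
        if 0 <= y < len(marked) and 0 <= x < len(marked[y]):
            marked[y][x] = marker
    return marked


# Toy example: a read counts as abnormal if it is not an exact substring
# of the reference. A real system would use an aligner instead.
reference = "ACGTACGTTAGC"
reads = [
    Read("r1", "ACGTAC", "unit_A", 1, 0),
    Read("r2", "GGGGGG", "unit_A", 2, 1),
    Read("r3", "TTAGC", "unit_B", 0, 0),
]
spots = locate_abnormal_spots(reads, lambda r: r.sequence not in reference)
blank = [[0] * 4 for _ in range(3)]
marked = mark_spots(blank, spots.get("unit_A", []))
```

Passing the abnormality test in as a function keeps the position lookup independent of how the alignment result is actually evaluated.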
Optionally, determining abnormal gene fragment sequences from the gene fragment sequences according to the alignment result includes at least one of the following: determining a gene fragment sequence to be an abnormal gene fragment sequence when its alignment result indicates that the alignment was unsuccessful; and, when the alignment result of a gene fragment sequence indicates that the alignment was successful, counting the number of bases in the gene fragment sequence that are not aligned to the reference genome sequence, and determining the gene fragment sequence to be an abnormal gene fragment sequence if the number of unaligned bases is greater than a preset threshold.
Optionally, determining abnormal gene fragment sequences from the gene fragment sequences according to the mapping and alignment results includes at least one of the following: determining a gene fragment sequence to be an abnormal gene fragment sequence when its mapping and alignment result indicates that the alignment was unsuccessful; and, when the mapping and alignment result of a gene fragment sequence indicates that the alignment was successful, counting the number of bases in the gene fragment sequence that are not aligned to the reference sequence, and determining the gene fragment sequence to be an abnormal gene fragment sequence if the number of unaligned bases is greater than a preset number threshold.
Optionally, the gene sequencing data further comprises at least one of the following items of source information related to the gene fragment sequences: the upper or lower surface of the sequencing chip, the flow channel of the sequencing chip, the side column of the flow channel, and the camera used for imaging.
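The two abnormality criteria described above can be written as a small decision rule. The Python sketch below is illustrative only and makes simplifying assumptions: the alignment is ungapped, so unaligned bases are counted as position-by-position mismatches against an equally long reference segment, and the names `Abnormality`, `count_mismatches`, `classify`, and `max_mismatches` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Abnormality(Enum):
    UNMAPPED = "unmapped"                        # alignment was unsuccessful
    TOO_MANY_MISMATCHES = "too_many_mismatches"  # aligned, but over the threshold


def count_mismatches(read_seq: str, ref_seq: str) -> int:
    """Count positions where the read differs from the reference segment.

    Assumes an ungapped alignment, so both strings must have equal length.
    """
    if len(read_seq) != len(ref_seq):
        raise ValueError("ungapped comparison needs equal-length sequences")
    return sum(a != b for a, b in zip(read_seq, ref_seq))


def classify(read_seq: str, aligned_ref: Optional[str],
             max_mismatches: int) -> Optional[Abnormality]:
    """Return the abnormality type of a read, or None if it is normal.

    aligned_ref is the reference segment the read aligned to, or None
    when the alignment was unsuccessful.
    """
    if aligned_ref is None:
        return Abnormality.UNMAPPED
    if count_mismatches(read_seq, aligned_ref) > max_mismatches:
        return Abnormality.TOO_MANY_MISMATCHES
    return None
```

The comparison is strict, matching the text's "greater than" wording: a read with exactly `max_mismatches` differing bases is still treated as normal.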
Optionally, the method further comprises: before mapping and comparing the gene fragment sequences of the sample library with referen