CN-122024988-A - Automatic generation method and equipment for gene detection report
Abstract
The invention discloses an automatic generation method and equipment of a gene detection report, wherein the method comprises the steps of obtaining gene detection result data; forming prompt word input data of a pre-trained report generation model based on the gene detection result data, and outputting a target interpretation report through the report generation model based on the prompt word input data.
Inventors
- LI YING
- JIA HUIJUE
Assignees
- 杭州斐然微科生物科技有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-11-28
Claims (10)
- 1. An automatic generation method of a gene detection report, comprising: obtaining gene detection result data; forming prompt word input data of a pre-trained report generation model based on the gene detection result data; and outputting a target interpretation report through the report generation model based on the prompt word input data.
- 2. The automatic generation method of a gene detection report according to claim 1, wherein the forming prompt word input data of a pre-trained report generation model based on the gene detection result data comprises: analyzing the gene detection result data to obtain a biological information analysis result; acquiring target metadata of a detection item; querying, based on the target metadata, a target knowledge piece associated with the target metadata; and generating the prompt word input data based on the biological information analysis result, the target metadata and the target knowledge piece.
- 3. The automatic generation method of a gene detection report according to claim 2, wherein the querying a target knowledge piece associated with the target metadata based on the target metadata comprises: determining, based on the target metadata, a target knowledge base matching the field to which the target metadata corresponds; splicing the target metadata with the gene detection result data to obtain a query statement; and searching the target knowledge base for the one or more knowledge pieces that are most semantically similar to the query statement, to obtain the target knowledge piece.
- 4. The method of claim 2, wherein generating the prompt word input data based on the biological information analysis result, the target metadata, and the target knowledge piece comprises: selecting, from a prompt word template library, a target prompt word template that matches the target metadata; and filling part of the content of the target prompt word template based on the biological information analysis result, the target metadata and the target knowledge piece to obtain the prompt word input data.
- 5. The automatic generation method of a gene detection report according to claim 1, wherein the outputting a target interpretation report through the report generation model based on the prompt word input data comprises: generating an interpretation report to be fine-tuned through the report generation model based on the prompt word input data, wherein the interpretation report to be fine-tuned comprises an initial interpretation report and a report fine-tuned from the initial interpretation report; acquiring a verification prompt word template; filling the verification prompt word template based on the interpretation report to be fine-tuned, the biological information analysis result, the target metadata and the target knowledge piece to generate verification input data; forming the input of the report generation model based on the verification input data and outputting a fine-tuned interpretation report; when it is determined that the fine-tuned interpretation report requires further adjustment, returning to the step of acquiring a verification prompt word template; when it is determined that the fine-tuned interpretation report requires no further adjustment, taking the last output report as an adjusted interpretation report; and generating a verification interpretation report based on the initial interpretation report and the adjusted interpretation report, and generating the target interpretation report based on the verification interpretation report.
- 6. The method of claim 5, wherein generating the verification interpretation report based on the initial interpretation report and the adjusted interpretation report comprises at least one of: calculating the text similarity between sentences in the initial interpretation report and sentences in the adjusted interpretation report that cover the same subject content, and replacing each sentence in the initial interpretation report whose text similarity is lower than a preset similarity threshold with the corresponding sentence in the adjusted interpretation report; replacing non-standard terms in the initial interpretation report with standard terms; and labeling the modified sentences in the initial interpretation report.
- 7. The method of claim 5, wherein generating a target interpretation report based on the verification interpretation report comprises at least one of: converting at least part of the numerical data in the gene detection result data into a chart, and embedding the chart into the verification interpretation report; generating a species composition map based on species data in the gene detection result data; and when the verification interpretation report triggers a manual review condition, sending the data associated with the verification interpretation report to a terminal device of an expert user, so that the expert user can review the verification interpretation report and a modified verification interpretation report is obtained.
- 8. The automatic generation method of a gene detection report according to claim 1, wherein the method further comprises: acquiring a training data set, wherein each training sample in the training data set comprises detection sample data and a label report for the detection sample data; forming the input of a report generation model in training based on the detection sample data, and outputting a current training report corresponding to the detection sample data; and training the report generation model based on the text difference degree between the current training report and the label report.
- 9. The method of claim 8, wherein training the report generation model based on the text difference degree between the current training report and the label report comprises: in each training iteration, calculating the text difference degree between the current training report and the label report to obtain a current text difference loss; traversing each word in the current training report, identifying the words that are non-standard terms, and determining a current term penalty loss from those non-standard terms; calculating a current total loss value based on the current text difference loss and the current term penalty loss; and when the current total loss value is greater than a preset loss threshold, executing back propagation, updating the parameters of the report generation model in training, and continuing to train the report generation model.
- 10. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory, wherein the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any of claims 1-9.
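The prompt construction recited in claims 2 to 4 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the knowledge bases, the templates, the metadata keys `field` and `item`, and all function names are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever semantic similarity measure the method actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge bases, one per application field (claim 3).
KNOWLEDGE_BASES = {
    "clinical": [
        "Klebsiella pneumoniae carrying blaKPC is associated with carbapenem resistance.",
        "Low-abundance detections should be confirmed before clinical action.",
    ],
    "food": [
        "Listeria monocytogenes in ready-to-eat food indicates a contamination risk.",
    ],
}

# Hypothetical prompt word template library, keyed by the same field (claim 4).
PROMPT_TEMPLATES = {
    "clinical": (
        "Write a clinical interpretation report.\n"
        "Test item: {item}\nFindings: {findings}\nReference knowledge:\n{knowledge}\n"
    ),
    "food": (
        "Write a food safety interpretation report.\n"
        "Test item: {item}\nFindings: {findings}\nReference knowledge:\n{knowledge}\n"
    ),
}


def _vector(text):
    """Bag-of-words term counts; a stand-in for a real text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_pieces(metadata, result_text, top_k=1):
    """Pick the knowledge base by field, then rank its pieces against the query."""
    knowledge_base = KNOWLEDGE_BASES[metadata["field"]]
    # Claim 3: splice the metadata with the result data to form the query statement.
    query = _vector(f"{metadata['item']} {result_text}")
    ranked = sorted(
        knowledge_base, key=lambda piece: _cosine(query, _vector(piece)), reverse=True
    )
    return ranked[:top_k]


def build_prompt(metadata, findings, result_text, top_k=1):
    """Fill the matching template with findings, metadata and retrieved pieces."""
    pieces = retrieve_pieces(metadata, result_text, top_k)
    template = PROMPT_TEMPLATES[metadata["field"]]
    return template.format(
        item=metadata["item"],
        findings=findings,
        knowledge="\n".join(f"- {piece}" for piece in pieces),
    )
```

Calling `build_prompt({"field": "clinical", "item": "pathogen mNGS panel"}, findings, result_text)` returns a single string that would then be passed to the report generation model. Keying both the knowledge base and the template on the same metadata field is one simple way to let different users or scenarios receive differently focused prompts.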
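The loss arithmetic of claim 9 can be sketched in the same hedged way. The claims do not give formulas, so everything below is an assumption: the text difference is taken as one minus a `difflib` token-level similarity ratio, the term penalty is a fixed weight per non-standard word, the two are simply summed, and the term list and default values are hypothetical. A real training loop would need a differentiable loss (for example token-level cross-entropy) before back propagation; this sketch only shows how the two components combine and how the threshold gates further training.

```python
from difflib import SequenceMatcher

# Hypothetical set of non-standard terms that should not appear in a report.
NON_STANDARD_TERMS = {"superbug", "germ"}


def text_difference_loss(current_report, label_report):
    """Assumed measure: 1 minus the token-level similarity ratio."""
    current_tokens = current_report.lower().split()
    label_tokens = label_report.lower().split()
    return 1.0 - SequenceMatcher(None, current_tokens, label_tokens).ratio()


def term_penalty_loss(current_report, weight=0.1):
    """Assumed penalty: a fixed weight for every non-standard word found."""
    words = [word.strip(".,;:") for word in current_report.lower().split()]
    return weight * sum(1 for word in words if word in NON_STANDARD_TERMS)


def total_loss(current_report, label_report, weight=0.1):
    """Assumed combination: plain sum of the two loss components."""
    return text_difference_loss(current_report, label_report) + term_penalty_loss(
        current_report, weight
    )


def should_continue_training(current_report, label_report, threshold=0.05):
    """Claim 9: keep training while the total loss exceeds the preset threshold."""
    return total_loss(current_report, label_report) > threshold
```

With these assumptions, a training report that matches its label report and contains no non-standard terms yields a total loss of zero and stops training, while a report that uses a listed non-standard word is penalized twice: once through the text difference and once through the term penalty.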
Description
Automatic generation method and equipment for gene detection report
Technical Field
The invention relates to the technical field of genes, in particular to an automatic generation method and equipment of a gene detection report.
Background
As high-throughput sequencing technology has matured and its cost has fallen, microbial gene detection has gradually moved from basic research into practical application scenarios such as clinical diagnosis, environmental monitoring, food safety and public health. High-throughput sequencing can detect the genetic material of all microorganisms in a sample in an unbiased, high-coverage manner, so that the microbiome composition of a target region, including the types, abundance and functional potential of bacteria, viruses, fungi, archaea and other microorganisms, can be analyzed comprehensively and accurately. This capability is of great value in fields such as tracing outbreaks of infectious disease, hospital infection control, environmental ecological assessment, tracing food contamination, and research on the association between the microbiome and host health. A typical microbial gene detection workflow generally includes the key steps of sample collection, nucleic acid (DNA/RNA) extraction, library construction, high-throughput sequencing, bioinformatics analysis, and generation of the final detection report. The bioinformatics analysis stage applies multiple complex steps to the raw sequencing data, such as quality control, sequence alignment or taxonomic annotation, species abundance calculation, function prediction and statistical evaluation, and this stage has gradually become automated and standardized. Report writing, at the end of the workflow, is the key step that connects the data analysis results to the user's reading of them, and it directly affects the understandability and the clinical or practical value of the detection results.
However, the writing of the detection report is still mostly done manually. At present, the generation of microbial gene detection reports relies primarily on either manual writing or semi-automated template filling systems. In the manual writing mode, a professional technician must write the detection conclusion, clinical interpretation, recommended measures and similar content item by item, based on multi-dimensional information such as sequencing data, species annotation, function prediction and drug resistance gene analysis, combined with medical or industry knowledge. In a semi-automatic system, a preset report template is generally used, and a rule engine or script fills the analysis results into the corresponding fields, such as a species abundance table, a pathogen list and the drug resistance gene detection status. Such rule-based or template-based report generation systems can only produce fill-in-the-blank output and cannot reason comprehensively about the logical relationships between detection results. For example, when multiple potential pathogens and their corresponding drug resistance genes are detected simultaneously, it is difficult for the system to automatically generate comprehensive assessments of clinical guidance value, such as infection risk level judgments and prioritized treatment recommendations. Moreover, different application scenarios (such as a hospital ICU, a food enterprise or an environmental monitoring station) focus on markedly different report content, and existing template-based systems struggle to adapt dynamically to user requirements.
In addition, when anomalies or contradictions occur in the test results (e.g., a low-abundance but highly pathogenic species is detected), the system fails to provide a reasonable interpretation or confidence indication, making it difficult for the user to judge the reliability of the results. The report content also needs to be iterated continuously as new pathogens appear, drug resistance mechanisms evolve and clinical guidelines are updated. However, conventional rule engines rely on manually written and maintained if-else logic or keyword mapping tables, which makes it difficult to respond quickly to knowledge updates. Once a detection index or an interpretation dimension is added, the template structure often has to be redesigned, or even the whole report generation flow reconfigured, which severely limits system flexibility.
Disclosure of Invention
In order to solve the existing technical problems, the invention provides an automatic generation method and computing equipment for a gene detection report, which can realize standardized output, eliminate subjective differences between the interpretations of different experts, ensure that all users obtain reports of uniform quality, and form different prompt word input data based on different users, thereby realizing the generation of reports with d