CN-121999886-A - Intelligent agent-based biological informatics task automatic analysis method and system
Abstract
The invention provides an intelligent agent-based automatic analysis method and system for bioinformatics tasks, relating to the technical field of bioinformatics. The method comprises: receiving a research target and a data storage path described by a user in natural language; acquiring domain knowledge from a preset bioinformatics knowledge base and integrating it with the research target to drive a large language model to automatically generate a structured task sequence and corresponding executable analysis code; submitting the executable analysis code to a distributed computing resource platform for execution according to the task sequence; monitoring states and logs in real time during execution; and, when an abnormality is detected, generating a correction instruction set by performing semantic analysis and pattern recognition on the logs, then recovering and completing execution of the task sequence based on that instruction set. The invention automates the full flow from natural-language intent to analysis result and improves analysis efficiency and reliability.
Inventors
- GU XIAOFENG
- TIAN JIAN
- MAO YICHAO
- XIE SHANG
- LI DONGWEI
- WANG WEIYIN
- WANG WENSHAN
- YAN RUONAN
Assignees
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences (中国农业科学院生物技术研究所)
Dates
- Publication Date
- 20260508
- Application Date
- 20260409
Claims (10)
- 1. An intelligent agent-based bioinformatics task automation analysis method, characterized by comprising the following steps: receiving user input, the user input including a bioinformatics research target described in natural language and a data storage path of bioinformatics data to be analyzed; acquiring domain knowledge from a preset bioinformatics knowledge base based on the bioinformatics research target and the data storage path, integrating the domain knowledge with the bioinformatics research target to drive a large language model to generate a task sequence consisting of a plurality of atomic-level bioinformatics analysis tasks with logical dependency relationships, and generating executable analysis code suitable for each task in the task sequence; submitting the executable analysis code to a distributed computing resource platform for scheduling and execution according to the task sequence; and monitoring the execution state and the running log of the executable analysis code in real time, performing semantic analysis and pattern recognition on the running log when an execution abnormality is detected to generate an instruction set for correcting the task sequence, and recovering and completing the execution of the task sequence based on the instruction set.
- 2. The agent-based bioinformatics task automation analysis method of claim 1 wherein during execution of the executable analysis code, the method further comprises: and storing the task sequence, the executable analysis code, the intermediate execution state of the executable analysis code and result data to form a traceable analysis context.
- 3. The automated analysis method of intelligent agent-based bioinformatics tasks according to claim 1, wherein performing semantic analysis and pattern recognition on the running log specifically comprises: matching the running log against a preset bioinformatics error pattern knowledge base, wherein the bioinformatics error pattern knowledge base comprises the mapping relation between error log features and the instructions in the instruction set.
- 4. The automated analysis method of intelligent agent-based bioinformatics tasks of claim 1, wherein integrating the domain knowledge with the bioinformatics research target specifically comprises: organizing the acquired domain knowledge and the bioinformatics research target into a standardized prompt input through structured prompt engineering, so as to drive the large language model to generate the task sequence.
- 5. The automated analysis method of agent-based bioinformatics tasks of claim 1, wherein submitting the executable analysis code to a distributed computing resource platform for scheduling and execution comprises: parsing, by a task execution engine, the executable analysis code into a plurality of independent computing jobs; and submitting the plurality of independent computing jobs to the distributed computing resource platform for dependency scheduling and parallel execution according to the logical dependency relationships in the task sequence.
- 6. The automated analysis method of agent-based bioinformatics tasks of claim 1, wherein the user input further comprises descriptive information of a data type or an experimental type of the bioinformatics data to be analyzed; and acquiring domain knowledge from the preset bioinformatics knowledge base specifically comprises: preferentially acquiring domain knowledge that matches the descriptive information.
- 7. The automated analysis method of agent-based bioinformatics tasks of claim 1, wherein the real-time monitoring of the execution state and the running log of the executable analysis code comprises: constructing an execution script that encapsulates the executable analysis code, so that non-standard error outputs and process states from different bioinformatics analysis tools are captured and parsed in a uniform manner.
- 8. An agent-based bioinformatics task automation analysis system, comprising: an input receiving module for receiving user input including a bioinformatics research target described in natural language and a data storage path of bioinformatics data to be analyzed; a planning decision module for acquiring domain knowledge from a preset bioinformatics knowledge base based on the bioinformatics research target and the data storage path, integrating the domain knowledge with the bioinformatics research target, driving a large language model to generate a task sequence consisting of a plurality of atomic-level bioinformatics analysis tasks with logical dependency relationships, and generating executable analysis code suitable for each task in the task sequence; a task execution engine for submitting the executable analysis code to a distributed computing resource platform for scheduling and execution according to the task sequence; and a fault-tolerant processing module for monitoring the execution state and the running log of the executable analysis code in real time, performing semantic analysis and pattern recognition on the running log when an execution abnormality is detected to generate an instruction set for correcting the task sequence, and recovering and completing the execution of the task sequence based on the instruction set.
- 9. The agent-based bioinformatics task automation analysis system of claim 8, further comprising: a user interaction module for providing a graphical interface through which a user can review and modify the task sequence and/or the executable analysis code after the planning decision module generates them, and for feeding the modification result back to the planning decision module or the task execution engine.
- 10. An electronic device comprising a memory, a processor and a computer program stored on the memory and running on the processor, wherein the processor implements the automated analysis method of agent-based bioinformatics tasks of any one of claims 1 to 7 when the computer program is executed by the processor.
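As a non-limiting illustration of the matching recited in claim 3, the sketch below maps error-log features to correction instructions. The source does not disclose the contents of the error pattern knowledge base, so the pattern names, regular expressions, and instruction labels here are hypothetical, and plain regular-expression matching stands in for whatever semantic analysis the invention actually applies.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorPattern:
    """One knowledge-base entry: a log feature and the instruction it maps to."""
    name: str
    regex: str          # hypothetical log feature, matched case-insensitively
    instruction: str    # hypothetical correction instruction label


# Hypothetical knowledge base; a real one would be curated per analysis tool.
ERROR_PATTERNS = [
    ErrorPattern("out_of_memory",
                 r"out of memory|MemoryError|std::bad_alloc",
                 "retry_with_more_memory"),
    ErrorPattern("missing_input",
                 r"no such file or directory",
                 "regenerate_upstream_output"),
    ErrorPattern("missing_tool",
                 r"command not found",
                 "install_or_load_tool"),
]


def correction_instructions(log_text, patterns=ERROR_PATTERNS):
    """Return the instruction set for a running log, in knowledge-base order."""
    instructions = []
    for pattern in patterns:
        matched = re.search(pattern.regex, log_text, flags=re.IGNORECASE)
        if matched and pattern.instruction not in instructions:
            instructions.append(pattern.instruction)
    return instructions


if __name__ == "__main__":
    print(correction_instructions("bash: line 3: samtools: command not found"))
```

A log that matches no entry yields an empty instruction set; in that case a system along the lines described would need some other way to decide how to proceed, which this sketch does not model.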
Description
Intelligent agent-based biological informatics task automatic analysis method and system
Technical Field
The invention relates to the technical fields of computing and bioinformatics, and in particular to an intelligent agent-based bioinformatics task automatic analysis method and system.
Background
In recent years, with the rapid development and wide application of high-throughput sequencing technology, the volume of bioinformatics data has grown exponentially. Faced with this volume of data, traditional analysis methods depend heavily on manual work by bioinformatics analysts with specialized programming skills and deep domain knowledge. Manual analysis is inefficient and time-consuming, prone to human error, and poor in reproducibility and scalability.
To address this challenge, those skilled in the art have developed workflow management systems that attempt to automate a fixed analysis flow. However, these systems generally require the user to define a rigid, linear analysis script in advance, and their limited adaptability makes it difficult to meet exploratory research needs in which research targets change and analysis strategies must be adjusted dynamically.
The emergence of artificial intelligence technology, particularly large language models (LLMs), offers a new technical path for solving these problems. Although some existing intelligent systems attempt to generate code or provide analysis suggestions using a large language model, they still face technical bottlenecks when handling complex bioinformatics analysis flows involving multiple cooperating steps: insufficient compatibility with new tools, an inability to handle unexpected errors during analysis effectively, and difficulty in understanding the user's high-level research intent.
Disclosure of Invention
The invention provides an intelligent agent-based bioinformatics task automatic analysis method and system, which address the shortcomings of the prior art, namely a low degree of automation in bioinformatics analysis, rigid flows, and poor fault tolerance, and which achieve full-flow autonomous planning, execution, and intelligent fault tolerance from a natural-language target to an analysis result.
The invention provides an intelligent agent-based bioinformatics task automatic analysis method, which comprises the following steps:
Receiving user input, the user input including a bioinformatics research target described in natural language and a data storage path of bioinformatics data to be analyzed;
Acquiring domain knowledge from a preset bioinformatics knowledge base based on the bioinformatics research target and the data storage path, integrating the domain knowledge with the bioinformatics research target to drive a large language model to generate a task sequence consisting of a plurality of atomic-level bioinformatics analysis tasks with logical dependency relationships, and generating executable analysis code suitable for each task in the task sequence;
Submitting the executable analysis code to a distributed computing resource platform for scheduling and execution according to the task sequence;
Monitoring the execution state and the running log of the executable analysis code in real time, performing semantic analysis and pattern recognition on the running log when an execution abnormality is detected to generate an instruction set for correcting the task sequence, and recovering and completing the execution of the task sequence based on the instruction set.
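By way of illustration only, a task sequence with logical dependency relationships can be represented as a mapping from each task to its prerequisites and grouped into batches whose members could be submitted in parallel. The sketch below uses Python's standard `graphlib` module; the task names are hypothetical and do not come from the source, and the batching shown is one possible reading of dependency scheduling, not the scheduling scheme of the invention.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group tasks into ordered batches.

    `dependencies` maps each task name to the set of tasks it depends on.
    Every task in a batch has all of its prerequisites in earlier batches,
    so the tasks within one batch are independent of each other.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical task sequence: task -> tasks it depends on.
example_tasks = {
    "quality_control": set(),
    "build_index": set(),
    "align_reads": {"quality_control", "build_index"},
    "count_features": {"align_reads"},
    "differential_expression": {"count_features"},
}

if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(example_tasks), start=1):
        print(number, batch)
```

In this example the two tasks with no prerequisites form the first batch, and each later task waits for the batch containing its prerequisites to finish.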
According to the intelligent agent-based automated analysis method for bioinformatics tasks provided by the invention, during execution of the executable analysis code, the method further comprises: persisting the task sequence, the executable analysis code, the intermediate execution state of the executable analysis code, and result data to form a traceable analysis context.
According to the intelligent agent-based automated analysis method for bioinformatics tasks provided by the invention, performing semantic analysis and pattern recognition on the running log specifically comprises: matching the running log against a preset bioinformatics error pattern knowledge base, wherein the bioinformatics error pattern knowledge base comprises the mapping relation between error log features and the instructions in the instruction set.
According to the intelligent agent-based automated analysis method for bioinformatics tasks provided by the invention, integrating the domain knowledge with the bioinformatics research target specifically comprises: organizing the acquired domain knowle