CN-121483390-B - mRNA vaccine sequence design system and method based on large language model intelligent agent
Abstract
The invention discloses an mRNA vaccine sequence design system and method based on a large language model intelligent agent. The system comprises a user interaction and task analysis module, a task planning and intelligent agent scheduling module, an RNA design skill library and skill description module, a multi-objective evaluation and closed-loop optimization module, and a data and model management module. By integrating a large language model with various mRNA sequence analysis and optimization tools, the invention automatically completes the generation, evaluation and optimization of mRNA vaccine sequences for a target antigen in a computer environment. It is suitable for the design and screening of candidate sequences for preventive or therapeutic mRNA vaccines, improves design efficiency, and reduces dependence on manual experience and manually written scripts.
Inventors
- ZHANG RUIQI
- Tang Yurou
- CAI MENGMENG
- LI SHIBO
Assignees
- 微观纪元(合肥)量子科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260112
Claims (9)
- 1. An mRNA vaccine sequence design system based on a large language model intelligent agent, characterized by comprising a user interaction and task analysis module, a task planning and intelligent agent scheduling module, an RNA design skill library and skill description module, a multi-objective evaluation and closed-loop optimization module, and a data and model management module; the user interaction and task analysis module is used for receiving vaccine design parameters, and for analyzing and converting the vaccine design parameters through a large language model to obtain a task description object; the task planning and intelligent agent scheduling module is connected with the user interaction and task analysis module and with the RNA design skill library and skill description module, and is used for constructing a directed acyclic graph (DAG) workflow according to the task description object and the skills in the RNA design skill library and skill description module, calling those skills through an intelligent agent to generate candidate mRNA sequences, and, during execution of the DAG workflow, dynamically adjusting the skills selected from the RNA design skill library and skill description module according to a comprehensive scoring result; the RNA design skill library and skill description module is connected with the multi-objective evaluation and closed-loop optimization module and the data and model management module, and is used for packaging mRNA molecular design and multidimensional characteristic evaluation functions into a plurality of skill modules, and for uniformly defining, registering and managing each skill module; the multi-objective evaluation and closed-loop optimization module is connected with the task planning and intelligent agent scheduling module, and is used for carrying out multi-dimensional index analysis and comprehensive evaluation on the candidate mRNA sequences to obtain the comprehensive scoring result, and for feeding the comprehensive scoring result back to the task planning and intelligent agent scheduling module; the data and model management module is connected with the multi-objective evaluation and closed-loop optimization module, and is used for managing and storing system data as well as the model parameters and tool configurations related to each RNA design skill; the task planning and intelligent agent scheduling module comprises: a task planning sub-module, connected with the user interaction and task analysis module and the RNA design skill library and skill description module, and used for querying related skill sets from the RNA design skill library and skill description module according to the task description object and for constructing the DAG workflow by combining the dependency relationships, input-output compatibility and resource constraints among the skills; an intelligent agent decision sub-module, connected with the task planning sub-module, and used for analyzing intermediate results through a large language model during execution of the DAG workflow to obtain an evaluation result, and for dynamically adjusting the skill calling sequence, parameter configuration and optimization strategy according to the evaluation result; and a scheduling and execution coordination sub-module, connected with the RNA design skill library and skill description module and the multi-objective evaluation and closed-loop optimization module, and used for issuing calling tasks, converting the data format of the input data of skill calls, and sending skill calling results and execution logs to the multi-objective evaluation and closed-loop optimization module; wherein the system data comprises the task description object and the candidate mRNA sequences, and a calling task is a call to a tool interface in the RNA design skill library and skill description module.
- 2. The large language model intelligent agent-based mRNA vaccine sequence design system of claim 1, wherein the user interaction and task analysis module comprises: an input receiving sub-module, used for receiving vaccine design parameters input in natural language; a task semantic analysis sub-module, connected with the input receiving sub-module, and used for carrying out semantic analysis on the vaccine design parameters through a large language model to obtain a structured analysis result; and a task configuration construction sub-module, connected with the task semantic analysis sub-module, and used for converting the analysis result into the task description object; wherein the vaccine design parameters include target antigen information, an indication scenario and design constraints.
- 3. The large language model intelligent agent-based mRNA vaccine sequence design system of claim 2, wherein the RNA design skill library and skill description module comprises: a skill description sub-module, used for defining a structured skill description object for each skill module in the skill library; a skill registration and discovery sub-module, connected with the skill description sub-module, and used for providing a skill registration interface, skill retrieval and skill filtering, and for verifying the skill description object during skill registration; and a skill call interface sub-module, connected with the multi-objective evaluation and closed-loop optimization module and the data and model management module, and used for packaging a unified call interface, providing a standardized call method, packaging each skill call result into a unified result object, and sending the result object to the multi-objective evaluation and closed-loop optimization module and the data and model management module; wherein the structured skill description object comprises at least three of: skill identification and version information, applicable object types, input parameter definitions, output result definitions, dependency relationships, performance and resource information, and calling modes.
- 4. The large language model intelligent agent-based mRNA vaccine sequence design system of claim 1, wherein the multi-objective evaluation and closed-loop optimization module comprises: a candidate sequence management sub-module, used for managing the candidate mRNA sequences generated in each iteration round, assigning unique identifiers to the candidate mRNA sequences, and maintaining a candidate sequence state set; a multi-index evaluation sub-module, used for calling evaluation-class skills in the RNA design skill library to carry out multi-dimensional index analysis on the candidate mRNA sequences and obtain a plurality of typical indexes, converting the typical indexes corresponding to each candidate mRNA sequence into a standardized EvaluationRecord object, and sending the EvaluationRecord objects to the data and model management module; a multi-target comprehensive evaluation sub-module, connected with the multi-index evaluation sub-module, and used for mapping the plurality of typical indexes into the comprehensive scoring result through a preset scoring strategy according to task weights and constraint conditions; and a closed-loop optimization control sub-module, connected with the multi-target comprehensive evaluation sub-module and the task planning and intelligent agent scheduling module, and used for screening the candidate mRNA sequences according to the comprehensive scoring result, constructing feedback summary information for the screened mRNA sequences, sending the feedback summary information to the task planning and intelligent agent scheduling module, and controlling the termination condition of the iterative process; wherein the candidate sequence state set comprises a candidate set, a current main selection set and a historical elimination set, and the termination condition comprises reaching a preset scoring threshold, reaching an upper limit on the number of iteration rounds, and insufficient improvement amplitude.
- 5. The large language model intelligent agent-based mRNA vaccine sequence design system of claim 4, wherein the data and model management module comprises: a task and configuration management sub-module, used for storing the task description object corresponding to each design task and recording the time points of task creation, updating and execution; a sequence and evaluation result storage sub-module, used for storing the candidate mRNA sequences and the EvaluationRecord objects corresponding to them, and for supporting retrieval of candidate mRNA sequences by screening conditions, wherein the screening conditions comprise task ID, round number and scoring interval; a skill and model version management sub-module, used for managing the registration information and version information of the RNA design skills, recording the change history of each RNA design skill, and managing the version and parameters of each model in the system; and a log and audit sub-module, used for recording log information and supporting audit and quality analysis of the design process.
- 6. The large language model intelligent agent-based mRNA vaccine sequence design system of claim 3, further comprising an experimental feedback integration module; the experimental feedback integration module is connected with the multi-objective evaluation and closed-loop optimization module and the task planning and intelligent agent scheduling module, and comprises an experimental data receiving sub-module, a data standardization and mapping sub-module, and a feedback interface sub-module; wherein the experimental data receiving sub-module is used for receiving in vitro or in vivo experimental data for the mRNA vaccine candidate sequences; the data standardization and mapping sub-module is connected with the experimental data receiving sub-module, and is used for converting the experimental data into a uniform data structure and mapping the data structure into experimental feedback parameters for updating the multi-target evaluation strategy; the feedback interface sub-module is connected with the data standardization and mapping sub-module, the multi-objective evaluation and closed-loop optimization module and the task planning and intelligent agent scheduling module, and is used for sending the experimental feedback parameters to the multi-objective evaluation and closed-loop optimization module and the task planning and intelligent agent scheduling module; and the experimental data include expression level, immune response intensity and safety indexes.
- 7. The large language model intelligent agent-based mRNA vaccine sequence design system of claim 1, further comprising a code generation and execution module; the code generation and execution module comprises a code template and generation sub-module, a safe execution environment sub-module, and a result analysis and error processing sub-module; the code template and generation sub-module is used for maintaining a plurality of script templates related to RNA design skills, and for generating or completing script code through a large language model according to the calling modes and parameter definitions in the skill description objects; the safe execution environment sub-module is used for running the scripts generated by the code template and generation sub-module in an isolated execution environment, and for recording the standard output, error information and return values produced while the scripts run; the result analysis and error processing sub-module is used for analyzing the result files or standard output generated by script execution and, when an error or anomaly is detected, for organizing the error information into a prompt so that a large language model gives correction suggestions or automatically generates a repaired script.
- 8. A method for designing mRNA vaccine sequences based on a large language model intelligent agent, applied to the large language model intelligent agent-based mRNA vaccine sequence design system according to any one of claims 1 to 7, comprising: obtaining vaccine design parameters and sending them to the user interaction and task analysis module, wherein the user interaction and task analysis module analyzes and converts the vaccine design parameters through a large language model to obtain a task description object; the task planning and intelligent agent scheduling module constructs a DAG workflow according to the task description object and the skills in the RNA design skill library and skill description module, and calls the skills in the RNA design skill library and skill description module through the intelligent agent to generate candidate mRNA sequences; the multi-objective evaluation and closed-loop optimization module performs multi-dimensional index analysis and comprehensive evaluation on the candidate mRNA sequences to obtain a comprehensive scoring result, and feeds the comprehensive scoring result back to the task planning and intelligent agent scheduling module; judging whether the candidate mRNA sequences meet a preset design constraint or convergence condition; if so, outputting the current candidate mRNA sequences; if not, the intelligent agent modifies, replaces or regenerates the candidate mRNA sequences based on the comprehensive scoring result to form a new round of candidate mRNA sequences, and the method returns to the step in which the multi-objective evaluation and closed-loop optimization module performs multi-dimensional index analysis and comprehensive evaluation on the candidate mRNA sequences; and, during execution of the DAG workflow, the task planning and intelligent agent scheduling module dynamically adjusts the skills selected from the RNA design skill library and skill description module according to the comprehensive scoring result.
- 9. An electronic device comprising a memory, a processor and a computer program stored on the memory, wherein the computer program when executed by the processor implements the large language model intelligent agent-based mRNA vaccine sequence design method of claim 8.
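As an illustration of the DAG workflow construction in claim 1 and the termination conditions in claim 4, the following is a minimal Python sketch, not the patented implementation. The `SkillDescription` fields, the function names `build_workflow` and `should_stop`, and all numeric parameters are hypothetical; the claims do not specify them. Dependency ordering is delegated to the standard-library `graphlib.TopologicalSorter`, which is only one possible choice.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class SkillDescription:
    """Hypothetical structured skill description object (field names are assumptions)."""
    skill_id: str
    version: str
    depends_on: tuple[str, ...] = ()


def build_workflow(skills: list[SkillDescription]) -> list[str]:
    """Return an execution order in which every skill runs after its dependencies."""
    known = {s.skill_id for s in skills}
    graph: dict[str, set[str]] = {}
    for s in skills:
        missing = set(s.depends_on) - known
        if missing:
            raise ValueError(f"{s.skill_id} depends on unregistered skills: {sorted(missing)}")
        graph[s.skill_id] = set(s.depends_on)
    # static_order() raises graphlib.CycleError if the dependencies are not acyclic.
    return list(TopologicalSorter(graph).static_order())


def should_stop(best_scores: list[float], threshold: float,
                max_rounds: int, min_gain: float) -> bool:
    """Check the three termination conditions listed in claim 4.

    best_scores holds the best comprehensive score of each completed round.
    """
    if not best_scores:
        return False
    if best_scores[-1] >= threshold:   # preset scoring threshold reached
        return True
    if len(best_scores) >= max_rounds:  # upper limit on iteration rounds reached
        return True
    # insufficient improvement between the last two rounds
    return len(best_scores) >= 2 and best_scores[-1] - best_scores[-2] < min_gain
```

For a linear chain of three hypothetical skills (`codon_opt`, then `structure_eval`, then `score`), `build_workflow` returns them in that order regardless of registration order. A real scheduler would also have to check input-output compatibility and resource constraints, which this sketch omits.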
Description
mRNA vaccine sequence design system and method based on large language model intelligent agent
Technical Field
The invention relates to the technical field of computers, and in particular to an mRNA vaccine sequence design system based on a large language model intelligent agent, an mRNA vaccine sequence design method based on a large language model intelligent agent, and an electronic device.
Background
The development of mRNA vaccines for a given pathogen or tumor-associated target typically involves several stages, such as antigen screening and design, T-cell and B-cell epitope prediction, mRNA sequence engineering and optimization (covering both coding and non-coding regions), delivery system design, and in vitro and in vivo validation. The computational design and optimization of the mRNA sequence is a key step that determines expression efficiency, stability and safety, and has an important influence on the final efficacy of the vaccine.
In the sequence design stage of mRNA vaccines, technical routes in the related art generally start from the target protein or polypeptide sequence and proceed as follows. First, potential protective epitopes are screened from pathogen proteins or tumor antigens by immunoinformatics and structural biology methods, and the binding affinities of candidate peptide fragments to multiple HLA alleles are predicted using MHC I/II binding prediction tools (such as NetMHCpan, NetMHCIIpan and the multiple algorithms on the IEDB platform) to evaluate the potential for a T-cell immune response.
Next, with the amino acid sequence of the protein kept unchanged, codon optimization is performed on the mRNA coding region to improve translation efficiency and protein expression level. Common indexes include the Codon Adaptation Index (CAI), GC content distribution, codon pair patterns and avoidance of certain unfavorable dinucleotide frequencies, and some studies adopt deep generative models or dedicated optimization algorithms to search for and optimize codon combinations. Then, local and global structural stability is evaluated with RNA secondary structure prediction tools (such as RNAfold in the ViennaRNA package). For example, algorithms proposed in recent years such as LinearDesign treat mRNA sequence design as an optimization problem that jointly considers structural stability and codon usage in a huge combinatorial space, and significantly improve the stability and expression performance of full-length mRNA through methods such as dynamic programming. Finally, the 5' UTR, 3' UTR, poly(A) tail and 5' cap structure are engineered to improve ribosome loading, translation initiation and mRNA half-life, and immunogenicity, stability and expression level can be balanced by introducing chemical modifications such as N1-methylpseudouridine, optimizing local structures, and similar measures.
However, this technical route tends to result in long design cycles, and conflicts among multiple objectives (e.g., TE > 80% versus off-target < 5%) are difficult to trade off efficiently. Meanwhile, to support the above design route, various computing tools and partially integrated platforms have been developed in the related art, for example structure and stability optimization tools, epitope and immunogenicity prediction tools, and mRNA vaccine design workflows and Web platforms.
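Two of the codon optimization indexes named above, GC content and the Codon Adaptation Index, can be computed in a few lines. The sketch below assumes the usual definition of CAI as the geometric mean of per-codon relative adaptiveness weights. The weight table `TOY_WEIGHTS` is made up for illustration and is not real codon usage data, and the function names are hypothetical.

```python
import math

# Hypothetical relative adaptiveness weights (1.0 = most used synonymous codon).
# These values are illustrative only, not measured codon usage.
TOY_WEIGHTS = {"GCC": 1.0, "GCT": 0.5, "AAG": 1.0, "AAA": 0.25}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return sum(base in "GC" for base in seq) / len(seq)


def codon_adaptation_index(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of the weights of the codons in a coding sequence.

    Codons absent from the weight table are skipped, which is how codons
    without synonymous alternatives and stop codons are commonly excluded.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA alphabet
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon of the sequence has a weight")
    return math.exp(sum(logs) / len(logs))
```

With the toy table, a sequence built only from the most used codons (`GCCAAG`) scores the maximum CAI, while swapping in the rarer synonymous codons (`GCUAAA`) lowers it without changing the encoded amino acids. That is the trade-off codon optimization searches over.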
These tools and platforms improve the computational efficiency of mRNA vaccine design to a certain extent, so that researchers can quickly obtain candidate mRNA sequences and perform preliminary screening in terms of expression, stability, immunogenicity and the like. Nevertheless, the following limitations still exist in the prior art:
1. Dispersed tools and heterogeneous interfaces. mRNA vaccine design involves many computing tools and models, covering epitope prediction, codon optimization, secondary structure prediction, stability evaluation, and immunogenicity and safety evaluation. These tools exist as independent Web services, command-line software or script libraries, and their interface formats and input and output data structures differ. Researchers must write scripts manually for interfacing and data conversion, so the barrier to use is high, the process is error-prone, and the proportion of manual intervention is generally greater than 50%.
2. Fixed workflows and a lack of flexible scheduling. The design flow in the related art usually adopts a fixed "linear workflow", with the tool combination mode and calling sequence preset during the development stage of the platform. In the face of personalized requirements (such as an emphasis on immunogenicity, safety or expression efficie