CN-121279274-B - Medical instrument multi-file long text generation method based on multiple agents

Abstract

The invention provides a multi-agent-based method for generating multi-file long texts for medical instruments, and relates to the technical field of medical instruments. The method comprises: collecting medical instrument source information and constructing a knowledge graph; performing multi-level heterogeneous graph encoding and forming a hierarchical semantic index; receiving a document generation requirement, extracting task node vectors, and generating task-specific embedded representations; generating document paragraph content in combination with a global constraint vector; and constructing a semantic dependency graph and performing serialization fusion to obtain a complete medical instrument document. The method can improve the generation efficiency and content accuracy of medical instrument documents, and enhances the consistency and completeness across documents.

Inventors

ZHENG YIHAO

Assignees

钰兔科技集团有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2025-10-23

Claims (10)

1. A multi-agent-based medical instrument multi-file long text generation method, characterized by comprising the following steps: collecting medical instrument source information, extracting entities and association relations to construct a knowledge graph, performing multi-level heterogeneous graph encoding on the knowledge graph, calculating a neighborhood-weighted representation of each node, determining a node embedding vector set in combination with contrastive learning, performing cluster analysis on the node embedding vector set to obtain a clustering result, and constructing a hierarchical semantic index; receiving a document generation requirement, decomposing the document content to be generated into a plurality of document paragraph tasks, retrieving a cluster subset from the hierarchical semantic index based on the document paragraph tasks, extracting task node vectors corresponding to the cluster subset, performing a bilinear transformation on the task node vectors and semantic representations of the document paragraph tasks to obtain an interaction feature matrix, performing singular value decomposition on the interaction feature matrix to extract a principal component vector, and performing a residual connection with the task node vectors to obtain task-specific embedded representations; and adding the task-specific embedded representations and a preset global constraint vector to a preset generation execution path, generating document paragraph content, extracting key semantic units, performing syntactic dependency analysis on the key semantic units, extracting syntactic structural features, calculating the semantic relatedness between different key semantic units, constructing a directed semantic dependency graph based on the semantic relatedness, performing topological sorting, determining semantic dependency relations between different key semantic units, and performing serialization fusion on the document paragraph content to obtain a complete medical instrument document.
2. The method of claim 1, wherein collecting medical instrument source information, extracting entities and association relations to construct a knowledge graph, performing multi-level heterogeneous graph encoding on the knowledge graph, calculating a neighborhood-weighted representation of each node, and determining a node embedding vector set in combination with contrastive learning comprises: collecting medical instrument source information from medical instrument technical documents, performing word segmentation on the medical instrument source information to obtain a word sequence, performing named entity recognition on the word sequence to extract an entity set, performing relation extraction on the entities in the entity set to obtain association relations between the entities, and constructing the knowledge graph from the entity set and the association relations; grouping the nodes in the knowledge graph into layers according to node type to obtain a plurality of node levels, performing a graph convolution operation on the nodes in each node level to obtain intra-layer node representations, and performing cross-layer aggregation on the intra-layer node representations of different node levels to obtain a multi-level heterogeneous graph encoding result; and extracting the set of adjacent nodes of each node in the multi-level heterogeneous graph encoding result, performing a weighted summation of the node representations in the set of adjacent nodes to obtain the neighborhood-weighted representation of each node, taking the neighborhood-weighted representations as positive samples, randomly sampling nodes in the knowledge graph to obtain negative samples, calculating a contrastive loss between the positive samples and the negative samples, and performing back-propagation optimization to obtain the node embedding vector set.
3. The method of claim 1, wherein performing cluster analysis on the node embedding vector set to obtain a clustering result and constructing a hierarchical semantic index comprises: normalizing the node embedding vectors in the node embedding vector set to obtain normalized node embedding vectors, calculating a semantic similarity matrix between the normalized node embedding vectors, constructing an adjacency graph between nodes based on the semantic similarity matrix, performing spectral decomposition on the adjacency graph to obtain an eigen-subspace, projecting the normalized node embedding vectors into the eigen-subspace to obtain low-dimensional projection vectors, performing density estimation on the low-dimensional projection vectors to obtain a local density value for each node, and selecting nodes whose local density value is higher than a preset density threshold as cluster center candidate nodes; calculating distance metric values between the cluster center candidate nodes, hierarchically merging the cluster center candidate nodes based on the distance metric values to obtain a multi-level cluster center set, and assigning each node embedding vector in the node embedding vector set to the nearest cluster center in the multi-level cluster center set to obtain the clustering result; and calculating a semantic center representation of each cluster from the node embedding vectors of that cluster in the clustering result, calculating the degree of semantic association between the semantic center representations, constructing hierarchical relations between the clusters, and arranging the clusters in the hierarchical relations into layers by semantic granularity from coarse to fine to obtain the hierarchical semantic index.
4. The method of claim 1, wherein receiving a document generation requirement and decomposing the document content to be generated into a plurality of document paragraph tasks, retrieving a cluster subset from the hierarchical semantic index based on the document paragraph tasks and extracting task node vectors corresponding to the cluster subset, and performing a bilinear transformation on the task node vectors and semantic representations of the document paragraph tasks to obtain the interaction feature matrix comprises: receiving the document generation requirement, extracting a document type identifier, determining a document structure template according to the document type identifier, dividing the document content to be generated into a plurality of paragraph areas, extracting a paragraph topic and paragraph function attributes from each paragraph area, and combining the paragraph topic and the paragraph function attributes to form a document paragraph task; semantically encoding the document paragraph task to obtain a task query vector, calculating the similarity between the task query vector and the semantic center representations at each level of the hierarchical semantic index, selecting, layer by layer starting from the top layer of the hierarchical semantic index, the clusters whose similarity is higher than the threshold of that level, traversing downwards along the hierarchical semantic index until the bottom-level clusters are reached to obtain the cluster subset, extracting the node embedding vectors contained in the cluster subset, and marking them as task node vectors; and text-encoding the document paragraph task to obtain the semantic representation of the document paragraph task, and performing a tensor product operation on the task node vectors and the semantic representation through a preset bilinear transformation matrix to obtain the interaction feature matrix.
5. The method of claim 1, wherein performing singular value decomposition on the interaction feature matrix to extract a principal component vector and performing a residual connection with the task node vectors to obtain a task-specific embedded representation comprises: expanding the interaction feature matrix along its rows and columns to obtain a three-dimensional tensor structure, performing low-rank decomposition on each slice of the three-dimensional tensor structure to obtain a plurality of low-rank approximation matrices, calculating the reconstruction error between each low-rank approximation matrix and the interaction feature matrix, selecting the low-rank approximation matrix with the smallest reconstruction error as an optimized interaction matrix, and performing singular value decomposition on it to obtain a left singular vector matrix, a diagonal matrix of singular values and a right singular vector matrix; calculating the sum of the squares of the singular values in the diagonal matrix as the total energy, accumulating the squared singular values one by one to obtain a cumulative energy sequence, selecting as the cut-off position the first position in the cumulative energy sequence at which the ratio of cumulative energy to total energy exceeds a preset retention threshold, extracting the left singular vectors and right singular vectors corresponding to the singular values before the cut-off position, and concatenating them to obtain the principal component vector; and constructing a reference vector set from the same cluster subset from which the task node vectors originate, calculating the similarity between the principal component vector and each vector in the reference vector set and normalizing the similarities to obtain an attention weight distribution, performing weighted aggregation of the reference vector set according to the attention weight distribution to obtain an adaptive reference vector, and performing a residual connection between the principal component vector and the adaptive reference vector to obtain the task-specific embedded representation.
6. The method of claim 1, wherein adding the task-specific embedded representation and the preset global constraint vector to the preset generation execution path, generating document paragraph content and extracting key semantic units, performing syntactic dependency analysis on the key semantic units to extract syntactic structural features, and calculating the semantic relatedness between different key semantic units comprises: performing gated fusion of the task-specific embedded representation and the preset global constraint vector to obtain a conditional guidance vector, decomposing the conditional guidance vector into a semantic control component and a structural control component, feeding the semantic control component into a semantic decoding layer of the generation execution path to generate semantic candidate sequences, feeding the structural control component into a structural decoding layer of the generation execution path to generate structural candidate sequences, cross-validating the semantic candidate sequences against the structural candidate sequences to obtain consistency scores, selecting the combination of semantic candidate sequence and structural candidate sequence with the highest consistency score to obtain the document paragraph content, performing part-of-speech tagging and named entity recognition on the document paragraph content, and extracting, as key semantic units, phrases that satisfy both the part-of-speech requirements and the entity type requirements corresponding to the document generation requirement; and performing syntactic dependency analysis on the key semantic units to obtain a dependency tree, traversing the dependency tree to extract the dependency triples corresponding to the key semantic units, vector-encoding the dependency triples to obtain local syntactic features, performing graph convolution aggregation on the local syntactic features of all key semantic units in the document paragraph content to obtain global syntactic features, fusing the local syntactic features and the global syntactic features to obtain the syntactic structural features, calculating the vector distances between the syntactic structural features of different key semantic units, and outputting the vector distances as the semantic relatedness.
7. The method of claim 1, wherein constructing a directed semantic dependency graph based on the semantic relatedness and performing topological sorting, determining semantic dependency relations between different key semantic units, and performing serialization fusion on the document paragraph content to obtain a complete medical instrument document comprises: taking the key semantic units as nodes and establishing connecting edges between the nodes according to the semantic relatedness to construct an initial semantic dependency graph, encoding the initial semantic dependency graph with a graph neural network to obtain a node representation matrix, calculating the causal influence strength between any two node representations in the node representation matrix, assigning a direction to each connecting edge based on the causal influence strength to obtain directed edges and the directed semantic dependency graph, calculating the information flow contribution of each directed edge, determining low-contribution edges in combination with a preset contribution threshold, removing the low-contribution edges from the directed semantic dependency graph, performing cycle detection, and eliminating any detected cycles to obtain a simplified directed semantic dependency graph; performing topological sorting on the simplified directed semantic dependency graph to obtain a linear ordering, and marking the semantic dependency relation between two key semantic units directly connected by a directed edge in the simplified directed semantic dependency graph as a direct semantic dependency relation; and constructing a hierarchical structure tree according to the linear ordering, assigning document paragraph content whose key semantic units have direct semantic dependency relations to the same parent node in the hierarchical structure tree, generating parent node summary representations, injecting the parent node summary representations into the document paragraph content of the child nodes for semantic enhancement, extracting the semantically enhanced document paragraph content in the depth-first traversal order of the hierarchical structure tree, and performing serialization splicing to obtain the complete medical instrument document.
8. A multi-agent-based medical instrument multi-file long text generation system for implementing the method of any one of claims 1 to 7, comprising: a first unit for collecting medical instrument source information, extracting entities and association relations to construct a knowledge graph, performing multi-level heterogeneous graph encoding on the knowledge graph, calculating a neighborhood-weighted representation of each node, determining a node embedding vector set in combination with contrastive learning, performing cluster analysis on the node embedding vector set to obtain a clustering result, and constructing a hierarchical semantic index; a second unit for receiving a document generation requirement, decomposing the document content to be generated into a plurality of document paragraph tasks, retrieving a cluster subset from the hierarchical semantic index based on the document paragraph tasks, extracting task node vectors corresponding to the cluster subset, performing a bilinear transformation on the task node vectors and semantic representations of the document paragraph tasks to obtain an interaction feature matrix, performing singular value decomposition on the interaction feature matrix to extract a principal component vector, and performing a residual connection with the task node vectors to obtain task-specific embedded representations; and a third unit for adding the task-specific embedded representations and a preset global constraint vector to a preset generation execution path, generating document paragraph content and extracting key semantic units, performing syntactic dependency analysis on the key semantic units, extracting syntactic structural features, calculating the semantic relatedness between different key semantic units, constructing a directed semantic dependency graph based on the semantic relatedness, performing topological sorting, determining semantic dependency relations between different key semantic units, and performing serialization fusion on the document paragraph content to obtain the complete medical instrument document.
9. An electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer program instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1 to 7.
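
For illustration only, the layer-by-layer retrieval recited in claim 4 can be sketched as follows. This is a minimal sketch, not the patented implementation: the `Cluster` structure, the use of cosine similarity, and the function names are assumptions introduced here, since the claims name neither the similarity measure nor any data layout.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Cluster:
    """Hypothetical index entry: a semantic center plus children or leaf node ids."""
    center: np.ndarray
    children: list = field(default_factory=list)
    node_ids: list = field(default_factory=list)  # only filled on bottom-level clusters


def cosine(a, b):
    # Assumed similarity measure; both vectors are assumed to be non-zero.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def retrieve_cluster_subset(top_level, query, level_thresholds):
    """Walk the index top-down, keeping clusters whose similarity to the
    task query vector is higher than the threshold of the current level."""
    frontier, level, bottom_hits = list(top_level), 0, []
    while frontier:
        # Reuse the last threshold if the index is deeper than the threshold list.
        threshold = level_thresholds[min(level, len(level_thresholds) - 1)]
        kept = [c for c in frontier if cosine(query, c.center) > threshold]
        bottom_hits += [c for c in kept if not c.children]
        frontier = [child for c in kept for child in c.children]
        level += 1
    return bottom_hits
```

The node ids of the returned bottom-level clusters would then be used to look up the node embedding vectors that the claim calls task node vectors.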
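
As a further illustration, the energy-retention rule of claim 5 (keep singular directions until the cumulative squared singular values first exceed a preset share of the total) might look like the sketch below. It assumes the optimized interaction matrix is already available as a NumPy array, and it reads "before the cut-off position" as including the cut-off position itself; the function name and the default threshold are hypothetical.

```python
import numpy as np


def principal_component_vector(interaction, retention=0.9):
    """Return (vector, k): the concatenated leading singular vectors and
    the number k of singular values that were retained."""
    u, s, vt = np.linalg.svd(interaction, full_matrices=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0:
        raise ValueError("interaction matrix has no energy (all zeros)")
    ratio = np.cumsum(energy) / total
    # Index of the first position whose cumulative ratio strictly exceeds the threshold.
    cut = int(np.searchsorted(ratio, retention, side="right"))
    k = min(cut + 1, len(s))  # assumption: the cut-off position itself is kept
    # Concatenate the k retained left singular vectors and k right singular vectors.
    vector = np.concatenate([u[:, :k].ravel(order="F"), vt[:k, :].ravel()])
    return vector, k
```

For a 3 x 3 diagonal matrix with singular values 3, 2 and 1, the cumulative energy ratios are 9/14, 13/14 and 1, so a retention threshold of 0.9 keeps two singular directions.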
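
The graph simplification and ordering steps of claim 7 can likewise be sketched under stated assumptions. The claims do not say how a detected cycle is eliminated; the sketch below assumes that the lowest-contribution edge on the cycle is dropped, and it takes the per-edge contribution values as given inputs rather than computing them. All function names are hypothetical; `graphlib` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter


def find_cycle(adjacency):
    """Return the edges of one directed cycle, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in adjacency}
    path = []

    def visit(u):
        color[u] = GRAY
        path.append(u)
        for v in adjacency[u]:
            if color[v] == GRAY:  # back edge closes a cycle
                nodes = path[path.index(v):] + [v]
                return list(zip(nodes, nodes[1:]))
            if color[v] == WHITE:
                found = visit(v)
                if found:
                    return found
        path.pop()
        color[u] = BLACK
        return None

    for node in adjacency:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def simplify_and_sort(nodes, contributions, min_contribution):
    """contributions maps a directed edge (u, v) to its contribution value."""
    # Remove low-contribution edges first.
    kept = {e: w for e, w in contributions.items() if w >= min_contribution}
    while True:
        adjacency = {n: [] for n in nodes}
        for u, v in kept:
            adjacency[u].append(v)
        cycle = find_cycle(adjacency)
        if cycle is None:
            break
        # Assumed policy: break the cycle at its weakest edge.
        del kept[min(cycle, key=kept.get)]
    predecessors = {n: set() for n in nodes}
    for u, v in kept:
        predecessors[v].add(u)
    order = list(TopologicalSorter(predecessors).static_order())
    return order, set(kept)
```

The edges that survive in the second return value correspond to what the claim marks as direct semantic dependency relations.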

Description

Medical instrument multi-file long text generation method based on multiple agents

Technical Field

The invention relates to the technical field of medical instruments, and in particular to a multi-agent-based medical instrument multi-file long text generation method.

Background

With the rapid development of the medical instrument industry, the demand for generating medical instrument documents keeps growing. Such documents include instructions for use, technical white papers, clinical evaluation reports and other types. They must follow strict regulatory requirements and contain a large number of professional terms and technical details. Traditional medical instrument document generation relies mainly on manual writing, which is time-consuming and labor-intensive and makes it difficult to ensure the consistency and completeness of the content.

With the development of artificial intelligence, and in particular the application of large language models, automatic text generation has made significant progress. However, existing medical instrument document generation techniques still have the following problems: the complex knowledge structure of the medical instrument field is difficult to handle effectively and the complex association relations between different entities are difficult to capture accurately, so the generated content is insufficiently professional and insufficiently interrelated; global consistency is difficult to maintain, and logical breaks, repeated information or contradictions frequently occur between parts of the content, which affects the overall quality and usability of the document; and the generated content cannot be guaranteed to meet compliance standards, which increases the workload of later manual review and revision.

Disclosure of Invention

The embodiments of the invention provide a multi-agent-based medical instrument multi-file long text generation method that can solve at least some of the problems in the prior art.

In a first aspect of the embodiments of the invention, a multi-agent-based medical instrument multi-file long text generation method is provided, comprising:

collecting medical instrument source information, extracting entities and association relations to construct a knowledge graph, performing multi-level heterogeneous graph encoding on the knowledge graph, calculating a neighborhood-weighted representation of each node, determining a node embedding vector set in combination with contrastive learning, performing cluster analysis on the node embedding vector set to obtain a clustering result, and constructing a hierarchical semantic index;

receiving a document generation requirement, decomposing the document content to be generated into a plurality of document paragraph tasks, retrieving a cluster subset from the hierarchical semantic index based on the document paragraph tasks, extracting task node vectors corresponding to the cluster subset, performing a bilinear transformation on the task node vectors and semantic representations of the document paragraph tasks to obtain an interaction feature matrix, performing singular value decomposition on the interaction feature matrix to extract a principal component vector, and performing a residual connection with the task node vectors to obtain task-specific embedded representations;

adding the task-specific embedded representations and a preset global constraint vector to a preset generation execution path, generating document paragraph content, extracting key semantic units, performing syntactic dependency analysis on the key semantic units, extracting syntactic structural features, calculating the semantic relatedness between different key semantic units, constructing a directed semantic dependency graph based on the semantic relatedness, performing topological sorting, determining semantic dependency relations between different key semantic units, and performing serialization fusion on the document paragraph content to obtain a complete medical instrument document.

In an alternative embodiment of the invention, collecting medical instrument source information, extracting entities and association relations to construct a knowledge graph, performing multi-level heterogeneous graph encoding on the knowledge graph, calculating a neighborhood-weighted representation of each node, and determining a node embedding vector set in combination with contrastive learning comprises:

collecting medical instrument source information from medical instrument technical documents, performing word segmentation on the medical instrument source information to obtain a word sequence, performing named entity recognition on the word sequence to extract an entity set, performing relation extraction on the entities in the entity set to obtain association relations between the entities, and