CN-122019786-A - DeepSeek-based dynamic completion method of aquatic disease control knowledge graph, electronic equipment and storage medium
Abstract
The invention provides a DeepSeek-based dynamic complement method for an aquatic disease control knowledge graph, electronic equipment and a storage medium, and belongs to the field of aquatic disease control. The method is characterized in that a physical classification labeling strategy based on a dangerous object is designed aiming at the phenomenon of 'same illness and abnormal', a DeepSeek model oriented to vertical field prompt template optimization is introduced, a global-local double-order collaborative enhancement retrieval mechanism is combined, the problem of long text context dependence failure is effectively relieved, the precision and consistency of semantic retrieval are improved, and secondly, in order to enhance the understanding of deep semantic relations among triples with the phenomenon of 'same illness and abnormal', a micro logic rule enhancement module is adopted to integrate expert knowledge into neural representation learning in a symbolic logic mode, so that the completion accuracy is improved, and the interpretation of the model is enhanced. The invention realizes the deep complement and reasoning optimization of the knowledge graph and can provide targeted control schemes for different aquatic animals.
Inventors
- YU HONG
- HUANG WEI
- LI QIUSHI
- HAN CHI
- LIN GENGYU
- YE SHIGEN
Assignees
- 大连海洋大学
Dates
- Publication Date
- 20260512
- Application Date
- 20260109
Claims (10)
- 1. The dynamic completion method of the aquatic disease control knowledge graph based on DeepSeek is characterized by comprising the following steps of: S1, constructing an aquatic disease control data set comprising basic structural data and academic paper incremental data; S2, adopting an entity classification labeling mechanism based on a hazard object, introducing an attribute deconstructing strategy, combining a predefined Schema constraint and DeepSeek models, constructing an extraction model, extracting triples from basic structural data by using the extraction model, and constructing a basic knowledge graph; s3, invoking DeepSeek model optimized by a vertical field prompt template by adopting a global-local double-order collaborative retrieval architecture to perform semantic enhancement retrieval on the academic paper incremental data to obtain a high-quality text segment, inputting the high-quality text segment into an extraction model in S2 to perform joint extraction of entities and relations, and finally forming a new paper knowledge graph containing a hazard object label; S4, fusing the basic knowledge graph and the new paper knowledge graph into a high-quality knowledge graph with uniform structure and consistent semantics by adopting a time sequence-assisted entity disambiguation strategy; And S5, inputting the high-quality knowledge graph into a micro logic rule enhancement module based on nerve-symbol fusion for further reasoning to obtain an optimized knowledge graph enhanced by depth complementation and logic consistency.
- 2. The DeepSeek-based dynamic completion method of an aquatic disease control knowledge graph according to claim 1, wherein the construction process of the basic structural data is as follows: defining an initial data template which takes a disease name as a core entity and contains 8 key attributes, wherein the key attributes comprise disease type, pathogen/etiology, hazard objects, main symptoms, epidemic situation, treatment methods, control measures and prevention methods; based on the initial data template, aquatic disease data from different sources are searched by taking aquatic disease names as keywords, and cleaning, de-duplication and standardization processing are carried out to obtain basic structural data.
- 3. The dynamic completion method of the aquatic disease control knowledge graph based on DeepSeek as claimed in claim 1, wherein the construction process of the academic paper incremental data is as follows: Searching in an academic database by taking aquatic disease names as keywords, and collecting related academic papers; preprocessing the academic paper content in PDF format, including automatically identifying and deleting irrelevant data including authors, units and references by combining prompt word engineering with DeepSeek large language models, and reserving text parts containing core academic discussions to obtain academic paper incremental data.
- 4. The dynamic completion method of the aquatic disease control knowledge graph based on DeepSeek as claimed in claim 1, wherein in the step S2, the process of constructing the basic knowledge graph includes: combining the Schema constraint, the entity classification labeling mechanism based on the hazard object, and the attribute deconstructing strategy with the DeepSeek model to obtain an extraction model; And extracting the basic structural data based on the extraction model to obtain triples, and forming a basic knowledge graph.
- 5. The dynamic completion method of the aquatic disease control knowledge graph based on DeepSeek as claimed in claim 4, wherein the Schema constraint is defined according to an initial data template, including entity type, attribute type, triplet composition form and extraction processing rule, The triples are expressed in the form of (head entity, relation and tail entity), wherein the head entity is further refined into a structure of (head entity name and head entity label) based on an entity classification labeling mechanism of the hazard object so as to enhance semantic hierarchy and classification accuracy of entity description; The entity types are disease name, disease type, pathogen/etiology, harm subject, main symptom, epidemic situation, treatment method, control measure and prevention method; The attribute types are disease type, pathogen/etiology, harm subject, main symptom, epidemic situation, treatment method, control measure and prevention method; The extraction processing rule is that when the text length of the tail entity is recognized to exceed a preset threshold value, the boundary of the tail entity is determined and cut off according to context semantics, and when the same attribute exists between one head entity and a plurality of tail entities, the head entity and the tail entities are recognized to be divided into a plurality of independent triples.
- 6. The DeepSeek-based dynamic completion method of aquatic disease control knowledge graph according to claim 4 or 5, wherein the attribute deconstructing strategy includes: Deconstructing the 'popular condition' attribute, and subdividing the 'popular condition' attribute into three independent sub-attributes of popular time, popular place and popular temperature so as to separate space-time and environmental factor information; Deconstructing the "principal symptom" attribute and breaking it down into two sub-attributes of body surface symptoms and in vivo symptoms to distinguish the appearance characteristics of the disease at different physiological sites.
- 7. The dynamic completion method of aquatic disease control knowledge graph based on DeepSeek as set forth in claim 6, wherein in step S3, the process of constructing the new paper knowledge graph includes: Designing a template of the template in the field of aquatic disease control, wherein the template defines the extracted entity type, extraction step and output format, provides a plurality of real aquatic disease cases, enhances the scene adaptability of the model, and enhances the DeepSeek model by using the template of the template; firstly, executing global search by utilizing DeepSeek model optimized by a template of campt, quickly positioning macroscopic documents or chapters related to a target disease from incremental data of academic papers, then starting local search based on global search results, and focusing on sentences or text fragments with fine granularity inside the related documents or chapters to obtain high-quality text fragments; And (3) extracting the entity and the relation of the high-quality text fragment based on the extraction model constructed in the step (S2) to obtain triples consistent with the format of the step (S2), wherein the triples form a new paper knowledge graph.
- 8. The dynamic complementation method of the aquatic disease control knowledge graph based on DeepSeek of claim 7, wherein the specific implementation manner of obtaining the high-quality text segment by the global-local double-order collaborative search architecture is as follows: the academic paper incremental data and the query are sent into DeepSeek models optimized by a template in the vertical field to obtain global retrieval results; Based on global search result, starting local search, sending global search result and inquiry into cross attention mechanism, converting global search result into value vector and key vector of attention module, converting inquiry into inquiry vector of attention module, calculating transposed product of inquiry vector and key vector, normalizing by Softmax to obtain attention weight, weighting with value vector to obtain association feature Z so as to capture semantic association of "question" and "global answer" and find out most relevant part of global answer to question, processing said association feature Z by means of full connection layer, activating by Sigmoid to obtain gating coefficient And then passing through a gating formula: Dynamic fusion problem vector With context vectors Enhancing the correlation of semantic representation, and finally obtaining optimized local query fused with global information; And (3) sending the local query fused with the global information to a DeepSeek model optimized by a template in the vertical field again to obtain a final retrieval result, namely a high-quality text fragment.
- 9. The dynamic completion method of the aquatic disease control knowledge graph based on DeepSeek as claimed in claim 7, wherein in the step S4, the method for obtaining the high-quality knowledge graph is as follows: giving timestamp marks to all triples to enhance disambiguation characteristics, namely marking a basic knowledge graph as reference time, and marking the release year of a new paper knowledge graph; and carrying out entity alignment operation on the head entities in the basic knowledge graph and the new paper knowledge graph so as to accurately identify and correlate knowledge variants in different sources, and fusing the basic knowledge graph and the new paper knowledge graph to obtain a high-quality knowledge graph.
- 10. The dynamic completion method of aquatic disease control knowledge graph based on DeepSeek as set forth in claim 7, wherein in step S5, the processing procedure of the micro logic rule enhancing module based on neural-symbol fusion includes: The high-quality knowledge graph is input into a neural knowledge graph to be embedded into vector representation of a model learning entity and a relation, potential semantic association in data is captured, and a score fused with neural model prediction is obtained ; Meanwhile, the expert knowledge of the field of' same disease is formed into a soft constraint form of a first-order logic rule which can be made by observing the data condition and discussing with aquatic product field expert for many times; Embedding the first-order logic rules into a microclculable graph through a logic tensor layer, and calculating rule reasoning scores of each rule on candidate triples ; Will be And (3) with Performing self-adaptive weighted fusion to obtain final establishment probability of the triples : Wherein, the Is a learnable balance parameter; The range of the values is as follows ; Based on And screening out new triples with high confidence, finally combining the new relationships with the original high-quality knowledge graph, and outputting an optimized knowledge graph which contains new inferred triples after completion and has consistent semantics and clear time sequence.
Description
DeepSeek-based dynamic completion method of aquatic disease control knowledge graph, electronic equipment and storage medium Technical Field The invention relates to the technical field of aquatic disease control, in particular to a DeepSeek-based aquatic disease control knowledge graph dynamic complementation method, electronic equipment and a storage medium. Background Disease control is a key link for guaranteeing healthy development of aquaculture, and effectively controls disease transmission, reduces industrial loss and improves quality safety level of aquatic products through scientific diagnosis, accurate medication and timely early warning. The knowledge graph is used as an effective tool for organizing and utilizing mass knowledge, a structured and systematic cognitive basis is provided for disease control, intelligent diagnosis of diseases and accurate medication and risk early warning are supported, and therefore prevention and control efficiency is improved. In the construction and optimization process of the knowledge graph, the completion of the knowledge graph has a key effect, and the goal is to predict and complement the missing triples so as to enhance the integrity, consistency and reasoning capacity of the knowledge base, and the knowledge graph has become a core support for promoting the development of artificial intelligent application such as information retrieval, intelligent question-answering and the like. In recent years, knowledge graph completion technology is continuously developed, and huge potential is also presented for completing the task of knowledge graph completion by using a large language model. However, although knowledge graph completion has achieved significant success in applications in general fields (e.g., search engines, recommendation systems), it still faces a series of serious challenges in vertical fields such as aqua disease control. The complexity and the dynamics of the phenomenon of 'same disease and abnormal policy' in the aquatic disease data material have a remarkable influence on the knowledge graph complement effect. For example, gill rot disease has fundamental differences in symptoms and corresponding control strategies exhibited by different hosts (e.g., grass carp versus prawn) and different periods (e.g., different years of flow). In the space dimension, typical symptoms, epidemic rules and prevention measures of the same disease can be obviously different due to different hazard objects, if the disease is not distinguished, mismatching of prevention and treatment advice is easy to cause, in the time dimension, along with scientific research progress and pathogen evolution, cognition and prevention strategies of the same disease are continuously updated, and if knowledge maps are not reflected in time, outdated or wrong advice is caused. The traditional knowledge representation and completion method is difficult to effectively describe the differences of fine granularity and time sequence, so that knowledge confusion and reasoning deviation become important bottlenecks for constructing and completing the current aquatic disease knowledge graph. In addition, the existing knowledge graph completion technology is difficult to effectively process the long-length texts and the unstructured data with dense technical terms in the fishery academic paper, and the traditional knowledge graph completion method has the defects in an open domain scene. The method comprises the specific problems that a model based on a closed world hypothesis cannot identify and link new entities, a fine granularity semantic disambiguation mechanism for the problems of 'co-morbid and abnormal policy' and the like is lacking, and the existing open domain knowledge graph completion method is such as a model based on fixed window mask, text simple aggregation, interactive attention or multi-hop neighborhood fusion, and when the method is applied to paper-level long texts, the defects of incomplete semantic information capture, high computational complexity, text noise sensitivity, inaccurate relation fact extraction and insufficient timeliness are commonly existed, so that the accuracy, reliability and field adaptability of knowledge extraction are limited. In summary, the existing open domain knowledge graph completion model is generally applied to a short text scene, and the influence of the "same disease and abnormal situation" on the knowledge graph completion effect in the aquatic disease control field is not fully considered yet. Therefore, a set of special technical architecture is needed for constructing and complementing knowledge maps in the field of controlling the aquatic diseases, and the framework needs to have the key capabilities of realizing accurate analysis of semantics, particularly solving the problem of semantic ambiguity specific to the field, realizing deep injection of knowledge in the field of controlling the aquatic diseases, guiding capturing of key