CN-121981238-A - Knowledge graph construction method for the field of integrated sensing, communication, computation and control
Abstract
The application relates to a knowledge graph construction method for the field of integrated sensing, communication, computation and control. The method first obtains PDF and Word data sources from paper, standards and industry libraries and generates a preprocessed data set through parsing and cleaning. It then constructs, top-down, a domain ontology layer rooted at integrated sensing, communication, computation and control, and builds a technical term library and a synonym dictionary alongside it. Based on the ontology and term system, high-confidence entities are screened using entity boundary probability and term weight, and high-reliability entity pairs are obtained through position/type filtering and relationship strength quantification. The high-reliability entity pairs and the preprocessed data are input into a large model, which extracts initial triples through a prompt template; multi-source entities are then merged based on a comprehensive similarity to obtain standardized triples. Finally, the standardized triples are imported into a graph database, mapped to nodes and edges, and displayed visually, solving the problem of semantic unification and precise association of cross-domain knowledge.
Inventors
- YU XIAOYOU
- ZHONG RUIJIE
Assignees
- 湖南大学 (Hunan University)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-04-08
Claims (10)
- 1. A knowledge graph construction method for the field of integrated sensing, communication, computation and control, characterized in that the method comprises: acquiring raw data sources in PDF and Word formats from paper databases, standards document libraries and industry databases related to integrated sensing, communication, computation and control, and processing the raw data sources to obtain a preprocessed data set; constructing a domain ontology layer in a top-down manner, dividing the second level by fusion mode and, for each fusion mode, adding application instances from specific domains as the third level, and constructing, according to wireless communication industry specifications, a term library and a synonym dictionary containing the domain's technical terms, their abbreviations and their Chinese-English equivalents, to obtain a domain ontology and term system; taking the domain ontology and term system as input, calculating an entity boundary probability based on the ontology term set and a context representation, combining it with term weights to obtain an entity confidence, and screening for high-confidence entities to obtain a high-confidence entity set for the field; generating candidate entity pairs from the high-confidence entity set through position-related and type-related filter functions, and quantifying relationship strength through co-occurrence frequency and type association degree to obtain high-reliability candidate entity pairs and a corresponding relationship set; inputting the high-reliability candidate entity pairs, the corresponding relationship set and the JSON files of the preprocessed data set into a large model, and executing a knowledge extraction task through a preset prompt template to obtain an initial triple set; merging multi-source entities in the initial triple set based on a comprehensive similarity, eliminating redundant entities and unifying the concept system to obtain a standardized triple set; and importing the standardized triple set into a graph database, mapping entities to nodes and relationships to directed edges, extracting subgraphs with the query language of the graph database, and rendering them with a front-end rendering engine to display the knowledge graph of the field visually.
- 2. The method of claim 1, wherein the raw data sources comprise technical standards, academic papers and industry reports on wireless communication and on the fusion of sensing, computation and control, and wherein processing the raw data sources to obtain the preprocessed data set comprises: parsing PDF files into HTML documents with an OCR tool, then cleaning them and extracting the text; for Word files, directly parsing the body text, headings, tables and formulas; retaining the key technical symbols of the field together with their units; generating JSON files organized by paragraph structure from all parsed content; and retaining the corresponding document metadata, to obtain the preprocessed data set for the field of integrated sensing, communication, computation and control.
- 3. The method of claim 1, wherein the second level is divided by fusion mode into: sensing-communication integration, sensing-computation integration, sensing-control integration, communication-control integration, sensing-communication-computation integration, sensing-communication-control integration, sensing-computation-control integration, and communication-computation-control integration.
- 4. The method of claim 1, wherein the entity boundary probability is calculated by a formula (not reproduced in this text) that applies a sigmoid function and combines the following quantities: a hyperparameter with a value between 0 and 1 that balances the term-set weight against the context information; the boundary probability derived from the entity's context; the term weight; and the term itself.
- 5. The method of claim 4, wherein the term weight is given by a formula (not reproduced in this text) defined in terms of the total number of documents in the data set and the frequency with which the term occurs in the document corpus.
- 6. The method of claim 1, wherein the relationship strength comprises an entity-pair co-occurrence frequency weight and an entity-pair type association degree weight; the co-occurrence frequency weight is given by a formula (not reproduced in this text) defined in terms of the frequency with which the two entities of a pair co-occur in the corpus and the frequency with which each entity occurs on its own; and the type association degree weight is given by a formula (not reproduced in this text) defined in terms of the types of the two entities, the set of instances in which the two types are associated through the corresponding relationship, the set of instances in which the two types are associated through any relationship, and a smoothing factor.
- 7. The method of claim 1, wherein the position-related and type-related filter functions comprise a position-related filter function and a type-related filter function, and generating candidate entity pairs through them comprises: defining the position-related filter function (formula not reproduced in this text), which is based on the distance between the positions of the two entities in the text and on the index of the sentence containing each entity, and which keeps only entity pairs whose distance is below a threshold and whose entities lie in the same sentence; defining the type-related filter function (formula not reproduced in this text), which calculates the type association degree of an entity pair from a type association matrix and keeps the pair if the association degree is greater than a threshold; and obtaining the candidate entity pair set from the two filter functions (formula not reproduced in this text), expressed in terms of the set of entity pairs kept by the position filter, the set of entity pairs kept by the type filter, and the original candidate entity pair set.
- 8. The method of claim 1, wherein the prompt template comprises a role definition layer, a task specification layer and an output constraint layer; the role definition layer defines the large language model as an expert in the field of integrated sensing, communication, computation and control; the task specification layer specifies the processing logic for the input data and a list of entity type constraints; and the output constraint layer forces the model to output a structured triple format and sets an error circuit-breaker mechanism.
- 9. The method of claim 1, wherein the comprehensive similarity is calculated by a formula (not reproduced in this text) that combines the string similarity, the semantic similarity and the context similarity of two different entities, with p, q and v as their respective weight coefficients, subject to a constraint on p, q and v that is likewise not reproduced.
- 10. The method according to claim 9, wherein: the string similarity is calculated from the character vectors and character sets of the entities using a weighted combination of a cosine similarity algorithm and a Jaccard similarity algorithm; the semantic similarity is calculated as the cosine similarity between entity semantic vectors generated by a pre-trained model; and the context similarity is calculated as the cosine similarity between the averaged word vectors of the entities' context windows in the document.
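The similarity computation described in claims 9 and 10 can be illustrated with a minimal Python sketch. It is not the patented formula, which is not reproduced in this text. The sketch assumes that the comprehensive similarity is a weighted sum of the three component similarities and that the weights satisfy p + q + v = 1. The entity field names (`name`, `sem_vec`, `ctx_vec`), the mixing weight `beta`, the default weight values and the merge threshold are all hypothetical, and the semantic and context vectors are taken as precomputed inputs.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def string_similarity(a, b, beta=0.5):
    """Weighted mix of character-count cosine and character-set Jaccard.

    `beta` is a hypothetical mixing weight; the source does not give one.
    """
    count_a, count_b = Counter(a), Counter(b)
    chars = sorted(set(count_a) | set(count_b))
    cos = cosine([count_a[c] for c in chars], [count_b[c] for c in chars])
    union = set(a) | set(b)
    jaccard = len(set(a) & set(b)) / len(union) if union else 0.0
    return beta * cos + (1 - beta) * jaccard


def comprehensive_similarity(e1, e2, p=0.4, q=0.4, v=0.2):
    """Assumed form: p * string + q * semantic + v * context, p + q + v = 1."""
    assert abs(p + q + v - 1.0) < 1e-9, "weights are assumed to sum to 1"
    s_str = string_similarity(e1["name"], e2["name"])
    s_sem = cosine(e1["sem_vec"], e2["sem_vec"])  # vectors from a pre-trained model
    s_ctx = cosine(e1["ctx_vec"], e2["ctx_vec"])  # averaged context-window word vectors
    return p * s_str + q * s_sem + v * s_ctx


def should_merge(e1, e2, threshold=0.8):
    """Hypothetical merge rule: treat two entities as one above a threshold."""
    return comprehensive_similarity(e1, e2) >= threshold
```

In this sketch, two entity records with the same name and identical vectors score close to 1 and would be merged, while records with disjoint names and orthogonal vectors score 0. In practice the weights and threshold would need to be tuned on domain data.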
Description
Knowledge graph construction method for the field of integrated sensing, communication, computation and control
Technical Field
The application relates to the technical field of data processing, and in particular to a knowledge graph construction method for the field of integrated sensing, communication, computation and control.
Background
In a conventional von Neumann computing architecture, the computing unit operates on instructions fetched from the memory unit, so data is transferred frequently between the processor and the memory, and this data movement consumes a great deal of energy. With the rapid development of information and communication technology, artificial intelligence and network systems, the deep fusion of sensing, communication, computation and control can break down the data boundary between the storage unit and the processing unit of the von Neumann architecture and deliver higher computing performance and energy efficiency. In recent years, as the concept of integrated sensing, communication, computation and control (Integration of Sensing, Communication, Computation and Control) has been proposed and developed, all sectors have generally come to recognize that sensing, communication, computation and control are coupled and complementary. Communication provides reliable connections and data transmission links for sensing and computation. Sensing provides context support for communication scheduling and computation decisions through environmental information acquisition, behavior prediction and semantic understanding. Computation provides the computing-power basis for communication protocol adaptation and improved sensing accuracy through intelligent reasoning, resource optimization and task scheduling. Control, building on these three, achieves closed-loop adjustment and autonomous optimization of system behavior.
Through collaborative design, cross-layer linkage and joint optimization of sensing, communication, computation and control, resources can be shared globally and information can flow efficiently, making it possible to build a next-generation network system architecture with high reliability, low latency and high adaptability. At present, research on integrated sensing, communication, computation and control is in a stage of rapid development and system building, and its application scenarios span many key industries. The field is essentially a typical multidisciplinary, cross-cutting technology architecture. Different disciplines differ markedly in their theoretical systems, terminology structures and ways of organizing data, so knowledge about integrated sensing, communication, computation and control is multi-source and heterogeneous, spans a wide range of domains, and is expressed in semantically diverse ways. How to effectively extract, organize and relate this cross-domain knowledge from massive volumes of literature, standards documents and industry databases has become a fundamental problem underpinning research on integrated sensing, communication, computation and control.
Disclosure of Invention
In view of the above, there is a need for a knowledge graph construction method for the field of integrated sensing, communication, computation and control, oriented to wireless communication systems, which can provide a new knowledge representation framework for theoretical research on next-generation intelligent networks and lay an important foundation for building autonomous intelligent systems.
A knowledge graph construction method for the field of integrated sensing, communication, computation and control comprises the following steps: acquiring raw data sources in PDF and Word formats from paper databases, standards document libraries and industry databases related to integrated sensing, communication, computation and control, and processing the raw data sources to obtain a preprocessed data set; constructing a domain ontology layer in a top-down manner, dividing the second level by fusion mode and, for each fusion mode, adding application instances from specific domains as the third level, and constructing, according to wireless communication industry specifications, a term library and a synonym dictionary containing the domain's technical terms, their abbreviations and their Chinese-English equivalents, to obtain a domain ontology and term system; taking the domain ontology and term system as input, calculating an entity boundary probability based on the ontology term set and a context representation, combining it with term weights to obtain an entity confidence, and screening for high-confidence entities to obtain a high-confidence entity set for the field; based on the high-confidence entity set, generating candidate entity pairs through position-related and type-related filter functions, quantifying the relationship strength through co-occurrence frequency and type association degree, and filterin