CN-121981237-A - Knowledge question-answering dynamic map construction method, knowledge question-answering method, storage medium, program product and electronic device
Abstract
The application provides a knowledge question-answering dynamic map construction method, a knowledge question-answering method, a storage medium, a program product and an electronic device. The construction method comprises: collecting multimodal data that includes at least time-series image data; encoding the data to capture text semantics and hierarchical position; performing semantic alignment through an adaptive gated fusion network based on a multilayer perceptron to generate a unified text vector; splitting the unified text vector into entity, relation and global features and cross-fusing them with visual features projected from the images through an improved multimodal partition fusion network to obtain multimodal deep features; and finally enhancing the deep features, inputting them into a decoder to generate a timestamped tuple set, and merging the tuple set into a graph database to construct the knowledge question-answering dynamic map. Through multi-source data fusion and PFKAN-based knowledge graph construction, the application improves the robustness of semantic retrieval and reasoning and enables dynamic adaptation and efficient decision-making in multi-role scenarios.
Inventors
- JI YOU
- GAO QIANG
- Li Duantengchuan
- JIA YUHAO
Assignees
- 上海路明星光智能科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260408
Claims (10)
- 1. A knowledge question-answering dynamic map construction method, characterized by comprising the following steps: collecting multimodal data, wherein the multimodal data comprises at least image data with time-series attributes; and performing at least encoding mapping, semantic alignment, feature fusion and enhancement decoding on the multimodal data to construct a knowledge question-answering dynamic map, wherein: during encoding mapping, high-dimensional content vectors are obtained from the multimodal data, the high-dimensional content vectors capturing text semantic features, and the text content is position-mapped; during semantic alignment, a unified text vector is generated by an adaptive gated fusion network based on a multilayer perceptron; during feature fusion, custom combined features are cross-fused by an improved multimodal partition fusion network to obtain multimodal deep features, wherein the combined features comprise at least entity features, relation features and global shared features obtained by splitting the unified text vector, and visual features obtained by projecting the image data; and during enhancement decoding, the multimodal deep features are enhanced, the enhanced features are input into a preset decoder, and the generated tuple set is merged into a graph database to obtain the knowledge question-answering dynamic map, wherein the tuple set comprises timestamps associated with the time-series attributes.
- 2. The knowledge question-answering dynamic map construction method according to claim 1, wherein obtaining the high-dimensional content vectors by encoding mapping of the multimodal data, the vectors capturing text semantic features, and position-mapping the text content specifically comprises: constructing a procedure hierarchy tree from the multimodal data to obtain a procedure hierarchy index sequence, the procedure hierarchy index sequence comprising at least a chapter path sequence; dividing long text in the procedure hierarchy tree into procedure content blocks with an adaptive sliding window, and encoding each procedure content block to obtain a high-dimensional content vector; and mapping the chapter path sequence into a path vector through a sine/cosine position encoding function, calculated as follows: $PE_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/d}\right)$; $PE_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/d}\right)$; wherein $PE$ is the position encoding function, $pos$ is the depth position of the current text content in the procedure hierarchy tree, $d$ is the vector dimension, $2i$ denotes an even dimension of the vector, $2i+1$ denotes an odd dimension of the vector, and $i$ is the index of the vector dimension.
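The position-encoding step of claim 2 can be sketched as follows. This is a minimal illustration, assuming the conventional Transformer base of 10000 and assuming that the path vector is simply the encoding of the chapter path's depth; the helper names `sinusoidal_position_encoding` and `path_vector` are hypothetical, and the adaptive sliding-window chunking is not shown.

```python
import math


def sinusoidal_position_encoding(pos: int, d: int, base: float = 10000.0) -> list[float]:
    """Sine/cosine encoding of position `pos` into a d-dimensional vector.

    Even dimension 2i holds sin(pos / base**(2i/d)); odd dimension 2i+1 holds
    cos of the same angle. base=10000 is an assumed (conventional) value.
    """
    if d <= 0 or d % 2 != 0:
        raise ValueError("d must be a positive even integer")
    pe = [0.0] * d
    for i in range(d // 2):
        angle = pos / (base ** (2 * i / d))
        pe[2 * i] = math.sin(angle)
        pe[2 * i + 1] = math.cos(angle)
    return pe


def path_vector(chapter_path: list[str], d: int) -> list[float]:
    """Hypothetical mapping: use the depth of the chapter path as `pos`."""
    return sinusoidal_position_encoding(len(chapter_path), d)


if __name__ == "__main__":
    # A content block three levels deep in the procedure hierarchy tree.
    print(path_vector(["Chapter 5", "5.2", "5.2.3"], d=8))
```

Under this assumption, all blocks at the same depth share one path vector, so the content vector is what distinguishes siblings; encoding every element of the chapter path rather than only its depth would be an alternative design.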
- 3. The knowledge question-answering dynamic map construction method according to claim 2, wherein generating the unified text vector with the adaptive gated fusion network based on a multilayer perceptron during semantic alignment specifically comprises: concatenating the high-dimensional content vector and the path vector, and calculating a gating coefficient through a fully connected layer: $\alpha = \sigma\left(W_{\alpha}\,(h_c \oplus p) + b_{\alpha}\right)$; wherein $\alpha$ is the gating coefficient, $\sigma$ is the Sigmoid activation function, $W_{\alpha}$ is the gating weight matrix, $h_c$ is the high-dimensional content vector, $\oplus$ denotes vector concatenation, $p$ is the path vector, and $b_{\alpha}$ is the gating bias term; and performing weighted fusion of the high-dimensional content vector and the path vector based on the gating coefficient to generate the unified text vector: $u = \alpha \odot h_c + (1 - \alpha) \odot p$; wherein $u$ is the unified text vector, $\alpha$ is the gating coefficient, $\odot$ denotes element-wise multiplication, $h_c$ is the high-dimensional content vector, and $p$ is the path vector.
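The gated fusion of claim 3 can be illustrated with plain Python lists. This sketch assumes the common complementary form u = α ⊙ h_c + (1 - α) ⊙ p and a gating matrix of shape d × 2d; the function names are hypothetical, and a trained model would learn `W` and `b` rather than receive them as arguments.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, numerically safe for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(h: list[float], p: list[float],
                 W: list[list[float]], b: list[float]) -> tuple[list[float], list[float]]:
    """Return (u, alpha) with alpha = sigmoid(W [h ; p] + b), u = alpha*h + (1-alpha)*p."""
    d = len(h)
    if len(p) != d or len(W) != d or len(b) != d or any(len(row) != 2 * d for row in W):
        raise ValueError("expected h, p, b of length d and W of shape d x 2d")
    z = h + p  # list concatenation [h ; p], length 2d
    # One gate per output dimension, computed by a fully connected layer.
    alpha = [sigmoid(sum(w * x for w, x in zip(row, z)) + b_r) for row, b_r in zip(W, b)]
    # Convex combination of content and path vectors, dimension by dimension.
    u = [a * hc + (1.0 - a) * pc for a, hc, pc in zip(alpha, h, p)]
    return u, alpha
```

Because each dimension gets its own gate, the fused vector can lean on structural position for some features and on content for others.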
- 4. The knowledge question-answering dynamic map construction method according to claim 1, wherein cross-fusing the custom combined features with the improved multimodal partition fusion network to obtain the multimodal deep features specifically comprises: performing semantic structured splitting on the unified text vector to obtain an entity feature region, a relation feature region and a global shared context region: $F_e = \mathrm{ReLU}(W_e u + b_e)$; $F_r = \mathrm{ReLU}(W_r u + b_r)$; $F_s = \mathrm{ReLU}(W_s u + b_s)$; wherein $F_e$ is the entity feature region, $\mathrm{ReLU}$ is the ReLU activation function, $W_e$ is the entity feature region weight matrix, $u$ is the unified text vector, $b_e$ is the entity feature region bias term, $F_r$ is the relation feature region, $W_r$ is the relation feature region weight matrix, $b_r$ is the relation feature region bias term, $F_s$ is the global shared context region, $W_s$ is the global shared context region weight matrix, and $b_s$ is the global shared context region bias term; the image data comprises infrared images, and the infrared image features are projected into a visual alignment feature region with the same dimension as the unified text vector: $F_v = \mathrm{LN}(W_v x_{ir} + b_v)$; wherein $F_v$ is the visual alignment feature region, $\mathrm{LN}$ is layer normalization, $W_v$ is the visual alignment feature region weight matrix, $x_{ir}$ is the infrared image feature, and $b_v$ is the visual alignment feature region bias term; concatenating the entity feature region, the relation feature region, the global shared context region and the visual alignment feature region to obtain the combined feature: $X = W_e F_e \oplus W_r F_r \oplus W_s F_s \oplus W_v F_v$; wherein $X$ is the combined feature, $\oplus$ denotes feature concatenation, $W_e$, $W_r$, $W_s$ and $W_v$ are the weight matrices of the entity, relation, global shared context and visual alignment regions, and $F_e$, $F_r$, $F_s$ and $F_v$ are the corresponding regions; and cross-fusing the combined feature with the improved multimodal partition fusion network to obtain the multimodal deep feature, wherein each scalar component of the combined feature is expanded element-wise, passed independently through a nonlinear function, and then aggregated: $Y = [y_1, y_2, \dots, y_m]$; $y_j = \sum_{i=1}^{N} \sum_{k=1}^{K} c_{j,i,k}\, B_k(x_i)$; wherein $Y$ is the multimodal deep feature, $m$ is the total dimension of $Y$, $y_j$ is the $j$-th output component of $Y$, $x_i$ is the $i$-th scalar feature component of the input combined feature $X$, $N$ is the dimension of $X$, $B_k(x_i)$ is the $k$-th B-spline basis function evaluated at $x_i$, $c_{j,i,k}$ is the trainable control coefficient corresponding to the $i$-th scalar feature component and the $k$-th basis function for output $j$, and $K$ is the total order of the B-spline basis functions.
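The element-wise B-spline aggregation at the end of claim 4 resembles a Kolmogorov-Arnold Network (KAN) layer. The sketch below evaluates basis functions with the Cox-de Boor recursion and sums c_{j,i,k} B_k(x_i) over inputs and basis functions. The knot vector, spline degree and coefficient layout are assumptions, the function names are hypothetical, and the residual base activation used in many published KAN implementations is omitted because the claim does not mention it.

```python
def bspline_basis(x: float, knots: list[float], k: int, degree: int) -> float:
    """Value at x of the k-th B-spline basis function of the given degree (Cox-de Boor)."""
    if degree == 0:
        # Half-open support [t_k, t_{k+1}).
        return 1.0 if knots[k] <= x < knots[k + 1] else 0.0
    value = 0.0
    left_span = knots[k + degree] - knots[k]
    if left_span > 0:
        value += (x - knots[k]) / left_span * bspline_basis(x, knots, k, degree - 1)
    right_span = knots[k + degree + 1] - knots[k + 1]
    if right_span > 0:
        value += (knots[k + degree + 1] - x) / right_span * bspline_basis(x, knots, k + 1, degree - 1)
    return value


def kan_layer(x: list[float], coeffs: list[list[list[float]]],
              knots: list[float], degree: int) -> list[float]:
    """y_j = sum_i sum_k coeffs[j][i][k] * B_k(x_i).

    coeffs has shape (m outputs) x (N inputs) x (number of basis functions).
    """
    n_basis = len(knots) - degree - 1
    # Each scalar input component is expanded into its basis values independently.
    basis = [[bspline_basis(xi, knots, k, degree) for k in range(n_basis)] for xi in x]
    # Aggregate the per-component spline responses for every output dimension.
    return [
        sum(c_ji[k] * basis[i][k] for i, c_ji in enumerate(c_j) for k in range(n_basis))
        for c_j in coeffs
    ]
```

With a uniform knot vector 0, 1, ..., 7 and cubic splines, four basis functions are active on the interval [3, 4), where they sum to one; in practice inputs would be scaled into the spline's valid interval or the knot grid extended.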
- 5. The knowledge question-answering dynamic map construction method according to claim 1, wherein enhancing the multimodal deep features during enhancement decoding, inputting the enhanced features into the preset decoder, and merging the generated tuple set into the graph database to obtain the knowledge question-answering dynamic map specifically comprises: adjusting feature weights with a channel convolution attention module: $F' = \sigma\left(\mathrm{MLP}(\mathrm{Avg}(F)) + \mathrm{MLP}(\mathrm{Max}(F))\right) \odot F$; wherein $F$ is the multimodal deep feature, $F'$ is the multimodal deep feature after its weights are adjusted by the channel convolution attention module, $\odot$ denotes element-wise multiplication, $\sigma$ is the Sigmoid activation function, $\mathrm{MLP}$ is a multilayer perceptron with shared parameters, $\mathrm{Avg}$ denotes average pooling that aggregates global semantics, and $\mathrm{Max}$ denotes max pooling that aggregates global semantics; focusing on key information regions with a spatial convolution attention module: $F'' = \sigma\left(f^{k \times k}\left(\mathrm{Avg}(F') \oplus \mathrm{Max}(F')\right)\right) \odot F'$; wherein $F''$ is the enhanced feature obtained after the spatial convolution attention module focuses on key information regions, $F'$ is the multimodal deep feature after channel attention, $\odot$ denotes element-wise multiplication, $\sigma$ is the Sigmoid activation function, $f^{k \times k}$ is a two-dimensional convolution layer with a large $k \times k$ kernel, $\mathrm{Avg}$ and $\mathrm{Max}$ denote average and max pooling that aggregate global semantics, and $\oplus$ denotes vector concatenation; and inputting the enhanced features into the preset decoder and merging the generated tuple set into the graph database to obtain the knowledge question-answering dynamic map, the tuple set being expressed as: $T = (e_h, c_h, r, e_t, c_t, s, \tau)$; wherein $T$ is the tuple set, $e_h$ is the head entity, $c_h$ is the head entity type, $r$ is the relation between the head and tail entities, $e_t$ is the tail entity, $c_t$ is the tail entity type, $s$ is the current operating or defect status of the device, and $\tau$ is the timestamp.
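The timestamped tuple of claim 5 and its merge into a graph store can be sketched as below. The graph database is replaced by a hypothetical in-memory class, and the merge policy (append each state change to a per-edge history and answer queries with the latest state at or before a given time) is an assumption rather than the method's stated behavior; the class and field names are likewise illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimedTuple:
    """Hypothetical container for (e_h, c_h, r, e_t, c_t, s, tau) from claim 5."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str
    state: str       # current operating or defect status of the device
    timestamp: int   # assumed unit: Unix epoch seconds


class DynamicGraph:
    """Hypothetical in-memory stand-in for the graph database."""

    def __init__(self) -> None:
        self.node_types: dict[str, str] = {}
        # (head, relation, tail) -> time-ordered list of (timestamp, state)
        self.history: dict[tuple[str, str, str], list[tuple[int, str]]] = defaultdict(list)

    def merge(self, t: TimedTuple) -> None:
        self.node_types[t.head] = t.head_type
        self.node_types[t.tail] = t.tail_type
        events = self.history[(t.head, t.relation, t.tail)]
        if (t.timestamp, t.state) not in events:  # re-merging the same tuple is a no-op
            events.append((t.timestamp, t.state))
            events.sort()

    def state_at(self, head: str, relation: str, tail: str, when: int) -> Optional[str]:
        """Latest state recorded at or before `when`, or None if there is none."""
        latest = None
        for ts, state in self.history.get((head, relation, tail), []):
            if ts > when:
                break
            latest = state
        return latest
```

Keeping the full history instead of overwriting the edge is what lets the map answer "what was the device state at time t" questions; a production system would express the same idea with the graph database's own upsert mechanism.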
- 6. A knowledge question-answering method, characterized by being applied to a knowledge question-answering dynamic map constructed by the knowledge question-answering dynamic map construction method according to any one of claims 1 to 5, the method comprising: receiving a natural-language question input from a user terminal and identifying the user query intent; processing the constructed knowledge question-answering dynamic map based on the user query intent to obtain a candidate answer set, wherein the processing comprises at least fault-tree-based constraint reasoning; and inputting the candidate answer set into a preset large language model to obtain a question-answering result, and visually displaying the question-answering result on the user terminal.
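Claim 6 requires fault-tree-based constraint reasoning, and claim 7 describes it as deducing backward from the user's problem taken as the root node. One standard way to do this is to expand the top event through AND/OR gates into minimal cut sets, i.e. the smallest combinations of basic events that can cause it. The sketch below is a hedged illustration of that technique only: the source does not state that cut sets are used, and the dictionary representation of the tree and the function name `minimal_cut_sets` are assumptions. The tree is assumed to be acyclic.

```python
from itertools import product

# Hypothetical representation: gate name -> (gate type, child names).
# Any name that is not a key is treated as a basic event (a leaf cause).
FaultTree = dict[str, tuple[str, list[str]]]


def minimal_cut_sets(tree: FaultTree, node: str) -> set[frozenset[str]]:
    """Expand `node` top-down into its minimal cut sets of basic events."""
    if node not in tree:
        return {frozenset([node])}
    gate, children = tree[node]
    child_sets = [minimal_cut_sets(tree, child) for child in children]
    if gate == "OR":
        # Any child's cut set is enough to trigger the event.
        candidates = set().union(*child_sets)
    elif gate == "AND":
        # One cut set from every child must occur together.
        candidates = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unknown gate type: {gate!r}")
    # Drop any set that strictly contains another (non-minimal).
    return {s for s in candidates if not any(other < s for other in candidates)}
```

The union of the returned sets could then serve as the fault cause set that seeds multi-hop reasoning over the dynamic map.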
- 7. The knowledge question-answering method according to claim 6, wherein processing the knowledge question-answering dynamic map based on the user query intent to obtain the candidate answer set specifically comprises: obtaining the user query intent with a confidence-based weighted voting strategy: $I = \arg\max_{c} \left( w_1 R(c) + w_2 C(c) + w_3 P(c) \right)$; wherein $I$ is the user query intent, $\arg\max$ returns the candidate intent $c$ with the highest weighted score, $R(c)$ is the discrete value output by regular-expression matching, $C(c)$ is the classification confidence output by a supervised learning model, $P(c)$ is the semantic probability output by a large language model, and $w_1$, $w_2$ and $w_3$ are the voting weights of the different judgment paths; performing fault-tree-based constraint reasoning, wherein the user problem extracted according to the user query intent is taken as the root node and deduced backward through the fault tree to obtain a fault cause set; and performing multi-hop reasoning in the knowledge question-answering dynamic map based on the fault cause set to generate a structured reasoning result, encoding the user question into a vector and performing vector retrieval to identify matching procedure content blocks, and combining the two with a fusion weight to obtain the candidate answer set: $A = \lambda Z \cup (1 - \lambda) B$; wherein $A$ is the candidate answer set, $\lambda$ is the fusion weight, $Z$ is the structured reasoning result, $B$ is the set of matching procedure content blocks, and $\cup$ denotes the union operation.
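The confidence-weighted vote in claim 7 can be sketched as a weighted sum over three judgment paths followed by an argmax. The default weights, the intent labels and the function name `vote_intent` are hypothetical; a candidate absent from a path scores zero on that path, and ties are broken alphabetically for determinism.

```python
def vote_intent(regex_hits: dict[str, int],
                clf_conf: dict[str, float],
                llm_prob: dict[str, float],
                weights: tuple[float, float, float] = (0.2, 0.4, 0.4)) -> str:
    """I = argmax_c [w1*R(c) + w2*C(c) + w3*P(c)].

    regex_hits: discrete 0/1 output of rule (regular-expression) matching.
    clf_conf:   confidence of a supervised classifier per intent.
    llm_prob:   probability per intent from a large language model.
    """
    w_r, w_c, w_p = weights
    candidates = sorted(set(regex_hits) | set(clf_conf) | set(llm_prob))
    if not candidates:
        raise ValueError("no candidate intents")

    def score(c: str) -> float:
        return w_r * regex_hits.get(c, 0) + w_c * clf_conf.get(c, 0.0) + w_p * llm_prob.get(c, 0.0)

    # max() returns the first maximal element, so sorting fixes tie-breaking.
    return max(candidates, key=score)
```

With a regex hit on `fault_diagnosis` but classifier and LLM scores favoring `procedure_lookup`, the default weights let the two probabilistic paths win; raising w1 lets a confident rule match dominate, which is the trade-off the voting weights control.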
- 8. A computer-readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the knowledge question-answering dynamic map construction method according to any one of claims 1 to 5 and/or the knowledge question-answering method according to any one of claims 6 to 7.
- 9. A computer program product, characterized in that the computer program product comprises computer program code which, when run on a computer, causes the computer to implement the knowledge question-answering dynamic map construction method according to any one of claims 1 to 5 and/or the knowledge question-answering method according to any one of claims 6 to 7.
- 10. An electronic device, characterized by comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so that the electronic device performs the knowledge question-answering dynamic map construction method according to any one of claims 1 to 5 and/or the knowledge question-answering method according to any one of claims 6 to 7.
Description
Technical Field
The application belongs to the technical field of artificial intelligence and natural language processing, and particularly relates to a knowledge question-answering dynamic map construction method, a knowledge question-answering method, a storage medium, a program product and an electronic device.
Background
In the operation and maintenance of modern power systems, data such as substation operation logs, relay protection setting lists, manufacturer technical specifications, PMS equipment ledgers, SCADA time-series data and infrared thermal images from unmanned aerial vehicle (UAV) inspections are core information sources for ensuring the safe and stable operation of the power grid. This information covers key links such as equipment condition monitoring, emergency fault handling and maintenance planning, and is indispensable for the decisions of different roles such as dispatchers, operation and inspection specialists and on-site maintenance workers. Existing power operation and inspection information is mainly obtained through manual cross-system queries. On the one hand, unstructured text such as regulations and drawings, SCADA time-series data, and visual data such as infrared thermal images are stored in different systems, creating severe data silos. On the other hand, power equipment fault diagnosis usually requires complex causal reasoning that links phenomena, principles and regulations, whereas traditional keyword-matching retrieval systems lack logical reasoning capability and therefore struggle with complex fault tracing problems.
Although large language models have performed well in general-domain question answering in recent years, they show clear shortcomings when applied directly to power operation and inspection: they cannot perceive visual modalities such as infrared images and therefore cannot use temperature-field characteristics to assist diagnosis; they lack strict domain-logic constraints and are prone to hallucination, producing erroneous recommendations that violate safety regulations and pose serious safety risks; and they lack a feedback loop for expert knowledge, which makes it difficult for the system to evolve continuously in practical use.
Disclosure of Invention
In view of the above drawbacks of the prior art, the application aims to provide a knowledge question-answering dynamic map construction method, a knowledge question-answering method, a storage medium, a program product and an electronic device, so as to solve the problems of missing multimodal perception, weak domain-logic constraints and safety risks that arise when large language models are applied in the prior art.
In a first aspect, the present application provides a knowledge question-answering dynamic map construction method, comprising: collecting multimodal data, wherein the multimodal data comprises at least image data with time-series attributes; and performing at least encoding mapping, semantic alignment, feature fusion and enhancement decoding on the multimodal data to construct a knowledge question-answering dynamic map, wherein: during encoding mapping, high-dimensional content vectors are obtained from the multimodal data, the high-dimensional content vectors capturing text semantic features, and the text content is position-mapped; during semantic alignment, a unified text vector is generated by an adaptive gated fusion network based on a multilayer perceptron; during feature fusion, custom combined features are cross-fused by an improved multimodal partition fusion network to obtain multimodal deep features, wherein the combined features comprise at least entity features, relation features and global shared features obtained by splitting the unified text vector, and visual features obtained by projecting the image data; and during enhancement decoding, the multimodal deep features are enhanced, the enhanced features are input into a preset decoder, and the generated tuple set is merged into a graph database to obtain the knowledge question-answering dynamic map, wherein the tuple set comprises timestamps associated with the time-series attributes. In some embodiments of the first aspect of the present application, encoding is performed based on multi-modal data to obtain a high-dimensional content vector, where the high-dimensional content vector c