CN-122023380-A - Vision Transformer-based unmanned aerial vehicle inspection and intelligent diagnosis method and system for power transmission line faults

Abstract

The invention discloses a Vision Transformer-based unmanned aerial vehicle (UAV) inspection and intelligent diagnosis method and system for power transmission line faults. The method comprises: acquiring an inspection image with a UAV; identifying the line components in the image and constructing a topological graph describing the physical connection relationships among them; performing graph embedding learning on the topological graph to generate a topological feature vector for each component; dividing the image into a sequence of image blocks; fusing the standard position code of each image block with the topological feature vectors of the components covered by the spatial region of that block to generate an enhanced position code; adding the enhanced position code to the image block features and inputting the result into a Vision Transformer network to extract multi-scale visual features; and performing fault identification and localization based on those features. By merging the physical structure information of the line into the position code of the Transformer in the form of topological feature vectors, the invention guides the network to extract visual features strongly related to the line structure, which improves the accuracy and robustness of power transmission line fault detection against complex backgrounds.
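As a non-limiting illustration, the fusion of a standard position code with a topological feature vector can be sketched as follows. The sketch assumes a sinusoidal position code and a per-dimension sigmoid gate; the function names, the gate parameters `gate_w` and `gate_b`, and the choice of a sinusoidal code are hypothetical and are not specified by the source. In a trained model the gate parameters would be learned rather than supplied by hand.

```python
import math


def sinusoidal_position_code(index: int, dim: int) -> list[float]:
    """Standard 1-D sinusoidal position code for one image block index.

    Assumption: the source does not say which standard code is used.
    """
    code = []
    for i in range(dim):
        angle = index / (10000 ** (2 * (i // 2) / dim))
        code.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return code


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def enhanced_position_code(pos_code, topo_vec, gate_w, gate_b):
    """Gated fusion of a position code and a topological feature vector.

    For each dimension, a gate g in (0, 1) is computed from both inputs and
    the output is g * position + (1 - g) * topology.
    gate_w holds one (w_pos, w_topo) pair per dimension and gate_b one bias
    per dimension; both are hypothetical stand-ins for learned parameters.
    """
    fused = []
    for p, t, (w_pos, w_topo), b in zip(pos_code, topo_vec, gate_w, gate_b):
        g = sigmoid(w_pos * p + w_topo * t + b)
        fused.append(g * p + (1.0 - g) * t)
    return fused


def add_to_block_features(block_features, enhanced_code):
    """Element-wise sum fed to the Transformer as the input token."""
    return [f + e for f, e in zip(block_features, enhanced_code)]
```

With all gate weights and biases set to zero, the gate equals 0.5 in every dimension, so the enhanced code is simply the average of the two inputs. An image block that covers no line component could use an all-zero topological vector, which is again an assumption of this sketch.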

Inventors

MA XINCHENG
LIU JIALIN
LIU WEILIN
XU GUANGDA
MA YUAN
LI WEIYU
JIANG XIN

Assignees

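State Grid Jibei Electric Power Co., Ltd. Electric Power Research Institute (国网冀北电力有限公司电力科学研究院)
State Grid Corporation of China (国家电网有限公司)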

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-24

Claims (10)

1. A Vision Transformer-based unmanned aerial vehicle inspection and intelligent diagnosis method for power transmission line faults, characterized by comprising the following steps: acquiring an inspection image of a power transmission line, identifying the line components in the image, and constructing a topological graph describing the physical connection relationships among the components; performing graph embedding learning on the topological graph to generate a topological feature vector corresponding to each line component; dividing the inspection image into a sequence of image blocks and, for each image block, fusing the standard position code of the image block in the image with the topological feature vector of the line component covering the space of the image block to generate an enhanced position code; adding the enhanced position code to the image block features, inputting the result into a Vision Transformer network, and extracting multi-scale visual features; and outputting fault identification and localization results based on the multi-scale visual features; wherein the enhanced position code is used to guide the Vision Transformer network to extract features related to the physical structure of the line.
2. The method according to claim 1, wherein identifying the line components in the image and constructing a topological graph describing the physical connection relationships among the components specifically comprises: processing the inspection image with a pre-trained line component detection model and outputting the category and bounding box information of each line component, the line components comprising at least wires, insulators, towers and connection fittings; constructing adjacency relationships between the line components based on the bounding box information and preset physical connection rules, wherein the physical connection rules comprise the allowable connection relationships and geometric constraints between components of different types; and generating a topological graph representing the physical structure of the line, with the line components as nodes and the adjacency relationships as edges.
3. The method according to claim 2, wherein constructing the adjacency relationships between the line components based on the bounding box information and the preset physical connection rules specifically comprises: calculating geometric relationship parameters between different line components from the spatial positions given by the bounding box information, the geometric relationship parameters comprising at least a center point distance, an overlapping area proportion and a relative angle; inputting the geometric relationship parameters into an attention-based relationship reasoning network, which dynamically predicts the probability of a physical connection between components from the input parameters; and, based on the probability and in combination with preset prior knowledge of connection categories, determining and outputting a final component adjacency matrix.
4. The method according to claim 1, wherein performing graph embedding learning on the topological graph to generate the topological feature vector corresponding to each line component specifically comprises: initializing each node in the topological graph as a node vector based on the visual features of the corresponding line component; performing multiple rounds of message passing and feature aggregation on the topological graph with a graph neural network, wherein in each round each node fuses its own features with the features of its neighbor nodes according to the adjacency relationships; and, after a preset number of aggregation rounds, extracting the final hidden state of each node as the topological feature vector of the corresponding line component, the vector encoding both the attributes of the component and its contextual relationships within the overall line structure.
5. The method according to claim 4, wherein performing multiple rounds of message passing and feature aggregation on the topological graph with a graph neural network specifically comprises: in each round of message passing, for each node in the topological graph, calculating the attention weights between the node and all of its neighbor nodes through a multi-head attention mechanism; performing a weighted aggregation of the neighbor node features according to the attention weights to obtain a neighborhood aggregation feature for the node; fusing the neighborhood aggregation feature with the node's own features through a gating mechanism, wherein the gating mechanism dynamically adjusts the fusion proportion according to the degree of matching between the node features and the neighborhood features; and, after multiple layers of message passing and feature aggregation, outputting node features containing global structure information as the topological feature vectors.
6. The method according to claim 1, wherein, for each image block, fusing the standard position code of the image block with the topological feature vector of the line component covering the space of the image block to generate the enhanced position code specifically comprises: establishing a mapping between the spatial positions of the image blocks and the nodes in the topological graph, and determining the one or more line component nodes covered by the spatial region of each image block; obtaining the topological feature vectors corresponding to the one or more line component nodes and aggregating them through an attention pooling layer to generate an aggregated topological feature for the image block; and inputting the aggregated topological feature and the standard position code of the image block into a learnable feature fusion network, which dynamically adjusts the fusion proportion of the two types of features through a gating mechanism and outputs the enhanced position code.
7. The method according to claim 6, wherein the learnable feature fusion network learns a strategy for fusing topological features and position codes through end-to-end training, wherein: the parameters of the fusion network are jointly optimized with the downstream visual task during training; the fusion network dynamically generates a continuous fusion weight matrix according to the correlation between the input aggregated topological feature and the standard position code; and the standard position code and the aggregated topological feature are combined by weighting based on the fusion weight matrix to generate the enhanced position code.
8. The method according to claim 1, wherein adding the enhanced position code to the image block features, inputting the result into a Vision Transformer network, and extracting multi-scale visual features specifically comprises: inputting the image block sequence with the enhanced position codes added into a Vision Transformer network based on a hierarchical sliding-window attention mechanism; in the shallow layers of the network, extracting fine-grained texture features of the line components through self-attention computed within small local windows; in the middle layers of the network, realizing cross-window feature interaction through a shifted-window mechanism and, in cooperation with the topological structure information contained in the enhanced position codes, establishing spatial association features between adjacent components; and, in the deep layers of the network, integrating the context information of the whole image through global attention to generate multi-scale visual features that fuse local detail with the global topological structure.
9. The method according to claim 8, wherein realizing cross-window feature interaction through a shifted-window mechanism and establishing spatial association features between adjacent components in cooperation with the topological structure information contained in the enhanced position codes specifically comprises: dividing the input feature map of the current attention layer into a plurality of non-overlapping local windows, performing self-attention within each window, and extracting the local features of that window; calculating a correlation score between the attention features of each window and the corresponding topological structure information in the enhanced position codes, and dynamically adjusting the attention weights of the window according to the correlation score; performing a window shift operation on the feature map so that the features are redistributed among different windows, and performing self-attention again under the shifted window division; and generating, based on the self-attention outputs under the two window divisions, a feature map that contains both local detail and cross-component spatial associations.
10. A Vision Transformer-based unmanned aerial vehicle inspection and intelligent diagnosis system for power transmission line faults, for implementing the method as claimed in any one of claims 1 to 9, characterized in that the system comprises: an image acquisition and topology construction module for acquiring an inspection image of the power transmission line, identifying the line components in the image, and constructing a topological graph describing the physical connection relationships among the components; a graph embedding learning module for performing graph embedding learning on the topological graph to generate a topological feature vector corresponding to each line component; a position code fusion module for dividing the inspection image into a sequence of image blocks and fusing the standard position code of each image block in the image with the topological feature vector of the line component covering the space of the image block to generate an enhanced position code; a visual feature extraction module for adding the enhanced position code to the image block features, inputting the result into a Vision Transformer network, and extracting multi-scale visual features; and a fault diagnosis module for outputting fault identification and localization results based on the multi-scale visual features; wherein the enhanced position code is used to guide the Vision Transformer network to extract features related to the physical structure of the line.
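As a non-limiting illustration of the geometric relationship parameters recited in claim 3, the following sketch computes the center point distance, overlapping area proportion and relative angle for two bounding boxes, then derives a simple adjacency matrix. Several choices here are assumptions not specified by the source: boxes are given as `(x_min, y_min, x_max, y_max)` in pixels, the overlap proportion is taken relative to the smaller box, and the fixed rule in `build_adjacency` (the hypothetical `ALLOWED_PAIRS` table and `max_distance` threshold) is a plain stand-in for the attention-based relationship reasoning network of the claims.

```python
import math

# Hypothetical table of allowable connections between component types.
ALLOWED_PAIRS = {
    frozenset(("wire", "insulator")),
    frozenset(("insulator", "fitting")),
    frozenset(("fitting", "tower")),
}


def geometric_relation(box_a, box_b):
    """Return (center distance, overlap proportion, relative angle in degrees)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    acx, acy = (ax0 + ax1) / 2, (ay0 + ay1) / 2
    bcx, bcy = (bx0 + bx1) / 2, (by0 + by1) / 2
    distance = math.hypot(bcx - acx, bcy - acy)

    # Intersection rectangle; width or height is zero when boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    smaller = min((ax1 - ax0) * (ay1 - ay0), (bx1 - bx0) * (by1 - by0))
    overlap = (inter_w * inter_h) / smaller if smaller > 0 else 0.0

    # Angle of the vector from the center of A to the center of B.
    angle = math.degrees(math.atan2(bcy - acy, bcx - acx))
    return distance, overlap, angle


def build_adjacency(components, max_distance):
    """components: list of (type_name, box). Returns a symmetric 0/1 matrix."""
    n = len(components)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            (type_i, box_i), (type_j, box_j) = components[i], components[j]
            if frozenset((type_i, type_j)) not in ALLOWED_PAIRS:
                continue  # connection type not permitted by the rule table
            distance, overlap, _ = geometric_relation(box_i, box_j)
            if overlap > 0 or distance <= max_distance:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency
```

In the claimed method the three parameters would be passed to a learned network that outputs a connection probability; the thresholded rule above only shows where those parameters enter the adjacency decision.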

Description

Vision Transformer-based unmanned aerial vehicle inspection and intelligent diagnosis method and system for power transmission line faults

Technical Field

The invention belongs to the technical field at the intersection of intelligent monitoring of power equipment and computer vision, and particularly relates to a Vision Transformer-based unmanned aerial vehicle (UAV) inspection and intelligent diagnosis method and system for power transmission line faults.

Background

The power transmission line is an important component of the power system, and its safe and stable operation directly affects the reliability and power supply quality of the grid. Traditional manual inspection suffers from low efficiency, high risk and limited coverage, and can hardly meet the large-scale, high-frequency operation and maintenance requirements of a modern power grid. As UAV technology has matured, UAV-based automatic inspection of transmission lines has become the industry trend; it acquires high-resolution inspection images efficiently and provides the data basis for subsequent intelligent fault diagnosis.

In image-based fault diagnosis, deep learning techniques, particularly convolutional neural networks and Vision Transformers, have demonstrated powerful feature learning and recognition capabilities. However, transmission line inspection images have complex backgrounds, contain many targets, and show strong physical connection relationships among components, while most existing methods rely only on appearance features and cannot effectively model or use the structural topology information among line components. For example, faults such as insulator breakage and broken wire strands are often closely related to the connection state and spatial relationship of adjacent components, but current purely vision-based methods struggle to capture such physical constraints, so false detection and missed detection rates are high in complex scenes involving occlusion, illumination change and background interference. In addition, the position code used by a standard Vision Transformer contains only absolute or relative position information within the image and cannot characterize the specific physical connection structure of a power transmission line.

How to integrate prior knowledge of the line topology into a deep learning model, so that the model can understand the physical structure of the line and recognize structure-related faults more accurately, is a key technical bottleneck for intelligent inspection of power transmission lines. An intelligent diagnosis method that deeply fuses the physical topology information of the line with visual features is therefore needed to improve the accuracy, robustness and interpretability of power transmission line fault detection.

Disclosure of Invention

Existing intelligent diagnosis methods for power transmission line faults rely mainly on visual features and cannot effectively model or use the physical connection relationships among line components, so the recognition accuracy for structure-related faults is low and the robustness is poor in complex scenes. The invention aims to overcome these defects of the prior art and provides a Vision Transformer-based UAV inspection and intelligent diagnosis method for power transmission line faults.
In a first aspect, an embodiment of the present application provides a Vision Transformer-based unmanned aerial vehicle inspection and intelligent diagnosis method for power transmission line faults, the method comprising: acquiring an inspection image of a power transmission line, identifying the line components in the image, and constructing a topological graph describing the physical connection relationships among the components; performing graph embedding learning on the topological graph to generate a topological feature vector corresponding to each line component; dividing the inspection image into a sequence of image blocks and, for each image block, fusing the standard position code of the image block in the image with the topological feature vector of the line component covering the space of the image block to generate an enhanced position code; adding the enhanced position code to the image block features, inputting the result into a Vision Transformer network, and extracting multi-scale visual features; and outputting fault identification and localization results based on the multi-scale visual features; wherein the enhanced position code is used to guide the Vision Transformer network to extract features related to the p