CN-122023117-A - Substation scene image stitching fusion method based on end-to-end network

Abstract

The invention discloses a substation scene image stitching and fusion method based on an end-to-end network. The method comprises: acquiring an image set covering the panorama of the substation; constructing an improved ResNet-50 network to extract features from the substation images in the image set and generate a multi-scale feature pyramid for each image; determining candidate stitching edge-band positions of each image based on the multi-scale feature pyramid and sampling edge-band features at each scale; constructing a feature-matching cost volume by computing the similarity of edge-band features between different images, feeding it to an alignment network, and outputting the alignment offsets and adjacency relations between images; generating an edge stitching-pair list from the alignment relations between images and determining the optimal alignment path of each edge band; and processing the images with a pre-trained semantic segmentation model to determine the semantic category of each pixel and performing differentiated image stitching, thereby generating a high-quality panoramic stitched image of the substation.

Inventors

  • CHEN NING
  • SUN CHEN
  • ZHOU XIAOLAN
  • ZHAO CHENYU
  • CHEN KAI
  • ZHANG SHEN
  • JI SHANSHAN
  • ZHANG FENG
  • GE YUYOU
  • CHANG HONGXIN
  • KONG YANLING
  • YANG XIN

Assignees

  • 国网上海市电力公司 (State Grid Shanghai Municipal Electric Power Company)

Dates

Publication Date
2026-05-12
Application Date
2026-01-12

Claims (10)

  1. A substation scene image stitching and fusion method based on an end-to-end network, characterized by comprising the following steps: acquiring an image set covering the panorama of the substation through a plurality of cameras distributed in different areas of the substation; constructing an improved ResNet-50 network, extracting features from each substation image in the image set, and generating a multi-level feature map set for each image; fusing the multi-level feature map sets step by step from top to bottom according to network level, and generating a multi-scale feature pyramid for each image by means of up-sampling and lateral connections; determining candidate stitching edge-band positions of each image based on its multi-scale feature pyramid, and sampling edge-band features at each scale to obtain an edge-band feature set for each image; based on the edge-band feature sets, computing the similarity of edge-band features between different images, constructing a feature-matching cost volume, feeding the cost volume to an alignment network, and outputting the alignment offsets and adjacency relations between images; determining, according to the alignment offsets and adjacency relations, which adjacent image edge band each image's edge band should be stitched with, and generating an edge stitching-pair list arranged in stitching order; processing the image set with a pre-trained semantic segmentation model to determine the semantic category of the pixels at the edge of each image; and performing differentiated image stitching based on the edge stitching-pair list and the semantic categories of the edge-band pixels, to obtain a panoramic stitched image of the substation.
  2. The method according to claim 1, wherein a partial overlapping area exists between the substation images in the image set.
  3. The method according to claim 1, wherein the improvement of the improved ResNet-50 network comprises: introducing a pyramid convolution module into the residual blocks of the ResNet-50 network to extract spatial features at different scales through multi-scale convolution kernels, and embedding an efficient channel attention module at the output of each residual block.
  4. The method according to claim 1, wherein performing feature extraction on each substation image in the image set to generate a multi-level feature map set for each image specifically comprises: inputting each substation image into the improved ResNet-50 network, and performing feature preprocessing through an initial convolution layer and a pooling layer to obtain a basic feature map; feeding the basic feature map sequentially into a plurality of residual blocks each comprising a pyramid convolution module and an efficient channel attention module, wherein the pyramid convolution module extracts multi-scale spatial features through convolution kernels of different sizes, and the efficient channel attention module applies weighting along the channel dimension; outputting feature maps of different resolutions at different levels of the network to form a multi-level feature map set ranging from low-level detail features to high-level semantic features; and obtaining, for each substation image, a multi-level feature map set comprising feature maps of different levels from the improved ResNet-50 network, the feature maps covering low-level detail features, mid-level texture features and high-level semantic features, each level having a different spatial resolution and channel count.
  5. The method according to claim 1, wherein fusing the multi-level feature map set step by step from top to bottom according to network level and generating a multi-scale feature pyramid for each image by means of up-sampling and lateral connections specifically comprises: the multi-level feature map set comprises the feature maps C2, C3, C4 and C5 from the four stages of the bottom-up path of the improved ResNet-50 network, wherein C2 is the low-level detail feature map with the highest resolution, C3 and C4 are mid-level texture feature maps, and C5 is the high-level semantic feature map, each feature map having been processed by a pyramid convolution module and an efficient channel attention module; starting from the high-level semantic feature map C5, up-sampling layer by layer towards the texture feature maps C4 and C3 and the low-level detail feature map C2; laterally connecting each up-sampled higher-level feature map with the corresponding lower-level feature map, wherein before the lateral connection the lower-level feature map is processed by the pyramid convolution module and the efficient channel attention module and its channel count is adjusted by a convolution operation so that the fused feature dimensions are consistent, and then adding the laterally connected feature maps element by element to generate the fused feature map; repeating the up-sampling, lateral connection and fusion operations to sequentially generate the feature maps P2, P3, P4 and P5 of the multi-scale feature pyramid, wherein P2, the fusion result corresponding to the low-level detail feature map, contains rich local detail information, and P5, the fusion result corresponding to the high-level semantic feature map, contains strong semantic information; introducing an atrous spatial pyramid pooling module at the highest-level fused feature map P5 of the pyramid, and performing atrous convolution and global average pooling with different dilation rates; and taking the finally generated {P2, P3, P4, P5} as the multi-scale feature pyramid.
  6. The method according to claim 1, wherein determining the candidate stitching edge-band positions of each image and sampling edge-band features at each scale to obtain an edge-band feature set for each image specifically comprises: for each substation image in the image set and its corresponding multi-scale feature pyramid, determining candidate stitching edge-band areas based on the spatial boundary information of the input image, the edge-band areas being band-shaped regions close to the peripheral edges of the image that are likely to overlap with other images; mapping the edge-band region into the feature map of each scale to obtain a set of edge-band regions B_i^l at different scales, wherein l denotes the feature level; performing feature sampling in the corresponding edge-band region B_i^l of each scale feature map, extracting the feature vector at each pixel position in the edge-band region, and forming an edge-band feature set expressed as F_i^l = { f_i^l(x) | x ∈ B_i^l }, wherein f_i^l(x) denotes the feature vector of the i-th substation image at pixel x under scale l, B_i^l denotes the candidate stitching edge-band region of the i-th substation image under scale l, and F_i^l denotes the edge-band feature set of the i-th substation image under scale l.
  7. The method according to claim 1, wherein computing the similarity of edge-band features between different images, constructing a feature-matching cost volume, feeding it to the alignment network and outputting the alignment offsets and adjacency relations between images specifically comprises: for any pair of substation images i and j and any scale level l, taking the edge-band feature set F_i^l of the i-th substation image and the edge-band feature set F_j^l of the j-th substation image; at the same scale l, constructing an initial matching cost volume based on cosine similarity, with the formula C_l(x, y) = cos( f_i^l(x), f_j^l(y) ), wherein C_l(x, y) denotes the initial matching cost between pixel x of the i-th substation image and pixel y of the j-th substation image at scale l; linearly combining the initial matching cost volume with a local structural consistency term to form a texture-aware matching cost volume, calculated as C̃_l(x, y) = λ · C_l(x, y) + (1 − λ) · SSIM( f_i^l(x), f_j^l(y) ), wherein C̃_l(x, y) is the texture-aware matching cost weighted by structural similarity at scale l, λ is a balance coefficient, and SSIM(·,·) is the structural similarity index, expressed as SSIM(u, v) = ( (2 μ_u μ_v + c_1)(2 σ_uv + c_2) ) / ( (μ_u² + μ_v² + c_1)(σ_u² + σ_v² + c_2) ), wherein μ_u and μ_v are the local means of the feature vectors u and v respectively, σ_u² and σ_v² are their local variances, σ_uv is the local covariance of u and v, and c_1 and c_2 are preset constants; stacking the texture-aware matching cost volumes of all scales according to a preset arrangement rule to form a cost-volume tensor, and feeding it to the alignment network, the alignment network being a network structure based on three-dimensional convolution that outputs the corresponding alignment offset field and image adjacency scores after passing through a plurality of three-dimensional convolution layers, normalization layers and nonlinear activation layers; the alignment network output comprises, for each scale l and each pixel x, an alignment offset Δ_ij^l(x) and a corresponding matching confidence map P_ij^l(x), wherein the alignment offset Δ_ij^l(x) denotes the spatial offset vector of pixel x of the i-th substation image relative to the j-th substation image at scale l, and the matching confidence map P_ij^l(x) denotes the matching strength, or likelihood of adjacency, at that location; and inferring the adjacency relations between different images from the output alignment offsets and matching confidences, and generating an adjacency relation set whose elements record, for image i and image j, a set of candidate paired pixels or a list of edge-band pairs.
  8. The method according to claim 7, wherein determining which adjacent image edge band each image's edge band should be stitched with according to the alignment offsets and adjacency relations, and generating an edge stitching-pair list arranged in stitching order, specifically comprises: for each pair (i, j) in the adjacency relation set, computing statistics from the paired pixel set and the corresponding multi-scale information, the statistics including a pairing score S_ij, an overlap ratio O_ij and an offset consistency metric V_ij, wherein the pairing score S_ij is formed from the aggregated matching confidence at the paired pixels, the overlap ratio O_ij is formed from the ratio of the estimated number of overlapping pixels to a reference area, and the offset consistency metric V_ij is formed from the dispersion statistics of the alignment offset field at the paired pixels; screening the adjacent pairs against preset thresholds on these statistics, and retaining the pairs that satisfy the thresholds to form a preliminary candidate stitching-pair set; and performing conflict resolution, according to the pairing scores, on candidate stitching pairs in the preliminary set that are mutually exclusive within the same image or the same region, selecting the candidate pair with the highest priority or score and eliminating the conflicting items, to obtain a conflict-free edge stitching-pair list, wherein the edge stitching-pair list comprises a plurality of stitching pairs, each stitching pair comprising the identifiers of the two images participating in the stitching, the identifiers of the edge bands participating in the stitching, and the alignment offset.
  9. The method according to claim 1, wherein the semantic categories comprise a background region and a device region.
  10. The method according to claim 1, wherein performing differentiated image stitching based on the edge stitching-pair list and the semantic categories of the edge-band pixels to obtain a panoramic stitched image of the substation specifically comprises: sequentially reading the edge stitching pairs between images according to the edge stitching-pair list and the stitching order; for each stitching pair, performing pixel-level stitching according to the semantic categories of the pixels in the edge band, with a strategy of prioritized stitching in the device region and smooth transition in the background region; in the device region, adopting a semantically constrained pixel fusion mode, performing a weighted average of pixel values over pixel pairs with consistent semantic categories between adjacent images; in the background region, adopting a fusion mode based on weighted feathering, performing a weighted smooth transition of pixel values in the overlap region; and after the fusion of all stitching pairs has been completed in sequence, combining all stitched local regions according to the stitching order to generate a complete panoramic stitched image of the substation.
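Claim 3 embeds an efficient channel attention (ECA) module at each residual-block output: a global channel descriptor is passed through a small 1-D convolution and a sigmoid, and the resulting weights rescale the channels. The following is a minimal NumPy sketch of that reweighting step only (the patent does not give weights; the uniform kernel and function names here are illustrative placeholders, not the patented implementation).

```python
import numpy as np

def eca_weights(feat, k=3):
    """ECA: per-channel weights from a globally pooled descriptor
    passed through a 1-D convolution across channels and a sigmoid.
    feat: (C, H, W) feature map; k: 1-D kernel size."""
    desc = feat.mean(axis=(1, 2))                  # global average pool -> (C,)
    kernel = np.full(k, 1.0 / k)                   # untrained placeholder kernel
    conv = np.convolve(desc, kernel, mode="same")  # cross-channel interaction
    return 1.0 / (1.0 + np.exp(-conv))             # sigmoid gate in (0, 1)

def eca_block(feat, k=3):
    """Reweight each channel of a residual-block output."""
    w = eca_weights(feat, k)
    return feat * w[:, None, None]

feat = np.random.default_rng(0).standard_normal((8, 16, 16))
out = eca_block(feat)
```

In a trained network the 1-D kernel is learned; the point of ECA is that the channel interaction costs only k parameters instead of a full channel-mixing layer.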
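Claim 6 samples feature vectors only from band-shaped regions near the image border. A small NumPy sketch of that sampling on a single-scale feature map (the fixed band width and function name are illustrative assumptions; the patent determines the band from spatial boundary information and overlap probability):

```python
import numpy as np

def edge_band_features(feat, band=2):
    """Collect the feature vector of every pixel within `band` pixels
    of the border of a (C, H, W) feature map.
    Returns (coords, feats): band-pixel positions and one (C,) vector each."""
    C, H, W = feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    mask = (ys < band) | (ys >= H - band) | (xs < band) | (xs >= W - band)
    coords = np.argwhere(mask)   # (N, 2) pixel positions in the edge band
    feats = feat[:, mask].T      # (N, C), row-major order matching coords
    return coords, feats

feat = np.arange(2 * 6 * 6, dtype=float).reshape(2, 6, 6)
coords, feats = edge_band_features(feat, band=1)
```

Repeating this per pyramid level l gives the per-scale sets B_i^l (coords) and F_i^l (feats) used by the matching step.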
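Claim 7 combines a cosine-similarity cost with a structural-similarity (SSIM) term, C̃ = λ·cos + (1−λ)·SSIM. A NumPy sketch for one scale, treating each edge-band feature vector's own components as the "local" statistics (that interpretation, λ = 0.7 and the constants c1, c2 are assumptions for illustration; the patent leaves them as preset values):

```python
import numpy as np

def cosine_cost(Fi, Fj):
    """Initial cost C_l(x, y): cosine similarity between every edge-band
    feature of image i (rows, N_i x C) and image j (columns, N_j x C)."""
    Fi = Fi / np.linalg.norm(Fi, axis=1, keepdims=True)
    Fj = Fj / np.linalg.norm(Fj, axis=1, keepdims=True)
    return Fi @ Fj.T

def ssim_vec(u, v, c1=1e-4, c2=9e-4):
    """SSIM between two feature vectors using their component-wise
    mean, variance and covariance as the local statistics."""
    mu_u, mu_v = u.mean(), v.mean()
    var_u, var_v = u.var(), v.var()
    cov = ((u - mu_u) * (v - mu_v)).mean()
    return ((2 * mu_u * mu_v + c1) * (2 * cov + c2)) / \
           ((mu_u ** 2 + mu_v ** 2 + c1) * (var_u + var_v + c2))

def texture_aware_cost(Fi, Fj, lam=0.7):
    """Texture-aware cost: lam * cosine + (1 - lam) * SSIM per pair."""
    C = cosine_cost(Fi, Fj)
    S = np.array([[ssim_vec(u, v) for v in Fj] for u in Fi])
    return lam * C + (1 - lam) * S
```

Stacking these per-scale matrices gives the cost-volume tensor the claim feeds to the 3-D convolutional alignment network (the network itself is not sketched here).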
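Claim 8 thresholds candidate pairs on the statistics S_ij, O_ij, V_ij and then resolves conflicts by pairing score. The sketch below illustrates one plausible reading: greedy selection by descending score, with each (image, edge-band) identifier used at most once (the threshold values, dict layout and greedy policy are assumptions; the patent only fixes the statistics and the highest-score rule):

```python
def select_stitch_pairs(candidates, score_min=0.5, overlap_min=0.1, var_max=4.0):
    """candidates: dicts with 'edges' ((img, band), (img, band)),
    'score' S_ij, 'overlap' O_ij and 'offset_var' V_ij.
    Returns a conflict-free stitching-pair list in stitching order."""
    kept = [c for c in candidates
            if c['score'] >= score_min
            and c['overlap'] >= overlap_min
            and c['offset_var'] <= var_max]         # threshold screening
    kept.sort(key=lambda c: c['score'], reverse=True)
    used, result = set(), []
    for c in kept:
        a, b = c['edges']
        if a in used or b in used:                  # mutually exclusive edge band
            continue                                # -> drop the lower-scored pair
        used.update((a, b))
        result.append(c)
    return result

cands = [
    {'edges': ((0, 'R'), (1, 'L')), 'score': 0.9, 'overlap': 0.30, 'offset_var': 1.0},
    {'edges': ((1, 'R'), (2, 'L')), 'score': 0.8, 'overlap': 0.25, 'offset_var': 1.5},
    {'edges': ((0, 'R'), (2, 'L')), 'score': 0.7, 'overlap': 0.20, 'offset_var': 2.0},
    {'edges': ((2, 'R'), (3, 'L')), 'score': 0.4, 'overlap': 0.30, 'offset_var': 1.0},
]
pairs = select_stitch_pairs(cands)
```

Here the third candidate loses both of its edge bands to higher-scored pairs and the fourth falls below the score threshold, leaving a conflict-free chain 0-1-2.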
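Claim 10's background-region strategy is weighted feathering: pixel values in the overlap transition smoothly between the two images. A NumPy sketch for two horizontally adjacent images with a linear alpha ramp (the horizontal layout and linear ramp are simplifying assumptions; the patent does not fix the weighting profile, and the semantically constrained device-region fusion is not sketched):

```python
import numpy as np

def feather_blend(left, right, overlap):
    """Blend two horizontally adjacent images whose last/first `overlap`
    columns coincide, using a linear feathering ramp across the overlap."""
    H, Wl = left.shape[:2]
    Wr = right.shape[1]
    w = np.linspace(1.0, 0.0, overlap)          # weight of the left image
    out = np.empty((H, Wl + Wr - overlap) + left.shape[2:])
    out[:, :Wl - overlap] = left[:, :Wl - overlap]   # pure left region
    out[:, Wl:] = right[:, overlap:]                 # pure right region
    ramp = w.reshape(1, -1, *([1] * (left.ndim - 2)))
    out[:, Wl - overlap:Wl] = (ramp * left[:, Wl - overlap:]
                               + (1 - ramp) * right[:, :overlap])
    return out

left = np.ones((4, 8))     # toy grayscale images: bright left, dark right
right = np.zeros((4, 8))
pano = feather_blend(left, right, overlap=4)
```

With a 4-pixel overlap the blended columns step down linearly (1, 2/3, 1/3, 0), which is what removes the visible seam the description attributes to general-purpose stitching.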

Description

Substation scene image stitching fusion method based on end-to-end network

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a substation scene image stitching and fusion method based on an end-to-end network.

Background

In modern power systems, substations serve as critical hubs for power transmission and distribution, and their operating state directly affects the safety and stability of the entire power system. To achieve comprehensive monitoring and management of a substation, acquiring high-precision image information covering the whole scene is a basic task. However, because the installation angle and field of view of any one camera are limited, a single camera can generally only capture images of local areas of the substation and can hardly reflect the spatial layout and equipment distribution of the whole station. It is therefore necessary to stitch the partially overlapping images acquired by multiple cameras into a complete substation panoramic image.

Existing image stitching methods mainly comprise traditional algorithms based on feature-point matching and end-to-end stitching algorithms based on deep learning. Traditional methods generally adopt the Scale-Invariant Feature Transform (SIFT) algorithm, the Speeded-Up Robust Features (SURF) algorithm, or other algorithms based on adaptive thresholds and multi-scale feature extraction, and realize image registration and stitching by detecting and matching feature points between images. These methods achieve good stitching results in ordinary scenes but show obvious shortcomings in complex industrial environments such as substations.
A substation scene contains a large number of devices with complex structures and high appearance similarity, and is affected by factors such as illumination changes and reflective interference; these reduce the accuracy of feature-point detection and matching, which in turn degrades stitching accuracy and image continuity. In addition, traditional algorithms are computationally expensive and strongly dependent on parameter tuning, making it difficult to meet the demands of real-time monitoring and large-scale data processing. In recent years, with the development of deep learning, end-to-end image stitching methods based on convolutional neural networks (CNNs) have gradually been applied to the field of image fusion. For example, Chinese patent CN106952220A discloses a panoramic image fusion method based on deep learning in the technical field of image stitching, comprising the following steps: S1, constructing a deep-learning training data set; S2, constructing a convolutional neural network model, including S201, constructing a deep convolutional neural network model, S202, setting the parameters of the convolution and sampling layers, and S203, training the deep convolutional neural network with the training data set; and S3, obtaining the fusion area of the test data set based on the test data set and the trained model. By constructing a multi-level neural network structure, that method can automatically extract multi-level image features, thereby automatically optimizing feature matching and stitching parameters, reducing manual intervention and improving stitching efficiency. Some research also introduces generative adversarial networks (GANs) to optimize the stitching results and improve the visual consistency and quality of the stitched images.
However, most existing end-to-end stitching methods are designed for general scenes and are not optimized for the complex structural characteristics of substations. In substation images, the edge profiles of equipment are complex, and problems such as strong reflections, occlusion and geometric distortion exist; directly applying a general-purpose stitching network often produces visible stitching seams and distorts the geometric structure of the equipment. In addition, existing methods lack modeling of the structural characteristics and spatial topology of power equipment, and thus struggle to maintain the consistency of equipment shape and spatial arrangement during stitching. There is therefore a need for an end-to-end network structure that combines multi-scale feature extraction, edge feature matching and semantic segmentation, which can not only achieve high-precision edge alignment of multi-view substation images but also distinguish device regions from background regions during stitching, thereby obtaining substation panoramic stitched images with consistent structure and continuous texture. Disc