CN-121980045-A - Decoupling characterization learning method and system for classified retrieval of power equipment

Abstract

The invention discloses a decoupling characterization learning method and system for power equipment classification retrieval. The method comprises: collecting power equipment source domain images, labeling equipment types and environment descriptions, and constructing a power equipment image dataset; constructing a dual-branch visual encoder comprising an environment branch and a content branch; feeding the environment embedding and the content embedding jointly into an environment-conditioned prompt generation network to obtain an environment prompt and a content-category prompt; splicing the environment prompt and the content-category prompt to construct a final dynamic prompt sequence; training the model to obtain an optimized model; generating the dynamic prompt sequence with the optimized model as a prompt vector; comparing the similarity of the prompt vector with the vectors stored in a database and ranking the results; and outputting the best-matching result as the retrieval result.

Inventors

WANG XIN
SHI WEIHAO
LI ZHOU
HU HAO
RUAN ZHAOWEN
Huang Houzhu
ZHANG SHUJUAN
SUN WEI

Assignees

State Grid Anhui Electric Power Co., Ltd. Electric Power Research Institute (国网安徽省电力有限公司电力科学研究院)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-22

Claims (10)

1. A decoupling characterization learning method for power equipment classification retrieval, comprising: S1, acquiring power equipment source domain images, labeling equipment types and environment descriptions, and constructing a power equipment image dataset; S2, constructing a dual-branch visual encoder comprising an environment branch and a content branch, wherein the environment branch extracts the environment embedding contained in an image and the content branch extracts the content embedding contained in the image, and designing a joint loss function; S3, feeding the environment embedding and the content embedding jointly into an environment-conditioned prompt generation network to obtain an environment prompt and a content-category prompt, and splicing the environment prompt and the content-category prompt to construct a final dynamic prompt sequence; and S4, taking a weighted fusion of the joint loss function and a contrastive learning objective loss function as a total loss function, training the model, and stopping training when the total loss function reaches its minimum to obtain an optimized model; in the retrieval stage, when the input is text, generating a prompt vector with a text encoder, and when the input is an image, generating a dynamic prompt sequence with the optimized model as the prompt vector; comparing the similarity of the prompt vector with the vectors stored in a database and ranking the results; and outputting the best-matching result as the retrieval result.
2. The decoupling characterization learning method for power equipment classification retrieval of claim 1, wherein S1 comprises: S11, collecting image samples of multiple types of power equipment, and labeling each sample with a type label and an environment description, wherein the type labels comprise transformer, circuit breaker and lightning arrester, and the environment description comprises color, illumination and weather; S12, applying size standardization, random cropping, flipping and brightness adjustment to all samples; and S13, identifying, among all samples, those with the same content but different environments and those with the same environment but different content, and constructing a complete power equipment image dataset.
3. The decoupling characterization learning method for power equipment classification retrieval of claim 1, wherein constructing a dual-branch visual encoder comprising an environment branch and a content branch comprises: S21, feeding the input image into a pre-trained CLIP visual encoder to obtain a generic visual feature representation, where i denotes the i-th sample and the base visual encoder is frozen; S22, establishing an asymmetric dual-branch visual encoder structure comprising an environment branch and a content branch, wherein the environment branch uses a shallow convolutional mapping network to extract low-level appearance information of the image to obtain the environment embedding, and the content branch uses a deep convolutional network to extract high-level semantic features of the equipment body structure to obtain the content embedding.
4. The decoupling characterization learning method for power equipment classification retrieval of claim 3, wherein designing the joint loss function comprises: S23, applying a content aggregation constraint to sample pairs with the same equipment content, the content aggregation constraint loss being computed with the squared L2 norm relative to a content feature center vector, where N is the number of samples per batch; and applying an environment aggregation constraint to sample pairs with the same environment, the environment aggregation constraint loss being computed relative to an environment feature center vector; S24, introducing an orthogonal constraint term between content and environment, computed with an L1 norm; S25, forming the joint loss function from these loss terms, with a balance coefficient for each loss term.
5. The decoupling characterization learning method for power equipment classification retrieval of claim 4, wherein the dynamic prompt sequence is generated as follows: multi-layer outputs are explicitly designed in the environment branch, and feature vectors are extracted from different layers, namely a shallow-layer feature reflecting local color, a middle-layer feature reflecting regional background and background texture, and a third-level feature reflecting the overall viewing angle and environmental semantics; each layer's feature is projected into the prompt space through its corresponding environment projection network, which uses an MLP mapping layer, to form a multi-level environment prompt vector set containing the environment prompt vector of each layer; for each sample class, a category template constructed by combining the sample with its category label is embedded, and the category template embedding and the content embedding are input to a content mapping module, which adopts bilinear fusion, to obtain the content-category prompt; a text environment template constructed by combining the sample with its environment description is embedded, and the text environment template embedding and the multi-level environment prompt vector set are passed through an environment mapping module to obtain the environment prompt; and the environment prompt and the content-category prompt are spliced to construct the final dynamic prompt sequence.
6. The decoupling characterization learning method for power equipment classification retrieval of claim 5, wherein the contrastive learning objective loss function is a CLIP-based visual-text contrastive loss defined in terms of the text embedding and the image embedding of the i-th sample, a temperature coefficient, and a similarity calculation.
7. The decoupling characterization learning method for power equipment classification retrieval of claim 6, wherein the total loss function is a weighted combination of the joint loss function and the contrastive learning objective loss function, with weighting coefficients.
8. A decoupling characterization learning system for power equipment classification retrieval, comprising: a data acquisition module, configured to acquire power equipment source domain images, label equipment types and environment descriptions, and construct a power equipment image dataset; an environment-content decoupling module, configured to construct a dual-branch visual encoder comprising an environment branch and a content branch, wherein the environment branch extracts the environment embedding contained in an image and the content branch extracts the content embedding contained in the image, and to design a joint loss function; a prompt sequence generation module, configured to feed the environment embedding and the content embedding jointly into an environment-conditioned prompt generation network to obtain an environment prompt and a content-category prompt, and to splice the environment prompt and the content-category prompt to construct a final dynamic prompt sequence; and wherein, in the retrieval stage, when the input is text, a text encoder is used to generate a prompt vector, and when the input is an image, a dynamic prompt sequence is generated with the optimized model as the prompt vector; the similarity of the prompt vector with the vectors stored in a database is compared and ranked, and the best-matching result is output as the retrieval result.
9. The decoupling characterization learning system for power equipment classification retrieval of claim 8, wherein the data acquisition module is further configured to perform: S11, collecting image samples of multiple types of power equipment, and labeling each sample with a type label and an environment description, wherein the type labels comprise transformer, circuit breaker and lightning arrester, and the environment description comprises color, illumination and weather; S12, applying size standardization, random cropping, flipping and brightness adjustment to all samples; and S13, identifying, among all samples, those with the same content but different environments and those with the same environment but different content, and constructing a complete power equipment image dataset.
10. The decoupling characterization learning system for power equipment classification retrieval of claim 8, wherein constructing a dual-branch visual encoder comprising an environment branch and a content branch comprises: S21, feeding the input image into a pre-trained CLIP visual encoder to obtain a generic visual feature representation, where i denotes the i-th sample and the base visual encoder is frozen; S22, establishing an asymmetric dual-branch visual encoder structure comprising an environment branch and a content branch, wherein the environment branch uses a shallow convolutional mapping network to extract low-level appearance information of the image to obtain the environment embedding, and the content branch uses a deep convolutional network to extract high-level semantic features of the equipment body structure to obtain the content embedding.
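
The loss terms and the retrieval step named in claims 1, 4, 6 and 7 can be illustrated with a minimal pure-Python sketch. The claim formulas are not reproduced in this text, so every formula below is an assumed, commonly used form and not necessarily the one claimed: mean squared L2 distance to the group center for the aggregation constraints, mean absolute inner product for the orthogonality term, a one-directional InfoNCE loss over cosine similarities for the contrastive objective, and cosine similarity for ranking. All function names, the coefficients `lam1`, `lam2`, `alpha`, `tau`, and the toy vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity; returns 0.0 for a zero vector to avoid division by zero.
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0

def aggregation_loss(group):
    # Assumed form: mean squared L2 distance of each embedding to the group center.
    # `group` holds embeddings that share the same content (or the same environment).
    n, dim = len(group), len(group[0])
    center = [sum(e[k] for e in group) / n for k in range(dim)]
    return sum(sum((e[k] - center[k]) ** 2 for k in range(dim)) for e in group) / n

def orthogonality_loss(content, environment):
    # Assumed form: mean absolute inner product between paired
    # content and environment embeddings (zero when they are orthogonal).
    return sum(abs(dot(c, e)) for c, e in zip(content, environment)) / len(content)

def joint_loss(l_content, l_env, l_orth, lam1=1.0, lam2=1.0):
    # Assumed placement of the two balance coefficients.
    return l_content + lam1 * l_env + lam2 * l_orth

def contrastive_loss(text_embs, image_embs, tau=0.07):
    # Assumed form: image-to-text InfoNCE, where pair i is the positive pair.
    total = 0.0
    for i, img in enumerate(image_embs):
        logits = [cosine(img, txt) / tau for txt in text_embs]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum - logits[i]
    return total / len(image_embs)

def total_loss(l_contrastive, l_joint, alpha=0.5):
    # Assumed weighting: contrastive loss plus alpha times the joint loss.
    return l_contrastive + alpha * l_joint

def retrieve(query, database, top_k=1):
    # Rank stored vectors by cosine similarity to the query prompt vector.
    scored = [(name, cosine(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy usage with made-up 2-D vectors standing in for stored prompt vectors.
database = {
    "transformer": [1.0, 0.0],
    "circuit breaker": [0.0, 1.0],
    "lightning arrester": [0.7, 0.7],
}
ranked = retrieve([0.9, 0.1], database, top_k=2)
```

In this sketch `retrieve` returns `(name, score)` pairs sorted from most to least similar, so the first entry plays the role of the best-matching result. A real system would compute the embeddings with the trained encoders and would likely use a tensor library and an approximate nearest-neighbor index instead of plain lists.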

Description

Decoupling characterization learning method and system for classified retrieval of power equipment

Technical Field

The invention relates to the field of computer vision and machine learning, and in particular to a decoupling characterization learning method and system for power equipment classification retrieval.

Background

Power equipment image classification retrieval is one of the key technologies in the visual operation and maintenance of smart grids, and is widely used in equipment identification, defect detection, state monitoring and similar fields. The technology performs classified retrieval of power equipment by comparing the features of text and field-acquired images with those of database sample images. However, the power field environment is complex and changeable, and image acquisition is easily affected by factors such as illumination, weather, shooting angle and camera type, so the same equipment shows obvious appearance differences across scenes. Variations in the appearance environment and imaging conditions make the image feature distribution inconsistent, which degrades the model's accuracy in identifying the content of the equipment body.

Existing methods for this problem struggle to achieve both feature stability and prompt adaptivity, and they retain the following shortcomings. First, content is entangled with the representations of background, environment and imaging conditions: a conventional visual backbone has difficulty separating the content features related to the equipment body from the environment and imaging-condition features, so environment and imaging conditions interfere with retrieval decisions. For example, the diversity prompt learning method for generalized few-shot remote sensing target recognition disclosed in Chinese patent publication No. CN118097442A uses only a single CLIP visual encoder to output image features and does not distinguish content features from environment features; prompt learning is carried out entirely on the text side, while the image side is still represented by a single feature that mixes content and environmental factors. Second, the prompt space in the prior art is not explicitly conditioned on environment and imaging conditions, so prompt adaptation cannot make full use of environment information and generalization capability is limited.

Therefore, a hybrid method is needed that separates content from environment at the visual feature level and achieves environment adaptation and domain adaptation in the prompt space, thereby improving the robustness and cross-environment generalization performance of power equipment image retrieval in complex environments.

Disclosure of Invention

The invention aims to solve the technical problem of improving the robustness and cross-environment generalization performance of power equipment image retrieval in complex environments.
The invention solves this technical problem by the following technical means. The decoupling characterization learning method for power equipment classification retrieval comprises the following steps:

S1, acquiring power equipment source domain images, labeling equipment types and environment descriptions, and constructing a power equipment image dataset;

S2, constructing a dual-branch visual encoder comprising an environment branch and a content branch, wherein the environment branch extracts the environment embedding contained in an image and the content branch extracts the content embedding contained in the image, and designing a joint loss function;

S3, feeding the environment embedding and the content embedding jointly into an environment-conditioned prompt generation network to obtain an environment prompt and a content-category prompt, and splicing the environment prompt and the content-category prompt to construct a final dynamic prompt sequence;

S4, taking a weighted fusion of the joint loss function and the contrastive learning objective loss function as a total loss function, training the model, and stopping training when the total loss function reaches its minimum to obtain an optimized model; in the retrieval stage, when the input is text, generating a prompt vector with a text encoder, and when the input is an image, generating a dynamic prompt sequence with the optimized model as the prompt vector; comparing the similarity of the prompt vector with the vectors stored in the database and ranking the results; and outputting the best-matching result as the retrieval result.

According to the invention, explicit decoupling of the content