CN-121982637-A - Photovoltaic image identification method based on characterization hybrid guidance and hierarchical semantic coordination
Abstract
The invention provides a few-shot photovoltaic defect image identification method based on characterization hybrid guidance and hierarchical semantic coordination. Step one: building on CLIP, learnable prompt vectors are introduced, the output representations are obtained, and a cross-entropy loss is calculated. Step two: a characterization hybrid guidance module is designed that mixes representations of corresponding categories independently within each modality to expand the representation space, and an adaptive consistency constraint is introduced to reduce distribution disturbance. Step three: a hierarchical semantic coordination mechanism is designed that embeds lightweight modulation units at multiple layers of the visual and text encoders and builds a shared modulation unit to promote deep cross-branch semantic alignment. Step four: a ResNet-based auxiliary discrimination branch is introduced to provide auxiliary supervision. Step five: the main classification, characterization mixing, consistency constraint and auxiliary discrimination losses are optimized jointly. The method thus adapts efficiently to few-shot photovoltaic defect image recognition without full fine-tuning, and offers low training cost, strong generalization and easy deployment.
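Step one of the abstract scores an image against class-level text representations and applies a cross-entropy loss, and step five combines the losses. The following is a minimal sketch of those two pieces only, assuming pooled embeddings are already available as plain Python lists. The text embeddings stand in for the output of the text encoder fed with learnable prompt vectors plus a class template (the encoder is not modelled), `logit_scale` is a hypothetical fixed temperature, and the weights `w_mix`, `w_cons` and `w_aux` are hypothetical hyperparameters that the source does not specify.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """L2-normalise a vector so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def class_logits(image: Vector, class_texts: Sequence[Vector], logit_scale: float = 100.0) -> List[float]:
    """Scaled cosine similarity between one image embedding and each class text embedding.

    `class_texts[c]` stands in for the text-encoder output for class c
    (learnable prompt vectors + class template); the encoder itself is not modelled.
    """
    img = normalize(image)
    return [logit_scale * sum(a * b for a, b in zip(img, normalize(t))) for t in class_texts]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(l_cls: float, l_mix: float, l_cons: float, l_aux: float,
               w_mix: float = 1.0, w_cons: float = 1.0, w_aux: float = 1.0) -> float:
    """Step five: weighted sum of the loss terms (the weights are hypothetical)."""
    return l_cls + w_mix * l_mix + w_cons * l_cons + w_aux * l_aux


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]            # toy class text embeddings
    logits = class_logits([0.9, 0.1], texts)    # toy image embedding, closer to class 0
    print(cross_entropy(logits, 0), cross_entropy(logits, 1))
```

With the toy inputs, the loss for the correct label (class 0) is far smaller than for class 1, which is the signal the main classification loss provides during prompt learning.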
Inventors
- ZHANG BING
- WANG SHENGSHENG
Assignees
- 吉林大学 (Jilin University)
Dates
- Publication Date
- 20260505
- Application Date
- 20260108
Claims (4)
- 1. A photovoltaic image identification method based on characterization hybrid guidance and hierarchical semantic coordination, characterized by comprising at least the following steps:
Step one: construct a multimodal representation of photovoltaic defects based on a vision-language pre-trained model. At the same time, construct a text prompt template containing the defect class, concatenate it with learnable text prompts and feed the result into the text encoder to obtain the corresponding class-level text representations, then compute a cross-entropy classification loss from the image representation and the text representations. The resulting hierarchical representations are output together with the final visual and text representations and the classification loss.
Step two: building on the vision-language multimodal representations constructed in step one, provide a characterization hybrid guidance mechanism to enhance the vision-language representations. Taking the visual and text representations from step one as input, during training apply weighted linear interpolation to the visual representations within the same training batch to generate mixed visual representations, and mix the corresponding text semantic representations synchronously in the same way as the visual representations. A consistency constraint between the original and mixed representations limits semantic drift, balancing expansion of the representation distribution against the stability of the discriminative semantics. This step outputs a mixing enhancement loss and a consistency constraint loss.
Step three: after the vision-language representation enhancement of step two, provide a branch alignment optimization method based on hierarchical semantic coordination to further improve the alignment precision between visual representations and photovoltaic defect semantics. Taking the hierarchical representations from step one as input, introduce a hierarchical semantic coordination mechanism into the multilayer structures of the visual encoder and the text encoder: a visual representation modulation unit, a text representation modulation unit and a cross-branch shared representation modulation unit are arranged at different layers to adaptively modulate the visual and text branches and feed information back to them, and the optimized, adapted representations are output. This achieves fine-grained alignment between visual representations and photovoltaic defect text semantics at multiple levels.
Step four: construct a dual-branch discrimination structure consisting of a vision-language model branch and a ResNet discrimination branch. Photovoltaic defect images annotated with defect types are input, defect types are classified under the supervision of these annotations, and an auxiliary classification loss is output.
Step five: fuse, by weighting, the mixing enhancement loss and the consistency constraint loss produced by the characterization hybrid guidance module in step two with the auxiliary classification loss obtained in step four to construct a joint optimization objective, and train the model under this multi-loss collaborative constraint, improving the stability and accuracy of photovoltaic defect image identification with few samples.
- 2. The photovoltaic image identification method based on characterization hybrid guidance and hierarchical semantic coordination according to claim 1, characterized in that step two provides a new characterization hybrid guidance method to expand the representation space. On the basis of the representations acquired in step one, the sample representations within a training batch are jointly enhanced: by assigning mixing weights to different sample representations, the visual representation of the current sample is recombined with the visual representations of other image samples in the same batch, generating intermediate visual representations that lie between the original sample representations. Following the same recombination relation, the text semantic representations corresponding to those visual representations are combined consistently, preserving the semantic association between the cross-modal branches. Constraining the similarity between the combined representations and the original representations in the representation space suppresses the semantic drift introduced by the combination, so the model keeps its ability to discriminate defect categories while representation diversity increases. An enhancement loss term and a consistency constraint loss term for model optimization are constructed from the combination results.
- 3. The photovoltaic image identification method based on characterization hybrid guidance and hierarchical semantic coordination according to claim 1, characterized in that step three provides a vision-language branch alignment optimization method based on hierarchical semantic coordination. During the representation extraction of step one, hierarchical semantic coordination adaptation structures are introduced at different representation levels of the visual and text encoding paths, so that the multimodal representations are adjusted and fused layer by layer. A visual-side representation adjustment module is applied to the visual representation output by each layer of the visual encoder to correct the original visual representation hierarchically. A corresponding text-side representation adjustment module is applied to the text semantic representation output by each layer of the text encoder to correct it hierarchically. A cross-modal shared semantic representation adjustment module maps the visual and text representations into a unified semantic space, adaptively fuses the information from the two modalities according to learnable weight parameters, and feeds the fused shared semantic representation back into both the visual branch and the text branch. This realizes collaborative cross-modal branch alignment under a multi-level semantic structure and improves the matching consistency between visual representations and photovoltaic defect text semantic representations.
- 4. The photovoltaic image identification method based on characterization hybrid guidance and hierarchical semantic coordination according to claim 1, wherein step four provides a dual-path collaborative discrimination mechanism comprising a cross-modal discrimination path based on the vision-language model and an auxiliary discrimination path based on a residual neural network (ResNet). The cross-modal path determines the defect type by comparing the similarity between the image representation and the text semantic representations of the defect types. The auxiliary path takes photovoltaic defect images annotated with defect types as input, extracts representations with a preset ResNet, and predicts defect types under label supervision to produce an auxiliary discrimination signal for model training. In the auxiliary path only the classification decision layer is updated, while the parameters of the image representation extraction layers remain fixed; this strengthens the coordination between the two paths without weakening the general knowledge learned by the pre-trained model.
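Claim 2 describes a batch-level interpolation in which the text side reuses the pairing and weights of the visual side. The sketch below is one plausible, mixup-style instantiation, assuming pooled per-sample visual embeddings and per-sample class text embeddings of equal dimension. The Beta-distributed weight, the `max(lam, 1 - lam)` rule that keeps the current sample dominant, and both loss formulas are assumptions, since the source gives neither the weight distribution nor the exact loss definitions; `sample_mix_params`, `mix_batch` and `mixing_losses` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity; eps guards against an all-zero mixed vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def sample_mix_params(batch_size: int, alpha: float, rng: random.Random) -> Tuple[float, List[int]]:
    """Draw a mixing weight and a partner permutation for one batch (assumed scheme)."""
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)          # keep the current sample dominant
    perm = list(range(batch_size))
    rng.shuffle(perm)                  # perm[i] is the in-batch partner of sample i
    return lam, perm


def mix_batch(visual: Matrix, text: Matrix, lam: float, perm: Sequence[int]) -> Tuple[Matrix, Matrix]:
    """Interpolate every sample with its in-batch partner.

    The text side reuses exactly the same partner and weight as the visual
    side, so each mixed image still corresponds to its mixed text.
    """
    def mix(rows: Matrix) -> Matrix:
        return [[lam * a + (1.0 - lam) * b for a, b in zip(rows[i], rows[j])]
                for i, j in enumerate(perm)]
    return mix(visual), mix(text)


def mixing_losses(visual: Matrix, text: Matrix, lam: float, perm: Sequence[int]) -> Tuple[float, float]:
    """Return (enhancement loss, consistency loss); both formulas are hypothetical."""
    mixed_v, mixed_t = mix_batch(visual, text, lam, perm)
    n = len(visual)
    # Enhancement: each mixed visual embedding should align with its mixed text embedding.
    l_mix = sum(1.0 - cosine(v, t) for v, t in zip(mixed_v, mixed_t)) / n
    # Consistency: penalise drift of the mixed embedding away from its dominant original,
    # scaled by lam so the constraint tightens as the mix approaches that original.
    l_cons = sum(lam * (1.0 - cosine(m, v)) for m, v in zip(mixed_v, visual)) / n
    return l_mix, l_cons
```

A typical call per batch is `lam, perm = sample_mix_params(len(batch), 1.0, rng)` followed by `mixing_losses(visual, text, lam, perm)`. Mixing the text side with the same `lam` and `perm` is what keeps the cross-modal pairs consistent, as claim 2 requires.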
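Claim 3 leaves the internal form of the modulation units open. Purely for illustration, the sketch below assumes that each branch-specific unit is a zero-initialised per-channel scale-and-shift (FiLM-style) residual, that visual and text states already share one dimension (standing in for the mapping to a unified semantic space), that encoder layers are frozen callables, and that the shared unit fuses the two branches with a single learnable gate logit and feeds the result back with a fixed strength. `ModulationUnit`, `SharedUnit`, `feedback` and `encode_with_coordination` are hypothetical names, not taken from the source.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
Layer = Callable[[Sequence[float]], Vec]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ModulationUnit:
    """Branch-specific unit: per-channel scale and shift applied after a frozen layer."""

    def __init__(self, dim: int):
        # Zero initialisation makes the unit an identity map at the start of training.
        self.scale = [0.0] * dim
        self.shift = [0.0] * dim

    def __call__(self, x: Sequence[float]) -> Vec:
        return [xi * (1.0 + s) + b for xi, s, b in zip(x, self.scale, self.shift)]


class SharedUnit:
    """Cross-branch unit: gated fusion of both branches, fed back into each of them."""

    def __init__(self, gate_logit: float = 0.0, feedback: float = 0.1):
        self.gate_logit = gate_logit   # learnable weight between the two modalities
        self.feedback = feedback       # how strongly the shared state is injected back

    def __call__(self, v: Sequence[float], t: Sequence[float]) -> Tuple[Vec, Vec]:
        g = sigmoid(self.gate_logit)
        shared = [g * a + (1.0 - g) * b for a, b in zip(v, t)]
        v_out = [a + self.feedback * s for a, s in zip(v, shared)]
        t_out = [b + self.feedback * s for b, s in zip(t, shared)]
        return v_out, t_out


def encode_with_coordination(v: Sequence[float], t: Sequence[float],
                             v_layers: List[Layer], t_layers: List[Layer],
                             v_units: List[ModulationUnit], t_units: List[ModulationUnit],
                             shared_units: List[SharedUnit]) -> Tuple[Vec, Vec]:
    """Run both encoders layer by layer, modulating and exchanging information at each layer."""
    for f_v, f_t, m_v, m_t, m_s in zip(v_layers, t_layers, v_units, t_units, shared_units):
        v, t = m_v(f_v(v)), m_t(f_t(t))   # frozen layer, then branch-specific modulation
        v, t = m_s(v, t)                  # shared unit fuses both branches and feeds back
    return list(v), list(t)
```

Because the shared state is injected after every coordinated layer, what each branch receives at layer k+1 already carries information from the other modality, which is the layer-by-layer feedback that claim 3 describes. Only the unit parameters (`scale`, `shift`, `gate_logit`) would be trained; the encoder layers stay fixed.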
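Claim 4 keeps the ResNet feature-extraction layers fixed and updates only the classification decision layer. The sketch below illustrates that training pattern in plain Python, assuming a toy `extract` function as a stand-in for the frozen backbone and a linear softmax head trained with per-sample SGD on cross-entropy; `LinearHead`, `train_auxiliary_branch`, the learning rate and the epoch count are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


class LinearHead:
    """Trainable classification decision layer on top of frozen features."""

    def __init__(self, in_dim: int, num_classes: int):
        self.w = [[0.0] * in_dim for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def logits(self, f: Sequence[float]) -> List[float]:
        return [sum(wi * fi for wi, fi in zip(row, f)) + bi for row, bi in zip(self.w, self.b)]

    def sgd_step(self, f: Sequence[float], label: int, lr: float) -> float:
        """One cross-entropy SGD step on the head only; returns the pre-update loss."""
        p = softmax(self.logits(f))
        loss = -math.log(p[label])
        for c in range(len(self.b)):
            g = p[c] - (1.0 if c == label else 0.0)   # dLoss/dlogit_c
            self.b[c] -= lr * g
            for k, fk in enumerate(f):
                self.w[c][k] -= lr * g * fk
        return loss


def train_auxiliary_branch(extract: Callable[[object], Sequence[float]], head: LinearHead,
                           data: List[Tuple[object, int]], lr: float = 0.5,
                           epochs: int = 20) -> List[float]:
    """`extract` plays the frozen ResNet backbone: it is called but never updated."""
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            total += head.sgd_step(extract(x), y, lr)   # auxiliary classification loss
        history.append(total / len(data))
    return history
```

In a PyTorch implementation the same effect is usually obtained by calling `requires_grad_(False)` on the backbone module and passing only the head's parameters to the optimizer, so the general features learned in pre-training are left untouched.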
Description
Technical Field
The invention relates to a few-shot classification method based on characterization hybrid guidance and hierarchical semantic coordination, and in particular to an intelligent method and system for recognizing and classifying surface defect images of photovoltaic modules.
Background
Photovoltaic (PV) systems are a green, sustainable energy solution. As global demand for renewable energy grows, the share of photovoltaic generation in the energy mix is steadily increasing. Photovoltaic technology converts solar energy into clean, renewable electricity, substantially reduces dependence on fossil fuels and lessens environmental impact. However, during the manufacture, transport and long-term outdoor operation of photovoltaic modules, environmental erosion and physical impact readily cause defects such as cracks, stains, bird droppings, electrical damage and hot spots. These defects markedly reduce generation efficiency, shorten module service life and can even create serious safety hazards such as fire. Rapid and accurate detection and classification of surface defects on photovoltaic modules is therefore essential for keeping photovoltaic power stations running stably and improving their generation revenue. Existing photovoltaic defect detection relies mainly on manual inspection or on analysis of infrared (IR), electroluminescence (EL) and other imaging modalities. Manual inspection is inefficient and highly subjective, and it does not scale to large photovoltaic power stations.
Infrared imaging struggles to detect defects that produce no obvious temperature difference, and its image quality is easily affected by environmental factors such as irradiance, wind speed and surface reflection. Electroluminescence imaging requires electrical excitation and controlled shooting conditions, so it cannot meet the need for rapid outdoor inspection and its range of application is limited. By contrast, detection based on visible-light (RGB) images is low-cost and flexible to acquire: images can be taken with an ordinary camera or an unmanned aerial vehicle without strict constraints on the imaging environment, which makes the approach better suited to low-cost, large-scale rapid inspection at operating power stations. Coverings such as dust, snow and bird droppings on the module surface also markedly reduce generation efficiency, so an effective monitoring and cleaning strategy is important for improving system performance, lowering operation and maintenance costs and reducing resource consumption. In recent years, deep learning methods such as convolutional neural networks and vision transformers have markedly advanced photovoltaic defect detection and raised the level of automation in defect identification. However, these methods generally need large numbers of high-quality labeled samples for training, whereas in real photovoltaic scenarios most defects occur infrequently and samples are expensive to collect, leading to a severe shortage of training data. With only a few samples, conventional deep learning models tend to overfit, and their classification performance degrades.
Large-scale vision-language models such as CLIP acquire strong cross-modal understanding and generalization through joint pre-training on large amounts of image-text data, and show great potential in few-shot tasks. However, these models are mostly trained on general-purpose visual data, whereas photovoltaic module images have weak textures, subtle inter-class differences and strong environmental noise, so their distribution differs from that of general image recognition datasets. When such a model is applied directly to a specialized industrial scenario such as photovoltaic defect recognition, it may suffer from domain shift, insufficient fine-grained representation and inadequate cross-modal semantic alignment. To adapt the large-scale vision-language pre-trained model CLIP effectively to few-shot classification of photovoltaic defect images, we therefore propose a method based on characterization hybrid guidance and hierarchical semantic coordination that achieves accurate and stable classification of photovoltaic module defect images with few labeled samples.