CN-121660920-B - Visual perception and regulation method suitable for inspection of wheel-foot robot

CN121660920BCN 121660920 BCN121660920 BCN 121660920BCN-121660920-B

Abstract

The invention provides a visual perception and regulation method suitable for inspection of a wheel-foot robot, and belongs to the technical field of image processing. The method comprises the steps of collecting an original weak light image of industrial equipment, inputting the original weak light image into a weak light image enhancement model for image enhancement, and processing the weak light image enhancement model, wherein the processing process of the weak light image enhancement model comprises the steps of inputting an image into U-net network prediction noise, denoising the original image based on the prediction noise to obtain a reconstructed image serving as an enhanced image, mapping the reconstructed image from a space domain to a frequency domain to obtain a frequency domain representation, carrying out frequency division on the frequency domain representation, distributing a leavable weight for each frequency band, and constructing a frequency domain constraint item based on a weighting result. The method introduces an uncertainty network in the inversion stage of the diffusion model, adaptively adjusts the weights of a noise branch and a residual branch according to pixel-level uncertainty and time steps, strengthens denoising in a high noise area, reserves a structure in a detail area, enhances the balance of image details and brightness, naturally smoothes output, and realizes dynamic balance of different areas and stages.

Inventors

SONG TAO
XIE BO
ZHANG WENMING
LI YAQIAN

Assignees

燕山大学

Dates

Publication Date: 20260508
Application Date: 20251215

Claims (7)

1. The visual perception and regulation method suitable for the inspection of the wheel-foot robot is characterized by comprising the following steps of: Collecting an original weak light image of industrial equipment; Inputting an original weak light image into a weak light image enhancement model for image enhancement; the weak light image enhancement model processing process comprises the following steps: inputting the image into U-net network prediction noise; denoising the original image based on the prediction noise to obtain a reconstructed image serving as an enhanced image; frequency division is carried out on the frequency domain representation, leachable weights are distributed for each frequency band, and a frequency domain constraint item is constructed based on a weighting result; The U-net network comprises: replacing a characteristic between an encoder and a decoder of the U-net network by a lightweight kernel activation function module to extract a convolution residual block in the backbone network; The lightweight core activation function module comprises: channel selection and nonlinear mapping are carried out on local detail features to obtain feature results fusing linear and nonlinear output, and the formula is expressed as follows: Wherein, the The characteristic results of linear and nonlinear output are fused; Selecting masks for channels only Applying a non-linear mapping to the channels of the plurality of channels to reduce computational cost; for element-by-element multiplication operations; providing a learnable nonlinear transformation capability for KAN kernel function mapping when When the corresponding channel is subjected to KAN nonlinear transformation When the channel remains linear; Wherein, KAN kernel function: Wherein, the Is the characteristic value of the input channel; Is the first Segment spline weights; is the segment node position; Is the number of segments; a linear rectification function, which enables the mapping to have nonlinearity and sparsity; frequency dividing the frequency domain image and assigning a learnable weight to each frequency band, comprising: the loss weights for each band are dynamically adjusted by the attention mask, expressed as: Wherein, the For the channel-averaging pooling feature, Maximizing pooling characteristics for channels; are all the parameters which can be learned, Is a matrix of weights of the convolution, Is a corresponding bias term, different Corresponding to different frequency bands; is a mapping function; is a learnable spatial mask for dynamically adjusting the loss weight of each band And in the frequency-domain constraint, The loss calculation is not entered directly.
2. The method of claim 1, wherein denoising the original image based on the prediction noise to obtain a reconstructed image comprises: Wherein, the Is the first Reconstructing an image; Self-adaptive scheduling weights for residual branches; The current step noise proportion; is prediction noise; Is the prediction residual.
3. The method of claim 1, wherein the training process of the low-light image enhancement model comprises: training the process of obtaining the reconstructed image in a first stage; The first stage loss function is: Wherein, the The term is reconstructed for the pixel and, Is a structure holding item; Is a frequency domain constraint; Training the whole weak light image enhancement model in the second stage, wherein the loss function in the second stage is as follows: Wherein: Reconstructing a loss for the image; Is a frequency domain constraint; Regularization is guided for uncertainty.
4. A method according to claim 3, wherein the frequency domain constraints: Wherein, the Low/medium/high frequency band; Is composed of Dynamically adjusted band weights; Is a phase balance parameter; the brightness and contrast information is contained for the amplitude spectrum; the phase spectrum contains structure and edge information; Representing the diffusion model at the first -1 A reconstructed image generated in step; Representing a reference original image; 、 respectively is the first The magnitude spectrum of the image under each frequency band reflects the brightness and contrast distribution information; 、 respectively is the first The phase spectrum of the image under each frequency band reflects the structure, edge and texture information; Mask by spatial attention By linear mapping functions The generated frequency band weight coefficient is used for dynamically adjusting the contribution of each frequency band in loss; The phase balance parameter is used for controlling the relative weight of two loss parts of amplitude and phase; Representation of Norms, which measure absolute distance of spectral differences.
5. The method of claim 3, wherein the uncertainty directs regularization: Wherein if it If the high result model is not believed, the regularized item has heavy weight and the forced prediction noise is consistent with the real noise Low results in areas where the model has stabilized, reducing constraints, preventing overfitting.
6. A method according to claim 3, wherein the image reconstruction is lost: Wherein, the A reconstructed image obtained by inversion of the model is obtained; inputting an image for original weak light; Is the first Step C, adding the noisy image; outputting for residual branches; a pixel level uncertainty map for a corresponding time step; The variance of the uncertainty graph is used for measuring the degree of dispersion of the prediction confidence; Each balance weight is used for coordinating pixel consistency, residual consistency and uncertainty stability; the brightness of the inversion image is guaranteed to be consistent with that of the real image for the pixel consistency item; as a residual consistency item, the prediction residual of the constraint model is close to the real residual, and the reconstruction capability of structural details is improved; and as an uncertainty stable term, the variance of the pixel-level uncertainty graph is suppressed, and the stability of the sampling process is ensured.
7. A method according to claim 3, wherein the structure-holding item comprises: Wherein, the Is a structure-keeping term for keeping the edge of the original image consistent with the structure during enhancement; A gradient map representing a reconstructed image at a diffusion inversion stage; a gradient map representing an input low-light image; Is characterized by Norms for measuring gradient differences.

Description

Visual perception and regulation method suitable for inspection of wheel-foot robot Technical Field The invention relates to the technical field of image processing, in particular to a visual perception and regulation method suitable for inspection of a wheel-foot robot. Background The belt conveyor is used as core transportation equipment and is widely applied to industrial scenes such as mines, metallurgy, chemical industry, ports and the like. Due to the common problems of insufficient illumination, uneven illumination, dust dispersion, equipment shielding and the like in the field environment, the inspection image has the characteristics of low brightness, poor contrast, obvious noise, missing details and the like, and the recognition precision and the system stability of key faults such as deviation, tearing, material scattering and the like are directly influenced. In recent years, mobile robots are gradually applied to industrial inspection by virtue of the fusion capability of autonomous navigation and multiple sensors, but the carried vision system still faces the problem of image quality degradation under the condition of weak light and non-uniform illumination, and becomes one of main bottlenecks restricting the improvement of intelligent inspection performance. In the field of weak light image enhancement, the prior art is mainly divided into three types, namely a traditional image processing method based on gray level transformation, histogram equalization and the like, wherein the first type has higher calculation efficiency, but is easy to cause excessive enhancement, color distortion and noise amplification and difficult to adapt to complex industrial scenes, the second type has insufficient robustness to non-uniform illumination and contradiction between detail recovery and noise suppression based on a Retinex theory method through an illumination-reflection decomposition model, and the third type has the advantages that the performance in a general scene is superior to that of the traditional method through end-to-end training, but the problems of weak cross-domain generalization capability, poor non-uniform illumination adaptability, high edge calculation resource occupation and the like still exist in the specific scene of industrial inspection. The method comprises the following steps of firstly, locally obviously changing the brightness difference of an image caused by the shadow of a belt conveyor protective cover and equipment, globally enhancing the image, easily causing overexposure of a bright area and detail loss of a dark area, secondly, synchronously amplifying frequency domain noise caused by dust scattering in the enhancement process, and causing texture artifacts, thirdly, having limited platform force, and being difficult for the existing complex network to meet the real-time processing requirement. Therefore, a solution that combines enhancement, noise suppression and computational efficiency is needed to support reliable visual perception in industrial inspection scenarios. Disclosure of Invention In view of the above, the invention provides a visual perception and regulation method suitable for inspection of a wheel-foot robot, which is used for distributing dynamic proportion to noise update through an uncertainty network in a diffusion stage, predicting noise through a lightweight U-net network in an image reconstruction stage, realizing the purpose of ensuring image quality, simultaneously remarkably reducing calculation complexity and reasoning delay, and being suitable for industrial equipment inspection requirements. For this purpose, the invention provides the following technical scheme: A visual perception and regulation method suitable for inspection of a wheel-foot robot comprises the following steps: Collecting an original weak light image of industrial equipment; Inputting an original weak light image into a weak light image enhancement model for image enhancement; the weak light image enhancement model processing process comprises the following steps: inputting the image into U-net network prediction noise; denoising the original image based on the prediction noise to obtain a reconstructed image serving as an enhanced image; And frequency division is carried out on the frequency domain representation, a leachable weight is allocated for each frequency band, and a frequency domain constraint term is constructed based on a weighted result. Further, the U-net network includes: replacing a characteristic between an encoder and a decoder of the U-net network by a lightweight kernel activation function module to extract a convolution residual block in the backbone network; The lightweight core activation function module comprises: channel selection and nonlinear mapping are carried out on local detail features to obtain feature results fusing linear and nonlinear output, and the formula is expressed as follows: Wherein, the The characteristic results of linear a