CN-121985220-A - Micro-assembly two-stage automatic focusing method based on space-frequency combined image quality evaluation
Abstract
The invention provides a micro-assembly two-stage automatic focusing method based on space-frequency combined image quality evaluation, belonging to the field of micro-scale precise operation. The method adopts a dual-camera micro-vision architecture and comprises four core modules: a WaveMamba-IQA sharpness evaluation model, large-range automatic focusing of the horizontal camera, geometric-prior-based calculation of the diagonal camera's initial position, and small-range precise automatic focusing of the diagonal camera. During focusing, images acquired by the horizontal camera are fed into the WaveMamba-IQA model for no-reference sharpness scoring; the model jointly models the spatial-domain and frequency-domain features of the images and computes reliable sharpness scores for images at different positions, and the horizontal camera performs global optimization over a large search space to precisely locate its optimal position. Combining prior knowledge of the microsphere size with the geometric relationship between the two cameras, the theoretical sharp-focus position of the diagonal camera is then derived, and a small-range search is performed in the neighborhood of that theoretical position, finally achieving whole-process high-precision automatic focusing for both cameras.
Inventors
- Sun Mingzhu
- Zhang Jianpeng
- Zhao Xin
- Kang Tianbo
Assignees
- Nankai University (南开大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-03-16
Claims (10)
- 1. A micro-assembly two-stage automatic focusing method based on space-frequency combined image quality evaluation, characterized by comprising the following steps: S1, stably placing microspheres and erecting a dual-camera micro-vision framework according to the microsphere positions, the framework comprising a horizontal camera and a diagonal camera; S2, building a WaveMamba-IQA sharpness evaluation model that jointly models the spatial-domain and frequency-domain characteristics of images and computes reliable sharpness scores for images at different positions; S3, performing large-range focusing of the horizontal camera, in which the horizontal camera uses the covariance matrix adaptation evolution strategy (CMA-ES) to execute global optimization over a large search space and precisely locate the optimal camera position; S4, after the horizontal camera is focused, further focusing the diagonal camera by estimating its sharp imaging position a priori from the microsphere size information and the position change of the horizontal camera; S5, performing small-range fine focusing of the diagonal camera, searching a small range centered on the estimated sharp position, so as to finally achieve whole-process high-precision automatic focusing of both cameras.
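As an illustrative sketch only (not part of the claimed invention), the S3–S5 control flow of claim 1 can be expressed as a small orchestration function. All function names here are hypothetical placeholders; the real scoring functions would capture camera images and run them through the sharpness model.

```python
def two_stage_autofocus(score_horizontal, score_diagonal,
                        coarse_search, prior_estimate, fine_search):
    """Skeleton of the two-stage procedure: coarse horizontal focus (S3),
    geometric prior for the diagonal camera (S4), local refinement (S5).
    All arguments are caller-supplied callables (hypothetical names)."""
    z_h = coarse_search(score_horizontal)   # S3: large-range global search
    z0 = prior_estimate(z_h)                # S4: prior sharp-position estimate
    z_d = fine_search(score_diagonal, z0)   # S5: small-range fine search
    return z_h, z_d
```

A toy usage with synthetic one-dimensional "sharpness" functions shows the data flow; real deployments would replace each callable with hardware-backed routines.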
- 2. The micro-assembly two-stage automatic focusing method based on space-frequency combined image quality evaluation of claim 1, characterized in that in S1 the horizontal camera adopts a low-magnification design for global observation and the diagonal camera adopts a high-magnification design for capturing high-resolution microsphere edges; sharp imaging is achieved in the diagonal field of view, center fitting of the microsphere is completed, and camera calibration parameters are combined to provide an accurate geometric basis for subsequent spatial positioning.
- 3. The micro-assembly two-stage automatic focusing method based on space-frequency combined image quality evaluation of claim 1, wherein in S2, given an input image I, the WaveMamba-IQA model first divides I into N non-overlapping patches {x_1, ..., x_N}; all patches are fed in parallel into two complementary feature extraction branches, a frequency-domain branch and a spatial semantic branch, which respectively model the frequency-domain detail information and the spatial semantic information closely related to image sharpness; the features from the two branches are concatenated along the channel dimension to obtain the joint feature representation F = Concat(F_f, F_s), where F_f is the frequency-domain feature representation and F_s is the spatial semantic feature representation.
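As an illustrative sketch (not the claimed implementation), the two operations named in claim 3, splitting an image into non-overlapping patches and concatenating the two branch features along the channel dimension, can be written with NumPy. The patch size and feature dimensions below are arbitrary assumptions.

```python
import numpy as np

def patchify(image, p):
    """Split an H x W x C image into non-overlapping p x p patches
    (H and W are assumed divisible by p)."""
    h, w, c = image.shape
    assert h % p == 0 and w % p == 0
    return (image.reshape(h // p, p, w // p, p, c)
                 .transpose(0, 2, 1, 3, 4)   # group the two patch axes together
                 .reshape(-1, p, p, c))      # -> (num_patches, p, p, c)

def fuse_features(f_freq, f_spatial):
    """Joint feature F = Concat(F_f, F_s) along the channel (last) dimension."""
    return np.concatenate([f_freq, f_spatial], axis=-1)
```

The reshape/transpose trick avoids explicit loops; each row of the result is one patch token fed to both branches.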
- 4. The micro-assembly two-stage automatic focusing method based on space-frequency joint image quality evaluation of claim 3, wherein the frequency-domain branch performs multi-scale frequency decomposition of the image via wavelet transform and models the high-frequency sub-bands with a Transformer layer, so as to effectively capture the edge and texture detail characteristics highly correlated with image sharpness, finally obtaining the frequency-domain feature representation F_f.
- 5. The method for micro-assembly two-stage automatic focusing based on space-frequency joint image quality evaluation of claim 4, wherein the frequency-domain feature extraction in the frequency-domain branch proceeds as follows: 1) a discrete wavelet transform is applied to each input patch, decomposing it into one low-frequency sub-band and several high-frequency sub-bands, thereby explicitly separating the structural and detail information of the image; letting an input patch be x, its wavelet decomposition can be formally written as DWT(x) = {x_LL, x_LH, x_HL, x_HH}, where x_LL is the low-frequency approximation component, mainly containing the overall brightness and structural information of the image, and x_LH, x_HL and x_HH are the high-frequency detail components in different directions, concentrating the edge, texture and fine structural variations; 2) all low- and high-frequency sub-bands are concatenated along the channel dimension and fed into a Transformer layer for feature modeling; this step learns the global dependencies among different frequency components while retaining multi-scale frequency information, thus extracting a more discriminative frequency-domain feature representation.
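As an illustrative sketch of step 1) above, a one-level 2-D Haar wavelet transform (one common choice of discrete wavelet; the patent does not name the wavelet family, so Haar is an assumption) can be implemented directly with NumPy, yielding the x_LL approximation and the three directional detail sub-bands:

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar DWT of a 2-D array with even side lengths.
    Returns (LL, LH, HL, HH): the low-frequency approximation and the
    horizontal / vertical / diagonal high-frequency detail sub-bands."""
    a, b = x[0::2, :], x[1::2, :]
    lo_rows = (a + b) / 2.0            # average of row pairs (low-pass)
    hi_rows = (a - b) / 2.0            # difference of row pairs (high-pass)
    def split_cols(m):
        c, d = m[:, 0::2], m[:, 1::2]
        return (c + d) / 2.0, (c - d) / 2.0
    LL, LH = split_cols(lo_rows)
    HL, HH = split_cols(hi_rows)
    return LL, LH, HL, HH
```

On a constant (structure-free) patch all three detail sub-bands are exactly zero, which is why the high-frequency sub-bands concentrate edge and texture information.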
- 6. The micro-assembly two-stage automatic focusing method based on space-frequency joint image quality evaluation of claim 3, wherein the spatial semantic branch extracts global spatial semantic features of the image based on a Vision Transformer, and the intermediate features of layers 6 to 9 of the Vision Transformer are concatenated and fused to form the spatial semantic feature representation F_s, enhancing the model's ability to represent information at different receptive-field scales.
- 7. The micro-assembly two-stage automatic focusing method based on space-frequency combined image quality evaluation of claim 3, wherein in S2 the WaveMamba-IQA model introduces an MDTA Mamba Block module to realize efficient fusion of the frequency-domain and spatial semantic branch features; the fused feature F is sent to the MDTA Mamba Block module for global modeling to enhance long-range dependency modeling, and the final image sharpness score is regressed through a linear layer and a ReLU activation function for sharpness evaluation in the automatic focusing task. The MDTA Mamba Block module comprises an MDTA Block unit and a Vision Mamba Block unit: the MDTA Block unit implicitly encodes global context while significantly reducing computational complexity, efficiently enhancing the fused features from the spatial- and frequency-domain branches, and the Vision Mamba Block unit performs efficient global modeling of the space-frequency fused features, improving the accuracy and robustness of sharpness evaluation in the automatic focusing task while preserving computational efficiency.
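The MDTA and Mamba components are beyond a short sketch, but the regression head named at the end of claim 7 (linear layer plus ReLU producing a scalar sharpness score) is simple enough to illustrate. This is a hypothetical minimal version assuming mean pooling over patch tokens before the linear layer, which the claim does not specify:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def score_head(fused_tokens, W, b):
    """Regress a non-negative scalar sharpness score from fused features.
    fused_tokens: (num_tokens, d) array; W: (d,) weights; b: scalar bias.
    Pooling choice (mean over tokens) is an assumption, not from the patent."""
    pooled = fused_tokens.mean(axis=0)   # global average over patch tokens
    return relu(pooled @ W + b)          # linear layer + ReLU
```

The ReLU guarantees a non-negative score, which is convenient when the score is used directly as an optimization fitness value.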
- 8. The micro-assembly two-stage automatic focusing method based on space-frequency combined image quality evaluation of claim 1, characterized in that in S3 the WaveMamba-IQA model is used for sharpness evaluation and CMA-ES is used for a global optimization search over the camera position: let z denote the camera position along the optical axis and I(z) the image acquired at that position; the WaveMamba-IQA score can then be expressed as s(z) = f(I(z)), where f denotes the WaveMamba-IQA model. One iteration of CMA-ES can be expressed as z_i ~ N(m_k, σ_k² C_k), i = 1, ..., λ, where m_k and C_k are the search distribution mean and covariance matrix at the k-th iteration, describing the central position and distribution shape of the candidate solutions in the current camera-position search space, and the sampling under this Gaussian distribution generates a set of candidate camera positions. For each candidate position z_i, the corresponding image is captured and input into the WaveMamba-IQA model to obtain the sharpness score s(z_i), which serves as the fitness function of the CMA-ES; z*_k = argmax_i s(z_i) denotes the candidate camera position with the highest sharpness score in the current round.
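As an illustrative sketch of the sample-score-update loop in claim 8, the following is a deliberately simplified one-dimensional evolution strategy, not a full CMA-ES (no covariance adaptation or evolution paths): candidates are sampled from a Gaussian around the current mean, scored, and the distribution is moved toward the best half while the step size contracts. The synthetic score function stands in for capturing and scoring real images.

```python
import numpy as np

def es_focus_search(score_fn, m0, sigma0, lam=8, iters=30, seed=0):
    """Simplified 1-D evolution strategy for focus search (a stand-in for
    CMA-ES): sample lam candidate positions from N(m, sigma^2), score each,
    recombine the best half into the new mean, and shrink sigma.
    The 0.85 decay factor is an arbitrary assumption."""
    rng = np.random.default_rng(seed)
    m, sigma = m0, sigma0
    best_z, best_s = m0, score_fn(m0)
    for _ in range(iters):
        zs = m + sigma * rng.standard_normal(lam)   # candidate positions z_i
        ss = np.array([score_fn(z) for z in zs])    # fitness s(z_i)
        elite = zs[np.argsort(ss)[-lam // 2:]]      # best half of this round
        m = elite.mean()                            # recombination -> new mean
        sigma *= 0.85                               # contract the search range
        if ss.max() > best_s:
            best_s, best_z = ss.max(), zs[np.argmax(ss)]
    return best_z
```

A real CMA-ES additionally adapts the full covariance matrix C_k and step size from accumulated evolution paths; mature implementations such as the `pycma` package are preferable in practice.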
- 9. The micro-assembly two-stage automatic focusing method based on space-frequency combined image quality evaluation of claim 1, wherein in S4 the prior estimation of the diagonal camera's sharp imaging position proceeds as follows: 1) assume that for a microsphere of radius r, when the horizontal and diagonal cameras both reach the sharp imaging state, the corresponding sharp position of the diagonal camera is recorded as z_d and the horizontal camera position as z_h; when the actual radius of the microsphere to be assembled becomes r', the corresponding sharp position of the diagonal camera and the horizontal camera position are recorded as z_d' and z_h' respectively; 2) under ideal geometric conditions, the change in microsphere radius displaces the top of the microsphere in the vertical direction, and, because the microspheres are placed at different positions, the horizontal camera position also changes correspondingly during refocusing; taking both factors into account, the sharp imaging position of the diagonal camera can be approximated as z_d' ≈ z_d + (z_h' − z_h) + (r' − r)·e, where e is the unit vector along the camera's optical axis, (z_h' − z_h) reflects the overall geometric displacement of the system introduced by the change in the horizontal camera's focus position, and (r' − r) represents the axial displacement of the sphere center caused by the change in microsphere radius.
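The prior estimate in claim 9 is a single closed-form update. A scalar (along-axis) sketch follows; note the equation as reconstructed here is itself an editorial approximation of the patent's formula, whose symbols were lost in translation, so treat both the formula and the code as illustrative.

```python
def prior_diag_position(z_d, r, r_new, dz_horizontal):
    """Prior estimate of the diagonal camera's sharp-focus position:
    z_d' ~= z_d + (z_h' - z_h) + (r' - r), with all displacements measured
    along the camera's optical axis (scalar 1-D form of the claim's formula).
    dz_horizontal is the observed horizontal-camera focus shift z_h' - z_h."""
    return z_d + dz_horizontal + (r_new - r)
```

For a 3-D stage, each term would be multiplied by the optical-axis unit vector e before summation.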
- 10. The method for micro-assembly two-stage automatic focusing based on space-frequency joint image quality evaluation of claim 9, wherein in S5 a small-range search is performed centered on the estimated sharp position z_d' of the diagonal camera; specifically, with a fixed step Δ along the camera's optical axis, a set of candidate images is acquired within the neighborhood of z_d', which can be expressed as {I(z_d' + kΔ) | k = −K, ..., K}, where kΔ represents the displacement relative to the initial position of the diagonal camera, I(·) denotes the image at a given position, and K is the number of steps corresponding to the search radius, determining the extent of the local search interval; all candidate images are input into the WaveMamba-IQA model for sharpness scoring, and the position with the highest score is finally selected as the optimal sharp imaging position of the diagonal camera: z* = argmax_k f(I(z_d' + kΔ)). Because the search range at this stage is effectively constrained by the geometric prior estimate of the previous stage, the focusing process only needs to run over a limited local interval, so high-precision automatic focusing can be achieved with very few image samples, balancing focusing efficiency and system robustness.
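The fixed-step local search of claim 10 reduces to an argmax over 2K+1 candidate positions. A minimal sketch, with a synthetic score function standing in for capturing and scoring real images:

```python
def local_focus_search(score_fn, z0, step, k_max):
    """Small-range fine focusing: evaluate z0 + k*step for k in [-k_max, k_max]
    and return the position with the highest sharpness score.
    score_fn stands in for f(I(z)), the model score of the image at z."""
    candidates = [z0 + k * step for k in range(-k_max, k_max + 1)]
    return max(candidates, key=score_fn)
```

With K = 4 this needs only 9 image acquisitions, which is how the geometric prior of claim 9 keeps the sampling count low.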
Description
Micro-assembly two-stage automatic focusing method based on space-frequency combined image quality evaluation
Technical Field
The invention relates to the field of micro-scale precise operation, in particular to a micro-assembly two-stage automatic focusing method based on space-frequency combined image quality evaluation.
Background
Micro-assembly technology is a core technology for micro-scale precise operation and is widely applied in high-end manufacturing fields such as aerospace, biomedicine and micro-electromechanical systems. The precise assembly of microspheres and microtubules is a typical micro-assembly task with extremely high requirements on the operation precision and automation level of the micro-assembly robot. An integrated micro-vision system is the key means of guiding the micro-assembly robot to achieve precise microsphere assembly: the micro-vision system must clearly capture the microsphere outline in order to accurately estimate its spatial position and then provide visual guidance for robot operation. However, in actual micro-assembly environments there is complex reflection interference and the microspheres to be assembled vary in size, so traditional automatic focusing strategies are easily affected by these factors, their performance becomes unstable, and clear, reliable microsphere imaging is difficult to obtain. As a result, the current micro-assembly process still relies heavily on manual focusing, which severely restricts the operating efficiency and full automation of micro-assembly systems.
Image sharpness evaluation is the core of micro-vision automatic focusing technology: images at different focus positions are scored for sharpness, and the imaging position with the highest score is taken as the optimal focus position. Around image sharpness evaluation, the prior art has formed two technical families: traditional gradient-feature-based methods and a new generation of deep-learning-based no-reference image quality evaluation methods. The two closest prior techniques are a micro-assembly automatic focusing method based on the Tenengrad gradient operator combined with quadratic curve fitting, and a no-reference image quality evaluation method based on the MANIQA model.
The Tenengrad-operator-plus-quadratic-fitting method has the following defects: 1) limited feature extraction capability: it relies on the hand-designed Tenengrad gradient feature to quantify sharpness, can only detect simple edge gradient information of an image, cannot extract the deep semantic and structural characteristics of microsphere images, and is highly sensitive to noise, illumination change and background interference in the micro-assembly environment, so that in actual operating environments with complex reflection interference its sharpness scores fluctuate strongly and its focusing performance is unstable; 2) insufficient focusing precision: quadratic curve fitting is essentially an approximate fit to discrete score samples, is easily affected by outlier samples and falls into local extrema, cannot accurately find the globally optimal sharpest imaging position, adapts poorly to microsphere size changes, and produces larger focusing errors for microspheres of different sizes; 3) poor robustness: it cannot cope with microsphere size variation in the micro-assembly scene, generalizes weakly, and is difficult to match to the range of microsphere sizes to be assembled.
The MANIQA-based no-reference image quality evaluation method has the following defects: 1) it is a general-purpose no-reference image quality evaluation model designed to recognize various distortion types such as Gaussian noise, compression artifacts, motion blur and defocus blur; it is not specifically designed for defocus-blur discrimination in micro-assembly scenes, is insufficiently sensitive to minute camera position changes, and cannot meet the core precision requirements of micro-assembly; 2) its feature extraction dimension is single: it focuses only on spatial-domain features and does not mine the frequency-domain information (such as the high-frequency detail features of microsphere outlines) that is critical for sharpness discrimination, so the fine edge details of microspheres, which are the key to accurately estimating their spatial positions, are hard to capture, and the accuracy of sharpness scoring ultimately suffers; 3) the model is not specially designed for defocus blur identification.