CN-122023535-A - Space non-cooperative target pose estimation method based on two-dimensional-three-dimensional matching
Abstract
The invention relates to a space non-cooperative target pose estimation method based on two-dimensional-three-dimensional matching. The method comprises the steps of: constructing a three-dimensional feature reference model of the target from multi-view images; extracting two-dimensional features from an input query image of the target to be measured; establishing robust correspondences between the two-dimensional features of the query image and the three-dimensional features of the reference model with a coarse-to-fine matching strategy; estimating the initial pose parameters of the target with a robust pose solving algorithm based on the two-dimensional-three-dimensional feature matches; performing consistency estimation and refinement of the pose result under a differentiable reprojection error constraint; and finally outputting the six-degree-of-freedom pose information of the target. The invention achieves high-precision, high-stability pose estimation under complex illumination changes, viewpoint changes, and partial occlusion, without relying on artificial markers or an accurate prior model of the target.
Inventors
- LIU HAIBO
- Zeng Jingshuo
Assignees
- Hunan University (湖南大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260414
Claims (8)
- 1. A method for estimating the pose of a spatial non-cooperative target based on two-dimensional-three-dimensional matching, characterized by comprising the following steps: S1, acquiring multi-view image data of the spatial non-cooperative target; S2, reconstructing a three-dimensional feature reference model of the target by multi-view triangulation from the multi-view image data, the reference model comprising the target's set of three-dimensional space points and the three-dimensional feature description information associated with those points; S3, inputting a query image of the target to be measured, extracting its two-dimensional feature information, and establishing robust correspondences between the two-dimensional features of the query image and the three-dimensional features of the reference model with a coarse-to-fine matching strategy; S4, solving the initial pose of the target to be measured by PnP (Perspective-n-Point) from these correspondences; and S5, iteratively refining the initial pose with a differentiable reprojection error, and outputting the final six-degree-of-freedom pose information of the target to be measured.
- 2. The method according to claim 1, wherein step S1 specifically comprises: imaging the spatial non-cooperative target from different observation positions with a monocular camera, acquiring a plurality of images covering different view angles of the target, the acquired images being denoted $\{I_i\}_{i=1}^{N}$, where $N$ is the number of acquired images and each image $I_i$ corresponds to a unique imaging pose and imaging parameters; the imaging process follows the pinhole camera model, and a world coordinate system, a camera coordinate system, and a pixel coordinate system are established to describe the mapping between three-dimensional space points and two-dimensional image pixels; any three-dimensional point of the target is represented in the world coordinate system by its homogeneous coordinates $[X_w, Y_w, Z_w, 1]^{T}$, the subscript $w$ denoting the world coordinate system and $T$ the transpose (converting a row vector into a column vector), and its pixel coordinates in the image satisfy the projection relationship

  $$ s\,[u, v, 1]^{T} = K\,[R \mid t]\,[X_w, Y_w, Z_w, 1]^{T} \qquad (1.1) $$

  where $R$ and $t$ are respectively the rotation matrix and translation vector from the world coordinate system to the camera coordinate system, $[u, v, 1]^{T}$ are the homogeneous pixel coordinates of the three-dimensional point in the image, $s$ is a scale factor, and $K$ is the camera intrinsic parameter matrix, of the form

  $$ K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \qquad (1.2) $$

  where $f_x$, $f_y$, $c_x$, $c_y$ are the camera intrinsic parameters.
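The pinhole projection of Eqs. (1.1)-(1.2) can be sketched in a few lines. This is an illustrative example only; the focal lengths, principal point, and pose values below are made up and are not taken from the patent.

```python
import numpy as np

def project(X_w, K, R, t):
    """Project a 3-D world point to pixels: s * [u, v, 1]^T = K (R X_w + t)."""
    X_c = R @ X_w + t            # world -> camera coordinates
    uv1 = K @ X_c                # apply the intrinsic matrix
    return uv1[:2] / uv1[2]      # divide by the scale factor s = Z_c

# Intrinsic matrix K with focal lengths (fx, fy) and principal point (cx, cy)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                    # identity rotation for the example
t = np.array([0.0, 0.0, 5.0])    # camera 5 units in front of the target

uv = project(np.array([0.0, 0.0, 0.0]), K, R, t)
print(uv)  # the world origin projects to the principal point (320, 240)
```

A point on the optical axis always lands on the principal point, which is a quick sanity check for any implementation of Eq. (1.1).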
- 3. The method according to claim 1, wherein constructing the three-dimensional feature reference model in step S2 specifically comprises: extracting two-dimensional features from each image of the multi-view image data to obtain two-dimensional feature points and their corresponding feature description vectors, the set of two-dimensional feature points extracted from the $i$-th image being denoted $\{p_i^j\}_{j=1}^{n_i}$, where $p_i^j$ is the $j$-th two-dimensional feature point of the $i$-th image and $n_i$ is the number of feature points extracted from that image; performing cross-view matching of the two-dimensional feature points of different view images based on the geometric constraints among the multi-view images, and recovering the corresponding three-dimensional space points by multi-view triangulation to obtain the target's three-dimensional point set in the space coordinate system $\{X_j\}_{j=1}^{M}$, where $M$ is the number of three-dimensional points and $X_j$ is the $j$-th three-dimensional space point; for each three-dimensional space point $X_j$, collecting its corresponding two-dimensional feature description vectors over the multiple views to form a feature set $\mathcal{D}_j$, and performing weighted feature aggregation of the multi-view two-dimensional feature description vectors of the same three-dimensional point $X_j$ to construct the three-dimensional feature description of that point:

  $$ f_j = \mathrm{Agg}\big(\{\, w_i\, d_i \mid i \in \mathcal{V}_j \,\}\big) \qquad (1.4) $$

  where $\mathrm{Agg}(\cdot)$ denotes the feature aggregation function, $f_j$ is the initial feature representation of the three-dimensional point, $d_i$ denotes a two-dimensional feature associated with the three-dimensional point, $w_i$ is the weight coefficient of the corresponding two-dimensional feature, representing the contribution of the features of different views to the construction of the three-dimensional feature, and $\mathcal{V}_j$ is the index set of the views from which the three-dimensional point $X_j$ can be observed; through this weighted feature aggregation, the multi-view two-dimensional feature information is effectively mapped and fused onto the three-dimensional space points, yielding the three-dimensional feature reference model $\{(X_j, f_j)\}_{j=1}^{M}$.
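The weighted aggregation of Eq. (1.4) can be sketched as below. The softmax weighting over per-view quality scores is an assumption for illustration; the claim only requires weight coefficients that reflect each view's contribution, not a specific weighting scheme.

```python
import numpy as np

def aggregate(descs, scores):
    """Fuse multi-view 2-D descriptors of one 3-D point into one 3-D descriptor.

    descs:  (V, C) descriptors of the same point observed in V views.
    scores: (V,)   per-view quality scores (assumed input, e.g. detector response).
    """
    w = np.exp(scores - scores.max())
    w = w / w.sum()                       # weight coefficients w_i, summing to 1
    return (w[:, None] * descs).sum(0)    # weighted sum over the view index set

descs = np.random.default_rng(0).normal(size=(4, 128)).astype(np.float32)
scores = np.array([1.0, 0.5, 0.2, 0.8])
f3d = aggregate(descs, scores)
print(f3d.shape)  # (128,) — one aggregated 3-D feature descriptor
```

The aggregation keeps the descriptor dimension unchanged, so 3-D features remain directly comparable with the 2-D features extracted from a query image.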
- 4. The two-dimensional-three-dimensional matching-based spatial non-cooperative target pose estimation method according to claim 1, wherein the coarse-to-fine matching strategy in step S3 comprises: based on the three-dimensional feature reference model obtained in step S2, representing the coarse-level three-dimensional features as $F^{3D} \in \mathbb{R}^{M \times C}$, where $M$ is the number of three-dimensional feature points and $C$ is the feature dimension; preprocessing the query image of the target to be measured and feeding it into a feature extraction network to obtain the coarse-level two-dimensional feature map of the query image, which is flattened into $F^{2D} \in \mathbb{R}^{(H' W') \times C}$, where $H$ and $W$ denote respectively the height and width of the input image and $H'$, $W'$ the spatial resolution of the feature map after downsampling; in the coarse matching stage, positional encodings are respectively added to the coarse-level two-dimensional features of the query image and the coarse-level three-dimensional features of the reference model, and the features are enhanced by stacked attention modules comprising self-attention and cross-attention, stacked $L$ times, yielding the enhanced coarse-level features $\tilde{F}^{2D}$ and $\tilde{F}^{3D}$, computed as

  $$ \hat{F}^{(l)} = \mathrm{SelfAttn}\big(F^{(l)}\big), \qquad F^{(l+1)} = \mathrm{CrossAttn}\big(\hat{F}^{(l)}_{2D}, \hat{F}^{(l)}_{3D}\big) \qquad (1.5) $$

  where $\mathrm{SelfAttn}(\cdot)$ denotes the self-attention function, modelling the global dependencies between different positions within the same feature, $\mathrm{CrossAttn}(\cdot)$ denotes the cross-attention function, fusing information between the two-dimensional and three-dimensional modality features, $l$ is the layer index of the feature enhancement module, $F^{(l)}_{2D}$ and $F^{(l)}_{3D}$ denote respectively the two-dimensional and three-dimensional feature representations in the $l$-th feature enhancement layer, and $\hat{F}^{(l)}_{2D}$, $\hat{F}^{(l)}_{3D}$ denote the updated features of the $l$-th layer after the self-attention and cross-attention computation; based on the enhanced two-dimensional features $\tilde{F}^{2D}$ and three-dimensional features $\tilde{F}^{3D}$, computing the similarity score matrix $S$ between the two-dimensional and three-dimensional features, where the similarity of the three-dimensional feature point $i$ and the two-dimensional feature point $j$ is defined as

  $$ S_{ij} = \frac{\langle \tilde{f}^{3D}_i, \tilde{f}^{2D}_j \rangle}{\tau} \qquad (1.6) $$

  where $\langle \cdot, \cdot \rangle$ denotes the vector inner product and $\tau$ is a scaling factor; applying the Softmax operation to the similarity score matrix $S$ along the two-dimensional dimension and the three-dimensional dimension respectively, and constructing the coarse matching probability matrix $P$ through a bidirectional consistency constraint, its elements being defined as

  $$ P_{ij} = \mathrm{softmax}_i(S_{\cdot j}) \cdot \mathrm{softmax}_j(S_{i \cdot}) \qquad (1.7); $$

  screening the coarse matching probability matrix $P$ by mutual nearest neighbours combined with a matching confidence threshold $\theta$ to obtain the coarse-level two-dimensional-three-dimensional correspondence set

  $$ \mathcal{M}_c = \{\, (i, j) \in \mathrm{MNN}(P) \mid P_{ij} \ge \theta \,\} \qquad (1.8) $$

  where $\mathrm{MNN}(P)$ denotes the set of matching pairs constructed by the mutual-nearest-neighbour criterion, used to screen the bidirectional nearest-neighbour matches from the similarity matrix; in the fine matching stage, for each coarsely matched three-dimensional feature point and its corresponding two-dimensional coarse match position, cropping from the fine-level two-dimensional feature map of the query image a local window centred at that position to obtain the local fine-level two-dimensional features, where $c$ denotes the feature channel dimension, and, together with the fine-level three-dimensional features of the corresponding three-dimensional point, performing local correlation enhancement through several layers of self-attention and cross-attention and computing the conditional matching probability distribution over each position $x$ within the local window

  $$ p(x \mid i) = \mathrm{softmax}_x\big(\langle \tilde{f}^{3D}_i, \tilde{f}^{2D}(x) \rangle\big) \qquad (1.9); $$

  refining the two-dimensional position to sub-pixel accuracy in expectation form based on this conditional matching probability distribution to obtain the fine-match two-dimensional position

  $$ \hat{x}_i = \sum_{x} p(x \mid i)\, x \qquad (1.10), $$

  thereby establishing the robust correspondence set $\{(X_i, \hat{x}_i)\}$ between the two-dimensional features of the query image and the three-dimensional features of the three-dimensional feature reference model.
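The coarse matching stage (Eqs. 1.6-1.8) reduces to an inner-product similarity with scaling, a dual-softmax probability matrix, and mutual-nearest-neighbour screening with a confidence threshold. The sketch below omits the attention-based feature enhancement; the toy features and the values $\tau = 0.1$, $\theta = 0.2$ are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def coarse_match(F3d, F2d, tau=0.1, theta=0.2):
    S = (F3d @ F2d.T) / tau                       # similarity matrix, Eq. (1.6)
    P = softmax(S, axis=0) * softmax(S, axis=1)   # dual-softmax probabilities, Eq. (1.7)
    matches = []
    for i in range(P.shape[0]):
        j = P[i].argmax()
        # mutual nearest neighbour + confidence threshold, Eq. (1.8)
        if P[:, j].argmax() == i and P[i, j] >= theta:
            matches.append((i, j))
    return matches

rng = np.random.default_rng(1)
F3d = np.eye(5, 32)                        # orthonormal toy 3-D features
F2d = F3d + 0.01 * rng.normal(size=F3d.shape)  # noisy copies: true match is i == j
print(coarse_match(F3d, F2d))  # recovers the identity matching [(0, 0), ..., (4, 4)]
```

The mutual-nearest-neighbour test discards one-sided matches, which is what makes the coarse correspondences reliable enough to seed the fine, window-based refinement stage.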
- 5. The method according to claim 1, wherein step S4 specifically comprises: for each element $(X_i, \hat{x}_i)$ of the robust correspondence set obtained by the fine matching of step S3, where the three-dimensional point $X_i$ denotes three-dimensional feature point coordinates in the reference model and $\hat{x}_i$ denotes the finely matched two-dimensional pixel coordinates in the query image, the two-dimensional-three-dimensional projection relationship, combined with the camera intrinsic matrix $K$, is expressed as

  $$ s\,[\hat{x}_i, 1]^{T} = K\,(R X_i + t) \qquad (1.11) $$

  where $s$ is a scale factor, $R \in \mathrm{SO}(3)$ is the rotation matrix to be solved, $\mathrm{SO}(3)$ denoting the three-dimensional rotation group, and $t \in \mathbb{R}^3$ is the translation vector; together they constitute the six-degree-of-freedom pose of the target to be measured relative to the camera coordinate system; a random sample consensus (RANSAC) mechanism is introduced in solving Eq. (1.11), and the initial pose parameters of the target to be measured are estimated by screening the inlier set that satisfies a reprojection error threshold constraint.
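A full PnP + RANSAC pipeline draws minimal samples and runs a minimal pose solver (e.g. P3P) on each; that solver is omitted here, and the sketch only illustrates the inlier-screening step the claim describes: scoring a pose hypothesis by counting correspondences whose reprojection error falls below a threshold. All numeric values are made-up examples.

```python
import numpy as np

def reproject(X, K, R, t):
    """Project a batch of 3-D points (N, 3) to pixel coordinates (N, 2)."""
    x = (K @ (R @ X.T + t[:, None])).T
    return x[:, :2] / x[:, 2:3]

def count_inliers(X, uv, K, R, t, thresh_px=2.0):
    """Count correspondences whose reprojection error is below the threshold."""
    err = np.linalg.norm(reproject(X, K, R, t) - uv, axis=1)
    mask = err < thresh_px
    return int(mask.sum()), mask

rng = np.random.default_rng(2)
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1.0]])
R_true, t_true = np.eye(3), np.array([0.0, 0.0, 4.0])
X = rng.uniform(-1, 1, size=(30, 3))
uv = reproject(X, K, R_true, t_true)
uv[:5] += 50.0                       # corrupt 5 matches to act as outliers

n_in, mask = count_inliers(X, uv, K, R_true, t_true)
print(n_in)  # 25 — the corrupted matches are rejected as outliers
```

In RANSAC the hypothesis with the largest inlier count wins, and the final pose is typically re-estimated from its inlier set only.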
- 6. The method according to claim 1, wherein step S5 specifically comprises: for each three-dimensional point $X_j$, its projected position under the current pose parameters $\xi$ can be written as

  $$ x_j(\xi) = \pi\big(K \exp(\xi^{\wedge})\, \tilde{X}_j\big) \qquad (1.12) $$

  where $\tilde{X}_j$ denotes the homogeneous coordinates of $X_j$, $K$ is the camera intrinsic matrix, $\exp(\cdot)$ is the exponential map from the Lie algebra to the Lie group, and $\pi(\cdot)$ is the projection function; on this basis a reprojection error model is constructed:

  $$ E(\xi) = \sum_{j} \big\| \hat{x}_j - x_j(\xi) \big\|^{2} \qquad (1.13); $$

  the reprojection error given by this model is incorporated into a differentiable Levenberg-Marquardt optimization framework to iteratively refine the initial pose obtained in step S4 until the final six-degree-of-freedom pose information of the target to be measured is output.
- 7. The method according to claim 6, wherein the step of optimizing the initial pose in the differentiable Levenberg-Marquardt optimization framework in step S5 is specifically as follows: in the forward propagation stage, computing from the current pose parameters $\xi$ the residual vector $r(\xi)$ and its Jacobian matrix $J = \partial r(\xi) / \partial \xi$; building the Levenberg-Marquardt normal equation from the obtained residual vector and the corresponding Jacobian matrix,

  $$ (J^{T} J + \lambda I)\, \delta\xi = -J^{T} r, $$

  solving for the pose parameter increment $\delta\xi$ and completing the pose update $\xi \leftarrow \delta\xi \circ \xi$ to obtain the gradient of the reprojection error, where $\lambda$ is a damping factor; in the back propagation stage, the gradient of the reprojection error is back-propagated to jointly update the pose parameters and the parameters of the feature extraction and matching network; after repeated iterative optimization, the pose parameters $\xi$ converge to the optimal solution, which is converted through the exponential map into the rigid-body transformation matrix of the target to be measured in the camera coordinate system:

  $$ T^{*} = \exp\big((\xi^{*})^{\wedge}\big) = \begin{bmatrix} R^{*} & t^{*} \\ 0 & 1 \end{bmatrix} \qquad (1.14) $$

  where $R^{*}$ denotes the optimal rotation matrix of the target and $t^{*}$ the optimal translation vector of the target.
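One Levenberg-Marquardt iteration of claim 7 can be sketched on a toy reprojection problem. To keep the example short the pose is parameterised here by the translation only and the Jacobian is taken numerically; the patent's version optimises the full six-degree-of-freedom pose on SE(3) via the exponential map, with analytic or autodiff Jacobians.

```python
import numpy as np

def project(t, X, K):
    """Translation-only projection of points X (N, 3) to pixels (N, 2)."""
    x = (K @ (X + t).T).T
    return x[:, :2] / x[:, 2:3]

def residuals(t, X, uv, K):
    return (project(t, X, K) - uv).ravel()   # stacked reprojection errors r

def lm_step(t, X, uv, K, lam=1e-3, eps=1e-6):
    r = residuals(t, X, uv, K)
    J = np.empty((r.size, 3))
    for k in range(3):                       # numerical Jacobian dr/dt
        dt = np.zeros(3)
        dt[k] = eps
        J[:, k] = (residuals(t + dt, X, uv, K) - r) / eps
    # normal equation (J^T J + lam * I) delta = -J^T r
    delta = np.linalg.solve(J.T @ J + lam * np.eye(3), -J.T @ r)
    return t + delta

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1.0]])
X = np.random.default_rng(3).uniform(-1, 1, size=(20, 3)) + np.array([0, 0, 5.0])
t_true = np.array([0.1, -0.2, 0.3])
uv = project(t_true, X, K)                   # synthetic observed pixels

t = np.zeros(3)
for _ in range(10):
    t = lm_step(t, X, uv, K)
print(t)  # close to the true translation [0.1, -0.2, 0.3]
```

The damping factor `lam` interpolates between Gauss-Newton (small `lam`) and gradient descent (large `lam`); adaptive schedules for it are standard but omitted here.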
- 8. The method according to claim 3, wherein in step S2, for each set of three or more two-dimensional feature points satisfying the cross-view matching relationship, the corresponding three-dimensional point $X_j$ is estimated by the following minimum reprojection error criterion:

  $$ X_j = \arg\min_{X} \sum_{i \in \mathcal{V}_j} \big\| p_i^{j} - \pi\big(K (R_i X + t_i)\big) \big\|^{2} \qquad (1.3) $$

  where $p_i^{j}$ denotes the two-dimensional observation position of the $j$-th three-dimensional point in the $i$-th image, $\mathcal{V}_j$ denotes the set of images from which that three-dimensional point can be observed, and $\pi(\cdot)$ is the projection function.
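The multi-view triangulation of Eq. (1.3) is in practice often initialised with the standard linear (DLT) solution sketched below, which the non-linear reprojection-error minimisation then refines; the camera layout here is an illustrative example.

```python
import numpy as np

def triangulate(projections, points2d):
    """Linear (DLT) triangulation.

    projections: list of 3x4 matrices P_i = K [R_i | t_i].
    points2d:    matching 2-D observations (u, v), one per view.
    """
    A = []
    for P, (u, v) in zip(projections, points2d):
        A.append(u * P[2] - P[0])   # each view contributes two linear equations
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    Xh = Vt[-1]                     # null vector = homogeneous solution
    return Xh[:3] / Xh[3]           # de-homogenise

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1.0]])
X_true = np.array([0.3, -0.2, 4.0])
Ps, obs = [], []
for tx in (-1.0, 0.0, 1.0):         # three laterally shifted views
    Rt = np.hstack([np.eye(3), np.array([[tx], [0.0], [0.0]])])
    P = K @ Rt
    x = P @ np.append(X_true, 1.0)
    Ps.append(P)
    obs.append(x[:2] / x[2])

X = triangulate(Ps, obs)
print(X)  # recovers the 3-D point [0.3, -0.2, 4.0]
```

With noisy observations the DLT estimate is biased, which is exactly why claim 8 refines it by minimising the reprojection error over all observing views.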
Description
Space non-cooperative target pose estimation method based on two-dimensional-three-dimensional matching

Technical Field

The invention relates to the technical field of space target pose estimation, and in particular to a space non-cooperative target pose estimation method based on two-dimensional-three-dimensional matching.

Background

With the rapid development of tasks such as on-orbit servicing, space debris removal, and autonomous rendezvous and docking, pose estimation of space non-cooperative targets has become a key technology for autonomous operation and high-precision control of spacecraft. Because non-cooperative targets lack artificial markers and communication cooperation, acquiring their pose faces many challenges in feature extraction, matching robustness, and model generalization. Existing pose estimation methods for space non-cooperative targets mainly comprise traditional vision methods based on geometric models and methods based on deep learning. The traditional methods depend on a known three-dimensional model and manually designed features; they are strongly dependent on the accuracy of the target model and on the imaging conditions, and because feature matching is unstable under illumination changes, large viewpoint changes, or insufficient target surface texture, pose estimation easily fails.
With the rapid development of computer vision and deep learning technology, pose estimation of space non-cooperative targets based on learned features has gradually become the mainstream direction of research and application. Such methods use a neural network to automatically extract target features from images and complete target recognition, keypoint localization, or pose parameter estimation, improving the processing capability in complex scenes to a certain extent. However, existing methods rely on large-scale annotated data for training, depend to some degree on the target type and imaging conditions, and have limited generalization when facing space environments with unknown targets, occlusion, or illumination changes. The present method extracts two-dimensional feature points of the target from a query image and establishes correspondences with the features of a pre-constructed three-dimensional target model, forming matching pairs between two-dimensional image features and three-dimensional space points. Once a sufficient number of matches is obtained, the position and attitude of the target can be estimated using the perspective projection geometric model and a pose solving algorithm. The method no longer depends on manual CAD modeling, can fully exploit the geometric constraints of three-dimensional space, and enhances adaptability to complex illumination, low texture, and partial occlusion through a coarse-to-fine feature matching mechanism; for this purpose, the invention develops a space non-cooperative target pose estimation method based on two-dimensional-three-dimensional matching.
Disclosure of Invention

In view of these problems, the invention provides a space non-cooperative target pose estimation method based on two-dimensional-three-dimensional matching, which achieves high-precision and highly robust estimation without requiring an accurate prior CAD model of the target. The technical scheme adopted to solve the technical problem is a space non-cooperative target pose estimation method based on two-dimensional-three-dimensional matching, comprising the following steps: S1, acquiring multi-view image data of the spatial non-cooperative target; S2, reconstructing a three-dimensional feature reference model of the target by multi-view triangulation from the multi-view image data, the reference model comprising the target's set of three-dimensional space points and the three-dimensional feature description information associated with those points; S3, inputting a query image of the target to be measured, extracting its two-dimensional feature information, and establishing robust correspondences between the two-dimensional features of the query image and the three-dimensional features of the reference model with a coarse-to-fine matching strategy; S4, solving the initial pose of the target to be measured by PnP (Perspective-n-Point) from these correspondences; and S5, iteratively refining the initial pose with a differentiable reprojection error, and outputting the final six-degree-of-freedom pose information of the target to be measured. Preferably, step S1 specifically comprises: imaging a space non-cooperative target at different observation positions by a monocular