CN-121982447-A - Point-of-interest detection model training method, product positioning method, device and medium

CN 121982447 A

Abstract

The application discloses a training method for an interest point detection model, a product positioning method, a device, and a medium, relating to the technical field of computer vision. The method comprises: extracting, through an interest point detection model to be trained, first information of each interest point in an original image and second information of each interest point in a transformed image; matching the positions of the interest points in the original image, after homography transformation, with the positions of the interest points in the transformed image to obtain interest point pairs; calculating a sample loss for the sample image pair based on the first information and the second information corresponding to each interest point pair; and adjusting parameters of the interest point detection model based on the sample losses of a plurality of sample image pairs. The application provides an unsupervised training method for an interest point detection model that does not rely on any manually annotated data, improving training efficiency and the model's adaptability to different scenes.

Inventors

  • LV JIANTAO
  • HOU CHUANYONG
  • ZHAO SHIWEN
  • MA CHENHUI
  • LIU KUN
  • DING HAORAN

Assignees

  • Goertek Inc. (歌尔股份有限公司)

Dates

Publication Date
2026-05-05
Application Date
2025-12-29

Claims (10)

  1. A training method for an interest point detection model, characterized by comprising the following steps: acquiring a sample image pair, wherein the sample image pair comprises an original image and a transformed image, and the transformed image is obtained by performing a homography transformation on the corresponding original image; extracting, through an interest point detection model to be trained, first information of each interest point in the original image and second information of each interest point in the transformed image, wherein the first information and the second information at least comprise the positions of the corresponding interest points; matching the positions of the interest points in the original image, after the homography transformation, with the positions of the interest points in the transformed image to obtain interest point pairs, or matching the positions of the interest points in the transformed image, after the inverse of the homography transformation, with the positions of the interest points in the original image to obtain interest point pairs, wherein each interest point pair comprises one interest point in the original image and one interest point in the transformed image; calculating a sample loss for the sample image pair based on the first information and the second information corresponding to each interest point pair; and adjusting parameters of the interest point detection model based on the sample losses of a plurality of the sample image pairs.
  2. The training method for an interest point detection model according to claim 1, wherein the step of matching the positions of the interest points in the original image, after the homography transformation, with the positions of the interest points in the transformed image to obtain interest point pairs comprises: performing the homography transformation on the positions of the interest points in the original image to obtain transformed positions of the interest points in the original image; and performing bidirectional nearest-neighbor matching based on the transformed positions of the interest points in the original image and the positions of the interest points in the transformed image, to obtain interest point pairs that are mutual nearest neighbors in the original image and the transformed image and whose distance is smaller than a preset threshold.
  3. The method according to claim 1, wherein the first information and the second information further comprise scores and descriptors of the corresponding interest points, and the step of calculating a sample loss for the sample image pair based on the first information and the second information corresponding to each interest point pair comprises: calculating an unsupervised point loss based on the scores and positions of the interest points in each interest point pair; calculating a descriptor loss based on the distance between the descriptors of the interest points in each interest point pair; and calculating the sample loss for the sample image pair based on the unsupervised point loss and the descriptor loss.
  4. The method according to claim 3, wherein the step of calculating an unsupervised point loss based on the scores and positions of the interest points in each interest point pair comprises: calculating the square of the difference between the scores of the two interest points in each interest point pair, and summing the results over the interest point pairs to obtain a first term loss; for each interest point pair, calculating the product of the average score and the distance between the position of one interest point and the transformed position of the other interest point, and summing the results over the interest point pairs to obtain a second term loss, wherein the average score is the mean of the scores of the two interest points in the interest point pair; and calculating the unsupervised point loss for the interest point pairs based on the first term loss and the second term loss.
  5. The training method for an interest point detection model according to claim 4, wherein the step of calculating the unsupervised point loss based on the first term loss and the second term loss comprises: multiplying a first summation value by a second summation value to obtain a third term loss, wherein the first summation value is obtained by summing the position distances corresponding to all the interest point pairs, the position distance corresponding to an interest point pair being the distance between the position of one interest point in the pair and the transformed position of the other interest point, and the second summation value is obtained by summing the complements of the average scores corresponding to all the interest point pairs; and calculating the sample loss for the sample image pair based on the first term loss, the second term loss, and the third term loss.
  6. The method according to claim 1, wherein the interest point detection model comprises a backbone network and an interest point decoder; the backbone network is modified from ResNet-18 by replacing the first two layers of ResNet-18 with a 1x1 convolutional layer, removing the last two layers of ResNet-18, and adding a spatial attention module to each basic residual block of ResNet-18 before the residual connection within the block; and the interest point decoder comprises three parallel subtask modules for predicting the position, score, and descriptor of each interest point, respectively.
  7. The training method for an interest point detection model according to any one of claims 1 to 6, wherein the step of acquiring a sample image pair comprises: acquiring a large-size image; dividing the large-size image using a sliding-window strategy to obtain a plurality of original images; and performing a homography transformation on each original image to obtain the corresponding transformed image, thereby obtaining a plurality of sample image pairs each consisting of an original image and its corresponding transformed image.
  8. A product positioning method, comprising: acquiring a reference product image and a target product image; extracting, through an interest point detection model, third information of each interest point in the reference product image and fourth information of each interest point in the target product image; matching the interest points in the reference product image and the target product image based on the third information and the fourth information, and calculating a homography transformation matrix based on the matching result; and positioning a product, or a device of the product, in the target product image based on the homography transformation matrix; wherein the interest point detection model is trained by the training method for an interest point detection model according to any one of claims 1 to 7.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program is configured to implement the steps of the training method for an interest point detection model according to any one of claims 1 to 7, or the steps of the product positioning method according to claim 8.
  10. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the training method for an interest point detection model according to any one of claims 1 to 7, or the steps of the product positioning method according to claim 8.
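The bidirectional nearest-neighbor matching described in claim 2 can be sketched as follows. This is an illustrative reconstruction, not the patented implementation: the array shapes, the function name, and the `max_dist` threshold value are assumptions; the patent only specifies mutual nearest neighbors under a preset distance threshold.

```python
import numpy as np

def match_points(pts_orig, pts_warp, H, max_dist=4.0):
    """Bidirectional nearest-neighbor matching under a homography (sketch).

    pts_orig: (N, 2) interest-point positions in the original image.
    pts_warp: (M, 2) interest-point positions in the transformed image.
    H: (3, 3) homography mapping original-image coordinates to
       transformed-image coordinates.
    Returns index pairs (i, j) that are mutual nearest neighbors with
    distance below max_dist.
    """
    # Apply the homography to the original-image points (homogeneous coords).
    ones = np.ones((len(pts_orig), 1))
    proj = (H @ np.hstack([pts_orig, ones]).T).T
    proj = proj[:, :2] / proj[:, 2:3]

    # Pairwise distances between projected points and transformed-image points.
    d = np.linalg.norm(proj[:, None, :] - pts_warp[None, :, :], axis=-1)

    # Keep only pairs that are nearest neighbors of each other and close enough.
    nn_fwd = d.argmin(axis=1)   # best match in the transformed image
    nn_bwd = d.argmin(axis=0)   # best match in the original image
    return [(i, j) for i, j in enumerate(nn_fwd)
            if nn_bwd[j] == i and d[i, j] < max_dist]
```

Matching in the other direction (claim 1's alternative) would apply the inverse homography to the transformed-image points instead.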
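The three terms of the unsupervised point loss in claims 4 and 5 can be written compactly as below. The claims do not state how the terms are weighted or combined, so the equal-weight sum is an assumption, as are the function and argument names.

```python
import numpy as np

def unsupervised_point_loss(scores_a, scores_b, dists):
    """Three-term unsupervised point loss (sketch of claims 4-5).

    scores_a, scores_b: (K,) scores of the two points in each matched pair.
    dists: (K,) distance between the transformed position of one point in
           a pair and the position of its partner.
    """
    avg = 0.5 * (scores_a + scores_b)
    # First term: squared score difference, summed over pairs
    # (matched points should receive consistent scores).
    term1 = np.sum((scores_a - scores_b) ** 2)
    # Second term: distance weighted by average score, summed over pairs
    # (highly scored points should reproject accurately).
    term2 = np.sum(dists * avg)
    # Third term: total distance times summed score complements
    # (well-localized pairs should not be given low scores).
    term3 = np.sum(dists) * np.sum(1.0 - avg)
    return term1 + term2 + term3  # equal weighting assumed
```

Per claim 3, the sample loss would combine this with a descriptor loss computed from the distances between the paired points' descriptors.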

Description

Point-of-interest detection model training method, product positioning method, device and medium

Technical Field

The application relates to the technical field of computer vision, and in particular to a training method for an interest point detection model, a product positioning method, an electronic device, and a storage medium.

Background

At present, positioning methods based on template matching are widely applied in the assembly of electronic products. Early approaches mainly adopted rule-based matching: features such as corner points, edges, and gray levels are first extracted from a specific region of an image, the features of the current image are then compared with those of a reference image, and a transformation matrix of the current image relative to the reference image is solved; this matrix is then used for tasks such as visual guidance and visual inspection during assembly. Such methods perform well in assembly positioning scenes with feature-rich parts and a stable environment. However, in scenes that lack texture features or whose environment changes dynamically, their performance remains limited because the object to be positioned lacks stable, salient shallow features. As convolutional neural networks have gradually been applied to image processing tasks, they can automatically learn deep image features from training data and fuse multi-dimensional information such as edges, gray levels, and corner points, offering clear advantages over rule-based methods. However, neural networks based on supervised learning generally require a large amount of manually annotated data, which imposes a heavy annotation burden and often requires re-annotation when the production line changes, so their flexibility is insufficient.
The foregoing is provided merely to facilitate understanding of the technical solutions of the present application and does not constitute an admission that it is prior art.

Disclosure of Invention

The main purpose of the application is to provide a training method for a point-of-interest detection model, a product positioning method, an electronic device, and a storage medium, and in particular an unsupervised (also called adaptive) training method for the point-of-interest detection model that does not rely on any manually annotated data and improves training efficiency and the model's adaptability to different scenes. To achieve the above object, the application provides a training method for an interest point detection model, comprising: acquiring a sample image pair, wherein the sample image pair comprises an original image and a transformed image, and the transformed image is obtained by performing a homography transformation on the corresponding original image; extracting, through an interest point detection model to be trained, first information of each interest point in the original image and second information of each interest point in the transformed image, wherein the first information and the second information at least comprise the positions of the corresponding interest points; matching the positions of the interest points in the original image, after the homography transformation, with the positions of the interest points in the transformed image to obtain interest point pairs, or matching the positions of the interest points in the transformed image, after the inverse of the homography transformation, with the positions of the interest points in the original image to obtain interest point pairs, wherein each interest point pair comprises one interest point in the original image and one interest point in the transformed image; calculating a sample loss for the sample image pair based on the first information and the second information corresponding to each interest point pair; and adjusting parameters of the interest point detection model based on the sample losses of a plurality of the sample image pairs.

Optionally, the step of matching the positions of the interest points in the original image, after the homography transformation, with the positions of the interest points in the transformed image to obtain interest point pairs includes: performing the homography transformation on the positions of the interest points in the original image to obtain transformed positions of the interest points in the original image; and performing bidirectional nearest-neighbor matching based on the transformed positions of the interest points in the original image and the positions of the interest points in the transformed image, to obtain interest point pairs that are mutual nearest neighbors in the original image and the transformed image and whose distance is smaller than a preset threshold.
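The sample-pair acquisition of claim 7 (sliding window over a large image, then one homography per crop) might look like the following sketch. The window size, stride, corner-jitter scheme for sampling a random homography, and the DLT solver are all illustrative assumptions; warping a crop with its returned matrix (e.g. via OpenCV's `warpPerspective`) would yield the transformed image of the pair.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve the 8-DoF homography mapping src corners to dst via DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.array(A))
    H = vt[-1].reshape(3, 3)   # null-space vector of the DLT system
    return H / H[2, 2]

def sample_pairs(image, win=128, stride=64, jitter=8, rng=None):
    """Sliding-window sample generation (sketch of claim 7).

    Splits a large image into win x win crops with the given stride and
    builds a random homography for each crop by jittering its corners.
    Returns a list of (crop, H) tuples.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    pairs = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            crop = image[y:y + win, x:x + win]
            src = np.array([[0, 0], [win, 0], [win, win], [0, win]], float)
            dst = src + rng.uniform(-jitter, jitter, src.shape)
            pairs.append((crop, homography_from_corners(src, dst)))
    return pairs
```

Each (crop, H) tuple, once the crop is warped by H, gives one sample image pair for the unsupervised training loop described above.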