CN-115661872-B - Robust palm region of interest positioning method in natural scene


Abstract

The invention discloses a robust method for positioning the palm region of interest in a natural scene. The method comprises: obtaining an input image and detecting the palm region and the palm contour region in it to obtain a palm region and contour region feature map; inputting the feature map into a palm region of interest positioning network model; regressing the feature map through a positioning network in the model to obtain posture correction parameters; spatially transforming the palm region feature map, then downsampling the image and magnifying the finger roots to generate an adjusted feature map; performing key point coordinate fusion regression on the adjusted feature map to obtain first key point coordinates; inversely transforming those coordinates into the coordinate system of the original input image to obtain second key point coordinates; and extracting a palm region of interest image based on the second key point coordinates. The invention handles large-range translation, rotation and similar variations without relying on infrared imaging to remove the background, and meets the requirement of contactless palm region of interest positioning in natural scenes.
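
The inverse transformation from first to second key point coordinates can be illustrated with a minimal sketch. It assumes, purely for illustration, that the posture correction parameters form a 2x3 affine matrix mapping original image coordinates to the corrected frame; the text above does not state the parameterisation, and the function names and numeric values below are hypothetical.

```python
import numpy as np

def make_correction(angle_deg, scale, tx, ty):
    """Build a hypothetical 2x3 posture correction matrix (rotation, scale, shift)."""
    a = np.deg2rad(angle_deg)
    c, s = scale * np.cos(a), scale * np.sin(a)
    return np.array([[c, -s, tx], [s, c, ty]])

def apply_affine(theta, points):
    """Map an (N, 2) array of (x, y) points through a 2x3 affine matrix."""
    theta = np.asarray(theta, dtype=float)
    pts = np.asarray(points, dtype=float)
    return pts @ theta[:, :2].T + theta[:, 2]

def invert_affine(theta):
    """Return the 2x3 matrix that undoes a 2x3 affine matrix."""
    full = np.vstack([np.asarray(theta, dtype=float), [0.0, 0.0, 1.0]])
    return np.linalg.inv(full)[:2, :]

# Key points in the original image (illustrative values only).
original_xy = np.array([[100.0, 50.0], [20.0, 80.0]])
theta = make_correction(angle_deg=30.0, scale=0.5, tx=10.0, ty=-4.0)

# "First" key points live in the corrected frame; "second" key points are
# recovered in the original frame by applying the inverse correction.
first_xy = apply_affine(theta, original_xy)
second_xy = apply_affine(invert_affine(theta), first_xy)
```

Under this assumption the round trip is exact up to floating-point error, which is why key points regressed on the corrected, downsampled map can still be used to crop the region of interest from the full-resolution input.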

Inventors

LIANG XU
CHEN JUNAN
ZHANG DAPENG

Assignees

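Shenzhen Institute of Artificial Intelligence and Robotics for Society (深圳市人工智能与机器人研究院)
The Chinese University of Hong Kong, Shenzhen (香港中文大学（深圳）)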

Dates

Publication Date: 20260508
Application Date: 20221021

Claims (9)

1. A method for positioning a robust palm region of interest in a natural scene, the method comprising: acquiring an input palm image, and detecting a palm region and a palm contour region in the input image to obtain a palm region and contour region feature map; inputting the palm region and contour region feature map into a palm region of interest positioning network model trained with a weakly supervised training strategy, regressing the palm region and contour region feature map through a positioning network of the trained model to obtain palm posture correction parameters, performing a spatial transformation on the palm region feature map based on the posture correction parameters, and performing downsampling and finger root magnification on the image to generate an adjusted feature map; performing fusion regression on the key point coordinates of the adjusted feature map to obtain first key point coordinates of the fusion regression, and inversely transforming the coordinates back to the coordinate system of the original input image based on the palm posture correction parameters to obtain second key point coordinates of the palm region of interest corresponding to the input palm image; and extracting a palm region of interest image based on the obtained second key point coordinates; wherein performing fusion regression on the key point coordinates of the adjusted feature map specifically comprises: performing attention-aware extraction of key point features through a multi-head attention network, and then performing global regression on the key point coordinates through a multilayer perceptron to obtain key point coordinates predicted by global regression; performing attention-aware feature extraction on the adjusted feature map through a Vision Transformer network, detecting local key points through a convolutional neural network (CNN), and mapping the key point responses to local regression key point coordinates through a differentiable spatial-to-numerical transform; fusing the key point coordinates obtained by global regression and the key point coordinates obtained by local regression using a fusion weight predicted from the features of the current sample, to obtain the first key point coordinates of the fusion regression; and calculating a finger root edge distance loss, the finger root edge distance loss being defined as: [formula missing from the source text], where [symbol missing] is the sequence of Euclidean distances between the [number missing] finger root edges and the corresponding [number missing] finger root key points, and [symbol missing] is the mean finger root edge distance loss.
2. The method for positioning a robust palm region of interest in a natural scene according to claim 1, wherein training the palm region of interest positioning network model with the weakly supervised training strategy comprises: classifying pixels in an input first palm image with a weak palm region segmenter, which uses a classifier trained on a set of palm images with monochromatic backgrounds, to extract a palm region, and performing edge detection on the palm region to obtain a palm contour; processing the palm contour with an iterative palm region of interest positioning algorithm to obtain first annotation information for the input first palm image; extracting palm pixels based on the palm region of the first palm image, and compositing the palm pixels with a natural background image from a natural image gallery to generate a second palm image in a natural environment; applying the same random spatial transformation to both the second palm image and the first annotation information to obtain a third palm image and second annotation information, then applying a random image quality perturbation to the third palm image to obtain a fourth palm image, thereby obtaining an augmented training sample containing the fourth palm image and the second annotation information; and training the palm region of interest positioning network model with the obtained training sample.
3. The method for positioning a robust palm region of interest in a natural scene according to claim 2, wherein the iterative palm region of interest positioning algorithm specifically comprises: detecting key points in the extracted palm region and verifying the number of key points; when verification confirms that the complete 5 finger points and 4 finger valley points are detected, executing a palm region of interest positioning method based on distance extreme points; when verification shows that the complete 5 finger points and 4 finger valley points are not obtained, performing line-scan-based finger edge detection, and executing a line-scan-based palm region of interest positioning method when 4 finger points and 3 finger valley points are detected; when 4 finger points and 3 finger valley points are not detected, moving to the next scanning position and performing line-scan-based finger edge detection again, checking again whether 4 finger points and 3 finger valley points are found, executing the line-scan-based palm region of interest positioning method when they are detected, and moving to the next scanning position to perform line-scan-based finger edge detection again when they are not; and exiting detection when the scan triggers a stop condition.
4. The method for positioning a robust palm region of interest in a natural scene according to claim 3, wherein the iterative palm region of interest positioning algorithm and the line-scan-based finger edge detection further comprise: the iterative palm region of interest positioning algorithm obtaining a palm region of interest through a positioning method and then performing anomaly detection on the palm region of interest; the line-scan-based finger edge detection scanning the brightness values of the palm region map vertically from top to bottom within the input palm region, wherein a normal four-finger edge is detected when the complete change pattern occurs, and, when the complete change pattern is not detected, the vertical scan fails, the scan position moves one step to the right, and the next vertical scan is performed; and when the scan goes beyond the preset scanning area, triggering the scan termination condition and exiting scan detection.
5. The method for positioning a robust palm region of interest in a natural scene according to claim 1, wherein acquiring the input palm image and detecting the palm region and the palm contour region in the input image to obtain the palm region and contour region feature map specifically comprises: extracting primary features from the image through a backbone network with a multi-scale pyramid structure; processing the obtained primary features with a semantic segmentation network to generate a region feature map comprising three channels, namely a background region map, a palm region map and a palm contour region map, the semantic segmentation output of the three channels being supervised by a focal loss; and preprocessing the generated three-channel region feature map to generate a feature map containing six channels.
6. The method for positioning a robust palm region of interest in a natural scene according to claim 1, wherein extracting the palm region of interest image based on the obtained second key point coordinates specifically comprises: evaluating the training sample size against a preset threshold; when the sample size reaches the threshold, using the second key point coordinates to output the corner point coordinates of the palm region of interest; when the sample size does not reach the threshold, using the second key point coordinates to output the centre point of the palm region of interest and the finger root key point coordinates, from which a coordinate system is established and a square palm region of interest is positioned; and performing a projective transformation on the obtained corner point coordinates of the palm region of interest to extract the palm region of interest image.
7. A device for positioning a robust palm region of interest in a natural scene, for implementing the method for positioning a robust palm region of interest in a natural scene according to any one of claims 1 to 6, the device comprising: a palm region and contour extraction module, configured to acquire an input palm image and detect a palm region and a palm contour region in the input image to obtain a palm region and contour region feature map; a palm posture adjustment module, configured to input the palm region and contour region feature map into a palm region of interest positioning network model trained with a weakly supervised training strategy, regress the palm region and contour region feature map through a positioning network of the trained model to obtain palm posture correction parameters, perform a spatial transformation on the palm region feature map based on the posture correction parameters, and perform downsampling and finger root magnification on the image to generate an adjusted feature map; a palm key point coordinate fusion regression module, configured to perform fusion regression on the key point coordinates of the adjusted feature map to obtain first key point coordinates of the fusion regression, and inversely transform the coordinates back to the coordinate system of the original input image based on the palm posture correction parameters to obtain second key point coordinates of the palm region of interest corresponding to the input palm image; and a palm region of interest image extraction module, configured to extract a palm region of interest image based on the obtained second key point coordinates.
8. A feature extraction identifier, wherein the feature extraction identifier performs palm biometric information detection on a palm region of interest image obtained by the method for positioning a robust palm region of interest in a natural scene according to any one of claims 1 to 6, and performs user identification and verification based on the detected palm biometric information.
9. A terminal device, comprising a memory, a processor, and a program for positioning a robust palm region of interest in a natural scene that is stored in the memory and runs on the processor, wherein the processor, when executing the program, implements the steps of the method for positioning a robust palm region of interest in a natural scene according to any one of claims 1 to 6.
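
The fusion regression recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented network: the softmax-normalised form of the spatial-to-numerical transform, the convex-combination form of the fusion, and the names `dsnt` and `fuse` are assumptions made for illustration, and the global coordinates and fusion weight that the claim obtains from learned networks are replaced by fixed example values.

```python
import numpy as np

def dsnt(heatmaps):
    """Differentiable spatial-to-numerical transform (illustrative form).

    heatmaps: (K, H, W) raw key point responses.
    Returns (K, 2) expected (x, y) coordinates, normalised to [-1, 1].
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)   # numerical stability
    prob = np.exp(flat)
    prob /= prob.sum(axis=1, keepdims=True)         # softmax over all pixels
    prob = prob.reshape(k, h, w)
    xs = (2 * np.arange(w) + 1) / w - 1             # pixel-centre x coordinates
    ys = (2 * np.arange(h) + 1) / h - 1             # pixel-centre y coordinates
    x = (prob.sum(axis=1) * xs).sum(axis=1)         # expectation over columns
    y = (prob.sum(axis=2) * ys).sum(axis=1)         # expectation over rows
    return np.stack([x, y], axis=1)

def fuse(global_xy, local_xy, weight):
    """Blend global and local key point estimates.

    weight: scalar or (K,) values in [0, 1]; assumed here to weight the
    global branch, with (1 - weight) applied to the local branch.
    """
    w = np.clip(np.asarray(weight, dtype=float), 0.0, 1.0).reshape(-1, 1)
    return w * np.asarray(global_xy) + (1.0 - w) * np.asarray(local_xy)

# Stand-in inputs: one key point with a sharp response at row 0, column 4.
heat = np.full((1, 5, 5), -50.0)
heat[0, 0, 4] = 50.0
local_xy = dsnt(heat)                    # local branch (CNN responses + DSNT)
global_xy = np.array([[0.6, -0.6]])      # stand-in for the MLP global regression
first_xy = fuse(global_xy, local_xy, weight=0.25)
```

Because the transform is an expectation over a normalised response map rather than a hard argmax, the local branch stays differentiable, which lets the fused coordinates be trained end to end together with the global branch.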

Description

Robust palm region of interest positioning method in natural scene

Technical Field

The present invention relates to the field of image processing, and in particular to a method and apparatus for positioning a robust palm region of interest in a natural scene, a feature extraction identifier, and a device.

Background

Palm print recognition technology automatically identifies a user from the biometric characteristics of the palm surface. Palm print recognition comprises palm image acquisition, palm print region of interest positioning, palm print feature extraction, and feature matching and recognition. In palm print region of interest positioning, the background is complex and the palm posture is highly variable, so this step has become the bottleneck of the recognition system. Current palm print region of interest positioning schemes mostly rely on infrared imaging to remove the background and cannot cope with large-range palm rotation, translation and scaling. They therefore cannot meet the requirement of contactless palm region of interest positioning in natural scenes and are inconvenient for users, a problem that remains to be solved. Accordingly, the prior art is still in need of improvement and development.

Disclosure of Invention

In view of the defects of the prior art, the application provides a robust method, apparatus, device and medium for positioning the palm region of interest in a natural scene. A palm-related region feature map is obtained through a deep network, which separates the palm background map from the palm-related region feature map; the obtained feature map is then corrected and the palm region key point coordinates are obtained by coordinate regression, so that a high-definition palm region of interest (ROI) image can be extracted from those coordinates. The method is robust to complex backgrounds and natural palm postures, and can successfully position the ROI in palm images captured in natural scenes by different devices.

To address the above drawbacks of the prior art, a first aspect of the present application provides a method for positioning a robust palm region of interest in a natural scene, the method comprising: acquiring an input palm image, and detecting a palm region and a palm contour region in the input image to obtain a palm region and contour region feature map; inputting the palm region and contour region feature map into a palm region of interest positioning network model trained with a weakly supervised training strategy, regressing the palm region and contour region feature map through a positioning network of the trained model to obtain palm posture correction parameters, performing a spatial transformation on the palm region feature map based on the posture correction parameters, and performing downsampling and finger root magnification on the image to generate an adjusted feature map; performing fusion regression on the key point coordinates of the adjusted feature map to obtain first key point coordinates of the fusion regression, and inversely transforming the coordinates back to the coordinate system of the original input image based on the palm posture correction parameters to obtain second key point coordinates of the palm region of interest corresponding to the input palm image; and extracting a palm region of interest image based on the obtained second key point coordinates.

The palm region of interest positioning network model is trained with the weakly supervised training strategy as follows: classifying pixels in an input first palm image with a weak palm region segmenter, which uses a classifier trained on a set of palm images with monochromatic backgrounds, to extract a palm region, and performing edge detection on the palm region to obtain a palm contour; processing the palm contour with an iterative palm region of interest positioning algorithm to obtain first annotation information for the input first palm image; extracting palm pixels based on the palm region of the first palm image, and compositing the palm pixels with a natural background image from a natural image gallery to generate a second palm image in a natural environment; applying the same random spatial transformation to both the second palm image and the first annotation information to obtain a third palm image and second annotation information, then applying a random image quality perturbation to the third palm image to obtain a fourth palm image, thereby obtaining an augmented training sample containing the fourth palm image and the second annotation information; and training the palm region of interest positioning network model with the obtained training sample. Th