
CN-122024295-A - Lip movement password verification method based on heterogeneous characteristics

CN 122024295 A

Abstract

The invention discloses a lip movement password verification method based on heterogeneous characteristics, comprising the following steps. Step S1, model construction and pre-training: a face detection model, a static feature capturing model and a dynamic embedded perception model are constructed on the basis of a heterogeneous neural network and pre-trained on a self-collected large-scale face data set. Step S2, video acquisition and preprocessing: a lip movement password video is acquired and input, together with the lip movement password, into the face detection model. Because the method is built on a large-scale pre-trained model, it adapts well to different languages (even mixtures of several languages) without retraining at deployment and effectively withstands interference from complex environments. Compared with existing static face-based identity verification methods, it effectively resists attacks by current face forgery techniques. Moreover, the registration database stores only the encrypted identity features encoded by the model and retains no original lip movement videos, so the personal privacy of users is better protected.

Inventors

  • ZHANG XIAOMING

Assignees

  • 六域科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-01-05

Claims (9)

  1. A lip movement password verification method based on heterogeneous characteristics, characterized by comprising the following steps: Step S1, model construction and pre-training, namely constructing a face detection model, a static feature capturing model and a dynamic embedded perception model based on a heterogeneous neural network, and pre-training on a self-collected large-scale face data set; Step S2, video acquisition and preprocessing, namely acquiring a lip movement password video, inputting it together with the lip movement password into the face detection model, determining the coordinates of the region where the face is located and its key points, correcting the coordinates according to the affine transformation principle and the positions of reference points, and adaptively cropping the original video to obtain an aligned face video segment; Step S3, performing sparse sampling on the facial video segment, inputting it into the static feature capturing model, extracting spatial identity features, and comparing them with the feature vectors in a registration database; if the static features are successfully matched, entering the next step; and Step S4, dynamic feature extraction and verification, namely inputting the lip video segment obtained in step S2 into the dynamic embedded perception model to extract dynamic lip features, comparing the extracted dynamic features with the time sequence password features in the registration database, and if the dynamic features are successfully matched, completing the double verification and confirming the identity.
  2. The method for verifying a lip movement password based on heterogeneous characteristics according to claim 1, wherein the face is a lip region or a full human face.
  3. The method for verifying a lip movement password based on heterogeneous features of claim 1, wherein the static feature capture model is used for extracting high-dimensional features with identity specificity from a single-frame lip print or face image.
  4. The method for verifying a lip movement password based on heterogeneous characteristics as set forth in claim 3, wherein said static feature capture model is comprised of a downsampling layer, three cascaded composite two-dimensional convolution layers, and a spatially global pooling layer, the optimization objective being to obtain model weights such that the largest intra-class distance on the constructed data set is substantially smaller than the smallest inter-class distance.
  5. The method for verifying the lip movement password based on the heterogeneous features of claim 1, wherein the dynamic embedded perception model is used for eliminating interference information in complex lip movement passwords and mapping the original lip movement video into a latent space as a time sequence feature with a globally unique position, so that discriminative one-dimensional lip movement codes are perceived.
  6. The method for verifying a lip movement password of claim 5, wherein the dynamic embedded perception model comprises a three-dimensional convolution layer, a residual convolution layer, a time sequence encoder, a dynamic decoder and a linear classification layer, wherein the time sequence encoder and the dynamic decoder adopt a self-attention model as the backbone network.
  7. The method for verifying a lip movement password based on heterogeneous features of claim 1, wherein said registration database requires a registered user to provide a customized lip movement password video in the user registration stage, and the video is processed by the two sub-models to extract a plurality of encrypted high-dimensional identity features.
  8. The method for verifying a lip movement password based on heterogeneous features of claim 7, wherein the features comprise static features and dynamic features; the static features comprise lip print or face information, and the dynamic features are the lip language codes of the video, i.e., the captured time sequence dynamic features.
  9. The method for verifying a lip movement password based on heterogeneous characteristics of claim 7, wherein during said registration stage, a user may decide whether to include face information for authentication: if face information is included, the verification method simultaneously uses static face information and the dynamic lip language password for identity authentication; if face information is not included, the verification method uses only static lip print information and the dynamic lip language password for double verification.
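The affine alignment described in claim 1 (step S2) maps detected facial key points onto fixed reference-point positions before cropping. Below is a minimal numpy sketch of one standard way to do this (a least-squares similarity transform, Umeyama's method); the patent does not specify the landmark set, the reference layout, or the solver, so the five-point layout and all values here are illustrative assumptions.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping src landmarks onto dst reference points (Umeyama's method)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)          # 2x2 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflection
    D = np.diag([1.0, d])
    R = U @ D @ Vt                            # rotation
    scale = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = dst_mean - scale * R @ src_mean       # translation
    return np.hstack([scale * R, t[:, None]]) # 2x3 affine matrix

# Hypothetical reference layout (eyes, nose tip, mouth corners) in a 112x112 crop
REFERENCE = np.array([[38, 52], [74, 52], [56, 72], [42, 92], [70, 92]], float)

# Simulated detector output: the same face, scaled and shifted in the frame
detected = REFERENCE * 1.5 + np.array([10.0, -5.0])

M = estimate_similarity(detected, REFERENCE)
aligned = (M[:, :2] @ detected.T).T + M[:, 2]
print(np.allclose(aligned, REFERENCE))  # True: a pure similarity is recovered exactly
```

In practice the 2x3 matrix `M` would be passed to an image-warping routine (e.g. OpenCV's `cv2.warpAffine`) to produce the aligned face video segment, frame by frame.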

Description

Lip movement password verification method based on heterogeneous characteristics

Technical Field

The invention relates to the technical field of identity recognition based on machine learning and computer vision, and in particular to a lip movement password verification method based on heterogeneous characteristics.

Background

Existing face recognition technology carries hidden safety risks. Deep face-swapping technology has developed rapidly, so falsely generated faces are increasingly widespread; their strong realism and deceptiveness seriously undermine the safety of traditional face verification algorithms. By contrast, a lip language password has richer dynamic variation and anti-counterfeiting capability, and can effectively resist attacks by deep face-swapping technology. Therefore, unlike common static identity verification methods such as face recognition, the encrypted time sequence lip code in this method offers stronger safety and reliability. In addition, existing identity authentication technologies that combine face recognition with lip matching have several defects: (1) only a small set of lip content predefined in the system can be recognized; (2) the user's lip recording manner is constrained, e.g., words must include specified pauses; (3) they rely on traditional classification models with limited feature expression capability or on highly predefined hand-crafted features (Haar, SIFT, HOG, etc.), making it difficult to capture unique authentication passwords from variable lip videos; and (4) model generalization is limited, so they lack adaptability to complex environmental changes and require retraining with large amounts of labeled data at deployment. These drawbacks severely limit their practical applicability.
Disclosure of Invention

The invention aims to provide a lip movement password verification method based on heterogeneous characteristics, which consists of a face detection model, a static feature capturing model and a dynamic embedded perception model, and performs large-scale pre-training on a self-collected lip movement (face) data set. In particular, the carefully designed model architecture gives the method strong invariant feature expression capability, so that it can efficiently perceive unique passwords from dynamic and variable lip language videos; large-scale pre-training on the self-constructed data set endows the model with strong reliability and generalization, allowing it to effectively eliminate interference from complex environments. No additional labeled data or retraining is therefore needed at deployment, which solves the problems identified in the background above.

To achieve this purpose, the invention provides the following technical scheme. The lip movement password verification method based on heterogeneous characteristics comprises the following steps. Step S1, model construction and pre-training: a face detection model, a static feature capturing model and a dynamic embedded perception model are constructed on the basis of a heterogeneous neural network and pre-trained on a self-collected large-scale face data set. Step S2, video acquisition and preprocessing: a lip movement password video is acquired and input, together with the lip movement password, into the face detection model; the coordinates of the region where the face is located and its key points are determined, the coordinates are corrected according to the affine transformation principle and the positions of reference points, and the original video is adaptively cropped to obtain an aligned face video segment. Step S3: the facial video segment is sparsely sampled and input into the static feature capturing model, spatial identity features are extracted and compared with the feature vectors in the registration database, and if the static features are successfully matched, the next step is entered. Step S4, dynamic feature extraction and verification: the lip video segment obtained in step S2 is input into the dynamic embedded perception model to extract dynamic lip features, the extracted dynamic features are compared with the time sequence password features in the registration database, and if the dynamic features are successfully matched, the double verification is completed and the identity is confirmed.

Preferably, the face is a lip region or a full human face. Preferably, the static feature capture model is used for extracting high-dimensional features with identity specificity from a single-frame lip print or face image. Preferably, the static feature capture model consists of a downsampling layer, three cascaded composite two-dimensional convolution layers, and a spatially global pooling layer, the optimization objective being to obtain model weights such that the largest intra-class distance on the constructed data set is substantially smaller than the smallest inter-class distance.
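The two-stage matching logic of steps S3 and S4 can be sketched as follows. This is a minimal illustration only: the patent does not specify the distance metric, the thresholds, or the database layout, so cosine similarity, the threshold values, and the random stand-in feature vectors below are all assumptions, and the two neural feature extractors are not modeled.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(static_feat, dynamic_feat, db, t_static=0.6, t_dynamic=0.6):
    """Two-stage check mirroring steps S3-S4: the dynamic (lip-password)
    comparison runs only for entries whose static identity feature matched
    first. Thresholds are illustrative, not taken from the patent."""
    for user, (s_ref, d_ref) in db.items():
        if cosine(static_feat, s_ref) >= t_static:          # S3: static match
            if cosine(dynamic_feat, d_ref) >= t_dynamic:    # S4: dynamic match
                return user                                  # double verification passed
    return None

# Stand-in for model outputs: 128-d static and dynamic feature vectors
rng = np.random.default_rng(0)
s, d = rng.normal(size=128), rng.normal(size=128)

# Registration database: encrypted features only, no raw video is stored
db = {"alice": (s + 0.05 * rng.normal(size=128),
                d + 0.05 * rng.normal(size=128))}

print(verify(s, d, db))   # both stages highly similar -> "alice"
print(verify(s, -d, db))  # correct face, wrong lip password -> None
```

The early exit after the static check reflects the cascade in the claims: a forged face never reaches the dynamic lip-password comparison, and a correct face with the wrong spoken password still fails.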