CN-122024299-A - Multi-view facial image processing system
Abstract
The invention provides a multi-view facial image processing system, and relates to the technical field of image processing. The system comprises an image acquisition module, a preprocessing module, a feature quantization calculation module, a deep learning reasoning module and a multi-modal feature fusion module. The image acquisition module is used for shooting front face, side face, first mouth and second mouth images; the preprocessing module is used for performing normalization and illumination enhancement on the multi-view face images; the feature quantization calculation module is used for generating geometric features such as the mandibular bone feature, tongue space occupation ratio, neck-face ratio, facial geometric contour and eyelid texture based on the processed front face, side face and first mouth images; the deep learning reasoning module is used for generating semantic features such as the hard palate morphology probability distribution, tonsil grading, occlusion relationship and nasal alar morphology based on the processed side face, first mouth and second mouth images; and the multi-modal feature fusion module is used for performing weighted fusion and nonlinear mapping on the geometric features, the semantic features and physiological parameters of the user to obtain a fusion feature map. The invention predicts sleep apnea directly from the fusion feature map, improving both the efficiency and the accuracy of sleep apnea prediction.
Inventors
- Yuan Fang
- Zhan Xianbiao
- Wang Hanqiao
- Song Dongmei
- Hao Yaxin
Assignees
- Hebei Medical University (河北医科大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-03-02
Claims (10)
- 1. A multi-view facial image processing system, characterized by comprising an image acquisition module, a preprocessing module, a feature quantization calculation module, a deep learning reasoning module and a multi-modal feature fusion module, wherein: the image acquisition module is used for guiding a user to shoot multi-view face images according to preset instructions, wherein the multi-view face images comprise a front face image, a side face image, a first mouth image and a second mouth image; the preprocessing module is used for preprocessing the multi-view face images, wherein the preprocessing comprises normalization processing and illumination enhancement processing; the feature quantization calculation module is used for: generating a mandibular bone feature based on the preprocessed side face image; generating a tongue space occupation ratio feature based on the preprocessed first mouth image; segmenting a neck region from the preprocessed front face image, and calculating the ratio of the pixel width at the narrowest part of the neck to the pixel width at the widest part of the cheekbones to obtain a neck-face ratio feature; calculating the aspect ratio of the vertical height to the horizontal width of the face based on the preprocessed side face image and the preprocessed front face image, and acquiring the lip closing gap in a natural state, to obtain a facial geometric contour feature; intercepting a lower-eyelid area from the preprocessed front face image, and extracting an eyelid texture feature from the lower-eyelid area by using a local binary pattern; the deep learning reasoning module is used for: intercepting a hard palate region from the preprocessed first mouth image, and extracting a hard palate morphology probability distribution feature vector from the hard palate region through a first convolutional neural network; intercepting a pharyngeal isthmus region from the preprocessed first mouth image, and extracting a tonsil grading feature vector from the pharyngeal isthmus region through a classification network with an attention mechanism; intercepting a labial region from the preprocessed second mouth image, and extracting an occlusion relationship feature vector from the labial region through a visual classification model; intercepting a nasal alar region from the preprocessed side face image, and extracting a nasal alar morphology feature vector from the nasal alar region by using a second convolutional neural network; the multi-modal feature fusion module is used for splicing the mandibular bone feature, the tongue space occupation ratio feature, the neck-face ratio feature, the facial geometric contour feature, the eyelid texture feature, the hard palate morphology probability distribution feature vector, the tonsil grading feature vector, the occlusion relationship feature vector, the nasal alar morphology feature vector and physiological parameters of the user to obtain a spliced vector, and inputting the spliced vector into a multi-modal fusion network for weighted fusion and nonlinear mapping to obtain a fusion feature map.
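For illustration only, the following minimal Python/PyTorch sketch shows one plausible reading of the splicing, weighted fusion and nonlinear mapping step of claim 1; the vector sizes, layer widths and the specific physiological parameters (e.g. age, BMI) are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Toy multi-modal fusion network: weighted fusion followed by nonlinear mapping."""
    def __init__(self, in_dim: int, fused_dim: int = 128):
        super().__init__()
        # learnable per-element weights realise the "weighted fusion" step
        self.weights = nn.Parameter(torch.ones(in_dim))
        # fully connected layers provide the "nonlinear mapping"
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, fused_dim), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x * self.weights)

# splice geometric scalars, semantic feature vectors and physiological parameters
geometric = torch.tensor([32.5, 4.1, 0.62, 1.31, 2.0, 0.47])  # angles, ratios, gap (illustrative values)
semantic = torch.randn(56)                                     # concatenated CNN feature vectors (assumed size)
physiology = torch.tensor([46.0, 27.4])                        # e.g. age, BMI (assumed parameters)
spliced = torch.cat([geometric, semantic, physiology]).unsqueeze(0)

fusion_net = FusionNet(in_dim=spliced.shape[1])
fused_feature = fusion_net(spliced)   # fusion feature used for downstream sleep apnea prediction
```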
- 2. The multi-view facial image processing system of claim 1, wherein the feature quantization calculation module is specifically configured to generate the mandibular bone feature based on the preprocessed side face image by: extracting key points of the mandibular part from the preprocessed side face image by using a deep generative model, wherein the key points of the mandibular part comprise a tragus point, a mandibular angle point, a pre-chin point and a lip protrusion point; and calculating, based on the key points of the mandibular part, the mandibular plane angle and the depth of the chin labial sulcus as the mandibular bone feature.
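A minimal numpy sketch of claim 2, under stated assumptions: the mandibular plane angle is taken as the angle at the mandibular angle point between the tragus direction and the pre-chin (pogonion) direction, and the chin labial sulcus depth as the perpendicular distance from a hypothetical sulcus key point to the line through the lip protrusion point and the pre-chin point; all coordinates are illustrative.

```python
import numpy as np

def angle_deg(v1: np.ndarray, v2: np.ndarray) -> float:
    """Angle in degrees between two 2-D vectors."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def point_line_distance(p: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    """Perpendicular distance from point p to the line through a and b."""
    num = abs((b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]))
    return float(num / np.linalg.norm(b - a))

# illustrative pixel coordinates of side-face key points
tragus    = np.array([310.0, 180.0])
gonion    = np.array([280.0, 330.0])   # mandibular angle point
pre_chin  = np.array([180.0, 360.0])   # pre-chin (pogonion) point
lip_point = np.array([170.0, 300.0])   # lip protrusion point
sulcus    = np.array([178.0, 330.0])   # hypothetical chin labial sulcus point

mandibular_plane_angle = angle_deg(tragus - gonion, pre_chin - gonion)
sulcus_depth = point_line_distance(sulcus, lip_point, pre_chin)
mandibular_feature = np.array([mandibular_plane_angle, sulcus_depth])
```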
- 3. The multi-view facial image processing system of claim 1, wherein the first mouth image is an oropharyngeal cavity image; the feature quantization calculation module is specifically configured to generate the tongue space occupation ratio feature based on the preprocessed first mouth image by: extracting the visible oral cavity region and the tongue region from the preprocessed first mouth image by using a semantic segmentation network; and calculating the ratio of the number of pixel points in the tongue region to the number of pixel points in the visible oral cavity region as the tongue space occupation ratio.
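An illustrative sketch of the tongue space occupation ratio of claim 3, assuming the semantic segmentation network has already produced a label map in which one label marks the visible oral cavity and another marks the tongue; the label values and the toy mask are assumptions.

```python
import numpy as np

def tongue_space_ratio(label_map: np.ndarray, oral_label: int = 1, tongue_label: int = 2) -> float:
    """Tongue pixels divided by visible-oral-cavity pixels (tongue counted as part of the cavity)."""
    oral_pixels = np.count_nonzero((label_map == oral_label) | (label_map == tongue_label))
    tongue_pixels = np.count_nonzero(label_map == tongue_label)
    return tongue_pixels / oral_pixels if oral_pixels else 0.0

# toy label map standing in for the segmentation output of the first mouth image
label_map = np.zeros((256, 256), dtype=np.uint8)
label_map[60:200, 60:200] = 1     # visible oral cavity
label_map[120:200, 90:170] = 2    # tongue region inside the cavity
ratio = tongue_space_ratio(label_map)
```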
- 4. The multi-view facial image processing system according to claim 1, wherein the feature quantization calculation module is specifically configured to calculate the aspect ratio of the vertical height to the horizontal width of the face based on the preprocessed side face image and the preprocessed front face image, and to acquire the lip closing gap in a natural state, to obtain the facial geometric contour feature, by: extracting a nasion (nose root) point, an under-chin point, a left cheekbone point, a right cheekbone point, an upper lip edge point and a lower lip edge point from the preprocessed side face image and the preprocessed front face image; determining the absolute value of the difference between the ordinate of the under-chin point and the ordinate of the nasion point as the face height, and determining the absolute value of the difference between the abscissa of the left cheekbone point and the abscissa of the right cheekbone point as the face width; determining the ratio of the face height to the face width as the aspect ratio of the vertical height to the horizontal width of the face; calculating the distance between the upper lip edge point and the lower lip edge point as the lip closing gap in the natural state; and taking the aspect ratio of the vertical height to the horizontal width of the face and the lip closing gap in the natural state as the facial geometric contour feature.
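A short sketch of the facial geometric contour feature of claim 4, assuming 2-D landmark coordinates in pixels are already available; the landmark names and values below are illustrative.

```python
import numpy as np

# illustrative 2-D landmarks (x, y) in pixels
landmarks = {
    "nasion":       np.array([250.0, 150.0]),  # nose root point
    "under_chin":   np.array([252.0, 420.0]),  # under-chin point
    "zygion_left":  np.array([150.0, 250.0]),  # left cheekbone point
    "zygion_right": np.array([355.0, 250.0]),  # right cheekbone point
    "upper_lip":    np.array([251.0, 352.0]),  # upper lip edge point (natural state)
    "lower_lip":    np.array([251.0, 358.0]),  # lower lip edge point
}

face_height = abs(landmarks["under_chin"][1] - landmarks["nasion"][1])
face_width = abs(landmarks["zygion_right"][0] - landmarks["zygion_left"][0])
aspect_ratio = face_height / face_width          # vertical height / horizontal width
lip_gap = float(np.linalg.norm(landmarks["upper_lip"] - landmarks["lower_lip"]))
contour_feature = np.array([aspect_ratio, lip_gap])
```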
- 5. The multi-view facial image processing system according to claim 1, wherein the feature quantization calculation module is specifically configured to intercept the lower-eyelid area from the preprocessed front face image and extract the eyelid texture feature from the lower-eyelid area by using a local binary pattern, by: extracting a lower-eyelid key point from the preprocessed front face image; extending downwards from the lower-eyelid key point by a preset number of pixel points, and intercepting a rectangular strip area covering the lacrimal groove and the under-eye vein area as the lower-eyelid area; and calculating a local binary pattern texture histogram of the lower-eyelid area, which quantifies the degree of dark-circle pigmentation and venous stasis, as the eyelid texture feature.
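A sketch of the eyelid texture feature of claim 5 using scikit-image's local binary pattern; the strip size, LBP parameters and key point position are assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def eyelid_texture(gray_face: np.ndarray, eyelid_xy, strip_h: int = 20, strip_w: int = 60,
                   points: int = 8, radius: int = 1) -> np.ndarray:
    """LBP histogram of a strip extending downward from the lower-eyelid key point."""
    x, y = eyelid_xy
    strip = gray_face[y:y + strip_h, x - strip_w // 2:x + strip_w // 2]
    lbp = local_binary_pattern(strip, points, radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=points + 2, range=(0, points + 2), density=True)
    return hist   # normalised histogram used as the eyelid texture feature

gray_face = (np.random.rand(480, 640) * 255).astype(np.uint8)  # stand-in grayscale front face image
feature = eyelid_texture(gray_face, eyelid_xy=(320, 260))
```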
- 6. The multi-view facial image processing system of claim 1, wherein the first mouth image is an oropharyngeal cavity image; the deep learning reasoning module is specifically configured to intercept the hard palate region from the preprocessed first mouth image and extract the hard palate morphology probability distribution feature vector from the hard palate region through the first convolutional neural network by: extracting an upper-lip lower-edge key point set and a nose-base key point set from the preprocessed first mouth image; calculating the geometric center of the key points in the upper-lip lower-edge key point set and the nose-base key point set, and constructing a rectangular frame centered at the geometric center to obtain the hard palate region, wherein the width of the rectangular frame is a first preset multiple of the distance between the left and right mouth corners, the height of the rectangular frame is a second preset multiple of the distance from the nose base to the lower edge of the upper lip, the first preset multiple is smaller than 1, and the second preset multiple is larger than 1; adjusting the size of the hard palate region; and inputting the resized hard palate region into the first convolutional neural network to obtain the hard palate morphology probability distribution feature vector.
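A sketch of the hard palate region construction of claim 6: a rectangle centered at the geometric center of the upper-lip lower-edge and nose-base key points, with width a fraction (<1) of the mouth-corner distance and height a multiple (>1) of the nose-base-to-upper-lip distance; the multiples, key point values and the downstream CNN are assumptions or placeholders.

```python
import numpy as np

def hard_palate_roi(img: np.ndarray, upper_lip_pts: np.ndarray, nose_base_pts: np.ndarray,
                    mouth_corner_dist: float, lip_nose_dist: float,
                    width_mult: float = 0.8, height_mult: float = 1.5) -> np.ndarray:
    """Rectangle centred at the geometric centre of the two key point sets."""
    center = np.vstack([upper_lip_pts, nose_base_pts]).mean(axis=0)
    w, h = width_mult * mouth_corner_dist, height_mult * lip_nose_dist
    x0, y0 = int(center[0] - w / 2), int(center[1] - h / 2)
    x1, y1 = int(center[0] + w / 2), int(center[1] + h / 2)
    return img[max(y0, 0):y1, max(x0, 0):x1]

mouth_img = np.zeros((480, 640, 3), dtype=np.uint8)                 # stand-in first mouth image
upper_lip = np.array([[300, 260], [320, 258], [340, 260]], float)   # upper-lip lower-edge key points
nose_base = np.array([[305, 180], [320, 178], [335, 180]], float)   # nose-base key points
roi = hard_palate_roi(mouth_img, upper_lip, nose_base,
                      mouth_corner_dist=180.0, lip_nose_dist=80.0)
# roi would then be resized and fed to the first CNN to obtain the probability distribution vector
```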
- 7. The multi-view facial image processing system of claim 1, wherein the first mouth image is an oropharyngeal cavity image; the deep learning reasoning module is specifically configured to intercept the pharyngeal isthmus region from the preprocessed first mouth image and extract the tonsil grading feature vector from the pharyngeal isthmus region through a classification network with an attention mechanism by: extracting inner oral cavity contour key points from the preprocessed first mouth image; calculating the minimum circumscribed rectangle of the inner oral cavity contour based on the inner oral cavity contour key points; keeping the center of the minimum circumscribed rectangle unchanged, and expanding the length and the width outwards by preset proportions respectively to obtain the pharyngeal isthmus region; and inputting the pharyngeal isthmus region into a classification network with a spatial attention mechanism and a channel attention mechanism, extracting the feature map before the global average pooling layer, and flattening this feature map to obtain the tonsil grading feature vector, wherein the spatial attention mechanism is used for guiding the classification network to suppress the weight of the tongue region and focus on the tonsil regions on both sides and in the deep part of the image, and the channel attention mechanism is used for enhancing the feature response to redness/congestion and protruding-tissue textures.
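A PyTorch sketch of the attention-based feature extraction of claim 7: a small classification backbone with channel and spatial attention, from which the feature map before global average pooling is flattened into the tonsil grading feature vector; channel counts, kernel sizes and the number of tonsil grades are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, ch: int, r: int = 8):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(),
                                nn.Linear(ch // r, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))             # squeeze over H, W
        return x * w[:, :, None, None]              # re-weight channels (e.g. congestion/texture response)

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        m = torch.cat([x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(m))      # suppress tongue pixels, emphasise lateral/deep tonsil areas

class TonsilNet(nn.Module):
    def __init__(self, n_grades: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.ca, self.sa = ChannelAttention(64), SpatialAttention()
        self.head = nn.Linear(64, n_grades)

    def forward(self, x):
        f = self.sa(self.ca(self.backbone(x)))      # feature map before global average pooling
        feature_vec = f.flatten(1)                   # flattened: tonsil grading feature vector
        logits = self.head(f.mean(dim=(2, 3)))       # grading output, used only to train the network
        return feature_vec, logits

vec, _ = TonsilNet()(torch.randn(1, 3, 64, 64))      # toy pharyngeal isthmus crop
```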
- 8. The multi-view facial image processing system of claim 1, wherein the second mouth image is a teeth-closed bite image; the visual classification model comprises a spatial transformation network and a third convolutional neural network; the deep learning reasoning module is specifically configured to intercept the labial region from the preprocessed second mouth image and extract the occlusion relationship feature vector from the labial region through the visual classification model by: extracting left and right mouth corner key points and upper and lower lip outer-edge key points from the preprocessed second mouth image; taking the line connecting the left and right mouth corner key points as the horizontal axis, and cutting, based on the upper and lower lip outer-edge key points, a rectangular region containing the upper lip, the lower lip and the tooth contact surface from the preprocessed second mouth image; correcting the rectangular region by using the spatial transformation network so that the occlusal plane is horizontally aligned; and inputting the corrected rectangular region into the third convolutional neural network to obtain the occlusion relationship feature vector.
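A sketch of the labial-region extraction of claim 8 in which the learned spatial transformation network is replaced, purely for illustration, by an explicit rotation that levels the mouth-corner line before cropping; the corrected crop would then be passed to the third convolutional neural network. Key point coordinates are illustrative.

```python
import cv2
import numpy as np

def labial_region(img: np.ndarray, left_corner, right_corner,
                  upper_lip_outer, lower_lip_outer, pad: int = 10) -> np.ndarray:
    """Level the mouth-corner line, then crop the lip / tooth-contact rectangle."""
    lx, ly = left_corner
    rx, ry = right_corner
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)            # rotation making the corner line horizontal
    rotated = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
    # map the key points into the rotated frame before cropping
    pts = np.array([left_corner, right_corner, upper_lip_outer, lower_lip_outer], np.float32)
    pts = (M[:, :2] @ pts.T + M[:, 2:3]).T
    x0, y0 = pts.min(axis=0).astype(int) - pad
    x1, y1 = pts.max(axis=0).astype(int) + pad
    return rotated[max(y0, 0):y1, max(x0, 0):x1]

bite_img = np.zeros((480, 640, 3), dtype=np.uint8)             # stand-in teeth-closed bite image
crop = labial_region(bite_img, (260, 300), (380, 306), (320, 270), (320, 340))
# crop would then be fed to the third convolutional neural network for the occlusion feature vector
```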
- 9. The multi-view facial image processing system of claim 1, wherein the deep learning reasoning module is specifically configured to intercept the nasal alar region from the preprocessed side face image and extract the nasal alar morphology feature vector from the nasal alar region by using the second convolutional neural network by: constructing, in the preprocessed side face image, a square area whose diagonal is the line connecting the nose tip point and the alar base point and which covers the outer edge of the nasal ala, to obtain the nasal alar region; and inputting the nasal alar region into the second convolutional neural network to extract the nasal alar morphology feature vector.
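A sketch of the nasal alar region of claim 9: the square whose diagonal is the segment from the nose tip point to the alar base point; coordinates are illustrative.

```python
import numpy as np

def alar_region(side_img: np.ndarray, nose_tip, alar_base) -> np.ndarray:
    """Square crop whose diagonal joins the nose tip and the alar base point."""
    x0, x1 = sorted((int(nose_tip[0]), int(alar_base[0])))
    y0, y1 = sorted((int(nose_tip[1]), int(alar_base[1])))
    side = max(x1 - x0, y1 - y0)        # enlarge to a square so the alar outer edge is covered
    return side_img[y0:y0 + side, x0:x0 + side]

side_face = np.zeros((480, 640, 3), dtype=np.uint8)    # stand-in side face image
roi = alar_region(side_face, nose_tip=(200, 250), alar_base=(245, 290))
# roi would then be resized and passed to the second CNN for the alar morphology vector
```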
- 10. The multi-view facial image processing system of claim 1, wherein the first mouth image is an oropharyngeal cavity image; the preprocessing module is specifically configured to perform illumination enhancement processing on the side face image and the first mouth image by: converting the side face image from the RGB space to the HSV space, keeping the hue component and the saturation component unchanged, performing multi-scale convolution on the brightness component by using a Gaussian surround function to estimate a first illumination component, and removing the first illumination component to obtain a first intermediate image; and converting the first mouth image from the RGB space to the HSV space, keeping the hue component and the saturation component unchanged, performing multi-scale convolution on the brightness component by using a Gaussian surround function to estimate a second illumination component, removing the second illumination component to obtain a second intermediate image, and converting the second intermediate image from the HSV space back to the RGB space.
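A sketch of the illumination enhancement of claim 10 as a multi-scale Retinex-style step: the illumination of the HSV brightness component is estimated by Gaussian blurring at several scales and removed in the log domain, while hue and saturation are kept unchanged; the scale values and the stand-in input are assumptions.

```python
import cv2
import numpy as np

def illumination_enhance(bgr: np.ndarray, sigmas=(15, 60, 120)) -> np.ndarray:
    """Estimate and remove the illumination component of the V channel; keep H and S unchanged."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    h, s, v = cv2.split(hsv)
    v = v + 1.0                                               # avoid log(0)
    # multi-scale Gaussian surround estimate of the illumination component
    illum = np.mean([cv2.GaussianBlur(v, (0, 0), sig) for sig in sigmas], axis=0)
    reflect = np.log(v) - np.log(illum + 1.0)                 # remove illumination in the log domain
    v_new = cv2.normalize(reflect, None, 0, 255, cv2.NORM_MINMAX)
    out = cv2.merge([h, s, v_new]).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_HSV2BGR)               # back to the RGB (BGR) space

side_face = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)   # stand-in side face image
enhanced = illumination_enhance(side_face)
```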
Description
Multi-view facial image processing system

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a multi-view facial image processing system.

Background

At present, sleep apnea prediction is generally performed directly from multi-angle facial images of a user. However, this process depends heavily on manual observation, labeling and integration of discrete morphological cues; it is not only time-consuming and laborious, but also lacks a standardized quantitative prediction system, resulting in low prediction efficiency and insufficient consistency.

Disclosure of Invention

In order to solve the above technical problems, the present invention provides a multi-view facial image processing system. The system comprises an image acquisition module, a preprocessing module, a feature quantization calculation module, a deep learning reasoning module and a multi-modal feature fusion module, wherein: the image acquisition module is used for guiding a user to shoot multi-view face images according to preset instructions, wherein the multi-view face images comprise a front face image, a side face image, a first mouth image and a second mouth image; the preprocessing module is used for preprocessing the multi-view face images, wherein the preprocessing comprises normalization processing and illumination enhancement processing; the feature quantization calculation module is used for: generating a mandibular bone feature based on the preprocessed side face image; generating a tongue space occupation ratio feature based on the preprocessed first mouth image; segmenting a neck region from the preprocessed front face image, and calculating the ratio of the pixel width at the narrowest part of the neck to the pixel width at the widest part of the cheekbones to obtain a neck-face ratio feature; calculating the aspect ratio of the vertical height to the horizontal width of the face based on the preprocessed side face image and the preprocessed front face image, and acquiring the lip closing gap in a natural state, to obtain a facial geometric contour feature; intercepting a lower-eyelid area from the preprocessed front face image, and extracting an eyelid texture feature from the lower-eyelid area by using a local binary pattern; the deep learning reasoning module is used for: intercepting a hard palate region from the preprocessed first mouth image, and extracting a hard palate morphology probability distribution feature vector from the hard palate region through a first convolutional neural network; intercepting a pharyngeal isthmus region from the preprocessed first mouth image, and extracting a tonsil grading feature vector from the pharyngeal isthmus region through a classification network with an attention mechanism; intercepting a labial region from the preprocessed second mouth image, and extracting an occlusion relationship feature vector from the labial region through a visual classification model; intercepting a nasal alar region from the preprocessed side face image, and extracting a nasal alar morphology feature vector from the nasal alar region by using a second convolutional neural network; and the multi-modal feature fusion module is used for splicing the mandibular bone feature, the tongue space occupation ratio feature, the neck-face ratio feature, the facial geometric contour feature, the eyelid texture feature, the hard palate morphology probability distribution feature vector, the tonsil grading feature vector, the occlusion relationship feature vector, the nasal alar morphology feature vector and physiological parameters of the user to obtain a spliced vector, and inputting the spliced vector into a multi-modal fusion network for weighted fusion and nonlinear mapping to obtain a fusion feature map.

Optionally, the feature quantization calculation module is specifically configured to generate the mandibular bone feature based on the preprocessed side face image by: extracting key points of the mandibular part from the preprocessed side face image by using a deep generative model, wherein the key points of the mandibular part comprise a tragus point, a mandibular angle point, a pre-chin point and a lip protrusion point; and calculating, based on the key points of the mandibular part, the mandibular plane angle and the depth of the chin labial sulcus as the mandibular bone feature.

Optionally, the first mouth image is an oropharyngeal cavity image; the feature quantization calculation module is specifically configured to generate the tongue space occupation ratio feature based on the preprocessed first mouth image by: extracting the visible oral cavity region and the tongue region from the preprocessed first mouth image by using a semantic segmentation network; and calculating the ratio of the number of pixel points in the tongue region to the number of pixel points in the visible