
CN-121982757-A - Intelligent gaze following display control method and system based on face recognition and space mapping

CN121982757A

Abstract

The invention provides an intelligent gaze-following display control method and system based on face recognition and spatial mapping, relating to the field of image processing. The system comprises a camera acquisition module for acquiring images of a target area; a face recognition and pose estimation module for recognizing the target person through a face recognition model and extracting the target person's head pose through a pose estimation model; a spatial mapping calculation module for calculating the spatial position of the target person based on the image of the target area; a following control module for calculating display adjustment parameters based on the pose features and spatial position of the target person; a picture control module for adjusting the display picture based on the display adjustment parameters; and an automatic reset module for calculating a multi-source confidence of the target person's presence, gradually resetting the display picture based on that confidence, and smoothly transitioning the display picture after the target person returns, providing users with a natural, smooth and immersive gaze-following display experience.

Inventors

  • WANG HUI
  • XI PUZHAO
  • AN BO

Assignees

  • 上海卓越睿新数码科技股份有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-26

Claims (10)

  1. An intelligent gaze-following display control system based on face recognition and spatial mapping, characterized by comprising: a camera acquisition module for acquiring an image of a target area; a face recognition and pose estimation module for recognizing a target person in the image of the target area through a face recognition model, extracting a region-of-interest image, and extracting the head pose of the target person from the region-of-interest image through a pose estimation model; a spatial mapping calculation module for calculating the spatial position of the target person based on the image of the target area; a following control module for calculating display adjustment parameters based on the pose features and the spatial position of the target person; a picture control module for adjusting the display picture based on the display adjustment parameters; and an automatic reset module for calculating a multi-source presence confidence of the target person, gradually resetting the display picture based on that confidence, and smoothly transitioning the display picture after the target person returns.
  2. The intelligent gaze-following display control system of claim 1, wherein the face recognition and pose estimation module is configured to: perform brightness normalization and adaptive histogram equalization on the image of the target area to generate a preprocessed image; perform face recognition on the preprocessed image through the face recognition model and extract face region images; for each face region image, extract its multi-dimensional face feature vector and calculate the cosine similarity between that vector and the multi-dimensional face feature vector of the target person; and identify the face region image of the target person as the region-of-interest image based on the cosine similarity between each face region image's feature vector and the target person's feature vector.
  3. The intelligent gaze-following display control system of claim 1, wherein the face recognition and pose estimation module is configured to: dynamically enhance the features of key areas in the region-of-interest image through a lightweight attention mechanism of the pose estimation model; generate fused features by cross-scale fusion of the shallow texture and high-level semantic information of the dynamically enhanced region-of-interest image through a multi-layer feature fusion unit of the pose estimation model; and recognize facial key points based on the fused features through the pose estimation model to determine the head pose of the target person.
  4. The intelligent gaze-following display control system of claim 3, wherein the face recognition and pose estimation module is configured to: generate heatmaps of a plurality of facial key points based on the fused features through a heatmap generation branch of the pose estimation model, wherein each heatmap represents, through colour intensity, the probability of its facial key point appearing at each position in the region-of-interest image; generate predicted coordinates of the plurality of facial key points based on the fused features through a coordinate regression branch of the pose estimation model; determine the coordinates of the facial key points from the heatmaps and the predicted coordinates through a joint optimization unit of the pose estimation model; and determine the head pose of the target person based on the coordinates of the facial key points.
  5. The intelligent gaze-following display control system of claim 4, wherein the face recognition and pose estimation module is configured to: for each facial key point, calculate a weight for its coordinates based on that key point's heatmap; remove the coordinates of abnormal facial key points based on these weights; determine an initial head pose of the target person from the coordinates of the remaining facial key points; and determine the head pose of the target person by applying Kalman filtering and adaptive exponential smoothing to the initial head poses corresponding to adjacent consecutive frames of the target-area image.
  6. The intelligent gaze-following display control system of any one of claims 1-5, wherein the spatial mapping calculation module is configured to: output a human body bounding box and human body key points of the target person from the image of the target area through a human body recognition model according to a composite weight selection mechanism; determine an initial spatial position of the target person from the bounding box and key points combined with the camera intrinsic parameters and depth information; and calculate the spatial position of the target person from the initial spatial position by combining Kalman filtering with a differencing and delayed-confirmation mechanism.
  7. The intelligent gaze-following display control system of claim 6, wherein the spatial mapping calculation module is configured to: output a plurality of human body bounding boxes from the image of the target area through the human body recognition model; for each bounding box, determine a composite weight based on the cosine similarity of the multi-dimensional face feature vectors, the spatial distance, and the SIoU score; determine the bounding box of the target person based on the composite weights; and extract the human body key points of the target person from that bounding box.
  8. The intelligent gaze-following display control system of any one of claims 1-6, wherein the automatic reset module is configured to: calculate the multi-source presence confidence of the target person based on the target person recognition result, the facial key point integrity, the human-body key point integrity, the depth consistency, and the optical flow consistency of the image of the target area.
  9. The intelligent gaze-following display control system of any one of claims 1-6, wherein the automatic reset module is configured to: determine a current presence state based on a sliding window of the multi-source presence confidences corresponding to consecutive frames of the target-area image, wherein the current presence state is one of active, intermittently present, and departed; and, if the current presence state is departed, retain the display picture while the departure time is less than or equal to a departure time threshold, and gradually reset the display picture to the initial picture using an S-shaped interpolation function once the departure time exceeds the threshold.
  10. An intelligent gaze-following display control method based on face recognition and spatial mapping, characterized in that it is applied to the intelligent gaze-following display control system based on face recognition and spatial mapping of claim 1, and comprises: collecting an image of a target area; recognizing the target person in the image of the target area through a face recognition model, extracting a region-of-interest image, and extracting the pose features of the target person from the region-of-interest image through a pose estimation model; calculating the spatial position of the target person based on the image of the target area; calculating a display steering angle based on the pose features and the spatial position of the target person; adjusting the display picture based on the display steering angle; calculating the multi-source presence confidence of the target person and gradually resetting the display picture based on that confidence; and smoothly transitioning the display picture after the target person returns.
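The target-matching step of claim 2 (compare each detected face's feature vector against the enrolled target vector by cosine similarity) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names and the similarity threshold are assumptions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_target_face(face_embeddings, target_embedding, threshold=0.5):
    """Return the index of the detected face whose embedding best matches
    the enrolled target embedding, or None if no face clears the threshold.
    The 0.5 threshold is an illustrative assumption."""
    best_idx, best_sim = None, threshold
    for i, emb in enumerate(face_embeddings):
        sim = cosine_similarity(emb, target_embedding)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx
```

The selected face region would then be cropped as the region-of-interest image for pose estimation.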
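Claim 4's joint optimization of the heatmap branch and the coordinate regression branch could, for example, blend the heatmap-argmax location with the regressed location, trusting the heatmap more when its peak is sharp. The patent does not specify the fusion rule; the confidence-weighted average below is one plausible sketch.

```python
import numpy as np

def fuse_keypoint(heatmap: np.ndarray, reg_xy: np.ndarray) -> np.ndarray:
    """Blend the heatmap-argmax keypoint location with the regressed
    location, weighting the heatmap estimate by its peak confidence."""
    peak = float(heatmap.max())            # peak value in [0, 1]
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    hm_xy = np.array([x, y], dtype=float)  # (x, y) of the heatmap peak
    # Confidence-weighted average: a sharp peak dominates the regression.
    return peak * hm_xy + (1.0 - peak) * reg_xy
```

With a saturated peak the heatmap location is used outright; with a weak peak the regressed coordinates take over, which is one way the two branches can complement each other.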
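The adaptive exponential smoothing of claim 5 can be illustrated with a smoothing factor that grows with the size of the pose change, so fast head turns are tracked with less lag while small jitters are damped. All constants here are illustrative assumptions.

```python
def adaptive_ema(prev: float, new: float,
                 base_alpha: float = 0.2,
                 gain: float = 0.05,
                 max_alpha: float = 0.9) -> float:
    """Adaptive exponential smoothing of one pose angle: larger jumps
    between frames get a larger smoothing factor (less lag), smaller
    jumps get a smaller one (more damping)."""
    alpha = min(max_alpha, base_alpha + gain * abs(new - prev))
    return prev + alpha * (new - prev)
```

In the patent this smoothing is applied after Kalman filtering across adjacent frames; the sketch covers only the adaptive-smoothing half of that pipeline.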
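Claim 6's step of turning a pixel position plus depth into a spatial position via the camera intrinsics is standard pinhole back-projection. The sketch below assumes an undistorted pinhole model with intrinsics (fx, fy, cx, cy); the function name is ours.

```python
def pixel_to_camera_xyz(u: float, v: float, depth: float,
                        fx: float, fy: float, cx: float, cy: float):
    """Back-project a pixel (u, v) with depth `depth` (metres along the
    optical axis) into camera-frame 3-D coordinates using pinhole
    intrinsics: focal lengths (fx, fy) and principal point (cx, cy)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

The resulting camera-frame coordinates would then be fed to the Kalman filter with the differencing and delayed-confirmation mechanism described in the claim.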
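The composite weight of claim 7 combines face similarity, spatial distance, and the SIoU score into one score per candidate bounding box. The patent does not give the combination formula; the linear weighting and its coefficients below are assumptions for illustration only.

```python
def composite_weight(face_sim: float, distance: float, siou: float,
                     w=(0.5, 0.2, 0.3), max_dist: float = 5.0) -> float:
    """Score a candidate body bounding box. High face similarity, small
    spatial distance to the last known position, and high SIoU with the
    previous box all raise the score. Weights `w` and `max_dist` are
    illustrative assumptions."""
    dist_score = max(0.0, 1.0 - distance / max_dist)  # 1 at 0 m, 0 beyond max_dist
    return w[0] * face_sim + w[1] * dist_score + w[2] * siou
```

The box with the highest composite weight would be selected as the target person's bounding box, from which the body key points are extracted.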
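The gradual reset of claim 9 uses an S-shaped interpolation function so the picture eases back to its initial state rather than snapping. A common S-curve is smoothstep; the patent does not name its exact function, so this is one plausible choice.

```python
def s_curve(t: float) -> float:
    """Smoothstep: an S-shaped ease-in/ease-out curve mapping [0, 1] to
    [0, 1] with zero slope at both ends."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)

def reset_angle(current: float, initial: float,
                elapsed: float, duration: float) -> float:
    """Move the display angle from `current` back toward `initial` along
    the S-curve, `elapsed` seconds into a reset lasting `duration`."""
    s = s_curve(elapsed / duration)
    return current + (initial - current) * s
```

Because the curve's slope is zero at both ends, the reset starts and finishes gently, which matches the patent's goal of avoiding abrupt picture changes when the target departs.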

Description

Intelligent gaze following display control method and system based on face recognition and space mapping

Technical Field

The invention relates to the field of image processing, and in particular to an intelligent gaze-following display control method and system based on face recognition and spatial mapping.

Background

In existing multimedia presentation, conference demonstration and intelligent interaction systems, picture control depends on fixed viewing angles or manual switching. Display control with a fixed camera view angle or a preset rotation path is generally adopted: several fixed shooting angles or path trajectories are preset in the camera, and the angle is switched along the set path when a user-movement or scene-switching signal is detected. Some improved schemes use a basic face detection algorithm and trigger camera rotation by recognizing the presence of a user's face, realizing simple face following with two-dimensional image-plane calculation: the camera's horizontal or pitch angle is adjusted according to the offset of the face in the picture. Some systems introduce infrared positioning or depth camera technology to capture user coordinates in a limited area and adjust the picture or camera angle through three-dimensional positioning information.

The prior art has the following defects:

1. The picture response is unnatural. The rotation angle of the camera or display picture is a preset fixed value that cannot be adjusted in real time according to the user's actual face orientation and spatial movement; viewing-angle changes can only be approximated through manual or program-preset paths, so the picture response is stiff.

2. The spatial perception capability is poor. Face detection algorithms are mostly based on two-dimensional plane calculation and lack three-dimensional spatial mapping, so the user's depth position and orientation within the room cannot be accurately recovered; the picture is delayed or deviates, and multi-person scenes are prone to target confusion. For example, although current camera or display systems can perform identity judgment through face recognition, face orientation and viewing angle are difficult to identify accurately, so a large-screen picture cannot turn in real time with the user's movement and line-of-sight changes, and the immersive experience is poor. Human body tracking schemes often stop at the "target detection" level: indoor three-dimensional angle mapping cannot be realized from two-dimensional image data, the system does not perceive the user's direction of movement or depth position, and the picture shows delay, deviation or stalling.

3. Target-loss handling is poor. When the followed person temporarily leaves the picture, most systems cannot wait intelligently or return smoothly: they either reset immediately, causing an abrupt picture change, or remain in an invalid following state for a long time, seriously affecting interaction smoothness and visual experience. If the followed user temporarily leaves the shooting range, most systems stop following immediately or reset with an automatic delay, so scene switching is abrupt and viewing is incoherent; some systems cannot distinguish the main following target in a multi-user environment, so the focus target is easily misjudged or frequently switched.

4. The scene adaptation capability is weak. Existing spatial following systems are mostly designed for a single scene and lack universal adaptation and self-recovery across multiple scenes and devices, making it difficult to meet the dynamic viewing-angle requirements of complex interactive environments such as conference displays, exhibition demonstrations and intelligent classrooms.

Therefore, there is a need for an intelligent gaze-following display control method and system based on face recognition and spatial mapping that provides users with a natural, smooth and immersive gaze-following display experience.

Disclosure of Invention

The invention provides an intelligent gaze-following display control system based on face recognition and spatial mapping, which comprises a camera acquisition module, a face recognition and pose estimation module, a spatial mapping calculation module, a following control module, a picture control module and an automatic reset module. The camera acquisition module is used for acquiring an image of a target area; the face recognition and pose estimation module is used for recognizing a target person in the image of the target area through a face recognition model, extracting a region-of-interest image, and extracting the head pose of the target person from the region-of-interest image through a pose estimation model; the space mapping calc