
CN-122029569-A - Pose estimation method, virtual display image generation method and related equipment thereof

CN122029569A

Abstract

The application discloses a pose estimation method, which comprises: positioning feature points through a first thread to obtain position information; storing inertial data of an inertial sensor into a buffer module; updating a first pose according to the inertial data and the position information through the first thread to obtain a second pose; and predicting a target pose according to target inertial data and the second pose through a second thread. In this way, the power consumption of pose prediction can be reduced, and resource consumption can be lowered.

Inventors

  • Liu Yi
  • Wang Zewei
  • Jia Zhaoyi
  • Chen Nuo

Assignees

  • Guangzhou Shiyuan Electronic Technology Co., Ltd. (广州视源电子科技股份有限公司)
  • Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co., Ltd. (广州视源人工智能创新研究院有限公司)

Dates

Publication Date
2026-05-12
Application Date
2024-09-12

Claims (20)

  1. A pose estimation method, applied to a head-mounted display device, wherein the head-mounted display device comprises a vision sensor, an inertial sensor and a processor, the processor is configured with a first thread, a second thread and a buffer module, and a frame rate of the inertial sensor is greater than a frame rate of the vision sensor, the method comprising: acquiring a current frame image acquired by the vision sensor; positioning feature points in the current frame image through the first thread to obtain position information of the feature points; storing inertial data detected by the inertial sensor between the current frame image and a historical frame image to the buffer module; updating a first pose of the head-mounted display device at a moment corresponding to the historical frame image according to the position information and the inertial data through the first thread to obtain a second pose; and predicting a pose of the head-mounted display device at a moment corresponding to the current frame image as a target pose according to target inertial data of the inertial sensor at a current moment and the second pose through the second thread.
  2. The pose estimation method of claim 1, wherein the frame rate of the inertial sensor is greater than the frame rate of the vision sensor, the method further comprising: outputting the target pose according to the frame rate of the inertial sensor.
  3. The pose estimation method according to claim 1, wherein the updating, by the first thread, the first pose of the head-mounted display device at the moment corresponding to the historical frame image according to the position information and the inertial data to obtain the second pose comprises: preprocessing the inertial data through the first thread to obtain preprocessed data, wherein the preprocessed data is used for representing a degree of pose change of the head-mounted display device between the moment corresponding to the current frame image and the moment corresponding to the historical frame image; and calling a preset target optimization model through the first thread, and updating the first pose according to the preprocessed data and the position information to obtain the second pose.
  4. The pose estimation method according to claim 3, wherein the calling, by the first thread, the preset target optimization model and updating the first pose according to the preprocessed data and the position information to obtain the second pose comprises: determining, by the first thread, a first prediction error according to the preprocessed data and the first pose; determining, by the first thread, a second prediction error according to the position information and a predicted position of the feature points, wherein the position information is real position information of the feature points, and the predicted position is information obtained by predicting the positions of the feature points; and calling the target optimization model through the first thread, and updating the first pose according to the first prediction error and the second prediction error to obtain the second pose.
  5. The pose estimation method of claim 4, wherein the determining, by the first thread, the first prediction error from the preprocessed data and the first pose comprises: predicting, by the first thread, a pose of the head-mounted display device at the moment corresponding to the current frame image according to the preprocessed data and the first pose, to obtain a third pose; and determining, by the first thread, the first prediction error according to the preprocessed data, the first pose and the third pose.
  6. The pose estimation method of claim 5, wherein the determining, by the first thread, the first prediction error from the preprocessed data, the first pose and the third pose comprises: determining, by the first thread, a rotation prediction error from a rotation increment in the preprocessed data, a historical rotation state in the first pose, and a rotation state in the third pose; determining, by the first thread, a speed prediction error from a speed increment in the preprocessed data, a historical speed state in the first pose, and a speed state in the third pose; determining, by the first thread, a translation prediction error from a translation increment in the preprocessed data, a historical translation state in the first pose, and a translation state in the third pose; and determining, by the first thread, the first prediction error from the rotation prediction error, the speed prediction error, and the translation prediction error.
  7. The pose estimation method according to claim 4, wherein the position information comprises a first pixel coordinate and a spatial coordinate corresponding to the feature point, the predicted position comprises a second pixel coordinate corresponding to the feature point, and the determining, by the first thread, the second prediction error according to the position information and the predicted position of the feature point comprises: projecting the spatial coordinate according to parameters of the vision sensor through the first thread to obtain the second pixel coordinate of the feature point; and determining the second prediction error according to a relative distance between the first pixel coordinate and the second pixel coordinate.
  8. The pose estimation method according to claim 1, further comprising, after the storing of the inertial data detected by the inertial sensor between the current frame image and the historical frame image to the buffer module: waking up the first thread when the current frame image is detected; and reading, through the first thread, the inertial data in the buffer module when the buffer module is unoccupied.
  9. The pose estimation method according to claim 1, wherein the predicting, by the second thread, the pose of the head-mounted display device at the moment corresponding to the current frame image as the target pose according to the target inertial data of the inertial sensor at the current moment and the second pose comprises: acquiring updated bias parameters of the inertial sensor from the first thread through the second thread; and predicting, by the second thread, the pose of the head-mounted display device at the moment corresponding to the current frame image according to the second pose, the target inertial data of the inertial sensor at the current moment and the updated bias parameters, to obtain the target pose.
  10. The pose estimation method according to claim 1, wherein the positioning, by the first thread, the feature points in the current frame image to obtain the position information of the feature points comprises: positioning the feature points in the current frame image through the first thread to obtain first pixel coordinates of the feature points in the current frame image; mapping the feature points according to the first pixel coordinates through the first thread to obtain spatial coordinates of the feature points in a world coordinate system; and determining the position information according to the first pixel coordinates and the spatial coordinates through the first thread.
  11. The pose estimation method according to claim 10, wherein the mapping, by the first thread, the feature points according to the first pixel coordinates to obtain the spatial coordinates of the feature points in the world coordinate system comprises: acquiring a mapping relation between the first pixel coordinates and the spatial coordinates through the first thread; and mapping the first pixel coordinates according to the mapping relation through the first thread to obtain the spatial coordinates of the feature points in the world coordinate system.
  12. The pose estimation method according to claim 10, wherein the positioning, by the first thread, the feature points in the current frame image to obtain the first pixel coordinates of the feature points in the current frame image comprises: preprocessing the current frame image through the first thread to obtain a preprocessed image; and positioning the feature points in the preprocessed image according to the historical frame image in the head-mounted display device through the first thread to obtain the first pixel coordinates.
  13. The pose estimation method of claim 1, wherein a frame rate of the first thread is the same as the frame rate of the vision sensor, and a frame rate of the second thread is the same as the frame rate of the inertial sensor.
  14. A virtual display image generation method, applied to a head-mounted display device, wherein the head-mounted display device comprises a vision sensor, an inertial sensor and a processor, the processor is configured with a first thread, a second thread and a buffer module, and a frame rate of the inertial sensor is greater than a frame rate of the vision sensor, the method comprising: acquiring a current frame image acquired by the vision sensor; positioning feature points in the current frame image through the first thread to obtain position information of the feature points; storing inertial data detected by the inertial sensor between the current frame image and a historical frame image to the buffer module; updating a first pose of the head-mounted display device at a moment corresponding to the historical frame image according to the position information and the inertial data through the first thread to obtain a second pose; predicting a pose of the head-mounted display device at a moment corresponding to the current frame image as a target pose according to target inertial data of the inertial sensor at a current moment and the second pose through the second thread; and rendering, according to the target pose, a virtual display image of a scene where the head-mounted display device is located.
  15. The virtual display image generation method of claim 14, wherein the frame rate of the inertial sensor is greater than the frame rate of the vision sensor, the method further comprising: outputting the target pose according to the frame rate of the inertial sensor.
  16. A head-mounted display device, comprising a vision sensor, an inertial sensor, and a processor configured with a first thread, a second thread and a buffer module, the processor being configured to: acquire a current frame image acquired by the vision sensor; position feature points in the current frame image through the first thread to obtain position information of the feature points; store inertial data detected by the inertial sensor between the current frame image and a historical frame image to the buffer module; update a first pose of the head-mounted display device at a moment corresponding to the historical frame image according to the position information and the inertial data through the first thread to obtain a second pose; and predict a pose of the head-mounted display device at a moment corresponding to the current frame image as a target pose according to target inertial data of the inertial sensor at a current moment and the second pose through the second thread.
  17. The head-mounted display device of claim 16, wherein the processor is further configured to: output the target pose according to the frame rate of the inertial sensor.
  18. A head mounted display device, wherein the head mounted display device comprises a vision sensor, an inertial sensor, and a processor, the processor configured with a first thread, a second thread, and a buffer module, a frame rate of the inertial sensor being greater than a frame rate of the vision sensor, the processor configured to: acquiring a current frame image acquired by the vision sensor; positioning the characteristic points in the current frame image through the first thread to obtain the position information of the characteristic points; storing inertial data detected by the inertial sensor between the current frame image and the historical frame image to the buffer module; Updating a first pose of the head-mounted display device at a moment corresponding to the history frame image according to the position information and the inertia data through the first thread to obtain a second pose; Predicting the pose of the head-mounted display device at the moment corresponding to the current frame image as a target pose according to the target inertial data of the inertial sensor at the current moment and the second pose through the second thread; and generating a virtual display image of the scene where the head-mounted display device is located according to the target pose rendering.
  19. The head-mounted display device of claim 18, wherein the processor is further configured to: output the target pose according to the frame rate of the inertial sensor.
  20. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program adapted to be loaded by a processor to perform the pose estimation method of any one of claims 1 to 13 or the virtual display image generation method of claim 14.
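The two-rate pipeline of claims 1, 2 and 13 — IMU samples buffered at high rate, a slow thread that fuses them with visual features once per frame, and a fast thread that cheaply propagates the latest optimized pose at IMU rate — can be illustrated with a minimal single-process sketch. All names here (`PoseEstimator`, `on_imu`, `on_frame`, the 1-D `Pose` state) are illustrative assumptions, not from the patent, and the real threads, 6-DoF state and optimization model are reduced to a toy dead-reckoning loop:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class ImuSample:
    dt: float       # seconds since the previous sample
    gyro_z: float   # yaw rate (rad/s); one axis for brevity
    acc_x: float    # forward acceleration (m/s^2)

@dataclass
class Pose:
    yaw: float = 0.0   # orientation (rad)
    vel: float = 0.0   # velocity (m/s)
    pos: float = 0.0   # position (m)

def propagate(pose, samples):
    """Dead-reckon a pose forward through IMU samples
    (the cheap second-thread prediction step)."""
    p = Pose(pose.yaw, pose.vel, pose.pos)
    for s in samples:
        p.yaw += s.gyro_z * s.dt
        p.vel += s.acc_x * s.dt
        p.pos += p.vel * s.dt
    return p

class PoseEstimator:
    """Toy two-rate pipeline: IMU samples enter a buffer at high rate;
    each camera frame triggers a slow update that consumes the buffered
    samples; between frames the latest optimized pose is propagated
    with the newest inertial data to produce the target pose."""
    def __init__(self):
        self.buffer = deque()          # stand-in for the "buffer module"
        self.optimized_pose = Pose()   # stand-in for the "second pose"

    def on_imu(self, sample):
        """Second thread: store the sample, then predict at IMU rate."""
        self.buffer.append(sample)
        return propagate(self.optimized_pose, list(self.buffer))

    def on_frame(self, feature_correction=0.0):
        """First thread: drain the buffer and fuse with visual features
        (the optimization is replaced by a simple additive correction)."""
        samples = list(self.buffer)
        self.buffer.clear()
        pose = propagate(self.optimized_pose, samples)
        pose.pos += feature_correction
        self.optimized_pose = pose
        return pose
```

Because `on_imu` only re-runs a short integration loop over buffered samples, target poses can be emitted at the inertial frame rate (claim 2) without re-running the expensive visual optimization.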

Description

Pose estimation method, virtual display image generation method and related equipment thereof

Technical Field

The application relates to the technical field of computer vision, and in particular to a pose estimation method, a virtual display image generation method and related equipment thereof.

Background

With the development of Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR) technologies, there is an increasing demand for accurate estimation of the head pose of a user. The accuracy of head pose estimation directly affects the naturalness and immersion of the user experience. Traditional head pose estimation methods rely on image data captured by a camera or on sensor data acquired by an inertial measurement unit (IMU). However, estimation methods based on a single data source often have limitations: visual methods based on image data may suffer reduced performance in environments with unfavorable illumination or indistinct features, and IMU-based pose estimation drifts after integrating sensor data over a long time, so the pose cannot be estimated accurately in real time. In head-mounted display devices that involve spatial computing (e.g., AR glasses and VR helmets), low-latency, high-accuracy head pose estimation is a core requirement and fundamental function of such devices. Furthermore, because of the limited volume of a head-mounted display device, stringent demands are also placed on the use of computing resources.

Technical Problem

Existing technology often needs to occupy excessive computing resources to ensure the accuracy of pose prediction, which limits the use of this type of head pose estimation technology on head-mounted display devices, so that it cannot be deployed on low-cost, low-power chips.
Technical Solution

The embodiments of the application provide a pose estimation method, a virtual display image generation method and related equipment, which can solve the technical problem that current pose prediction consumes substantial computing resources and cannot be deployed on low-cost, low-power chips. In a first aspect, an embodiment of the present application provides a pose estimation method applied to a head-mounted display device, where the head-mounted display device includes a vision sensor, an inertial sensor and a processor, the processor is configured with a first thread, a second thread and a buffer module, and a frame rate of the inertial sensor is greater than a frame rate of the vision sensor. The method includes: acquiring a current frame image acquired by the vision sensor; positioning feature points in the current frame image through the first thread to obtain position information of the feature points; storing inertial data detected by the inertial sensor between the current frame image and a historical frame image to the buffer module; updating a first pose of the head-mounted display device at a moment corresponding to the historical frame image according to the position information and the inertial data through the first thread to obtain a second pose; and predicting a pose of the head-mounted display device at a moment corresponding to the current frame image as a target pose according to target inertial data of the inertial sensor at the current moment and the second pose through the second thread. Optionally, the pose estimation method may further include outputting the target pose according to the frame rate of the inertial sensor.
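The visual residual used by the first thread — the distance between a feature point's tracked pixel position and the projection of its mapped 3-D point, detailed in claim 7 as the second prediction error — can be sketched with a standard pinhole model. The function names and the intrinsic parameters (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions, not from the patent:

```python
import math

def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3-D point in camera coordinates (Z > 0)
    to pixel coordinates, using the vision sensor's parameters."""
    X, Y, Z = point_cam
    return (fx * X / Z + cx, fy * Y / Z + cy)

def reprojection_error(first_px, point_cam, fx, fy, cx, cy):
    """Second prediction error: relative distance between the tracked
    (first) pixel coordinate and the projected (second) pixel coordinate."""
    u, v = project(point_cam, fx, fy, cx, cy)
    return math.hypot(first_px[0] - u, first_px[1] - v)
```

In a typical visual-inertial optimizer, such per-feature residuals are combined with the inertial prediction errors and minimized jointly to refine the first pose into the second pose.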
Optionally, the updating, by the first thread, the first pose of the head-mounted display device at the moment corresponding to the historical frame image according to the position information and the inertial data to obtain the second pose includes: preprocessing the inertial data through the first thread to obtain preprocessed data, where the preprocessed data represents the degree of pose change of the head-mounted display device between the moment corresponding to the current frame image and the moment corresponding to the historical frame image; and calling a preset target optimization model through the first thread, and updating the first pose according to the preprocessed data and the position information to obtain the second pose. With this scheme, fusion of the feature-point positions and the inertial data can be achieved, and the pose can be accurately deduced from the fused data. Updating the first pose through the target optimization model can further improve the efficiency of pose updating. Optionally, the calling, by the first thread, the preset target optimization model and updating the first pose according to the preprocessed data and the position information to obtain the second pose includes: determining, by the first thread, a first prediction error according to t