CN-121999183-A - Virtual reality multi-person cooperation method, system and storage medium
Abstract
The invention relates to the technical field of virtual reality and computer vision, and in particular to a virtual reality multi-person cooperation method, system and storage medium based on server-side computation and self-supervised feature extraction, aiming to solve the prior-art problem of insufficient computing power when a mobile terminal device builds a map of a large space. In the method, a server receives sensor data, extracts features from the image data, and generates a global map file in combination with inertial measurement unit data; when repositioning is needed, the virtual reality device acquires a current frame image and sends it to the server through a remote procedure call; the server performs pose calculation based on the current frame image and the global map file to obtain a repositioning pose and sends the repositioning pose to the virtual reality device; and the virtual reality device renders a virtual scene based on the repositioning pose.
Inventors
- PANG HAIYAN
- ZHANG YI
- LI WENQUAN
- CHEN FEI
- CHEN SHAOPING
Assignees
- 深圳创维新世界科技有限公司 (Shenzhen Skyworth New World Technology Co., Ltd.)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-23
Claims (10)
- 1. A virtual reality multi-person cooperation method, characterized by being applied to a system comprising a virtual reality device and a server, the method comprising: the virtual reality device collects sensor data in an environment, wherein the sensor data comprise image data and inertial measurement unit data, and transmits the sensor data to the server through a remote procedure call; the server receives the sensor data, performs feature extraction on the image data based on a self-supervised interest point detection and description network, and performs visual-inertial simultaneous localization and mapping in combination with the inertial measurement unit data to generate a global map file; when repositioning is needed, the virtual reality device acquires a current frame image and sends the current frame image to the server through a remote procedure call; the server performs pose calculation based on the current frame image and the global map file to obtain a repositioning pose, and sends the repositioning pose to the virtual reality device; and the virtual reality device receives the repositioning pose and initializes a local tracking thread based on the repositioning pose to render a virtual scene.
- 2. The method of claim 1, wherein transmitting the sensor data to the server through a remote procedure call comprises: adding a hardware timestamp to each acquired frame of the image data and to each item of inertial measurement unit data; packaging the image data, the inertial measurement unit data and the corresponding hardware timestamps into sensor data packets with a uniform format; and continuously sending the sensor data packets to the server through an established bidirectional streaming remote procedure call channel.
- 3. The method of claim 1, wherein the server receiving the sensor data, performing feature extraction on the image data based on a self-supervised interest point detection and description network, performing visual-inertial simultaneous localization and mapping in combination with the inertial measurement unit data, and generating a global map file comprises: preprocessing the received sensor data; inputting the preprocessed image data into the self-supervised interest point detection and description network to extract feature points and descriptors; performing frame pose estimation using the feature points, the descriptors and the preprocessed inertial measurement unit data; performing local map construction based on the result of the frame pose estimation; and performing loop closure detection and global optimization on the constructed local map, and storing the optimized map information as the global map file.
- 4. The method of claim 1, wherein, during the server's visual-inertial simultaneous localization and mapping, the method further comprises: the server transmits the calculated real-time six-degree-of-freedom pose back to the virtual reality device in real time through the bidirectional stream; and the virtual reality device receives the real-time six-degree-of-freedom pose, performs motion prediction in combination with local inertial measurement unit data, and drives virtual content rendering.
- 5. The method of claim 1, wherein the global map file comprises a file header, metadata, map point information and key frame information; the map point information comprises three-dimensional position coordinates, feature descriptors, observation information, reference key frame identifiers and average observation directions, and the key frame information comprises key frame poses, bag-of-words vectors and connection relations.
- 6. The method of claim 1, wherein the server performing pose calculation based on the current frame image and the global map file to obtain a repositioning pose comprises: extracting feature points of the current frame image, and calculating a bag-of-words vector of the current frame; searching a key frame database of the global map file for candidate key frames whose bag-of-words similarity to the current frame meets a preset condition; solving an initial pose based on the candidate key frames and the current frame image by combining a PnP algorithm with a RANSAC algorithm; and optimizing the reprojection error corresponding to the initial pose using local bundle adjustment to obtain the repositioning pose (see the illustrative sketch following the claims).
- 7. The method according to claim 1, wherein the method further comprises: the virtual reality device records a first area for which image data have been acquired and, through a rendering system, displays the first area and a second area for which image data have not yet been acquired in a visually distinguished manner, so as to guide completion of the construction and coverage of a target space; and the server stores the generated global map file in memory and monitors repositioning requests from one or more virtual reality devices, so as to achieve spatial consistency of multiple virtual reality devices in the same virtual coordinate system.
- 8. The method according to any one of claims 1 to 7, wherein the descriptors output by the self-supervised interest point detection and description network are floating-point descriptors, and the feature extraction process uses the Euclidean distance for feature matching.
- 9. A virtual reality multi-person cooperation system, comprising: a virtual reality device, configured to collect sensor data comprising image data and inertial measurement unit data, send the sensor data through a remote procedure call, receive a repositioning pose, and initialize a local tracking thread based on the repositioning pose to render a virtual scene; and a server, communicatively connected to the virtual reality device and configured to receive the sensor data, perform feature extraction using a self-supervised interest point detection and description network, construct a global map file, and, in response to a repositioning request from the virtual reality device, calculate and return the repositioning pose according to the global map file.
- 10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 8.
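Claim 6 outlines the repositioning computation: bag-of-words retrieval of candidate key frames, initial pose solving by a PnP algorithm combined with RANSAC, and refinement by local bundle adjustment. The Python sketch below illustrates only the PnP/RANSAC and refinement stages using OpenCV; the candidate retrieval, the 2D-3D correspondence matching, the camera intrinsics and all thresholds are hypothetical stand-ins and are not taken from the patent.

```python
import numpy as np
import cv2

def relocalize_pnp(points_3d, points_2d, camera_matrix):
    """Estimate a repositioning pose from 2D-3D matches (illustrative sketch).

    points_3d: (N, 3) map-point coordinates gathered from candidate key frames.
    points_2d: (N, 2) matched feature locations in the current frame image.
    camera_matrix: 3x3 intrinsic matrix of the headset camera (assumed known).
    Returns a 4x4 world-to-camera pose, or None if repositioning fails.
    """
    points_3d = np.asarray(points_3d, dtype=np.float64)
    points_2d = np.asarray(points_2d, dtype=np.float64)
    if len(points_3d) < 6:
        return None  # too few correspondences for a reliable PnP solution

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d, points_2d, camera_matrix, None,
        iterationsCount=100,
        reprojectionError=4.0,      # pixel threshold for RANSAC inliers (a guess)
        flags=cv2.SOLVEPNP_EPNP,
    )
    if not ok or inliers is None or len(inliers) < 10:
        return None

    # Refine the pose on the inlier set only; this stands in for the claim's
    # local bundle adjustment, which would also re-optimize map points.
    idx = inliers.ravel()
    rvec, tvec = cv2.solvePnPRefineLM(points_3d[idx], points_2d[idx],
                                      camera_matrix, None, rvec, tvec)

    R, _ = cv2.Rodrigues(rvec)
    pose = np.eye(4)
    pose[:3, :3] = R
    pose[:3, 3] = tvec.ravel()
    return pose
```

In a full pipeline, `points_3d` would come from map points observed by the retrieved candidate key frames and `points_2d` from descriptor matches against the current frame; the patent's local bundle adjustment would optimize over multiple observations rather than the single-frame Levenberg-Marquardt refinement used here.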
Description
Virtual reality multi-person cooperation method, system and storage medium
Technical Field
The invention relates to the technical field of virtual reality and computer vision, and in particular to a virtual reality multi-person cooperation method, system and storage medium based on server-side computing and self-supervised feature extraction.
Background
A Virtual Reality (VR) headset is the core device for delivering an immersive experience, and with the development of the technology, support for six-degree-of-freedom (6DoF) interaction has become standard for high-end VR systems. To achieve six-degree-of-freedom tracking, simultaneous localization and mapping (SLAM) techniques are widely used: a map is constructed in real time in an unknown environment and the device's own pose is estimated from sensor data such as cameras and Inertial Measurement Units (IMUs). However, most existing VR SLAM solutions run the mapping algorithm locally on the VR headset. Because the headset is a mobile terminal device, its computing capacity and storage space are significantly limited by hardware volume and power consumption, and it can hardly support efficient, high-precision mapping of a large-scale physical space. Executing the complex SLAM mapping task locally is not only inefficient but also occupies a large share of the terminal's system resources, causing the device to heat severely or shortening its battery life, so that long-duration, large-space usage requirements are difficult to meet.
Disclosure of the Invention
The embodiments of the invention provide a virtual reality multi-person cooperation method, system and storage medium, aiming to solve the prior-art problem of insufficient computing power when a mobile terminal device builds a map of a large space. An embodiment of the invention provides a virtual reality multi-person cooperation method applied to a system comprising a virtual reality device and a server. The virtual reality device collects sensor data in an environment, the sensor data comprising image data and inertial measurement unit data, and transmits the sensor data to the server through a remote procedure call. The server receives the sensor data, performs feature extraction on the image data based on a self-supervised interest point detection and description network, and performs visual-inertial simultaneous localization and mapping in combination with the inertial measurement unit data to generate a global map file. When repositioning is needed, the virtual reality device acquires a current frame image and sends it to the server through a remote procedure call. The server performs pose calculation based on the current frame image and the global map file to obtain a repositioning pose and sends the repositioning pose to the virtual reality device, and the virtual reality device receives the repositioning pose and initializes a local tracking thread based on the repositioning pose to render a virtual scene.
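Claim 5 describes the global map file generated above as containing a file header, metadata, map point information (three-dimensional position, feature descriptor, observations, reference key frame, average observation direction) and key frame information (pose, bag-of-words vector, connection relations). The patent does not specify an on-disk encoding; the Python dataclasses below are only a hypothetical in-memory layout of those fields, with invented names and dimensions.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import numpy as np

@dataclass
class MapPoint:
    point_id: int
    position: np.ndarray          # 3D world coordinates, shape (3,)
    descriptor: np.ndarray        # floating-point descriptor, e.g. shape (256,)
    observations: Dict[int, int]  # key frame id -> feature index observing this point
    reference_keyframe: int       # identifier of the reference key frame
    mean_view_dir: np.ndarray     # average observation direction, shape (3,)

@dataclass
class KeyFrame:
    frame_id: int
    pose: np.ndarray              # 4x4 camera pose in the global map frame
    bow_vector: Dict[int, float]  # bag-of-words vector: visual word id -> weight
    covisible_frames: List[int]   # connection relations to other key frames

@dataclass
class GlobalMap:
    magic: bytes = b"VRMAP"       # file header / format identifier (hypothetical)
    metadata: Dict[str, str] = field(default_factory=dict)   # e.g. sensor setup, date
    map_points: List[MapPoint] = field(default_factory=list)
    keyframes: List[KeyFrame] = field(default_factory=list)
```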
With reference to the first aspect of the embodiment of the present invention, in a first implementation manner of the first aspect, transmitting the sensor data to the server through a remote procedure call includes: adding a hardware timestamp to each collected frame of image data and to each item of inertial measurement unit data; packaging the image data, the inertial measurement unit data and the corresponding hardware timestamps into sensor data packets with a uniform format; and continuously transmitting the sensor data packets to the server through an established bidirectional streaming remote procedure call channel. With reference to the first aspect of the embodiment of the present invention, in a second implementation manner of the first aspect, the server receiving the sensor data, performing feature extraction on the image data based on a self-supervised interest point detection and description network, performing visual-inertial simultaneous localization and mapping in combination with the inertial measurement unit data, and generating a global map file includes: preprocessing the received sensor data; inputting the preprocessed image data into the self-supervised interest point detection and description network to extract feature points and descriptors; performing frame pose estimation using the feature points, the descriptors and the preprocessed inertial measurement unit data; performing local map construction based on the result of the frame pose estimation; and performing loop closure detection and global optimization on the constructed local map, and storing the optimized map information as the global map file. With reference to the first aspect of the embodiment of the present invention, in a third implementation manner of the first aspect o
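The second implementation manner above extracts feature points and floating-point descriptors on the server, and claim 8 states that feature matching uses the Euclidean distance. As a minimal sketch of that matching step, the brute-force nearest-neighbour matcher below compares two descriptor sets by Euclidean (L2) distance; the ratio test and its threshold are a common heuristic added here for illustration and are not specified by the patent.

```python
import numpy as np

def match_descriptors(desc_query, desc_ref, ratio=0.8):
    """Brute-force nearest-neighbour matching of floating-point descriptors
    by Euclidean distance, with a ratio test to reject ambiguous pairs.

    desc_query: (N, D) descriptors from the current frame image.
    desc_ref:   (M, D) descriptors from a candidate key frame (M >= 2).
    Returns a list of (query_index, ref_index) tentative matches.
    """
    # Pairwise Euclidean distances between the two descriptor sets, shape (N, M).
    dists = np.linalg.norm(desc_query[:, None, :] - desc_ref[None, :, :], axis=-1)
    matches = []
    for i, row in enumerate(dists):
        nearest = np.argpartition(row, 1)[:2]          # two smallest distances (unordered)
        best, second = nearest[np.argsort(row[nearest])]
        # Accept only if the best match is clearly closer than the runner-up.
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches
```

Such tentative matches would supply the 2D-3D correspondences consumed by the PnP/RANSAC repositioning step sketched after the claims.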