CN-115686193-B - Virtual model three-dimensional gesture manipulation method and system in augmented reality environment
Abstract
The invention discloses a three-dimensional gesture manipulation method and system for virtual models in an augmented reality environment, belonging to the field of augmented-reality human-computer interaction. The system comprises a data acquisition and processing module, a virtual hand module, a collision calculation module and a gesture intention recognition module. The method constructs a "grasp pair" condition from the physical characteristics of the object-grasping process and of the augmented reality environment, builds a grasping intention recognition algorithm on that condition, and judges the grasping state accordingly, rather than judging grasp completion by contact calculation over a fixed set of contact points. This makes grasp-intention judgment more flexible, closer to real three-dimensional gesture manipulation, better adapted to complex gesture interaction scenes, and more consistent with the user's visual expectation of interaction. Moreover, when several grasp pairs exist, all contact points forming them participate in interaction intention recognition, improving the robustness, flexibility, efficiency and immersion of gesture interaction intention recognition.
Inventors
- HU YAOGUANG
- WANG JINGFEI
- YANG XIAONAN
- WANG PENG
- MAO WANTING
Assignees
- Beijing Institute of Technology (北京理工大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20220906
Claims (4)
- 1. A three-dimensional gesture manipulation method for a virtual model in an augmented reality environment, characterized by comprising the following steps:
Step 1: collect images of the hands in the current frame and determine the position and posture data of the hand key nodes relative to the AR device using a convolutional-neural-network two-hand pose estimation algorithm. The algorithm consists of two convolutional neural networks performing 2D-to-3D two-hand pose estimation: the first network is trained for hand localization and estimates the 2D position of the hand center in the image; a localized image of the hand region, together with the corresponding input depth values, is then used to generate a normalized cropped image, which is passed to the second network to regress the relative 3D hand joint positions in real time.
Step 2: superimpose a virtual hand model on the hand key nodes identified in the current frame, and determine the position and posture of the virtual hand model from the positions and postures of the key nodes, realizing the mapping of the real hands into the virtual space.
Step 3: using a collision detection algorithm, calculate in real time in every frame whether contact or collision occurs between the virtual hand model and the other virtual models to be manipulated.
Step 4: if the collision detection algorithm detects that the two hands are in contact with another virtual model, calculate whether the hands and the manipulated model can form a "grasp pair" and thereby judge whether a grasping condition holds between the hands and the virtual model. A grasp pair consists of two contact points; if at least one grasp pair exists, the grasped virtual model is judged to be in the grasping state. The physical process of grasping a real object is analyzed with the basic rules of Newtonian rigid-body mechanics: whether the object can be grasped depends on the forces acting on it and the friction between the contact surfaces of the virtual hand model and the manipulated model, and a simplified Coulomb friction model is used to analyze the force state of the object. A "grasp pair" is composed of two contact points between the virtual hand model and the grasped model; two contact points a and b form a stable grasp pair g(a, b) when the angle between the line connecting them and the normal of the contact surface at each point is not greater than a fixed angle, namely the friction angle. The grasping intention recognition algorithm is constructed on this grasp-pair condition and cyclically checks whether each current contact point forms a grasp pair with any other contact point. For any two contact points a and b between the virtual hand and the virtual object, the grasp pair g(a, b) must satisfy
∠(l_ab, n_a) ≤ α and ∠(−l_ab, n_b) ≤ α,
where n_a and n_b are the normal vectors of the contact surface at contact points a and b, i.e. the normals of the cylindrical surface of the joint virtual model at the contact points; l_ab is the vector connecting contact points a and b; and α is the friction angle. The value of the friction angle must be set by testing for the specific manipulated model so that grasping of the virtual part is stable and natural.
Step 5: construct a grasp-center acquisition method from the grasp-pair condition of Step 4 to obtain the grasp center, which is the midpoint of the line connecting the contact points forming the grasp pair. If the virtual model is judged to be in the grasping state by the grasping intention recognition algorithm of Step 4, calculate, by the manipulation intention recognition algorithm, the virtual force or moment applied by the two hands to the virtual model from the displacement and posture transformation of the grasp centers on the manipulated model, and drive the movement or rotation of the virtual model by this virtual force or moment. The manipulation intention recognition algorithm calculates the virtual force or virtual moment applied by both hands to the virtual model in the current frame from the pose-transformation trend of the grasp center, and from the virtual force or moment computes the movement and rotation parameters of the virtual model, namely the movement direction and distance and the rotation direction and angle.
Step 6: repeat Steps 1 to 5, performing three-dimensional gesture manipulation in the augmented reality environment with the virtual hand model, the grasping intention recognition method and the manipulation intention recognition method.
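As a concrete illustration of the grasp-pair condition of Step 4, the following sketch checks whether two contact points form a stable grasp pair, i.e. whether the connecting line lies within the friction cone at each contact. The function names and the normal-orientation convention are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def angle_between(u, v):
    """Angle in radians between two 3-D vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def is_grasp_pair(p_a, n_a, p_b, n_b, friction_angle):
    """True if contact points a and b form a stable grasp pair g(a, b):
    the line connecting them makes an angle no greater than the friction
    angle with the contact-surface normal at each point."""
    l_ab = p_b - p_a  # connection vector from contact a to contact b
    return (angle_between(l_ab, n_a) <= friction_angle and
            angle_between(-l_ab, n_b) <= friction_angle)
```

For example, two directly opposed contacts with normals pointing toward each other satisfy the condition for any positive friction angle, while a contact whose normal is perpendicular to the connecting line does not.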
- 2. The method for manipulating three-dimensional gestures of a virtual model in an augmented reality environment according to claim 1, wherein the virtual hand model is composed of a plurality of virtual joint models, each virtual joint model being a cylinder approximating a finger joint of the real hand, and the virtual hand model is parameterized as
Joint_i = (P_i, R_i, size, child),
where Joint_i is the i-th virtual joint model; P_i is the position vector of the virtual joint model, expressed by the components x_i, y_i and z_i in the augmented-reality environment coordinate system; R_i is the posture of the virtual joint model, expressed by a set of Euler-angle parameters α_i, β_i and γ_i in the augmented-reality environment coordinate system; size is the shape parameter of the virtual joint model, with l denoting the length and d the diameter of the cylinder; and child = {Joint_k, …, Joint_m} denotes the driven sub-joint models of the virtual joint model, Joint_k and Joint_m being the k-th and m-th virtual joint models respectively. Each virtual joint model in the virtual hand model corresponds to a key node identified by the gesture tracking of Step 1; the position and posture data of each identified hand key node are used to update the position and posture of the corresponding virtual joint model in the current frame, so that the position and posture of the virtual hand model are determined by those of the key nodes and the mapping of the real hands into the virtual space is realized:
P_i = T · p_i,  R_i = T · r_i,
where P_i is the position vector of the i-th virtual joint model and R_i its rotation matrix; the conversion relation between the rotation matrix and the Euler angles is given by formula (5), R_i = R_z(γ_i) R_y(β_i) R_x(α_i), i.e. a rotation of γ_i degrees about the z-axis, β_i degrees about the y-axis and α_i degrees about the x-axis; T is the transformation matrix between the camera coordinate system in which the key nodes are measured and the augmented-reality environment coordinate system in which the virtual joint model is located; p_i is the position vector of the key node corresponding to the i-th virtual joint model, and r_i is the rotation matrix of that key node.
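The key-node-to-joint mapping of claim 2 can be sketched as follows: a rotation matrix is built from Euler angles in the z-y-x composition order stated above, and a tracked key node is mapped from the camera frame into the AR environment frame through a 4×4 transform. Function names and the exact angle conventions are illustrative assumptions.

```python
import numpy as np

def euler_to_matrix(alpha, beta, gamma):
    """Rotation matrix R = Rz(gamma) @ Ry(beta) @ Rx(alpha), angles in radians."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def update_joint(T, p_cam, R_cam):
    """Map a tracked key node (position p_cam, rotation R_cam, camera frame)
    into the AR environment frame via the 4x4 homogeneous transform T."""
    R_T = T[:3, :3]
    P = R_T @ p_cam + T[:3, 3]  # joint position in AR coordinates
    R = R_T @ R_cam             # joint orientation in AR coordinates
    return P, R
```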
- 3. The method for manipulating a virtual model three-dimensional gesture in an augmented reality environment according to claim 1, wherein the grasp center is a center point representing the movement of the entire hand: the whole hand is regarded as a single rigid body, and the position, posture and velocity of the grasp center represent the movement parameters of the entire virtual hand. The grasp center is determined as follows. The positions and number of grasp pairs are determined according to the grasp-pair condition constructed in Step 4; each grasp pair is regarded as a unified rigid body whose position and posture are represented by the grasp center. If one grasp pair exists, the position of the grasp center is calculated as
P_g = (P_a + P_b) / 2,  (8)
where P_g is the position of the grasp center and P_a and P_b are the positions of the contact points forming the "grasp pair"; the three Euler-angle parameters of the grasp center (formula (9)) are obtained from the line connecting the contact points and the unit vectors of the x-, y- and z-axes of the current coordinate system. If several "grasp pairs" exist, they are compared by the length of the line connecting their contact points; the grasp pair with the longest connecting line is taken as the main grasp pair, and the grasp center is constructed from it according to formulas (8) and (9).
Step 5.1: judge whether the "grasp pair" satisfies the grasp-pair cancellation condition. If it does, the user is considered to have put down the manipulated virtual model: the subsequent steps are not executed and the position and posture of the virtual model are not updated in the next frame. Otherwise, execute Step 5.2. The cancellation condition is calculated as
d_t − d_{t−1} > ε,  (10)
where d_t is the distance between the two contact points forming the "grasp pair" in the current frame t, d_{t−1} is that distance in frame t−1, and ε is a threshold: when the two contact points forming the grasp pair move apart between two frames by more than the threshold, the grasp is cancelled.
Step 5.2: calculate the virtual force or moment applied by the two hands to the virtual model according to the manipulation intention recognition algorithm, then continue with Step 5.3. Because the manipulation intention recognition algorithm, combined with the grasp-center condition, lets all contact points participate in the recognition process, manipulation intention recognition is more flexible and more robust. The algorithm is constructed on a virtual linear and torsional spring-damper model, with calculation formulas
F_t = k_l (p_{t+1} − p_t) − b_l v_t,  (11)
τ_t = k_a Θ(q_{t+1} q_t^{−1}) − b_a ω_t,  (12)
where F_t is the virtual manipulation force (formula (11)) and τ_t the virtual manipulation moment (formula (12)); the pose of the two-hand grasp center in the current frame t is (p_t, q_t) and in frame t+1 is (p_{t+1}, q_{t+1}), p being the three-dimensional position of the hands and q the quaternion describing the hand orientation; Θ(·) denotes the axis-angle vector of a rotation quaternion; v_t and ω_t are the linear and angular velocities of the manipulated virtual model at frame t; and k_l, b_l, k_a and b_a are the coefficients of the linear and torsional spring-damper models. By adjusting these coefficients for the specific manipulated virtual model, stable and smooth dynamic movement of the virtual components is achieved, consistent with the user's visual sense of interaction.
Step 5.3: from the virtual force or moment calculated by the manipulation intention recognition algorithm in Step 5.2, calculate the displacement and rotation variations of the virtual model by rigid-body dynamics, update the position and posture of the manipulated virtual model in the current frame accordingly, and render the virtual model at the new pose. The displacement variation is calculated as
Δs_t = v_t Δt + F_t Δt² / (2m),
where Δs_t is the displacement of the manipulated virtual model in the current frame t, v_t is its velocity, Δt is the time difference between the current frame t and the next frame t+1, F_t is the virtual manipulation force identified by the manipulation intention recognition algorithm, and m is the mass of the manipulated virtual model; the components of Δs_t along the Z-, Y- and X-axes of the augmented-reality environment coordinate system give the displacement of the virtual model along each axis. The rotation variation is calculated as
Δθ_t = ω_t Δt + τ_t Δt² / (2J),
where Δθ_t is the rotation of the manipulated virtual model in the current frame t, τ_t is the virtual manipulation moment identified by the algorithm, Δt is the time difference between frame t and frame t+1, J is the moment of inertia of the manipulated virtual model, and the rotation matrix of the virtual model is obtained from the components of Δθ_t about the x-, y- and z-axes of the augmented-reality environment coordinate system.
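The manipulation step of claim 3 can be sketched as a linear spring-damper force plus a simple kinematic integration, along with the grasp-cancellation test. This is a minimal illustration: the function names, scalar threshold and the omission of the torsional branch are assumptions, not the patent's implementation.

```python
import numpy as np

def virtual_force(p_hand_next, p_hand, v_model, k_l, b_l):
    """Linear spring-damper (formula (11)): a force pulling the model toward
    the new grasp-center position while damping the model's current velocity."""
    return k_l * (p_hand_next - p_hand) - b_l * v_model

def integrate_displacement(v_model, force, mass, dt):
    """Rigid-body kinematics: ds = v*dt + (F/m)*dt^2 / 2."""
    return v_model * dt + force * dt**2 / (2.0 * mass)

def grasp_cancelled(d_t, d_prev, threshold):
    """Formula (10): grasp is released when the contact points of the
    grasp pair move apart between frames by more than the threshold."""
    return d_t - d_prev > threshold
```

The torsional branch (formula (12)) has the same structure, with the axis-angle of the relative quaternion in place of the position difference and the moment of inertia in place of the mass.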
- 4. A gesture interaction system for realizing the method for manipulating a virtual model in an augmented reality environment according to claim 1, 2 or 3, comprising a data acquisition and processing module, a virtual hand module, a collision calculation module and a gesture intention recognition module. The data acquisition and processing module acquires the RGB and depth images of the current frame and obtains the position and posture information of the hand key nodes from them using the convolutional-neural-network two-hand pose estimation algorithm. The virtual hand module superimposes the virtual hand model, determines its position and posture from those of the key nodes, and realizes the mapping of the real hands into the virtual space. The collision calculation module calculates in real time, in every frame, whether contact or collision occurs between the virtual hand model and the other virtual models to be manipulated, based on a collision detection algorithm. The gesture intention recognition module comprises a grasping intention recognition sub-module and a manipulation intention recognition sub-module. If the collision calculation module detects that the two hands are in contact with another virtual model, the grasping intention recognition sub-module calculates, by the grasping intention recognition algorithm, whether the contact points between the two hands and the manipulated model can form grasp pairs, and judges whether a grasping condition holds between the hands and the virtual model; if at least one grasp pair exists, the grasped virtual model is judged to be in the grasping state. Because grasp completion is not judged by contact calculation over a fixed set of contact points, grasp-intention judgment is more flexible, better suited to complex gesture interaction scenes, and more consistent with the user's visual sense of interaction; moreover, when several grasp pairs exist, all contact points forming them participate in interaction intention recognition, improving its robustness, flexibility and immersion. The manipulation intention recognition sub-module recognizes the user's manipulation intention, comprising movement and rotation, for the manipulated virtual model. It is invoked after the grasping intention is recognized: the intention is recognized by the manipulation intention recognition algorithm and expressed as a virtual driving force and moment, from which the displacement and rotation variations of the manipulated virtual model are calculated, its motion state is updated, and its driving is realized. Compared with methods that predict model displacement from the displacement of the hands alone, this virtual-force-based manipulation intention recognition follows the physical process of motion and achieves more accurate manipulation.
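The per-frame flow through the four modules of claim 4 can be sketched as a pipeline in which each module is a pluggable stage. The stage names and the use of plain callables are illustrative assumptions; in a real system each stage would wrap the tracking network, rendering engine and physics engine respectively.

```python
def interaction_frame(track_hands, map_to_virtual_hand, detect_contacts,
                      find_grasp_pairs, apply_manipulation, model_state):
    """One frame of the gesture-interaction pipeline: data acquisition ->
    virtual hand mapping -> collision calculation -> grasping intention ->
    manipulation intention. Returns the updated manipulated-model state."""
    key_nodes = track_hands()                            # data acquisition and processing module
    hand_model = map_to_virtual_hand(key_nodes)          # virtual hand module
    contacts = detect_contacts(hand_model, model_state)  # collision calculation module
    pairs = find_grasp_pairs(contacts)                   # grasping intention recognition
    if pairs:                                            # grasping state -> manipulation intention
        model_state = apply_manipulation(pairs, model_state)
    return model_state
```

With stub stages, a frame in which a grasp pair is found updates the model state, and a frame without contacts leaves it unchanged.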
Description
Virtual model three-dimensional gesture manipulation method and system in augmented reality environment

Technical Field

The invention belongs to the field of human-computer interaction in augmented reality, and particularly relates to a virtual model gesture manipulation method in an augmented reality environment.

Background

Augmented reality (AR) is a technology that superimposes virtual information on the real environment and realizes the fusion of the two. In an augmented reality environment, because information is presented stereoscopically in three dimensions, the traditional ways of interacting through additional input devices such as a keyboard and mouse are no longer applicable, as they interfere with a seamless interaction experience. More natural interaction modalities are therefore being studied and applied in augmented reality, including gestures, speech, body language, eye tracking and the like. Compared with other modalities, gesture interaction has clear advantages for direct interaction with three-dimensional models, and multi-degree-of-freedom manipulation tasks on a three-dimensional model can be realized through it. In augmented reality assembly systems, for example, gesture interaction can provide a natural and intuitive user interface for manipulating virtual parts or fixtures during virtual assembly. Current gesture interaction schemes are either two-dimensional or three-dimensional. Two-dimensional schemes are generally oriented to AR systems on mobile devices such as mobile phones and tablets and support only simple interaction with a model through planar gestures, such as dragging a virtual object. However, since AR information is three-dimensional, two-dimensional gesture interaction is neither intuitive nor accurate enough.
Compared with two-dimensional gesture interaction, three-dimensional gesture interaction supports interaction between the user and the virtual object in three-dimensional space and better matches people's intuitive experience. For example, Chinese patent publication No. CN110221690B discloses a gesture interaction method based on an AR scene which provides a new gesture interaction mode: it can accurately present the occlusion relationship between the hand and a virtual object and the "contact" between them, enabling richer interaction actions between the hand and the virtual object and a more realistic interaction experience between the user and the virtual object in the AR scene. However, current three-dimensional gesture interaction schemes still have drawbacks. Most of them depend on a fixed gesture recognition result to drive the virtual object to move or rotate and cannot support natural two-handed interaction with the virtual object, such as grasping; in addition, the pose of the virtual object is not naturally adjusted by the user during actual manipulation, so it is difficult to place the virtual object accurately at the position the user expects.
Disclosure of Invention

The invention mainly aims to provide a three-dimensional gesture manipulation method and system for a virtual model in an augmented reality environment which can efficiently and accurately identify the user's natural grasping intention toward the three-dimensional virtual model, support moving and rotating a virtual object with natural gestures, improve the robustness of the three-dimensional gesture interaction process, make the gesture interaction experience more intuitive and natural, further improve the effect of three-dimensional gesture manipulation in augmented reality, and improve the user's sense of immersion. The aim of the invention is achieved by the following technical scheme. The invention discloses a virtual model three-dimensional gesture manipulation method in an augmented reality environment, which comprises the following steps. Acquire images of the hands in the current frame and determine the position and posture data of the hand key nodes relative to the AR device using a convolutional-neural-network hand pose estimation algorithm. The algorithm consists of two convolutional neural networks and estimates the pose of the two hands from 2D to 3D. The first convolutional neural network is trained to achieve hand localization, estimating the 2D position of the hand center in the image; the localized image of the hand region, together with the corresponding input depth values, is then used to generate a normalized cropped image that is passed to the second convolutional neural network to regress the relative 3D hand joints in real time.
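The hand-off between the two networks can be illustrated with a minimal sketch of the crop-and-normalize step: a fixed-size depth patch is cut around the estimated 2D hand center and its depth values are normalized around the hand's depth before being fed to the 3D regression network. The crop size and the bounding-cube scale are assumed values, not specified by the patent.

```python
import numpy as np

def crop_and_normalize(depth_img, center_uv, center_depth,
                       crop_px=128, cube_mm=250.0):
    """Cut a crop_px x crop_px patch around the estimated 2-D hand center
    (u, v) and normalize its depth values to roughly [-1, 1] around the
    hand's depth, as input for the second (3-D regression) network."""
    u, v = center_uv
    h, w = depth_img.shape
    half = crop_px // 2
    # clamp the crop window to the image bounds
    top, bottom = max(0, v - half), min(h, v + half)
    left, right = max(0, u - half), min(w, u + half)
    patch = depth_img[top:bottom, left:right].astype(np.float32)
    # shift to the hand's depth and scale by half the bounding-cube size
    patch = (patch - center_depth) / (cube_mm / 2.0)
    return np.clip(patch, -1.0, 1.0)
```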