CN-121116058-B - Cross-terminal collaborative social AR interaction method and system

Abstract

The invention relates to the field of augmented reality and discloses a cross-terminal collaborative social AR interaction method and system. The method comprises the following steps: S1, a plurality of user terminals connect to a cloud rendering server to form an interaction group; S2, the user terminals upload positions, postures, and operation instructions in real time, and the cloud rendering server dynamically updates the AR views of the other user terminals in the group; S3, the cloud rendering server synchronizes AR scene data and aligns the AR views of all user terminals; S4, the cloud rendering server triggers and coordinates collaborative tasks according to the user terminal data. The system comprises a multi-terminal collaboration module, a collaborative task engine module, and an AI interaction module. By introducing a multi-terminal collaboration mechanism and an innovative collaborative task design, the invention realizes the fundamental transformation of AR navigation from single-user information presentation to multi-user shared exploration, improving the interactive experience and collaborative enjoyment among users.

Inventors

  • ZENG JUN
  • GAO NA

Assignees

  • 扬宇光电(深圳)有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-08-28

Claims (8)

  1. A cross-terminal collaborative social AR interaction method, characterized by comprising the following steps: S1, establishing connections between a plurality of user terminals and a cloud rendering server, so that the plurality of user terminals enter the same interaction group; S2, uploading user positions, postures, and operation instructions to the cloud rendering server by the user terminals in real time, and dynamically updating the AR views of the other user terminals in the interaction group; S3, synchronizing, by the cloud rendering server, AR scene data to all user terminals in the interaction group, and aligning the AR views of the plurality of user terminals by means of a coordinate conversion algorithm, so that the plurality of user terminals display consistent virtual content in the same physical space; in step S3, the coordinate conversion algorithm specifically matches the feature points of different user terminals and minimizes the reprojection error, where the reprojection error is calculated by the following formula (a numerical sketch of this error appears after the claims):

     E = \sum_{i=1}^{n} \left\| u_i - \pi\left( K [R \mid t] X_i \right) \right\|^2

     where u_i denotes the two-dimensional pixel coordinates of the i-th feature point in the image; X_i denotes the corresponding three-dimensional space coordinates; K denotes the camera intrinsic matrix; R and t denote the rotation matrix and translation vector of the camera extrinsic parameters, respectively; \pi(\cdot) denotes the perspective projection function; n is the number of feature points; and E is the reprojection error; S4, triggering and coordinating, by the cloud rendering server, the plurality of user terminals to execute collaborative tasks according to their positions, scenes, and operation instructions, and performing interaction among the plurality of users and between the plurality of users and the virtual characters; in step S4, the collaborative tasks include: a historical scene restoration task, which synchronously triggers AR restoration pictures after the plurality of user terminals scan a designated area and stay there for a preset time period, so that the plurality of user terminals display dynamic AR restoration pictures of the historical scene and guide the users to interact; a knowledge competition challenge task, which pushes knowledge questions related to the current scenic spot to the user terminals when they arrive at the core explanation point of the scenic spot, and counts the users' answers for rewards and scenario triggering; and a virtual character interaction task, which triggers an AI-driven virtual character to appear after the plurality of users actively initiate voice instructions and tap the virtual tour guide button, and conducts voice and gesture interaction with the plurality of users to guide exploration and provide information.
  2. The cross-terminal collaborative social AR interaction method according to claim 1, wherein in step S1, establishing connections between the plurality of user terminals and the cloud rendering server comprises the following steps: S11, the user terminals enter the interaction group by scanning a scenic spot identifier and positioning; and S12, the plurality of user terminals establish connections with the cloud rendering server based on the WebSocket protocol for data communication (an illustrative connection sketch appears after the claims).
  3. The cross-terminal collaborative social AR interaction method according to claim 1, wherein in step S1, the plurality of user terminals entering the same interaction group are allocated to a logical group through a common identifier and a shared session ID, so as to allow the plurality of user terminals in the group to share and synchronize AR scene data.
  4. The cross-terminal collaborative social AR interaction method according to claim 1, wherein in step S2, the cloud rendering server dynamically updating the AR views of the other user terminals in the interaction group specifically comprises the following steps: S21, the cloud rendering server receives the user position obtained through the 6-degree-of-freedom positioning of ARKit and ARCore in the user terminal and uploaded; S22, the cloud rendering server receives the user posture obtained by the user terminal through the mobile phone gyroscope and the camera attitude angle; and S23, the cloud rendering server renders and pushes the AR views of the other user terminals in the interaction group in real time according to the received user positions, postures, and operation instructions (an illustrative upload message appears after the claims).
  5. The cross-terminal collaborative social AR interaction method according to claim 1, wherein in step S4, the interactions between the plurality of users and the virtual character comprise: voice dialogue, namely receiving, through the cloud rendering server, voice input uploaded by the user terminal, performing semantic understanding of the voice input using a dialogue model trained based on an LSTM neural network, generating a multi-round dialogue response for the user terminal according to the semantic understanding result and the current AR scene information, and pushing the multi-round dialogue response to the user terminal for voice playback; gesture recognition and response, namely receiving, through the cloud rendering server, gesture images and video data uploaded by the user terminal, performing gesture recognition on them using a hybrid model based on a convolutional neural network and a recurrent neural network, obtaining the user's gesture instruction, determining the moving direction and action response of the virtual character according to the gesture instruction, and updating the display state of the virtual character in the AR view while responding to the gesture instruction; and personalized recommendation, namely acquiring the user's historical interaction data through the cloud rendering server, determining the explanation style of the virtual character and the difficulty of the collaborative tasks, and accordingly carrying out the virtual character's voice interaction and collaborative task pushing.
  6. The cross-terminal collaborative social AR interaction method according to claim 5, wherein the dialogue model trained based on the LSTM neural network is a deep learning model with a sequence-to-sequence structure, trained on a large-scale spoken-language corpus and question-answer pairs, so that it can capture long-distance dependency relations within sentences, understand the user's complex voice instruction intent, identify entity information, and perform context association, thereby generating logically coherent multi-round dialogue responses related to the content of the current scenic spot (a model skeleton appears after the claims).
  7. The cross-terminal collaborative social AR interaction method according to claim 5, wherein the front end of the hybrid model based on the convolutional neural network and the recurrent neural network uses the convolutional neural network to extract spatial features from gesture images and video frames; the output of the convolutional neural network serves as the input of the back-end recurrent neural network, which captures the temporal dynamics of the gesture actions, performs real-time high-precision recognition and classification of user gestures, and maps the recognized gesture instructions to the preset moving directions and actions of the virtual characters (a corresponding model sketch appears after the claims).
  8. A cross-terminal collaborative social AR interaction system applying the method according to any one of claims 1-7, characterized in that it comprises: a multi-terminal collaboration module, used for realizing data communication and AR view synchronization between the plurality of user terminals and the cloud rendering server; a collaborative task engine module, used for storing collaborative task materials and for dynamically triggering and coordinating the plurality of user terminals to execute collaborative tasks according to their positions, scenes, and behaviors; and an AI interaction module, used for driving the virtual characters in the system, realizing voice dialogue, gesture recognition and response with the plurality of users, and performing personalized recommendation;
     the multi-terminal collaboration module comprises: terminal equipment, supporting mobile phones and AR glasses and integrating a binocular camera for SLAM positioning and gesture recognition, a microphone for voice input, and a loudspeaker for voice output; the cloud rendering server, provided with an AR scene rendering engine and used for realizing multi-terminal data synchronization through the WebSocket protocol, ensuring that the spatial positions of virtual content are consistent across different devices; and a positioning and synchronization algorithm unit, used for combining GPS, UWB positioning, and visual SLAM to avoid view deviation;
     the collaborative task engine module comprises: a task library unit, used for storing historical scene restoration materials, knowledge competition question banks, and virtual tour guide character models; a triggering and execution unit, used for dynamically triggering tasks through a rule engine according to the positions, scenes, and behaviors of the tourists and for coordinating synchronous execution across the multiple terminals (a toy rule-engine sketch appears after the claims); and a score and leaderboard unit, used for counting the completion degree and accuracy of the tourists' tasks in the interaction group in real time and generating a dynamic leaderboard;
     the AI interaction module comprises: a virtual tour guide driving unit, used for training the dialogue model based on the LSTM neural network, supporting multiple rounds of question and answer, and associating the content of the current scenic spot; a gesture recognition and response unit, used for recognizing tourists' gestures through the hybrid convolutional and recurrent neural network model and mapping them to the moving directions and actions of the virtual tour guide; and a personalized recommendation unit, used for adjusting the explanation style and task difficulty of the virtual tour guide in combination with the tourists' historical interaction data.
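
The following is a minimal numerical sketch of the reprojection error E defined in claim 1, assuming a standard pinhole camera model; the function and variable names (reprojection_error, pts_2d, pts_3d, K, R, t) are illustrative and not taken from the patent.

    import numpy as np

    def reprojection_error(pts_2d, pts_3d, K, R, t):
        """E = sum_i || u_i - pi(K [R|t] X_i) ||^2 over n matched feature points."""
        # Transform the 3-D points X_i into the camera frame: X_cam = R X + t.
        cam = (R @ pts_3d.T + t.reshape(3, 1)).T          # shape (n, 3)
        # Apply the intrinsics K, then the perspective division pi(.).
        proj = (K @ cam.T).T                              # shape (n, 3)
        proj = proj[:, :2] / proj[:, 2:3]                 # pixel coordinates
        return float(np.sum((pts_2d - proj) ** 2))

    # Toy check: points projected with the true pose give (near-)zero error.
    K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.zeros(3)
    pts_3d = np.array([[0.0, 0.0, 4.0], [0.5, -0.2, 5.0]])
    uv = (K @ pts_3d.T).T
    pts_2d = uv[:, :2] / uv[:, 2:3]
    print(reprojection_error(pts_2d, pts_3d, K, R, t))    # ~0.0

A coordinate conversion algorithm of the kind claim 1 describes would search for the R and t that minimize this quantity over the feature points matched across the different terminals.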
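
Claims 2 and 3 describe the group-joining handshake only at the protocol level. The sketch below shows one plausible shape for it; the JSON message format, field names, endpoint URL, and server acknowledgement are all assumptions for illustration, not specified by the patent.

    import asyncio
    import json
    import websockets  # third-party package: pip install websockets

    async def join_interaction_group(server_uri, spot_id, session_id):
        async with websockets.connect(server_uri) as ws:
            # S11/S12: announce the scanned scenic-spot identifier and the
            # shared session ID so the server can place this terminal in the
            # same logical group as the other terminals (claim 3).
            await ws.send(json.dumps({
                "type": "join",
                "spot_id": spot_id,        # from scanning the scenic-spot marker
                "session_id": session_id,  # common to all terminals in the group
            }))
            ack = json.loads(await ws.recv())  # e.g. {"joined": true, ...}
            return ack

    # Example invocation (requires a reachable server at this address):
    # asyncio.run(join_interaction_group("ws://cloud-render.example/ar",
    #                                    spot_id="spot-042", session_id="grp-7f3a"))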
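
For the per-frame upload in claim 4 (position from 6-degree-of-freedom tracking, posture from the gyroscope and camera attitude angle), a message might look like the following; every field name here is an assumption for illustration rather than the patent's schema.

    import json
    import time
    from dataclasses import asdict, dataclass

    @dataclass
    class PoseUpdate:
        user_id: str
        position: tuple      # (x, y, z) in the shared scene frame, metres
        orientation: tuple   # quaternion (w, x, y, z) from gyro/camera fusion
        op: str              # current operation instruction, e.g. "tap_guide"
        ts: float            # client timestamp, lets the server order updates

    update = PoseUpdate("user-17", (1.2, 0.0, -3.4), (0.97, 0.0, 0.24, 0.0),
                        "tap_guide", time.time())
    print(json.dumps(asdict(update)))  # payload sent over the claim 2 WebSocket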
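
Claim 6 specifies only an LSTM-based sequence-to-sequence dialogue model. The PyTorch skeleton below shows the minimal shape of such a model with teacher-forced decoding; the vocabulary size, layer dimensions, and toy inputs are assumptions for illustration.

    import torch
    import torch.nn as nn

    class Seq2SeqDialogue(nn.Module):
        def __init__(self, vocab=8000, emb=256, hidden=512):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            self.encoder = nn.LSTM(emb, hidden, batch_first=True)
            self.decoder = nn.LSTM(emb, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab)

        def forward(self, src_ids, tgt_ids):
            # Encode the user utterance; the final (h, c) state summarises it.
            _, state = self.encoder(self.embed(src_ids))
            # Decode the response conditioned on that state (teacher forcing).
            dec, _ = self.decoder(self.embed(tgt_ids), state)
            return self.out(dec)  # (batch, tgt_len, vocab) next-token logits

    model = Seq2SeqDialogue()
    src = torch.randint(0, 8000, (2, 12))  # two tokenised user questions
    tgt = torch.randint(0, 8000, (2, 15))  # two shifted target responses
    print(model(src, tgt).shape)           # torch.Size([2, 15, 8000])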
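
Claim 7's hybrid model pairs a convolutional front end (spatial features per frame) with a recurrent back end (temporal dynamics across frames). A minimal PyTorch sketch follows, under assumed layer sizes and an assumed set of six gesture classes.

    import torch
    import torch.nn as nn

    class GestureCNNRNN(nn.Module):
        def __init__(self, n_gestures=6, feat=128, hidden=128):
            super().__init__()
            # Front end (CNN): spatial features for each video frame.
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat))
            # Back end (RNN): temporal dynamics across the frame sequence.
            self.rnn = nn.LSTM(feat, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_gestures)  # e.g. swipe-left, point

        def forward(self, clip):              # clip: (batch, T, 3, H, W)
            b, t = clip.shape[:2]
            feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
            _, (h, _) = self.rnn(feats)
            return self.head(h[-1])           # (batch, n_gestures) logits

    print(GestureCNNRNN()(torch.rand(2, 16, 3, 64, 64)).shape)  # [2, 6]

The final hidden state of the LSTM summarises the whole gesture clip, which is what lets the classifier map it to one of the preset virtual-guide movement commands.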
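
The triggering and execution unit of claim 8 is described only as a rule engine over positions, scenes, and behaviors. A toy version follows, assuming a task fires when every terminal in the group satisfies a position-and-dwell predicate; the task names, anchor coordinates, radii, and dwell times are invented for illustration.

    import math
    import time

    RULES = {
        # task name -> (anchor position (x, z), radius in metres, dwell seconds)
        "history_scene_restoration": ((10.0, 4.0), 3.0, 5.0),
        "knowledge_quiz":            ((25.0, -2.0), 2.0, 0.0),
    }

    def due_tasks(group_poses, now=None):
        """group_poses: {user_id: (x, z, arrived_at_ts)} for one interaction group."""
        now = time.time() if now is None else now
        fired = []
        for task, ((ax, az), radius, dwell) in RULES.items():
            # Claim 1 triggers the restoration scene only once *all* terminals
            # have scanned the area and stayed, hence all() over the group.
            if all(math.hypot(x - ax, z - az) <= radius and now - t0 >= dwell
                   for x, z, t0 in group_poses.values()):
                fired.append(task)
        return fired

    poses = {"user-1": (10.5, 4.2, time.time() - 6),
             "user-2": (9.4, 3.7, time.time() - 8)}
    print(due_tasks(poses))  # ['history_scene_restoration']: both dwelt >= 5 s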

Description

Cross-terminal collaborative social AR interaction method and system

Technical Field

The invention relates to the technical field of augmented reality, in particular to a cross-terminal collaborative social AR interaction method and system.

Background

With the continuous development and maturation of augmented reality technology, its application in the field of cultural tourism is increasingly popular. Currently, many attractions and museums have begun to introduce AR navigation systems in an attempt to provide a novel interactive experience for visitors. These systems typically employ a "single-person" mode in which a visitor scans a predetermined scenic spot identifier with a handheld smart device, such as a mobile phone or tablet computer. Through this operation, the personal terminal can trigger and display augmented reality content related to the scenic spot, such as superimposed text commentary, historical pictures, three-dimensional models, or simple animation effects. However, although the prior art improves the visitor experience to some extent, several shortcomings remain to be resolved. First, current mainstream AR navigation schemes tend to focus on the independent experience of individual users and lack collaboration and interaction capability between different user terminals, so social connections among tourists are difficult to strengthen through the AR experience, and the overall experience tends to be monotonous. Second, when multiple users share the same virtual scene, the virtual content may not be completely aligned on different devices owing to differences in the positioning accuracy of each terminal and the limitations of the synchronization mechanism, causing a visual "split feeling" and disrupting the continuity of the immersive experience. Finally, in presenting cultural content, existing AR navigation mostly remains at the level of information display; it struggles to deeply mine the inherent charm of historical culture and lacks vivid storytelling and interactivity, so the cultural transfer appears shallow. This relatively passive and isolated mode of experience also makes it difficult for visitors to spontaneously generate attractive shareable content after the experience, limiting the potential for secondary propagation of attractions through social media.

Disclosure of Invention

The invention aims to provide a cross-terminal collaborative social AR interaction method and system that solve the problems of weak social interaction, insufficient multi-terminal view alignment precision, lack of depth in cultural content presentation, and limited propagation value in existing AR navigation.
In order to achieve the above purpose, the invention is realized by the following technical scheme: a cross-terminal collaborative social AR interaction method comprises the following steps: S1, establishing connections between a plurality of user terminals and a cloud rendering server, so that the plurality of user terminals enter the same interaction group; S2, uploading user positions, postures, and operation instructions to the cloud rendering server by the plurality of user terminals in real time, and dynamically updating the AR views of the other user terminals in the interaction group; S3, synchronizing, by the cloud rendering server, AR scene data to all user terminals in the interaction group, and aligning the AR views of the plurality of user terminals by means of a coordinate conversion algorithm, so that the plurality of user terminals display consistent virtual content in the same physical space; and S4, triggering and coordinating, by the cloud rendering server, the plurality of user terminals to execute collaborative tasks according to their positions, scenes, and operation instructions, and performing interaction among the plurality of users and between the plurality of users and the virtual characters.

Preferably, in step S1, establishing connections between the plurality of user terminals and the cloud rendering server includes the following steps: S11, the user terminals enter the interaction group by scanning a scenic spot identifier and positioning; and S12, the plurality of user terminals establish connections with the cloud rendering server based on the WebSocket protocol for data communication.

Preferably, in step S1, the plurality of user terminals entering the same interaction group are allocated to a logical group through a common identifier and a shared session ID, so as to allow the plurality of user terminals in the group to share and synchronize AR scene data.

Preferably, in step S2, the cloud rendering server dynamically updates the AR views of the other user terminals in the interaction group, specifically including the following steps: S21, the cloud rendering server receives the user position uploaded by the user terminal, obtained through the 6-degree-of-freedom positioning of ARKit and ARCore