
CN-122018688-A - Multimode human-computer interaction method and system for AI glasses augmented reality environment

CN 122018688 A

Abstract

The invention relates to the field of human-computer interaction and discloses a multimodal human-computer interaction method and system for an AI-glasses augmented reality environment. The method acquires an operator's interactive inputs, such as gaze position information, hand motion information, and voice command information, and evaluates the reliability of each input by combining parameters of the AI glasses' operating environment and own state with information about the operator's current work phase. Based on the evaluation results, the system dynamically adjusts the weight given to each interactive input when inferring the operator's intent. The method addresses a problem of the prior art: during long-term, high-intensity use of AI glasses, interaction accuracy and reliability degrade because of environmental factors and the fine adjustment movements operators make to compensate for system deviations, leaving the system unable to accurately distinguish the user's real intent.
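As an illustrative sketch (not part of the patent), the dynamic weighting described above can be expressed as reliability-normalized fusion of per-modality intent scores; all function names, the normalization scheme, and the numeric values are assumptions:

```python
# Illustrative sketch: reliability-weighted fusion of multimodal intent scores.
# All names, values, and the normalization scheme are assumptions.

def fuse_intent(candidate_scores, reliabilities):
    """Combine per-modality intent scores using reliability-derived weights.

    candidate_scores: {modality: {intent: score}}
    reliabilities:    {modality: reliability in [0, 1]} from the evaluation step
    """
    total = sum(reliabilities.values()) or 1.0
    weights = {m: r / total for m, r in reliabilities.items()}
    fused = {}
    for modality, scores in candidate_scores.items():
        w = weights.get(modality, 0.0)
        for intent, s in scores.items():
            fused[intent] = fused.get(intent, 0.0) + w * s
    return max(fused, key=fused.get), fused

# Gaze reliability has been lowered (e.g. suspected drift compensation),
# so the voice channel dominates the decision.
intent, fused = fuse_intent(
    {"gaze":    {"select": 0.9, "pan": 0.1},
     "gesture": {"select": 0.4, "pan": 0.6},
     "voice":   {"select": 0.8, "pan": 0.0}},
    {"gaze": 0.3, "gesture": 0.5, "voice": 0.9},
)
```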

Inventors

  • LI GAOSONG
  • LI YONGBO
  • LI DINGWEI

Assignees

  • 深圳市鼎皓达科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-01-29

Claims (10)

  1. A multimodal human-computer interaction method for an AI-glasses augmented reality environment, comprising the following steps: acquiring interactive input information of an operator, the interactive input information comprising gaze position information, hand motion information, and voice command information; acquiring parameters related to the operating environment and the own state of the AI glasses, the parameters comprising environment parameters and state information of a motion sensing component; acquiring work-phase information of the operator; performing reliability evaluation on the interactive input information based on the interactive input information, the environment parameters, the state information of the motion sensing component, and the work-phase information, to obtain a reliability evaluation result for each type of interactive input information; and adjusting, according to the reliability evaluation results, the weight of each type of interactive input information when determining the operator's intent.
  2. The multimodal human-computer interaction method for an AI-glasses augmented reality environment according to claim 1, wherein adjusting the weight of each type of interactive input information when determining the operator's intent comprises: acquiring a timestamp of laser pulse generation; acquiring head-pose fine-adjustment action features and gaze action features of the operator; determining a first temporal association between the head-pose fine-adjustment action and the laser pulse, and identifying, according to the first temporal association, whether the head-pose fine-adjustment action is a compensation action and whether it is a micro-operation instruction; and determining a second temporal association between the gaze action and the laser pulse, and identifying, according to the second temporal association, whether the gaze action is a compensation action and whether it is a micro-operation instruction.
  3. The multimodal human-computer interaction method for an AI-glasses augmented reality environment according to claim 1, wherein adjusting the weight of each type of interactive input information when determining the operator's intent comprises: acquiring head-pose fine-adjustment action features, comprising the amplitude, speed, and duration of a head-pose fine-adjustment action; acquiring gaze action features, comprising the amplitude, speed, and duration of a gaze action; acquiring a timestamp of laser pulse generation; determining a first time interval between the occurrence of the head-pose fine-adjustment action and the occurrence of the laser pulse, together with the duration of the head-pose fine-adjustment action; determining a second time interval between the occurrence of the gaze action and the occurrence of the laser pulse, together with the duration of the gaze action; identifying, according to the first time interval, the second time interval, and a duration threshold, whether the head-pose fine-adjustment action is a compensation action, and adjusting the weight of the interactive input information related to the head-pose fine-adjustment action according to the identification result; and determining whether the duration of the gaze action falls within a preset physiological response window and whether it is shorter than the duration threshold, identifying accordingly whether the gaze action is a compensation action, and adjusting the weight of the interactive input information related to the gaze action according to the identification result.
  4. The multimodal human-computer interaction method for an AI-glasses augmented reality environment according to claim 1, wherein adjusting the weight of each type of interactive input information when determining the operator's intent comprises: acquiring information on relative position changes between the virtual overlay layer and a physical-world reference point; identifying a transient visual drift event and extracting its features; acquiring head-pose fine-adjustment action features and gaze action features of the operator; determining a third temporal association between the head-pose fine-adjustment action and the transient visual drift event, and a fourth temporal association between the gaze action and the transient visual drift event; identifying, according to the third temporal association, whether the head-pose fine-adjustment action is a compensation action and whether it is a micro-operation instruction; identifying, according to the fourth temporal association, whether the gaze action is a compensation action and whether it is a micro-operation instruction; and adjusting the weights of the related interactive input information according to whether the head-pose fine-adjustment action and the gaze action are identified as compensation actions.
  5. The multimodal human-computer interaction method for an AI-glasses augmented reality environment according to claim 1, wherein adjusting the weight of each type of interactive input information when determining the operator's intent comprises: acquiring head-pose fine-adjustment action features and gaze action features of the operator; acquiring information on each currently present type of system deviation, the types comprising persistent spatial misalignment between the virtual overlay layer and the physical world caused by ambient temperature fluctuation, transient micro-displacement of a physical target caused by laser pulses, and transient visual drift of the virtual overlay layer caused by local electromagnetic interference or air flow; acquiring the type, occurrence time, intensity, and duration of each system deviation; performing temporal causal-association analysis between the head-pose fine-adjustment action features and each system deviation event to obtain a fifth temporal association between the head-pose fine-adjustment actions and the system deviations, and calculating, according to the fifth temporal association, a first matching score between each head-pose fine-adjustment action and each system deviation; performing temporal causal-association analysis between the gaze action features and each system deviation event to obtain a sixth temporal association between the gaze actions and the system deviations, and calculating, according to the sixth temporal association, a second matching score between each gaze action and each system deviation; performing, according to the first matching score, association recognition between the head-pose fine-adjustment action and each system deviation, identifying whether the head-pose fine-adjustment action is a compensation action and whether it is a micro-operation instruction, and adjusting the weight of the interactive input information related to the head-pose fine-adjustment action; and performing, according to the second matching score, association recognition between the gaze action and each system deviation, identifying whether the gaze action is a compensation action and whether it is a micro-operation instruction, and adjusting the weight of the interactive input information related to the gaze action.
  6. The method according to claim 5, wherein adjusting the weight of the interactive input information related to the head-pose fine-adjustment action comprises: acquiring the task priority of the precision physical operation currently being performed by the operator; acquiring the operator's cognitive load state during the current multi-task switching; determining, according to the task priority, the importance of the head-pose fine-adjustment action in the current task; evaluating, according to the cognitive load state, the operator's control precision over the head-pose fine-adjustment action; and adjusting the weight of the head-pose fine-adjustment action according to the importance and the control precision.
  7. The multimodal human-computer interaction method for an AI-glasses augmented reality environment according to claim 6, wherein acquiring the operator's cognitive load state during the current multi-task switching comprises: acquiring physiological signals of the operator, the physiological signals comprising heart rate variability data, eye movement data, and brain wave data; acquiring task identifiers of the tasks currently being performed by the operator; extracting, according to the task identifiers, cognitive load feature parameters for each task from a preset task feature database, the cognitive load feature parameters comprising task complexity, task urgency, and task interaction mode; performing a preliminary evaluation of the operator's cognitive load state according to the physiological signals and the cognitive load feature parameters to obtain a preliminary evaluation result; determining whether the operator is performing a task switch; and, when the operator is determined to be performing a task switch, adjusting the weight of the preliminary evaluation result according to the time at which the task switch occurs.
  8. The method according to claim 7, wherein acquiring physiological signals of the operator, the physiological signals comprising heart rate variability data, eye movement data, and brain wave data, comprises: continuously monitoring signal quality parameters of each physiological signal acquisition channel, the signal quality parameters comprising signal-to-noise ratio, baseline drift, and signal integrity; identifying a channel as abnormal when its signal quality parameter falls below a preset quality threshold; triggering real-time compensation processing of the physiological signal of the abnormal channel when a signal abnormality is identified; adjusting, according to the type and degree of the abnormality, the weight of the abnormal channel's physiological signal in the cognitive load evaluation; when multiple physiological signal acquisition channels are abnormal, preferentially using data from the remaining channels with better signal quality for the cognitive load evaluation; and when all physiological signal acquisition channels are abnormal and cannot be effectively compensated, sending the operator a prompt to replace or check the sensors and temporarily freezing the cognitive load evaluation result until signal quality returns to normal.
  9. The multimodal human-computer interaction method for an AI-glasses augmented reality environment according to claim 8, wherein triggering real-time compensation processing of the physiological signal of the abnormal channel when a signal abnormality is identified comprises: acquiring the physiological signal of the abnormal channel; performing spectrum analysis on the signal to obtain its spectral features; identifying, according to the spectral features, high-frequency interference components in the signal and analyzing their frequency characteristics; matching, according to the frequency characteristics of the high-frequency interference components, the type and operating mode of the interference source; adjusting the center frequency and bandwidth parameters of an adaptive filter according to the type and operating mode of the interference source; removing dynamically changing interference noise from the signal with the adjusted adaptive filter; comparing the waveform features of the signal after the interference noise has been removed to obtain a waveform-feature comparison result; and adjusting the amplitude or phase of the signal according to the comparison result.
  10. A multimodal human-computer interaction system for an AI-glasses augmented reality environment, comprising: an input end, configured to acquire interactive input information of an operator, the interactive input information comprising gaze position information, hand motion information, and voice command information, and to acquire parameters related to the operating environment and the own state of the AI glasses, the parameters comprising environment parameters and state information of a motion sensing component; an evaluation end, configured to perform reliability evaluation on the interactive input information based on the interactive input information, the environment parameters, the state information of the motion sensing component, and the work-phase information, to obtain a reliability evaluation result for each type of interactive input information; and an adjusting end, configured to adjust, according to the reliability evaluation results, the weight of each type of interactive input information when determining the operator's intent.
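The temporal-correlation test of claims 2 and 3, deciding whether a head or gaze micro-action is a reflexive compensation for a pulse-induced deviation or a deliberate micro-operation instruction, can be sketched as follows; the interval limit, physiological response window, and duration threshold are hypothetical placeholders, not values from the patent:

```python
# Sketch of the compensation-vs-instruction decision in claims 2-3.
# All threshold values below are illustrative assumptions.

def classify_action(action_onset, action_duration, pulse_time,
                    max_interval=0.4,          # max pulse-to-action latency (s)
                    physio_window=(0.1, 0.35), # plausible reflex latency (s)
                    duration_threshold=0.5):   # compensation actions are brief (s)
    """Label a head-pose fine adjustment or gaze action relative to a
    laser pulse timestamp: a short action that starts inside the
    physiological response window after the pulse is treated as a
    compensation action; anything else as a micro-operation instruction."""
    latency = action_onset - pulse_time
    within_interval = 0.0 <= latency <= max_interval
    in_physio_window = physio_window[0] <= latency <= physio_window[1]
    short_enough = action_duration < duration_threshold
    if within_interval and in_physio_window and short_enough:
        return "compensation"    # down-weight this input channel
    return "micro_operation"     # treat as a genuine instruction
```

A compensation label would then lower the weight of the corresponding input channel in the fusion step, while a micro-operation label leaves it intact.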
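Claim 9's adaptive filtering step, tuning the filter's center frequency to the identified interference source, can be approximated with a standard second-order IIR notch (RBJ audio-EQ cookbook form); the sampling rate, Q factor, and the 50 Hz mains-interference example are assumptions, not details from the patent:

```python
# Sketch of claim 9's adaptive interference removal using a fixed-coefficient
# biquad notch re-tuned to the identified interference frequency.
import math

def notch_coeffs(center_hz, fs, q=30.0):
    """Second-order IIR notch (RBJ audio-EQ cookbook). center_hz is matched
    to the interference source; bandwidth is roughly center_hz / q."""
    w0 = 2 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [1 / a0, -2 * cos_w0 / a0, 1 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a

def filter_signal(x, b, a):
    """Direct-form-I difference equation:
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    y = []
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y.append(acc)
    return y

# Example: strip 50 Hz mains interference from a slow physiological waveform.
fs = 500.0
t = [n / fs for n in range(1000)]
clean = [math.sin(2 * math.pi * 4 * ti) for ti in t]
noisy = [c + 0.5 * math.sin(2 * math.pi * 50 * ti) for c, ti in zip(clean, t)]
b, a = notch_coeffs(50.0, fs)
filtered = filter_signal(noisy, b, a)
```

Re-running `notch_coeffs` with a new center frequency whenever the interference source changes gives the "adaptive" behavior the claim describes, without a full LMS-style adaptive filter.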

Description

Multimode human-computer interaction method and system for AI glasses augmented reality environment

Technical Field

The invention relates to the field of human-computer interaction, and in particular to a multimodal human-computer interaction method and system for an AI-glasses augmented reality environment.

Background

AI glasses realize multimodal human-computer interaction in an augmented reality environment by integrating eye tracking, gesture recognition, and voice semantic understanding to establish three-dimensional spatial interaction correspondences, enabling natural operation of an augmented reality interface. The technology has significant application value in scenarios requiring high-precision operation, such as industrial manufacturing and medical surgery, and effectively overcomes the shortcomings of traditional single-point interaction devices in information transmission efficiency and operational complexity. However, in practical long-term, high-intensity use, the performance of AI glasses can be subtly affected by environmental factors, reducing interaction accuracy and reliability. For example, in an ultra-clean semiconductor fab, technicians wear AI glasses to perform optical alignment calibration of high-precision lithography machines. Under ideal conditions, the AI glasses accurately superimpose a virtual alignment crosshair on the graduated scale and enable efficient, precise adjustment through voice commands and gesture operations. During long operation, however, small temperature fluctuations in the fab's air conditioning system can produce small changes in the local ambient temperature around the lithography machine, which in turn cause extremely slight thermal expansion and contraction of its physical components.
Such constant, subtle ambient temperature changes, together with the technician's natural head-pose fine adjustments, cause the motion sensing components inside the AI glasses to accumulate small deviations, creating a spatial misalignment in the augmented reality interface, imperceptible to the naked eye, between the virtual alignment crosshair and the actual alignment marks on the physical optical elements. This virtual-physical misalignment skews the judgment of the user's intent by the eye tracking and gesture recognition systems of the AI glasses: even when the technician's eyes are accurately fixed on the physical calibration points, the system may misjudge the intent because of its erroneous knowledge of the virtual target's location, and feedback on gesture operations may become delayed and inaccurate. Faced with such subtle visual deviations and interaction uncertainty, the technician naturally attempts to "correct" the virtual overlay by slightly adjusting head pose or gaze angle to realign it with the physical target. These compensatory head movements, however, further interfere with the head-pose tracking system of the AI glasses, making it difficult for the system to distinguish the user's real intent; it sometimes misjudges these fine head adjustments as instructions to pan or zoom the interface, causing the virtual interface to jump or flicker unexpectedly, further distracting the technician and exacerbating the operational confusion. Under these complex conditions, in which the spatial correspondence carries fine deviations and the various interaction inputs become ambiguous, the mechanism that integrates the interaction modalities of the AI glasses runs into a hard problem.
The system cannot effectively and consistently judge inputs carrying uncertainty, nor determine which input source is currently more reliable, and therefore cannot accurately understand the technician's real intent. As a result, the system may fail to execute instructions, or may erroneously perform operations the user did not intend, severely threatening the success of the optical alignment task in semiconductor manufacturing. In view of the above, improvements are needed in the art.

Disclosure of Invention

The invention provides a multimodal human-computer interaction method and system for an AI-glasses augmented reality environment, aiming to solve the problems that, in long-term, high-intensity use of AI glasses, interaction accuracy and reliability are reduced by environmental factors and user compensation behaviors, and the system has difficulty accurately distinguishing the user's real intent. The technical scheme of the application is as follows: in a first aspect, the application discloses a multimodal human-computer interaction method for an AI-glasses augmented reality environment.