
CN-121259052-B - Interactive spherical screen film viewing system and method based on AI multi-mode data processing

CN121259052B

Abstract

The invention discloses an interactive spherical screen film viewing system and method based on AI multi-modal data processing, relating to the technical field of data processing. The system comprises: a background motion isolation module, which analyzes and processes the spherical screen video stream signal through an optical flow estimation process; an interaction window period dynamic adjustment module for dynamically adjusting the response-state time window of the system; a modal source tracking module for dynamically evaluating the credibility of each modal signal source; a spherical screen change trend prediction module, which fuses a convolutional neural network (CNN) and a long short-term memory network (LSTM) to predict the future disturbance trend of the spherical screen image background; a modal anti-interference filtering module, which generates anti-interference samples based on a diffusion-based adversarial generative model training scheme; and a signal fusion module, which integrates the processing results from the other modules.

Inventors

  • LI FUMIN

Assignees

  • 深圳市欻与无影文化产业发展有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-10-09

Claims (10)

  1. An interactive spherical screen viewing system based on AI multi-modal data processing, comprising: a background motion isolation module, which analyzes and processes the spherical screen video stream signal through an optical flow estimation process, extracts the spatio-temporal variation characteristics of the current frame image, and calculates a background disturbance confidence coefficient in combination with a background modeling process; an interaction window period dynamic adjustment module for dynamically adjusting the response-state time window of the system according to the temporal co-occurrence of the user's limb action input signal and voice input signal; a modal source tracking module, based on a Transformer neural network architecture with a multi-head attention mechanism, which performs feature-vector correlation modeling on modal signals from different input channels and dynamically evaluates the credibility of each modal signal source; a spherical screen change trend prediction module, which fuses a convolutional neural network (CNN) and a long short-term memory network (LSTM) to predict the future disturbance trend of the spherical screen image background; a modal anti-interference filtering module, which generates anti-interference samples based on a diffusion-based adversarial generative model training scheme and optimizes the robustness parameters of a filter function through a neural network; and a signal fusion module, which integrates the processing results from the above modules and comprehensively evaluates the background disturbance intensity, the reliability of the modal signal sources, and the state of the interaction window.
  2. The interactive spherical screen viewing system based on AI multi-modal data processing as set forth in claim 1, wherein the background motion isolation module operates as follows: the optical flow estimation process extracts the motion vector information of each pixel in the current image frame; the background modeling process calculates a motion-vector difference against the predicted expected position of each pixel, obtains the disturbance intensity, and from it generates a background disturbance confidence value; this confidence value serves as one of the bases on which the signal fusion module judges the interference intensity.
  3. The interactive spherical screen viewing system based on AI multi-modal data processing as set forth in claim 1, wherein the interaction window period dynamic adjustment module works as follows: collecting the temporal characteristics of the motion and voice input signals over a number of sampling periods, and judging whether the conditions for entering an interaction window are met by comparison with a motion probability threshold alpha and a voice energy threshold beta; if the conditions are satisfied, the system remains in the interactive response state until an end-of-action signal is detected.
  4. The interactive spherical screen viewing system based on AI multi-modal data processing of claim 1, wherein the modal source tracking module works as follows: feature-vector association modeling is carried out on modal signals of different moments and different sources, and signal-source reliability scores are output based on the attention weights; the scores serve as weighting references for the signal fusion module to optimize the final response output.
  5. The interactive spherical screen viewing system based on AI multi-modal data processing of claim 1, wherein the spherical screen change trend prediction module works as follows: extracting the features of the past n image frames in a sliding-window manner and combining them by fusing a convolutional neural network and a long short-term memory network; the convolutional neural network extracts the spatial features of each single frame; the long short-term memory network models the temporal variation pattern of the image sequence; finally, the prediction output is mapped to a future disturbance probability value, which is used to judge whether the system enters an interference-shielding state.
  6. The interactive spherical screen viewing system based on AI multi-modal data processing of claim 1, wherein the modal anti-interference filtering module works as follows: the acquired user input modal signals are fed into a filter realized by a neural network, and an interference sample set generated by a diffusion generative model is introduced; the system trains the parameters of the filter function by minimizing the weighted sum of the normal-sample loss and the interference-sample loss.
  7. The interactive spherical screen viewing system based on AI multi-modal data processing as set forth in claim 6, wherein the signal fusion module operates as follows: taking the interaction state as a gating variable, it comprehensively evaluates the background disturbance intensity, the reliability of the modal signal sources, the signal-to-noise ratio of the filter function, and the disturbance prediction probability; if these parameters indicate that the background is stable, the signal reliability is high, and the signal-to-noise ratio is good, the module performs a weighted summation of the modal signals to generate a fusion expression value, on the basis of which it is judged whether a response instruction is sent.
  8. The interactive spherical screen viewing system based on AI multi-modal data processing of claim 7, wherein the system executes the corresponding modal response instruction only if the fusion expression value exceeds a system-set threshold; if the threshold is not reached, the system maintains a non-interactive silent state.
  9. The interactive spherical screen viewing system based on AI multi-modal data processing as set forth in claim 1, further comprising a modal awareness module including an array of motion sensors, a directional speech receiving array, and a gaze tracking unit, aligned by a time synchronization mechanism for synchronously collecting the user's motion input signals, speech input signals, and visual attention information.
  10. An interactive spherical screen viewing method for an interactive spherical screen viewing system based on AI multi-modal data processing according to any one of claims 1-9, comprising the steps of: acquiring the user's action input signals, voice input signals and visual attention information, with synchronous acquisition realized by the motion sensor array, directional voice receiving array and gaze tracking unit in the modal sensing module; performing optical flow estimation and background modeling on the spherical screen video stream through the background motion isolation module, extracting the spatio-temporal variation characteristics of the current image and calculating the background disturbance confidence coefficient; dynamically adjusting the response time window of the interactive window through the interaction window period dynamic adjustment module according to the temporal characteristics of the user's action input signals and voice input signals; carrying out feature-vector association modeling on input signals of different modalities by means of the modal source tracking module, dynamically evaluating the credibility of each modal signal source and providing a basis for signal fusion; predicting the future disturbance trend of the spherical screen image background with the spherical screen change trend prediction module through the convolutional neural network and long short-term memory network, and outputting a future disturbance probability value; generating an anti-interference sample set through the modal anti-interference filtering module, optimizing the robustness parameters of the filter by means of a neural network, and filtering interference signals; and integrating the processing results of the above modules through the signal fusion module, comprehensively evaluating the background disturbance intensity, the reliability of the modal signal sources, the interaction window state, the signal-to-noise ratio and the disturbance prediction probability, and generating a fusion expression value.
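The gating and thresholding logic described in claims 7 and 8 can be sketched as follows. This is a minimal illustration of state-gated weighted-sum fusion only, not the patented implementation; all function names, threshold values, and weights below are hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass
class FusionInputs:
    interaction_active: bool   # gating variable from the window module (claim 7)
    disturbance_conf: float    # background disturbance confidence, 0..1
    source_reliability: float  # modal source credibility score, 0..1
    snr_db: float              # filter-output signal-to-noise ratio, in dB
    disturbance_prob: float    # predicted future disturbance probability, 0..1

def fuse(inputs, modal_signals, weights, respond_threshold=0.6,
         max_disturbance=0.3, min_reliability=0.7, min_snr_db=10.0):
    """Return a fusion expression value, or None to stay silent (claim 8)."""
    if not inputs.interaction_active:
        return None  # outside the interaction window: non-interactive silence
    # Background must be stable, sources reliable, and the SNR good (claim 7).
    stable = (inputs.disturbance_conf <= max_disturbance
              and inputs.disturbance_prob <= max_disturbance
              and inputs.source_reliability >= min_reliability
              and inputs.snr_db >= min_snr_db)
    if not stable:
        return None
    # Weighted summation of the modal signals into one fusion expression value.
    fusion_value = sum(w * s for w, s in zip(weights, modal_signals))
    # Respond only when the value exceeds the system-set threshold (claim 8).
    return fusion_value if fusion_value >= respond_threshold else None
```

With equal weights over two modal signals of 0.8 and 0.7, a stable background and good SNR yield a fusion value of 0.75, which clears the illustrative 0.6 response threshold; flipping the gating variable to False suppresses the response regardless of signal quality.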

Description

Interactive spherical screen film viewing system and method based on AI multi-mode data processing

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to an interactive spherical screen film viewing system and method based on AI multi-modal data processing.

Background

The spherical screen film viewing system is an advanced immersive multimedia display platform, commonly deployed in environments such as large cinemas, museums and educational venues. By displaying content on an omnidirectional spherical screen, it provides a 360-degree, blind-spot-free visual experience, further enhancing the immersion and engagement of the audience. In a dome environment, however, the high degree of overlap between the projected content (e.g., explosions, transitions, flashes) and the user's real motion/voice interactions often causes confusion at two levels: first, modal-drift false triggering, in which the system mistakes projected visual variation for a user behavior signal and switches modes (e.g., from gaze perception to voice input); second, misjudged triggering during real interaction when the system faces inconsistent multi-modal input (e.g., the user's voice command 'fast forward' combined with an ambiguous gesture).

Disclosure of Invention

The invention aims to overcome the defects of the prior art, and provides an interactive spherical screen viewing system and method based on AI multi-modal data processing that solve the technical problems identified in the background art.
Aiming at the above technical problems, the invention provides an interactive spherical screen viewing system based on AI multi-modal data processing, comprising: a background motion isolation module, which analyzes and processes the spherical screen video stream signal through an optical flow estimation process, extracts the spatio-temporal variation characteristics of the current frame image, and calculates a background disturbance confidence coefficient in combination with a background modeling process; an interaction window period dynamic adjustment module for dynamically adjusting the response-state time window of the system according to the temporal co-occurrence of the user's limb action input signal and voice input signal; a modal source tracking module, based on a Transformer neural network architecture with a multi-head attention mechanism, which performs feature-vector correlation modeling on modal signals from different input channels and dynamically evaluates the credibility of each modal signal source; a spherical screen change trend prediction module, which fuses a convolutional neural network (CNN) and a long short-term memory network (LSTM) to predict the future disturbance trend of the spherical screen image background; a modal anti-interference filtering module, which generates anti-interference samples based on a diffusion-based adversarial generative model training scheme and optimizes the robustness parameters of a filter function through a neural network; and a signal fusion module, which integrates the processing results from the above modules and comprehensively evaluates the background disturbance intensity, the reliability of the modal signal sources, and the state of the interaction window.
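The background motion isolation step, which compares observed per-pixel motion vectors against those expected from the background model and turns the deviation into a disturbance confidence, can be sketched as follows. This assumes the optical-flow fields are computed elsewhere (e.g., by a dense optical-flow routine); the deviation-to-confidence mapping shown is an illustrative choice, not the patent's formula, and the function name is hypothetical.

```python
import numpy as np

def background_disturbance_confidence(flow, predicted_flow, k=4.0):
    """Map motion-vector deviation to a background disturbance confidence.

    flow, predicted_flow: (H, W, 2) per-pixel motion vectors -- the observed
    optical-flow field and the one expected from the background model.
    Returns a value in [0, 1): 0 when the frame matches the background model
    exactly, approaching 1 as the disturbance intensity grows.
    """
    # Motion-vector difference against the background model's expectation.
    deviation = np.linalg.norm(flow - predicted_flow, axis=-1)
    intensity = float(deviation.mean())  # disturbance intensity of the frame
    # Illustrative squashing: 0 deviation -> 0 confidence, large -> near 1.
    return float(1.0 - np.exp(-k * intensity))
```

A frame whose flow field matches the background prediction yields confidence 0, so the signal fusion module would treat the scene as stable; a uniformly shifted field drives the confidence toward 1, flagging likely projection-induced disturbance.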
The invention also provides an interactive spherical screen film viewing method, comprising the following steps: collecting the user's action input signals, voice input signals and visual attention information, with synchronous collection realized by the motion sensor array, directional voice receiving array and gaze tracking unit in the modal sensing module; performing optical flow estimation and background modeling on the spherical screen video stream through the background motion isolation module, extracting the spatio-temporal variation characteristics of the current image and calculating the background disturbance confidence coefficient; dynamically adjusting the response time window of the interactive window through the interaction window period dynamic adjustment module according to the temporal characteristics of the user's action input signals and voice input signals; carrying out feature-vector association modeling on input signals of different modalities by means of the modal source tracking module, dynamically evaluating the credibility of each modal signal source and providing a basis for signal fusion; predicting the future disturbance trend of the spherical screen image background with the spherical screen change trend prediction module through the convolutional neural network and long short-term memory network, and outputting a future disturbance probability value; generating an anti-interference sample set through the modal anti-interference filtering module, optimizing the robustness parameters of the filter by means of a neural network, and filtering interference signals.
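The window-adjustment step, in which the system enters an interactive response state only when motion probability and voice energy both clear their thresholds and then holds that state until an end-of-action signal, behaves like a small state machine. The sketch below is illustrative only: the threshold values for alpha and beta and the function names are assumptions, not values from the patent.

```python
# Two states of the interaction window (claim-3-style logic, simplified).
IDLE, RESPONDING = "idle", "responding"

def step(state, action_prob, voice_energy, action_ended,
         alpha=0.7, beta=0.5):
    """Advance the interaction-window state by one sampling period.

    alpha: motion probability threshold; beta: voice energy threshold.
    Both values here are illustrative placeholders.
    """
    if state == IDLE:
        # Enter the response window only when both modalities clear their
        # thresholds within the same sampling period (temporal co-occurrence).
        if action_prob >= alpha and voice_energy >= beta:
            return RESPONDING
        return IDLE
    # Once responding, remain in the interactive response state until an
    # end-of-action signal is detected.
    return IDLE if action_ended else RESPONDING
```

Note that a high voice energy alone does not open the window: both modalities must agree, which is what suppresses projection-audio false triggers in the dome environment.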