CN-122002086-A - Audiovisual language audience reaction evaluation system and method based on multi-view synchronous perception and data fusion

CN 122002086 A

Abstract

The invention relates to an audiovisual language audience reaction evaluation system and method based on multi-view synchronous perception and data fusion. The system comprises an adjustable-angle viewing platform, an audiovisual content playing module, a synchronous perception acquisition module, a subjective feedback acquisition module, a multi-modal data fusion analysis module and an evaluation result output module. Eye movement data, physiological signals and behavioral responses of an audience can be synchronously acquired in real time at different viewing angles, and the audience's perception intensity, attention concentration, cognitive load and emotional valence are comprehensively evaluated in combination with subjective feedback information. The system can be widely applied to the fields of audiovisual language content optimization, immersive media design and audience experience research, and has the advantages of flexible viewing-angle switching, a high degree of data fusion and high analysis precision.

Inventors

  • HAN YANAN
  • ZHANG YAN
  • ZHANG LINGLIN

Assignees

  • Liaoning University (辽宁大学)

Dates

Publication Date
2026-05-08
Application Date
2026-02-27

Claims (9)

  1. An audiovisual language audience reaction evaluation system based on multi-view synchronous perception and data fusion, characterized by comprising an information input part, an information acquisition part, an analysis and evaluation part and an optimization suggestion part; the information input part comprises an adjustable viewing platform and an audiovisual content playing module; the information acquisition part comprises a synchronous perception acquisition module and a subjective feedback acquisition module, wherein the synchronous perception acquisition module consists of an eye movement tracking sub-module, a synchronization mark recorder and a physiological signal acquisition sub-module; the analysis and evaluation part comprises a multi-modal data fusion and analysis module and an evaluation result output module; and the optimization suggestion part comprises an adaptive audiovisual language optimization module.
  2. The audiovisual language audience reaction evaluation system based on multi-view synchronous perception and data fusion according to claim 1, wherein the adjustable viewing platform comprises an angle-adjustable viewing-angle adjusting bracket, an inclination angle sensor module and a tilting screen mounting interface, and is used for switching angles through electric drive or manual control to simulate the various viewing angles and viewing environments of real scenes; the audiovisual content playing module comprises a content management unit, a synchronization control unit and a playing control interface, and is used for presenting standardized or experimental audiovisual language materials, wherein the content comprises controllable picture composition, editing structure and sound design parameters, and supports the importing, editing, labeling and timed playback of video clips.
  3. The audiovisual language audience reaction evaluation system based on multi-view synchronous perception and data fusion according to claim 1, wherein the synchronous perception acquisition module is used for collecting physiological, behavioral and cognitive data of an audience at different viewing angles in real time; the eye movement tracking sub-module is based on an infrared or video eye tracker and records gaze point, fixation duration, saccade path and pupil diameter indexes in real time, and comprises an infrared light source component, a high-speed camera and a real-time coordinate calculation unit; the physiological signal acquisition sub-module is used for monitoring physiological changes of the audience during viewing and comprises a heart rate sensor, a skin conductance sensor and an electroencephalogram (EEG) acquisition interface.
  4. The audiovisual language audience reaction evaluation system based on multi-view synchronous perception and data fusion according to claim 1, wherein the subjective feedback acquisition module is used for acquiring the audience's subjective evaluation of the audiovisual content and comprises a questionnaire evaluation interface, an input interface and a data preprocessing unit; the evaluation dimensions comprise visual clarity, plot understanding, rhythm comfort, immersion experience and emotional touch indexes, and timestamps are recorded automatically.
  5. The audiovisual language audience reaction evaluation system based on multi-view synchronous perception and data fusion according to claim 1, wherein the multi-modal data fusion and analysis module is used for synchronizing and standardizing the collected multi-source data, constructing a unified time axis, extracting core feature indexes, and performing statistical modeling and association analysis on the data; it supports cross-analysis of eye movement data, physiological signals, behavioral features and subjective scores, establishes mapping relations between different viewing angles and audience perception responses, and comprises a multi-modal data integration engine, a feature extraction unit, a model analysis unit and a visual display module.
  6. The audiovisual language audience reaction evaluation system based on multi-view synchronous perception and data fusion according to claim 1, wherein the evaluation result output module converts the fusion analysis results into visual charts and quantitative reports based on a preset evaluation index system; the output content comprises an attention heat map, a cognitive load curve, an emotional state distribution chart and a comprehension score chart, and can be exported in PDF, CSV or JSON format; the evaluation result output module comprises a content comprehension scoring unit, an emotional reaction evaluation unit and an output interface.
  7. The audiovisual language audience reaction evaluation system based on multi-view synchronous perception and data fusion according to claim 1, wherein the adaptive audiovisual language optimization module is used for providing adjustment suggestions for picture composition, editing density, rhythm design and sound layout on the basis of the evaluation results and the performance differences of audiences at different angles, supports automatic content adjustment and recommendation, and comprises a composition optimization suggestion unit, an editing rhythm suggestion unit and a sound-picture synchronization adjustment module.
  8. An evaluation method using the audiovisual language audience reaction evaluation system based on multi-view synchronous perception and data fusion according to any one of claims 1-7, characterized by comprising the following steps: step 1) setting experimental conditions, configuring the required viewing angle, and adjusting the viewing platform to the corresponding angle; step 2) loading the audiovisual content to be evaluated into the audiovisual content playing module, starting playback, and simultaneously activating the synchronous perception acquisition module; step 3) during playback of the audiovisual content, the eye movement tracking sub-module records the audience's gaze behavior in real time, and the physiological acquisition sub-module continuously acquires heart rate, skin conductance and EEG indexes; step 4) after the content has been played, the audience completes subjective evaluation through the subjective feedback acquisition module, and the system numbers, aligns and preprocesses all data; step 5) the multi-modal data fusion and analysis module performs feature extraction, time series analysis and statistical modeling on the various data and outputs key indexes, wherein the key indexes comprise the average fixation concentration rate, the maximum cognitive load peak, the heart rate variation interval, the skin conductance level average, the alpha/beta wave frequency distribution, the subjective comprehension score and the emotional valence score; step 6) analyzing the differences in the audience's physiological and cognitive reactions at different viewing angles to construct a viewing angle-perception relation model; and step 7) the evaluation result output module outputs a complete audience response evaluation report, comprising visual charts, feature curves and decision support suggestions, and pushes them to researchers or a content production system through a platform.
  9. The method according to claim 8, wherein the method supports support vector machine (SVM), convolutional neural network (CNN), time-series clustering, principal component analysis (PCA) and random forest machine learning methods for the classification, prediction or association modeling of audience response data under viewing-angle variables.
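The eye movement indexes named in claim 3 (gaze point, fixation duration, saccade path, pupil diameter) can be illustrated with a minimal sketch. The code below is not the patented implementation: it assumes time-ordered gaze samples of the invented form (time in seconds, x and y in pixels, pupil diameter in mm) and summarizes them into a scan-path length, total duration and mean pupil diameter using only the Python standard library.

```python
import math

def gaze_metrics(samples):
    """Summarize time-ordered gaze samples of the form (t_sec, x_px, y_px, pupil_mm)."""
    # Scan-path length: sum of Euclidean distances between consecutive gaze points.
    path = sum(math.dist(samples[i][1:3], samples[i + 1][1:3])
               for i in range(len(samples) - 1))
    duration = samples[-1][0] - samples[0][0]
    mean_pupil = sum(s[3] for s in samples) / len(samples)
    return {"scanpath_px": path, "duration_s": duration, "mean_pupil_mm": mean_pupil}

# Invented demo samples: a small drift, then a 100 px saccade to the right.
demo = [(0.00, 100, 100, 3.0),
        (0.02, 103, 104, 3.1),
        (0.04, 203, 104, 3.2)]
m = gaze_metrics(demo)
```

A real eye tracker would additionally segment the stream into fixations and saccades (e.g. by a dispersion or velocity threshold) before computing fixation durations; the summary above only shows the aggregation principle.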
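The unified time axis described in claim 5 (and applied in step 5 of claim 8) can be sketched in plain Python. This is an illustrative sketch, not the patented implementation: it assumes each sensor stream is a time-ordered list of (time, value) pairs sampled at its own rate, and resamples all streams onto one shared axis by linear interpolation. The stream names and values are invented for the example.

```python
def interp(stream, t):
    """Linearly interpolate a time-ordered (time, value) stream at time t (clamped at the ends)."""
    if t <= stream[0][0]:
        return stream[0][1]
    if t >= stream[-1][0]:
        return stream[-1][1]
    for (t0, v0), (t1, v1) in zip(stream, stream[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def fuse_streams(streams, t_start, t_end, dt):
    """Resample named sensor streams onto one shared time axis."""
    n = int(round((t_end - t_start) / dt)) + 1
    return [{"t": t_start + i * dt,
             **{name: interp(s, t_start + i * dt) for name, s in streams.items()}}
            for i in range(n)]

# Invented demo streams sampled at different rates.
hr  = [(0.0, 70.0), (1.0, 72.0), (2.0, 71.0)]   # heart rate, bpm
eda = [(0.0, 0.40), (2.0, 0.60)]                # skin conductance, arbitrary units
rows = fuse_streams({"hr": hr, "eda": eda}, 0.0, 2.0, 0.5)
```

Once every modality shares the same time axis, the cross-analysis described in claim 5 (correlating eye movement, physiological and subjective data) reduces to operations over aligned rows.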
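Claim 6 states that evaluation results can be exported in PDF, CSV or JSON format. Below is a minimal sketch of the CSV and JSON paths using only the Python standard library; the report field names and values are invented placeholders, not outputs of the patented system (PDF export would require a third-party library and is omitted).

```python
import csv, io, json

# Invented placeholder report; field names are illustrative only.
report = {
    "content_id": "clip-07",
    "view_angle_deg": 30,
    "comprehension_score": 4.2,
    "emotion_valence": 0.35,
    "cognitive_load_peak": 0.81,
}

# JSON export.
json_out = json.dumps(report, indent=2)

# CSV export (header row plus one data row).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(report))
writer.writeheader()
writer.writerow(report)
csv_out = buf.getvalue()
```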
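Step 6 of claim 8 constructs a "viewing angle-perception relation model". One minimal way to illustrate such a model is an ordinary least-squares line relating viewing angle to a mean comprehension score; the angles and scores below are invented toy data, not results from the patent, and a real model would likely be nonlinear and multivariate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Invented toy data: mean comprehension score falling as the viewing angle grows.
angles = [0, 15, 30, 45, 60]          # viewing angle, degrees
scores = [4.8, 4.6, 4.1, 3.5, 3.0]    # mean comprehension score (1-5 scale, assumed)
a, b = fit_line(angles, scores)

def predict_score(angle):
    """Predicted comprehension score at a given viewing angle."""
    return a + b * angle
```

The fitted slope being negative would express, in this toy example, that comprehension degrades as the viewing angle departs from frontal.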
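Claim 9 names SVM, CNN, time-series clustering, PCA and random forests, which in practice would come from a machine learning library. As a dependency-free stand-in (explicitly not one of the named methods), the sketch below classifies audience-response feature vectors with a minimal nearest-centroid rule; the feature names, values and class labels are all invented for illustration.

```python
import math
from collections import defaultdict

def fit_centroids(X, y):
    """Compute per-class mean feature vectors (the entire 'model')."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = list(x)
        else:
            sums[label] = [a + b for a, b in zip(sums[label], x)]
        counts[label] += 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda c: math.dist(centroids[c], x))

# Invented toy features: (fixation rate, heart-rate delta, valence score).
X = [(0.9, 2.0, 0.7), (0.8, 1.5, 0.6), (0.3, 8.0, -0.2), (0.2, 9.0, -0.4)]
y = ["engaged", "engaged", "overloaded", "overloaded"]
model = fit_centroids(X, y)
label = predict(model, (0.85, 1.8, 0.65))
```

The same fit/predict interface shape is what an SVM or random forest from a library such as scikit-learn would expose, which is why the claim can treat the classifier as interchangeable.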

Description

Audiovisual language audience reaction evaluation system and method based on multi-view synchronous perception and data fusion

Technical Field

The invention relates to the intersecting technical fields of audiovisual communication, human-computer interaction, experimental psychology, physiological signal detection and multi-modal data fusion analysis, and in particular to an audiovisual language audience reaction evaluation system and method based on multi-view synchronous perception and data fusion. The system is widely applicable to application scenarios such as audience research on audiovisual content, user experience optimization, feedback for media content creation, virtual reality environment evaluation and adaptation of intelligent display devices, and belongs to a comprehensive application system of multi-view media perception and effect measurement technology.

Background

With the diversification of media environments and terminal devices, the viewing modes of audiovisual content are changing significantly. The traditional frontal horizontal viewing scene centered on televisions or computers is gradually being supplemented by oblique or non-standard viewing angles, such as viewing on vehicle-mounted entertainment systems, subway advertising screens, or mobile devices while lying down. In this context, how different viewing angles influence cognitive and psychological mechanisms such as audience attention, perception, understanding and emotional response has become a research hotspot in the fields of communication studies, psychology and interaction design.

Currently, many studies on audience reactions focus on a single viewing angle or an idealized viewing environment to explore the audience's acceptance of certain information, such as social media marketing information, accessible information or interactive films, and the methods employed typically include subjective questionnaires, interviews and basic behavioral observation. Although these methods reflect the audience's subjective feelings to some extent, they have the following disadvantages: (1) Prior studies lack systematic measurement and comparison of audience reactions under different physical viewing angles such as horizontal and inclined, making it difficult to faithfully reproduce diversified viewing environments. (2) The dimensions of perception data acquisition are limited: subjective evaluation dominates, and support from objective data such as eye movement tracking, physiological signal monitoring and facial expression recognition is lacking, so the evaluation results are highly subjective and of insufficient credibility. (3) The data acquisition and analysis workflows are fragmented: even where some studies introduce multiple sensor devices, multi-modal data fusion analysis on a unified time axis is often missing, so the analysis results are scattered and it is difficult to form a complete audience response model. (4) The prior art has difficulty feeding audience response data directly back to audiovisual content creators to guide the optimization of elements such as composition, editing and sound, and lacks a practical, application-oriented deployment path. In addition, for novel media forms such as virtual reality and immersive interactive video, no mature system supports quantitative research and optimization suggestions for "non-standard viewing-angle experience".

Therefore, a systematic solution is needed that covers multiple viewing angles, synchronously collects multi-modal physiological and behavioral data, integrates subjective and objective evaluation, and effectively outputs evaluation results, so as to meet the dual requirements of deep understanding of audience response and content optimization support in the new media environment.

Disclosure of the Invention

In order to solve the above technical problems, the invention provides an audiovisual language audience reaction evaluation system and method based on multi-view synchronous perception and data fusion. The invention is realized by the following technical scheme: the audiovisual language audience reaction evaluation system based on multi-view synchronous perception and data fusion comprises an information input part, an information acquisition part, an analysis and evaluation part and an optimization suggestion part; the information input part comprises an adjustable viewing platform and an audiovisual content playing module; the information acquisition part comprises a synchronous perception acquisition module and a subjective feedback acquisition module, wherein the synchronous perception acquisition module consists of an eye movement tracking sub-module, a synchronization mark recorder and a physiological signal acquisition sub-module; the analysis and evaluation part comprises a multi-modal data fusion and analysis module and an evaluation result output module; and the optimization suggestion part comprises an adaptive audiovisual language optimization module.