CN-121981714-A - Public opinion monitoring method and system based on multi-modal data fusion

CN 121981714 A

Abstract

The invention discloses a public opinion monitoring method and system based on multi-modal data fusion, intended to improve public opinion risk management and emergency-response capability in complex public opinion environments. The method comprises the steps of: collecting multi-modal data; performing edge preprocessing on the collected multi-modal data to generate structured input data; performing feature extraction and projection on the multi-modal data to generate a unified event embedding; identifying the event type based on a public opinion classification model; assessing event risk in combination with set parameters; constructing a public opinion knowledge graph; inputting prompt words into a pre-trained language model to generate a response strategy; triggering early warnings based on hierarchical risk thresholds; and performing reinforcement-learning fine-tuning and strategy optimization on the pre-trained language model based on event feedback and response results.

Inventors

  • Liu Xiaodong
  • Ye Kewen
  • Zhang Boxi
  • Cao Wei
  • Li Minhao
  • Liu Mingsha
  • Jin Zujie
  • Wu Zhuang

Assignees

  • China Eastern Air Holding Company Limited (中国东方航空集团有限公司)
  • China Eastern Airlines Technology Application R&D Center Co., Ltd. (东航技术应用研发中心有限公司)

Dates

Publication Date
2026-05-05
Application Date
2026-01-22

Claims (10)

  1. A public opinion monitoring method based on multi-modal data fusion, characterized by comprising the following steps: preprocessing the acquired multi-modal data to generate structured input data; performing feature extraction and projection on the multi-modal data to generate a unified event embedding; identifying the event type based on a public opinion classification model, and assessing event risk in combination with set parameters; constructing a public opinion knowledge graph, and generating a response strategy by inputting prompt words into a pre-trained language model; and triggering early warnings based on hierarchical risk thresholds, and performing reinforcement-learning fine-tuning and strategy optimization on the pre-trained language model based on event feedback and response results.
  2. The public opinion monitoring method based on multi-modal data fusion according to claim 1, wherein the preprocessing in the first step comprises format unification, speech-to-text conversion, image text extraction, and noise removal.
  3. The public opinion monitoring method based on multi-modal data fusion according to claim 1, wherein the feature extraction and projection in the second step are achieved with a cross-modal semantic alignment model based on the Transformer architecture; the cross-modal semantic alignment model encodes images with the BLIP-2 image-text multi-modal understanding and generation model, extracts audio semantics with wav2vec 2.0 self-supervised speech representation learning and automatic speech recognition, models video behavior with the VideoBERT video-text joint representation learning model, and performs attention-cascade projection through a shared semantic space.
  4. The public opinion monitoring method based on multi-modal data fusion according to claim 1, wherein the public opinion classification model in the third step is based on a TextCNN-BERT model; the set parameters include a public opinion emotion index, a propagation density index, and a user influence index; and the risk assessment computes risk scores and classifies risk grades with a gradient-boosted decision tree algorithm.
  5. The public opinion monitoring method based on multi-modal data fusion according to claim 4, wherein a risk scoring function is used in the risk assessment: R = w_E · S_E + w_P · S_P + w_I · S_I + w_B · S_B, where · denotes the product and the risk score R is used to classify the risk grade; w_E is the weight coefficient of the public opinion emotion-intensity score S_E, representing the influence of emotional factors in the public opinion event on the overall risk assessment result; w_P is the weight coefficient of the public opinion propagation-intensity score S_P, representing the influence of propagation range and propagation speed on the overall risk assessment result; w_I is the weight coefficient of the public opinion subject-influence score S_I, representing the influence of the key subjects participating in propagation on the overall risk assessment result; w_B is the weight coefficient of the public opinion burstiness score S_B, representing the influence of the event's rapid heating or abnormal short-term growth on the overall risk assessment result; S_E reflects the emotional polarity and intensity contained in the text, speech, or video content related to the event; S_P reflects the propagation density, diffusion range, and propagation speed of the public opinion information per unit time; S_I reflects the influence of key users or subjects participating in propagation within the network; and S_B reflects the degree of abnormal short-term growth in the event's attention or propagation volume.
  6. The public opinion monitoring method based on multi-modal data fusion according to claim 1, wherein the prompt word in the fourth step is constructed from an event triplet and scene tags; the event triplet is a structured semantic representation in the form <subject, behavior, object>, automatically extracted from the structured input data generated in the first step in combination with the event-type identification results output in the third step; the subject includes, but is not limited to, an airline, a flight number, an airport, or a related organizational entity; the behavior includes event actions such as delay, cancellation, complaint, conflict, and disposal; the object includes passengers, facilities, flight services, or specific public opinion targets; the scene tags are derived from the outputs of the third step and include event-type labels and risk-grade labels; the pre-trained language model is a LLaMA-3 or ChatGLM language model; and the response strategy comprises multi-version response drafts, including a pacifying text, an explanatory note, and a suggested plan.
  7. The public opinion monitoring method based on multi-modal data fusion according to claim 1, wherein performing reinforcement-learning fine-tuning and strategy optimization on the pre-trained language model in the fifth step includes fine-tuning with reinforcement learning based on the proximal policy optimization algorithm, and performing interpretive analysis by visualizing the SHAP contributions of multiple samples or multiple features to achieve strategy optimization.
  8. A public opinion monitoring system based on multi-modal data fusion, characterized in that the system comprises: a multi-modal data preprocessing module for preprocessing the acquired multi-modal data to generate structured input data; a multi-modal semantic fusion module for performing feature extraction and projection on the multi-modal data to generate a unified event embedding; a public opinion event recognition and risk assessment module for recognizing event types based on a public opinion classification model and assessing event risk in combination with set parameters; a response strategy generation module for constructing a public opinion knowledge graph and generating a response strategy by inputting prompt words into the pre-trained language model; and a public opinion early-warning and closed-loop optimization module for triggering early warnings based on hierarchical risk thresholds and performing reinforcement-learning fine-tuning and strategy optimization on the pre-trained language model based on event feedback and response results.
  9. An electronic device, characterized by comprising a controller, wherein the controller comprises a processor, a communication interface, a memory, and a communication bus, and the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory stores a computer program; and the processor, when executing the program stored in the memory, implements the steps of the public opinion monitoring method based on multi-modal data fusion according to any one of claims 1-7.
  10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the public opinion monitoring method based on multi-modal data fusion according to any one of claims 1-7.
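The attention-cascade projection of claim 3 can be sketched minimally: each modality's embedding (from BLIP-2, wav2vec 2.0, or VideoBERT in the claim) is linearly projected into a shared semantic space, then attention-weighted into one unified event embedding. The projection matrices and query vector below are illustrative placeholders, not values or APIs from the patent; the pretrained encoders themselves are replaced by plain vectors.

```python
import math

def project(vec, weights):
    """Linear projection: apply matrix `weights` (list of rows) to `vec`."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_modalities(modal_vecs, projections, query):
    """Project each modality into the shared space, then attention-weight
    the projected vectors into a single unified event embedding."""
    shared = [project(v, p) for v, p in zip(modal_vecs, projections)]
    # Attention score per modality: dot product with a (learned) query vector.
    scores = [sum(q * x for q, x in zip(query, h)) for h in shared]
    alphas = softmax(scores)
    dim = len(shared[0])
    return [sum(a * h[i] for a, h in zip(alphas, shared)) for i in range(dim)]
```

In a real system the projection matrices and query would be trained jointly with the encoders; here they only demonstrate how differently sized modality embeddings end up in one shared space.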
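The weighted risk scoring of claims 4-5 can be sketched as a small function: a weighted sum of the four component scores, followed by threshold-based grading. The weights and grade thresholds below are hypothetical illustrations; the patent does not disclose concrete values, and the GBDT classifier of claim 4 is replaced here by simple fixed thresholds.

```python
def risk_score(s_emotion, s_spread, s_influence, s_burst,
               w_emotion=0.3, w_spread=0.3, w_influence=0.2, w_burst=0.2):
    """Weighted-sum risk score R = w_E*S_E + w_P*S_P + w_I*S_I + w_B*S_B.
    Component scores are assumed normalised to [0, 1]; the default weights
    are illustrative, not values from the patent."""
    return (w_emotion * s_emotion + w_spread * s_spread
            + w_influence * s_influence + w_burst * s_burst)

def risk_grade(score, thresholds=(0.3, 0.6, 0.8)):
    """Map a risk score to a hierarchical grade (thresholds are hypothetical)."""
    low, medium, high = thresholds
    if score >= high:
        return "critical"
    if score >= medium:
        return "high"
    if score >= low:
        return "medium"
    return "low"
```

The hierarchical grades returned here would drive the early-warning triggers described in claim 1.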
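Claim 6's prompt construction from an event triplet plus scene tags can be illustrated with a template function. The template wording, the example subject "AirlineX", and the tag names are invented for illustration; only the triplet form and the three response versions come from the claim.

```python
def build_prompt(triplet, event_type, risk_grade):
    """Compose a prompt from an event triplet <subject, behavior, object>
    and scene tags (event-type and risk-grade labels). The template text
    is illustrative, not taken from the patent."""
    subject, behavior, obj = triplet
    return (
        f"Event: <{subject}, {behavior}, {obj}>\n"
        f"Scene tags: event_type={event_type}; risk_grade={risk_grade}\n"
        "Task: draft three response versions for this public-opinion event: "
        "a pacifying statement, an explanatory note, and a suggested action plan."
    )
```

The resulting string would be fed to the pre-trained language model (LLaMA-3 or ChatGLM in the claim) to generate the multi-version response drafts.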
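The reinforcement-learning fine-tuning of claim 7 is based on proximal policy optimization (PPO, rendered "near-end policy optimization" in some translations of the claim). Its core is the clipped surrogate objective, sketched below for a single (probability ratio, advantage) pair; a full RLHF fine-tuning loop over model parameters is well beyond this sketch.

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for one sample:
        L = min(r * A, clip(r, 1 - eps, 1 + eps) * A)
    where r is the new/old policy probability ratio and A the advantage.
    Clipping keeps each update close to the old policy."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

In the claimed system, the advantage would be derived from event feedback and response results, and the objective would be maximized over the language model's parameters.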

Description

Public opinion monitoring method and system based on multi-modal data fusion

Technical Field

The invention relates to the field of artificial intelligence applications, and in particular to a public opinion monitoring method and system based on multi-modal data fusion, applicable to public opinion monitoring in aviation scenarios and usable in the public opinion management and emergency-response work of airlines, airport groups, civil aviation regulatory bodies, and similar organizations.

Background

With the increasing diversification of internet propagation channels, online public opinion information exhibits markedly multi-modal, multi-platform, fragmented, and emotionally polarized characteristics. In the air transportation industry, because flight operations involve a large number of passengers, events such as flight delays, service disputes, security-check incidents, or equipment faults can ferment rapidly on platforms such as Weibo, Douyin, news comment sections, and short-video communities, forming a large-scale negative public opinion field in a short time and threatening the airline's reputation, passenger trust, and the overall stability of the civil aviation system. At present, mainstream public opinion monitoring methods at home and abroad mainly rely on crawling text data from social platforms and applying keyword matching, sentiment-tendency analysis, or topic clustering. Some advanced systems have introduced deep-learning-based sentiment classifiers (such as sentiment-analysis modules based on LSTM, BERT, and similar models) and event-clustering models (such as event-discovery mechanisms based on graph neural networks or density clustering) to improve the recognition accuracy and response speed for textual public opinion.
However, these approaches remain essentially limited to single-modality text processing frameworks. In particular, while deep learning models have made significant progress in text semantic understanding, their input is typically limited to structured or semi-structured textual content. For the multi-modal public opinion expressions now widespread on social media, such as user-uploaded "live photo + text description", "short video + voice-over", or "voice complaint + text summary", existing systems lack effective parsing capability for unstructured modalities such as images, video, and audio. On the one hand, key visual information in images (such as flight information displays, chaotic boarding-gate scenes, or staff behavior), dynamic behavioral cues in video (such as passenger altercations or abnormal ground-service operations), and intonational emotion features in audio (such as non-verbal acoustic signals of anger or anxiety) cannot be directly perceived by traditional text-analysis models. On the other hand, even where systems attempt to process each modality separately (for example, extracting text from images with standalone OCR and transcribing speech with ASR), the per-modality results are often handled in isolation, and no cross-modal semantic association and alignment mechanism is established. For example, if a frame in a video shows a flight-cancellation notice while the voice-over claims "deliberate concealment by the airline", analyzing only the transcribed text may ignore the objective evidence in the frame, leading to an emotional misjudgment or a mischaracterization of the event.
Therefore, lacking unified multi-modal semantic modeling capability, existing public opinion monitoring systems struggle to accurately identify complex scenarios such as "text-image disagreement", "audio-image contradiction", or "multi-modal mutually reinforcing negative emotion", which causes the following problems: (1) delayed event identification: because multi-source heterogeneous information cannot be fused, the system's perception of real public opinion events lags and the early-intervention window is missed; (2) one-sided risk assessment: relying solely on text sentiment scores ignores high-risk behavioral cues in images and video or intense emotion in audio, causing the risk level to be underestimated; (3) manually formulated, single-form response strategies: the system cannot automatically generate accurate response content based on multi-modal facts, information must be integrated and inserted manually, and efficiency and consistency are poor; (4) an easily broken disposal chain: analysis results are disconnected from the business response system, an automated dispatch and closed-loop feedback mechanism is lacking, and rapid coordinated disposal is difficult to achieve.