CN-122019721-A - AI-based multi-mode financial question-answering system and method
Abstract
The invention discloses an AI-based multi-modal financial question-answering system and method in the technical field of financial question-answering data processing, comprising a multi-modal input analysis module, a question-answering analysis module, an emotion perception module, a question-answering dynamic evaluation module, and a financial dynamic decision module. By analyzing voice emotion features and the density of text emotion keywords, the system perceives changes in a user's emotion in real time; in a financial question-answering scenario, the user's emotion may reflect important information such as anxiety about financial conditions or satisfaction with proposed solutions. Task urgency assessment is performed on question-answering dynamics, and the assessment strategy is adjusted in real time according to user behavior and question characteristics based on multiple dynamic assessment indices, so the system better adapts to the needs of different users. Financial decision analysis is performed based on the task evaluation index and the emotion early-warning index in combination with the user's identity authority, so that task urgency, user emotion, and the user's authority are all considered in financial decisions, avoiding the limitations of single-factor decision-making.
Inventors
- CUI SHANGYONG
- JING JIELIN
Assignees
- 崔尚勇 (CUI SHANGYONG)
- 井洁琳 (JING JIELIN)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-30
Claims (8)
- 1. An AI-based multi-modal financial question-answering system, characterized by comprising a multi-modal input analysis module, a question-answering analysis module, an emotion perception module, a question-answering dynamic evaluation module, and a financial dynamic decision module. The multi-modal input analysis module performs multi-modal data feature extraction and analysis on target question-answer data, including extracting voice emotion features, text keywords, context logic association features, and picture recognition results of the target question-answer data. The question-answering analysis module performs deep semantic analysis and question type recognition on the target question-answer data: it performs semantic type recognition according to the text keywords and context logic key features to determine the question type, verifies the question type by analyzing the relevance among questions through the context logic association features, and performs key-parameter extraction verification in combination with the picture recognition features to generate the target question data type. The emotion perception module constructs a deep-learning-based user emotion state time-series model, calculates an emotion stability index by analyzing the intensity fluctuation of the voice emotion features and the density of text emotion keywords, and generates an emotion early-warning index when the stability index is detected to fall below a critical value. The question-answering dynamic evaluation module acquires target question-answer data for question-answering operation analysis, takes the user's historical average waiting time, the number of questions per unit time, and a keyword library as dynamic evaluation targets, performs task urgency evaluation on the question-answering dynamics, and generates a task evaluation index. The financial dynamic decision module performs financial decision analysis based on the task evaluation index and the emotion early-warning index in combination with the user's identity authority, performs action selection in an answer candidate set of a financial knowledge base, and generates financial answers and interaction strategies based on data-optimized analysis index dynamics. The multi-modal data feature extraction and analysis proceeds as follows: the voice fundamental frequency is acquired through audio framing, identified against a standard emotion fundamental-frequency threshold, and abnormal turning points of the fundamental frequency are marked to form a voice emotion feature vector; deep semantic analysis is performed on the text content, and financial terms, business-scenario keywords, and emotion polarity words are extracted through entity recognition to generate a text keyword feature vector; causal-relation analysis is performed on the text content, comparing causal, parallel, and conditional relations against a question logic-chain map to obtain the context logic association features; and character recognition and structured extraction are performed on text regions by locating table regions, seal marks, and handwritten annotations in the image, generating table structural features, numerical distribution patterns, and key data marks to obtain the picture recognition result.
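The voice feature step of claim 1 (audio framing, fundamental-frequency estimation, and marking abnormal turning points against a threshold) can be sketched as below. This is a minimal illustration, not the patented implementation: the frame length, hop size, autocorrelation-based F0 estimator, the 60-400 Hz search range, and the turning-point threshold are all assumptions the claim does not fix.

```python
import math

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames (the audio framing step)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def estimate_f0(frame, sample_rate):
    """Estimate the fundamental frequency of one frame via autocorrelation,
    searching lags that correspond to a typical speech F0 range (60-400 Hz)."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    best_lag, best_corr = 0, 0.0
    min_lag = int(sample_rate / 400)
    max_lag = int(sample_rate / 60)
    for lag in range(min_lag, min(max_lag, n - 1)):
        corr = sum(x[i] * x[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

def mark_abnormal_turns(f0_track, threshold_hz):
    """Mark indices where F0 jumps by more than the (assumed) emotion
    fundamental-frequency threshold, i.e. abnormal turning points."""
    return [i for i in range(1, len(f0_track))
            if abs(f0_track[i] - f0_track[i - 1]) > threshold_hz]
```

On a synthetic 200 Hz tone sampled at 8 kHz, the estimator recovers roughly 200 Hz per frame, and a sudden jump in the F0 track is flagged as a turning point.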
- 2. The AI-based multimodal financial question-answering system of claim 1, wherein semantic type recognition is performed based on the text keywords and context logic key features to determine the question type, comprising the following steps: acquiring a financial-domain knowledge base and defining question type labels comprising fact-type, analysis-type, and cause-type questions to form a classification standard; fusing the text keyword feature vector and the context logic association feature vector, and combining the voice emotion features and the image recognition result, to form a multi-modal fused feature vector; comparing the fused feature vector against the question type labels and calculating the semantic similarity between the question and a standard question library to obtain an initial recognition type; verifying the initial recognition result by checking, with the context logic features, that it is consistent with the overall semantic logic of the text, generating visualizations of the keyword distribution and question-type distribution, and producing an interpretable analysis report; and extracting structural parameters from the picture, checking the keywords in the question against the range of the picture contents, and finally producing a labeled verification report of the question types that includes main-type and sub-type characteristics.
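Claim 2's core step, comparing the question against type labels by semantic similarity to a standard question library, can be illustrated with a minimal bag-of-words cosine classifier. The three labels come from the claim; the token templates standing in for the standard question library, and cosine over raw token counts as the similarity measure, are illustrative assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token lists via bag-of-words counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)   # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify_question(question_tokens, type_templates):
    """Return the question-type label whose template is most similar
    to the question (the 'initial recognition type' of claim 2)."""
    return max(type_templates, key=lambda lbl: cosine(question_tokens, type_templates[lbl]))
```

With toy templates for the three claimed labels, a balance lookup lands in the fact type and a "why did the loss occur" question lands in the cause type.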
- 3. The AI-based multimodal financial question-answering system of claim 1, wherein the deep-learning-based user emotion state time-series model is constructed as follows: S300, performing emotion analysis based on the voice emotion features, averaging the emotion probabilities within each sliding window to generate a smoothed voice emotion intensity vector arranged in time order, and, for a defined time window, calculating the ratio of the number of negative emotion keywords to the total number of words in the window as the text negative emotion density; S301, establishing a unified time axis and aligning the voice emotion intensity vector with the text emotion density by timestamp to form synchronized multi-modal emotion time-series data; S302, using the aligned voice emotion intensity vector and text emotion density as input samples to construct the emotion state time-series model; and, within a short time window, calculating the standard deviation of the instantaneous emotion intensity as the measure of short-term fluctuation and combining it with the text negative emotion density to obtain the stability index, which is taken as the output dimension of the constructed emotion state time-series model.
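The quantities defined in steps S300-S302 can be sketched directly: a sliding-window average for the smoothed intensity vector, a keyword ratio for the negative density, and a standard deviation for short-term fluctuation. The claim only says fluctuation and density are "combined" into the stability index; the linear weighting `alpha` used here is an assumption.

```python
import statistics

def smooth_intensity(emotion_probs, window):
    """S300: sliding-window average of per-frame emotion probabilities."""
    return [sum(emotion_probs[i:i + window]) / window
            for i in range(len(emotion_probs) - window + 1)]

def negative_density(tokens, negative_lexicon):
    """S300: ratio of negative emotion keywords to total words in the window."""
    if not tokens:
        return 0.0
    return sum(t in negative_lexicon for t in tokens) / len(tokens)

def stability_index(intensity_window, neg_density, alpha=0.5):
    """Combine short-term fluctuation (population std of instantaneous
    intensity) with the text negative emotion density; higher means more
    stable.  The weighting alpha is an assumption; the claim does not
    specify how the two terms are combined."""
    fluctuation = statistics.pstdev(intensity_window)
    return 1.0 - (alpha * fluctuation + (1.0 - alpha) * neg_density)
```

A perfectly flat intensity window with no negative keywords yields the maximal stability of 1.0, and either more fluctuation or denser negative wording lowers it.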
- 4. The AI-based multimodal financial question-answering system according to claim 1, wherein when the stability index is detected to be below the critical value, the emotion early-warning index is generated as follows: S400, setting a reasonable stability index critical value, denoted T, according to emotion data analysis and actual scene requirements, the critical value being the threshold for judging the degree of emotional stability of the user; S401, calculating the emotion early-warning index from the degree of difference between the emotion stability index and the critical value in combination with the early-warning value, where emotion early-warning index = T × 0.5 − S × 0.5; and S402, comparing the calculated emotion stability index S with the set critical value: when S is smaller than T, the user's emotion is judged to be extremely unstable and a risk value R is added; if S is greater than or equal to T, the user's emotion is judged to be in a relatively stable state and question answering continues.
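Steps S400-S402 reduce to a few lines. The formula T × 0.5 − S × 0.5 is taken from the claim as printed; where the risk value R accumulates is not stated, so adding it to the index itself is an interpretation, flagged as such in the code.

```python
def emotion_early_warning(S, T, R=1.0):
    """Claim-4 sketch: early-warning index = T*0.5 - S*0.5 (S401).
    When S < T the emotion is judged extremely unstable and the risk
    value R is added (S402); adding R to the index itself is an
    assumption, since the claim does not say where R accumulates."""
    index = T * 0.5 - S * 0.5
    if S < T:
        return index + R, "unstable"
    return index, "stable"
```

For S = 0.3 below a critical value T = 0.7 the call reports an unstable state with the risk value folded in; for S above T it reports a stable state and question answering would continue.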
- 5. The AI-based multimodal financial question-answering system according to claim 1, wherein task urgency assessment is performed on the question-answering dynamics to generate the task evaluation index as follows: extracting the duration from when the user initiates a question-and-answer request to when the user obtains a response and calculating the user's historical average waiting time; obtaining the number of questions the user initiates within a time period and calculating the number of questions per unit time; performing keyword matching on the content of the user's current question, where keywords in the question are extracted using natural language processing and compared with the keyword library to obtain the matched keywords; performing dynamic evaluation calculation according to the user's historical average waiting time, the number of questions per unit time, and the number of matched keywords to obtain the task evaluation index, where task evaluation index = P × 0.4 + T × 0.4; and classifying tasks into different urgency grades according to the magnitude of the task urgency index.
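Claim 5 lists three inputs (waiting time, questioning rate, matched keywords) but the printed formula shows only two weighted terms, P × 0.4 + T × 0.4. The sketch below keeps those two weights and adds a keyword term with weight 0.2 purely as an assumed completion of the convex sum; the inputs are also assumed to be pre-normalized to [0, 1], and the grade boundaries are illustrative.

```python
def task_evaluation_index(p_rate, t_wait, k_matches,
                          w_p=0.4, w_t=0.4, w_k=0.2):
    """Claim-5 sketch.  The claim prints only P*0.4 + T*0.4; the keyword
    term and its weight w_k are an assumed completion.  All inputs are
    assumed normalized to [0, 1]."""
    return w_p * p_rate + w_t * t_wait + w_k * k_matches

def urgency_grade(index, low=0.3, high=0.7):
    """Classify the task into urgency grades by index magnitude;
    the boundaries are illustrative assumptions."""
    if index >= high:
        return "high"
    if index >= low:
        return "medium"
    return "low"
```

A user with maximal questioning rate, waiting time, and keyword matches scores 1.0 and lands in the high-urgency grade.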
- 6. The AI-based multimodal financial question-answering system of claim 1, wherein the financial answers and interaction strategies are generated based on the data-optimized analysis index dynamics as follows: based on the task evaluation index and the emotion early-warning index, financial decisions are made in combination with the user's identity authority, financial authority analysis and limitation are performed, and the questions of the target question-answer data are looked up against a financial index library to obtain a multi-granularity answer candidate set; the semantic matching degree between each candidate answer and the user's question is calculated, the answer with the highest matching degree is selected as the financial answer, and the data-optimized analysis index is dynamically determined for style decision and information presentation; and the financial answers and interaction strategies are generated based on the emotion early-warning index value, the task urgency, and the authority level.
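The selection step of claim 6, picking the candidate answer with the highest matching degree to the question, can be sketched with a simple overlap measure. Jaccard similarity over tokens is a stand-in assumption; the claim does not specify how the semantic matching degree is computed.

```python
def select_answer(question_tokens, candidates):
    """Claim-6 sketch: pick the candidate answer with the highest matching
    degree to the question.  Jaccard token overlap stands in for the
    unspecified semantic matching computation."""
    def jaccard(a, b):
        sa, sb = set(a), set(b)
        union = sa | sb
        return len(sa & sb) / len(union) if union else 0.0
    return max(candidates, key=lambda ans: jaccard(question_tokens, ans.split()))
```

For a deposit-rate question, the candidate that actually mentions the deposit rate wins over an unrelated one.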
- 7. The AI-based multimodal financial question-answering system of claim 1, wherein a risk early-warning and compliance execution module is configured to construct a three-dimensional risk monitoring matrix to implement active early warning and automatic response for high-risk situations: through real-time monitoring of the three-dimensional space of 'user permission-emotion index-urgency', when the abnormal pattern of a high-permission user with high negative emotion and high urgency concurrently accessing core sensitive financial data is detected, a multi-level early-warning rule chain is triggered, question-answer risk verification is applied to generate a high-risk log, a structured alarm is sent to the supervising party through an encrypted question-answer channel, a predefined crisis response protocol is started, and finally compliance examination and risk calibration are performed on the output content through a security check layer.
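The trigger condition of claim 7's monitoring matrix is a conjunction over three dimensions plus a sensitive-data flag. The sketch below shows only that gating logic; the numeric thresholds, and collapsing the "multi-level rule chain" into a single boolean, are simplifying assumptions.

```python
def risk_check(authority, warning_index, urgency, touches_sensitive,
               auth_high=3, warn_high=0.5, urg_high=0.7):
    """Claim-7 sketch of the 'user permission-emotion index-urgency'
    monitoring matrix: the early-warning chain fires only when all three
    dimensions are high AND core sensitive data is accessed.  All
    thresholds are illustrative assumptions."""
    triggered = (authority >= auth_high
                 and warning_index >= warn_high
                 and urgency >= urg_high
                 and touches_sensitive)
    return {
        "high_risk_log": triggered,      # question-answer risk verification log
        "alert_supervisor": triggered,   # structured alarm to the supervising party
        "crisis_protocol": triggered,    # predefined crisis response protocol
    }
```

Dropping any one dimension, for example when no sensitive data is touched, leaves the chain untriggered, which matches the claim's requirement that the pattern be concurrent.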
- 8. An AI-based multi-modal financial question-answering method applied to the AI-based multi-modal financial question-answering system of any one of claims 1-7, comprising the steps of: step one, performing multi-modal data feature extraction and analysis on the target question-answer data, including extracting voice emotion features, text keywords, context logic association features, and picture recognition results; step two, performing deep semantic analysis and question type recognition on the target question-answer data: performing semantic type recognition according to the text keywords and context logic key features to determine the question type, analyzing the relevance among questions through the context logic association features, verifying the question type according to that relevance, and performing key-parameter extraction verification in combination with the picture recognition features to generate the target question data type; step three, constructing a deep-learning-based user emotion state time-series model, calculating an emotion stability index by analyzing the intensity fluctuation of the voice emotion features and the density of text emotion keywords, and generating an emotion early-warning index when the stability index falls below the critical value; step four, acquiring target question-answer data for question-answering operation analysis, taking the user's historical average waiting time, the number of questions per unit time, and a keyword library as dynamic evaluation targets, performing task urgency evaluation on the question-answering dynamics, and generating a task evaluation index; and step five, based on the task evaluation index and the emotion early-warning index, performing financial decision analysis in combination with the user's identity authority, performing action selection in an answer candidate set of a financial knowledge base, and generating financial answers and interaction strategies based on the data-optimized analysis index dynamics.
Description
AI-based multi-mode financial question-answering system and method
Technical Field
The invention relates to the technical field of financial question-answering data processing, in particular to an AI-based multi-modal financial question-answering system and method.
Background
Machine learning and deep learning are core technologies of artificial intelligence that can automatically learn patterns and rules from large amounts of data and make predictions and decisions. In the financial field, machine learning algorithms may be used for tasks such as financial data classification, prediction, and anomaly detection. For example, algorithms such as decision trees and support vector machines can classify the financial condition of enterprises to judge whether they face financial risk, and neural network models can predict stock prices, financial indices, and the like to provide references for investment decisions. Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and their variants (such as LSTM and GRU) in deep learning have unique advantages in image processing and sequence data processing, and can be applied to scenarios such as financial statement image recognition and time-series financial data analysis.
Existing AI financial question-answering systems are mostly based on a simple "question-answer" matching pattern. Although capable of handling multimodal inputs, they have significant limitations: they provide answers of the same depth and content to all users, failing to distinguish beginners from experts, which may cause information overload or insufficient information; and they lack situational awareness, being unable to perceive the user's emotional state and the true urgency of the question, so they may give untimely standard answers in high-risk situations and even exacerbate the user's psychological risk. In view of the above technical drawbacks, a solution is now proposed.
Disclosure of Invention
The invention aims to perceive emotion changes of a user in real time by analyzing the intensity fluctuation of voice emotion features and the density of text emotion keywords; in a financial question-answering scenario, the user's emotion may reflect important information such as anxiety about financial conditions or satisfaction with proposed solutions. Task urgency assessment is performed on question-answering dynamics, taking the user's historical average waiting time, the number of questions per unit time, and a keyword library as dynamic assessment standards; based on these dynamic assessment indices, the system can adjust its assessment strategy in real time according to user behavior and question characteristics, better adapting to the needs of different users. Financial decision analysis is performed based on the task evaluation index and the emotion early-warning index in combination with the user's identity authority, so that task urgency, user emotion, and the user's authority are all considered in financial decisions, avoiding the limitations of single-factor decision-making.
In order to achieve the above aim, the invention adopts the following technical scheme. The AI-based multi-modal financial question-answering system comprises a multi-modal input analysis module, a question-answering analysis module, an emotion perception module, a question-answering dynamic evaluation module, and a financial dynamic decision module. The multi-modal input analysis module performs multi-modal data feature extraction and analysis on the target question-answer data, including extracting voice emotion features, text keywords, context logic association features, and picture recognition results of the target question-answer data. The question-answering analysis module performs deep semantic analysis and question type recognition on the target question-answer data: it performs semantic type recognition according to the text keywords and context logic key features to determine the question type, verifies the question type by analyzing the relevance among questions through the context logic association features, and performs key-parameter extraction verification in combination with the picture recognition features to generate the target question data type. The emotion perception module constructs a deep-learning-based user emotion state time-series model, calculates an emotion stability index by analyzing the intensity fluctuation of the voice emotion features and the density of text emotion keywords, and generates an emotion early-warning index when the stability index falls below a critical value. The question-answering dynamic evaluation module is used for acquiring target question-answer data to perform question-answering operation analysis, taking the historical average waiting time of a