CN-121981763-A - Intelligent report generation system based on multi-source data integration
Abstract
The invention discloses an intelligent report generation system based on multi-source data integration, and relates to the technical fields of business data processing and artificial intelligence. The invention achieves accurate semantic alignment and fusion of multi-source heterogeneous data through a multi-dimensional semantic anchor system covering entity, time and space; builds the fused data into a multi-modal knowledge graph; mines and quantifies the implicit causal relations among nodes and their weights, enabling deep analysis that moves from phenomenon description to root-cause diagnosis; models the report generation process as a Markov decision process, in which a reinforcement learning agent dynamically plans a self-adaptive report topological structure; generates analysis text using a natural language generation model; and renders charts according to data characteristics and sentiment tendency to obtain an intelligent analysis report. The full flow from multi-source data access, through causal attribution analysis, to personalized report generation is thereby automated, markedly improving the depth, efficiency and decision-support value of business analysis.
Inventors
- QU ZHENYU
- CAI JUNHAO
Assignees
- 广州珠江商业经营管理有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251212
Claims (10)
- 1. An intelligent report generation method based on multi-source data integration, characterized by comprising the following steps: S1, extracting, from multi-source heterogeneous data, numerical features of structured data, semantic features of unstructured text and time-series behavior features of semi-structured logs, respectively; S2, calculating the matching degree of the features from S1 using a multi-dimensional semantic anchor system constructed from service entities and spatio-temporal attributes, and generating an aligned multi-modal fusion data set; S3, importing the aligned multi-modal fusion data set into a multi-modal knowledge graph comprising market element nodes, user behavior nodes and public opinion event nodes, using a graph neural network to mine explicit correlation edges and implicit causal edges among the nodes of the multi-modal knowledge graph, calculating the marginal effect value of each node on a target service index, and extracting a key abnormal factor set; S4, modeling the report generation process as a Markov decision process and defining a reinforcement learning agent for the report; based on the weight distribution of the key abnormal factor set, using the reinforcement learning agent to dynamically plan the chapter logic, hierarchy depth and content emphasis of the report, thereby generating a self-adaptive report topological structure; S5, based on the content framework defined by the self-adaptive report topological structure, converting the key abnormal factor set into a structured text description using a natural language generation model, constructing a chart recommendation decision tree, automatically matching the visualization chart type according to the data characteristics of the aligned multi-modal fusion data set, performing parameterized rendering to obtain an intelligent chart, and filling the structured text description and the intelligent chart into the self-adaptive report topological structure to obtain an intelligent analysis report.
- 2. The intelligent report generating method based on multi-source data integration according to claim 1, wherein S1 comprises the steps of: S11, accessing an internal service system, an external public network and a third party data platform to obtain multi-source heterogeneous data; S12, extracting numerical characteristics of structured data in the multi-source heterogeneous data, semantic characteristics of unstructured text and time sequence behavior characteristics of a semi-structured log.
- 3. The intelligent report generating method based on multi-source data integration according to claim 1, wherein said S2 comprises the steps of: S21, defining a multi-dimensional semantic anchor system comprising an entity anchor, a time anchor and a space anchor, wherein the entity anchor comprises a commodity SKU ID, a brand name and a shop ID, the time anchor comprises a standard timestamp and a service period window, and the space anchor comprises an administrative region code and geographic coordinates; S22, cleaning the structured data and directly extracting the entity IDs, timestamps and geographic positions in the structured data as explicit anchor points; S23, preprocessing the unstructured text data, performing named entity recognition using a bidirectional long short-term memory network combined with a conditional random field model, and extracting candidate entity mentions, time expressions and place name mentions from the text; S24, calculating the matching degree of the features from S1, mapping the multi-source heterogeneous features into a unified semantic space, and generating the aligned multi-modal fusion data set.
- 4. The intelligent report generating method based on multi-source data integration according to claim 3, wherein said S24 comprises the steps of: S241, calculating the name similarity and the contextual semantic similarity between each extracted candidate entity and the entities in a standard entity library, and constructing an anchor fusion judgment formula that combines them with the degree of attribute overlap; S242, when the matching score is greater than a preset alignment threshold, attaching the semantic features of the unstructured data to the corresponding structured entity anchor, mapping the time fields of all features to the time anchor, and mapping the space fields to the space anchor, so as to generate the aligned multi-modal fusion data set.
- 5. The intelligent report generating method based on multi-source data integration according to claim 1, wherein said S3 comprises the steps of: S31, defining a graph node type set comprising market element nodes, user behavior nodes and public opinion event nodes; S32, constructing the physical structure of the multi-modal knowledge graph based on the graph node type set, reading the aligned multi-modal fusion data set, instantiating the data entities in the multi-modal fusion data set as specific nodes in the graph, establishing preliminary connecting edges based on the time-series co-occurrence relations among the data, and establishing inherent logic edges based on business logic rules, to obtain an instantiated multi-modal knowledge graph; S33, performing feature learning on the instantiated multi-modal knowledge graph using a graph convolutional neural network, taking the attribute features of the nodes as initial embedding vectors, aggregating neighbor node information through multi-layer graph convolution operations, and updating the node representation vectors; S34, for any two nodes, calculating the reduction in the error of predicting the future state of the target node after the history sequence of the potential cause node is introduced; if the error reduction is significant, determining that a causal edge exists and calculating its causal weight, to form a completed multi-modal knowledge graph; and S35, extracting the key abnormal factor set based on the completed multi-modal knowledge graph: selecting a target service index node, traversing the graph in reverse along the causal weights to find the top K upstream node combinations with the largest positive or negative influence on the target node, together with the corresponding causal link descriptions, and thereby obtaining the key abnormal factor set.
- 6. The intelligent report generating method based on multi-source data integration according to claim 5, wherein S34 comprises the steps of: S341, establishing a baseline time-series prediction model of the target node, predicting the future state of the target node using only the historical data of the target node, and calculating a first residual sum of squares; S342, establishing an enhanced prediction model of the target node, predicting the future state of the target node using the historical data of the target node together with the historical data of the potential cause node, and calculating a second residual sum of squares; S343, calculating an F statistic from the first and second residual sums of squares, determining that a causal relation exists if the p-value corresponding to the F statistic is smaller than the significance level, and normalizing the proportion of error reduction for use as the causal weight.
- 7. The intelligent report generating method based on multi-source data integration according to claim 1, wherein said S4 comprises the steps of: S41, constructing a Markov decision process model comprising a state space and an action space, wherein the state space is defined as a triplet comprising the list of report chapters generated so far, the set of key abnormal factors remaining to be explained, and the logical consistency score of the current report, and the action space comprises adding a descriptive chapter, adding a diagnostic chapter, adding a predictive chapter, increasing the hierarchy depth, and ending generation; S42, defining a reward function consisting of an information entropy gain reward, a logical coherence reward and a length penalty term; S43, converting the triplet defined by the state space into a feature vector and inputting the feature vector into a deep Q-network, the network outputting the Q value of each candidate action in the action space defined in S41; selecting an action using an epsilon-greedy strategy and executing it in a simulation environment; after the action is executed, calculating the immediate reward according to the reward function and updating the set of factors remaining to be explained and the chapter list in the state vector; and updating the network parameters through continued iterative interaction until convergence, to obtain an optimal generation strategy that maximizes the cumulative reward; S44, in the inference stage, taking the key abnormal factor set, the weights and the causal link descriptions output in step S35 as the initial input of the Markov decision process, applying the trained optimal strategy, and generating step by step from the initial state a directory tree containing chapter titles, data source reference pointers and analysis dimensions, thereby obtaining the self-adaptive report topological structure.
- 8. The intelligent report generating method based on multi-source data integration according to claim 1, wherein said S5 comprises the steps of: S51, traversing each chapter node in the self-adaptive report topological structure generated in step S4, extracting the key abnormal factor set bound to each node and the corresponding causal link description as input, converting the structured data into natural language paragraphs using a pre-trained natural language generation model, and generating an analysis text block for each node; S52, for the node currently being processed in S51, retrieving the corresponding underlying data from the aligned multi-modal fusion data set obtained in step S2 and extracting a meta-feature vector of the data, the meta-feature vector comprising the number of data dimensions, whether a time series is contained (HasTime), whether a hierarchical relation is contained (HasHierarchy) and a data anomaly flag (IsAnomaly); inputting the meta-feature vector into the chart recommendation decision tree; if HasTime is True, entering a time-series branch and recommending a line chart or a stacked area chart according to the number of data dimensions; if HasTime is False, entering a static branch and recommending a sunburst chart, a tree diagram or a histogram according to the value of HasHierarchy; S53, receiving the visualization chart type determined in S52 and the analysis text block generated in S51, analyzing the sentiment tendency value of the analysis text block, and dynamically adjusting the rendering parameters based on that value: if the sentiment tendency is negative, mapping the main tone parameter of the chart to a red early-warning color scheme, and if it is positive, mapping it to a green growth color scheme; and, if the anomaly flag IsAnomaly is True, automatically adding a highlight mark on the abnormal region when the chart is rendered, to obtain the final visualization chart object; S54, following the skeleton of the self-adaptive report topological structure, filling the analysis text block generated in S51 into the corresponding text slot and embedding the visualization chart object rendered in S53 into the corresponding chart slot, completing the structured assembly of text and chart content, and finally rendering and exporting the complete intelligent analysis report.
- 9. The intelligent report generating method based on multi-source data integration according to claim 8, wherein the pre-trained natural language generation model in S51 is obtained by: S511, constructing an initial natural language generation model based on the Transformer encoder-decoder architecture, wherein the initial natural language generation model receives structured input and concatenates and encodes the attribute triplets of the key abnormal factors and the causal link descriptions; S512, collecting a plurality of historical paired samples, each consisting of a structured input and a reference text description, to obtain a domain training data set; S513, when training the initial natural language generation model, using the structured inputs in the domain training data set as sample features and the corresponding reference text descriptions as labels, calculating the difference between the predicted sequence and the real sequence of the initial natural language generation model with a cross-entropy loss function, back-propagating the error with a gradient descent algorithm, and iteratively updating all weight parameters of the model; S514, repeating step S513 until the initial natural language generation model converges, to obtain the pre-trained natural language generation model.
- 10. An intelligent report generating system based on multi-source data integration, which implements the intelligent report generating method based on multi-source data integration according to any one of claims 1-9, the system comprising: a data acquisition and anchor fusion module, used for accessing multi-source heterogeneous data, performing feature extraction to obtain a raw feature data lake, and generating an aligned multi-modal fusion data set using a three-dimensional anchor system of entity, time and space; a multi-modal graph reasoning module, used for receiving the aligned multi-modal fusion data set, constructing a multi-modal knowledge graph, mining the causal relations among the data, and outputting a key abnormal factor set and causal link descriptions; a self-adaptive topology planning module, which dynamically plans the report structure using a reinforcement learning algorithm based on the weight distribution and causal link descriptions of the key abnormal factor set, to generate a self-adaptive report topological structure; and a content generation and rendering module, which converts the key abnormal factor set, the causal link descriptions and the aligned multi-modal fusion data set into text and charts using a natural language model and a chart recommendation algorithm, and finally assembles and outputs an intelligent report.
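Illustrative note on claim 4 (not part of the claims): the anchor fusion judgment of S241 and S242 can be sketched as below. The claim does not disclose the formula, so the weighted sum, the weights (0.4, 0.4, 0.2), the threshold 0.75, the character-level name similarity and every function and field name here are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher
from math import sqrt


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the name similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine_similarity(u, v) -> float:
    """Cosine of two context embedding vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def attribute_overlap(a: dict, b: dict) -> float:
    """Fraction of shared attribute keys whose values agree."""
    shared = set(a) & set(b)
    return sum(a[k] == b[k] for k in shared) / len(shared) if shared else 0.0


def match_score(mention: dict, entity: dict, weights=(0.4, 0.4, 0.2)) -> float:
    """Hypothetical fusion formula: weighted sum of the three similarity terms."""
    w_name, w_ctx, w_attr = weights
    return (w_name * name_similarity(mention["name"], entity["name"])
            + w_ctx * cosine_similarity(mention["context_vec"], entity["context_vec"])
            + w_attr * attribute_overlap(mention["attrs"], entity["attrs"]))


def align(mention: dict, entity_library: list, threshold: float = 0.75):
    """Return (entity id, score) if the best score exceeds the threshold, else (None, score)."""
    if not entity_library:
        return None, 0.0
    best = max(entity_library, key=lambda e: match_score(mention, e))
    score = match_score(mention, best)
    return (best["id"], score) if score > threshold else (None, score)
```

A mention that clears the threshold would then have its semantic features attached to the returned entity anchor; one that does not stays unaligned.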
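Illustrative note on claim 6 (not part of the claims): steps S341 to S343 follow the pattern of a Granger-style F test, which can be sketched with the standard library alone. The lag order, the linear autoregressive form of both models, the significance level 0.05 and the normalization (RSS1 - RSS2) / RSS1 are assumptions; the claim fixes none of them, and all function names are hypothetical.

```python
import random
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f_stat, d1, d2):
    """Upper-tail probability P(F > f_stat) for an F(d1, d2) distribution."""
    if f_stat <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f_stat))


def _ols_rss(rows, y):
    """Residual sum of squares of a least-squares fit (rows already hold the intercept)."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    for i in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], b[i], b[p] = A[p], A[i], b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            b[r] -= f * b[i]
            for col in range(i, k):
                A[r][col] -= f * A[i][col]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        beta[i] = (b[i] - sum(A[i][col] * beta[col] for col in range(i + 1, k))) / A[i][i]
    return sum((t - sum(w * v for w, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def granger_edge(cause, target, lag=1, alpha=0.05):
    """S341-S343: baseline vs. enhanced model, F statistic, p-value, causal weight."""
    ts = range(lag, len(target))
    y = [target[t] for t in ts]
    base = [[1.0] + [target[t - j] for j in range(1, lag + 1)] for t in ts]
    full = [row + [cause[t - j] for j in range(1, lag + 1)] for row, t in zip(base, ts)]
    rss1, rss2 = _ols_rss(base, y), _ols_rss(full, y)
    d1, d2 = lag, len(y) - (2 * lag + 1)
    f_stat = ((rss1 - rss2) / d1) / (max(rss2, 1e-12) / d2)
    p_value = f_sf(f_stat, d1, d2)
    causal = p_value < alpha
    # Assumed normalization: share of baseline error removed, clipped to [0, 1].
    weight = max(0.0, min(1.0, (rss1 - rss2) / rss1)) if causal and rss1 > 0 else 0.0
    return {"causal": causal, "f": f_stat, "p": p_value, "weight": weight}


# Toy data: y depends on the previous value of x plus small noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
result = granger_edge(x, y)
```

On the toy series the enhanced model removes almost all of the baseline error, so the edge from x to y is flagged as causal with a weight close to 1. A statistics library would normally replace the hand-written F-distribution tail.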
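Illustrative note on claim 7 (not part of the claims): the state triplet, action space, reward function and epsilon-greedy selection of S41 to S43 can be sketched as follows. The deep Q-network is omitted and Q values are passed in as a plain dictionary; the action names, the reward weighting, the length-penalty coefficient and the toy transition rule are all assumptions.

```python
import random
from dataclasses import dataclass

# Action space of S41, with hypothetical identifiers.
ACTIONS = ("add_descriptive", "add_diagnostic", "add_predictive", "increase_depth", "end")


@dataclass(frozen=True)
class ReportState:
    """State triplet of S41: chapters so far, factors left to explain, coherence score."""
    chapters: tuple = ()
    remaining_factors: tuple = ()  # (factor name, causal weight) pairs
    coherence: float = 0.0


def reward(info_gain: float, coherence_gain: float, n_chapters: int, w_len: float = 0.1) -> float:
    """S42: information entropy gain plus logical coherence minus a length penalty."""
    return info_gain + coherence_gain - w_len * n_chapters


def epsilon_greedy(q_values: dict, epsilon: float, rng: random.Random) -> str:
    """S43: explore with probability epsilon, otherwise take the highest-Q action."""
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))
    return max(q_values, key=q_values.get)


def step(state: ReportState, action: str) -> ReportState:
    """Toy transition: a chapter-adding action explains the heaviest remaining factor."""
    if action.startswith("add_") and state.remaining_factors:
        ranked = sorted(state.remaining_factors, key=lambda f: f[1], reverse=True)
        (name, _), rest = ranked[0], tuple(ranked[1:])
        return ReportState(state.chapters + (f"{action[4:]}: {name}",), rest, state.coherence)
    return state
```

In training, the dictionary of Q values would come from the network's forward pass on the encoded state, and the reward would feed the usual temporal-difference update.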
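Illustrative note on claim 8 (not part of the claims): the chart recommendation decision tree of S52 and the sentiment-driven rendering of S53 can be sketched as below. The claim names the branches but not the split points, so the dimension cut-offs, the sentiment thresholds, the neutral palette and all identifiers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class MetaFeatures:
    """Meta-feature vector of S52."""
    n_dimensions: int
    has_time: bool
    has_hierarchy: bool
    is_anomaly: bool


def recommend_chart(meta: MetaFeatures) -> str:
    """Decision tree of S52; the numeric cut-offs are assumed, not taken from the claim."""
    if meta.has_time:  # time-series branch
        return "stacked_area" if meta.n_dimensions > 1 else "line"
    if meta.has_hierarchy:  # static branch, hierarchical data
        return "sunburst" if meta.n_dimensions > 2 else "tree"
    return "histogram"  # static branch, flat data


def render_params(chart_type: str, sentiment: float, meta: MetaFeatures,
                  negative_below: float = -0.2, positive_above: float = 0.2) -> dict:
    """S53: choose the color scheme from sentiment and flag anomaly highlighting."""
    if sentiment < negative_below:
        palette = "red_warning"
    elif sentiment > positive_above:
        palette = "green_growth"
    else:
        palette = "neutral"  # assumed fallback; the claim covers only the two polar cases
    return {"type": chart_type, "palette": palette, "highlight_anomaly": meta.is_anomaly}
```

For example, a two-dimensional time series with a detected anomaly and a negative analysis text would map to a stacked area chart in the red early-warning scheme with the abnormal region highlighted.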
Description
Intelligent report generation system based on multi-source data integration
Technical Field
The invention belongs to the technical field of business data processing and artificial intelligence, and particularly relates to an intelligent report generating system based on multi-source data integration.
Background
In the current wave of digital transformation, business decisions increasingly rely on data-driven deep insight. The core task of the Business Intelligence (BI) and data analysis industry is to extract valuable information from massive, diverse business data and turn it into reports that decision makers can understand. As internal enterprise systems (such as ERP and CRM) mature and external data sources (such as social media, public opinion and third-party data platforms) grow explosively, the data environment facing enterprises has evolved from a single, structured form into a multi-source, heterogeneous, high-dimensional and complex one. At present, the traditional report generation methods commonly adopted in the industry are mainly applied to periodic operational reviews and basic monitoring scenarios. For example, businesses use BI tools to connect to internal databases and periodically generate descriptive statistics on sales, inventory and the like, or use simple keyword matching to present social media mention volume alongside sales trends on a shared timeline for basic public opinion awareness. The core of these applications is to aggregate, visualize and simply correlate data of known dimensions.
The existing approach has the following shortcomings. It relies mainly on descriptive statistics and correlation analysis; the analysis dimension is single, and it is difficult to see through surface phenomena to the causal driving chain behind data fluctuations, which makes attribution difficult. In addition, report generation depends heavily on preset templates; the structure is rigid and cannot dynamically adjust its emphasis and logical structure according to the core problem of each analysis, so the resulting reports are of limited practical use.
Disclosure of Invention
(I) Technical problem to be solved
In view of the problems in the related art, the invention provides an intelligent report generating system based on multi-source data integration, which aims to overcome the technical problems in the prior art.
(II) Technical solution
In order to solve the technical problems, the invention is realized by the following technical scheme. In a first aspect, the present invention provides an intelligent report generating method based on multi-source data integration, including the steps of:
S1, extracting, from multi-source heterogeneous data, numerical features of structured data, semantic features of unstructured text and time-series behavior features of semi-structured logs, respectively;
S2, calculating the matching degree of the features from S1 using a multi-dimensional semantic anchor system constructed from service entities and spatio-temporal attributes, and generating an aligned multi-modal fusion data set;
S3, importing the aligned multi-modal fusion data set into a multi-modal knowledge graph comprising market element nodes, user behavior nodes and public opinion event nodes, using a graph neural network to mine explicit correlation edges and implicit causal edges among the nodes of the multi-modal knowledge graph, calculating the marginal effect value of each node on a target service index, and extracting a key abnormal factor set;
S4, modeling the report generation process as a Markov decision process and defining a reinforcement learning agent for the report; based on the weight distribution of the key abnormal factor set, using the reinforcement learning agent to dynamically plan the chapter logic, hierarchy depth and content emphasis of the report, thereby generating a self-adaptive report topological structure;
S5, based on the content framework defined by the self-adaptive report topological structure, converting the key abnormal factor set into a structured text description using a natural language generation model, constructing a chart recommendation decision tree, automatically matching the visualization chart type according to the data characteristics of the aligned multi-modal fusion data set, and performing parameterized rendering to obtain an intelligent chart;
Preferably, the step S1 includes the steps of:
S11, accessing an internal service system, an external public network and a third party data platform to obtain multi-source heterogeneous data;
S12, extracting numerical features of structured data in the multi-source heterogeneous data, semantic features of unstructured text and time-series behavior features of semi-structured logs;
Preferably, the step S2 includes the steps of:
S21, defining a multidimensional semantic anchor system, wherein the multidimensional semantic anchor system comprises an entity anchor, a time anchor and a space anchor, the entity