CN-122022724-A - Government information disclosure monitoring method based on artificial intelligence
Abstract
The invention discloses an artificial intelligence-based information disclosure monitoring and analysis method in the technical field of data processing and analysis. The method comprises six steps: information data preprocessing, information structuring analysis, intelligent information quality assessment, quantitative analysis of information disclosure, intelligent identification of abnormal information, and visual generation of monitoring results. The method performs deep semantic analysis of information content using natural language processing, carries out word segmentation, named entity recognition and dependency parsing with a Bi-LSTM-CRF model and a BERT model, constructs a knowledge graph to give the information a structured representation, and provides multi-dimensional data aggregation, multi-type chart generation, interactive drill-down analysis and multi-format output.
Inventors
- CHEN DAN
- Bian Zejuan
- SHEN TIANTIAN
- YANG SHANHUI
- WU JIANFEI
Assignees
- 安徽省安策智库咨询有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-28
Claims (9)
- 1. A government information disclosure monitoring method based on artificial intelligence, characterized by comprising the following steps: Step S1, information data preprocessing: acquiring original information data through a data receiving unit, performing deduplication, denoising and format unification through a data cleaning unit, classifying the data by information subject, release time and information type through a data classification unit, and adding metadata tags through a data labeling unit to obtain an information data set to be analyzed; Step S2, information structuring analysis: segmenting the text content of the information data set to be analyzed with a text segmentation unit, performing semantic analysis and entity recognition on the segmented text with a semantic recognition unit, extracting key information elements with an information element extraction unit, and organizing the information elements into a knowledge graph in triple form with a knowledge graph construction unit to obtain a structured information knowledge representation; Step S3, intelligent information quality evaluation: detecting information completeness through an integrity evaluation unit, verifying information accuracy through an accuracy evaluation unit, analyzing release timeliness and update frequency through a timeliness evaluation unit, evaluating text readability through a readability evaluation unit, and calculating an overall information quality score through a quality score calculation unit to obtain a quality evaluation report; Step S4, quantitative analysis of information disclosure: counting subject coverage through a coverage analysis unit, evaluating the depth level of the content with a disclosure depth analysis unit, analyzing the dissemination range and audience coverage with a disclosure breadth analysis unit, counting release frequency and periodicity through a disclosure frequency analysis unit, and calculating a comprehensive disclosure index with a disclosure comprehensive calculation unit to obtain a quantitative disclosure analysis result; Step S5, intelligent identification of abnormal information: establishing a library of abnormal feature patterns through an abnormal pattern definition unit, identifying information of substandard quality through a quality anomaly detection unit, detecting release-timeliness anomalies through a timeliness anomaly detection unit, identifying abnormal content features through a content anomaly detection unit, and classifying and labeling the abnormal information through an abnormal information classification labeling unit to obtain an abnormal information identification result list; and Step S6, visual generation of monitoring results: integrating the abnormal information identification results and the quantitative disclosure analysis results through a data aggregation unit, generating visual charts through a chart generation unit, filling the charts and data into a report template through a report template rendering unit, creating a visual interactive interface through an interaction interface generation unit, and outputting a multi-format monitoring report through a result output unit.
- 2. The government information disclosure monitoring method based on artificial intelligence according to claim 1, characterized in that the semantic recognition unit performs word segmentation, part-of-speech tagging, named entity recognition and dependency parsing; the word segmentation adopts a deep-learning Bi-LSTM-CRF model whose input layer is a character-level embedding vector of 128 dimensions, whose hidden layer comprises two bidirectional LSTM layers with 256 neurons each, and whose CRF layer performs sequence-labeling output; the named entity recognition adopts a BERT-BiLSTM-CRF model in which the BERT layer loads weights from a Chinese pre-trained model and the BiLSTM layer comprises two layers of 512 hidden units each; and the recognized entity types include organization names, place names, times, person names and proper nouns.
- 3. The government information disclosure monitoring method based on artificial intelligence according to claim 1, characterized in that the information element extraction unit performs subject element extraction, event element extraction, time element extraction and numerical element extraction; the subject element extraction uses the subject-predicate and attributive (modifier-head) relations from dependency parsing to identify the subject and the core noun phrases of each sentence, and combines them with the organization name entities from the named entity recognition results to determine the subject elements; the confidence of a subject element is calculated as confidence = 0.4 × entity_score + 0.3 × syntax_score + 0.3 × position_score, where entity_score is the entity recognition confidence, syntax_score is the syntactic relation confidence, and position_score is the position weight score.
- 4. The government information disclosure monitoring method based on artificial intelligence according to claim 1, characterized in that the integrity evaluation unit performs required element detection and structural integrity detection; the required element detection defines a list of required elements for each information type, queries the knowledge graph of the structured information knowledge representation for the corresponding element nodes, counts the number of missing required elements, and calculates the completeness score as complete_score = (total number of required elements - number of missing elements) / total number of required elements × 100; the structural integrity detection checks whether the paragraph structure of the information conforms to a standard format, uses regular expressions to detect garbled characters in the text, and detects whether paragraphs are abnormally truncated.
- 5. The government information disclosure monitoring method based on artificial intelligence according to claim 1, characterized in that the accuracy evaluation unit performs fact verification and logical consistency detection; the fact verification establishes a fact knowledge base as the verification reference, matches the fact triples extracted from the information to be evaluated against the triples in the knowledge base, and calculates the cosine distance between semantic vectors using a Sentence-BERT model, where the triple similarity is sim_triple = 0.4 × sim_subject + 0.3 × sim_predicate + 0.3 × sim_object, and sim_subject, sim_predicate and sim_object are the text similarities of the subject, the relation and the object, respectively.
- 6. The government information disclosure monitoring method based on artificial intelligence according to claim 1, characterized in that the disclosure depth analysis unit performs information hierarchy analysis and detail richness assessment; the information hierarchy analysis divides information content into five depth levels, namely the L1 basic information layer, the L2 summary information layer, the L3 detailed information layer, the L4 in-depth information layer and the L5 complete information layer, assigns a level by jointly considering the text length of the information, the number of levels in its paragraph structure, whether it contains attachments, and the amount of numerical data, and calculates the depth score as depth_score = Σ(level_i × count_i × weight_i) / total_count, where level_i is the depth level number from 1 to 5, count_i is the number of pieces of information at that level, and weight_i is the level weight coefficient, equal to 0.2, 0.4, 0.6, 0.8 and 1.0 for levels 1 to 5, respectively.
- 7. The government information disclosure monitoring method based on artificial intelligence according to claim 1, characterized in that the disclosure frequency analysis unit performs time series statistics and periodicity detection; the periodicity detection detects periodic patterns in the time series by autocorrelation analysis, calculating the autocorrelation coefficient at different lags as ACF(lag) = Cov(day_count_t, day_count_{t-lag}) / Var(day_count); weekly periodicity is determined when ACF(lag) has significant peaks at lag = 7, 14, 21, 28 with ACF > 0.3, and monthly periodicity is determined when ACF(lag) has a significant peak at lag = 30.
- 8. The government information disclosure monitoring method based on artificial intelligence according to claim 1, characterized in that the content anomaly detection unit performs sensitive information detection, repeated content detection and missing information detection; the repeated content detection uses the SimHash algorithm to calculate a fingerprint for each piece of information: the information text is segmented into words, keywords and their weights are extracted, a hash value is calculated for each keyword, and the hash values are accumulated with the keyword weights to obtain a SimHash fingerprint 64 bits long; the Hamming distance between the SimHash fingerprint of the information under test and that of every piece of information in the historical information base is calculated, and when the Hamming distance is smaller than 3 the content is determined to be highly repeated, with the repetition similarity calculated as similarity = (64 - hash_distance) / 64 × 100%, where hash_distance is the Hamming distance.
- 9. The government information disclosure monitoring method based on artificial intelligence according to claim 1, characterized in that the chart generation unit performs trend chart generation, distribution chart generation, comparison chart generation and relation chart generation; the relation chart generation produces a relation network graph from the knowledge graph structure generated by the knowledge graph construction unit, with the entity element array as node data and the relation triple array as edge data; the relation graph uses a force-directed layout; when the number of nodes exceeds 500, a node aggregation function is enabled that merges densely connected nodes of the same type into a single super node, using the Louvain community detection algorithm as the aggregation algorithm, and the size of each super node is adjusted dynamically according to the number of child nodes it contains.
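As an illustration of the subject element confidence formula in claim 3, the following Python sketch combines the three component scores with the claimed weights 0.4, 0.3 and 0.3. The function name, the argument names and the assumption that each score lies in [0, 1] are illustrative and do not come from the claims.

```python
def subject_confidence(entity_score: float,
                       syntax_score: float,
                       position_score: float) -> float:
    """Weighted confidence of a candidate subject element (claim 3 weights)."""
    for name, value in (("entity_score", entity_score),
                        ("syntax_score", syntax_score),
                        ("position_score", position_score)):
        # Assumption: every component score is normalised to [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return 0.4 * entity_score + 0.3 * syntax_score + 0.3 * position_score
```

Because the weights sum to 1, the result stays in [0, 1] whenever the inputs do, which makes it straightforward to compare candidates or apply a cut-off.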
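The required element check and garbled character check in claim 4 can be sketched as follows. This is a minimal illustration: the `REQUIRED_ELEMENTS` table, the element names and the character classes treated as garbled are hypothetical, and the knowledge graph lookup is reduced to a set of element names assumed to have been found for one piece of information.

```python
import re

# Hypothetical required-element lists per information type (not from the patent).
REQUIRED_ELEMENTS = {
    "notice": ["issuer", "release_date", "title", "body"],
}

# Assumed markers of garbled text: the Unicode replacement character and
# control characters other than tab, line feed and carriage return.
GARBLED_PATTERN = re.compile(r"[\ufffd\x00-\x08\x0b\x0c\x0e-\x1f]")


def completeness_score(info_type: str, present_elements: set[str]) -> float:
    """complete_score = (required - missing) / required * 100."""
    required = REQUIRED_ELEMENTS[info_type]
    missing = [name for name in required if name not in present_elements]
    return (len(required) - len(missing)) / len(required) * 100


def has_garbled_text(text: str) -> bool:
    """Return True if the text contains any assumed garbled character."""
    return GARBLED_PATTERN.search(text) is not None
```

For example, a notice with three of its four required elements present scores 75.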
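The weighted triple similarity in claim 5 (weights 0.4, 0.3 and 0.3 for subject, relation and object) can be sketched as below. The sketch assumes that each component has already been embedded as a numeric vector, for instance by a Sentence-BERT model, and it uses cosine similarity as the per-component similarity; both the dictionary layout and that choice are assumptions made for illustration.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def triple_similarity(candidate: dict[str, list[float]],
                      reference: dict[str, list[float]]) -> float:
    """sim_triple = 0.4*sim_subject + 0.3*sim_predicate + 0.3*sim_object.

    Each argument maps "subject", "predicate" and "object" to an embedding
    vector (hypothetical layout, not specified in the claims).
    """
    return (0.4 * cosine_similarity(candidate["subject"], reference["subject"])
            + 0.3 * cosine_similarity(candidate["predicate"], reference["predicate"])
            + 0.3 * cosine_similarity(candidate["object"], reference["object"]))
```

A candidate triple whose subject embedding is orthogonal to the reference subject but whose relation and object match exactly would score 0.6 under these weights.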
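The depth score in claim 6 can be worked through with a short Python sketch. It assumes that each piece of information has already been assigned a level from 1 to 5; the function name and the input format (a mapping from level to count) are illustrative only.

```python
# Level weight coefficients for L1..L5 as listed in claim 6.
LEVEL_WEIGHTS = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8, 5: 1.0}


def depth_score(counts_by_level: dict[int, int]) -> float:
    """depth_score = sum(level_i * count_i * weight_i) / total_count."""
    total_count = sum(counts_by_level.values())
    if total_count == 0:
        # Assumption: an empty data set scores 0 rather than raising.
        return 0.0
    weighted = sum(level * count * LEVEL_WEIGHTS[level]
                   for level, count in counts_by_level.items())
    return weighted / total_count
```

Under this formula the score ranges from 0.2 (everything at L1) to 5.0 (everything at L5), because each level number is multiplied by its own weight.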
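The autocorrelation test in claim 7 can be sketched as follows. The sketch computes ACF(lag) as the lagged covariance divided by the variance of the daily release counts and flags weekly periodicity when the coefficient exceeds 0.3 at lags 7, 14, 21 and 28. Reducing the "significant peak" condition to that threshold test is a simplifying assumption, as are the function names.

```python
def acf(day_counts: list[float], lag: int) -> float:
    """Autocorrelation coefficient Cov(x_t, x_{t-lag}) / Var(x)."""
    n = len(day_counts)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(day_counts)")
    mean = sum(day_counts) / n
    variance = sum((x - mean) ** 2 for x in day_counts) / n
    if variance == 0:
        return 0.0  # a constant series has no periodic structure to detect
    covariance = sum((day_counts[t] - mean) * (day_counts[t - lag] - mean)
                     for t in range(lag, n)) / n
    return covariance / variance


def has_weekly_period(day_counts: list[float], threshold: float = 0.3) -> bool:
    """True if ACF exceeds the threshold at every weekly lag (7, 14, 21, 28)."""
    return all(acf(day_counts, lag) > threshold for lag in (7, 14, 21, 28))
```

A series that spikes once every seven days passes the test, while a series that alternates daily does not, because its autocorrelation at odd lags such as 7 is negative.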
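The SimHash procedure in claim 8 can be sketched as below. The claim does not name the per-keyword hash function, so the sketch uses the first 8 bytes of an MD5 digest as an assumed 64-bit hash; keyword extraction and weighting are taken as already done, and all function names are illustrative.

```python
import hashlib

FINGERPRINT_BITS = 64


def simhash(weighted_keywords: dict[str, float]) -> int:
    """Build a 64-bit SimHash fingerprint from keyword weights."""
    accumulator = [0.0] * FINGERPRINT_BITS
    for keyword, weight in weighted_keywords.items():
        digest = hashlib.md5(keyword.encode("utf-8")).digest()
        keyword_hash = int.from_bytes(digest[:8], "big")  # assumed 64-bit hash
        for bit in range(FINGERPRINT_BITS):
            # Add the weight where the hash bit is 1, subtract it where it is 0.
            if (keyword_hash >> bit) & 1:
                accumulator[bit] += weight
            else:
                accumulator[bit] -= weight
    fingerprint = 0
    for bit, total in enumerate(accumulator):
        if total > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_highly_repeated(a: int, b: int, threshold: int = 3) -> bool:
    """Claim 8 rule: highly repeated when the Hamming distance is below 3."""
    return hamming_distance(a, b) < threshold


def similarity_percent(distance: int) -> float:
    """similarity = (64 - distance) / 64 * 100."""
    return (FINGERPRINT_BITS - distance) / FINGERPRINT_BITS * 100
```

Two texts with identical weighted keywords produce identical fingerprints, so their distance is 0 and their similarity is 100%.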
Description
Government information disclosure monitoring method based on artificial intelligence
Technical Field
The invention relates to the technical field of data processing and analysis, and in particular to a government information disclosure monitoring method based on artificial intelligence.
Background
Conventional information monitoring and analysis methods generally adopt rule-based text processing. The workflow first classifies and labels information data in a simple manner through keyword matching and regular expressions, then evaluates the completeness and timeliness of the information against preset scoring rules, relying mainly on field completeness checks and time-difference calculations, and finally outputs the evaluation results as a report containing basic indicators such as information counts and average scores. The data processing flow is relatively simple and is implemented mainly through SQL query statistics and Excel table summaries. Abnormal information is identified by manually set thresholds: when an indicator falls below its threshold, an anomaly flag is triggered. Visualization relies mainly on static charts, and users receive monitoring reports in a fixed format.
However, the prior art has the following defects. First, information quality is evaluated along a single dimension and lacks deep semantic analysis: evaluation relies only on surface features such as field completeness and character count, so the accuracy, logical consistency and actual value of the content cannot be identified reliably. As a result, evaluation results deviate considerably from the true quality of the information, and information that is complete in form but contains factual errors or logical contradictions cannot be effectively identified. Second, anomaly detection depends on fixed thresholds and simple rules and lacks the ability to recognize abnormal patterns intelligently: complex or hidden anomalies such as repeated release of information, abnormal fluctuation in release frequency and degradation of content depth cannot be detected, and fixed thresholds are difficult to adapt to the differing requirements of different information types, so false alarm and missed detection rates are high. Third, analysis results are presented in a single form without multi-dimensional association analysis or interactive drill-down: users cannot flexibly view the data from different perspectives or analyze the root causes of anomalies in depth, and monitoring reports in a static format struggle to meet the individual needs of different users, which limits the practical value of the monitoring results.
Disclosure of Invention
In order to overcome the above defects of the prior art, the invention provides a government information disclosure monitoring method based on artificial intelligence, which solves the problems described in the background through the following scheme.
To achieve this purpose, the invention provides the following technical scheme: a government information disclosure monitoring method based on artificial intelligence, comprising the following steps: Step S1, information data preprocessing: acquiring original information data through a data receiving unit, performing deduplication, denoising and format unification through a data cleaning unit, classifying the data by information subject, release time and information type through a data classification unit, and adding metadata tags through a data labeling unit to obtain an information data set to be analyzed; Step S2, information structuring analysis: segmenting the text content of the information data set to be analyzed with a text segmentation unit, performing semantic analysis and entity recognition on the segmented text with a semantic recognition unit, extracting key information elements with an information element extraction unit, and organizing the information elements into a knowledge graph in triple form with a knowledge graph construction unit to obtain a structured information knowledge representation; Step S3, intelligent information quality evaluation: detecting information completeness through an integrity evaluation unit, verifying information accuracy through an accuracy evaluation unit, analyzing release timeliness and update frequency through a timeliness evaluation unit, evaluating text readability through a readability evaluation unit, and calculating an overall information quality score through a quality score calculation unit to obtain a quality evaluation report; Step S4, quantitative analysis of information disclosure: counting