CN-122024250-A - Intelligent auditing method and device based on large model and multi-mode data fusion

CN122024250A

Abstract

The invention belongs to the technical field of artificial intelligence and audit, and particularly relates to an intelligent audit method and device based on large model and multi-mode data fusion. The method first constructs a vector knowledge base; then extracts deep features from unstructured bill data, the original audit text and the original audit forms; acquires multi-modal joint embedded representation features through cross-modal contrastive learning fused with a dynamic weighting algorithm based on risk-aware attention; obtains a vector knowledge base retrieval result and constructs an audit prompt; and finally generates a risk detection score and a risk evidence chain based on a large language model, constructs an audit program template from a standard audit program template retrieved from the vector knowledge base, and generates a structured audit report by combining the audit program template, the risk detection score and the risk evidence chain. The invention solves the problems of low efficiency and low accuracy in multi-modal data fusion auditing over text, tables, images and the like.
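The abstract describes a five-step pipeline (S1 knowledge base, S2 feature extraction, S3 fusion, S4 retrieval and prompting, S5 report generation). The patent publishes no code, so the sketch below only illustrates how the five stages chain together; every function body is a hypothetical stand-in, not the patented implementation.

```python
# Minimal sketch of the five-step audit pipeline from the abstract (S1-S5).
# All function bodies are hypothetical stand-ins; the patent publishes no code.

def build_knowledge_base(raw_docs):
    """S1: stand-in for vectorizing regulation/risk/template documents."""
    return {doc["id"]: [float(len(doc["text"]))] for doc in raw_docs}

def extract_features(bill_image_text, audit_text, audit_form):
    """S2: stand-in for OCR plus Transformer deep-feature extraction."""
    return {"bill": bill_image_text, "text": audit_text, "form": audit_form}

def fuse_modalities(features):
    """S3: stand-in for contrastive learning + risk-aware dynamic weighting."""
    return " | ".join(features.values())

def retrieve(kb, joint_embedding):
    """S4: stand-in for vector retrieval; here it returns every KB entry id."""
    return sorted(kb)

def generate_report(prompt, retrieved):
    """S5: stand-in for the LLM producing score, evidence chain and report."""
    return {"risk_score": 0.5, "evidence_chain": retrieved, "report": prompt}

def audit_pipeline(raw_docs, bill, text, form):
    kb = build_knowledge_base(raw_docs)
    feats = extract_features(bill, text, form)
    joint = fuse_modalities(feats)
    hits = retrieve(kb, joint)
    prompt = f"AUDIT PROMPT: {joint} / refs={hits}"
    return generate_report(prompt, hits)
```

The value of the sketch is the data flow: each stage consumes only the output of the previous one, which is the structure claims 1 and 8 describe.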

Inventors

  • LI LIPING
  • WANG YANXIA
  • YIN SIJIE
  • LIU PENG
  • ZHANG BAOGUO

Assignees

  • 山东经贸职业学院
  • 山东麦港数据系统有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-02

Claims (10)

  1. An intelligent auditing method based on large model and multi-mode data fusion, characterized by comprising the following steps: S1, preprocessing original data comprising laws, regulations and standards, industry risk data, historical audit data and audit program templates to construct a vector knowledge base; S2, extracting unstructured bill data from audit images based on an OCR module, and extracting deep features of the unstructured bill data, the original audit text and the original audit forms based on a Transformer feature extraction module; S3, acquiring multi-modal joint embedded representation features based on cross-modal contrastive learning fused with a dynamic weighting algorithm based on risk-aware attention; S4, obtaining a vector knowledge base retrieval result based on the multi-modal joint embedded representation features, and constructing an audit prompt from the bill data, the form data and the retrieval result; S5, generating a risk detection score and a risk evidence chain based on the large language model, constructing an audit program template from a standard audit program template retrieved from the vector knowledge base, and generating a structured audit report by combining the audit program template, the risk detection score and the risk evidence chain.
  2. The method of claim 1, wherein the construction of the vector knowledge base in S1 comprises at least data preparation and data vectorization; the data preparation comprises preprocessing of the raw data, the raw data comprising laws, regulations and standards, industry risk data, historical audit data and standard audit program templates, and the preprocessing operations comprising PDF parsing, cleaning and sensitive-information desensitization; the data vectorization converts text blocks into 768-dimensional vectors by means of an adaptively fine-tuned BERT embedding model and attaches metadata.
  3. The method according to claim 2, wherein in S1 the construction of the vector knowledge base further includes at least a storage optimization, the storage optimization adopting FAISS to construct a hierarchical index, specifically comprising: first, classifying the vectorized data into buckets by coarse-grained clustering:

     C = {c_1, ..., c_K},   S_k = { x_i : k = argmin_j ||x_i - c_j||^2 }    (1)

     in formula (1): C represents the coarse-grained clustering result (the codebook) obtained by the k-means method; K represents the number of cluster centers; x_i represents one vectorized audit-data sample; c_k represents the center of the k-th cluster; S_k represents the set of data points assigned to the k-th cluster; then, establishing a fine-grained index within each bucket and improving efficiency by quantization compression:

     r_i = x_i - c_{a(i)}    (2)
     r_i = [ u_1(r_i), ..., u_M(r_i) ]    (3)
     q_j(u_j) = argmin_{c in C_j} ||u_j - c||^2    (4)

     in formulas (2)-(4): r_i represents the residual entered into the fine-grained index, obtained by subtracting the assigned cluster center c_{a(i)} from the i-th vectorized sample x_i; M represents the total number of subvectors; u_j(r_i) represents the j-th subvector of the residual, taken from the j-th quantization subspace; C_j represents the codebook to which the j-th subvector belongs; q_j(u_j) represents the quantized codeword of the subvector u_j, i.e. its nearest-neighbor codeword in the codebook C_j.
  4. The method according to claim 1, wherein in S2 the Transformer-based deep feature extraction includes at least a weight calculation that captures the global dependencies of the input sequence through a multi-head self-attention mechanism:

     MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O    (5)

     in formula (5): Q, K, V represent the query, key and value vectors respectively; h represents the number of attention heads; head_i represents the output vector of the i-th head; W^O represents the output transformation matrix;

     head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)    (6)

     in formula (6): W_i^Q, W_i^K, W_i^V represent the query, key and value transformation matrices of the i-th head respectively, and Attention is the self-attention function:

     Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V    (7)

     in formula (7): d_k represents the dimension of the key vectors; softmax is the activation function; K^T denotes the matrix transpose of K.
  5. The method of claim 4, wherein the Transformer-based deep feature extraction further comprises at least temporal information injection and feature abstraction; the temporal information injection is implemented by positional encoding; the feature abstraction stabilizes the training process through layer normalization and residual connections, performs nonlinear feature transformation through a feedforward neural network, and finally implements hierarchical feature abstraction by stacking 12 Transformer blocks, outputting deep features containing semantic, grammatical and domain knowledge.
  6. The method of claim 1, wherein in S3 the cross-modal contrastive learning includes at least feature distance control, the feature distance control comprising: constructing positive and negative sample pairs, the positive pairs being different views of the same example generated by data augmentation so as to pull their features closer; the positive-pair feature distance is reduced and the negative-pair feature distance is enlarged by a contrastive loss function L:

     L = -log( exp(s_{i,j} / tau) / sum_{k=1}^{Z} exp(s_{i,k} / tau) )    (8)

     in formula (8): Z represents the number of negative samples; s_{i,j} represents the similarity of the current i-th sample to the j-th sample; tau is a temperature coefficient controlling the sharpness of the distribution.
  7. The method of claim 6, wherein in S3 the cross-modal contrastive learning further comprises at least feature alignment and a multi-modal joint embedding representation; the feature alignment is implemented by a cross-modal risk-aware attention dynamic weighting algorithm; the multi-modal joint embedding representation adjusts the sharpness of the similarity distribution through the temperature coefficient and improves training stability by adopting a momentum encoder:

     alpha_{I->J} = softmax( (W_I E_I(f_I)) (W_J E_J(v_h))^T / sqrt(d) )    (9)
     z = f_I (+) sum_h alpha_{I->J,h} (W_J v_h)    (10)

     in formulas (9)-(10): alpha_{I->J} represents the attention weight of modality I to modality J, i.e. the degree of attention I pays to J; f_I and f_J are the risk feature vectors of modalities I and J respectively; E_I and E_J are modality-specific encoders; d represents the feature dimension; h indexes the risk feature vectors; v_h represents the h-th risk feature vector; W_I and W_J are learnable modality transformation matrices; z represents the joint embedding; (+) denotes element-wise addition or concatenation.
  8. An auditing device implementing the intelligent auditing method based on large model and multi-mode data fusion, characterized by comprising: a vector knowledge base construction module for constructing a vector knowledge base by preprocessing original data comprising laws, regulations and standards, industry risk data, historical audit data and audit program templates; a feature extraction module for extracting unstructured bill data from audit images based on an OCR module and extracting deep features of the unstructured bill data, the original audit text and the original audit forms based on a Transformer feature extraction module; a multi-modal joint embedding module for acquiring multi-modal joint embedded representation features based on cross-modal contrastive learning fused with a dynamic weighting algorithm based on risk-aware attention; an audit prompt generation module for obtaining a vector knowledge base retrieval result based on the multi-modal joint embedded representation features and constructing an audit prompt from the bill data, the form data and the retrieval result; and an audit report generation module for generating a risk detection score and a risk evidence chain based on the large language model, constructing an audit program template from a standard audit program template retrieved from the vector knowledge base, and generating a structured audit report by combining the audit program template, the risk detection score and the risk evidence chain.
  9. An electronic device comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the intelligent audit method based on large model and multimodal data fusion of any of claims 1-7.
  10. A machine-readable storage medium having stored thereon executable instructions that, when executed, cause the machine to perform the intelligent auditing method based on large model and multimodal data fusion of any of claims 1-7.
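Claim 3's hierarchical index (formulas (1)-(4)) follows the standard IVF-PQ scheme that FAISS implements: coarse k-means bucketing, then product quantization of the residuals. The patent gives no code, so the following is a minimal NumPy sketch under that reading; all parameter values (cluster count, subvector count, codebook size) are illustrative, not taken from the patent.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd k-means; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = x[assign == j]
            if len(pts):  # skip empty clusters
                centers[j] = pts.mean(0)
    return centers, assign

def build_ivf_pq(x, n_clusters=4, m=2, n_codes=8):
    # Formula (1): coarse codebook C and assignment sets S_k via k-means.
    centers, assign = kmeans(x, n_clusters)
    # Formula (2): residual r_i = x_i - c_{a(i)} (sample minus its cluster center).
    resid = x - centers[assign]
    d_sub = x.shape[1] // m
    codebooks, codes = [], []
    for j in range(m):
        # Formula (3): split each residual into m subvectors u_j(r_i).
        sub = resid[:, j * d_sub:(j + 1) * d_sub]
        # Formula (4): nearest-neighbor codeword in subspace codebook C_j.
        cb, code = kmeans(sub, n_codes)
        codebooks.append(cb)
        codes.append(code)
    return centers, assign, codebooks, np.stack(codes, axis=1)
```

In production one would use `faiss.IndexIVFPQ` directly; this sketch only makes the two-level structure of formulas (1)-(4) concrete.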
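Claim 4's formulas (5)-(7) are the standard scaled dot-product multi-head attention. A compact NumPy sketch (single sequence, no masking or batching, randomly supplied weight matrices) may help make the three formulas concrete:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Formula (7): softmax(Q K^T / sqrt(d_k)) V
    d_k = k.shape[-1]
    return softmax(q @ k.T / np.sqrt(d_k)) @ v

def multi_head_attention(x, params):
    # Formula (6): head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V),
    # with Q = K = V = x for self-attention.
    heads = [attention(x @ wq, x @ wk, x @ wv)
             for wq, wk, wv in params["heads"]]
    # Formula (5): Concat(head_1, ..., head_h) W^O
    return np.concatenate(heads, axis=-1) @ params["wo"]
```

`params` here is a plain dict of weight matrices standing in for the learned parameters of the patent's feature extraction module.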
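Claim 6's formula (8) is an InfoNCE-style contrastive loss. The sketch below computes it for one anchor with cosine similarity as the similarity function s; the choice of cosine similarity is an assumption, since the patent does not specify s.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """Formula (8): L = -log( exp(s(a,p)/tau) / sum_k exp(s(a,k)/tau) ),
    with the positive included in the denominator sum.
    Similarity s is taken to be cosine similarity (an assumption)."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))
```

Minimizing this loss pulls the positive pair together and pushes negatives apart, which is exactly the "feature distance control" the claim describes; a smaller temperature `tau` sharpens the distribution.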
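Claim 7's formulas (9)-(10) describe cross-modal attention: modality I attends over modality J's risk feature vectors, and the attended result is combined with I's own features. The patent leaves the encoders and the combine operator open, so this NumPy sketch assumes identity encoders and interprets the combination as concatenation; the weight matrices are illustrative stand-ins for the learnable transformations W_I and W_J.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_modal_fuse(f_i, risk_vectors_j, w_i, w_j):
    # Formula (9): attention weight of modality I over modality J's
    # risk feature vectors v_h, scaled by sqrt(d).
    d = w_i.shape[0]
    query = w_i @ f_i                              # W_I f_I
    keys = np.stack([w_j @ v for v in risk_vectors_j])  # W_J v_h
    alpha = softmax(keys @ query / np.sqrt(d))
    # Formula (10): joint embedding = f_I combined with the
    # attention-weighted sum of projected J vectors.
    attended = (alpha[:, None] * keys).sum(0)
    return np.concatenate([query, attended])  # the claim's "(+)" read as concat
```

If the "(+)" of formula (10) is instead element-wise addition, the last line would become `query + attended`; the patent allows either reading.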

Description

Intelligent auditing method and device based on large model and multi-mode data fusion

Technical Field

The invention belongs to the technical field of artificial intelligence and audit, and particularly relates to an intelligent audit method and device based on large model and multi-mode data fusion.

Background

At present, intelligent audit technology is widely applied in finance, enterprise compliance, government supervision, medical care, manufacturing, e-commerce and other fields. For example, in financial auditing the technology can integrate transaction records, contract text, voice calls and other multi-modal data to detect fraudulent behavior; in the medical field, medical-insurance violations can be identified by analyzing diagnosis and treatment text, image data and reimbursement receipts. However, the prior art has obvious defects: first, data fragmentation is serious, since reliance on a single modality makes it difficult to associate risk clues across heterogeneous data such as text and images; second, the automation level is low, since reliance on manual sampling or simple rule engines yields a high omission rate for complex fraud patterns; third, real-time performance is insufficient, since most traditional audits are after-the-fact traces and cannot intercept risks dynamically; fourth, model generalization is weak, and the migration cost of industry-specific models is high. Based on this, a multi-modal auditing method based on a large model is proposed to address these drawbacks of the prior art.
For example, Chinese patent document CN120011543A discloses a multi-modal auditing method based on a large model: comprehensive data acquisition and preprocessing ensure the quality and consistency of the data; technologies such as CNN and NLP perform feature extraction on image, text and numerical data, realizing deep fusion and analysis of multi-modal data; abnormal patterns and potential relations in the data are revealed through suspicious-point mining, knowledge-graph association analysis and sentiment analysis; a causal tracing model further clarifies the causal relations and influencing factors in specific scenarios; and finally the generated audit report is factual and clearly structured, its persuasiveness enhanced through data visualization. The method improves the accuracy and efficiency of auditing, provides powerful support for audit decisions, and addresses the insufficient accuracy and low efficiency of the prior art. Chinese patent document CN117909447A discloses an intelligent auditing method based on a large language model for hospital auditing scenarios. The method can understand a user's questions, convert them into text computation tasks, perform interactive computation against the hospital's private diagnosis and treatment records, and finally form answers in multi-modal forms such as text, pictures and tables. The method realizes intention recognition of complex tasks by the large model based on few-shot learning, constructs representations and association relations of data stored in vector and relational databases, builds a feedback mechanism and an adaptive module to optimize the representations, matches audit questions to answers based on similarity computation, calls the large language model to form formatted output, and displays the results in charts.
That invention solves the interaction problem between the large language model and sensitive audit data, avoids the risk of sensitive-data leakage, and improves the degree of intelligence in complex audit scenarios (such as hospital auditing) by exploiting the language understanding capability of the large language model. Therefore, the present invention designs an intelligent auditing method based on large model and multi-mode data fusion to solve the problems of low efficiency and low accuracy of multi-modal data fusion auditing over text, tables, images and the like.

Disclosure of Invention

The invention aims to overcome at least one defect of the prior art, and provides an intelligent auditing method based on large model and multi-mode data fusion to solve the problems of low efficiency and low accuracy of multi-modal data fusion auditing over text, tables, images and the like. The invention also discloses an auditing system loaded with the intelligent auditing method based on large model and multi-mode data fusion. The detailed technical scheme of the invention is as follows: an intelligent auditing method based on large model and multi-mode data fusion, the method comprising: S1, preprocessing original data containing laws, regulations and standards, industry risk data, historical audit data and audit program templates to construct a vector knowledge base; S2, extracting uns