CN-121351766-B - Intelligent auditing method, system, equipment and medium

Abstract

The invention relates to the field of auditing, and in particular to an intelligent auditing method, system, equipment and medium. The method comprises: converting data to be audited into structured data using a pre-trained multimodal document understanding model; performing semantic analysis and classification on the structured data using a pre-trained lightweight model; and comparing the analysis result with the corresponding pre-classified audit rules. If the audit rules are satisfied, the audit passes; if they are not satisfied, a rectification suggestion is generated using RAG combined with fine-tuning. The lightweight model is constructed by pruning a general-purpose large model and then loading a LoRA adapter.

Inventors

  • LUO YIWANG
  • HU JIANGYU
  • NIU HAO
  • WANG GUOJUAN
  • CHEN JINHUA
  • WANG LIJUN
  • LI YUJING
  • HOU CHONGCAI
  • JI XIANGCHUN
  • ZHU SONGSONG
  • LIU XIAOYANG

Assignees

  • Beijing Guodiantong Network Technology Co., Ltd. (北京国电通网络技术有限公司)

Dates

Publication Date
2026-05-05
Application Date
2025-12-19

Claims (19)

  1. An intelligent auditing method, comprising: converting data to be audited into structured data using a pre-trained multimodal document understanding model; performing semantic analysis and classification on the structured data using a pre-trained lightweight model, comparing the analysis result with the corresponding pre-classified audit rules, passing the audit if the audit rules are satisfied, and generating a rectification suggestion using RAG combined with fine-tuning if the audit rules are not satisfied; wherein the lightweight model is constructed by pruning a general-purpose large model and then loading a LoRA adapter; wherein training the lightweight model comprises: performing structured pruning on the general-purpose large model based on enterprise private-domain knowledge and the F1 score, a comprehensive performance metric for audit-issue localization; taking the pruned general-purpose large model as a teacher model, and constructing a student model based on the teacher model architecture; jointly training the teacher model and the student model on the enterprise private-domain knowledge until a joint loss function reaches a preset value; and loading a LoRA adapter onto the trained student model to obtain the lightweight model; wherein the student model inserts pseudo-quantization nodes during forward propagation to simulate a 4-bit computing environment, and the joint loss function is constructed from a distillation loss and a task loss.
  2. The intelligent auditing method according to claim 1, wherein the enterprise private-domain knowledge includes one or more of: audit regulations, industry laws and regulations, industry standards, historical audit issues and their rectification records, and typical audit cases.
  3. The intelligent auditing method according to claim 1, wherein the multimodal document understanding model is trained as follows: adding two LoRA bypasses to each Transformer layer of a LayoutLMv model to construct the multimodal document understanding model; performing structured annotation on unstructured audit data to form a training set; and freezing the LayoutLMv model within the multimodal document understanding model and training the two LoRA bypasses on the training set until a preset number of iterations is reached, thereby obtaining the trained multimodal document understanding model.
  4. The intelligent auditing method according to claim 3, wherein the structured annotation is JSON Lines annotation.
  5. The intelligent auditing method according to claim 3, wherein performing structured annotation on the unstructured audit data includes converting the unstructured audit data into a structured data format and generating semantic tags while retaining the key identifiers of the unstructured audit data.
  6. The intelligent auditing method according to claim 5, wherein performing structured annotation on the unstructured audit data includes: correcting and re-recognizing blurred, skewed or low-contrast scans using a lightweight CRNN-TR engine; adding a cross-page merged-cell parser and a row-column logic checker on top of the Tabula core to automatically fill broken rows and correct misalignment; converting text into semantic vectors using a lightweight BERT model to automatically cluster semantically similar fragments; and extracting key elements based on preset rules.
  7. The intelligent auditing method according to claim 1, further comprising, after generating the rectification suggestion: reviewing the rectification suggestion and feeding the review result back to the lightweight model so as to update the lightweight model.
  8. The intelligent auditing method according to claim 1, further comprising, before converting the data to be audited into structured data using the pre-trained multimodal document understanding model: performing USBKey login verification and identity permission classification on the user who enters the data to be audited.
  9. An intelligent audit system, comprising: a structuring module configured to convert data to be audited into structured data using a pre-trained multimodal document understanding model; and an auditing module configured to perform semantic analysis and classification on the structured data using a pre-trained lightweight model, compare the analysis result with the corresponding pre-classified audit rules, pass the audit if the audit rules are satisfied, and generate a rectification suggestion using RAG combined with fine-tuning if the audit rules are not satisfied; wherein the lightweight model is constructed by pruning a general-purpose large model and then loading a LoRA adapter; wherein training the lightweight model in the auditing module comprises: performing structured pruning on the general-purpose large model based on enterprise private-domain knowledge and the F1 score, a comprehensive performance metric for audit-issue localization; taking the pruned general-purpose large model as a teacher model, and constructing a student model based on the teacher model architecture; jointly training the teacher model and the student model on the enterprise private-domain knowledge until a joint loss function reaches a preset value; and loading a LoRA adapter onto the trained student model to obtain the lightweight model; wherein the student model inserts pseudo-quantization nodes during forward propagation to simulate a 4-bit computing environment, and the joint loss function is constructed from a distillation loss and a task loss.
  10. The intelligent audit system according to claim 9, wherein the enterprise private-domain knowledge in the auditing module includes one or more of: audit regulations, industry laws and regulations, industry standards, historical audit issues and their rectification records, and typical audit cases.
  11. The intelligent audit system according to claim 9, wherein the multimodal document understanding model in the structuring module is trained as follows: adding two LoRA bypasses to each Transformer layer of a LayoutLMv model to construct the multimodal document understanding model; performing structured annotation on unstructured audit data to form a training set; and freezing the LayoutLMv model within the multimodal document understanding model and training the two LoRA bypasses on the training set until a preset number of iterations is reached, thereby obtaining the trained multimodal document understanding model.
  12. The intelligent audit system according to claim 11, wherein the structured annotation in the structuring module is JSON Lines annotation.
  13. The intelligent audit system according to claim 11, wherein the structured annotation of the unstructured audit data by the structuring module includes converting the unstructured audit data into a structured data format and generating semantic tags while retaining the key identifiers of the unstructured audit data.
  14. The intelligent audit system according to claim 13, wherein the structured annotation of the unstructured audit data by the structuring module includes: correcting and re-recognizing blurred, skewed or low-contrast scans using a lightweight CRNN-TR engine; adding a cross-page merged-cell parser and a row-column logic checker on top of the Tabula core to automatically fill broken rows and correct misalignment; converting text into semantic vectors using a lightweight BERT model to automatically cluster semantically similar fragments; and extracting key elements based on preset rules.
  15. The intelligent audit system according to claim 9, wherein the auditing module is further configured to, after generating the rectification suggestion: review the rectification suggestion and feed the review result back to the lightweight model so as to update the lightweight model.
  16. The intelligent audit system according to claim 9, wherein the structuring module is further configured to, before converting the data to be audited into structured data using the pre-trained multimodal document understanding model, perform USBKey login verification and identity permission classification on the user who enters the data to be audited.
  17. An automated intelligent auditing system, characterized in that the intelligent auditing method according to any one of claims 1 to 8 is carried out automatically by an intelligent agent.
  18. A computer device, comprising at least one processor and a memory connected to the processor through a bus, wherein the memory stores one or more programs, and the intelligent auditing method according to any one of claims 1 to 8 is implemented when the one or more programs are executed by the at least one processor.
  19. A computer-readable storage medium having stored thereon an executable program which, when executed, implements the intelligent auditing method according to any one of claims 1 to 8.
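
The training step in claims 1 and 9 combines pseudo-quantization in the student's forward pass with a joint loss built from a distillation loss and a task loss. The following is a minimal sketch of those two ideas only, under several assumptions that the claims do not state: symmetric per-tensor quantization, KL divergence on temperature-softened outputs as the distillation loss, cross-entropy as the task loss, and a weighting factor `alpha`. The function names are hypothetical and this is not the patent's implementation.

```python
import math

def fake_quantize(values, bits=4):
    """Quantize then dequantize, so low-bit rounding error is simulated in floats.

    Assumption: symmetric per-tensor scaling; the claims only say "4-bit".
    """
    qmax = 2 ** (bits - 1) - 1                  # 7 for signed 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    # Round to the integer grid, clamp to [-qmax - 1, qmax], map back to floats.
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                          # subtract the max for stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def joint_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """alpha * distillation loss + (1 - alpha) * task loss (weights are assumed)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Distillation loss: KL(teacher || student) on softened distributions.
    distill = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    distill *= temperature ** 2
    # Task loss: cross-entropy of the student against the ground-truth label.
    task = -math.log(softmax(student_logits)[label])
    return alpha * distill + (1 - alpha) * task

# Usage: a toy "student" whose weights pass through a pseudo-quantization node.
weights = fake_quantize([0.82, -0.31, 0.05, -0.77])
student_logits = [sum(w * x for w, x in zip(weights, row))
                  for row in ([1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0])]
loss = joint_loss(student_logits, teacher_logits=[0.1, -0.3, 0.5], label=2)
```

In real quantization-aware training the rounding step is usually paired with a straight-through gradient estimator so the student can still be optimized; that part is omitted here because the sketch has no backward pass.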

Description

Intelligent auditing method, system, equipment and medium

Technical Field

The invention relates to the field of auditing, and in particular to an intelligent auditing method, system, equipment and medium.

Background

With the development of technology, enterprises are increasingly digitized, and digital tools are increasingly used for auditing in enterprise management. However, existing audit work has the following problems:

(1) Unstructured data is handled poorly. A large amount of audit data exists in unstructured forms such as PDFs (including encrypted and scanned versions), handwritten pictures, scanned signed documents, and voice-to-text records. The traditional approach relies on manual page-by-page review, which is extremely inefficient; key information (such as payment milestones and signature validity) is easily misjudged because of omissions and oversights, and the error rate is high.

(2) Intelligent analysis capability is insufficient. Traditional audit software is built around structured data tables (for example, fixed-field matching and numerical threshold checks), lacks semantic understanding and logical reasoning capability, and depends entirely on manual judgment.

(3) Models are difficult to integrate with enterprise regulations. Enterprise audit regulations have a strong "private-domain" character: on the one hand, their clauses often contain industry-specific terms; on the other hand, they are updated frequently (3 to 5 adjustments per year). Because its training data does not cover enterprise private-domain knowledge, a general-purpose large model cannot adapt accurately, so the audit basis it recommends deviates from the actual regulations and its usability is insufficient.

(4) The audit chain has many breakpoints. The whole audit flow involves multiple stages: data collection, analysis, issue localization, working-paper generation, report output, and rectification tracking. Existing tools are scattered, and data must be transferred manually between tools, which easily leads to confused data versions and to analysis conclusions that are out of step with reports. For example, issues found during an audit must be copied manually into the working paper; if the analysis result is modified midway, the working paper is easily left without the update, which breaks the closed loop.

Therefore, there is an urgent need for an integrated and secure whole-process audit system and method that meets enterprises' requirements for efficient, accurate and secure auditing.

Disclosure of Invention

To solve the above problems, the invention provides an intelligent auditing method, comprising: converting data to be audited into structured data using a pre-trained multimodal document understanding model; performing semantic analysis and classification on the structured data using a pre-trained lightweight model, comparing the analysis result with the corresponding pre-classified audit rules, passing the audit if the audit rules are satisfied, and generating a rectification suggestion using RAG combined with fine-tuning if the audit rules are not satisfied; wherein the lightweight model is constructed by pruning a general-purpose large model and then loading a LoRA adapter.
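
The control flow of the method described above (classify a structured record, look up the pre-classified rule, then either pass the audit or produce a rectification suggestion) can be illustrated with a small sketch. Everything concrete here is a hypothetical stand-in: `classify` replaces the lightweight model with a trivial keyword check, the `RULES` table and its thresholds are invented for illustration, and the suggestion text simply quotes the violated clause instead of using RAG with a fine-tuned model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class AuditRule:
    clause: str                        # regulation text (invented for this example)
    check: Callable[[dict], bool]      # returns True when the record complies

@dataclass
class AuditResult:
    category: str
    passed: bool
    suggestion: Optional[str] = None   # filled only when the audit fails

def classify(record: dict) -> str:
    """Stand-in for the lightweight model's semantic classification."""
    return "payment" if "amount" in record else "contract"

# Pre-classified audit rules, keyed by category (hypothetical content).
RULES: Dict[str, AuditRule] = {
    "payment": AuditRule(
        "Payments above 10000 require a second approver.",
        lambda r: r["amount"] <= 10000 or bool(r.get("second_approver")),
    ),
    "contract": AuditRule(
        "Contracts must carry a valid signature.",
        lambda r: bool(r.get("signed")),
    ),
}

def audit(record: dict) -> AuditResult:
    category = classify(record)
    rule = RULES[category]
    if rule.check(record):
        return AuditResult(category, passed=True)
    # Stand-in for retrieval plus generation: quote the clause that was violated.
    suggestion = f"Record does not satisfy: '{rule.clause}' Rectify and resubmit."
    return AuditResult(category, passed=False, suggestion=suggestion)

# Usage: one compliant and one non-compliant structured record.
ok = audit({"amount": 500})
flagged = audit({"amount": 20000})
```

Keying the rules by category mirrors the "pre-classified audit rules" wording: the classification result selects which rule the record is compared against.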
Optionally, training the lightweight model comprises the following steps: performing structured pruning on the general-purpose large model based on enterprise private-domain knowledge and the F1 score, a comprehensive performance metric for audit-issue localization; taking the general-purpose large model or the pruned general-purpose large model as a teacher model, and constructing a student model based on the teacher model architecture; jointly training the teacher model and the student model on the enterprise private-domain knowledge until a joint loss function reaches a preset value; and loading a LoRA adapter onto the trained student model to obtain the lightweight model; wherein the student model inserts pseudo-quantization nodes during forward propagation to simulate a 4-bit computing environment, and the joint loss function is constructed from a distillation loss and a task loss.

Optionally, the enterprise private-domain knowledge includes one or more of: audit regulations, industry laws and regulations, industry standards, historical audit issues and their rectification records, and typical audit cases.

Optionally, the multimodal document understanding model is trained as follows: two LoRA bypasses are added to each Transformer layer of a LayoutLMv model to construct the multimodal document understanding model; structured annotation is performed on unstructured audit data to form a training set; the LayoutLMv model within the multimodal document understanding model is frozen, and the two LoRA bypasses are trained on the training set until the preset iteration time