CN-121980379-A - Quality document self-adaptive examination system and method


Abstract

The invention relates to the technical field of document inspection and discloses an adaptive inspection system and method for quality documents. The system comprises a workflow and rule policy configuration module, an input module, a task state sensing module, a multi-channel feedback generation module, a unified reward fusion module, a reinforcement learning policy optimization and online self-calibration module, a policy updating and publishing module, a document inspection module, and an output module that outputs the inspection result. The invention targets the equipment manufacturing and energy industries, adaptively optimizes its inspection strategy, and offers a high degree of intelligence.

Inventors

CHEN TIANQI
WANG JIALE
HUANG YAN
ZHANG MINGYANG
DENG XUTAO
HAN XIN
CHEN BINGYUAN

Assignees

东方电气集团数字科技有限公司

Dates

Publication Date: 20260505
Application Date: 20251223

Claims (10)

1. A quality document adaptive inspection system, comprising: a workflow and rule policy configuration module for constructing an inspection policy space and providing a finite action set for reinforcement learning; an input module for inputting a quality document to be inspected; a task state sensing module for identifying the type, structural characteristics and task context of the currently input quality document and generating a task embedding vector as the input to the reinforcement learning inspection policy space; a multi-channel feedback generation module for acquiring feedback on the task from three signal channels, namely a model evaluation channel, a business benefit channel and a manual inspection channel; a unified reward fusion module for performing unified calculation on the multidimensional feedback signals; a reinforcement learning policy optimization and online self-calibration module for performing adaptive policy optimization within the finite rule inspection policy space formed by the workflow and rule policy configuration module and for automatically reassigning the weights of the different feedback channels; a policy updating and publishing module for updating the optimized policy parameters to the inspection policy space of the workflow and rule policy configuration module in real time, so as to guide the next round of document inspection tasks; a document inspection module for performing the inspection process on the current quality document with the matched inspection policy; and an output module for outputting the inspection result.
2. The quality document adaptive inspection system of claim 1, wherein the workflow and rule policy configuration module constructs the inspection policy space by: constructing inspection nodes for each type of quality document and defining, under each inspection node, a corresponding policy template, execution threshold and tool calling mode; and constructing a rule structure capturing the commonalities and differences among different types of quality documents, so as to form an interpretable rule network.
3. The quality document adaptive inspection system of claim 1, wherein the task state sensing module extracts state features based on document meta information, text content and historical inspection results, and generates the task embedding vector through a deep semantic encoding model; the state features include, but are not limited to: document category; inspection point distribution and hierarchy; historical inspection pass rate and anomaly rate; current rule execution state; and time and task context information.
4. The quality document adaptive inspection system of claim 1, wherein the model evaluation channel of the multi-channel feedback generation module outputs quality scores through a large language model or a decision model, the business benefit channel obtains key indicators including inspection pass rate, rework rate and manual review delay, and the manual inspection channel obtains the manual review results given by an inspector; the multi-channel feedback generation module further introduces an execution cost function that jointly reflects the cost of model calls and the cost of manual review; and the arrival time of each channel signal of the multi-channel feedback generation module is controlled by a time weighting function [formula not reproduced in the source text], whose terms are: the weighting coefficient of the feedback signal under the time delay condition; the time difference between the actual arrival time of the feedback signal and the task execution time; and a time decay coefficient.
5. The quality document adaptive inspection system of claim 1, wherein the unified reward fusion module is built on a reward fusion model with confidence weighting and adaptive parameter adjustment [formula not reproduced in the source text], whose terms are: the fused reward value; the time step index of the inspection decision; the inspection completion function; the business benefit score; the time decay function; the business benefit feedback signal; the model quality score; the quality inspector's review score; the manual intervention feedback signal; the cost function; and the dynamic weights, which are updated dynamically according to the corresponding confidence and degree of offset; the confidence is calculated by a relation [formula not reproduced in the source text], whose terms are: the confidence value of the i-th feedback signal channel at time step t; an activation function; a reference bias term for the confidence calculation; a variance suppression coefficient; the variance of the feedback function; a correlation enhancement coefficient; the correlation coefficient between the signal and the inspection pass rate; a time delay adjustment coefficient; the output value of the time decay function; a drift penalty coefficient; and the distribution offset state.
6. The quality document adaptive inspection system of claim 1, wherein the reinforcement learning policy optimization and online self-calibration module adopts a policy-gradient reinforcement learning method, executes an inspection policy in each round of inspection tasks, updates the inspection policy according to the fused reward value, and maximizes the long-term average reward by continuously adjusting the inspection policy parameters through interaction with quality document tasks; wherein the inspection policy optimization objective satisfies a relation [formula not reproduced in the source text], whose terms are: the inspection policy executed in a round of inspection tasks; the expectation of the reward corresponding to the inspection state and inspection action under that inspection policy; and the fused reward value in the round of inspection tasks.
7. The quality document adaptive inspection system of claim 1 or 6, wherein, in the reinforcement learning policy optimization and online self-calibration module, an online self-calibration mechanism that recalibrates weights and parameters is triggered when a decrease in benefit, a task distribution shift or a feedback conflict is detected, so as to maintain stability and adaptivity in multi-task scenarios; wherein the online self-calibration mechanism satisfies two relations [formulas not reproduced in the source text], whose terms are: the updated value of the weight latent variable of the i-th feedback channel at time step t+1; the current value of the weight latent variable of the i-th feedback channel at time step t; the weight update learning rate; the correlation coefficient between the i-th feedback signal and the pass rate of the inspection task; the gradient signal indicating the optimization direction of the current inspection policy; the weight decay coefficient; the normalized weight vector of the feedback channels at time step t+1; the vector formed by the weight latent variables of the feedback channels; and the temperature regulation parameter.
8. The quality document adaptive inspection system of claim 7, wherein the reinforcement learning policy optimization and online self-calibration module applies a retrospective correction mechanism to late-arriving manual review feedback, updating historical rewards so as to dynamically correct and rebalance asynchronous feedback; the retrospective correction mechanism updates the historical rewards by a relation [formula not reproduced in the source text], whose terms are: the fused reward value; the historical reward retention coefficient; the retrospective correction coefficient; the validity weight of manual review feedback in the current task; the quality inspector's review score; and the time decay function.
9. The quality document adaptive inspection system of claim 1, wherein the content updated by the policy updating and publishing module comprises: a policy priority matrix for different quality document types; the policy selection probability distribution of the inspection nodes; and the weight parameter vector in the reward function.
10. A quality document inspection method, characterized in that it is based on the adaptive inspection system of any one of claims 1 to 9 and comprises the following steps: S1, inputting a task/quality document into the adaptive inspection system; S2, identifying the current quality document and matching it with an inspection policy; S3, judging, according to the matching result of the inspection policy, whether multi-channel feedback is needed; if not, executing S41 and S51; if yes, executing S42, S52, S6 and S7; S41, directly performing the inspection process on the current quality document; S51, ending the inspection process and outputting the inspection result; S42, generating feedback results for model evaluation, business benefit, manual inspection and cost evaluation by combining the current quality document with the matched inspection policy; S52, performing unified calculation on the multidimensional feedback result formed by the multiple channels; S6, adaptively optimizing the inspection policy according to the calculation result, and assigning a weight to each feedback channel through online self-calibration; S7, updating the optimized policy parameters to the inspection policy space in real time to guide the next round of document inspection tasks; thereby forming a closed-loop reinforcement learning inspection system of definition, execution, feedback, optimization and redefinition.
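
The channel weighting described in claims 4, 5 and 7 can be illustrated with a minimal Python sketch. The formulas themselves are not reproduced in this text, so every functional form below is an assumption chosen to match the listed terms: an exponential form for the time weighting function, an additive update of the weight latent variables driven by correlation and gradient with weight decay, a temperature-scaled softmax for normalization, and a weighted sum minus cost for the fused reward. All function and parameter names are hypothetical.

```python
import math


def time_weight(delay, decay):
    """Assumed exponential time weighting: later feedback counts less."""
    return math.exp(-decay * delay)


def normalize_weights(latents, temperature):
    """Temperature-scaled softmax over the weight latent variables (assumed form)."""
    scaled = [z / temperature for z in latents]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def update_latents(latents, correlations, gradient, learning_rate, weight_decay):
    """Assumed additive update: correlation times gradient signal, minus weight decay."""
    return [
        z + learning_rate * (rho * gradient - weight_decay * z)
        for z, rho in zip(latents, correlations)
    ]


def fused_reward(signals, weights, delays, decay, cost):
    """Assumed fusion: time-weighted, channel-weighted sum of signals minus cost."""
    total = sum(
        w * time_weight(d, decay) * s
        for w, s, d in zip(weights, signals, delays)
    )
    return total - cost


if __name__ == "__main__":
    # Hypothetical values for the model, business benefit and manual channels.
    latents = [0.0, 0.0, 0.0]
    correlations = [0.8, 0.5, 0.9]
    latents = update_latents(latents, correlations, gradient=1.0,
                             learning_rate=0.1, weight_decay=0.01)
    weights = normalize_weights(latents, temperature=0.5)
    reward = fused_reward(signals=[0.9, 0.6, 1.0], weights=weights,
                          delays=[0.0, 2.0, 24.0], decay=0.05, cost=0.1)
    print(weights, reward)
```

In this sketch a channel whose signal correlates more strongly with the inspection pass rate gains weight after each update, and a lower temperature makes the normalized weights more sharply peaked. The actual relations used by the claimed system may differ.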

Description

Quality document self-adaptive examination system and method

Technical Field

The invention relates to the technical field of document inspection, in particular to an adaptive quality document inspection system for the equipment manufacturing and energy industries, and to an inspection method based on that system.

Background

The quality management systems of large equipment manufacturers and of energy industries such as nuclear power, hydropower and wind power involve a large number of quality documents, including but not limited to quality inspection reports, material quality assurance certificates, production process records, welding material records, process evaluation reports, factory qualification certificates, quality control summaries and problem correction records. These documents have complex formats and are subject to numerous inspection standards. Moreover, as management systems are continuously optimized, the structure and inspection points of quality documents evolve dynamically. The inspection of these documents has long relied on manual review, in which professional quality inspectors check each document against the standards item by item. The applicable rules and standards are complex and changeable: different document types correspond to different standards (e.g. GB/T, NB/T, enterprise standard Q/DE), and the inspection rules differ in minor but critical ways. For example, a "welding material quality assurance certificate" requires a focused check of chemical composition and mechanical properties; a "parts inspection report" is more concerned with dimensional deviation and inspection method; a "test record" emphasizes instrument numbers and test conditions; and a "quality assurance plan" focuses on responsibility sign-off and process compliance.
Manual inspection of quality documents therefore inevitably entails a heavy workload, long turnaround and low efficiency. Moreover, because different people and departments understand the standards differently, documents of the same type may be judged "qualified" by one reviewer and "unqualified" by another, which undermines the traceability of the quality management system. With the development of natural language processing (NLP) and large language models (LLMs), AI-assisted document inspection systems have come into wide use and effectively address the problems of the traditional manual mode. However, most current automatic document inspection systems are built on a fixed rule base (rule-based). They cannot automatically adjust inspection strategies to different document types and task characteristics, and the logic must be manually recoded whenever a rule changes or a document type is added. They also mostly rely on static accuracy metrics or manual sampling as their means of evaluation, lack a dynamic value feedback mechanism for model output, and do not feed inspection errors back into the inspection strategy, so the model remains in a suboptimal state for a long time. As a result, current automatic document inspection systems have a low degree of intelligence and are poorly suited to the equipment manufacturing and energy industries, where services are diverse and the structure and inspection points of quality documents evolve dynamically.
Disclosure of Invention

To meet the technical requirements of quality document inspection under the diverse services of the equipment manufacturing and energy industries, and the dynamic evolution of quality document structures and inspection points, the invention provides an adaptive quality document inspection system for these industries, built on reinforcement learning and multi-channel feedback, that adaptively optimizes its inspection strategy and has a higher degree of intelligence, together with an inspection method based on that system. The technical aim of the invention is achieved by the following technical scheme: a quality document adaptive inspection system comprising: a workflow and rule policy configuration module for constructing an inspection policy space and providing a finite action set for reinforcement learning; an input module for inputting a quality document to be inspected; a task state sensing module for identifying the type, structural characteristics and task context of the currently input quality document and generating a task embedde