CN-122019238-A - SRT-RL two-stage fine tuning method for log fault diagnosis
Abstract
The invention relates to an SRT-RL two-stage fine-tuning method for log fault diagnosis. The method has six parts. First, system logs are acquired and annotated by operation and maintenance (O&M) experts to construct a reverse-verification chain-of-thought data set containing golden answers. Second, a pre-trained large model is instruction-fine-tuned on this data set to obtain a preliminary model that can output a fault-diagnosis reasoning chain. Third, each unlabeled error log is fed into the preliminary model independently several times to obtain multiple candidate diagnosis results per log. Fourth, a scoring model scores the candidate results of each log along multiple dimensions; the highest-scoring and lowest-scoring results are selected and combined with their scores and answers into accepted (chosen) and rejected results, which form a new data set. Fifth, V-DPO fine-tuning on the new data set produces a fault diagnosis model. Sixth, that model diagnoses system logs in real time to produce fault diagnosis results. The invention improves the model's reasoning capability in each round of fine-tuning and provides a new approach to fault diagnosis in complex systems.
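The scoring-and-selection step summarized above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation. Only three of the five scoring dimensions in the claims have recoverable definitions: structural compliance as the mean step-wise cosine similarity, a text-overlap score for log consistency, and a length-redundancy penalty. The specific overlap measure (the share of distinct log tokens that reappear in the output), the hinge form of the penalty, word count as the "information amount", the unweighted sum used to combine dimensions, and all function names are assumptions. Reasoning rationality and evidence-reasoning fit are passed in as precomputed numbers, for example from a judge model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity (u . v) / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def structural_compliance(step_vecs: List[Vector], template_vecs: List[Vector]) -> float:
    """Mean of per-step Match_i = cos(step_i, template_i); zip stops at the shorter list."""
    matches = [cosine(s, t) for s, t in zip(step_vecs, template_vecs)]
    return sum(matches) / len(matches) if matches else 0.0


def log_consistency(log: str, output: str) -> float:
    """Assumed overlap measure: share of distinct log tokens that reappear in the output."""
    log_tokens = set(log.lower().split())
    out_tokens = set(output.lower().split())
    return len(log_tokens & out_tokens) / len(log_tokens) if log_tokens else 0.0


def length_penalty(info_amount: float, expected: float, coef: float = 0.01) -> float:
    """Assumed hinge form: only the information amount above the expected value is penalized."""
    return coef * max(0.0, info_amount - expected)


def score_candidate(log: str, text: str, step_vecs: List[Vector], template_vecs: List[Vector],
                    rationality: float, evidence_fit: float,
                    expected_words: int = 200, coef: float = 0.01) -> float:
    """Assumed aggregation: unweighted sum of dimension scores minus the length penalty."""
    dims: Dict[str, float] = {
        "structural_compliance": structural_compliance(step_vecs, template_vecs),
        "log_consistency": log_consistency(log, text),
        "reasoning_rationality": rationality,    # supplied externally (e.g. a judge model)
        "evidence_reasoning_fit": evidence_fit,  # supplied externally (e.g. a judge model)
    }
    # Information amount approximated by word count (assumption).
    return sum(dims.values()) - length_penalty(len(text.split()), expected_words, coef)


def select_pair(candidates: List[dict]) -> Tuple[dict, dict]:
    """Return (chosen, rejected): the highest- and lowest-scoring candidates for one log."""
    ranked = sorted(candidates, key=lambda c: c["score"])
    return ranked[-1], ranked[0]


# Usage: three scored candidates for one log.
pool = [{"text": "a", "score": 0.2}, {"text": "b", "score": 0.9}, {"text": "c", "score": 0.5}]
chosen, rejected = select_pair(pool)  # chosen is "b", rejected is "a"
```

`select_pair` keeps only the two extremes of each log's candidate pool. This matches the abstract's "highest-scoring and lowest-scoring" selection and gives each preference pair the widest possible quality gap.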
Inventors
- ZHANG HONGBIN
- LING CHEN
- ZHEN ZHICHAO
- JIA RUIJUN
- MI LIZHU
- Yang Muhao
Assignees
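- 中国人民解放军军事科学院军事科学信息研究中心 (Military Science Information Research Center, Academy of Military Sciences of the Chinese People's Liberation Army)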
Dates
- Publication Date
- 20260512
- Application Date
- 20260211
Claims (8)
- 1. An SRT-RL two-stage fine-tuning method for log fault diagnosis, characterized by comprising the following steps: acquiring system logs, having operation and maintenance experts annotate the system logs, and constructing a reverse-verification chain-of-thought data set containing golden answers; performing instruction fine-tuning on a pre-trained large model with the reverse-verification chain-of-thought data set to obtain a preliminary model capable of outputting a fault-diagnosis reasoning chain, and independently inputting each unlabeled error log into the preliminary model a plurality of times to obtain a plurality of candidate diagnosis results for each log; and scoring the plurality of candidate diagnosis results of each log along multiple dimensions with a scoring model, selecting the highest-scoring and lowest-scoring results, forming an accepted result and a rejected result from the scores and the answers, generating a new data set, and performing V-DPO fine-tuning with the new data set to form a fault diagnosis model that performs fault diagnosis on real-time system logs to obtain a fault diagnosis result.
- 2. The SRT-RL two-stage fine-tuning method for log fault diagnosis of claim 1, wherein the scoring dimensions of the multidimensional scoring comprise structural compliance, log consistency, length redundancy, reasoning rationality, and evidence-reasoning fit. Structural compliance computes the cosine similarity between the reasoning steps in each candidate diagnosis result and a predetermined structure, and uses it to evaluate how well the reasoning steps match. Log consistency is quantified by a text overlap rate score. Length redundancy keeps candidate diagnosis results concise by penalizing the redundant information amount when the information amount of a candidate diagnosis result exceeds an expected value. Reasoning rationality evaluates whether the reasoning generated by the model is logical. Evidence-reasoning fit scores the consistency of the preliminary model's evidence-gathering and verification process.
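- 3. The SRT-RL two-stage fine-tuning method for log fault diagnosis of claim 2, wherein evaluating the matching degree of the reasoning steps with the cosine similarity comprises: $S_{\text{struct}} = \frac{1}{5}\sum_{i=1}^{5} \mathrm{Match}_i$ and $\mathrm{Match}_i = \cos(\mathbf{r}_i, \mathbf{p}_i) = \frac{\mathbf{r}_i \cdot \mathbf{p}_i}{\lVert \mathbf{r}_i \rVert \, \lVert \mathbf{p}_i \rVert}$. Here $S_{\text{struct}}$ is the structural compliance score, i.e. the mean of the scores $\mathrm{Match}_i$ over the five steps, and $\mathrm{Match}_i$ is the score of a single node in the chain of thought. $\mathbf{r}_i$ is the vector representation of the $i$-th chain-of-thought step in the reasoning result, and $\mathbf{p}_i$ is the vector representation of the $i$-th step of the predetermined structure. $\lVert \mathbf{r}_i \rVert$ and $\lVert \mathbf{p}_i \rVert$ are their respective norms (modulus lengths).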
- 4. The SRT-RL two-stage fine-tuning method for log fault diagnosis of claim 2, wherein scoring with the text overlap rate comprises computing a text-overlap score $S_{\text{log}}$ between $x$, the input of the model (i.e., the error log information), and $y$, the reasoning result of the model.
- 5. The SRT-RL two-stage fine-tuning method for log fault diagnosis of claim 2, wherein penalizing the redundant information amount comprises computing a length-redundancy penalty $P_{\text{len}}$ (the redundant-information penalty) from a penalty coefficient $\lambda$, the information amount $I$ of the candidate diagnosis result, and the expected value $I_0$, such that the penalty applies when $I$ exceeds $I_0$.
- 6. The SRT-RL two-stage fine-tuning method for log fault diagnosis of claim 1, wherein performing V-DPO fine-tuning with the new data set comprises fine-tuning the preliminary model with a V-DPO loss function $\mathcal{L}_{\text{V-DPO}}$, which comprises a standard DPO term and a verification-signal term. The standard DPO term, $\mathcal{L}_{\text{DPO}} = -\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]$, optimizes the model's reasoning results by training on the probability difference between the high-quality answer $y_w$ and the inferior answer $y_l$, so that the model tends to generate high-quality answers. $S(\cdot)$ is the scoring function that measures the quality difference between the good and bad answers. In the verification-signal term, the weight $\gamma$ sets how much the quality-score difference output by the scoring model contributes to the loss, driving the model to focus more on absolute improvements in reasoning quality. $\pi^*$ is the optimal policy in the conventional RLHF paradigm. $x$ and $y$ are the input and output of the reference model $\pi_{\text{ref}}$. $-\mathbb{E}$ is the negative expectation over the entire data set, which is minimized to reduce the prediction error of the model. $\sigma$ is the sigmoid function, which converts the implicit reward difference computed by the model into the probability that $y_w$ is preferred over $y_l$. $\beta$ is the KL penalty coefficient, which controls how far the policy model $\pi_\theta$ may deviate from the reference model and so keeps training stable.
- 7. The SRT-RL two-stage fine-tuning method for log fault diagnosis of claim 6, wherein the optimal policy in the conventional RLHF paradigm is $\pi^*(y \mid x) = \frac{1}{Z(x)}\, \pi_{\text{ref}}(y \mid x) \exp\left(\frac{1}{\beta} r(x, y)\right)$, where $Z(x)$ is the partition function and $r(x, y)$ is the reward model.
- 8. The SRT-RL two-stage fine-tuning method for log fault diagnosis of claim 7, wherein the reward model includes the verification-signal weight $\gamma$, which sets how much the quality-score difference output by the scoring model contributes to the loss function and drives the model to focus more on absolute improvements in reasoning quality.
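The loss described in claims 6 to 8 can be illustrated on a batch of preference pairs. This is a hedged Python sketch, not the patented formula. The standard DPO term follows the description. The exact form of the verification-signal term cannot be recovered from the text, so here it is assumed to act as a margin $\gamma\,(S(y_w) - S(y_l))$ that the scaled implicit reward gap must exceed. Inputs are assumed to be summed token log-probabilities of each response under the policy and the frozen reference model. The names `v_dpo_loss`, `dpo_pair_loss`, and `implicit_reward_gap`, and the dictionary keys, are hypothetical. A real training loop would compute the same quantities on tensors with automatic differentiation.

```python
import math
from typing import Dict, List


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_reward_gap(ex: Dict[str, float]) -> float:
    """[log pi_theta(y_w|x) - log pi_ref(y_w|x)] - [log pi_theta(y_l|x) - log pi_ref(y_l|x)]."""
    return (ex["logp_w"] - ex["ref_logp_w"]) - (ex["logp_l"] - ex["ref_logp_l"])


def dpo_pair_loss(ex: Dict[str, float], beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * implicit reward gap)."""
    return -log_sigmoid(beta * implicit_reward_gap(ex))


def v_dpo_loss(batch: List[Dict[str, float]], beta: float = 0.1, gamma: float = 1.0) -> float:
    """Batch mean (empirical stand-in for the expectation) of an assumed V-DPO loss.

    Assumption: the scoring-model quality gap S(y_w) - S(y_l), weighted by gamma,
    acts as a margin that beta times the implicit reward gap must exceed.
    """
    total = 0.0
    for ex in batch:
        margin = gamma * (ex["score_w"] - ex["score_l"])
        total += -log_sigmoid(beta * implicit_reward_gap(ex) - margin)
    return total / len(batch)


# Usage: policy and reference agree exactly, so the implicit reward gap is 0.
pair = {"logp_w": -12.0, "ref_logp_w": -12.0, "logp_l": -15.0, "ref_logp_l": -15.0,
        "score_w": 0.8, "score_l": 0.2}
plain = v_dpo_loss([pair], gamma=0.0)         # plain DPO: log 2 when the gap is 0
value_driven = v_dpo_loss([pair], gamma=1.0)  # larger, because the 0.6 margin is unmet
```

Setting `gamma=0` recovers plain DPO. A positive `gamma` raises the loss on pairs whose quality gap is large but whose implicit reward gap is still small. This is one possible reading of the claims' aim of emphasizing absolute improvements in reasoning quality.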
Description
SRT-RL two-stage fine tuning method for log fault diagnosis
Technical Field
The invention relates to the technical field of system fault diagnosis, and in particular to an SRT-RL two-stage fine-tuning method for log fault diagnosis.
Background
As information technology continues to develop, IT operations are shifting from manual work toward more intelligent and automated approaches. Artificial Intelligence for IT Operations (AIOps) is an important part of this shift. It uses machine learning and big-data techniques to help businesses automatically detect, analyze, and respond to problems in their IT infrastructure. Within AIOps, fault diagnosis uses automated means to identify the fault type, locate the problem, and provide effective repair advice.
Early fault diagnosis relied on expert experience and rules to build a relational knowledge base, combined with machine learning or deep learning to automate fault classification and root-cause localization. For example, Bansal et al. constructed a knowledge graph from log data using logic and domain knowledge and performed online fault diagnosis and classification. In recent years, deep learning and natural language processing (NLP) have advanced rapidly. Pre-trained large language models (LLMs) such as GPT-4, LLaMA, and Qwen, trained on large-scale data sets, have strong language understanding and pattern recognition capabilities. Studies have shown that fine-tuning these pre-trained models on specific fault log data lets them learn further, identify complex fault patterns, and propose effective solutions.
In the era of large models, a large model can analyze error logs and diagnose faults much like an operations engineer, using natural language processing, graph neural networks, deep learning, and other techniques. Applications of large models in AIOps fall into two categories: finding the root cause of an event and proposing a solution to it. Solving events requires high flexibility, which suits the reasoning capabilities of large models. Current methods fall into two types, fine-tuning-based and prompt-based. Although large models perform well across many tasks, fine-tuning depends on large amounts of high-quality labeled data, which are difficult to obtain.
Disclosure of Invention
To solve these problems in the prior art, the invention provides an SRT-RL two-stage fine-tuning method for log fault diagnosis based on Self-Reflective Reasoning Chain-of-Thought Reinforcement Learning (SRT-RL). The method achieves self-checking reasoning and sample supplementation through a local reverse self-checking chain of thought and a self-scoring model. To ensure accurate reasoning, the chain-of-thought (CoT) method guides the large model to reason step by step, which markedly improves its accuracy. SRT-RL adds a local self-verification mechanism to the CoT framework. In the forward reasoning stage, the model generates a preliminary diagnosis. It then traces the problem backward from the hypothesized error log to confirm that the diagnosis is correct. Finally, the large model generates repair suggestions based on the backtracking results.
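A record in the reverse-verification chain-of-thought data set might be laid out as below before instruction fine-tuning. This Python sketch is hypothetical. The source states only that the chain has five steps and that each example includes a golden answer; it does not name the steps. The step names, the instruction wording, and the `instruction`/`input`/`output` record keys (a common instruction-tuning layout) are therefore assumptions, and the order shown follows the forward-diagnosis, backward-check, repair-suggestion flow described above.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical names for the five reasoning steps (the source does not name them).
STEPS = (
    "Log analysis",
    "Preliminary diagnosis",
    "Hypothesized error log",
    "Reverse verification",
    "Repair suggestion",
)

# Hypothetical instruction text shared by every record.
INSTRUCTION = (
    "Diagnose the fault in the following error log. Reason step by step, "
    "check the diagnosis in reverse against the log, then give a repair suggestion."
)


@dataclass
class CoTExample:
    log: str                                              # raw error log
    steps: Dict[str, str] = field(default_factory=dict)   # step name -> expert-written text
    golden_answer: str = ""                               # expert-confirmed diagnosis


def to_sft_record(ex: CoTExample) -> Dict[str, str]:
    """Build one instruction-tuning record; reject examples that miss any step."""
    missing = [name for name in STEPS if name not in ex.steps]
    if missing:
        raise ValueError(f"missing steps: {missing}")
    body = "\n".join(
        f"Step {i}. {name}: {ex.steps[name]}" for i, name in enumerate(STEPS, start=1)
    )
    return {"instruction": INSTRUCTION, "input": ex.log,
            "output": f"{body}\nAnswer: {ex.golden_answer}"}


# Usage with placeholder expert notes.
example = CoTExample(
    log="ERROR: No space left on device (/var/log)",
    steps={name: f"expert note for {name}" for name in STEPS},
    golden_answer="Disk partition /var is full",
)
record = to_sft_record(example)
```

Rejecting incomplete examples keeps every training target in the same five-step shape. This is the shape the structural-compliance score in claim 3 compares against.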
In addition, a scoring model is introduced to address the scarcity of high-quality annotated data. It applies a five-dimensional scoring standard so that the large model can score unlabeled data. Reinforcement learning has been shown to improve the performance of large models, and the preference data generated from the scoring results can be used to fine-tune the model through Value-driven Direct Preference Optimization (V-DPO). To achieve the above object, the invention provides the following solution: an SRT-RL two-stage fine-tuning method for log fault diagnosis, comprising the following steps: acquiring system logs, having operation and maintenance experts annotate the system logs, and constructing a reverse-verification chain-of-thought data set containing golden answers; performing instruction fine-tuning on a pre-trained large model with the reverse-verification chain-of-thought data set to obtain a preliminary model capable of outputting a fault-diagnosis reasoning chain, and independently inputting each unlabeled error log into the preliminary model a plurality of times to obtain a plurality of candidate diagnosis results for each log; and carryi