CN-122021820-A - Log analysis method and system based on large language model and self-learning knowledge
Abstract
The embodiments of the invention provide a log parsing method and system based on a large language model and self-learning knowledge, and relate to the technical field of log analysis. The parsing method comprises: acquiring a new log to be parsed; determining whether the new log can be matched with a template in a cache tree; if it can, acquiring the matched template, parsing the new log according to that template and outputting the parsing result; if it cannot, triggering a large language model to parse the new log and generate a new template, storing the new template in the cache tree, and returning to the step of acquiring a new log to be parsed. Without requiring labeled data or manually maintained rules, the method improves the efficiency of log parsing, achieves efficient and extensible parsing of logs from various systems, and improves the effectiveness and practicality of downstream tasks such as anomaly detection and fault diagnosis.
Inventors
- ZHANG YILEI
- TIAN WENXIN
Assignees
- Anhui Normal University (安徽师范大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-08
Claims (10)
- 1. A log parsing method based on a large language model and self-learning knowledge, characterized by comprising the following steps: acquiring a new log to be parsed; determining whether the new log can be matched with a template in a cache tree; if the new log can be matched with a template in the cache tree, acquiring the template matched with the new log; parsing the new log according to the matched template and outputting a parsing result; if the new log cannot be matched with any template in the cache tree, triggering a large language model to parse the new log and generate a new template; and storing the new template in the cache tree and returning to the step of acquiring a new log to be parsed.
- 2. The parsing method of claim 1, wherein determining whether the new log can be matched with a template in the cache tree comprises: determining, according to the token length of the new log, whether a subtree with the same token length exists in the cache tree; if such a subtree exists, determining whether the subtree contains a child node consistent with the first K static tokens of the new log; if such a child node exists, computing the Jaccard similarity between the new log and each template in the child node; selecting as candidate templates those templates whose Jaccard similarity is greater than or equal to a preset threshold; and determining whether any of the candidate templates matches the new log.
- 3. The parsing method of claim 1, wherein triggering a large language model to parse the new log to generate a new template comprises: determining whether a knowledge base is available to the large language model; if the knowledge base is available, checking whether the new log can be matched with an existing template in the knowledge base; if it matches, reusing the corresponding template and parameter definitions to parse the new log; if no knowledge base is available or no match is found, extracting the fixed text and the dynamic content of the new log according to a prompt; and generating the new template by taking the fixed text as the template and the dynamic content as the parameters.
- 4. The parsing method of claim 3, wherein the prompt comprises: an instruction to parse the new log into a structured format containing a template and parameters; a definition of the template as fixed text carrying the core event semantics, including stable markers and consistent keywords; a definition of the parameters as dynamic content that has a specific format and carries instance-specific information; guidelines for identifying the template and the parameters; and examples, provided to the large language model according to the guidelines, that show the expected input and output formats.
- 5. The parsing method of claim 4, wherein the knowledge base comprises a template library, a parameter dictionary, domain metadata, and parsing rules.
- 6. The parsing method according to claim 5, wherein the template library contains each template generated during parsing; each template is assigned a unique template ID; the metadata of each template includes its source system, number of occurrences, first occurrence time and last occurrence time; and each variant of a template is associated with it through a variant template ID.
- 7. The parsing method of claim 5, wherein the parameter dictionary contains each parameter identified in the structured format, and each entry of the parameter dictionary includes a format rule, value instances, and the associated template IDs.
- 8. The parsing method of claim 5, wherein triggering a large language model to parse the new log to generate a new template further comprises: matching the new log against the templates in the template library and reusing the verified templates and parameters; parsing a new log that is not matched in the template library according to the parsing rules and the parameter dictionary; and interpreting special markers and business terms in the new log according to the domain metadata.
- 9. The parsing method according to claim 5, further comprising: after parsing of the new log is completed, storing the generated new template in the template library and updating the template's entry in the template library; adding the parameters parsed from the new log, together with supplementary examples, to the parameter dictionary; and optimizing the parsing rules.
- 10. A log parsing system based on a large language model and self-learning knowledge, characterized in that the parsing system comprises a processor configured to execute the parsing method according to any one of claims 1 to 9.
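Claims 1 and 2 together describe a cache-first loop: a log is looked up in the cache tree by token length, then by its first K static tokens, then by Jaccard similarity against the templates stored in that node, and only a miss is sent to the large language model. The Python sketch below illustrates that flow under assumptions the claims leave open: logs are tokenized on whitespace, parameters are written as `<*>` in templates, a token counts as "static" if it contains no digits, and `K = 2` and a threshold of `0.5` are placeholder values. `llm_parse` is a hypothetical callable standing in for the model call; no particular LLM API is implied.

```python
from typing import Callable, Dict, List, Optional, Tuple

WILDCARD = "<*>"   # assumed placeholder for a parameter slot in a template
K = 2              # hypothetical: number of leading static tokens used as the node key
THRESHOLD = 0.5    # hypothetical: minimum Jaccard similarity for a candidate template


def is_static(token: str) -> bool:
    # Assumed heuristic: wildcards and tokens containing digits are not static.
    return token != WILDCARD and not any(ch.isdigit() for ch in token)


def prefix_key(tokens: List[str]) -> Tuple[str, ...]:
    # First K static tokens, used to select a child node inside a subtree.
    return tuple(t for t in tokens if is_static(t))[:K]


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fits(template: List[str], tokens: List[str]) -> bool:
    # Positional check: every template token is a wildcard or equals the log token.
    return len(template) == len(tokens) and all(
        t == WILDCARD or t == s for t, s in zip(template, tokens))


class CacheTree:
    def __init__(self) -> None:
        # token length -> first-K-static-token key -> templates (as token lists)
        self.root: Dict[int, Dict[Tuple[str, ...], List[List[str]]]] = {}

    def lookup(self, tokens: List[str]) -> Optional[List[str]]:
        subtree = self.root.get(len(tokens))      # subtree with the same token length?
        if subtree is None:
            return None
        node = subtree.get(prefix_key(tokens))    # child node with the same first K static tokens?
        if node is None:
            return None
        log_set = set(tokens)
        scored = [(jaccard(log_set, {t for t in tpl if t != WILDCARD}), tpl) for tpl in node]
        candidates = [c for c in scored if c[0] >= THRESHOLD]
        candidates.sort(key=lambda c: c[0], reverse=True)  # most similar candidates first
        for _, tpl in candidates:
            if fits(tpl, tokens):
                return tpl
        return None

    def insert(self, template: List[str]) -> None:
        subtree = self.root.setdefault(len(template), {})
        subtree.setdefault(prefix_key(template), []).append(template)


def parse(log: str, tree: CacheTree,
          llm_parse: Callable[[str], str]) -> Tuple[str, List[str], bool]:
    """Return (template, parameters, cache_hit) for one log."""
    tokens = log.split()
    template = tree.lookup(tokens)
    hit = template is not None
    if template is None:
        # Cache miss: ask the (stubbed) LLM for a template and cache it for later logs.
        template = llm_parse(log).split()
        tree.insert(template)
    params = [s for t, s in zip(template, tokens) if t == WILDCARD]
    return " ".join(template), params, hit
```

Calling `parse` once per incoming log reproduces the loop of claim 1: the first `Connection from ... port ... closed` line is sent to `llm_parse`, and later lines with other addresses and ports are served from the cache tree. Because of the digit heuristic, a parameter without digits (such as a user name) would change the node key and cause a cache miss; the claims do not state how static tokens are identified.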
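Claim 4 lists what the prompt contains but not its wording. The following Python function is one hypothetical way to assemble such a prompt from a log and a few (log, template, parameters) examples, for instance examples taken from the knowledge base of claim 3. The function name, the `<*>` placeholder and the guideline sentence are illustrative assumptions, not text from the patent.

```python
from typing import List, Tuple


def build_prompt(log: str, examples: List[Tuple[str, str, List[str]]]) -> str:
    """Assemble a parsing prompt with the four parts listed in claim 4 (wording assumed)."""
    lines = [
        # 1. Task: parse the log into a structured format of template + parameters.
        "Parse the following log into a template and its parameters.",
        # 2. Definitions of the template and the parameters.
        "Template: the fixed text carrying the core event semantics "
        "(stable markers and consistent keywords).",
        "Parameters: dynamic content with a specific format and instance-specific values; "
        "replace each one with <*> in the template.",
        # 3. Guidelines (illustrative wording only).
        "Guidelines: keep keywords and punctuation that recur across events; "
        "treat IDs, addresses, numbers, paths and timestamps as parameters.",
        "Examples:",
    ]
    # 4. Examples showing the expected input and output format.
    for ex_log, ex_template, ex_params in examples:
        lines.append(f"Log: {ex_log}")
        lines.append(f"Template: {ex_template}")
        lines.append(f"Parameters: {ex_params}")
    # The log to be parsed, left open for the model to complete.
    lines.append(f"Log: {log}")
    lines.append("Template:")
    return "\n".join(lines)
```

Ending the prompt with an open `Template:` line mirrors the example format, so the model's answer can be read back in the same template/parameters structure.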
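Claims 5 to 7 and 9 describe the knowledge base as a template library, a parameter dictionary, domain metadata and parsing rules that are updated after each parse. A minimal in-memory sketch of those structures with Python dataclasses follows. The field types, the `T<n>` ID scheme, the two-way variant link and the deduplication of value instances are assumptions; the rule-optimization step of claim 9 is not modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class TemplateEntry:  # one template-library entry (claim 6)
    template_id: str
    text: str
    source_system: str
    occurrences: int
    first_seen: datetime
    last_seen: datetime
    variant_ids: Set[str] = field(default_factory=set)


@dataclass
class ParameterEntry:  # one parameter-dictionary entry (claim 7)
    name: str
    format_rule: str  # assumed to be a regular expression string
    value_instances: List[str] = field(default_factory=list)
    template_ids: Set[str] = field(default_factory=set)


@dataclass
class KnowledgeBase:  # the four parts listed in claim 5
    templates: Dict[str, TemplateEntry] = field(default_factory=dict)
    parameters: Dict[str, ParameterEntry] = field(default_factory=dict)
    domain_metadata: Dict[str, str] = field(default_factory=dict)  # e.g. term -> meaning
    parsing_rules: List[str] = field(default_factory=list)

    def record_template(self, text: str, source_system: str, now: datetime) -> str:
        # Claim 9: update an existing entry, or store the new template under a new ID.
        for entry in self.templates.values():
            if entry.text == text:
                entry.occurrences += 1
                entry.last_seen = now
                return entry.template_id
        template_id = f"T{len(self.templates) + 1}"  # hypothetical ID scheme
        self.templates[template_id] = TemplateEntry(
            template_id, text, source_system, 1, now, now)
        return template_id

    def record_parameter(self, name: str, format_rule: str,
                         value: str, template_id: str) -> None:
        # Claim 9: add parsed parameters and example values to the dictionary.
        entry = self.parameters.setdefault(name, ParameterEntry(name, format_rule))
        if value not in entry.value_instances:
            entry.value_instances.append(value)
        entry.template_ids.add(template_id)

    def link_variant(self, template_id: str, variant_id: str) -> None:
        # Claim 6: associate a template with its variant (linked both ways here).
        self.templates[template_id].variant_ids.add(variant_id)
        self.templates[variant_id].variant_ids.add(template_id)
```

Keeping the first and last occurrence times on each entry lets a later pass decide which templates are stale or frequent, which is the kind of information the self-learning loop can reuse.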
Description
Log analysis method and system based on large language model and self-learning knowledge
Technical Field
The invention relates to the technical field of log analysis, and in particular to a log parsing method and system based on a large language model and self-learning knowledge.
Background
In modern software systems, logs are structured or semi-structured text records generated during system operation that capture key information such as execution flow, errors, state changes, and user operations. Logs play an important role in system monitoring, fault detection, and performance optimization. For example, on a cloud computing platform logs can help engineers trace the root cause of a service outage, and in an e-commerce system logs record user behavior paths that support business decisions. The value of logs lies not in their volume but in whether actionable information can be extracted from them through efficient parsing; log parsing has therefore become a core technology in system maintenance and management.
Log parsing is the conversion of unstructured or semi-structured log text into a structured semantic representation, with the goal of identifying the log template and its corresponding parameters. Traditional log parsing methods have clear limitations in practice. Grammar-based methods rely on manually designed rules to extract structured information. When log characteristics deviate from the predefined rules (for example, when a system upgrade changes the log format), parsing accuracy can drop sharply, so the rules require continual manual maintenance. To alleviate this problem, deep learning methods emerged that attempt to improve parsing accuracy through semantic understanding. However, such methods typically rely on large amounts of manually annotated data for multiple rounds of training.
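The definition of log parsing above splits a log into a template (fixed text) and parameters (dynamic values). As a minimal illustration, assuming the common convention of marking each parameter slot in a template with `<*>`, the following Python function recovers the parameter values of a log from a known template by turning the template into an anchored regular expression. The function name and the placeholder are assumptions for illustration.

```python
import re
from typing import List, Optional

WILDCARD = "<*>"  # assumed placeholder marking a parameter slot in a template


def extract_parameters(template: str, log: str) -> Optional[List[str]]:
    """Return the parameter values if `log` instantiates `template`, else None."""
    # Escape the fixed text and turn each wildcard into a non-greedy capture group.
    pieces = [re.escape(p) for p in template.split(WILDCARD)]
    pattern = "^" + "(.*?)".join(pieces) + "$"
    match = re.match(pattern, log)
    return list(match.groups()) if match else None
```

For example, `extract_parameters("Connection from <*> port <*> closed", "Connection from 10.0.0.5 port 22 closed")` returns `['10.0.0.5', '22']`, and a log that does not fit the template returns `None`.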
Log annotation is time-consuming and laborious, and in dynamic system environments where log formats change frequently it is especially difficult to keep an annotated dataset up to date. With the rapid development of large language models (LLMs), recent research has begun to apply LLMs to log parsing, using their strong semantic understanding to improve parsing performance. Existing LLM-based methods fall mainly into two categories: methods based on in-context learning (ICL) and methods based on unsupervised clustering. One ICL-based approach, for example, provides five annotated examples for each log to guide parsing. Although this can improve accuracy, it introduces substantial overhead, and the parsing quality depends heavily on the quality of the examples. Clustering-based methods first cluster the logs and then use the LLM to analyze the commonalities and differences within each cluster, separating the fixed part (the template) from the variable part (the parameters). However, such methods recognize only intra-cluster patterns through the implicit semantic capabilities of the LLM and cannot exploit domain knowledge from other clusters or from historical parsing results, which makes further accuracy gains difficult.
Disclosure of Invention
The embodiments of the invention aim to provide a log parsing method and system based on a large language model and self-learning knowledge. The method and system achieve zero-shot generalization without relying on annotated data or manual rule maintenance, and progressively improve parsing performance and efficiency by continuously accumulating knowledge and reusing templates.
To achieve the above object, an embodiment of the present invention provides a log parsing method based on a large language model and self-learning knowledge, comprising: acquiring a new log to be parsed; determining whether the new log can be matched with a template in a cache tree; if it can, acquiring the template matched with the new log; parsing the new log according to the matched template and outputting a parsing result; if it cannot, triggering a large language model to parse the new log to generate a new template; and storing the new template in the cache tree and returning to the step of acquiring a new log to be parsed. Optionally, determining whether the new log can be matched with a template in the cache tree comprises: determining, according to the token length of the new log, whether a subtree with the same token length exists in the cache tree; if such a subtree exists, determining whether the subtree contains a child node consistent with the first K static tokens of the new log; if a child node consistent with the first K static tokens of the new log exists, acquiring Jac