CN-121996513-A - Micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning

Abstract

The invention discloses a micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning. It relates to the technical field of computers and aims to solve the parsing difficulties caused by the diverse semantics and unstable structure of micro-service logs. The method comprises: preprocessing and grouping the original logs; reusing existing templates through template cache matching; on a cache miss, selecting mutually different log samples based on edit distance; selecting reference examples through semantic-structural double-similarity retrieval; constructing a structured prompt containing modules such as task description, reasoning mode selection, and chain-of-thought reasoning; guiding a large language model to reason adaptively and extract the template; and updating the cache. Experiments show that the method outperforms existing mainstream methods on metrics such as grouping accuracy and parsing accuracy, adapts to complex micro-service scenarios, and improves the accuracy and stability of log parsing.

Inventors

ZHANG XIUGUO
ZHAO YIFAN
ZHANG SEN
CAO ZHIYING

Assignees

Dalian Maritime University (大连海事大学)

Dates

Publication Date: 2026-05-08
Application Date: 2026-02-11

Claims (7)

1. A micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning, characterized by comprising the following steps: S1, acquiring log data to be parsed, preprocessing the log data, and extracting the corresponding log content; S2, representing the log content as a log content sequence, processing the sequence with an N-gram method, screening out the constant words in the sequence to construct a log group structure identifier, and clustering logs with the same log group structure identifier into one group; S3, performing template cache matching in a template cache library based on the log group structure identifier, wherein the template cache library stores parsed log templates for lookup; if the cache is hit, the corresponding template is reused directly to complete log parsing, and if the cache is missed, S4 is executed; S4, selecting, based on the edit distances among the log contents in the log group, logs that differ significantly from one another as input samples; S5, taking the first log in the input samples as a query log, calculating the comprehensive similarity between the query log and each log in a template example library, and selecting the three logs with the highest comprehensive similarity to the input sample as reference examples for writing the prompt, wherein the template example library stores labeled examples, each comprising an Id, log content, and the corresponding template; S6, constructing a structured prompt from the reference examples, wherein the prompt comprises a task description, example references, reasoning mode selection, chain-of-thought reasoning, and input/output instructions; S7, calling a large language model (LLM) and extracting the log template based on the structured prompt.
2. The micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning according to claim 1, wherein the log content comprises event description, state change, or parameter information.
3. The micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning according to claim 1, wherein selecting logs that differ significantly as input samples, based on the edit distances among the log contents in the log group, comprises: S401, de-duplicating all log contents in the log group; if the number of samples after de-duplication is insufficient, directly returning the remaining samples as the input samples, otherwise executing S402; S402, calculating the pairwise edit distances of the de-duplicated logs and sorting the pairs by edit distance from largest to smallest; S403, adding the log pair with the largest edit distance to a candidate set, then calculating the edit distance between each remaining log and the candidate set, and repeatedly adding to the candidate set the log whose edit distance is the largest and exceeds a preset minimum edit distance threshold, until the required number of samples is reached.
4. The micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning according to claim 3, wherein the minimum edit distance threshold is set to 10 characters.
5. The micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning according to claim 1, wherein the comprehensive similarity is calculated as a weighted sum of a semantic similarity, computed as the cosine similarity of embedding vectors, and a structural similarity, computed as the Jaccard coefficient.
6. The micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning according to claim 5, wherein the example with the highest structural similarity to the query object is excluded during ranking.
7. The micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning according to claim 1, wherein the method further comprises: S8, writing the extracted log template into the template cache library.
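
For illustration only, and not as part of the claims, steps S401 to S403 of claim 3 can be sketched in Python as follows. The function names, the default sample count `k=3`, and the reading of "edit distance between a log and the candidate set" as the minimum distance to any member of the set are assumptions made for this sketch; the threshold of 10 characters follows claim 4.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def select_diverse_samples(logs, k=3, min_dist=10):
    """Pick up to k mutually different logs from one log group (hypothetical helper)."""
    # S401: de-duplicate while keeping the original order.
    unique = list(dict.fromkeys(logs))
    if len(unique) <= k:
        return unique

    # S402: pairwise edit distances, sorted from largest to smallest.
    pairs = sorted(combinations(unique, 2),
                   key=lambda p: edit_distance(*p), reverse=True)

    # S403: seed the candidate set with the most distant pair.
    selected = list(pairs[0])
    while len(selected) < k:
        remaining = [x for x in unique if x not in selected]

        # Assumption: distance to the set = minimum distance to any member.
        def dist_to_set(x):
            return min(edit_distance(x, s) for s in selected)

        best = max(remaining, key=dist_to_set)
        if dist_to_set(best) <= min_dist:
            break  # no remaining log exceeds the minimum threshold
        selected.append(best)
    return selected[:k]
```

Under these assumptions the function may return fewer than `k` samples when the remaining logs are all within the threshold of the candidate set, which keeps near-duplicates out of the prompt.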
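
The comprehensive similarity of claim 5 can likewise be sketched as a weighted sum of cosine similarity and the Jaccard coefficient. This is a minimal sketch under stated assumptions: the weight `alpha=0.5`, the use of whitespace-style token sets for the Jaccard coefficient, the dictionary keys `vec` and `tokens`, and the precomputed embedding vectors are all hypothetical, since the claims do not fix them. The exclusion rule of claim 6 is not modeled.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(tokens_a, tokens_b):
    """Jaccard coefficient of two token collections, treated as sets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def combined_similarity(q_vec, q_tokens, e_vec, e_tokens, alpha=0.5):
    """Weighted sum of semantic and structural similarity (alpha is an assumed weight)."""
    semantic = cosine_similarity(q_vec, e_vec)
    structural = jaccard(q_tokens, e_tokens)
    return alpha * semantic + (1 - alpha) * structural


def top_k_examples(query, examples, k=3, alpha=0.5):
    """Return the k library examples most similar to the query log."""
    def score(example):
        return combined_similarity(query["vec"], query["tokens"],
                                   example["vec"], example["tokens"], alpha)
    return sorted(examples, key=score, reverse=True)[:k]
```

With `k=3` this mirrors the selection of three reference examples in step S5; how the embedding vectors are produced is left open here.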

Description

Micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning

Technical Field

The invention relates to the technical field of computers, and in particular to a micro-service log parsing method based on double-similarity retrieval and self-adaptive reasoning.

Background

In the intelligent operation and maintenance of micro-service systems, large numbers of independently deployed service instances, containers, and nodes run in parallel, and each service generates logs according to its own development conventions and technology stack. The resulting logs are massive in volume, varied in format, inconsistent in their fields, and often similar in structure while differing markedly in their variables. The same business event is often recorded separately by multiple services, forming a cross-component, multi-source chain of log fragments. At the same time, logs are cheap to collect, cover a wide range, and contain key information such as abnormal states, call parameters, and the runtime environment, so they have become one of the core data sources for anomaly detection, root cause localization, and fault diagnosis.

However, micro-service logs are generally semi-structured and subject to structural drift (for example, different versions or service teams introduce new fields or change output formats). Traditional parsing methods that rely on manual rules or fixed templates therefore struggle to work stably over the long term in a high-concurrency micro-service environment with many evolving versions, and manual approaches cope even less well with massive log data. For this reason, researchers have proposed various automated log parsing methods, whose core task is to identify the variables in a log and extract a generic template. Current mainstream log parsing methods fall mainly into unsupervised methods and supervised methods.
Unsupervised methods are widely used in scenarios without manual annotation and extract templates mainly through heuristic rules or statistical features. Typical heuristic-based methods are Drain and Spell. Drain classifies logs hierarchically through a fixed-depth rule tree, on the assumption that the leading words of a log are constants. In micro-service systems, however, many logs begin with dynamic parameters, URLs, request IDs, and the like, so this assumption often no longer holds; the resulting deviation at the initial branching stage degrades overall parsing accuracy. Spell extracts template constants through longest common subsequence matching and can handle logs of different lengths, but in a micro-service environment it is easily disturbed by complex paths, parameter strings, punctuation, and other characters, which makes variable identification inaccurate. Logram is an N-gram model based on statistical features that extracts templates by identifying high-frequency structural fragments. Although it performs well on datasets with regular structure from a single system, it depends heavily on preset rules and frequency thresholds and struggles to adapt promptly to new patterns when facing the semantically complex, frequently evolving logs of micro-service scenarios. In general, although unsupervised log parsing methods need no manual labeling and are easy to implement, their heavy reliance on manual rules and frequency settings makes them extremely sensitive to dynamic changes in log structure and semantics in a micro-service environment, and they adapt poorly to complex cross-component, cross-version scenarios.
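
To make the longest-common-subsequence idea concrete, the following Python sketch derives a template from two log messages by keeping their common tokens and replacing everything else with a `<*>` placeholder. It is a simplified illustration of the general technique, not Spell's actual implementation; the function names, the whitespace tokenization, and the `<*>` placeholder are assumptions made for this sketch.

```python
def lcs_tokens(a, b):
    """Longest common subsequence of two token lists (classic dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def template_from_pair(log_a, log_b):
    """Keep tokens shared by both logs; collapse the rest into <*> placeholders."""
    tokens = log_a.split()
    common = lcs_tokens(tokens, log_b.split())
    template, k = [], 0
    for tok in tokens:
        if k < len(common) and tok == common[k]:
            template.append(tok)
            k += 1
        elif not template or template[-1] != "<*>":
            template.append("<*>")  # merge adjacent variable tokens
    return " ".join(template)
```

Because matching works on whole whitespace-separated tokens, a path or parameter string that differs in a single character is treated as entirely variable, which hints at why such matching can be disturbed by complex paths and parameter strings.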
Supervised methods train or fine-tune a model on manually annotated data to extract log templates automatically. For example, LogPPT builds on a pre-trained language model and prompt learning, and guides the model to generate the expected log templates using only a small amount of annotated data. This approach can improve template extraction accuracy in some stable scenarios, but in a micro-service system the annotated data must be continually supplemented and maintained for different services and versions, so training and maintenance costs are high; the model is also sensitive to changes in log structure and struggles to maintain stable performance in environments with diverse semantics and rapid version iteration. Such methods typically instruct the model to identify variables in the input log and extract templates automatically by designing prompts and providing a few log template examples. For example, the DivLog method adopts a few-shot prompting strategy to enable the model to master the template extraction rule through cont