
CN-122019220-A - OS running log processing method, system, terminal and storage medium based on AI large model

CN 122019220 A

Abstract

The invention discloses an OS running log processing method, system, terminal and storage medium based on an AI large model, wherein the method comprises the steps of collecting system logs and service logs of a plurality of target devices in an OS cluster, uploading the system logs and service logs of the OS running to a cloud side server through a log search engine, marking content meanings of each log according to the system logs and the service logs, reasoning faults which occur in combination with a trained fault analysis model, positioning fault reasons to obtain a fault analysis positioning report, notifying relevant maintainers according to fault types and fault reasons in the fault analysis positioning report, and automatically collecting feedback information of the relevant maintainers. According to the invention, the operation log of the OS is processed by using the AI large model, and an intelligent operation and maintenance system based on the AI large model is provided, so that the input requirement on manpower in the whole operation and maintenance process is greatly reduced, and meanwhile, the technical threshold of manual participation is also greatly reduced.
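The closed loop the abstract describes (collect logs, mark their meaning, infer and localize the fault, notify maintainers, collect feedback) can be sketched in a few lines. This is an illustrative outline only, assuming Python; all function names, the log format, and the "missed heartbeat" heuristic are hypothetical stand-ins and are not specified by the patent.

```python
# Hypothetical sketch of the pipeline: collect -> annotate -> diagnose -> notify/feedback.
# The real system delegates annotation and diagnosis to an AI large model and a
# trained fault analysis model; simple rules stand in for them here.

def collect_logs(devices):
    """Gather system and service logs from each target device in the OS cluster."""
    return [{"sn": d, "line": f"{d}: service heartbeat missed"} for d in devices]

def annotate(logs):
    """Mark the content meaning of each log entry (stand-in for the LLM analysis module)."""
    for entry in logs:
        entry["meaning"] = "anomaly" if "missed" in entry["line"] else "normal"
    return logs

def diagnose(logs):
    """Infer the fault and produce a fault analysis and localization report."""
    faulty = [e for e in logs if e["meaning"] == "anomaly"]
    return {"fault_type": "service_timeout" if faulty else None,
            "devices": [e["sn"] for e in faulty]}

def notify_and_feedback(report):
    """Notify maintainers when a fault is found and return their (simulated) feedback."""
    if report["fault_type"]:
        return {"confirmed": True, "report": report}
    return {"confirmed": False, "report": report}

feedback = notify_and_feedback(diagnose(annotate(collect_logs(["SN001", "SN002"]))))
```

In the patented system the feedback dictionary would be fed back into the natural language analysis module as labeled training data, closing the loop.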

Inventors

  • CHENG FEI
  • ZHANG LIANGLIANG

Assignees

  • Shenzhen Kaihong Digital Industry Development Co., Ltd. (深圳开鸿数字产业发展有限公司)

Dates

Publication Date
2026-05-12
Application Date
2025-12-10

Claims (20)

  1. An OS running log processing method based on an AI large model, characterized by comprising the following steps: a log search engine receives an instruction from a natural language analysis module based on the AI large model, collects system logs and service logs of a plurality of target devices in an OS cluster, and uploads the system logs and service logs of the OS running to a cloud-side server through the log search engine; the natural language analysis module based on the AI large model marks the content meaning of each section of log according to the system logs and the service logs, infers the fault that has occurred in combination with a trained fault analysis model, locates the fault cause, obtains a fault analysis and localization report, and sends the fault analysis and localization report to a fault reasoning diagnosis module based on the AI large model; and the fault reasoning diagnosis module based on the AI large model notifies relevant maintainers according to the fault type and the fault cause in the fault analysis and localization report, automatically collects feedback information from the relevant maintainers, and inputs the feedback information to the natural language analysis module based on the AI large model.
  2. The AI-large-model-based OS running log processing method of claim 1, wherein before the log search engine receives the instruction of the AI-large-model-based natural language analysis module, collects the system logs and service logs of the plurality of target devices in the OS cluster, and uploads the system logs and service logs of the OS running to the cloud-side server through the log search engine, the method further comprises: each target device in the OS cluster is assigned a unique SN in advance, and the system log and the service log corresponding to each target device are marked with that SN.
  3. The AI-large-model-based OS running log processing method of claim 1, wherein the system log and the service log each comprise a file header and file content, the file header comprising an SN number, a timestamp of the occurrence of the log, a process number of the service, a thread number of the service, a log level, a key code identifier, and the log content.
  4. The AI-large-model-based OS running log processing method of claim 1 or claim 3, wherein the log search engine receiving the instruction of the AI-large-model-based natural language analysis module, collecting the system logs and service logs of the plurality of target devices in the OS cluster, and uploading the system logs and service logs of the OS running to the cloud-side server through the log search engine specifically comprises: the log search engine receives a log acquisition instruction from the AI-large-model-based natural language analysis module in the cloud-side server, parses the log acquisition instruction, and determines the target device identifiers to be collected, the log type, the time range, and the filtering keywords; the log search engine initiates a log acquisition request to the corresponding target devices in the OS cluster according to the parsing result, and receives the system logs and service logs from each target device in real time in a streaming manner or pulls them in batches according to a preset period; the log search engine performs standardized preprocessing on the collected system logs and service logs, the standardized preprocessing comprising at least format unification, timestamp synchronization, and invalid-data filtering; and the log search engine compresses and encrypts the preprocessed system logs and service logs, and uploads them to the AI-large-model-based natural language analysis module of the cloud-side server through a secure transmission protocol.
  5. The AI-large-model-based OS running log processing method of claim 1, wherein the fault analysis and localization report includes a fault type, a fault cause, the occurrence device, and a judgment basis.
  6. The AI-large-model-based OS running log processing method of claim 5, wherein the AI-large-model-based natural language analysis module marking the content meaning of each section of log according to the system logs and service logs, inferring the fault that has occurred in combination with the trained fault analysis model, locating the fault cause, obtaining the fault analysis and localization report, and sending it to the AI-large-model-based fault reasoning diagnosis module specifically comprises: the AI-large-model-based natural language analysis module inputs the system logs and service logs into the trained fault analysis model, uses the fault analysis model to understand the semantics of the log text, and automatically generates, for each log, one or more labels describing its content meaning; the AI-large-model-based natural language analysis module aggregates the log sequences marked with content meaning according to time order and relevance to form an event chain, matches the event chain against fault patterns in a knowledge base, performs causal reasoning on successfully matched event chains based on the fault analysis model, identifies the root-cause log event, and outputs a diagnosis conclusion comprising the fault type, the fault cause, the occurrence device, and the judgment basis; and the AI-large-model-based natural language analysis module integrates the diagnosis conclusion, the related key log fragments, the content-meaning labels, and the event-chain visualization information to generate a structured fault analysis and localization report, and pushes the fault analysis and localization report to the AI-large-model-based fault reasoning diagnosis module through a message queue or a remote procedure call interface.
  7. The AI-large-model-based OS running log processing method of claim 6, wherein the labels comprise at least one or more of a log level, the service module to which the log belongs, an operation type, and an event type.
  8. The AI-large-model-based OS running log processing method of claim 6, wherein the AI-large-model-based fault reasoning diagnosis module notifying relevant maintainers according to the fault type and the fault cause in the fault analysis and localization report, automatically collecting feedback information from the relevant maintainers, and inputting the feedback information to the AI-large-model-based natural language analysis module specifically comprises: the AI-large-model-based fault reasoning diagnosis module parses the fault analysis and localization report, extracts the fault type and the fault cause therein, determines the relevant maintainers to be notified according to preset rules, and sends notification messages to the relevant maintainers through mail, instant messaging tools, or an operation and maintenance alarm platform, the messages comprising at least the fault type, the fault cause, and a report link; the AI-large-model-based fault reasoning diagnosis module, while sending the notification, provides an interactive feedback interface for the relevant maintainers, and collects, through the interactive feedback interface, the relevant maintainers' confirmation information, supplementary explanations, handling-measure records, or corrective opinions on the analysis results; and the AI-large-model-based fault reasoning diagnosis module structures the collected feedback information, associates the processed feedback information with the corresponding original fault analysis and localization report, stores it as labeled training data, and periodically performs incremental training or fine-tuning of the AI-large-model-based natural language analysis module and/or the trained fault analysis model using the stored feedback data, so as to optimize analysis and reasoning capability.
  9. The AI-large-model-based OS running log processing method of claim 1, wherein before the AI-large-model-based fault reasoning diagnosis module notifies relevant maintainers according to the fault type and the fault cause in the fault analysis and localization report, the method further comprises: the AI-large-model-based fault reasoning diagnosis module judges, according to the fault analysis and localization report, whether relevant maintainers need to be notified.
  10. The AI-large-model-based OS running log processing method of claim 9, wherein the AI-large-model-based fault reasoning diagnosis module judging, according to the fault analysis and localization report, whether relevant maintainers need to be notified further comprises: if the AI-large-model-based fault reasoning diagnosis module judges, according to the fault analysis and localization report, that relevant maintainers do not need to be notified, it directly feeds back to the AI-large-model-based natural language analysis module to carry out autonomous maintenance.
  11. The AI-large-model-based OS running log processing method of claim 1, wherein uploading the system logs and service logs of the OS running to the cloud-side server through the log search engine further comprises: on the edge side or locally on the device, performing preliminary analysis and filtering of the system logs and service logs using a lightweight AI model; and uploading the relevant key logs and their context to the cloud-side server only when a potential anomaly is identified or a specific trigger condition is met, and otherwise archiving or discarding them locally.
  12. The AI-large-model-based OS running log processing method of claim 1, further comprising: when the fault cause located by the AI-large-model-based fault reasoning diagnosis module belongs to a known type for which automatic repair is permitted, generating an automatic repair script; and, after authorization is obtained, issuing the automatic repair script to the target device through the log search engine and executing it.
  13. The AI-large-model-based OS running log processing method of claim 1, wherein the target devices in the OS cluster comprise directly cloud-connectable devices and non-directly cloud-connectable devices.
  14. The AI-large-model-based OS running log processing method of claim 13, further comprising: when a target device is a directly cloud-connectable device, directly transmitting the system log and service log of that device to the cloud-side server through the log search engine; and when a target device is a non-directly cloud-connectable device, first transmitting the system log and service log of that device to a directly cloud-connectable device through the log search engine, and then transmitting them to the cloud-side server through the log search engine.
  15. An OS running log processing system based on an AI large model, characterized in that the system comprises: a log search engine, configured to receive an instruction from a natural language analysis module based on the AI large model, collect system logs and service logs of a plurality of target devices in an OS cluster, and upload the system logs and service logs of the OS running to a cloud-side server through the log search engine; the natural language analysis module based on the AI large model, configured to mark the content meaning of each section of log according to the system logs and service logs, infer the fault that has occurred in combination with a trained fault analysis model, locate the fault cause, obtain a fault analysis and localization report, and send the fault analysis and localization report to a fault reasoning diagnosis module based on the AI large model; and the fault reasoning diagnosis module based on the AI large model, configured to notify relevant maintainers according to the fault type and the fault cause in the fault analysis and localization report, automatically collect feedback information from the relevant maintainers, and input the feedback information to the natural language analysis module based on the AI large model.
  16. The AI-large-model-based OS running log processing system of claim 15, wherein the log search engine comprises: an instruction receiving unit, configured to receive a log acquisition instruction from the AI-large-model-based natural language analysis module in the cloud-side server, parse the log acquisition instruction, and determine the target device identifiers to be collected, the log type, the time range, and the filtering keywords; a log acquisition unit, configured to initiate a log acquisition request to the corresponding target devices in the OS cluster according to the parsing result, and receive the system logs and service logs from each target device in real time in a streaming manner or pull them in batches according to a preset period; a log preprocessing unit, configured to perform standardized preprocessing on the collected system logs and service logs, the standardized preprocessing comprising at least format unification, timestamp synchronization, and invalid-data filtering; and a log transmission unit, configured to compress and encrypt the preprocessed system logs and service logs, and upload them to the AI-large-model-based natural language analysis module of the cloud-side server through a secure transmission protocol.
  17. The AI-large-model-based OS running log processing system of claim 15, wherein the AI-large-model-based natural language analysis module comprises: a log analysis unit, configured to input the system logs and service logs into a trained fault analysis model, use the fault analysis model to understand the semantics of the log text, and automatically generate, for each log, one or more labels describing its content meaning; a cause matching unit, configured to aggregate the log sequences marked with content meaning according to time order and relevance to form an event chain, match the event chain against fault patterns in a knowledge base, perform causal reasoning on successfully matched event chains based on the fault analysis model, identify the root-cause log event, and output a diagnosis conclusion comprising the fault type, the fault cause, the occurrence device, and the judgment basis; and a report generating unit, configured to integrate the diagnosis conclusion, the related key log fragments, the content-meaning labels, and the event-chain visualization information to generate a structured fault analysis and localization report, and push the fault analysis and localization report to the AI-large-model-based fault reasoning diagnosis module through a message queue or a remote procedure call interface.
  18. The AI-large-model-based OS running log processing system of claim 15, wherein the AI-large-model-based fault reasoning diagnosis module comprises: a maintenance notification unit, configured to parse the fault analysis and localization report, extract the fault type and the fault cause therein, determine the relevant maintainers to be notified according to preset rules, and send notification messages to the relevant maintainers through mail, instant messaging tools, or an operation and maintenance alarm platform, the messages comprising at least the fault type, the fault cause, and a report link; a feedback receiving unit, configured to provide an interactive feedback interface for the relevant maintainers while the notification is sent, and collect, through the interactive feedback interface, the relevant maintainers' confirmation information, supplementary explanations, handling-measure records, or corrective opinions on the analysis results; and an optimization learning unit, configured to structure the collected feedback information, associate the processed feedback information with the corresponding original fault analysis and localization report, store it as labeled training data, and periodically perform incremental training or fine-tuning of the AI-large-model-based natural language analysis module and/or the trained fault analysis model using the stored feedback data, so as to optimize analysis and reasoning capability.
  19. A terminal comprising a memory, a processor, and an AI-large-model-based OS running log processing program stored in the memory and executable on the processor, wherein the AI-large-model-based OS running log processing program, when executed by the processor, implements the AI-large-model-based OS running log processing method of any one of claims 1 to 14.
  20. A computer-readable storage medium storing an AI-large-model-based OS running log processing program which, when executed by a processor, implements the steps of the AI-large-model-based OS running log processing method of any one of claims 1 to 14.
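Claim 3 names the fields of each log record (SN number, timestamp, process number, thread number, log level, key code identifier, log content) but not a wire format. The sketch below parses one such record, assuming a "|"-delimited layout and epoch-millisecond timestamps; both are illustrative assumptions, not part of the claims.

```python
# Minimal parser for the log record layout named in claim 3.
# The "|" delimiter and field ordering are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class LogRecord:
    sn: str          # unique serial number of the originating device (claim 2)
    timestamp: int   # time the log occurred, here epoch milliseconds (assumed)
    pid: int         # process number of the service
    tid: int         # thread number of the service
    level: str       # log level, e.g. INFO / WARN / ERROR
    key_code: str    # key code identifier
    content: str     # log content

def parse_log_line(line: str) -> LogRecord:
    """Split a delimited log line into the claim-3 header fields plus content."""
    sn, ts, pid, tid, level, key, content = line.split("|", 6)
    return LogRecord(sn, int(ts), int(pid), int(tid), level, key, content)

rec = parse_log_line("SN001|1733815200000|4120|17|ERROR|NET_TIMEOUT|connection to gateway lost")
```

Limiting the split to six delimiters (`split("|", 6)`) keeps any "|" characters inside the free-text log content intact.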

Description

OS running log processing method, system, terminal and storage medium based on AI large model

Technical Field

The invention relates to the technical field of computers, in particular to an OS running log processing method, system, terminal and computer-readable storage medium based on an AI large model.

Background

For fault analysis and maintenance (recovery operations) during the running of an OS (operating system), the existing operation and maintenance system comprises the following key steps: collecting system running logs (manually or automatically); analyzing the logs by some automated means; and, combining the log content with the service logic of the OS, accurately locating the fault and recovering from it manually. Currently, the greatest disadvantage of operation and maintenance systems is the need for deep manual participation, whether to locate faults or to restore the system to normal operation. Moreover, operation and maintenance experience accumulates with specific personnel, making operation and maintenance a high-skill-threshold staffing requirement and creating great uncertainty for high-quality operation and maintenance whenever personnel change. Accordingly, the prior art is still in need of improvement and development.

Disclosure of Invention

The invention mainly aims to provide an OS running log processing method, system, terminal and computer-readable storage medium based on an AI large model, so as to solve the problem in the prior art that, when an operation and maintenance system locates faults and restores the system to normal operation, deep manual participation is needed, specific personnel are heavily relied upon, and great uncertainty is caused to high-quality operation and maintenance.
In order to achieve the above object, the present invention provides an OS running log processing method based on an AI large model, the method comprising the steps of: a log search engine receives an instruction from a natural language analysis module based on the AI large model, collects system logs and service logs of a plurality of target devices in an OS cluster, and uploads the system logs and service logs of the OS running to a cloud-side server through the log search engine; the natural language analysis module based on the AI large model marks the content meaning of each section of log according to the system logs and service logs, infers the fault that has occurred in combination with a trained fault analysis model, locates the fault cause, obtains a fault analysis and localization report, and sends the fault analysis and localization report to a fault reasoning diagnosis module based on the AI large model; and the fault reasoning diagnosis module based on the AI large model notifies relevant maintainers according to the fault type and the fault cause in the fault analysis and localization report, automatically collects feedback information from the relevant maintainers, and inputs the feedback information to the natural language analysis module based on the AI large model.

Optionally, in the AI-large-model-based OS running log processing method, before the log search engine receives the instruction of the AI-large-model-based natural language analysis module, collects the system logs and service logs of the plurality of target devices in the OS cluster, and uploads the system logs and service logs of the OS running to the cloud-side server through the log search engine, the method further includes: each target device in the OS cluster is assigned a unique SN in advance, and the system log and the service log corresponding to each target device are marked with that SN.
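The per-device SN marking described above can be sketched as follows. This is a hedged illustration, assuming Python; the SN table, the device names, and the "SN|line" prefix convention are all hypothetical, since the description only requires that each device be assigned a unique SN in advance and that its logs be marked with it.

```python
# Sketch of SN tagging: each target device in the OS cluster is assigned a
# unique SN in advance, and every log line it emits is marked with that SN.
DEVICE_SNS = {"edge-01": "SN001", "edge-02": "SN002"}  # assigned in advance (illustrative)

def tag_logs(device: str, raw_lines: list) -> list:
    """Prefix each raw log line with the emitting device's pre-assigned SN."""
    sn = DEVICE_SNS[device]
    return [f"{sn}|{line}" for line in raw_lines]

tagged = tag_logs("edge-02", ["boot ok", "disk warning"])
```

Marking every line at the source lets the cloud-side analysis module attribute any log to its originating device even after logs from the whole cluster are merged.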
The system log and the service log each comprise a file header and file content, the file header comprising an SN number, a timestamp of the occurrence of the log, a process number of the service, a thread number of the service, a log level, a key code identifier, and the log content.

Optionally, in the AI-large-model-based OS running log processing method, the log search engine receiving the instruction of the AI-large-model-based natural language analysis module, collecting the system logs and service logs of the plurality of target devices in the OS cluster, and uploading the system logs and service logs of the OS running to the cloud-side server through the log search engine specifically includes: the log search engine receives a log acquisition instruction from the AI-large-model-based natural language analysis module in the cloud-side server, parses the log acquisition instruction, and determines the target device identifiers to be collected, the log type, the time range, and the filtering keywords; the log search engine initiates a log acquisition request to corresponding target