CN-121979931-A - Intelligent agent behavior description method based on knowledge graph
Abstract
The invention relates to the technical field of knowledge graph construction and discloses an agent behavior description method based on a knowledge graph. First, a processor parses an unstructured agent operation log data stream and, using the topological structure of a code knowledge graph, maps it to a discretized module access sequence, which removes text redundancy. Next, multi-dimensional time series analysis is performed on the sequence to calculate residence distribution data characterizing operation persistence, frequency domain fluctuation data characterizing the switching rhythm, and multi-scale coverage extension data characterizing the traversal range. These data are then aggregated into a fixed-length multi-dimensional index vector, an index node associated with the session entity is constructed in the graph database, and the vector is written into a storage field of the node as a binary structured attribute. The invention converts massive unstructured logs into compact, graph-structured indexes, and supports rapid locating and direct retrieval of complex agent operation modes while significantly reducing storage space.
Inventors
- ZHU ZHENYU
- SUN JING
- LIU WEIWEI
Assignees
- 南京宇天智云仿真技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260401
Claims (7)
- 1. A knowledge-graph-based agent behavior description method, characterized by comprising the following steps: parsing, by a processor, an unstructured agent operation log data stream, and mapping the agent operation log data stream to a code knowledge graph to generate a discretized module access sequence; performing time series analysis on the module access sequence to calculate residence distribution feature data characterizing operation persistence within a module, frequency domain feature data characterizing the inter-module context switching frequency, and multi-scale coverage extension feature data characterizing the code space traversal range; generating a multi-dimensional feature index vector from all of the feature data; and constructing an index node associated with a session entity in the database storage space of the code knowledge graph, and writing the multi-dimensional feature index vector into an attribute field of the index node as binary structured data, so as to support retrieval of operation modes through the attribute field.
- 2. The knowledge-graph-based agent behavior description method according to claim 1, wherein parsing, by a processor, an unstructured agent operation log data stream and mapping the agent operation log data stream to a code knowledge graph to generate a discretized module access sequence comprises: partitioning the code knowledge graph using a community detection algorithm, so that the code entity nodes in the code knowledge graph are assigned to mutually non-overlapping functional module node sets; traversing each operation record in the agent operation log data stream in chronological order; extracting the set of mentioned code entities from the chain-of-thought text data of the operation record; calculating the size of the intersection between this code entity set and the code entities contained in each functional module node; selecting the functional module node with the largest intersection size as the access module of the current time step; inheriting the access module of the previous time step if every intersection is empty; and arranging the access modules determined for all time steps in chronological order to form the discretized module access sequence.
- 3. The knowledge-graph-based agent behavior description method according to claim 2, wherein performing time series analysis on the module access sequence to calculate residence distribution feature data characterizing operation persistence within a module comprises: identifying the segments of the module access sequence that stay continuously on the same functional module node, and counting the duration of each segment to generate a single-residence-time sequence; calculating an empirical probability distribution from the single-residence-time sequence; constructing a two-segment power-law model which, taking a preset residence time demarcation threshold as the boundary, defines a first power-law decay function on a first interval less than or equal to the demarcation threshold and a second power-law decay function on a second interval greater than the demarcation threshold; performing least-squares regression fitting, in a logarithmic coordinate system, on the empirical probability distribution data of the first interval and of the second interval respectively, to obtain a first-interval power-law index and a second-interval power-law index; and taking the first-interval power-law index and the second-interval power-law index as the residence distribution feature data characterizing operation persistence within a module.
- 4. The knowledge-graph-based agent behavior description method according to claim 3, wherein performing time series analysis on the module access sequence to calculate frequency domain feature data characterizing the inter-module context switching frequency comprises: converting the discretized module access sequence into a binary switching sequence, in which a time step is marked as a switching state when the access module changes between adjacent time steps and as a holding state otherwise; subtracting the mean from the binary switching sequence to obtain a zero-mean fluctuation sequence; performing a discrete Fourier transform on the zero-mean fluctuation sequence and calculating the power spectral density value corresponding to each frequency component; selecting a preset frequency band; establishing a linear regression model of the power spectral density values against the frequency values in a logarithmic coordinate system; and calculating the absolute value of the slope parameter of the linear regression model as the frequency domain feature data characterizing the inter-module context switching frequency.
- 5. The knowledge-graph-based agent behavior description method according to claim 4, wherein performing time series analysis on the module access sequence to calculate multi-scale coverage extension feature data characterizing the code space traversal range comprises: setting a group of temporal coarse-graining scales that increase as a geometric series; for each temporal coarse-graining scale, dividing the discretized module access sequence into mutually non-overlapping time windows; determining the access module with the highest occurrence frequency in each time window as the dominant module of that window; counting the number of distinct modules among the dominant modules of all time windows at that scale as the number of covered modules at that scale; establishing a power-law decay model of the number of covered modules as a function of the temporal coarse-graining scale; fitting the power-law decay model in a logarithmic coordinate system; and extracting the decay index of the power-law decay model as the multi-scale coverage extension feature data characterizing the code space traversal range.
- 6. The knowledge-graph-based agent behavior description method according to claim 5, wherein generating a multi-dimensional feature index vector from all of the feature data comprises: extracting the first-interval power-law index and the second-interval power-law index from the residence distribution feature data, the absolute value of the slope parameter from the frequency domain feature data, and the decay index from the multi-scale coverage extension feature data, to form a basic feature set; calculating the numerical difference between the first-interval power-law index and the second-interval power-law index as a heavy-tail asymmetry degree; calculating the ratio of the second-interval power-law index to the absolute value of the slope parameter as a time-frequency coupling strength feature parameter; dividing the sum of the second-interval power-law index and the absolute value of the slope parameter by the decay index to obtain a coverage compression ratio feature parameter; concatenating the basic feature set, the heavy-tail asymmetry degree, the time-frequency coupling strength feature parameter and the coverage compression ratio feature parameter into an original behavior feature vector; and standardizing the original behavior feature vector using a preset mean vector and a preset standard deviation vector to obtain the multi-dimensional feature index vector.
- 7. The knowledge-graph-based agent behavior description method according to claim 6, wherein constructing an index node associated with a session entity in the database storage space of the code knowledge graph and writing the multi-dimensional feature index vector into an attribute field of the index node as binary structured data, so as to support retrieval of operation modes through the attribute field, comprises: instantiating an index node of the session entity type in the code knowledge graph, and storing the multi-dimensional feature index vector as a vector attribute of the index node; counting the total residence time and the access frequency of each functional module node in the module access sequence; establishing a directed association edge between the index node and each accessed functional module node; calculating the proportion of the total residence time to the total duration of the sequence as a weight attribute, and writing the weight attribute and the access frequency into the data fields of the directed association edge; calculating the Euclidean distance between the multi-dimensional feature index vector of the currently constructed index node and the vectors of the history index nodes already existing in the database; and establishing a similarity association edge between the current index node and each history index node, and recording the Euclidean distance as an edge attribute of the similarity association edge.
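The feature extraction steps of claims 2 to 5 can be illustrated with a minimal Python sketch. It is not the patented implementation: all function names are hypothetical, the module partition is assumed to be given (the community detection step is omitted), each operation record is assumed to be already reduced to a set of mentioned code entities, and the sign conventions (decay exponents reported as positive numbers), the periodogram normalization, the handling of a trailing partial window and the tie-breaking rule are assumptions that the claims do not fix.

```python
import cmath
import math
from collections import Counter
from itertools import groupby


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def loglog_slope(xs, ys):
    """Slope of log(y) against log(x); non-positive points are skipped."""
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len({p[0] for p in pts}) < 2:
        return float("nan")  # not enough distinct points to fit a line
    return fit_slope([p[0] for p in pts], [p[1] for p in pts])


def module_sequence(records, modules):
    """Claim 2: pick the module with the largest entity overlap per record."""
    seq, prev = [], None
    for entities in records:
        overlap = {name: len(entities & members) for name, members in modules.items()}
        best = max(overlap, key=overlap.get)  # ties: first module wins (assumption)
        if overlap[best] > 0:
            prev = best
        seq.append(prev)  # inherits the previous module when nothing overlaps
    return seq  # leading entries stay None until a first match (assumption)


def dwell_exponents(seq, threshold):
    """Claim 3: decay exponents of the dwell-time distribution on both sides
    of the threshold, reported as positive numbers (assumption)."""
    dwells = [len(list(run)) for _, run in groupby(seq)]
    counts, total = Counter(dwells), len(dwells)

    def exponent(durations):
        return -loglog_slope(durations, [counts[d] / total for d in durations])

    short = sorted(d for d in counts if d <= threshold)
    long_ = sorted(d for d in counts if d > threshold)
    return exponent(short), exponent(long_)


def switching_spectral_slope(seq, f_lo, f_hi):
    """Claim 4: |slope| of log PSD against log frequency of the switch indicator.
    Expects at least two time steps."""
    switches = [1.0 if a != b else 0.0 for a, b in zip(seq, seq[1:])]
    n = len(switches)
    mean = sum(switches) / n
    x = [s - mean for s in switches]  # zero-mean fluctuation sequence
    freqs, psd = [], []
    for k in range(1, n // 2 + 1):
        f = k / n  # frequency in cycles per time step
        if f_lo <= f <= f_hi:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            freqs.append(f)
            psd.append(abs(coeff) ** 2 / n)  # periodogram estimate
    return abs(loglog_slope(freqs, psd))


def coverage_exponent(seq, scales):
    """Claim 5: decay exponent of the number of distinct dominant modules
    against the window size; a trailing partial window is kept (assumption)."""
    covered = []
    for s in scales:
        windows = [seq[i:i + s] for i in range(0, len(seq), s)]
        covered.append(len({Counter(w).most_common(1)[0][0] for w in windows}))
    return -loglog_slope(scales, covered)
```

The direct DFT loop is quadratic in the sequence length, which keeps the sketch dependency-free; a production version would normally use an FFT routine instead.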
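The vector construction of claim 6 and the index node construction of claim 7 can be sketched in the same hedged way. The example below stands in plain dictionaries for the graph database, so no real database API is implied. The function names, the dictionary keys, the packing of the vector as little-endian 64-bit floats, the order of the subtraction in the asymmetry term, and the reading of "access frequency" as the number of separate entries into a module are all assumptions made for illustration.

```python
import math
import struct
from collections import Counter


def feature_vector(alpha1, alpha2, beta, gamma, mean, std):
    """Claim 6: four base features plus three derived ones, then standardized.

    alpha1, alpha2: first- and second-interval power-law indices
    beta: absolute spectral slope; gamma: coverage decay index
    mean, std: preset per-dimension mean and standard deviation (length 7)
    """
    raw = [
        alpha1, alpha2, beta, gamma,
        alpha1 - alpha2,          # heavy-tail asymmetry degree
        alpha2 / beta,            # time-frequency coupling strength
        (alpha2 + beta) / gamma,  # coverage compression ratio
    ]
    return [(v - m) / s for v, m, s in zip(raw, mean, std)]


def session_index(session_id, seq, vector, history):
    """Claim 7: build the index node, module edges and similarity edges.

    seq: module access sequence of the session
    history: mapping of earlier session ids to their feature vectors
    """
    node = {
        "id": session_id,
        "type": "Session",
        # binary structured attribute: little-endian doubles (assumed encoding)
        "vector": struct.pack(f"<{len(vector)}d", *vector),
    }
    dwell = Counter(seq)  # time steps spent in each module
    # number of separate entries into each module (assumed meaning of frequency)
    visits = Counter(m for i, m in enumerate(seq) if i == 0 or seq[i - 1] != m)
    module_edges = [
        {"from": session_id, "to": m, "weight": dwell[m] / len(seq), "visits": visits[m]}
        for m in dwell
    ]
    similar_edges = [
        {"from": session_id, "to": other_id, "distance": math.dist(vector, other)}
        for other_id, other in history.items()
    ]
    return node, module_edges, similar_edges
```

Linking the new node to every history node, as the claim wording suggests, grows the edge count quadratically with the number of sessions; a deployment might cap the edges at the nearest neighbours, but that choice is outside what the claims state.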
Description
Intelligent agent behavior description method based on knowledge graph
Technical Field
The invention relates to the technical field of knowledge graph construction, and in particular to an agent behavior description method based on a knowledge graph.
Background
With the wide application of agents driven by large language models in software engineering, the operation log data generated by code editing agents has grown explosively. These data record in detail the complete process of the agent's reasoning, tool invocation and code modification in the code repository, and are typically stored as unstructured natural language text streams. To monitor the running state of an agent, audit the safety of its operations, or evaluate the quality of its task completion, the prior art relies mainly on keyword matching, rule-based regular expression extraction, or simple text similarity calculation over the original log, in an attempt to reconstruct the operation path of the agent and perform basic statistical analysis. However, existing log analysis methods struggle to capture the complex dynamics and deeper statistical properties hidden behind agent behavior. For example, when handling complex programming tasks, high-level agents often exhibit heterogeneous residence patterns at specific critical modules, in which long periods of deep reasoning alternate with short bursts of fast searching (manifested as heavy-tailed distributions), a strategy switching rhythm between functional modules with long-range correlation (manifested as a specific frequency domain noise spectrum), and multi-scale fractal coverage of the code topology space.
Traditional linear statistical methods and simple frequency aggregation cannot identify these nonlinear spatio-temporal mechanisms, so a large amount of key information contained in the time series fluctuations is ignored, and the system cannot distinguish random, ineffective trial and error from strategic deep reasoning at the data level. In addition, existing code knowledge graph technology can effectively manage the static dependency relationships among code entities, but lacks an effective mapping mechanism for reducing the dimensionality of an agent's dynamic time series data and persisting it in a graph storage structure. Because there is no index construction technique that integrates dynamic behavior parameters with static code topology, current database systems cannot support direct retrieval and efficient querying of agent operation modes (for example, directly searching for session records with a specific deep reasoning rhythm or a specific coverage breadth). Analysts are therefore forced to perform inefficient full scans over massive raw logs, which greatly limits the in-depth mining and utilization of large-scale agent behavior data.
Disclosure of Invention
The invention provides an agent behavior description method based on a knowledge graph, which solves the technical problems described in the background.
The invention provides an agent behavior description method based on a knowledge graph, which comprises the following steps: parsing, by a processor, an unstructured agent operation log data stream, and mapping the agent operation log data stream to a code knowledge graph to generate a discretized module access sequence; performing time series analysis on the module access sequence to calculate residence distribution feature data characterizing operation persistence within a module, frequency domain feature data characterizing the inter-module context switching frequency, and multi-scale coverage extension feature data characterizing the code space traversal range; generating a multi-dimensional feature index vector from all of the feature data; and constructing an index node associated with a session entity in the database storage space of the code knowledge graph, and writing the multi-dimensional feature index vector into an attribute field of the index node as binary structured data, so as to support retrieval of operation modes through the attribute field. The method has the following advantages. By constructing a storage structure of behavior index nodes and multi-dimensional feature attribute fields in the code knowledge graph database, it achieves dimension-reduced, compressed storage and structured indexing of massive unstructured agent operation logs. It converts full-text scanning and semantic matching tasks, which originally required high computational cost, into low-latency graph topology traversal and vector distance calculation tasks. While significantly reducing the storage space occupied in the database, it greatly improves the I/O throughput and query response speed of the computer system when processing complex operation mode matching and history tracing, thereby optimizing the operation performanc