CN-121979740-A - Sampling strategy determining method


Abstract

An embodiment of the application provides a sampling strategy determining method applicable to the technical field of intelligent operation and maintenance. In the method, call data of a plurality of users for the same service link is acquired, the service link consisting of nodes on a plurality of hosts. A call failure rate is determined for each node, and a core node in the service link is determined based on the call failure rates of the plurality of nodes in the service link. The core node and the preceding node located before the core node in the service link are taken as a sub-core path of the service link, and the subsequent node located after the core node is taken as a non-sub-core path of the service link. A first sampling strategy is determined for the sub-core path based on the call failure rate of the core node, and a second sampling strategy is determined for the non-sub-core path. The method thereby adapts to dynamic service scenarios, ensures that key data is not missed during sampling, avoids redundant sampling of ordinary data, and improves sampling accuracy.

Inventors

CHEN HAIPENG

Assignees

深圳前海微众银行股份有限公司

Dates

Publication Date: 20260505
Application Date: 20260104

Claims (10)

1. A sampling strategy determination method, characterized by being applied to a sampling system, comprising: acquiring call data of a plurality of users on the same service link, wherein the service link consists of nodes on a plurality of hosts; determining a core node in the service link based on respective call failure rates of a plurality of nodes in the service link; taking the core node and a preceding node located before the core node in the service link as a sub-core path of the service link, and taking a subsequent node located after the core node in the service link as a non-sub-core path of the service link; and determining, based on the call failure rate of the core node, a first sampling strategy for the sub-core path and a second sampling strategy for the non-sub-core path, wherein the sampling effect of the first sampling strategy is better than that of the second sampling strategy.
2. The method of claim 1, wherein, before determining the core node in the service link based on the respective call failure rates of the plurality of nodes in the service link, the method further comprises: determining an anomaly degree of the service link based on the call data; and determining that the service link is a core link based on a service attribute of the service link and the anomaly degree of the service link.
3. The method of claim 2, further comprising: if the service link is a non-core link, setting a sampling-free strategy for the service link.
4. The method of claim 2, wherein the call data further comprises a response duration of the service link and a link call frequency of the service link; and determining the anomaly degree of the service link based on the call data comprises: determining a comprehensive failure rate of the service link according to the call failure rate of each node in the service link and the service weight of each node; and determining the anomaly degree of the service link based on the link call frequency, the response duration of the service link, and the comprehensive failure rate of the service link.
5. The method of claim 4, wherein determining the anomaly degree of the service link based on the link call frequency, the response duration of the service link, and the comprehensive failure rate of the service link comprises: determining a traffic fluctuation rate of the service link according to the link call frequency and a traffic baseline of the service link; determining a delay anomaly rate of the service link according to the node response duration of each of the plurality of nodes in the service link and a delay baseline of the service link; determining the comprehensive failure rate of the service link according to the call failure rate and the importance of each node in the service link; and determining the anomaly degree of the service link based on the traffic fluctuation rate, the delay anomaly rate, the comprehensive failure rate, and their respective weights, wherein the weight of the comprehensive failure rate is higher than the weight of the delay anomaly rate, and the weight of the delay anomaly rate is higher than the weight of the traffic fluctuation rate.
6. The method of claim 2, wherein the service attribute includes the importance of a node and the dependency depth of a node, and determining that the service link is a core link based on the service attribute of the service link and the anomaly degree of the service link comprises: determining a service degree of the service link based on the importance of each node in the service link and the dependency depth of each node; and determining that the service link is a core link according to the service degree of the service link and the anomaly degree of the service link.
7. The method of claim 1, wherein determining the core node in the service link based on the respective call failure rates of the plurality of nodes in the service link comprises: selecting, from the nodes whose call failure rate exceeds a node failure threshold, the node with the largest call failure rate as the core node in the service link.
8. The method of any one of claims 1 to 7, further comprising: if the service link is a core link and no node's call failure rate exceeds the node failure threshold, but the anomaly degree of the service link does not meet the normal standard, determining a third sampling strategy for the service link based on the anomaly degree of the service link; and if the service link is a core link, no node's call failure rate exceeds the node failure threshold, and the anomaly degree of the service link meets the normal standard, setting a fourth sampling strategy for each node based on the load condition of the host where that node is located.
9. The method of claim 8, wherein setting the fourth sampling strategy for the node based on the load condition of the host where the node is located comprises: for the host where any node in the service link is located, acquiring the CPU utilization, the memory usage ratio, and the bandwidth utilization of the host; obtaining the load condition of the host based on the CPU utilization, the memory usage ratio, and the bandwidth utilization; and setting the fourth sampling strategy for the node of the service link located on the host based on the load condition of the host.
10. The method of claim 7, further comprising: obtaining the call data of any service link through an edge layer, and executing each sampling strategy through the edge layer; determining each sampling strategy at a regional layer; and determining the core node, the core link, and the anomaly degree at a central layer.
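
The node-level steps recited in claims 1, 5, and 7 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the `Node` type, the function names, the linear weighted sum, and the default weight values are all hypothetical. The claims fix only the ordering of the weights (comprehensive failure rate > delay anomaly rate > traffic fluctuation rate), the "largest failure rate above a threshold" rule, and the split of the link around the core node.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str            # hypothetical node identifier
    failure_rate: float  # call failure rate of the node, in [0, 1]


def anomaly_degree(traffic_fluctuation: float,
                   delay_anomaly: float,
                   comprehensive_failure: float,
                   weights: Tuple[float, float, float] = (0.2, 0.3, 0.5)) -> float:
    """Claim 5, assuming a linear weighted sum; the weight values are invented."""
    w_traffic, w_delay, w_failure = weights
    # The claim only requires this ordering of the three weights.
    if not (w_failure > w_delay > w_traffic):
        raise ValueError("weights must satisfy failure > delay > traffic")
    return (w_traffic * traffic_fluctuation
            + w_delay * delay_anomaly
            + w_failure * comprehensive_failure)


def select_core_node(link: List[Node], threshold: float) -> Optional[int]:
    """Claim 7: among nodes above the failure threshold, pick the worst one.

    Returns the index of the core node, or None if no node exceeds the threshold.
    """
    candidates = [i for i, node in enumerate(link) if node.failure_rate > threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda i: link[i].failure_rate)


def split_paths(link: List[Node], core_index: int) -> Tuple[List[Node], List[Node]]:
    """Claim 1: the sub-core path is the core node plus everything before it;
    the non-sub-core path is everything after it."""
    return link[:core_index + 1], link[core_index + 1:]


# Usage with made-up data: a four-node service link and a 5% failure threshold.
link = [Node("gateway", 0.01), Node("order", 0.02),
        Node("payment", 0.30), Node("notify", 0.08)]
core = select_core_node(link, threshold=0.05)
if core is not None:
    sub_core, non_sub_core = split_paths(link, core)
    print([n.name for n in sub_core], [n.name for n in non_sub_core])
```

Returning `None` when no node exceeds the threshold leaves room for the fallback cases of claim 8 (the third and fourth sampling strategies), which this sketch does not model.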

Description

Sampling strategy determining method

Technical Field

The embodiments of the application relate to the field of intelligent operation and maintenance, and in particular to a sampling strategy determining method.

Background

In a distributed microservice architecture, an application performance monitoring (APM) system typically needs to collect data on each call chain, such as the response time of each node in the call chain, and to locate faults in the call chain based on that data.

In the related art, data is sampled for every call chain in the same system at a fixed sampling rate, without considering the load state of the call chain or the current performance of the system. If the call chain is under high load, the fixed sampling rate cannot keep up with the current load and samples are easily missed; if the call chain is under low load, redundant data is easily collected. A fixed sampling rate therefore lacks flexibility and reduces sampling accuracy.

Disclosure of Invention

The embodiments of the application provide a sampling strategy determining method for improving sampling accuracy.

In one aspect, an embodiment of the present application provides a sampling strategy determining method, the method including: acquiring call data of a plurality of users on the same service link, wherein the service link consists of nodes on a plurality of hosts; determining a core node in the service link based on respective call failure rates of a plurality of nodes in the service link; taking the core node and a preceding node located before the core node in the service link as a sub-core path of the service link, and taking a subsequent node located after the core node in the service link as a non-sub-core path of the service link; and determining, based on the call failure rate of the core node, a first sampling strategy for the sub-core path and a second sampling strategy for the non-sub-core path, wherein the sampling effect of the first sampling strategy is better than that of the second sampling strategy.

In another aspect, an embodiment of the present application provides a sampling strategy determining apparatus, including: an acquisition module, configured to acquire call data of a plurality of users on the same service link, wherein the service link consists of nodes on a plurality of hosts; a determining module, configured to determine a core node in the service link based on respective call failure rates of a plurality of nodes in the service link; a processing module, configured to take the core node and a preceding node located before the core node in the service link as a sub-core path of the service link, and to take a subsequent node located after the core node in the service link as a non-sub-core path of the service link; and a strategy module, configured to determine, based on the call failure rate of the core node, a first sampling strategy for the sub-core path and a second sampling strategy for the non-sub-core path, wherein the sampling effect of the first sampling strategy is better than that of the second sampling strategy.

Optionally, the determining module is further configured to: determine an anomaly degree of the service link based on the call data; and determine that the service link is a core link based on a service attribute of the service link and the anomaly degree of the service link.

Optionally, the determining module is further configured to: if the service link is a non-core link, set a sampling-free strategy for the service link.

Optionally, the call data further includes a response duration of the service link and a link call frequency of the service link, and the determining module is specifically configured to: determine a comprehensive failure rate of the service link according to the call failure rate of each node in the service link and the service weight of each node; and determine the anomaly degree of the service link based on the link call frequency, the response duration of the service link, and the comprehensive failure rate of the service link.

Optionally, the determining module is specifically configured to: determine a traffic fluctuation rate of the service link according to the link call frequency and a traffic baseline of the service link; determine a delay anomaly rate of the service link according to the node response duration of each of the plurality of nodes in the service link and a delay baseline of the service link; determine the comprehensive failure rate of the service link according to the call failure rate and the importance of each node in the service link; and determine the anomaly degree of the service