CN-121996359-A - Multi-type test data processing method and equipment based on distributed computing
Abstract
The embodiments of the application disclose a multi-type test data processing method and equipment based on distributed computing. The method comprises: capturing a native test data stream at each test execution node and packaging it into multi-type test data units; storing the multi-type test data units in a local streaming data buffer to generate a node test data time sequence; generating a node internal test data flow path map based on the time intervals and type conversion relations among the data units; sending all node internal test data flow path maps to a distributed coordination control system for cross-node association analysis to generate a test data cross-node flow association network; dividing the test execution nodes into test task execution node groups according to the network, and selecting a core node in each group to construct an intra-group test data convergence view; extracting, according to the dependency relations in the view, a test data unit chain corresponding to a flow test transaction spanning multiple nodes; and assigning a globally unique transaction tracking identifier to the test data unit chain and reinjecting it into the streaming data buffer of the corresponding node.
Inventors
- HUANG HUI
- Xiang Yuechao
Assignees
- Shenzhen Chip Testing Technology Co., Ltd. (深圳市芯片测试技术有限公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20260407
Claims (10)
- 1. A method for processing multi-type test data based on distributed computing, the method comprising: capturing, on each test execution node, a native test data stream generated while a test task is running, packaging the native test data stream into multi-type test data units carrying node source tags and test type tags, and storing the multi-type test data units in a local streaming data buffer to generate a node test data time sequence having a continuous time span; generating a node internal test data flow path map corresponding to each test execution node according to the time intervals and type conversion relations between the multi-type test data units in the node test data time sequence; sending the node internal test data flow path maps of all nodes to a distributed coordination control system for cross-node association analysis, and generating a test data cross-node flow association network containing directed test data transfer edges; dividing the test execution nodes into test task execution node groups according to the test data cross-node flow association network, and selecting a core test execution node from each test task execution node group to construct an intra-group test data convergence view; and extracting, according to the dependency relations in the intra-group test data convergence view, a test data unit chain corresponding to a flow test transaction spanning multiple nodes, assigning a globally unique transaction tracking identifier to the test data unit chain, and reinjecting the globally unique transaction tracking identifier into the streaming data buffer of the corresponding test execution node.
- 2. The method of claim 1, wherein capturing, on each test execution node, the native test data stream generated while a test task is running, and packaging the native test data stream into multi-type test data units carrying node source tags and test type tags, comprises: installing a test task execution agent program on each test execution node in the distributed computing environment, receiving, through the test task execution agent program, a test task start instruction sent by the distributed coordination control system, and creating a test task execution process instance in the local operating system according to task parameter information carried in the test task start instruction; while the test task execution process instance is running, intercepting, through the test task execution agent program, the data write operations that the test task execution process instance issues to the local file system, and capturing the original test data content in the memory buffer corresponding to each data write operation; parsing, from the original test data content, the task identifier of the test task currently being executed by the test task execution process instance, and acquiring the node network address of the test execution node on which the test task execution agent program runs as the node source tag; reading the test data type identification field contained in the original test data content generated by the test task execution process instance, and converting the test data type identification field into a unified test type classification code as the test type tag; arranging the captured original test data content in order of generation time, combining the original test data content captured at each time point with the corresponding task identifier, node source tag and test type tag to generate a data encapsulation header for that original test data content, and taking the data payload unit containing the data encapsulation header and the original test data content as a multi-type test data unit; and maintaining a multi-type test data unit sending queue in the test task execution agent program, and pushing the generated multi-type test data units into the multi-type test data unit sending queue in time order.
- 3. The method of claim 1, wherein storing the multi-type test data units in a local streaming data buffer to generate a node test data time sequence having a continuous time span comprises: allocating a ring buffer of a set capacity in the local memory space of each test execution node as the streaming data buffer, and configuring an independent buffer write thread and an independent buffer read thread for the streaming data buffer; continuously monitoring, through the buffer write thread, a multi-type test data unit sending queue in a test task execution agent program, and, when a multi-type test data unit awaiting processing is detected in the queue, taking the multi-type test data unit from the head of the queue; parsing the data encapsulation header of the retrieved multi-type test data unit, extracting the generation timestamp information contained in the header, calculating a write position index of the multi-type test data unit in the ring buffer according to the generation timestamp information, and writing the multi-type test data unit into the corresponding storage slot of the ring buffer according to the write position index; scanning, through the buffer read thread, all storage slots in the ring buffer at set time intervals, reading the written multi-type test data units from each storage slot, and arranging the read multi-type test data units into a linear sequence in ascending order of the write position indexes of their storage slots; computing the difference between the generation timestamps of adjacent multi-type test data units in the linear sequence, and inserting an empty data unit identifier between the adjacent multi-type test data units if the generation timestamp difference exceeds a preset time gap threshold; and performing serialization encoding on the original test data content of each multi-type test data unit in the gap-filled linear sequence to generate the node test data time sequence with a continuous time span for each test execution node.
- 4. The method according to claim 1, wherein generating the node internal test data flow path map corresponding to each test execution node according to the time intervals and type conversion relations between the multi-type test data units in the node test data time sequence comprises: parsing the node test data time sequence corresponding to each test execution node, traversing each multi-type test data unit in order from the start of the node test data time sequence, and acquiring the generation timestamp information, test type tag and task identifier in the data encapsulation header of each multi-type test data unit; calculating the time interval between adjacent multi-type test data units according to the recorded generation timestamp information, and marking adjacent multi-type test data units whose time interval is smaller than a preset proximity time threshold as a test data unit pair having a continuous generation relation; identifying, in each test data unit pair having a continuous generation relation, the conversion pattern between the test type tag of the preceding multi-type test data unit and the test type tag of the following multi-type test data unit, and determining the type conversion path along which test data types evolve in sequence within the same test task execution process; connecting the multi-type test data units under the same task identifier, in order of their generation timestamps, into an intra-task data flow path branch; merging the intra-task data flow path branches that share the same test type conversion pattern to obtain a directed graph structure that takes test type tags as nodes and type conversion relations as directed edges, the directed graph structure serving as the node internal test data flow path map; and traversing the task identifiers of all multi-type test data units in the node test data time sequence, extracting the first and last multi-type test data units for each task identifier, and marking, in the node internal test data flow path map, the start and end position nodes of the flow path corresponding to each task identifier.
- 5. The method of claim 1, wherein sending the node internal test data flow path maps of all nodes to the distributed coordination control system for cross-node association analysis, and generating the test data cross-node flow association network containing directed test data transfer edges, comprises: receiving, through the distributed coordination control system, the node internal test data flow path maps sent by all test execution nodes, assigning a corresponding node identifier to each node internal test data flow path map, and storing the received node internal test data flow path maps in a central map database; extracting, from the central map database, the set of test type tags contained in the node internal test data flow path maps of all nodes, applying global unified coding to each test type tag appearing in the set, and establishing a mapping relation table between the test type tags and the global unified codes; replacing, according to the mapping relation table, all test type tags in each node internal test data flow path map with the corresponding global unified codes to obtain uniformly coded node internal test data flow path maps; traversing the complete flow path corresponding to each task identifier in the uniformly coded node internal test data flow path maps, and extracting the global unified code of the end position node of each complete flow path as an output test type code to be matched; searching the node internal test data flow path maps of the nodes other than the node to which the current node internal test data flow path map belongs for a complete flow path, corresponding to a task identifier, whose start position node has a global unified code identical to the output test type code to be matched; if such a complete flow path is found, establishing a directed connecting edge from the node to which the current node internal test data flow path map belongs to the node to which the found complete flow path belongs, and adding the directed connecting edge to the test data cross-node flow association network; and if no matching complete flow path is found, marking the end position node of the current node internal test data flow path map as a cross-node flow breakpoint, and adding, in the test data cross-node flow association network, a cross-node flow breakpoint mark for the node to which the current node internal test data flow path map belongs.
- 6. The method according to claim 5, wherein extracting, from the central map database, the set of test type tags contained in the node internal test data flow path maps of all nodes, applying global unified coding to each test type tag appearing in the set, and establishing the mapping relation table between the test type tags and the global unified codes, comprises: starting, through the distributed coordination control system, a global unified code generation process that accesses in turn each node internal test data flow path map stored in the central map database; for the node internal test data flow path map currently being accessed, parsing all graph nodes contained in the map, each graph node corresponding to one test type tag, and extracting the original character string of the test type tag of each graph node; checking whether a global test type tag summary set already contains a character string record identical to the extracted original character string of the current test type tag, and if not, inserting the original character string of the current test type tag into the global test type tag summary set as a new entry; after all node internal test data flow path maps have been traversed, obtaining the total number of test type tag original character strings contained in the global test type tag summary set, and determining the number of coding bits of the global unified code according to the total number; assigning to each test type tag original character string, in the order in which the strings were inserted into the global test type tag summary set, an integer from a sequence increasing from an initial code value as its global unified code; and establishing key-value mappings that take the test type tag original character string as the key and the assigned global unified code as the value, and storing the key-value mappings as the mapping relation table between the test type tags and the global unified codes.
- 7. The method of claim 1, wherein dividing the test execution nodes into test task execution node groups according to the test data cross-node flow association network comprises: parsing the graph structure of the test data cross-node flow association network, and acquiring all node identifiers contained in the network and the directed connecting edges among the nodes, each directed connecting edge pointing from a source node to a target node; initializing an empty set of node groups, selecting from the test data cross-node flow association network a node identifier not yet assigned to any node group as a seed node, creating a new node group and adding the seed node to it; starting from the seed node, performing a breadth-first traversal along the direction of the directed connecting edges in the test data cross-node flow association network, and adding to the new node group all node identifiers reachable from the seed node via directed connecting edges; marking all node identifiers in the new node group as assigned, and repeating the steps of selecting an unassigned node identifier as a new seed node and creating a new node group until every node identifier in the test data cross-node flow association network has been assigned to a node group; and for each node group, computing, as the intra-group connection density, the ratio of the number of directed connecting edges present within the node group to the maximum number of directed connecting edges possible among all nodes of the node group, and, for each node group whose intra-group connection density is lower than a preset density threshold, splitting the node group into a plurality of subgroups by removing from the node group the node identifier with the fewest connecting edges, checking whether the remaining node identifiers form connected components, and taking each connected component as an independent node group.
- 8. The method according to claim 1 or 7, wherein selecting a core test execution node from each test task execution node group to construct an intra-group test data convergence view comprises: for each test task execution node group, calculating the out-degree and in-degree of each test execution node of the group in the test data cross-node flow association network, and selecting the test execution node with the largest sum of out-degree and in-degree as the core test execution node of the group; sending, by the core test execution node through the distributed coordination control system, a test data time sequence synchronization request to the other test execution nodes in the same group, the request comprising a time range start point and a time range end point for which synchronization is requested; after receiving the test data time sequence synchronization request, reading, by each of the other test execution nodes in the same group, the node test data time sequence fragment between the time range start point and the time range end point from its own streaming data buffer, packaging the fragment into a synchronization response data packet, and sending the packet back to the core test execution node; receiving, by the core test execution node, the synchronization response data packets returned by the other test execution nodes in the same group, and parsing from each packet the node test data time sequence fragment and the node identifier of the test execution node to which the fragment belongs; aligning in parallel by timestamp, by the core test execution node and using the time range start point as the alignment reference, its own node test data time sequence fragment and all received node test data time sequence fragments of the other test execution nodes, so as to generate an intra-group test data convergence view containing a plurality of rows of data sequences; and identifying, by the core test execution node in the intra-group test data convergence view, the multi-type test data units that have the same task identifier in the node test data time sequence fragments of different test execution nodes, and drawing intra-group dependency relation links among those multi-type test data units.
- 9. The method according to any one of claims 1-7, wherein extracting, according to the dependency relations in the intra-group test data convergence view, a test data unit chain corresponding to a flow test transaction spanning multiple nodes, and assigning a globally unique transaction tracking identifier to the test data unit chain, comprises: scanning, by the core test execution node, all intra-group dependency relation links drawn in the intra-group test data convergence view, and, starting from the multi-type test data unit at the start of each intra-group dependency relation link of each test task execution node, tracing along the direction of the intra-group dependency relation links; during the tracing, recording each intra-group dependency relation link traversed and the multi-type test data units at both ends of the link, and connecting the multi-type test data units in tracing order to form an initial test data unit chain; determining whether the multi-type test data unit currently being traced has a further intra-group dependency relation link pointing to a multi-type test data unit on another test execution node, and if so, continuing to trace along the newly found intra-group dependency relation link and appending the newly traced multi-type test data unit to the end of the initial test data unit chain; if the multi-type test data unit currently being traced has no intra-group dependency relation link pointing to another multi-type test data unit, stopping the tracing and determining the initial test data unit chain obtained so far as the complete test data unit chain corresponding to a flow test transaction spanning multiple test execution nodes; generating, by the core test execution node, a globally unique transaction tracking identifier for each determined test data unit chain, the globally unique transaction tracking identifier being formed by combining the node identifier of the core test execution node, a generation timestamp and an incrementing serial number; and appending the generated globally unique transaction tracking identifier to the data encapsulation header of each multi-type test data unit contained in the test data unit chain.
- 10. A computer device, comprising: a processor; a storage device having a computer program stored thereon; and a network interface for providing network communication functions, wherein the computer program, when executed by the processor, causes the processor to implement the distributed-computing-based multi-type test data processing method according to any one of claims 1-9.
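The buffering and gap-filling steps recited in claim 3 can be illustrated with a minimal single-threaded sketch. It is not the applicant's implementation: the names `DataUnit`, `RingBuffer`, `fill_gaps` and `EMPTY_UNIT`, the slot width `slot_ms`, and the slot index formula are all assumptions introduced for illustration, and the separate write and read threads of the claim are omitted. The sketch also assumes the buffer does not wrap around between two scans, so that ascending slot order equals time order.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

EMPTY_UNIT = "EMPTY"  # hypothetical stand-in for the "empty data unit identifier"


@dataclass
class DataUnit:
    timestamp_ms: int   # generation timestamp from the encapsulation header
    test_type: str      # test type tag
    payload: bytes      # original test data content


class RingBuffer:
    """Fixed-capacity buffer whose slot is derived from the timestamp (assumed formula)."""

    def __init__(self, capacity: int, base_ms: int, slot_ms: int) -> None:
        self.slots: List[Optional[DataUnit]] = [None] * capacity
        self.base_ms = base_ms
        self.slot_ms = slot_ms

    def write(self, unit: DataUnit) -> int:
        # Write position index computed from the generation timestamp.
        index = ((unit.timestamp_ms - self.base_ms) // self.slot_ms) % len(self.slots)
        self.slots[index] = unit
        return index

    def scan(self) -> List[DataUnit]:
        # Read all occupied slots in ascending slot order.
        return [unit for unit in self.slots if unit is not None]


def fill_gaps(sequence: List[DataUnit], gap_threshold_ms: int) -> List[Union[DataUnit, str]]:
    """Insert EMPTY_UNIT between neighbours whose timestamps differ by more than the threshold."""
    result: List[Union[DataUnit, str]] = []
    previous: Optional[DataUnit] = None
    for unit in sequence:
        if previous is not None and unit.timestamp_ms - previous.timestamp_ms > gap_threshold_ms:
            result.append(EMPTY_UNIT)
        result.append(unit)
        previous = unit
    return result


if __name__ == "__main__":
    buf = RingBuffer(capacity=8, base_ms=0, slot_ms=10)
    for ts in (50, 0, 10):  # units may arrive out of order
        buf.write(DataUnit(ts, "log", b"..."))
    print(fill_gaps(buf.scan(), gap_threshold_ms=20))
```

Deriving the slot from the timestamp means out-of-order arrivals land in time order without a separate sort, at the cost of two units in the same time slot overwriting each other; a production buffer would need a policy for that case.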
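The global unified coding of claim 6 amounts to building an insertion-ordered dictionary from tag strings to integers. The following sketch assumes each node internal flow path map is given as a list of `(source_tag, target_tag)` edges; that representation, the function names, and the rule used to derive the bit width from the total count are assumptions, since the claims do not fix them.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (source test type tag, target test type tag)


def build_code_table(maps: Iterable[List[Edge]], initial_code: int = 0) -> Dict[str, int]:
    """Assign increasing integer codes to tags in first-seen order."""
    table: Dict[str, int] = {}
    for edges in maps:
        for edge in edges:
            for tag in edge:
                if tag not in table:          # exact string match, as in the claim
                    table[tag] = initial_code + len(table)
    return table


def code_bit_width(total: int) -> int:
    """Bits needed to represent `total` distinct codes starting at 0 (assumed rule)."""
    return max(1, (total - 1).bit_length()) if total > 0 else 1


def recode(edges: List[Edge], table: Dict[str, int]) -> List[Tuple[int, int]]:
    """Replace every tag in one map with its global unified code."""
    return [(table[src], table[dst]) for src, dst in edges]


if __name__ == "__main__":
    node_maps = [
        [("log", "metric"), ("metric", "result")],  # map of node A
        [("result", "sql")],                        # map of node B
    ]
    table = build_code_table(node_maps)
    print(table, code_bit_width(len(table)))
    print(recode(node_maps[1], table))
```

Because Python dictionaries preserve insertion order, the table itself doubles as the summary set and the mapping relation table.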
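The seed-based grouping and the intra-group connection density of claim 7 can be sketched as follows. The function names are hypothetical, the density of a single-node group is assumed to be 1.0, duplicate edges are assumed absent, and the splitting of low-density groups is left out of this sketch.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]  # (source node identifier, target node identifier)


def group_nodes(nodes: Sequence[str], edges: Sequence[Edge]) -> List[List[str]]:
    """Group nodes by breadth-first traversal along edge direction from each unassigned seed."""
    successors: Dict[str, List[str]] = {node: [] for node in nodes}
    for source, target in edges:
        successors[source].append(target)

    assigned = set()
    groups: List[List[str]] = []
    for seed in nodes:
        if seed in assigned:
            continue
        group = [seed]
        assigned.add(seed)
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            for nxt in successors[current]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    group.append(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups


def connection_density(group: Sequence[str], edges: Sequence[Edge]) -> float:
    """Directed edges inside the group divided by the maximum possible n * (n - 1)."""
    members = set(group)
    n = len(members)
    if n < 2:
        return 1.0  # assumption: a single-node group is treated as fully dense
    internal = sum(1 for s, t in edges if s in members and t in members and s != t)
    return internal / (n * (n - 1))


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D", "E"]
    edges = [("A", "B"), ("B", "C"), ("D", "E")]
    for g in group_nodes(nodes, edges):
        print(g, round(connection_density(g, edges), 3))
```

Because the traversal follows edge direction only, the resulting groups depend on the order in which seeds are picked: seeding from `B` before `A` in the example would place `A` in a group of its own.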
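The chain tracing and identifier generation of claim 9 can be sketched in a few lines. The sketch assumes the intra-group dependency relation links are available as a dictionary from one unit key to the next and that each unit has at most one outgoing link; the key format, the `-` separator in the identifier, and the names `extract_chains` and `TraceIdGenerator` are illustrative assumptions only.

```python
import itertools
import time
from typing import Dict, List, Optional


def extract_chains(links: Dict[str, str]) -> List[List[str]]:
    """Follow dependency links from every unit that is not the target of any link."""
    targets = set(links.values())
    heads = [unit for unit in links if unit not in targets]
    chains: List[List[str]] = []
    for head in heads:
        chain = [head]
        seen = {head}
        # Keep tracing while the last unit still has an outgoing link.
        while chain[-1] in links and links[chain[-1]] not in seen:
            nxt = links[chain[-1]]
            chain.append(nxt)
            seen.add(nxt)
        chains.append(chain)
    return chains


class TraceIdGenerator:
    """Combines core node identifier, generation timestamp and an incrementing serial number."""

    def __init__(self, core_node_id: str) -> None:
        self._core_node_id = core_node_id
        self._serial = itertools.count(1)

    def next_id(self, now_ms: Optional[int] = None) -> str:
        timestamp = int(time.time() * 1000) if now_ms is None else now_ms
        return f"{self._core_node_id}-{timestamp}-{next(self._serial)}"


if __name__ == "__main__":
    # Hypothetical unit keys of the form "<node>:<unit>".
    links = {"n1:u1": "n2:u4", "n2:u4": "n3:u7", "n1:u2": "n2:u5"}
    generator = TraceIdGenerator("node-7")
    for chain in extract_chains(links):
        print(generator.next_id(), chain)
```

The serial number keeps identifiers distinct when several chains are stamped within the same millisecond on one core node, while the node identifier keeps them distinct across groups.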
Description
Multi-type test data processing method and equipment based on distributed computing
Technical Field
The embodiments of the application relate to the technical field of computer data processing, and in particular to a method and equipment for processing multi-type test data based on distributed computing.
Background
In the development and testing of large-scale distributed systems and cloud computing platforms, tasks such as performance testing, stress testing and functional verification generally have to be completed cooperatively by scheduling multiple test execution nodes. While executing test tasks, these nodes continuously generate massive native test data streams, including but not limited to application logs, performance monitoring metrics, interface call records, database operation statements and custom test result output. In the prior art, such test data is usually handled with a centralized log collection scheme: a log collection agent deployed on each node pushes locally generated log files, periodically or in real time, to a centralized log storage system such as Elasticsearch or a distributed file system. Another common approach uses a streaming data processing framework, such as Apache Kafka combined with Flink or Spark Streaming, to feed the data generated by each node into a unified data pipeline as messages for subsequent offline or real-time analysis, thereby achieving centralized storage of massive test data together with basic keyword retrieval or metric aggregation.
In addition, some test management platforms generate a unique request identifier for each test request through manual instrumentation of the test scripts, record the identifier in the processing logs on the different service nodes, and then aggregate the logs using the request identifier as the association key, achieving call chain tracing at single-request granularity.
Disclosure of Invention
The embodiments of the application provide a method and equipment for processing multi-type test data based on distributed computing. In one aspect, an embodiment of the application provides a method for processing multi-type test data based on distributed computing, applied to a computer device, the method comprising: capturing, on each test execution node, a native test data stream generated while a test task is running, packaging the native test data stream into multi-type test data units carrying node source tags and test type tags, and storing the multi-type test data units in a local streaming data buffer to generate a node test data time sequence having a continuous time span; generating a node internal test data flow path map corresponding to each test execution node according to the time intervals and type conversion relations between the multi-type test data units in the node test data time sequence; sending the node internal test data flow path maps of all nodes to a distributed coordination control system for cross-node association analysis, and generating a test data cross-node flow association network containing directed test data transfer edges; dividing the test execution nodes into test task execution node groups according to the test data cross-node flow association network, and selecting a core test execution node from each test task execution node group to construct an intra-group test data convergence view; and extracting, according to the dependency relations in the intra-group test data convergence view, a test data unit chain corresponding to a flow test transaction spanning multiple nodes, assigning a globally unique transaction tracking identifier to the test data unit chain, and reinjecting the globally unique transaction tracking identifier into the streaming data buffer of the corresponding test execution node. In another aspect, an embodiment of the application provides a computer device comprising a processor, a storage device and a network interface, wherein the storage device stores a computer program, the network interface provides network communication functions, and the computer program, when executed by the processor, causes the processor to implement any of the distributed-computing-based multi-type test data processing methods described above. In a further aspect, embodiments of the application provide a readable storage medium storing a program or instructions which, when executed by a processor, implement the steps of the distributed-computing-based multi-type test data processing method. By constructing a complete technical pipeline from native data capture to reinjection of global transaction identifiers, the application achieves in-depth decomposition and explicit characterization of cross-node data flow logic in distributed test scenarios. Firstly, g