JP-7857343-B2 - Data recording and analysis system


Inventors

後藤正治

Assignees

Keysight Technologies, Inc.

Dates

Publication Date: 20260512
Application Date: 20240607
Priority Date: 20190402

Claims (17)

A system for recording and analyzing a data stream, the system comprising: an input port adapted to receive the data stream, wherein the data stream includes an ordered sequence of data values; an output port adapted to communicate the data stream to a mass storage device; a buffer connected to the input port and adapted to temporarily store a predetermined portion of the data stream as the data stream is received by the system; and a controller that identifies a new segment of the data stream stored in the buffer, called an extracted data segment (EDS), that satisfies an extraction protocol, and compares the new EDS to each of a plurality of reference data segments (RDSs) using a first similarity protocol, wherein the controller stores information identifying the new EDS in an RDS database if the first similarity protocol indicates that the new EDS is similar to one of the RDSs, and generates a new RDS if the new EDS is not similar to any of the RDSs, each RDS including a list of the EDSs found to be similar to that RDS and the new EDS that caused the controller to generate that RDS.
The system according to claim 1, wherein the first similarity protocol calculates a distance measurement between two data segments and a similarity threshold, and the two data segments are defined as similar if the distance has a predetermined relationship with the similarity threshold.
The system according to claim 2, wherein the controller generates a plurality of new RDSs by comparing the EDSs associated with an existing RDS with each other using a second similarity protocol that is more restrictive than the first similarity protocol.
The system according to claim 1, wherein the extraction protocol identifies the data value in the buffer at which the new EDS starts and the data value in the buffer at which the new EDS ends, and wherein the data value at which the new EDS ends is a fixed number of sample values from the data value at which the new EDS starts.
The system according to claim 1, wherein the first similarity protocol calculates a distance measurement between two data segments and a similarity threshold, and defines the two data segments as similar if the distance measurement has a predetermined relationship with the similarity threshold, and wherein the controller combines two of the RDSs in response to user input if the RDSs are similar to each other as determined by a second similarity protocol that is less restrictive than the first similarity protocol.
The system according to claim 1, wherein the first similarity protocol calculates a distance measurement between two data segments and a similarity threshold, and defines the two data segments as similar if the distance measurement has a predetermined relationship with the similarity threshold, and wherein the controller generates a plurality of new RDSs from an existing RDS by comparing the EDSs associated with that RDS with one another, using a second similarity protocol that is more restrictive than the first similarity protocol.
The system according to claim 1, wherein the controller generates a compressed data stream by replacing each EDS with a symbol representing the RDS found to be similar to that EDS.
The system according to claim 1, wherein the controller generates a compressed data stream by replacing each EDS with a symbol representing the RDS found to be similar to that EDS, and replaces each sequence of data values that is not part of an EDS with a count indicating the number of symbols in the sequence.
A method for operating a data processing system to analyze a data stream containing an ordered sequence of data values against a cluster of signals, the method comprising: receiving the data stream sequentially and assigning an index to each data value as it is received; storing a portion of the received data stream in a buffer; extracting, from the buffer, a new extracted data segment (EDS) that satisfies an extraction protocol; and comparing the new EDS to each of a plurality of reference data segments (RDSs) using a first similarity protocol, wherein the data processing system stores information identifying the new EDS in an RDS database if the first similarity protocol indicates that the new EDS is similar to one of the RDSs, and generates a new RDS if the new EDS is not similar to any of the RDSs.
The method according to claim 9, wherein the extraction protocol identifies the data value in the buffer at which the new EDS starts and the data value in the buffer at which the new EDS ends, and wherein the data value at which the new EDS ends is a fixed number of sample values from the data value at which the new EDS starts.
The method according to claim 9, wherein the data processing system calculates a distance measurement between two data segments and a similarity threshold and defines the two data segments as similar if the distance measurement has a predetermined relationship with the similarity threshold, and wherein the data processing system combines two of the RDSs in response to user input if the RDSs are similar to each other as determined by a second similarity protocol that is less restrictive than the first similarity protocol.
The method according to claim 9, wherein the data processing system calculates a distance measurement between two data segments and a similarity threshold and defines the two data segments as similar if the distance measurement has a predetermined relationship with the similarity threshold, and wherein the data processing system generates a plurality of new RDSs from an existing RDS by comparing the EDSs associated with that RDS with one another, using a second similarity protocol that is more restrictive than the first similarity protocol.
The method according to claim 9, wherein the data processing system generates a compressed data stream by replacing each EDS with a symbol representing the RDS found to be similar to that EDS.
The method according to claim 9, wherein the data processing system generates a compressed data stream by replacing each EDS with a symbol representing the RDS found to be similar to that EDS, and replaces each sequence of data values that is not part of an EDS with a count indicating the number of symbols in the sequence.
A computer-readable memory comprising instructions for causing a data processing system to perform a method of analyzing a data stream containing an ordered sequence of data values against a cluster of signals, the method comprising: receiving the data stream sequentially and assigning an index to each data value as it is received; storing a portion of the received data stream in a memory buffer; extracting, from the buffer, a new extracted data segment (EDS) that satisfies an extraction protocol; comparing the new EDS to each of a plurality of reference data segments (RDSs) using a first similarity protocol; storing information identifying the new EDS in an RDS database if the first similarity protocol indicates that the new EDS is similar to one of the RDSs; and generating a new RDS if the new EDS is not similar to any of the RDSs.
The computer-readable memory according to claim 15, wherein the data processing system generates a compressed data stream by replacing each EDS with a symbol representing the RDS found to be similar to that EDS.
The computer-readable memory according to claim 15, wherein the data processing system generates a compressed data stream by replacing each EDS with a symbol representing the RDS found to be similar to that EDS, and replaces each sequence of data values that is not part of an EDS with a count indicating the number of symbols in the sequence.
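The compression recited in the last few claims can be illustrated with a short Python sketch. It assumes the EDS matches are already available as hypothetical `(start, end, rds_id)` tuples with `end` exclusive, and it reads the claimed "count indicating the number of symbols" as the number of data values in each uncovered run. That reading, and the `Symbol`, `Count`, and `compress` names, are illustrative assumptions rather than the patented format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Symbol:
    rds_id: int  # hypothetical identifier of the RDS an EDS was matched to


@dataclass(frozen=True)
class Count:
    n: int  # length of a run of data values not covered by any EDS (assumed reading)


Token = Union[Symbol, Count]


def compress(stream_len: int, matches: Sequence[Tuple[int, int, int]]) -> List[Token]:
    """Replace each EDS with its RDS symbol and each uncovered run with a count.

    matches holds (start, end, rds_id) per EDS, with end exclusive.
    """
    tokens: List[Token] = []
    pos = 0  # index of the first data value not yet represented in the output
    for start, end, rds_id in sorted(matches):
        if start < pos:
            raise ValueError("EDSs must not overlap")
        if start > pos:
            tokens.append(Count(start - pos))  # gap before this EDS
        tokens.append(Symbol(rds_id))  # the whole EDS collapses to one symbol
        pos = end
    if pos < stream_len:
        tokens.append(Count(stream_len - pos))  # trailing gap after the last EDS
    return tokens
```

For a 20-value stream with EDSs at [2, 6), [6, 10), and [14, 18), matched to RDSs 0, 1, and 0, the sketch produces `[Count(2), Symbol(0), Symbol(1), Count(4), Symbol(0), Count(2)]`. If every EDS has the same fixed length, the position of each EDS in the original stream can be recovered by summing counts and segment lengths.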

Description

Data recording systems can now store massive amounts of data, and because of the sheer volume, searching the stored data by reading it back sequentially takes an extremely long time. Data sets of terabytes or more are recorded daily, and retrieving terabytes of data from a conventional disk drive takes several hours. Quickly searching the recorded data for a target pattern is therefore a significant challenge.

The present invention includes a system for recording and analyzing a data stream, a method for analyzing the data stream, and a computer-readable memory storing instructions that cause a computer to perform the method. The system comprises an input port, an output port, a buffer, and a controller. The input port is adapted to receive a data stream containing an ordered sequence of data values. The output port is adapted to communicate the data stream to a mass storage device. The buffer is connected to the input port and temporarily stores a predetermined portion of the data stream as the data stream is received by the system. The controller identifies a new segment of the data stream stored in the buffer, called an extracted data segment (EDS), that satisfies an extraction protocol. The controller compares the new EDS to each of a plurality of reference data segments (RDSs) using a first similarity protocol. If the first similarity protocol indicates that the new EDS is similar to one of the RDSs, the controller stores information identifying the new EDS in an RDS database. If the new EDS is not similar to any of the existing RDSs, the controller generates a new RDS. Each RDS contains a list of the EDSs found to be similar to that RDS, together with the new EDS that caused the controller to generate it. In one embodiment, the buffer includes a FIFO buffer.
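The recording-and-clustering loop described above can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not state: the FIFO buffer is a `collections.deque` with a fixed `maxlen`; an EDS starts where the signal crosses a hypothetical `trigger` level upward and ends `seg_len` samples later; the first similarity protocol is the Euclidean distance to each RDS's founding EDS, with "similar" meaning the distance is below `threshold`; and a new EDS joins the nearest qualifying RDS. The `RDS`, `euclidean`, and `cluster_stream` names are hypothetical.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Iterable, List, Optional, Sequence, Tuple


@dataclass
class RDS:
    """Hypothetical RDS record: a reference segment plus its member EDSs."""
    reference: Tuple[float, ...]  # the EDS that caused this RDS to be created
    members: List[int] = field(default_factory=list)  # start indices of member EDSs


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed distance measure; the text does not specify one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_stream(stream: Iterable[float], seg_len: int, threshold: float,
                   trigger: float = 0.5) -> List[RDS]:
    # FIFO buffer of (index, value) pairs; the oldest entry drops out when full.
    buf: Deque[Tuple[int, float]] = deque(maxlen=seg_len)
    rds_db: List[RDS] = []
    prev: Optional[float] = None
    start: Optional[int] = None  # index at which a pending EDS begins

    for idx, value in enumerate(stream):  # index assigned as each value arrives
        buf.append((idx, value))
        # Assumed extraction rule: an EDS starts at an upward trigger crossing...
        if start is None and prev is not None and prev < trigger <= value:
            start = idx
        prev = value
        # ...and ends a fixed number of samples (seg_len) after it starts.
        if start is not None and idx == start + seg_len - 1:
            eds = tuple(v for _, v in buf)  # the buffer now holds exactly this EDS
            # Assumed first similarity protocol: nearest RDS, distance < threshold.
            best = min(rds_db, key=lambda r: euclidean(eds, r.reference), default=None)
            if best is not None and euclidean(eds, best.reference) < threshold:
                best.members.append(start)  # record the EDS under the existing RDS
            else:
                rds_db.append(RDS(reference=eds, members=[start]))  # found a new RDS
            start = None
    return rds_db
```

For the stream `[0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0]` with `seg_len=3` and `threshold=0.5`, the sketch extracts EDSs starting at indices 1, 5, and 9. The first two are identical and share one RDS; the third, `(1, 0, 0)`, is too far from it and founds a second RDS.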
In another embodiment, the extraction protocol identifies the data value in the buffer at which a new EDS starts and the data value in the buffer at which it ends. In another embodiment, the data value at which a new EDS ends is a fixed number of sample values from the data value at which it starts. In another embodiment, the first similarity protocol calculates a distance measure between two data segments and a similarity threshold, and defines the two data segments as similar if the distance has a predetermined relationship with the similarity threshold. In another embodiment, the controller combines two RDSs in response to user input if the RDSs are similar to each other as determined by a second similarity protocol that is less restrictive than the first similarity protocol. In another embodiment, the controller generates multiple new RDSs from an existing RDS by comparing the EDSs associated with that RDS with each other, using a second similarity protocol that is more restrictive than the first similarity protocol. In another embodiment, the controller generates a compressed data stream by replacing each EDS with a symbol representing the RDS found to be similar to that EDS. In another embodiment, the controller replaces each sequence of data values that is not part of an EDS with a count indicating the number of symbols in that sequence.

The present invention also includes a method for operating a data processing system to analyze a data stream containing an ordered sequence of data values against a cluster of signals. The method includes receiving the data stream sequentially and assigning an index to each data value as it is received. A portion of the received data stream is stored in a buffer, and new EDSs satisfying an extraction protocol are extracted from the buffer. The data processing system compares each new EDS to each of a plurality of RDSs using a first similarity protocol.
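The two RDS-refinement embodiments above, splitting an RDS with a more restrictive second similarity protocol and merging RDSs found similar under a less restrictive one, might look like the following Python sketch. It assumes the same Euclidean distance and "below threshold" rule for both protocols, so "more restrictive" simply means a smaller threshold and "less restrictive" a larger one. The user's decision is modeled by a hypothetical `confirm` callback, and all function names are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Segment = Tuple[float, ...]


def distance(a: Segment, b: Segment) -> float:
    # Assumed distance measure shared by both similarity protocols.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_rds(members: Dict[int, Segment], strict_threshold: float) -> List[List[int]]:
    """Re-cluster one RDS's EDSs with a tighter (more restrictive) threshold.

    members maps EDS start index -> EDS values. Each returned group of start
    indices would become a new RDS whose reference is the group's first EDS.
    """
    groups: List[Tuple[Segment, List[int]]] = []
    for start in sorted(members):
        eds = members[start]
        for ref, ids in groups:
            if distance(eds, ref) < strict_threshold:
                ids.append(start)  # similar under the stricter protocol
                break
        else:
            groups.append((eds, [start]))  # no match: start a new group
    return [ids for _, ids in groups]


def merge_candidates(refs: Sequence[Segment], loose_threshold: float,
                     confirm: Callable[[int, int], bool]) -> List[Tuple[int, int]]:
    """Pairs of RDSs (by position) similar under a looser threshold and user-approved."""
    pairs: List[Tuple[int, int]] = []
    for i in range(len(refs)):
        for j in range(i + 1, len(refs)):
            if distance(refs[i], refs[j]) < loose_threshold and confirm(i, j):
                pairs.append((i, j))
    return pairs
```

Here `merge_candidates` only reports the pairs the user approved; combining their member lists into a single RDS is left to the caller, which keeps the user-input step separate from the database update.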
If the first similarity protocol indicates that the new EDS is similar to one of the RDSs, information identifying the new EDS is stored in the RDS database. If the new EDS is not similar to any of the RDSs, a new RDS is generated. In another embodiment, the extraction protocol identifies the data value in the buffer at which a new EDS starts and the data value in the buffer at which it ends. In another embodiment, the data value at which a new EDS ends is a fixed number of sample values from the data value at which it started. In another embodiment, the data processing system calculates a distance measure between two data segments and a similarity threshold, and defines the two data segments as similar if the distance has a predetermined relationship with the similarity threshold. In another embodiment, the data processing system combines two RDSs in response to user input if the RDSs are similar to each other as determined by a second similarity protocol that is less restrictive than the first similarity protocol. In another e