CN-121979870-A - Data detection method, device, equipment and storage medium
Abstract
The application discloses a data detection method, device, equipment and storage medium. It belongs to the field of data processing and addresses the problem that large amounts of complex data cannot be processed automatically. The method comprises: grouping each column of data in a data list; smoothing the resulting data sequence with a smoothing window; determining a first candidate abnormal point in the data sequence according to a preset step threshold and masking that point; performing a sliding-window calculation on the data sequence to obtain a dynamic reference value for each data point; determining a dynamic threshold for each data point from its dynamic reference value; determining a second candidate abnormal point in the data sequence according to the dynamic threshold; dividing the first and second candidate abnormal points into a plurality of continuous intervals; and, where the length of a continuous interval is greater than or equal to the minimum duration, determining the abnormal points in that interval to be effective abnormal points.
Inventors
- WANG GUANG
- CHEN LIN
- DONG CHANGYONG
- LI SHAOHUA
- WANG ZHE
Assignees
- 重庆赛力斯凤凰智创科技有限公司
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-12-29
Claims (10)
- 1. A data detection method, the method comprising: receiving data to be detected, and grouping the data to be detected according to a preset classification rule to obtain a grouped first data sequence; determining a first candidate abnormal point in the smoothed first data sequence according to a preset step threshold, and masking the first candidate abnormal point to obtain a second data sequence; performing a sliding-window calculation on the second data sequence to obtain a dynamic reference value corresponding to each data point in the second data sequence; determining a dynamic threshold corresponding to each data point according to the dynamic reference value, and determining a second candidate abnormal point in the second data sequence according to the dynamic threshold; dividing the first candidate abnormal point and the second candidate abnormal point into a plurality of continuous intervals in time order, and, where the length of a continuous interval is greater than or equal to a preset duration, determining the candidate abnormal points in that interval to be effective abnormal points.
- 2. The data detection method according to claim 1, wherein before determining the first candidate abnormal point in the smoothed first data sequence according to the preset step threshold, the method further comprises: moving a smoothing window through the first data sequence according to a preset step length; calculating the average value of all data points contained in the smoothing window each time the smoothing window is moved; and, when the smoothing window reaches the end of the first data sequence, forming the smoothed first data sequence from all the obtained average values.
- 3. The data detection method according to claim 1, wherein determining the first candidate abnormal point in the smoothed first data sequence according to the preset step threshold, and masking the first candidate abnormal point to obtain the second data sequence, comprises: performing a first-order difference calculation on the first data sequence to obtain a difference value; where the difference value is greater than the preset step threshold, determining the data point corresponding to the difference value and the data points adjacent to it as the first candidate abnormal point; and adding the first candidate abnormal point to a step point mask to obtain the second data sequence.
- 4. The data detection method according to claim 1, wherein performing the sliding-window calculation on the second data sequence to obtain the dynamic reference value corresponding to each data point in the second data sequence comprises: selecting a time window of a preset length for each data point in the second data sequence; and calculating the median of all the data points in the time window to obtain the dynamic reference value.
- 5. The data detection method according to claim 1, wherein determining the second candidate abnormal point in the second data sequence according to the dynamic threshold comprises: determining a deviation between the data point and the dynamic reference value; and, where the deviation exceeds the dynamic threshold, determining the data point to be the second candidate abnormal point.
- 6. The data detection method according to claim 1, characterized in that the method further comprises: drawing the first data sequence on a corresponding coordinate axis according to its physical dimension and numerical range to obtain a coordinate axis image; and marking the effective abnormal point in the coordinate axis image.
- 7. The data detection method according to claim 1, characterized in that the method further comprises: determining the abnormal information corresponding to the effective abnormal point, wherein the abnormal information at least comprises an occurrence position, a data column, an abnormal duration, an abnormal statistical feature and a trigger threshold; and filling the abnormal information into a preset template to obtain a corresponding text report.
- 8. A data detection device, the device comprising: a first data sequence acquisition module, used for receiving data to be detected and grouping the data to be detected according to a preset classification rule to obtain a grouped first data sequence; a second data sequence acquisition module, used for smoothing the first data sequence with a smoothing window, determining a first candidate abnormal point in the smoothed first data sequence according to a preset step threshold, and masking the first candidate abnormal point to obtain a second data sequence; a dynamic reference value calculation module, used for performing a sliding-window calculation on the second data sequence to obtain a dynamic reference value corresponding to each data point in the second data sequence; a dynamic judgment module, used for determining a dynamic threshold corresponding to each data point according to the dynamic reference value and determining a second candidate abnormal point in the second data sequence according to the dynamic threshold; and an effective abnormal point determining module, used for dividing the first candidate abnormal point and the second candidate abnormal point into a plurality of continuous intervals in time order, and, where the length of a continuous interval is greater than or equal to the preset duration, determining the candidate abnormal points in that interval to be effective abnormal points.
- 9. An electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, which program or instruction when executed by the processor implements the steps of any of the methods of claims 1-7.
- 10. A readable storage medium, characterized in that it stores thereon a program or instructions, which when executed by a processor, implement the steps of any of the methods of claims 1-7.
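The report step of claim 7 (filling abnormal information into a preset template) can be illustrated with a minimal sketch. The template wording, the field names (`column`, `position`, `duration`, `mean`, `maximum`, `threshold`) and the choice of mean and maximum as the "abnormal statistical feature" are all assumptions made for illustration; the source does not specify them.

```python
from string import Template

# Hypothetical report template; the wording and field names are assumptions,
# not taken from the application.
REPORT_TEMPLATE = Template(
    "Column '$column': anomaly at index $position lasting $duration samples "
    "(mean=$mean, max=$maximum), trigger threshold $threshold."
)


def build_report(info: dict) -> str:
    """Fill the anomaly information into the preset template.

    Raises KeyError if a field required by the template is missing.
    """
    return REPORT_TEMPLATE.substitute(info)


example = build_report(
    {
        "column": "speed",
        "position": 7,
        "duration": 3,
        "mean": 38.0,
        "maximum": 40.0,
        "threshold": 4.0,
    }
)
```

`Template.substitute` is used rather than `safe_substitute` so that a missing field fails loudly instead of leaving a placeholder in the report.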
Description
Data detection method, device, equipment and storage medium
Technical Field
The application belongs to the field of data processing, and particularly relates to a data detection method, device, equipment and storage medium.
Background
Testing generates a large amount of data. To ensure that the data are accurate and reliable, the data must be checked to identify abnormal data. In the prior art, an inspector has to choose a detection mode according to the data type. Because a single detection mode is used, it is difficult to adapt to the varied requirements of different test types, and misjudgments occur easily with complex and changeable test data, so abnormal points in the data cannot be detected accurately.
Disclosure of Invention
The embodiments of the application aim to provide a data detection method, device, equipment and storage medium that can solve the problem that varied test data cannot be detected accurately.
To solve the above technical problem, the application is implemented as follows.
In a first aspect, an embodiment of the present application provides a data detection method, the method including:
receiving data to be detected, and grouping the data to be detected according to a preset classification rule to obtain a grouped first data sequence;
determining a first candidate abnormal point in the smoothed first data sequence according to a preset step threshold, and masking the first candidate abnormal point to obtain a second data sequence;
performing a sliding-window calculation on the second data sequence to obtain a dynamic reference value corresponding to each data point in the second data sequence;
determining a dynamic threshold corresponding to each data point according to the dynamic reference value, and determining a second candidate abnormal point in the second data sequence according to the dynamic threshold;
dividing the first candidate abnormal point and the second candidate abnormal point into a plurality of continuous intervals in time order, and, where the length of a continuous interval is greater than or equal to a preset duration, determining the candidate abnormal points in that interval to be effective abnormal points.
Optionally, before determining the first candidate abnormal point in the smoothed first data sequence according to the preset step threshold, the method further includes: moving a smoothing window through the first data sequence according to a preset step length; calculating the average value of all data points contained in the smoothing window each time the smoothing window is moved; and, when the smoothing window reaches the end of the first data sequence, forming the smoothed first data sequence from all the obtained average values.
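The smoothing step and the final minimum-duration check described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, the duration is measured in samples rather than in time units, and candidate points are assumed to arrive as a list of boolean flags in time order.

```python
from typing import List, Tuple


def moving_average(values: List[float], window: int, step: int = 1) -> List[float]:
    """Slide a window over `values` in increments of `step`; average each position.

    The output is shorter than the input because only full windows are used
    (an assumption; the source does not say how the edges are handled).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(0, len(values) - window + 1, step)
    ]


def valid_intervals(flags: List[bool], min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end inclusive, for runs of True
    whose length is at least `min_len` samples."""
    intervals = []
    start = None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                intervals.append((start, i - 1))
            start = None
    return intervals


smoothed = moving_average([1, 2, 3, 4, 5], window=3)
candidates = [True, False, True, True, True, False, True, True]
kept = valid_intervals(candidates, min_len=3)
```

In this example the isolated flag at index 0 and the two-sample run at indices 6 to 7 are discarded, and only the run covering indices 2 to 4 is kept as effective.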
Optionally, determining the first candidate abnormal point in the smoothed first data sequence according to the preset step threshold, and masking the first candidate abnormal point to obtain the second data sequence, includes: performing a first-order difference calculation on the first data sequence to obtain a difference value; where the difference value is greater than the preset step threshold, determining the data point corresponding to the difference value and the data points adjacent to it as the first candidate abnormal point; and adding the first candidate abnormal point to a step point mask to obtain the second data sequence.
Optionally, performing the sliding-window calculation on the second data sequence to obtain the dynamic reference value corresponding to each data point in the second data sequence includes: selecting a time window of a preset length for each data point in the second data sequence; and calculating the median of all the data points in the time window to obtain the dynamic reference value.
Optionally, determining the second candidate abnormal point in the second data sequence according to the dynamic threshold includes: determining a deviation between the data point and the dynamic reference value; and, where the deviation exceeds the dynamic threshold, determining the data point to be the second candidate abnormal point.
Optionally, the method further includes: drawing the first data sequence on a corresponding coordinate axis according to its physical dimension and numerical range to obtain a coordinate axis image; and marking the effective abnormal point in the coordinate axis image.
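The step-point masking, median reference and deviation test described in the optional steps above can be sketched as below. Several details are assumptions because the source leaves them open: the difference is compared by absolute value, masked points are excluded from the median window, the window is centred on each point, and the dynamic threshold takes the hypothetical form `rel_tol * |reference| + abs_tol`.

```python
from statistics import median
from typing import List, Optional


def step_mask(values: List[float], step_threshold: float) -> List[bool]:
    """Flag points whose first-order difference exceeds the threshold, plus neighbours."""
    mask = [False] * len(values)
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) > step_threshold:  # abs() is an assumption
            for j in (i - 1, i, i + 1):
                if 0 <= j < len(values):
                    mask[j] = True
    return mask


def rolling_median(
    values: List[float], mask: List[bool], half_window: int
) -> List[Optional[float]]:
    """Median of the unmasked points within `half_window` of each index.

    Returns None where every point in the window is masked.
    """
    refs: List[Optional[float]] = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = [values[j] for j in range(lo, hi) if not mask[j]]
        refs.append(median(window) if window else None)
    return refs


def deviation_candidates(
    values: List[float],
    mask: List[bool],
    refs: List[Optional[float]],
    rel_tol: float,
    abs_tol: float,
) -> List[bool]:
    """Flag unmasked points whose deviation from the reference exceeds the threshold."""
    flags = []
    for value, masked, ref in zip(values, mask, refs):
        if masked or ref is None:
            flags.append(False)
            continue
        threshold = rel_tol * abs(ref) + abs_tol  # hypothetical threshold formula
        flags.append(abs(value - ref) > threshold)
    return flags


data = [10, 10, 10, 10, 30, 30, 30, 38, 30, 30, 30]
mask = step_mask(data, step_threshold=15)
refs = rolling_median(data, mask, half_window=2)
second = deviation_candidates(data, mask, refs, rel_tol=0.1, abs_tol=1.0)
```

Here the jump from 10 to 30 is caught by the step mask (indices 3 to 5), while the smaller excursion to 38 at index 7 is caught only by the deviation test against the local median of 30.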
Optionally, the method further comprises: determining the abnormal information corresponding to the effective abnormal point, wherein the abnormal information at least comprises an occurrence position, a data column, an abnormal duration, an abnormal statistical feature and a trigger threshold; and filling the abnormal information into a preset template to