CN-121765405-B - Industrial data full life cycle management method and system
Abstract
The invention relates to the technical field of industrial big data storage and management, and discloses a full life cycle management method and system for industrial data. The method comprises: obtaining original time-series data of industrial equipment and performing Shannon entropy calculation to obtain a time-series importance scoring sequence; comparing the scoring sequence against a threshold to determine key transient windows and stable normal windows; executing lossless differential coding on the key transient windows to obtain high-fidelity data segments, and executing feature extraction and sparsification on the stable normal windows; performing time-domain splicing and index construction to obtain a primary hybrid storage structure; and continuously monitoring query access paths to generate a heat distribution map, thereby obtaining a hierarchical archiving data set. The method realizes value-aware hierarchical storage and dynamic full life cycle management of massive industrial data.
Inventors
- ZHANG HUAJUN
- LI SHUANGYUAN
- FAN JIAXU
- YANG HUAN
- WANG JINGFANG
- GUO XIANWEN
- XU XIAOFAN
Assignees
- Shenyang Jiucheng Technology Co., Ltd. (沈阳久成科技有限公司)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-03-03
Claims (9)
- 1. A method for full life cycle management of industrial data, comprising: acquiring original time-series data of industrial equipment, executing time-window segmentation of a preset length, and calculating frequency-based Shannon entropy in each time window to obtain a time-series importance scoring sequence; comparing the time-series importance scoring sequence with a preset shunting threshold, marking time windows with scores higher than the shunting threshold as key transient windows and time windows with scores lower than or equal to the shunting threshold as stable normal windows, wherein the shunting threshold is set based on identification of the score-curve morphology; calculating on the original data sequence in the key transient window to obtain a differential sequence, and matching and packaging the differential sequence based on a preset global mapping dictionary to generate a high-fidelity data segment, wherein the high-fidelity data segment comprises an extracted first data point serving as a full reference value and a binary bit stream generated by entropy-coding and packaging the differential sequence; processing data in the stable normal window to obtain a sparse transform coefficient matrix, and performing interval mapping and encoding on the sparse transform coefficient matrix based on a non-uniform quantizer to generate a background simplified data segment, wherein the background simplified data segment comprises low-frequency transform coefficients and a binary data stream; performing time-domain splicing of the high-fidelity data segment and the background simplified data segment based on a preset frame structure, establishing a metadata index containing a timestamp and an importance tag, constructing a primary hybrid storage structure, and writing the primary hybrid storage structure into a preset first-level storage medium; continuously recording query operations initiated through the metadata index during the residence period of the primary hybrid storage structure, counting access frequencies and generating a heat distribution map; and, if the heat distribution map shows that the access frequency of the background simplified data segment is lower than a preset cold archiving threshold, stripping the background simplified data segment from the primary hybrid storage structure and migrating it to a preset second-level storage medium to obtain a hierarchical archiving data set.
- 2. The industrial data full life cycle management method according to claim 1, wherein obtaining the original time-series data of the industrial equipment, performing time-window segmentation of a preset length, and calculating Shannon entropy in each time window to obtain the time-series importance scoring sequence comprises: acquiring the original time-series data of the industrial equipment and performing numerical discretization on it, mapping continuous analog signal values onto discrete symbols in a finite state set; counting the probability distribution of each discrete symbol's occurrences in each time window, and calculating the information entropy value of the probability distribution based on the Shannon entropy formula; and taking the calculated information entropy value as an element of the time-series importance scoring sequence, wherein the information entropy value quantitatively characterizes the waveform complexity and information content of the corresponding time window.
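Purely as an illustrative sketch (not code from the patent), the per-window entropy scoring of claim 2 can be written in Python; the equal-width discretization and the bin count `n_bins` are assumed choices:

```python
import math
from collections import Counter

def window_entropy_scores(signal, window_len, n_bins=16):
    """Score each fixed-length window by the Shannon entropy of its
    discretized value distribution (higher score = more complex waveform)."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0
    scores = []
    for start in range(0, len(signal) - window_len + 1, window_len):
        window = signal[start:start + window_len]
        # Map continuous values onto a finite set of discrete symbols
        # (equal-width bins; an assumed discretization scheme).
        symbols = [min(int((v - lo) / span * n_bins), n_bins - 1) for v in window]
        counts = Counter(symbols)
        total = len(symbols)
        # Shannon entropy H = -sum(p_i * log2(p_i)) over symbol probabilities.
        scores.append(-sum(c / total * math.log2(c / total)
                           for c in counts.values()))
    return scores
```

A flat (stable) window scores zero entropy, while a varying (transient) window scores higher, which is exactly the split the shunting step relies on.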
- 3. The industrial data full life cycle management method according to claim 1, wherein comparing the time-series importance scoring sequence with the preset shunting threshold, marking time windows with scores higher than the shunting threshold as key transient windows, and marking time windows with scores lower than or equal to the shunting threshold as stable normal windows comprises: sorting the time-series importance scoring sequence and constructing a score distribution curve; identifying a step change point or a density peak point in the score distribution curve and extracting the corresponding score value as the shunting threshold; and traversing all time windows, marking windows with score values greater than the shunting threshold as key transient windows and windows with score values less than or equal to the shunting threshold as stable normal windows.
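The curve-morphology rule of claim 3 can be sketched as follows; treating the largest jump between consecutive sorted scores as the step change point is an assumed, simplified reading of that criterion:

```python
def shunting_threshold(scores):
    """Pick the shunting threshold at the largest jump (the "step change
    point") in the sorted score distribution; the gap heuristic here is
    an illustrative assumption, not the patent's exact rule."""
    ranked = sorted(scores)
    # Locate the index with the largest jump between consecutive scores.
    jumps = [(ranked[i + 1] - ranked[i], i) for i in range(len(ranked) - 1)]
    _gap, idx = max(jumps)
    return ranked[idx]  # windows scoring above this become "key transient"

def classify_windows(scores, threshold):
    """Mark each window: True = key transient, False = stable normal."""
    return [s > threshold for s in scores]
```

With a clearly bimodal score distribution, the threshold lands between the two clusters, so the traversal reproduces the claimed two-way marking.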
- 4. The industrial data full life cycle management method according to claim 1, wherein calculating the differential sequence from the original data sequence in the key transient window, and matching and packaging the differential sequence based on the preset global mapping dictionary to generate the high-fidelity data segment comprises: extracting the original data sequence in the key transient window and calculating the differential values between adjacent data points to obtain the differential sequence; matching the differential sequence against the preset global mapping dictionary and mapping high-frequency differential patterns to short code words; and performing entropy-coding encapsulation of the differential sequence using the global mapping dictionary to generate a binary bit stream as the high-fidelity data segment.
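A hypothetical sketch of the difference-and-dictionary path of claim 4; the code words in `CODEBOOK` and the escape form are illustrative assumptions, not values from the patent:

```python
def encode_transient(window, codebook):
    """Lossless differential coding of a key transient window: keep the
    first point as the full reference value, then emit dictionary code
    words for the difference sequence."""
    reference = window[0]
    diffs = [b - a for a, b in zip(window, window[1:])]
    bits = []
    for d in diffs:
        if d in codebook:
            # Frequent difference patterns map to short code words.
            bits.append(codebook[d])
        else:
            # Escape marker followed by the raw difference (placeholder form).
            bits.append("1111<" + str(d) + ">")
    return reference, "".join(bits), diffs

# Hypothetical global mapping dictionary: short codes for common deltas.
CODEBOOK = {0: "0", 1: "10", -1: "110"}
```

Because the coding is differential and lossless, the reference value plus the cumulative sum of differences reconstructs the original window exactly.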
- 5. The industrial data full life cycle management method according to claim 1, wherein processing the data in the stable normal window to obtain the sparse transform coefficient matrix, and performing interval mapping and encoding on the sparse transform coefficient matrix based on the non-uniform quantizer to generate the background simplified data segment comprises: converting the data in the stable normal window into a transform coefficient matrix by discrete cosine transform; generating a high-frequency component mask matrix according to a preset compression strength, and zeroing the coefficients of the transform coefficient matrix in the high-frequency region to obtain the sparse transform coefficient matrix; and performing interval mapping and encoding of the non-zero elements in the sparse transform coefficient matrix with the non-uniform quantizer to obtain the background simplified data segment retaining only low-frequency trend information.
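The background-simplification path of claim 5 might look as follows; the choice of mu-law companding as the non-uniform quantizer and the 8-bit level count are assumptions made for the sketch:

```python
import math

def dct2(x):
    """Plain (unnormalized) DCT-II: concentrates a smooth window's energy
    in the low-frequency coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N)) for k in range(N)]

def simplify_background(window, keep_ratio=0.25, mu=255.0):
    """Sketch of claim 5: DCT, zero the high-frequency tail according to
    the compression strength (keep_ratio), then mu-law-style non-uniform
    quantization of the surviving low-frequency coefficients."""
    coeffs = dct2(window)
    keep = max(1, int(len(coeffs) * keep_ratio))
    sparse = coeffs[:keep]                       # low-frequency trend only
    peak = max(abs(c) for c in sparse) or 1.0
    # Non-uniform (companding) quantization onto signed 8-bit levels:
    # small coefficients get finer resolution than large ones.
    quantized = [
        round(127 * math.copysign(
            math.log1p(mu * abs(c) / peak) / math.log1p(mu), c))
        for c in sparse
    ]
    return quantized, peak
```

For a perfectly flat window, all energy sits in the DC coefficient, so the background simplified segment collapses to a single significant level plus the scale factor.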
- 6. The industrial data full life cycle management method according to claim 1, wherein performing time-domain splicing of the high-fidelity data segment and the background simplified data segment based on the preset frame structure, establishing the metadata index including the timestamp and the importance tag, constructing the primary hybrid storage structure, and writing it into the preset first-level storage medium comprises: respectively acquiring the byte lengths of the high-fidelity data segment and the background simplified data segment, and constructing storage header information containing the length information and a version number; performing physical splicing in the order storage header information, background simplified data segment, high-fidelity data segment to generate a hybrid data frame, and calculating a check code for the hybrid data frame; generating a metadata index recording the starting physical address, time span and importance classification tag of the hybrid data frame; and writing the metadata index and the hybrid data frame into the preset first-level storage medium in an associated manner, wherein the first-level storage medium is a solid-state disk or a high-speed flash memory array.
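One possible byte layout for the hybrid data frame of claim 6; the little-endian `struct` layout and CRC32 as the check code are assumptions, not details from the patent:

```python
import struct
import zlib

def build_hybrid_frame(background_seg: bytes, hifi_seg: bytes,
                       version: int = 1) -> bytes:
    """Assemble a hybrid data frame in the claimed splicing order:
    header (version + segment lengths) | background simplified segment |
    high-fidelity segment, followed by a CRC32 check code."""
    header = struct.pack("<BII", version, len(background_seg), len(hifi_seg))
    body = header + background_seg + hifi_seg
    return body + struct.pack("<I", zlib.crc32(body))

def index_entry(frame_offset, t_start, t_end, importance):
    """Metadata index record: starting physical address, time span,
    and importance classification tag (field names are illustrative)."""
    return {"offset": frame_offset, "span": (t_start, t_end),
            "importance": importance}
```

Keeping the lengths in the header lets a reader strip the background segment later (claim 8) without touching the high-fidelity bytes.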
- 7. The industrial data full life cycle management method according to claim 6, wherein continuously recording the query operations initiated through the metadata index during the residence period of the primary hybrid storage structure, counting the access frequencies and generating the heat distribution map comprises: parsing each query request for the hybrid data frame and identifying the type of data segment accessed along the query path; respectively counting the number of accesses to the high-fidelity data segment and the background simplified data segment within a preset period, and calculating the access latency overhead; and quantitatively generating a retrieval efficiency index for each data segment based on the access counts and the access latency overhead, and mapping the retrieval efficiency indices into the visualized heat distribution map.
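A minimal sketch of the residency monitoring in claim 7; the latency-weighted heat formula is an illustrative assumption, since the patent only requires that counts and latency overhead combine into a retrieval efficiency index:

```python
from collections import defaultdict

class HeatMonitor:
    """Track per-segment access counts and latency during the residence
    period and derive a heat score per segment."""

    def __init__(self):
        self.hits = defaultdict(int)      # accesses per segment id
        self.latency = defaultdict(float) # cumulative latency (ms)

    def record(self, segment_id, latency_ms):
        """Log one query that touched the given data segment."""
        self.hits[segment_id] += 1
        self.latency[segment_id] += latency_ms

    def heat_map(self):
        # Retrieval efficiency index: access count discounted by the
        # average latency overhead (assumed formula).
        return {seg: self.hits[seg] / (1.0 + self.latency[seg] / self.hits[seg])
                for seg in self.hits}
```

Segments queried often and served quickly score hot; rarely touched background segments drift below the cold archiving threshold used in claim 8.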
- 8. The industrial data full life cycle management method according to claim 1, wherein, if the heat distribution map shows that the access frequency of the background simplified data segment is lower than the preset cold archiving threshold, stripping the background simplified data segment from the primary hybrid storage structure and migrating it to the preset second-level storage medium to obtain the hierarchical archiving data set comprises: identifying cold data blocks in the heat distribution map whose access frequencies are below the cold archiving threshold, the cold data blocks corresponding to background simplified data segments that have not been accessed for a long period; reading the cold data blocks from the first-level storage medium and transferring them to the preset second-level storage medium, wherein the second-level storage medium is a high-capacity mechanical hard disk or a magnetic tape library; and updating the physical address mapping in the metadata index to point to the new addresses on the second-level storage medium, while retaining the high-fidelity data segments and the metadata index on the first-level storage medium, thereby forming the cold-hot separated hierarchical archiving data set.
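The cold-migration step of claim 8 could be sketched as below; the `tier1`/`tier2` labels and the in-place index update are stand-ins for the two storage media and the physical address remapping:

```python
def archive_cold_segments(heat_map, index, cold_threshold):
    """Move background simplified segments whose heat falls below the cold
    archiving threshold to second-level storage and repoint their index
    entries; high-fidelity segments and the index stay on tier 1."""
    migrated = []
    for seg_id, heat in heat_map.items():
        entry = index[seg_id]
        if entry["kind"] == "background" and heat < cold_threshold:
            entry["tier"] = "tier2"  # remap to the new physical location
            migrated.append(seg_id)
    return migrated
```

After migration the metadata index still resolves every segment, so queries against cold data transparently follow the updated mapping to tier-2 storage.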
- 9. An industrial data full life cycle management system, comprising: a stream slicing and evaluating module, configured to acquire original time-series data of industrial equipment, execute time-window segmentation of a preset length, and calculate Shannon entropy in each time window to obtain a time-series importance scoring sequence; a shunting judgment module, configured to compare the time-series importance scoring sequence with a preset shunting threshold, mark time windows with scores higher than the shunting threshold as key transient windows, and mark time windows with scores lower than or equal to the shunting threshold as stable normal windows; a high-fidelity processing module, configured to calculate on the original data sequence in the key transient window to obtain a differential sequence, and match and package the differential sequence based on a preset global mapping dictionary to generate a high-fidelity data segment; a background simplification processing module, configured to process the data in the stable normal window to obtain a sparse transform coefficient matrix, and perform interval mapping and encoding on the sparse transform coefficient matrix based on a non-uniform quantizer to generate a background simplified data segment; a hybrid storage construction module, configured to perform time-domain splicing of the high-fidelity data segment and the background simplified data segment based on a preset frame structure, establish a metadata index containing a timestamp and an importance tag, construct a primary hybrid storage structure, and write the primary hybrid storage structure into a preset first-level storage medium; a life cycle monitoring module, configured to continuously record query operations initiated through the metadata index during the residence period of the primary hybrid storage structure, count access frequencies and generate a heat distribution map; and a dynamic archiving migration module, configured to strip the background simplified data segment from the primary hybrid storage structure and migrate it to a preset second-level storage medium to obtain a hierarchical archiving data set if the heat distribution map shows that the access frequency of the background simplified data segment is lower than a preset cold archiving threshold.
Description
Industrial data full life cycle management method and system

Technical Field

The invention relates to the technical field of industrial big data storage and management, and in particular to a full life cycle management method and system for industrial data.

Background

Currently, with the deep advancement of Industry 4.0 and intelligent manufacturing strategies, high-frequency sensors and monitoring terminals of all kinds are widely deployed on production sites and generate massive, continuous time-series data. How to construct an efficient industrial data processing system, and how to store and mine this mass data efficiently over its full life cycle, has become a key link in the digital transformation of manufacturing enterprises. In one prior art, a unified sampling strategy or a fixed compression algorithm is typically used to process all acquired streaming data uniformly. For example, the system does not distinguish whether a device is currently in a smoothly running "normal state" or an abnormally fluctuating "transient state": it samples and writes data to disk at a preset fixed frequency (such as 10 Hz), or applies identical compression parameters and storage strategies to data of all time periods by adopting a general lossless compression algorithm (such as a Lempel-Ziv algorithm) or a fixed lossy compression algorithm (such as the swinging door trending algorithm, SDT). The prior art therefore lacks dynamic perception of data value density. How to eliminate normal-state redundancy to the greatest extent and optimize the hierarchical storage layout, on the premise of ensuring the data integrity of key transient events, is the technical problem currently to be solved.
Disclosure of Invention

The invention provides a full life cycle management method and system for industrial data, which solve the prior art's lack of dynamic perception of data value density. To solve the above technical problem, in a first aspect the present invention provides a method for full life cycle management of industrial data, including: acquiring original time-series data of industrial equipment, executing time-window segmentation of a preset length, and calculating Shannon entropy in each time window to obtain a time-series importance scoring sequence; comparing the time-series importance scoring sequence with a preset shunting threshold, marking time windows with scores higher than the shunting threshold as key transient windows, and marking time windows with scores lower than or equal to the shunting threshold as stable normal windows; calculating on the original data sequence in the key transient window to obtain a differential sequence, and matching and packaging the differential sequence based on a preset global mapping dictionary to generate a high-fidelity data segment; processing data in the stable normal window to obtain a sparse transform coefficient matrix, and performing interval mapping and encoding on the sparse transform coefficient matrix based on a non-uniform quantizer to generate a background simplified data segment; performing time-domain splicing of the high-fidelity data segment and the background simplified data segment based on a preset frame structure, establishing a metadata index containing a timestamp and an importance tag, constructing a primary hybrid storage structure, and writing the primary hybrid storage structure into a preset first-level storage medium; continuously recording query operations initiated through the metadata index during the residence period of the primary hybrid storage structure, counting access frequencies and generating a heat distribution map; and, if the heat distribution map shows that the access frequency of the background simplified data segment is lower than a preset cold archiving threshold, stripping the background simplified data segment from the primary hybrid storage structure and migrating it to a preset second-level storage medium to obtain a hierarchical archiving data set. In a second aspect, the present invention provides an industrial data full life cycle management system, comprising: a stream slicing and evaluating module, configured to acquire original time-series data of industrial equipment, execute time-window segmentation of a preset length, and calculate Shannon entropy in each time window to obtain a time-series importance scoring sequence; a shunting judgment module, configured to compare the time-series importance scoring sequence with a preset shunting threshold, mark time windows with scores higher than the shunting threshold as key transient windows, and mark time windows with scores lower than or equal to the shunting threshold as stable normal windows; a high-fidelity processing module, configured to calculate on the original data sequence in the key transient window to obtain a differential sequence, and match and package the differential sequence based on a preset global mapping dictionary to generate a high-fidelity data segment