CN-122019595-A - SCADA system historical data query optimization system and method


Abstract

The invention relates to the field of industrial automation, and in particular to a system and method for optimizing historical data queries in a SCADA (Supervisory Control and Data Acquisition) system. A data thermal layering module dynamically optimizes the storage architecture so that frequently accessed data automatically resides in a high-speed storage layer, markedly improving query efficiency for hot-spot data. An intelligent preloading module anticipates data requirements from the working-condition context, turning passive queries into active loading and reducing response latency in key operating scenarios. A multidimensional index construction module locates data precisely through a dynamic indexing strategy, avoiding the resource cost of full-table scans. A query execution caching module, combined with intelligent compression and an adaptive adjustment mechanism, forms an efficient closed loop from query request to result return. Taken together, the system greatly reduces the time spent on complex conditional retrieval while maintaining query stability, and is particularly suited to high-concurrency industrial monitoring scenarios.

Inventors

YU BO
CHEN YANLEI
SU RUIZHI
ZHANG LIWU
CAO LIANG
LI LINGXIN
QI YANBIN

Assignees

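Beijing Huaneng Xinrui Control Technology Co., Ltd. (北京华能新锐控制技术有限公司)
Xi'an Thermal Power Research Institute Co., Ltd. (西安热工研究院有限公司)
Huaneng Jilin Power Generation Co., Ltd., New Energy Branch (华能吉林发电有限公司新能源分公司)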

Dates

Publication Date: 20260512
Application Date: 20260203

Claims (10)

1. A SCADA system historical data query optimization system, characterized by comprising a data thermal layering module, an intelligent preloading module, a multidimensional index construction module and a query execution cache module, wherein: The data thermal layering module runs on a preset daily cycle after the system starts. Taking the time-series data sets in the SCADA historical database as its objects, it uses a configuration script to count the access frequency of each data set over the last 7 days, dynamically distributes data to a memory cache layer, an SSD storage layer and an HDD archive layer according to a high-frequency threshold, a medium-frequency threshold and a low-frequency threshold, triggers promotion and demotion of data between layers according to a continuity condition on data access, and finally outputs to the intelligent preloading module a data thermal distribution map comprising the data identifier, the current storage layer and the thermal value. The intelligent preloading module is triggered when the data thermal distribution map has been received and the SCADA system generates a real-time working-condition event. Taking the operator's historical query sequence and the real-time working-condition label as its objects, it applies a weighted calculation model to the query-sequence similarity and the working-condition-label matching degree to obtain a correlation degree; when the correlation degree is greater than or equal to a preset correlation threshold, it locates the target data in the data thermal distribution map, preloads that data into the memory cache layer, and sends a preloading decision record to the multidimensional index construction module. The multidimensional index construction module is triggered when a preloading decision record is received or new data is stored. Using the device identifier, data type, timestamp and working-condition label as index fields, it preferentially builds a composite B+ tree index for the highly correlated data in the preloading decision record, maintains basic, parameter and working-condition indexes for all data, synchronously updates index nodes every incremental period T2, dynamically enables and disables indexes according to their usage frequency, and finally outputs the latest index mapping table to the query execution cache module. The query execution cache module is triggered when a query request and the index mapping table are received. It first checks whether a valid compressed cached result exists in the memory cache layer; if not, it uses the index mapping table to locate the data in tiered storage, extracts the data, compresses it with the LZ4 algorithm, returns the result and updates the cache. It monitors query latency and cache hit rate in real time, and when the query latency exceeds the maximum tolerated latency or the cache hit rate falls below the minimum hit rate, it sends an adaptive adjustment signal to the data thermal layering module and the multidimensional index construction module.
2. The SCADA system historical data query optimization system of claim 1, wherein the data thermal layering module runs on a first preset daily cycle after the system starts, takes the time-series data sets in the SCADA historical database as its objects, uses a configuration script to count the access frequency of each data set over the last seven days, and dynamically distributes data to a memory cache layer, a solid-state disk storage layer and a hard-disk archive layer according to a high-frequency threshold, a medium-frequency threshold and a low-frequency threshold.
3. The SCADA system historical data query optimization system of claim 2, wherein the high-frequency, medium-frequency and low-frequency thresholds in the data thermal layering module are dynamically adjusted based on thermal entropy values; the data thermal layering module triggers promotion and demotion of data between layers according to a continuity condition on data access, the continuity condition being that a data set has had multiple consecutive access events in the most recent statistical period and its last access time falls within a preset grace period; and the data thermal layering module finally outputs to the intelligent preloading module a data thermal distribution map containing the data identifier, the current storage layer and the thermal value.
4. The SCADA system historical data query optimization system of claim 3, wherein the intelligent preloading module is triggered when the data thermal distribution map has been received and the SCADA system generates a real-time working-condition event; taking the operator's historical query sequence and the real-time working-condition label as objects, it applies a weighted calculation model to the query-sequence similarity and the working-condition-label matching degree to obtain a relevance degree, and, when the relevance degree is greater than or equal to a preset relevance threshold, locates the target data in the data thermal distribution map, preloads it into the memory cache layer and sends a preloading decision record to the multidimensional index construction module.
5. The SCADA system historical data query optimization system of claim 4, wherein, in the intelligent preloading module, the query-sequence similarity is calculated by an improved dynamic time warping (DTW) algorithm that incorporates an adaptive weight distribution mechanism and accounts for both timing alignment deviation and thermal weight; the working-condition-label matching degree is calculated in a multimodal embedding space by fusing semantic similarity and context similarity through a composite function; and the weighted calculation model uses an attention mechanism to dynamically allocate weights between the query-sequence similarity and the working-condition-label matching degree to obtain the overall relevance degree.
6. The SCADA system historical data query optimization system of claim 5, wherein the multidimensional index construction module is triggered when a preloading decision record is received or new data ingestion is detected; taking the device identifier, data type, timestamp and working-condition label as index fields, it preferentially builds a composite B+ tree index for the highly correlated data in the preloading decision record, maintains basic, parameter and working-condition indexes for all data, synchronously updates index nodes every preset incremental period, dynamically enables and disables indexes according to their usage frequency, and finally outputs the latest index mapping table to the query execution cache module.
7. The SCADA system historical data query optimization system of claim 6, wherein, in the multidimensional index construction module, the optimal batch size for incremental updates of index nodes is calculated by an optimization model that balances input/output cost against processor cost; for dynamic enabling and disabling, a utility function computes each index's utility value from its usage frequency, storage size and query latency combined with weight coefficients; an index whose utility value falls below a preset threshold is disabled, and otherwise it is enabled.
8. The SCADA system historical data query optimization system of claim 7, wherein the query execution cache module is triggered when a query request and the index mapping table are received; it first checks whether a valid compressed cached result exists in the memory cache layer and, if so, returns that result directly; if not, it uses the index mapping table to locate the data in tiered storage, extracts the data, compresses it with the LZ4 compression algorithm, returns the result and updates the cache; it monitors query latency and cache hit rate in real time, and when the query latency exceeds the maximum tolerated latency or the cache hit rate falls below the minimum hit rate, it sends an adaptive adjustment signal to the data thermal layering module and the multidimensional index construction module.
9. The SCADA system historical data query optimization system of claim 8, wherein the query execution cache module determines its cache replacement policy with a cache priority score model that, when the cache is updated, computes a priority score for each data item from its historical access frequency, thermal value, compressed data size and time since last access; query latency is computed as an exponentially weighted moving average to smooth short-term fluctuations; and the cache hit rate is computed as an exponentially decaying weighted sum that emphasizes recent queries.
10. A SCADA system historical data query optimization method based on the system of any one of claims 1-9, characterized by comprising the following steps: S1, running on a preset daily cycle after the system starts: taking the time-series data sets in the SCADA historical database as objects, using a configuration script to count the access frequency of each data set over the last 7 days, dynamically distributing data to a memory cache layer, an SSD storage layer and an HDD archive layer according to a high-frequency threshold, a medium-frequency threshold and a low-frequency threshold, triggering promotion and demotion of data between layers according to a continuity condition on data access, and finally outputting to an intelligent preloading module a data thermal distribution map comprising the data identifier, the current storage layer and the thermal value; S2, when the data thermal distribution map has been received and the SCADA system generates a real-time working-condition event: taking the operator's historical query sequence and the real-time working-condition label as objects, applying a weighted calculation model to the query-sequence similarity and the working-condition-label matching degree to obtain a degree of association, and, when the degree of association is greater than or equal to a preset association threshold, locating the target data in the data thermal distribution map, preloading it into the memory cache layer and sending a preloading decision record to a multidimensional index construction module; S3, when a preloading decision record is received or new data is stored: taking the device identifier, data type, timestamp and working-condition label as index fields, preferentially building a composite B+ tree index for the highly correlated data in the preloading decision record, maintaining basic, parameter and working-condition indexes for all data, synchronously updating index nodes every incremental period T2, dynamically enabling and disabling indexes according to their usage frequency, and finally outputting the latest index mapping table to a query execution cache module; and S4, when a query request and the index mapping table are received: first checking whether a valid compressed cached result exists in the memory cache layer; if not, using the index mapping table to locate the data in tiered storage, extracting the data, compressing it with the LZ4 algorithm, returning the result and updating the cache; monitoring query latency and cache hit rate in real time; and, when the query latency exceeds the maximum tolerated latency or the cache hit rate falls below the minimum hit rate, sending an adaptive adjustment signal to the data thermal layering module and the multidimensional index construction module.
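
Claims 2 and 3 leave the threshold values, the heat measure and the exact continuity test unspecified. The following is a minimal Python sketch of one possible reading of the tiering step, not the patented implementation. It assumes that the heat value is simply the access count over the last 7 days, that data between the low- and medium-frequency thresholds keeps its tier (except that it leaves the memory layer), that "multiple consecutive access events" means at least `MIN_EVENTS` accesses in the window, and that promotions require the continuity condition while demotions apply immediately. All constants and names (`HIGH_FREQ`, `GRACE_PERIOD`, `DataSet`, `schedule`) are hypothetical, and the thermal-entropy threshold adjustment of claim 3 is not modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical parameters; the patent names these thresholds but gives no values.
WINDOW = timedelta(days=7)        # statistics window ("the last 7 days")
HIGH_FREQ = 100                   # accesses/window at or above -> memory cache layer
MEDIUM_FREQ = 10                  # accesses/window at or above -> SSD storage layer
LOW_FREQ = 2                      # accesses/window at or below -> HDD archive layer
MIN_EVENTS = 3                    # assumed reading of "multiple consecutive access events"
GRACE_PERIOD = timedelta(days=1)  # last access must fall within this grace period

TIERS = ("MEMORY", "SSD", "HDD")
RANK = {tier: i for i, tier in enumerate(TIERS)}  # lower rank = faster tier


@dataclass
class DataSet:
    data_id: str
    access_times: list = field(default_factory=list)  # datetime of each access
    tier: str = "HDD"


def recent_accesses(ds, now):
    """Access timestamps inside the statistics window, oldest first."""
    return sorted(t for t in ds.access_times if now - WINDOW <= t <= now)


def meets_continuity(recent, now):
    """Assumed continuity test: enough accesses, and the last one within the grace period."""
    return len(recent) >= MIN_EVENTS and now - recent[-1] <= GRACE_PERIOD


def schedule(ds, now):
    """Re-tier one data set and return its entry in the data thermal distribution map."""
    recent = recent_accesses(ds, now)
    heat = len(recent)  # heat value = access count in the window (assumption)
    if heat >= HIGH_FREQ:
        wanted = "MEMORY"
    elif heat >= MEDIUM_FREQ:
        wanted = "SSD"
    elif heat <= LOW_FREQ:
        wanted = "HDD"
    else:
        # Between the low and medium thresholds: leave memory, otherwise hold the tier.
        wanted = TIERS[max(RANK[ds.tier], RANK["SSD"])]
    # Promotions require sustained access; demotions are applied immediately.
    if RANK[wanted] < RANK[ds.tier] and not meets_continuity(recent, now):
        wanted = ds.tier
    ds.tier = wanted
    return {"data_id": ds.data_id, "tier": ds.tier, "heat": heat}
```

Gating only promotions on the continuity condition acts as hysteresis in this sketch: a single burst of old accesses cannot pull a data set into memory, while data that has gone cold is released promptly.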
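
Claim 5 does not state how the DTW algorithm is "improved", how working-condition labels are embedded, or the form of the composite and attention functions. The sketch below is only an illustrative reading under those gaps: classic DTW over query sequences of data-set IDs, with the mismatch cost scaled by a hypothetical heat weight; cosine similarity over caller-supplied semantic and context vectors, blended by a convex combination as the composite function; and a softmax over the two scores, scaled by hypothetical logits, as a simple stand-in for the attention mechanism. The function names and the `0.6` threshold are assumptions.

```python
import math

PRELOAD_THRESHOLD = 0.6  # hypothetical preset relevance threshold


def dtw_distance(a, b, weight=lambda item: 1.0):
    """Classic dynamic time warping over symbol sequences.

    A mismatch between a[i] and b[j] costs weight(b[j]); a match costs 0.
    """
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else weight(b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def sequence_similarity(history, recent, heat):
    """Map a heat-weighted DTW distance to (0, 1]; hotter items cost more to mismatch."""
    dist = dtw_distance(history, recent, lambda item: 1.0 + heat.get(item, 0.0))
    return 1.0 / (1.0 + dist)


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def label_match(sem_a, sem_b, ctx_a, ctx_b, beta=0.5):
    """Blend semantic and context similarity (convex combination as the composite)."""
    return beta * cosine(sem_a, sem_b) + (1 - beta) * cosine(ctx_a, ctx_b)


def relevance(sim_seq, sim_label, logits=(1.0, 1.0)):
    """Softmax weights over the two scores, then their weighted sum."""
    scores = (sim_seq, sim_label)
    exps = [math.exp(l * s) for l, s in zip(logits, scores)]
    total = sum(exps)
    return sum(e / total * s for e, s in zip(exps, scores))


def should_preload(sim_seq, sim_label, threshold=PRELOAD_THRESHOLD):
    """Preload when the combined relevance reaches the preset threshold."""
    return relevance(sim_seq, sim_label) >= threshold
```

With positive logits the softmax leans toward whichever score is currently higher, so a strong match on either signal can carry the decision; a trained attention model would learn those weights instead.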
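
Claim 7 names an optimization model for the incremental-update batch size and a weighted utility function for enabling and disabling indexes, without giving either formula. A minimal sketch, assuming an economic-order-quantity style cost model (a fixed I/O cost per batch amortized over `b` records, plus a processor cost per record that grows linearly with `b`) and a normalized linear utility; `IndexStats`, `WEIGHTS` and the `0.1` threshold are hypothetical.

```python
import math
from dataclasses import dataclass


def optimal_batch_size(io_cost_per_batch, cpu_cost_coeff, max_batch=10_000):
    """Batch size minimising io_cost/b + cpu_coeff*b per record (assumed cost model).

    The continuous optimum is sqrt(io_cost / cpu_coeff); because the cost is convex,
    the best integer is its floor or ceiling, clamped to [1, max_batch].
    """
    if cpu_cost_coeff <= 0:
        return max_batch

    def cost(x):
        return io_cost_per_batch / x + cpu_cost_coeff * x

    b = math.sqrt(io_cost_per_batch / cpu_cost_coeff)
    best = min({max(1, math.floor(b)), max(1, math.ceil(b))}, key=cost)
    return min(best, max_batch)


@dataclass
class IndexStats:
    name: str
    uses_per_hour: float     # index usage frequency
    size_mb: float           # storage size
    latency_saved_ms: float  # average query delay avoided when the index is used


WEIGHTS = (0.5, 0.2, 0.3)    # hypothetical weights: frequency, size, latency


def utility(s, max_uses, max_size, max_saved, weights=WEIGHTS):
    """Weighted, normalised utility: rewards use and latency saved, penalises size."""
    wf, ws, wl = weights
    return (wf * s.uses_per_hour / max_uses
            - ws * s.size_mb / max_size
            + wl * s.latency_saved_ms / max_saved)


def start_stop(stats, threshold=0.1):
    """Map index name -> True (keep enabled) or False (disable)."""
    max_uses = max(s.uses_per_hour for s in stats) or 1.0
    max_size = max(s.size_mb for s in stats) or 1.0
    max_saved = max(s.latency_saved_ms for s in stats) or 1.0
    return {s.name: utility(s, max_uses, max_size, max_saved) >= threshold for s in stats}
```

Normalizing each factor by its maximum across the current indexes keeps the weights comparable regardless of units; the trade-off is that utilities are relative to the index set being evaluated.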
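
Claims 8 and 9 describe the cache path and its monitoring: look up a compressed result, fetch and compress on a miss, evict by a priority score, track latency with an exponentially weighted moving average and hit rate with an exponentially decaying weighted sum, and raise an adjustment signal when either metric crosses its limit. The sketch below illustrates that loop under stated assumptions: the standard-library `zlib` stands in for LZ4 so it runs without third-party packages, the linear priority formula and all default parameters are hypothetical, and `fetch` is a caller-supplied placeholder for the index-table lookup and tiered-storage read.

```python
import math
import time
import zlib


class QueryCache:
    """Compressed result cache with priority eviction and adaptive-adjustment monitoring."""

    def __init__(self, capacity=3, alpha=0.2, decay=0.9, max_latency_s=1.0,
                 min_hit_rate=0.6, weights=(1.0, 1.0, 0.001, 0.01)):
        self.capacity = capacity            # max number of cached results
        self.alpha = alpha                  # EWMA smoothing factor for latency
        self.decay = decay                  # per-query decay for the hit-rate sums
        self.max_latency_s = max_latency_s  # maximum tolerated latency
        self.min_hit_rate = min_hit_rate    # minimum acceptable hit rate
        self.weights = weights              # (frequency, heat, per byte, per second of age)
        self.entries = {}                   # key -> {"blob", "freq", "heat", "last"}
        self.latency_ewma = None
        self._hits = 0.0                    # decayed sum of hits
        self._total = 0.0                   # decayed count of queries

    def priority(self, entry, now):
        """Higher score = more worth keeping; the lowest-scoring entry is evicted."""
        wf, wh, ws, wt = self.weights
        return (wf * math.log1p(entry["freq"]) + wh * entry["heat"]
                - ws * len(entry["blob"]) - wt * (now - entry["last"]))

    def query(self, key, fetch, heat=0.0):
        """Return the compressed result for key, fetching and caching it on a miss."""
        start = time.perf_counter()
        now = time.monotonic()
        entry = self.entries.get(key)
        hit = entry is not None
        if hit:
            entry["freq"] += 1
            entry["last"] = now
        else:
            # fetch() is a placeholder for index lookup plus tiered-storage read.
            entry = {"blob": zlib.compress(fetch(key)), "freq": 1, "heat": heat, "last": now}
            self.entries[key] = entry
            if len(self.entries) > self.capacity:
                others = [k for k in self.entries if k != key]
                victim = min(others, key=lambda k: self.priority(self.entries[k], now))
                del self.entries[victim]
        self._record(time.perf_counter() - start, hit)
        return entry["blob"]

    def _record(self, elapsed, hit):
        # Exponentially weighted moving average smooths short-term latency spikes.
        if self.latency_ewma is None:
            self.latency_ewma = elapsed
        else:
            self.latency_ewma = self.alpha * elapsed + (1 - self.alpha) * self.latency_ewma
        # Each past query is down-weighted by decay per later query, emphasising recent ones.
        self._hits = self.decay * self._hits + (1.0 if hit else 0.0)
        self._total = self.decay * self._total + 1.0

    @property
    def hit_rate(self):
        return self._hits / self._total if self._total else 0.0

    def needs_adjustment(self):
        """True when the adaptive adjustment signal should be sent upstream."""
        too_slow = self.latency_ewma is not None and self.latency_ewma > self.max_latency_s
        return too_slow or self.hit_rate < self.min_hit_rate
```

Excluding the just-inserted key from eviction is a design choice of this sketch: a new entry has a low access count and would otherwise often be evicted before it can be hit again.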

Description

Technical Field

The invention relates to the technical field of industrial automation, and in particular to a historical data query optimization system for a SCADA system.

Background

In industrial automation, the SCADA (Supervisory Control and Data Acquisition) system is key infrastructure, widely used for real-time monitoring and process control in industries such as electric power, petrochemicals and rail transit. With the rapid development of the industrial internet of things, modern SCADA systems must process massive volumes of time-series data, such as fan rotation speed, temperature, vibration signals and other equipment operating parameters, which are generated continuously at millisecond intervals and stored in a historical database. In actual operation, operators often need to trace back historical data for analysis, for example to diagnose equipment faults, optimize operating parameters or generate compliance reports. Typical queries retrieve the historical data curve of a particular device under certain conditions (e.g., full-load operating phases) or compare operating-parameter trends across different time periods. Such queries often combine multidimensional conditions, specifying the device number, data type, time range and working-condition label at once, and must respond within seconds to support real-time decisions. Industrial data volumes can reach the terabyte scale, and query concurrency rises sharply at peak times, placing heavy demands on historical query efficiency. The traditional query schemes commonly used in current SCADA systems have significant limitations.
Most systems rely on the standard indexing mechanisms of relational databases, such as B-tree timestamp indexes or composite indexes that include the device identifier. Such a static index architecture, however, struggles to accommodate the highly dynamic nature of industrial data. Because access frequencies differ greatly between data, frequently accessed hot-spot data (such as recent parameters of faulty equipment) and rarely accessed archive data (such as annual overhaul records) are stored together on the same medium, so query performance fluctuates severely. The prior art attempts to alleviate this with tiered storage, for example keeping hot-spot data in a memory cache and cold data on a hard disk array. Such schemes, however, lack an intelligent mechanism for sensing data heat: they can only tier data statically by simple rules (such as last access time) and cannot adjust data distribution to actual query patterns. A further significant problem is that conventional systems ignore the correlation between working-condition context and query behavior, and so cannot anticipate what data an operator will need when a particular alarm occurs. When equipment behaves abnormally, the system can still only respond passively to query requests, retrieving data layer by layer from underlying storage, so response delays reach several seconds or even minutes and seriously impair fault handling. Such delays not only hinder real-time decisions but can also overload the system under multi-user concurrency, creating a self-reinforcing cycle of performance degradation.
Therefore, achieving adaptive, intelligent tiered storage based on data access patterns, and preloading key data in light of the working-condition context, has become a core technical bottleneck in improving SCADA historical data query performance.

Disclosure of Invention

To address the above technical problems in the prior art, the invention provides a system and method for optimizing historical data queries in a SCADA system. The technical solution is as follows: a SCADA system historical data query optimization system comprises a data thermal layering module, an intelligent preloading module, a multidimensional index construction module and a query execution cache module, wherein: The data thermal layering module runs on a preset daily cycle after the system starts, takes the time-series data sets in the SCADA historical database as its objects, uses a configuration script to count the access frequency of each data set over the last 7 days, and dynamically distributes data to a memory cache layer, an SSD storage layer and an HDD archiving layer according to a high-frequency threshold value, a medium-frequency threshold value and a low-frequency threshold val