CN-122019489-A - Active cloud-group-edge lakehouse data prefetching method for industrial small and medium-sized enterprise groups


Abstract

The invention belongs to the field of the industrial Internet and intelligent manufacturing, and in particular relates to an active prefetching method for cloud-group-edge lakehouse data for groups of small and medium-sized industrial enterprises. The method comprises the following steps: (1) constructing a cloud-group-edge three-level asymmetric storage architecture and, in the normal state, performing static physical sedimentation of data; (2) constructing an LSTM-based time-series fault prediction model and a multi-dimensional device spatial association map; (3) extracting sliding-window features in real time at the enterprise edge and performing online LSTM inference; (4) applying smoothing anti-jitter logic and identifying and locking the source trigger node against an early-warning threshold; (5) traversing the multi-dimensional device spatial association map for spatial retrieval to generate the set of devices to be prefetched; and (6) sensing the current downlink bandwidth load and performing multi-granularity adaptive prefetch scheduling based on a joint score. The invention achieves accurate reverse preheating of cold/warm data, enabling cross-domain collaboration and zero-delay fault investigation while reducing the storage overhead at the edge.

Inventors

WEI ZHONGXIANG
BAI YAHAO
LIU ERWU
JIA NING
JI WEN

Assignees

Tongji University (同济大学)

Dates

Publication Date: 20260512
Application Date: 20260408

Claims (9)

1. An active prefetching method for cloud-group-edge lakehouse data for groups of small and medium-sized industrial enterprises, characterized by comprising the following steps: Step 1, constructing a cloud-group-edge three-level asymmetric storage architecture and performing static physical sedimentation of data in the normal state; Step 2, constructing an LSTM-based time-series fault prediction model and a multi-dimensional device spatial association map; Step 3, extracting sliding-window features in real time at the enterprise edge and performing online LSTM inference; Step 4, executing smoothing anti-jitter logic and identifying and locking a source trigger node based on an early-warning threshold; Step 5, traversing the multi-dimensional device spatial association map to perform spatial retrieval and generate the set of devices to be prefetched; and Step 6, sensing the current downlink bandwidth load and executing multi-granularity adaptive prefetch scheduling based on the joint score.
2. The method for actively prefetching cloud-group-edge lakehouse data for small and medium-sized industrial enterprises according to claim 1, wherein in Step 1 the cloud-group-edge three-level asymmetric storage architecture comprises edge nodes, group nodes and cloud nodes, specifically as follows: the edge node is the hot layer; it is deployed on the production site of a specific small or medium-sized enterprise, contains an in-memory database, and is responsible only for storing two types of data internal to the enterprise, namely streaming data generated by sensors in real time; the group node is the warm layer; it is deployed in the data center of an industrial park or on a shared server of a regional group of small enterprises, serves as the "collaborative buffer zone" of the system, and centrally stores the medium-term service stream data that progressively sinks from the edge of each individual enterprise in the park; the cloud node is the cold layer, an object storage service deployed in an industry public cloud that stores, over the long term, the full, lossless raw historical sensor data that has exceeded the residence time of the warm layer. The initial static sedimentation flow is as follows: in the normal state, when no abnormality occurs, the system executes a unidirectional static physical sedimentation flow based on preset time-window thresholds. Specifically, let the current time be $t_{now}$ and the timestamp of a sensor data block be $t_d$; the edge end sets a first sedimentation threshold $T_1$ and the group end sets a second sedimentation threshold $T_2$. When $t_{now} - t_d > T_1$, the data block sinks from the edge to the group end; when $t_{now} - t_d > T_2$, it sinks from the group end to the cloud for long-term archiving.
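The unidirectional sedimentation rule of claim 2 reduces to comparing a block's age with two thresholds. The Python sketch below shows that comparison. It assumes the edge threshold is smaller than the group threshold and that times are in seconds; the names `Tier`, `DataBlock`, `target_tier`, `t1` and `t2` are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HOT = "edge"     # hot layer: in-memory store at the enterprise edge
    WARM = "group"   # warm layer: park / enterprise-group node
    COLD = "cloud"   # cold layer: industry-cloud object storage


@dataclass
class DataBlock:
    sensor_id: str
    timestamp: float  # block timestamp t_d, in seconds (assumed unit)


def target_tier(block: DataBlock, t_now: float, t1: float, t2: float) -> Tier:
    """Return the layer a block should reside in under static sedimentation.

    Assumes t1 < t2: a block leaves the edge once its age exceeds t1 and
    leaves the group end once its age exceeds t2. Data only moves downward.
    """
    if t1 >= t2:
        raise ValueError("expected the edge threshold t1 to be smaller than t2")
    age = t_now - block.timestamp
    if age > t2:
        return Tier.COLD
    if age > t1:
        return Tier.WARM
    return Tier.HOT
```

With `t1` set to one day and `t2` to thirty days, a block one hour old stays on the edge, a ten-day-old block belongs at the group end, and a forty-day-old block belongs in the cloud.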
3. The method for actively prefetching cloud-group-edge lakehouse data for small and medium-sized industrial enterprises according to claim 1, wherein Step 2 comprises the following steps, the offline computation and model training of the "spatiotemporal dual-drive engine" being completed in advance at the cloud/group end, where computing power is sufficient:
Step 2.1, training the LSTM prediction model: historically archived sensor stream data are cut with a fixed sliding window of length $W$ into input sequences $X_t = (x_{t-W+1}, \dots, x_t)$. Samples are labeled by a reverse time-window cutting method: samples within a window of length $\Delta$ before the fault occurrence time $t_f$ are labeled $y = 1$, and samples from normal periods are labeled $y = 0$. A multi-layer LSTM network extracts long-range degradation features, and the output layer maps them to the fault probability $p$ through a Sigmoid function. During training, the weights are updated by back-propagation using the binary cross-entropy loss $L = -\frac{1}{N}\sum_{n=1}^{N}\left[y_n \log p_n + (1 - y_n)\log(1 - p_n)\right]$. After the loss converges, the lightweight LSTM model is distributed and deployed to the enterprise edge.
Step 2.2, constructing the multi-dimensional device spatial association map $G = (V, E)$: historical fault work orders and query logs are mined offline to build a topology with sensors as nodes $v_i \in V$ and association weights as edges. The association weight $w_{ij}$ between two nodes $v_i$ and $v_j$ is obtained jointly from their co-occurrence probability and their physical distance, where $P(v_j \mid v_i)$ is the conditional probability that, within a set historical troubleshooting time window, a query triggered by $v_i$ also retrieves $v_j$; $d_{ij}$ is the actual physical-topology hop distance between the two devices; $\alpha$ is a confidence adjustment factor; and $\beta$ is the distance attenuation coefficient. After the weights between every pair of nodes have been computed, the system maps the association weights of all $N$ participating monitored sensor nodes by row and column index into an $N \times N$ association-map adjacency matrix $A$, in which the element in row $i$ and column $j$ is the association weight between nodes $v_i$ and $v_j$, i.e. $A_{ij} = w_{ij}$, and the main-diagonal elements $A_{ii}$ are set to a prescribed constant. Once computed, the adjacency matrix is serialized into a static map file, which is preloaded into edge memory for spatial adjacency traversal during subsequent online triggering.
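The reverse time-window labeling and the binary cross-entropy loss of Step 2.1 can be sketched in plain Python as below. This is a minimal illustration, not the patent's training pipeline: it assumes a window is labeled by the index of its last sample, that `fault_times` marks fault onsets, and that windows after a fault receive no special treatment. The function names and the `horizon` parameter (standing for the pre-fault labeling span) are hypothetical. A real model would be trained with a deep-learning framework rather than this loss function.

```python
import math
from typing import List, Sequence, Tuple


def label_windows(series: Sequence[float], fault_times: Sequence[int],
                  window: int, horizon: int) -> List[Tuple[List[float], int]]:
    """Cut fixed-length sliding windows and label them by reverse time-window cutting.

    A window ending at index `end` is labeled 1 when some fault onset t_f satisfies
    t_f - horizon <= end < t_f (the window ends inside the pre-fault span),
    otherwise 0. Post-fault windows are not handled specially in this sketch.
    """
    samples = []
    for end in range(window - 1, len(series)):
        x = list(series[end - window + 1:end + 1])
        y = int(any(tf - horizon <= end < tf for tf in fault_times))
        samples.append((x, y))
    return samples


def binary_cross_entropy(y_true: Sequence[int], p_pred: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

For a ten-point series with a fault at index 8, a window of 3 and a horizon of 2, only the windows ending at indices 6 and 7 are labeled as pre-fault samples.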
4. The method for actively prefetching cloud-group-edge lakehouse data for small and medium-sized industrial enterprises according to claim 3, characterized in that a dynamic update mechanism for the association map is introduced at the group end/cloud, triggered both periodically (monthly) and by events; the dynamic update process re-evaluates and updates the association weights $w_{ij}$ between nodes using the latest historical fault work orders and query logs, regenerates the association-map adjacency matrix, serializes it, and sends it to the edge end, where it overwrites the original map file.
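Claims 3 and 4 describe computing pairwise association weights, assembling them into an N x N adjacency matrix, and serializing the result for the edge. The patent text here does not give the weight formula, so the sketch below assumes one purely for illustration: the conditional co-occurrence probability is estimated as `co_count / (trigger_count + alpha)`, with `alpha` acting as a confidence adjustment that shrinks estimates backed by few queries, and is multiplied by `exp(-beta * hops)` for distance attenuation. The diagonal value, the JSON file format, and all function names are likewise assumptions.

```python
import json
import math
from typing import Dict, List, Tuple


def association_weight(co_count: int, trigger_count: int, hops: int,
                       alpha: float, beta: float) -> float:
    """Hypothetical w_ij = P(v_j | v_i) * exp(-beta * d_ij).

    P(v_j | v_i) is approximated as co_count / (trigger_count + alpha): the share
    of troubleshooting queries triggered by v_i that also retrieved v_j.
    """
    p_cond = co_count / (trigger_count + alpha)
    return p_cond * math.exp(-beta * hops)


def build_adjacency(nodes: List[str], trigger_counts: Dict[str, int],
                    co_counts: Dict[Tuple[str, str], int],
                    hops: Dict[Tuple[str, str], int],
                    alpha: float, beta: float, diag: float = 0.0) -> List[List[float]]:
    """Build A with A[i][j] = w_ij (not necessarily symmetric) and A[i][i] = diag."""
    n = len(nodes)
    a = [[0.0] * n for _ in range(n)]
    for i, vi in enumerate(nodes):
        for j, vj in enumerate(nodes):
            if i == j:
                a[i][j] = diag  # assumed constant on the main diagonal
            elif (vi, vj) in co_counts:
                a[i][j] = association_weight(co_counts[(vi, vj)], trigger_counts[vi],
                                             hops[(vi, vj)], alpha, beta)
    return a


def serialize_map(nodes: List[str], matrix: List[List[float]]) -> str:
    """Serialize the map for distribution to the edge (JSON is an assumed format)."""
    return json.dumps({"nodes": nodes, "adjacency": matrix})
```

Under the update mechanism of claim 4, the group end or cloud would rerun `build_adjacency` on fresh counts from recent work orders and query logs, then ship the new serialized text to the edge to replace the old file.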
5. The method for actively prefetching cloud-group-edge lakehouse data for small and medium-sized industrial enterprises according to claim 1, wherein Step 3 is specifically: during online operation of the system, the edge node collects the data of sensor $v_i$ in real time and, after data cleaning, obtains the sequence $X_t^{(i)}$; this sequence is fed into the LSTM prediction model for sensor $v_i$, the model performs forward propagation, and the transient fault probability $p_t^{(i)}$ of sensor $v_i$ within the future window $H$ is calculated and output in real time.
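The online inference of Step 3 needs, per sensor, a buffer holding the latest cleaned window of samples that is passed to the deployed model once full. The sketch below assumes the model is available as an arbitrary callable returning a probability (a stand-in for the LSTM forward pass) and that "data cleaning" means dropping NaN readings; the class name `EdgeInferenceBuffer` and its methods are hypothetical.

```python
import math
from collections import deque
from typing import Callable, Deque, Optional, Sequence


class EdgeInferenceBuffer:
    """Sliding window of the latest `window` cleaned samples for one sensor."""

    def __init__(self, window: int, predictor: Callable[[Sequence[float]], float]):
        if window < 1:
            raise ValueError("window must be >= 1")
        self.window = window
        # Stand-in for the forward pass of the deployed lightweight LSTM.
        self.predictor = predictor
        self.buf: Deque[float] = deque(maxlen=window)  # oldest sample drops out

    def push(self, sample: float) -> Optional[float]:
        """Add one raw sample; return the fault probability once the window is full."""
        if math.isnan(sample):
            return None  # assumed cleaning step: discard missing readings
        self.buf.append(sample)
        if len(self.buf) < self.window:
            return None
        return self.predictor(list(self.buf))
```

With a toy predictor such as `lambda xs: min(1.0, max(xs) / 100.0)` and a window of 3, the buffer returns `None` until three valid samples have arrived and then produces one probability per new sample.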
6. The method for actively prefetching cloud-group-edge lakehouse data for small and medium-sized industrial enterprises according to claim 5, wherein Step 4 comprises the following: the system applies a moving-average anti-jitter algorithm to the transient probability sequence, $\bar{p}_t^{(i)} = \frac{1}{K}\sum_{k=0}^{K-1} p_{t-k}^{(i)}$, where $K$ is the number of anti-jitter smoothing sampling periods. A safety early-warning threshold $\theta$ is set; when $\bar{p}_t^{(i)} \ge \theta$, the degradation trend of the device monitored by sensor $v_i$ is judged to be established, the sensor is marked as the source trigger node $v_s$, and signaling is generated to activate the spatial diffusion mechanism.
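The anti-jitter logic of Step 4 is a moving average over the last K probabilities followed by a threshold test. The sketch below assumes no decision is taken until K values are available and uses a greater-or-equal comparison; the class name `AntiJitterTrigger` is hypothetical.

```python
from collections import deque
from typing import Deque


class AntiJitterTrigger:
    """Moving-average smoothing of p_t over K periods plus an early-warning threshold."""

    def __init__(self, k: int, threshold: float):
        if k < 1:
            raise ValueError("k must be >= 1")
        self.k = k
        self.threshold = threshold
        self.recent: Deque[float] = deque(maxlen=k)  # last K transient probabilities

    def update(self, p_t: float) -> bool:
        """Add p_t; return True when the smoothed probability reaches the threshold."""
        self.recent.append(p_t)
        if len(self.recent) < self.k:
            return False  # assumption: no decision until K periods are available
        p_bar = sum(self.recent) / self.k
        return p_bar >= self.threshold
```

With K = 3 and a threshold of 0.8, a single spike such as 0.1, 0.95, 0.1 averages to about 0.38 and is suppressed, while three consecutive readings of 0.9 raise the trigger. This is the jitter filtering the claim is after.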
7. The method for actively prefetching cloud-group-edge lakehouse data for small and medium-sized industrial enterprises according to claim 6, wherein $K$ is determined dynamically from the actual physical sampling frequency $f_s$ of sensor $v_i$ and a preset anti-jitter physical time window $T_w$, according to the mapping $K = \lceil f_s \cdot T_w \rceil$, where $\lceil \cdot \rceil$ denotes rounding up; the anti-jitter physical time window $T_w$ is chosen to cover the typical maximum duration of transient electromagnetic interference on site.
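The mapping in claim 7 converts a physical time window into a number of sampling periods by multiplying by the sampling frequency and rounding up. A one-function Python sketch follows; the function name `smoothing_periods` is hypothetical.

```python
import math


def smoothing_periods(sampling_hz: float, window_s: float) -> int:
    """K = ceil(f_s * T_w): number of samples needed to span the anti-jitter window."""
    if sampling_hz <= 0 or window_s <= 0:
        raise ValueError("sampling frequency and time window must be positive")
    return math.ceil(sampling_hz * window_s)
```

For example, a 50 Hz sensor with a 0.25 s anti-jitter window gives 12.5 samples, which rounds up to K = 13, so the smoothing always spans at least the full interference window. When the product is mathematically an integer but not exactly representable in floating point, it can land just above that integer and round up one period too far, so values near integer boundaries may be worth computing with exact arithmetic.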
8. The method for actively prefetching cloud-group-edge lakehouse data for small and medium-sized industrial enterprises according to claim 1, wherein Step 5 is specifically: a topology cutoff threshold $\tau$ is set; at the edge, taking the source trigger node $v_s$ as the center, the preset association-map adjacency matrix is traversed, and all neighboring nodes $v_j$ whose association weight satisfies $w_{sj} \ge \tau$ are extracted to form the target set to be prefetched $\mathcal{N}_s$.
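Because the association map is stored as an adjacency matrix, the spatial retrieval of Step 5 amounts to scanning the source node's row and keeping the entries that reach the cutoff threshold. The sketch below assumes a one-hop traversal, a greater-or-equal comparison, and exclusion of the source node itself; `prefetch_targets` is a hypothetical name.

```python
from typing import List, Sequence


def prefetch_targets(adjacency: Sequence[Sequence[float]], source: int,
                     tau: float) -> List[int]:
    """Indices j != source whose association weight A[source][j] is at least tau."""
    return [j for j, w in enumerate(adjacency[source]) if j != source and w >= tau]
```

Reading only one row keeps the online cost linear in the number of sensors, which suits the limited compute available at the enterprise edge.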
9. The method for actively prefetching cloud-group-edge lakehouse data for small and medium-sized industrial enterprises according to claim 1, wherein Step 6 is specifically to sense the current downlink bandwidth load and execute multi-granularity adaptive active prefetch scheduling based on the joint score: the source node $v_s$ and the set $\mathcal{N}_s$ are merged into the node set to be evaluated $\mathcal{C} = \{v_s\} \cup \mathcal{N}_s$; for each node $v_j \in \mathcal{C}$, an evaluation function computes its spatiotemporal joint "prefetch urgency" score $S_j$, where for the source node $v_s$ itself the score is fixed at a constant, and the weight parameters satisfy a prescribed constraint. Meanwhile, the edge monitors in real time the downlink bandwidth load rate $\rho$ from the current cold/warm layer to the hot layer and sets a network congestion threshold $\rho_c$. A high score threshold $\theta_H$ and a medium score threshold $\theta_M$ are set, the node set to be evaluated is traversed, and prefetch instructions are issued dynamically according to the following closed-loop rules. Full high-precision waveform prefetch: if $S_j \ge \theta_H$ and the network is currently unblocked ($\rho < \rho_c$), an instruction is issued to the cold/warm layer to pull the historical high-precision raw waveform of node $v_j$ for the comparable time period into the edge hot layer. Downsampled statistical-feature prefetch: if $\theta_M \le S_j < \theta_H$, or if the score is very high ($S_j \ge \theta_H$) but the network is currently congested ($\rho \ge \rho_c$), then, to avoid triggering network collapse, the system degrades and pulls from the remote end only the downsampled time-domain statistical feature values of node $v_j$ into the hot layer. Ignore pull: if $S_j < \theta_M$, the system judges the node to be weakly related interference and performs no cross-domain pull. The system thereby closes the loop of precise reverse scheduling of data from the remote collaborative storage pool to the enterprise edge memory.
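The three-tier closed-loop rule of Step 6 can be sketched as follows. The evaluation function itself is not given in this text, so the sketch assumes, for illustration only, a linear score `S_j = lam1 * p_bar_j + lam2 * w_sj` with `lam1 + lam2 = 1`, combining the smoothed fault probability (temporal part) with the association weight to the source (spatial part). It also assumes the source node's constant score is 1.0. `Action`, `joint_score`, `decide` and `schedule` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Sequence


class Action(Enum):
    FULL_WAVEFORM = "full"    # pull historical high-precision raw waveform
    DOWNSAMPLED = "features"  # pull only downsampled time-domain statistics
    IGNORE = "ignore"         # weakly related: no cross-domain pull


def joint_score(p_bar: float, weight: float, lam1: float, lam2: float) -> float:
    """Assumed score S_j = lam1 * p_bar_j + lam2 * w_sj, with lam1 + lam2 = 1."""
    if abs(lam1 + lam2 - 1.0) > 1e-9:
        raise ValueError("weight parameters are assumed to sum to 1")
    return lam1 * p_bar + lam2 * weight


def decide(score: float, load: float, high: float, mid: float,
           congestion: float) -> Action:
    """Three-tier rule; assumes mid <= high."""
    if score >= high and load < congestion:
        return Action.FULL_WAVEFORM
    if score >= mid:  # mid <= S < high, or S >= high while the link is congested
        return Action.DOWNSAMPLED
    return Action.IGNORE


def schedule(source: int, targets: Sequence[int], p_bar: Sequence[float],
             adjacency: Sequence[Sequence[float]], load: float,
             lam1: float, lam2: float, high: float, mid: float,
             congestion: float, source_score: float = 1.0) -> Dict[int, Action]:
    """Evaluate the source node and its targets and return one action per node."""
    plan: Dict[int, Action] = {}
    for node in [source, *targets]:
        if node == source:
            s = source_score  # assumed constant score for the source node itself
        else:
            s = joint_score(p_bar[node], adjacency[source][node], lam1, lam2)
        plan[node] = decide(s, load, high, mid, congestion)
    return plan
```

Evaluating the congestion test only for high-score nodes mirrors the claim. Bandwidth pressure downgrades full waveforms to statistical features, but it never promotes a weakly related node.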

Description

Active cloud-group-edge lakehouse data prefetching method for industrial small and medium-sized enterprise groups

Technical Field

The invention belongs to the field of the industrial Internet and intelligent manufacturing, and in particular relates to an active prefetching method for cloud-group-edge lakehouse data for groups of small and medium-sized industrial enterprises.

Background

In existing equipment monitoring and intelligent operation and maintenance scenarios for small and medium-sized industrial enterprises, the many sensors within a single enterprise continuously generate high-frequency time-series data. Because the storage resources of edge computing nodes (such as enterprise local servers and factory gateways) in a single small or medium-sized enterprise are very limited, the prior art generally adopts a traditional cloud-edge collaborative two-level storage architecture combined with a data lifecycle management scheme based on static rules. Its main operating mechanisms are as follows:

Cloud-edge two-level storage architecture based on a static time threshold: the system is physically divided into a hot storage layer at the edge of a single enterprise and a cold storage layer in the industry cloud. Time-series data collected in real time by sensors in small and medium-sized enterprises are first stored in the enterprise's local storage. The system typically sets a fixed static time threshold (for example, retaining the last 7 days of data). When the residence time of data exceeds the threshold, a scheduled task packages, compresses and uploads that data to a database in the industry cloud center for long-term archiving, and deletes the local copy at the enterprise edge to free storage space.

After-the-fact passive data pulling mechanism: in daily monitoring, the enterprise edge only processes and displays real-time sensor waveforms.
When the reading of a particular sensor at the enterprise site exceeds a safety threshold and triggers a system alarm, the enterprise's operation and maintenance engineer may intervene to troubleshoot. If the engineer then needs to retrieve the sensor's historical normal waveform or historical fault slices from several months earlier, the system sends a data download instruction to the industry cloud only after receiving the engineer's explicit query request, and pulls the sensor's historical data from cold storage in the cloud to the enterprise edge for decompression and display.

Independent query logic based on a single device dimension: existing data retrieval engines treat each sensor's data as an independent time-series object. During data archiving and subsequent pulls of historical data, the system only performs exact-match queries on the single device ID and time range entered by the user, and schedules each sensor's data blocks independently in a "one request, one response" manner.

In complex equipment operation and fault detection scenarios of small and medium-sized industrial enterprises, the prior art has the following three clear and objective shortcomings, which severely restrict the coordination and response efficiency of the system:

(1) The cloud-edge two-level static storage architecture lacks elasticity: for high-frequency shared data in an industrial park or an industrial-chain collaboration center, coarsely sinking the data directly from a single enterprise to the industry cloud makes cross-enterprise and cross-production-line collaborative access extremely costly. Existing two-level architectures lack a physical "warm-layer buffer" oriented to a park or a group of small enterprises.
(2) Passive data pulling leads to high query latency: the existing system uses an after-the-fact passive response mode of "fault, manual request, cloud download". Cold data in the cloud is huge in volume and limited by the public network bandwidth of small and medium-sized enterprises, so pulling massive historical data across the network on demand usually takes a long time. This high latency prevents engineers from comparing historical data as soon as they arrive on site, prolonging equipment downtime and causing greater production losses.

(3) The lack of spatial topological correlation leads to "data islands" and repeated pulls: the sensors of industrial equipment are strongly correlated physically and technologically. Because of the isolated query logic of existing systems, when sensor A raises an alarm, the system pulls only sensor A's data; if, after analysis, the engineer finds that the strongly related sensors B and C also need to be checked, pull requests must be reinitiated several times. This data scheduling mode, lacking associated context, not only increases the overhead of network