CN-121984911-A - Multi-source heterogeneous data quasi-real-time synchronization method and system based on integrated lake and warehouse architecture
Abstract
The invention discloses a quasi-real-time synchronization method and system for multi-source heterogeneous data based on a lake and warehouse integrated architecture, and relates to the technical field of power system automation and data communication. In the method, a sending end collects multi-source heterogeneous data, grades it, and stores it in the lake and warehouse integrated architecture. The data format is converted according to the memory margin of each channel, and the converted data is injected into the corresponding channel according to its data grade. At each transmission node, the priority of each transmission branch is calculated and the highest-priority branch is selected to form a priority transmission path. The performance of the priority transmission path is detected, and the transmission path is dynamically reconstructed according to the detection result. A receiving end performs a consistency check and a timeliness evaluation on the data; if the data passes, it is injected into a data lake or a data warehouse; otherwise, retransmission is requested and data completion and time-sequence reordering are executed. Through data grading, dynamic routing and path reconstruction, the invention improves the real-time performance, reliability and resource utilization efficiency of cross-station data transmission and ensures the stable operation of key power grid services.
Inventors
- ZUO TIANCAI
- ZHANG YUJI
- XIAO JIAN
- XIE ZHIQI
- DU ZEXIN
- SU QIAN
- LUO YU
- LI LIN
- TANG XIAOBO
Assignees
- Guizhou Wujiang Hydropower Development Co., Ltd. (贵州乌江水电开发有限责任公司)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-03-24
Claims (10)
- 1. A multi-source heterogeneous data quasi-real-time synchronization method based on a lake and warehouse integrated architecture, characterized by comprising the following steps: Step S1, defining the two substations that are to exchange data as a sending end and a receiving end respectively, wherein the sending end and the receiving end are connected to the lake and warehouse integrated architecture through a plurality of data transmission nodes; Step S2, converting the transmission format of the graded multi-source heterogeneous data according to the calculated memory margin of each data transmission channel, and sequentially injecting the format-converted multi-source heterogeneous data into the corresponding data transmission channels according to data grade; Step S3, performing performance detection on each data transmission branch on the priority transmission path, and splitting and recombining the priority transmission path according to the detection result to dynamically reconstruct the transmission path; Step S4, receiving, by the receiving end, the transmitted multi-source heterogeneous data and performing a consistency check and a timeliness evaluation on it; if the check passes, injecting the data into the target data lake or data warehouse in the lake and warehouse integrated architecture; if the check fails, requesting from the sending end retransmission of the data for the designated period, performing data completion and time-sequence reordering at the receiving end, and injecting the data into the lake and warehouse integrated architecture after the processing is finished.
- 2. The multi-source heterogeneous data quasi-real-time synchronization method based on a lake and warehouse integrated architecture according to claim 1, wherein step S1 comprises the following steps: Step S1-1, establishing a communication connection between the sending end and the receiving end through the data transmission nodes and the lake and warehouse integrated architecture, wherein the lake and warehouse integrated architecture is composed of a data lake storage module, a data warehouse processing module and a lake-warehouse interactive gateway module, the data lake storage module being used for storing original multi-source heterogeneous data, the data warehouse processing module being used for storing standardized structured data, and the lake-warehouse interactive gateway module being responsible for transferring and converting data between the lake and the warehouse; Step S1-2, deploying a multi-source heterogeneous data acquisition terminal at the sending end, the acquisition terminal acquiring multi-source heterogeneous data in the substation in real time and generating one original multi-source heterogeneous data record per acquisition, wherein the original multi-source heterogeneous data record comprises a data type, an acquisition timestamp, a data value, a device number and a data source identifier; Step S1-3, establishing a plurality of data transmission channels between each data transmission node and the next transmission node according to the data transmission direction and the graded data set, wherein dynamically allocating the number of data transmission channels corresponding to each grade of data according to a preset data grading rule comprises the following steps: collecting in real time the instantaneous data quantity $Q_i$ of each grade of data, where $i$ denotes the data grade identifier; calculating the data quantity proportion $P_i$ of each grade of data as $P_i = Q_i / \sum_j Q_j$, where $\sum_j Q_j$ is the sum of the instantaneous data quantities of all grades of data; calculating the allocation weight $A_i$ of each grade of data channel according to the transmission priority weight $W_i$ of each grade of data; calculating, from the allocation weight $A_i$ of each grade of data channel, the number $N_i$ of data transmission channels to be allocated to each grade of data as $N_i = N \cdot A_i / \sum_j A_j$, where $N$ is the total number of available data transmission channels and $\sum_j A_j$ is the sum of the allocation weights of all grades of channels; dynamically allocating the corresponding number of data transmission channels to each grade of data according to the calculated number $N_i$; and configuring an independent channel identifier for each data transmission channel and binding it to the corresponding data grade.
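The channel allocation in step S1-3 can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the extracted text does not show the formula for the allocation weight, so the code assumes A_i = W_i * P_i, and it assumes largest-remainder rounding so that the integer channel counts add up to N. The function name `allocate_channels` and the grade labels in the usage example are hypothetical.

```python
from typing import Dict, Hashable


def allocate_channels(
    q: Dict[Hashable, float],
    w: Dict[Hashable, float],
    n_total: int,
) -> Dict[Hashable, int]:
    """Split n_total channels across data grades.

    q: instantaneous data quantity Q_i per grade
    w: transmission priority weight W_i per grade
    """
    total_q = sum(q.values())
    if total_q <= 0 or n_total <= 0:
        return {grade: 0 for grade in q}

    # P_i: share of the instantaneous data volume held by each grade.
    p = {grade: q[grade] / total_q for grade in q}
    # A_i: ASSUMPTION, the source does not show this formula.
    a = {grade: w[grade] * p[grade] for grade in q}
    total_a = sum(a.values())
    if total_a <= 0:
        return {grade: 0 for grade in q}

    # Exact (fractional) share of channels: N * A_i / sum(A_j).
    exact = {grade: n_total * a[grade] / total_a for grade in q}
    counts = {grade: int(exact[grade]) for grade in q}

    # ASSUMPTION: hand leftover channels to the largest fractional remainders.
    leftover = n_total - sum(counts.values())
    by_remainder = sorted(q, key=lambda g: exact[g] - counts[g], reverse=True)
    for grade in by_remainder[:leftover]:
        counts[grade] += 1
    return counts


if __name__ == "__main__":
    # Two grades with equal volume; grade 1 has three times the priority weight.
    print(allocate_channels({1: 10, 2: 10}, {1: 3, 2: 1}, 8))  # {1: 6, 2: 2}
```

Largest-remainder rounding is one of several reasonable choices; a deployment that must guarantee at least one channel per grade would need an extra rule that the claim does not describe.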
- 3. The multi-source heterogeneous data quasi-real-time synchronization method based on a lake and warehouse integrated architecture according to claim 1, wherein step S2 comprises the following steps: S2-1, collecting the total memory capacity and the used memory capacity of each data transmission channel in real time through a channel state monitoring unit, wherein the memory margin of each data transmission channel equals the difference between its total memory capacity and its used memory capacity; selecting the corresponding data transmission format according to the ratio of the memory margin to the total memory capacity; and, after format conversion is completed, injecting the data into the corresponding data transmission channel according to the channel identifier; S2-2, dividing each data channel of the current data transmission node into a plurality of data transmission branches according to the link branches in the network topology; S2-3, sorting the data transmission branches from high to low by priority, selecting data transmission branches according to the number of allocated data transmission channels, and preferentially selecting the highest-priority data transmission branch as the next-hop data transmission channel, thereby forming a dynamically adjusted priority transmission path.
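Steps S2-1 and S2-3 can be illustrated with a short Python sketch. The claim states only that the format is chosen from the ratio of memory margin to total capacity and that branches are ranked by priority; the ratio thresholds (0.5 and 0.2), the format names, and the class and function names below are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Channel:
    channel_id: str
    total_memory: int  # bytes
    used_memory: int   # bytes


@dataclass
class Branch:
    branch_id: str
    priority: float  # higher means preferred


def memory_margin(ch: Channel) -> int:
    """Memory margin = total memory capacity - used memory capacity (S2-1)."""
    return ch.total_memory - ch.used_memory


def select_format(ch: Channel) -> str:
    """Pick a transmission format from the margin-to-capacity ratio.

    ASSUMPTION: thresholds and format names are illustrative only.
    """
    if ch.total_memory <= 0:
        raise ValueError("total_memory must be positive")
    ratio = memory_margin(ch) / ch.total_memory
    if ratio >= 0.5:
        return "verbose"      # plenty of room: keep a rich, readable format
    if ratio >= 0.2:
        return "compact"      # moderate room: use a denser encoding
    return "compressed"       # little room: compress before sending


def pick_branches(branches: List[Branch], n_channels: int) -> List[Branch]:
    """Rank branches from high to low priority and keep the top n_channels (S2-3)."""
    ranked = sorted(branches, key=lambda b: b.priority, reverse=True)
    return ranked[:n_channels]
```

For example, a channel with 100 bytes of capacity and 30 bytes used has a margin ratio of 0.7 and would get the "verbose" format under these assumed thresholds.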
- 4. The multi-source heterogeneous data quasi-real-time synchronization method based on a lake and warehouse integrated architecture according to claim 1, wherein step S3 comprises the following steps: S3-1, collecting transmission performance data of each data transmission branch on the priority transmission path at fixed time intervals, wherein the transmission performance data comprises the acquisition timestamp and the transmission delay and data packet loss rate of the priority transmission path within the time interval; S3-2, constructing a first time-series graph of transmission delay over time and a second time-series graph of data packet loss rate over time, wherein each graph takes the acquisition time as the horizontal axis and the corresponding transmission performance data as the vertical axis, the performance data of each acquisition period is plotted as a coordinate point in the corresponding graph, and the coordinate points are connected in chronological order to form the performance curve of that graph; S3-3, calculating the slope between adjacent coordinate points of each graph to obtain a first slope set K1 for the first graph and a second slope set K2 for the second graph; drawing a line perpendicular to the horizontal axis through the coordinate points that correspond to the same acquisition time in the two graphs, the perpendicular intersecting the performance curve of each graph at one point; and recording the slope values at the two intersection points as a slope combination (K1x, K2x), where x is the corresponding acquisition time; S3-4, performing trend judgment on each slope combination: if K1x and K2x are both negative, determining that the performance of the data transmission branch is improving at the current acquisition time and retaining the current data transmission branch; if either slope value is positive, replacing the data transmission branch with the data transmission branch of the next priority.
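The slope-based trend judgment of steps S3-3 and S3-4 can be sketched in Python as below. Two points are assumptions rather than statements from the claim: the slope attributed to acquisition time x is taken to be the slope of the segment ending at x, and a slope of exactly zero (which the claim does not address) is treated as "keep". The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def adjacent_slopes(times: Sequence[float], values: Sequence[float]) -> List[float]:
    """Slope between each pair of adjacent coordinate points (S3-3)."""
    return [
        (values[k + 1] - values[k]) / (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    ]


def trend_decisions(
    times: Sequence[float],
    delays: Sequence[float],
    loss_rates: Sequence[float],
) -> List[Tuple[float, str]]:
    """Return (acquisition time, action) for every time after the first sample.

    K1x is the delay slope and K2x the packet-loss slope at time x.
    The branch is replaced when either slope is positive (S3-4).
    ASSUMPTION: a zero slope counts as "keep".
    """
    k1 = adjacent_slopes(times, delays)
    k2 = adjacent_slopes(times, loss_rates)
    decisions = []
    for x, k1x, k2x in zip(times[1:], k1, k2):
        action = "replace" if (k1x > 0 or k2x > 0) else "keep"
        decisions.append((x, action))
    return decisions


if __name__ == "__main__":
    # Delay falls, then rises at t=3; loss rate never rises.
    print(trend_decisions([0, 1, 2, 3], [10, 8, 7, 9], [0.02, 0.01, 0.01, 0.005]))
```

In this usage example the delay rises between t=2 and t=3, so the branch is flagged for replacement at t=3 even though the packet loss rate is still falling.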
- 5. The multi-source heterogeneous data quasi-real-time synchronization method based on a lake and warehouse integrated architecture according to claim 1, wherein step S4 comprises the following steps: S4-1, after receiving the multi-source heterogeneous data transmitted over the priority transmission path, the receiving end performs a consistency check on the data by comparing whether the check code of the received multi-source heterogeneous data matches the expected check code provided by the sending end, and evaluates the timeliness of the data by determining whether the data transmission duration is lower than the maximum allowable transmission duration of the corresponding data grade; S4-2, if the consistency check passes and the timeliness meets the standard, injecting the multi-source heterogeneous data into the data lake or the data warehouse in the lake and warehouse integrated architecture as appropriate; S4-3, if the consistency check fails or the timeliness does not meet the standard, the receiving end sends a retransmission request to the sending end, the sending end retransmits the corresponding missing data after receiving the request, and the receiving end receives the retransmitted data in its cache, performs data completion and time-sequence reordering, and writes the data into the corresponding storage unit after the processing is completed.
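A minimal Python sketch of the receiving-end logic in step S4 follows. The claim speaks only of a "check code" and a per-grade maximum transmission duration; using CRC-32 (via the standard library's `zlib.crc32`) as the check code, the millisecond limits in `MAX_TRANSIT_MS`, the grade names, and the record layout are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple

# ASSUMPTION: illustrative per-grade limits, not values from the patent.
MAX_TRANSIT_MS = {"core": 10, "important": 100, "routine": 1000}

# (acquisition timestamp, device number, data value)
Record = Tuple[int, str, float]


@dataclass
class Packet:
    grade: str
    payload: bytes
    expected_crc: int  # check code computed by the sending end
    sent_ms: int
    received_ms: int


def check_packet(packet: Packet) -> str:
    """Return "store" if both checks pass, otherwise "retransmit" (S4-1 to S4-3)."""
    consistent = zlib.crc32(packet.payload) == packet.expected_crc
    transit = packet.received_ms - packet.sent_ms
    timely = transit < MAX_TRANSIT_MS[packet.grade]
    return "store" if consistent and timely else "retransmit"


def complete_and_reorder(buffered: List[Record], retransmitted: List[Record]) -> List[Record]:
    """Fill gaps with retransmitted records and restore chronological order.

    Records are keyed by (timestamp, device); a retransmitted record
    overrides a buffered one with the same key.
    """
    merged = {(ts, dev): (ts, dev, val) for ts, dev, val in buffered}
    merged.update({(ts, dev): (ts, dev, val) for ts, dev, val in retransmitted})
    return [merged[key] for key in sorted(merged)]
```

Keying records by timestamp and device number makes the completion step idempotent, so a retransmission that overlaps data already in the cache does not create duplicates.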
- 6. The multi-source heterogeneous data quasi-real-time synchronization system based on the lake and warehouse integrated architecture is used for executing the multi-source heterogeneous data quasi-real-time synchronization method based on the lake and warehouse integrated architecture, and is characterized by comprising a data grading module, a format conversion module, a routing module, a path management module and a receiving and checking module; The data grading module is used for grading the collected multi-source heterogeneous data at the transmitting end, dynamically distributing the number of data transmission channels according to the data grade, and realizing grading management and channel binding of the data; The format conversion module is used for dynamically selecting a data transmission format according to the memory allowance of the data transmission channel, carrying out format conversion on the classified data, and injecting the converted data into the corresponding data transmission channel; The routing module is used for calculating the priority of each data transmission branch at the same transmission node, and selecting the branch with the highest priority as a next-hop transmission channel to form a priority transmission path; The path management module is used for detecting the performance of the priority transmission path and dynamically reconstructing the transmission path according to the detection result; the receiving and checking module is used for carrying out consistency check and timeliness evaluation on the received data at the receiving end, and carrying out data warehousing or retransmission completion processing according to the check result.
- 7. The multi-source heterogeneous data quasi-real-time synchronization system based on the integrated lake and warehouse architecture of claim 6, wherein the data classification module comprises an acquisition unit, a classification unit and a channel distribution unit; the acquisition unit is used for acquiring multi-source heterogeneous data in the transformer substation in real time through an acquisition terminal arranged at the transmitting end to generate an original multi-source heterogeneous data record, wherein the record comprises a data type, an acquisition time stamp, a data value, an equipment number and a data source identifier; The grading unit is used for grading the original data into a plurality of grades according to the influence degree of the multi-source heterogeneous data on the safe operation of the power grid and forming a grading data set; The channel allocation unit is used for dynamically calculating the number of data transmission channels to be allocated for each grade of data according to the instantaneous data quantity and the transmission priority weight of each grade of data, configuring independent channel identifiers for each channel, and binding the channels with corresponding data grades.
- 8. The multi-source heterogeneous data quasi-real-time synchronization system based on the integrated lake and warehouse architecture of claim 6, wherein the routing module comprises a branch dividing unit, a priority calculating unit and a path selecting unit; the branch dividing unit is used for dividing each data channel of the current data transmission node into a plurality of data transmission branches according to the link branches in the network topology; the priority calculating unit is used for calculating the priority score of each transmission branch according to the data grade and the branch load; the path selecting unit is used for sorting the data transmission branches from high to low by priority, selecting data transmission branches according to the number of allocated data transmission channels, and preferentially selecting the highest-priority data transmission branch as the next-hop data transmission channel, thereby forming a dynamically adjusted priority transmission path.
- 9. The multi-source heterogeneous data quasi-real-time synchronization system based on the integrated lake and warehouse architecture of claim 6, wherein the path management module comprises a performance acquisition unit, a time-series construction unit and a trend judgment unit; the performance acquisition unit is used for collecting transmission performance data of each data transmission branch on the priority transmission path at fixed time intervals, wherein the transmission performance data comprises the acquisition timestamp, the transmission delay and the data packet loss rate; the time-series construction unit is used for constructing a first time-series graph of transmission delay over time and a second time-series graph of data packet loss rate over time, plotting the performance data of each acquisition period as coordinate points, and connecting the coordinate points in chronological order to form performance curves; the trend judgment unit is used for calculating the slope between adjacent coordinate points of each time-series graph to obtain a first slope set and a second slope set, and performing trend judgment on the slope combination corresponding to the same acquisition time, retaining the current data transmission branch if both slope values are negative, and replacing it with the data transmission branch of the next priority if either slope value is positive.
- 10. The multi-source heterogeneous data quasi-real-time synchronization system based on the integrated lake and warehouse architecture of claim 6, wherein the receiving and checking module comprises a check and evaluation unit, a data warehousing unit and a retransmission completion unit; the check and evaluation unit is used for performing a consistency check on the received multi-source heterogeneous data at the receiving end by comparing whether the check code of the received data matches the expected check code provided by the sending end, and for evaluating the timeliness of the data by determining whether the data transmission duration is lower than the maximum allowable transmission duration of the corresponding data grade; the data warehousing unit is used for injecting core data and important data into the data warehouse and injecting conventional data into the data lake when the consistency check passes and the timeliness meets the standard; the retransmission completion unit is used for sending a retransmission request to the sending end when the check fails or the timeliness does not meet the standard, receiving the retransmitted data in the cache of the receiving end, performing data completion and time-sequence reordering, and injecting the data into the corresponding storage unit after the processing is completed.
Description
Multi-source heterogeneous data quasi-real-time synchronization method and system based on integrated lake and warehouse architecture
Technical Field
The invention relates to the technical field of power system automation and data communication, in particular to a multi-source heterogeneous data quasi-real-time synchronization method and system based on a lake and warehouse integrated architecture.
Background
In the construction of the modern smart grid, building a substation cooperative control and protection system on a high-speed communication network is of great significance. Such a system depends on low-latency data transmission between adjacent substations, enables advanced applications such as rapid isolation of regional faults, and provides core support for intelligent management and control of the power grid.
Existing methods for data transmission and synchronization between adjacent substations mostly adopt preset periodic polling or a simple message transmission mechanism. In application scenarios such as relay protection, which place strict requirements on data timeliness and consistency, these methods have obvious shortcomings. For example, when a transient fault such as a short circuit occurs in the power grid, the related protection commands and measurement data must complete cross-station interaction within a millisecond-level time window. In actual operation, however, once the network experiences a traffic burst, path switching or rising equipment load, the traditional communication mechanism is prone to message delay, jitter or even loss, so that data cannot be delivered within the decision time window. Such communication failures may in turn cause coordination anomalies in the protection system, in particular blocking or maloperation of longitudinal differential protection and logic conflicts between automatic backup power switching devices at different sites.
These problems may not only enlarge the scope of fault impact but also induce cascading failures, posing a serious threat to the safe operation of the main grid.
Disclosure of Invention
The invention aims to provide a multi-source heterogeneous data quasi-real-time synchronization method and system based on a lake and warehouse integrated architecture, so as to solve the problems in the prior art described above.
To achieve this purpose, the invention provides the following technical solution: a multi-source heterogeneous data quasi-real-time synchronization method based on a lake and warehouse integrated architecture, comprising the following steps:
Step S1, defining the two substations that are to exchange data as a sending end and a receiving end respectively, wherein the sending end and the receiving end are connected to the lake and warehouse integrated architecture through a plurality of data transmission nodes;
Step S2, converting the transmission format of the graded multi-source heterogeneous data according to the calculated memory margin of each data transmission channel, and sequentially injecting the format-converted multi-source heterogeneous data into the corresponding data transmission channels according to data grade;
Step S3, performing performance detection on each data transmission branch on the priority transmission path, and splitting and recombining the priority transmission path according to the detection result to dynamically reconstruct the transmission path;
Step S4, receiving, by the receiving end, the transmitted multi-source heterogeneous data and performing a consistency check and a timeliness evaluation on it; if the check passes, injecting the data into the target data lake or data warehouse in the lake and warehouse integrated architecture; if the check fails, requesting from the sending end retransmission of the data for the designated period, performing data completion and time-sequence reordering at the receiving end, and injecting the data into the lake and warehouse integrated architecture after the processing is finished.
Further, step S1 includes:
Step S1-1, establishing a communication connection between the sending end and the receiving end through the data transmission nodes and the lake and warehouse integrated architecture, wherein the lake and warehouse integrated architecture is composed of a data lake storage module, a data warehouse processing module and a lake-warehouse interactive gateway module, the data lake storage module being used for storing original multi-source heterogeneous data, the data warehouse processing module being used for storing standardized structured data, and the lake-warehouse interactive gateway module being responsible for transferring and converting data between the lake and the warehouse;
The lake and warehouse integrated architecture adopts a logically fused distributed architecture, a data lake storage module,