CN-121998624-A - Method and system for adaptively adjusting the comprehensive evaluation weights of the operating performance of an offshore wind farm DC collection system based on reinforcement learning


Abstract

The invention provides a reinforcement-learning-based method and system for adaptively adjusting the comprehensive evaluation weights of the operating performance of an offshore wind farm DC collection system. The method comprises: collecting and preprocessing multi-source operation data of the DC collection system to obtain real-time performance indices; constructing a Markov decision process for weight adjustment based on the real-time performance indices and the current evaluation weight vector; obtaining an optimal weight adjustment strategy through interaction between a reinforcement-learning-driven agent and the constructed Markov decision process environment; obtaining an optimal weight vector based on the optimal weight adjustment strategy; and performing real-time comprehensive evaluation of the operating performance of the DC collection system based on the optimal weight vector. The invention thereby offers a new technical approach to the dynamic evaluation of complex industrial systems.

Inventors

CHE YANBO
WANG LEI
Hua Anran
ZHENG MENGXIANG
PENG JIN
SU JIANBO
Qiu Runze

Assignees

Tianjin University (天津大学)

Dates

Publication Date: 2026-05-08
Application Date: 2026-02-03

Claims (8)

1. A reinforcement-learning-based method for adaptively adjusting the comprehensive evaluation weights of the operating performance of an offshore wind farm DC collection system, characterized by comprising the following steps: collecting and preprocessing multi-source operation data of the offshore wind farm DC collection system to obtain real-time performance indices; constructing a Markov decision process for weight adjustment based on the real-time performance indices and the current evaluation weight vector; having a reinforcement-learning-driven agent interact with the constructed Markov decision process environment to obtain an optimal weight adjustment strategy; and obtaining an optimal weight vector based on the optimal weight adjustment strategy, and performing real-time comprehensive evaluation of the operating performance of the offshore wind farm DC collection system based on the optimal weight vector.
2. The method of claim 1, wherein the multi-source operation data comprise stability data, reliability data, and energy efficiency data, and the real-time performance indices comprise a stability aggregation index, a reliability aggregation index, and an energy efficiency aggregation index; the stability data comprise the DC bus voltage fluctuation rate, the converter station current fluctuation rate, and the system frequency deviation; the reliability data comprise the key equipment health index, the predicted mean time between failures (MTBF), and the equipment utilization rate; and the energy efficiency data comprise the network loss rate and the power transmission efficiency, which characterize the loss level and the energy utilization efficiency of the DC collection system during energy transmission.
3. The method of claim 2, wherein the Markov decision process includes a state space, an action space, and a reward function; the elements of the state space comprise the current evaluation weight vector and the real-time performance index vector; the elements of the action space are adjustment actions that change each weight parameter in the current evaluation weight vector by a preset adjustment step; and the reward function takes a multi-objective composite form comprising a first sub-reward term, a second sub-reward term, a third sub-reward term, and a stability penalty term.
4. The method of claim 3, wherein, in the reward function, the first sub-reward term is the relative improvement rate of the voltage stability index between two adjacent moments; the second sub-reward term is the relative improvement rate of the key equipment health index between two adjacent moments; the third sub-reward term is the relative improvement rate of the energy efficiency aggregation index between two adjacent moments; the stability penalty term is determined by the Euclidean distance between the evaluation weight vector at the current moment and the evaluation weight vector at the previous moment; and the value of the reward function is obtained by computing the weighted sum of the first, second, and third sub-reward terms using a preset first weight coefficient, a preset second weight coefficient, and a preset third weight coefficient, and then subtracting the stability penalty term.
5. The method of claim 1, wherein the agent employs an Actor-Critic architecture in which the Actor policy network and the Critic value network each comprise an input layer, a first hidden layer, a second hidden layer, and an output layer; the Actor network outputs a probability distribution over the actions in the current state and decides the adjustment action for the evaluation weights; and the Critic network computes the state value estimate, from which the advantage function is obtained.
6. A reinforcement-learning-based system for adaptively adjusting the comprehensive evaluation weights of the operating performance of an offshore wind farm DC collection system, for implementing the method of any one of claims 1-5, comprising: a data acquisition and preprocessing module for collecting and preprocessing multi-source operation data of the offshore wind farm DC collection system to obtain real-time performance indices; a reinforcement learning intelligent decision module for constructing a Markov decision process for weight adjustment based on the real-time performance indices and the current evaluation weight vector; a dynamic weight execution module for having a reinforcement-learning-driven agent interact with the constructed Markov decision process environment to obtain an optimal weight adjustment strategy; and a comprehensive evaluation output module for obtaining an optimal weight vector based on the optimal weight adjustment strategy and performing real-time comprehensive evaluation of the operating performance of the offshore wind farm DC collection system based on the optimal weight vector.
7. The system of claim 6, wherein the data acquisition and preprocessing module connects, through an industrial communication protocol, to the supervisory control and data acquisition (SCADA) system and the equipment condition monitoring system (CMS) of the offshore wind farm DC collection system, so as to acquire the multi-source operation data in real time.
8. The system of claim 6, wherein the comprehensive evaluation output module further provides a web service interface and a visualization interface to support real-time display of evaluation results and queries of historical data.
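
The state, action, and reward described in claims 3 and 4 can be illustrated with a minimal Python sketch. The claims do not give formulas, so several details here are assumptions for illustration only: the relative improvement rate is taken as (current - previous) / |previous|, all indices are assumed to be oriented so that larger is better, the adjusted weights are clipped at zero and renormalized to sum to 1, and the coefficient values, step size, and `penalty_scale` are hypothetical.

```python
import math
from typing import List, Sequence


def relative_improvement(current: float, previous: float, eps: float = 1e-8) -> float:
    """Relative change of an index between two adjacent moments (assumed definition)."""
    return (current - previous) / max(abs(previous), eps)


def apply_action(weights: Sequence[float], index: int, direction: int,
                 step: float = 0.05) -> List[float]:
    """Shift one weight by a preset step, then clip at zero and renormalize.

    The renormalization to a unit sum is an assumption, not stated in the claims.
    """
    adjusted = list(weights)
    adjusted[index] = max(adjusted[index] + direction * step, 0.0)
    total = sum(adjusted)
    if total <= 0.0:
        raise ValueError("at least one weight must stay positive")
    return [w / total for w in adjusted]


def reward(prev_indices: Sequence[float], curr_indices: Sequence[float],
           prev_weights: Sequence[float], curr_weights: Sequence[float],
           coeffs: Sequence[float] = (0.4, 0.3, 0.3),
           penalty_scale: float = 0.1) -> float:
    """Weighted sum of three sub-reward terms minus a stability penalty.

    Index order: voltage stability, key equipment health, energy efficiency.
    The penalty is a scaled Euclidean distance between successive weight vectors.
    """
    gains = [relative_improvement(c, p) for c, p in zip(curr_indices, prev_indices)]
    weighted_gain = sum(a * g for a, g in zip(coeffs, gains))
    drift = math.dist(curr_weights, prev_weights)
    return weighted_gain - penalty_scale * drift


# Hypothetical values for one decision step (not taken from the patent).
prev_weights = [0.4, 0.3, 0.3]
prev_indices = [0.80, 0.70, 0.90]
curr_indices = [0.84, 0.77, 0.90]

# State: current evaluation weight vector plus real-time performance index vector.
state = (tuple(prev_weights), tuple(curr_indices))

# Action: raise the second (reliability-related) weight by one preset step.
new_weights = apply_action(prev_weights, index=1, direction=+1)

step_reward = reward(prev_indices, curr_indices, prev_weights, new_weights)
no_change_reward = reward(prev_indices, curr_indices, prev_weights, prev_weights)
```

In this sketch, changing the weights lowers the reward slightly relative to leaving them untouched, which is the role of the stability penalty term: it discourages abrupt jumps in the evaluation weight vector.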

Description

Method and system for adaptively adjusting the comprehensive evaluation weights of the operating performance of an offshore wind farm DC collection system based on reinforcement learning

Technical Field

The invention belongs to the technical field of intelligent operation and maintenance and dynamic decision-making for offshore wind power DC collection systems, and particularly relates to a reinforcement-learning-based method and system for adaptively adjusting the comprehensive evaluation weights of the operating performance of an offshore wind farm DC collection system.

Background

With global demand for clean energy growing, offshore wind power is developing rapidly as an important form of renewable energy. An offshore wind farm usually uses a DC collection system to gather the electric energy generated by many wind turbines and transmit it over long distances, and the safe and stable operation of this system is crucial to the operating performance and power supply reliability of the whole wind farm.

To evaluate the operating state of the DC collection system comprehensively, the industry generally adopts a comprehensive evaluation method. The method builds an evaluation framework covering several dimensions, such as stability, reliability, and energy efficiency. Specifically, a set of key performance indices is selected for each dimension and combined in a weighted-sum model, the most common form being: composite score = w_1 · index_1 + w_2 · index_2 + ... + w_n · index_n, where w_1, w_2, ..., w_n are the weights of the respective indices.

However, the comprehensive evaluation methods of the prior art have the following significant drawbacks:

1. The weights are static. Once set, they remain unchanged for a long time. These weights are typically determined once at the design stage, using subjective or semi-subjective methods based on expert experience, such as the analytic hierarchy process (AHP).

2. The method cannot adapt to time-varying operating conditions. The operating environment of an offshore wind farm is highly dynamic and uncertain: natural conditions such as wind speed and sea state change drastically, and equipment gradually ages or fails at random as operating time accumulates. Static weights cannot be adjusted to the actual current operating state of the system and its external environment, so the evaluation result becomes distorted. For example, when equipment has aged severely, the weight of the reliability indices should be increased accordingly, but a static weight model cannot reflect this change.

3. The evaluation result lags and gives poor decision guidance. Because the weights cannot be adjusted adaptively, the evaluation result often fails to reflect the weak points and potential risks of the system accurately and in time. Operation and maintenance personnel therefore cannot formulate an optimal control strategy or maintenance decision from the evaluation result, may miss the best time to intervene, and in the worst case safety accidents may result.
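
The weighted-sum model and the static-weight drawback described above can be illustrated with a minimal Python sketch. All numbers are hypothetical and assume indices normalized to the range 0 to 1 with larger values being better; the function name `composite_score` and the alternative weight vector are illustrative assumptions, not values from the patent.

```python
from typing import Sequence


def composite_score(weights: Sequence[float], indices: Sequence[float]) -> float:
    """Weighted-sum composite score: w_1 * index_1 + ... + w_n * index_n."""
    if len(weights) != len(indices):
        raise ValueError("weights and indices must have the same length")
    return sum(w * x for w, x in zip(weights, indices))


# Dimension order: stability, reliability, energy efficiency (hypothetical values).
static_weights = [0.4, 0.3, 0.3]
healthy_indices = [0.9, 0.9, 0.9]
aged_indices = [0.9, 0.3, 0.9]  # reliability has degraded through equipment aging

score_healthy = composite_score(static_weights, healthy_indices)
score_aged_static = composite_score(static_weights, aged_indices)

# With a heavier reliability weight, the same degradation lowers the score more,
# so the weak point becomes more visible in the evaluation result.
reliability_heavy_weights = [0.2, 0.6, 0.2]
score_aged_reweighted = composite_score(reliability_heavy_weights, aged_indices)
```

With the static weights, the composite score only drops from about 0.90 to about 0.72 although reliability has fallen sharply; with the reliability-heavy weights the score falls to about 0.54. This is the kind of condition-dependent emphasis that a fixed weight vector cannot provide.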
The field of power system and wind farm evaluation already has a series of national and industry standards. For example, GB/T 2900.13-2008, "Electrotechnical terminology: Dependability and quality of service", defines reliability terms such as expected energy not supplied (EENS); DL/T 793.1-2017, "Reliability evaluation code for power generation equipment, Part 1: General rules", specifies the classification of power generation equipment states (in service, available, unavailable, and so on) and the statistical methods for reliability indices such as the availability factor and the outage factor; DL/T 686-2018, "Guide for calculating electric energy losses in power networks", and GB/T 40267-2021, "Guide for calculating electric energy losses in power systems", unify the methods for calculating transmission line losses and line loss rates; and NB/T 31117-2017, NB/T 11603-2024, and NB/T 11599-2024 respectively provide engineering criteria for submarine cable current-carrying capacity and thermal stability limits and for offshore converter station voltage deviation and recovery time. However, these standards are mainly intended for static planning and design and for periodic evaluation. Existing work generally does not combine the standardized indices with a mechanism for dynamically adjusting the weights of an online multi-dimensional performance evaluation, and fixed-weight models still dominate, so adaptive optimization of the comprehensive evaluation is difficult to achieve under complex time-varying operating conditions. Therefore, a technical scheme is needed that overcomes the drawback of static weights and adaptively adjusts the evaluation weights according to real-time operating conditions, so as to improve the accuracy of comprehensive evaluation and