
US-12617312-B2 - Systems and methods for adaptive optimization for energy systems


Abstract

Systems and methods are provided for dynamically selecting a control policy from among several available control policies for controlling an energy system having multiple controllable assets. The performance of the selected control policy is monitored, and a different control policy may be deployed in its place if that policy has a higher chance of providing better performance in the current control environment. Thus, as the control environment changes, the control policy that controls the power system may be changed adaptively, providing improved real-time performance compared to the use of a single fixed control policy.
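The selection loop described in the abstract can be illustrated with a minimal sketch. None of this code is taken from the patent; all names (`ControlPolicy`, `predict_score`, `solar_fraction`) are illustrative assumptions. Each candidate policy is scored against the current control environment, and the highest-scoring one is deployed:

```python
# Illustrative sketch of adaptive policy selection: score every candidate
# control policy against the current control environment and deploy the best.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ControlPolicy:
    name: str
    # Maps current environment features to a predicted performance score.
    predict_score: Callable[[Dict[str, float]], float]


def select_policy(policies: List[ControlPolicy],
                  env: Dict[str, float]) -> ControlPolicy:
    """Return the policy with the best predicted score for this environment."""
    return max(policies, key=lambda p: p.predict_score(env))


# Two toy policies whose predicted fitness depends on current solar output.
cheap = ControlPolicy("price-follower", lambda e: 1.0 - e["solar_fraction"])
green = ControlPolicy("solar-maximizer", lambda e: e["solar_fraction"])

assert select_policy([cheap, green], {"solar_fraction": 0.8}).name == "solar-maximizer"
assert select_policy([cheap, green], {"solar_fraction": 0.2}).name == "price-follower"
```

As the environment data changes over time, re-running `select_policy` yields the adaptive switching behavior the abstract describes.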

Inventors

  • Nasrin SADEGHIANPOURHAMAMI
  • Mostafa FARROKHABADI

Assignees

  • BluWave Inc.

Dates

Publication Date
2026-05-05
Application Date
2022-02-04

Claims (20)

  1. A computer-implemented method comprising: at one or more electronic devices each having one or more processors and memory: storing a database comprising historical control environment data associated with an energy system having a plurality of controllable assets; training an agent selection policy of a control agent selector, wherein the training comprises: calculating a historical system-wide performance score for each of a plurality of system-wide control agents based on the historical control environment data and on historical performance information of the control agents, wherein each of the plurality of control agents comprises a system-wide control policy for controlling the energy system, and wherein each of the historical system-wide performance scores relates to the performance of a respective control agent with its respective system-wide control policy in controlling the energy system; and training the agent selection policy based on the historical control environment data and the calculated historical system-wide performance scores, and wherein the training comprises learning a function that maps the historical control environment data to the historical system-wide performance score for each of the plurality of control agents; inputting new control environment data associated with the energy system into the control agent selector; selecting, by the control agent selector based on the trained agent selection policy, a system-wide control agent from among the plurality of control agents, the selecting comprising: calculating predicted system-wide performance scores for the plurality of control agents based on the new control environment data; and selecting the control agent based on the calculated predicted system-wide performance scores; and controlling the energy system using the system-wide control policy of the selected system-wide control agent and based on the new control environment data to perform at least one of: charge or discharge at least one of the controllable assets, and turn on or turn off at least one of the controllable assets.
  2. The method according to claim 1, wherein the plurality of controllable assets comprises at least one of a battery energy storage system (BESS), and a thermostatically controllable load (TCL).
  3. The method according to claim 1, wherein the storing a database comprises storing a first database comprising historical control environment data associated with a first energy system having a first plurality of controllable assets, and storing a second database comprising historical control environment data associated with a second energy system having a second plurality of controllable assets; wherein the training an agent selection policy comprises training an agent selection policy of a first control agent selector associated with the first energy system, and training an agent selection policy of a second control agent selector associated with the second energy system; wherein the inputting new control environment data comprises inputting new control environment data associated with the first energy system into the first control agent selector, and inputting new control environment data associated with the second energy system into the second control agent selector; wherein the selecting a system-wide control agent comprises selecting a first system-wide control agent for the first energy system, and selecting a second system-wide control agent for the second energy system; and wherein the controlling the energy system comprises controlling the first energy system based on the first system-wide control agent and on the new control environment data associated with the first energy system, and controlling the second energy system based on the second system-wide control agent and on the new control environment data associated with the second energy system.
  4. The method according to claim 1, further comprising, subsequent to the training the agent selection policy: collecting additional control environment data associated with the energy system, and updating the historical control environment data in the database based on the additional control environment data; and re-training the agent selection policy of the control agent selector based on the updated historical control environment data.
  5. The method according to claim 2, further comprising: collecting and storing experience data of one or more control agents, wherein the experience data comprises information relating to the experience of the one or more control agents as they interacted with their control environments, wherein experience data associated with a specific control agent is collected and stored based on an experience selection probability, wherein the experience selection probability is associated with a system-wide performance score of the specific control agent; and training a control module of at least one of the plurality of control agents with training data comprising at least a portion of the experience data.
  6. The method according to claim 1, further comprising: aggregating at least a portion of the new control environment data, wherein the aggregating comprises assigning individual controllable assets represented in the data into at least one of a plurality of groups based on a predefined similarity feature thereby producing an aggregated control problem, wherein at least one of the controlling the energy system using the selected control agent, and the calculating predicted system-wide performance scores of the plurality of control agents, is based on the aggregated data such that at least some control decisions in the aggregated control problem are made for each of the plurality of groups.
  7. The method according to claim 1, further comprising: clustering at least a portion of the new control environment data, wherein the clustering comprises: identifying clusters of controllable assets represented in the data based on a predefined clustering feature; and assigning a cluster ID associated with a given cluster to each of the controllable assets in that cluster, wherein at least one of the controlling the energy system using the selected control agent, and the calculating predicted system-wide performance scores of the plurality of control agents, is based on the clustered data.
  8. A computer-implemented system, comprising: a database comprising historical control environment data associated with an energy system having a plurality of controllable assets; one or more electronic devices each having one or more processors and memory configured to: train an agent selection policy of a control agent selector, wherein the training comprises: calculating a historical system-wide performance score for each of a plurality of system-wide control agents based on the historical control environment data and on historical performance information of the control agents, wherein each of the plurality of control agents comprises a system-wide control policy for controlling the energy system, and wherein each of the historical system-wide performance scores relates to the performance of a respective control agent with its respective system-wide control policy in controlling the energy system; and training the agent selection policy based on the historical control environment data and the calculated historical system-wide performance scores, wherein the training comprises learning a function that maps the historical control environment data to the historical system-wide performance score for each of the plurality of control agents; and input new control environment data associated with the energy system into the control agent selector; select, by the control agent selector based on the trained agent selection policy, a system-wide control agent from among the plurality of control agents, the selecting comprising: calculating predicted system-wide performance scores for the plurality of control agents based on the new control environment data; and selecting the control agent based on the calculated predicted system-wide performance scores; control the energy system using the system-wide control policy of the selected system-wide control agent and based on the new control environment data to perform at least one of: charge or discharge at least one of the controllable assets, and turn on or turn off at least one of the controllable assets.
  9. The system according to claim 8, wherein the plurality of controllable assets comprises at least one of a battery energy storage system (BESS), and a thermostatically controllable load (TCL).
  10. The system according to claim 8, wherein the database is a first database, the energy system is a first energy system, and the plurality of controllable assets is a first plurality of controllable assets, the system further comprising a second database comprising historical control environment data associated with a second energy system having a second plurality of controllable assets, wherein the configuration of the one or more electronic devices to train an agent selection policy of a control agent selector comprises training an agent selection policy of a first control agent selector associated with the first energy system, and training an agent selection policy of a second control agent selector associated with the second energy system, to input new control environment data comprises inputting new control environment data associated with the first energy system into the first control agent selector, and inputting new control environment data associated with the second energy system into the second control agent selector, to select a system-wide control agent comprises selecting a first system-wide control agent for the first energy system, and selecting a second system-wide control agent for the second energy system, and to control the energy system comprises controlling the first energy system based on the first system-wide control agent and on the new control environment data associated with the first energy system, and controlling the second energy system based on the second system-wide control agent and on the new control environment data associated with the second energy system.
  11. The system according to claim 8, further configured to, subsequent to the training the agent selection policy: collect additional control environment data associated with the energy system, and update the historical control environment data in the database based on the additional control environment data; and re-train the agent selection policy of the control agent selector based on the updated historical control environment data.
  12. The system according to claim 8, further configured to: collect and store experience data of one or more control agents, wherein the experience data comprises information relating to the experience of the one or more control agents as they interacted with their control environments, wherein experience data associated with a specific control agent is collected and stored based on an experience selection probability, wherein the experience selection probability is associated with a system-wide performance score of the specific control agent; and train a control module of at least one of the plurality of control agents with training data comprising at least a portion of the experience data.
  13. The system according to claim 8, further configured to: aggregate at least a portion of the new control environment data, wherein the aggregating comprises assigning individual controllable assets represented in the data into at least one of a plurality of groups based on a predefined similarity feature thereby producing an aggregated control problem, wherein at least one of the controlling the energy system using the selected control agent, and the calculating predicted system-wide performance scores of the plurality of control agents, is based on the aggregated data such that at least some control decisions in the aggregated control problem are made for each of the plurality of groups.
  14. A non-transitory computer-readable medium having computer-readable instructions stored thereon, the computer-readable instructions executable by a processor of one or more electronic devices to cause the performance of operations comprising: storing a database comprising historical control environment data associated with an energy system having a plurality of controllable assets; training an agent selection policy of a control agent selector, wherein the training comprises: calculating a historical system-wide performance score for each of a plurality of system-wide control agents based on the historical control environment data and on historical performance information of the control agents, wherein each of the plurality of control agents comprises a system-wide control policy for controlling the energy system, and wherein each of the historical system-wide performance scores relates to the performance of a respective control agent with its respective system-wide control policy in controlling the energy system; and training the agent selection policy based on the historical control environment data and the calculated historical system-wide performance scores, wherein the training comprises learning a function that maps the historical control environment data to the historical system-wide performance score for each of the plurality of control agents; inputting new control environment data associated with the energy system into the control agent selector; selecting, by the control agent selector based on the trained agent selection policy, a system-wide control agent from among the plurality of control agents, the selecting comprising: calculating predicted system-wide performance scores for the plurality of control agents based on the new control environment data; and selecting the control agent based on the calculated predicted system-wide performance scores; and controlling the energy system using the system-wide control policy of the selected system-wide control agent and based on the new control environment data to perform at least one of: charge or discharge at least one of the controllable assets, and turn on or turn off at least one of the controllable assets.
  15. The non-transitory computer-readable medium according to claim 14, wherein the plurality of controllable assets comprises at least one of a battery energy storage system (BESS), and a thermostatically controllable load (TCL).
  16. The non-transitory computer-readable medium according to claim 14, wherein the storing a database comprises storing a first database comprising historical control environment data associated with a first energy system having a first plurality of controllable assets, and storing a second database comprising historical control environment data associated with a second energy system having a second plurality of controllable assets; wherein the training an agent selection policy comprises training an agent selection policy of a first control agent selector associated with the first energy system, and training an agent selection policy of a second control agent selector associated with the second energy system; wherein the inputting new control environment data comprises inputting new control environment data associated with the first energy system into the first control agent selector, and inputting new control environment data associated with the second energy system into the second control agent selector; wherein the selecting a system-wide control agent comprises selecting a first system-wide control agent for the first energy system, and selecting a second system-wide control agent for the second energy system; and wherein the controlling the energy system comprises controlling the first energy system based on the first system-wide control agent and on the new control environment data associated with the first energy system, and controlling the second energy system based on the second system-wide control agent and on the new control environment data associated with the second energy system.
  17. The non-transitory computer-readable medium according to claim 14, further comprising, subsequent to the training the agent selection policy: collecting additional control environment data associated with the energy system, and updating the historical control environment data in the database based on the additional control environment data; and re-training the agent selection policy of the control agent selector based on the updated historical control environment data.
  18. The non-transitory computer-readable medium according to claim 14, further comprising: collecting and storing experience data of one or more control agents, wherein the experience data comprises information relating to the experience of the one or more control agents as they interacted with their control environments, wherein experience data associated with a specific control agent is collected and stored based on an experience selection probability, wherein the experience selection probability is associated with a system-wide performance score of the specific control agent; and training a control module of at least one of the plurality of control agents with training data comprising at least a portion of the experience data.
  19. The non-transitory computer-readable medium according to claim 14, further comprising: aggregating at least a portion of the new control environment data, wherein the aggregating comprises assigning individual controllable assets represented in the data into at least one of a plurality of groups based on a predefined similarity feature thereby producing an aggregated control problem, wherein at least one of the controlling the energy system using the selected control agent, and the calculating predicted system-wide performance scores of the plurality of control agents, is based on the aggregated data such that at least some control decisions in the aggregated control problem are made for each of the plurality of groups.
  20. The non-transitory computer-readable medium according to claim 14, further comprising: clustering at least a portion of the new control environment data, wherein the clustering comprises: identifying clusters of controllable assets represented in the data based on a predefined clustering feature; and assigning a cluster ID associated with a given cluster to each of the controllable assets in that cluster, wherein at least one of the controlling the energy system using the selected control agent, and the calculating predicted system-wide performance scores of the plurality of control agents, is based on the clustered data.
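The agent selection policy of claim 1 learns a function mapping historical control-environment data to per-agent performance scores, then selects the agent with the best predicted score for new data. A toy sketch of one possible such function (nearest-neighbour lookup is my illustrative stand-in, not the patent's method; class and agent names are assumptions):

```python
import math


class AgentSelector:
    """Toy agent selector: remembers (environment features -> observed per-agent
    performance scores) pairs, predicts scores for new data by nearest-neighbour
    lookup, and selects the agent with the highest predicted score."""

    def __init__(self):
        self.memory = []  # list of (feature_tuple, {agent_name: score})

    def train(self, features, scores_per_agent):
        self.memory.append((tuple(features), dict(scores_per_agent)))

    def predict_scores(self, features):
        # Predicted scores = scores observed in the most similar historical record.
        nearest = min(self.memory, key=lambda rec: math.dist(rec[0], features))
        return nearest[1]

    def select(self, features):
        scores = self.predict_scores(features)
        return max(scores, key=scores.get)


sel = AgentSelector()
sel.train([0.9, 0.1], {"rule_based": 0.6, "rl_agent": 0.9})  # e.g. sunny, low price
sel.train([0.1, 0.8], {"rule_based": 0.8, "rl_agent": 0.5})  # e.g. cloudy, high price
assert sel.select([0.85, 0.2]) == "rl_agent"
assert sel.select([0.2, 0.7]) == "rule_based"
```

In practice the learned function could be any regressor per agent; the claim only requires that it map control-environment data to predicted system-wide performance scores.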
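Claims 5, 12, and 18 tie an experience selection probability to each agent's system-wide performance score, so better-performing agents contribute more experience to later training. A minimal sketch of one such scheme (probability proportional to score is my illustrative assumption; the claims only require that the two be associated):

```python
import random


def collect_experience(agents, rng=random):
    """Keep an agent's experience tuple with probability proportional to its
    system-wide performance score (normalized by the best score seen)."""
    max_score = max(a["score"] for a in agents) or 1.0  # guard against all-zero
    buffer = []
    for a in agents:
        p = a["score"] / max_score          # experience selection probability
        if rng.random() < p:
            buffer.append(a["experience"])  # e.g. (state, action, reward, next_state)
    return buffer


# The top-scoring agent (p = 1.0) is always kept; a zero-score agent never is.
demo = [{"score": 1.0, "experience": "e1"},
        {"score": 0.0, "experience": "e2"}]
assert collect_experience(demo) == ["e1"]
```

The resulting buffer would then serve as training data for the control modules, as claim 5 describes.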
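Claims 6, 7, 13, 19, and 20 reduce the control problem's size by grouping or clustering controllable assets on a predefined similarity feature, so decisions are made per group rather than per asset. A minimal sketch (field names like `depart_hour` are hypothetical; any similarity feature would do):

```python
from collections import defaultdict


def aggregate_assets(assets, feature):
    """Group controllable assets by a predefined similarity feature, producing
    an aggregated control problem with one decision per group."""
    groups = defaultdict(list)
    for asset in assets:
        groups[asset[feature]].append(asset["id"])
    return dict(groups)


# e.g. EVs grouped by departure hour: one charging decision per group.
evs = [{"id": "ev1", "depart_hour": 8},
       {"id": "ev2", "depart_hour": 8},
       {"id": "ev3", "depart_hour": 17}]
assert aggregate_assets(evs, "depart_hour") == {8: ["ev1", "ev2"], 17: ["ev3"]}
```

The clustering variant of claims 7 and 20 works similarly, except that group membership (a cluster ID) is discovered by a clustering algorithm rather than read off a single feature.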

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. application Ser. No. 16/985,841, filed Aug. 5, 2020.

FIELD

The present disclosure relates generally to controlling a power system, and more particularly to adaptive optimization control of a power or energy system.

BACKGROUND

Machine learning (ML) may be used in power or energy systems with significant penetration of renewable energy, such as wind, solar, or tidal energy, to improve the utilization of variable renewable resources and to coordinate consumption and demand. Machine learning models may be used to predict future resource availability and demand requirements. These predictions may then be used to schedule generation, storage, and/or pricing to optimally coordinate these energy systems toward various objectives, such as cost minimization, efficiency maximization, or optimal use of local renewable energy. The prediction and optimization models themselves may also be based on machine learning.

Power grids are undergoing a major transition, partly to meet worldwide ambitions to reduce the carbon dioxide footprint. Some manifestations thereof are the increased penetration of renewable generation, for example wind and solar, the proliferation of Distributed Energy Storage Systems (DESS), and the adoption of Electric Vehicles (EVs) as an alternative to internal combustion engine cars. Integrating such technologies adds complexity to the control paradigm of the power grid and mandates intelligent control mechanisms. An ultimate goal of an intelligent control mechanism is to exploit the flexibility in electricity usage offered by DESS, electric vehicle batteries, or any other controllable assets, such as thermostatically controllable loads (TCL), in response to price-based and incentive-based signals, to ensure system reliability and to yield economic and environmental benefits. Hence, extensive research is being done on proposing such algorithms.
Some initial studies took a model-based approach, formulating the control problem as an optimization problem that minimizes or maximizes a predefined objective subject to various operating constraints. More recently, owing to the abundance of power system data, various machine-learning algorithms have been employed to provide analytical and forecasting information to model-based control algorithms, or to facilitate model-free, data-driven control mechanisms. In the model-free approaches, the control problem is cast as a Markov Decision Process (MDP): a learning agent interacts with the environment by taking actions in response to a system state and observing the reward and the next state of the environment. However, both of the aforementioned approaches have limitations. The performance of model-based solutions is limited by the accuracy of the models and their parameters, which are often challenging to obtain due to the complexity of the real-world problems being modeled. Model-free approaches circumvent the challenges of model selection by inferring from data; however, their applicability to real-world problems is hindered by the limited scalability of the problem's state-action space.

Among the controllable assets or loads in such systems, electric vehicle charging demands are particularly challenging to coordinate because of bounds on the timing and duration of asset availability: the energy requirement of an EV must be met during its sojourn. Moreover, an electric vehicle's time of arrival, sojourn, and associated energy demand are influenced by the owner's behavioral patterns and by fleet-owner routing decisions. Coordinating the charging demand of an electric vehicle fleet, or of many electric vehicles at large scale, must take into account the heterogeneity of the end users, differences in their behavioral patterns, and the uncertainty surrounding their behavior.
This hinders the performance of model-based approaches, which depend on accurate models of the problem. Although various model-based methodologies have been proposed to intelligently coordinate electric vehicle charging demand, their application to real-world problems is limited. With the growth over time in the size of electric vehicle data sets, data-driven methodologies have recently been proposed to circumvent the challenges of the model-based approaches. These methods employ reinforcement learning to infer the best coordination policy by interacting with control environments formulated as MDPs. However, their application to jointly controlling an electric vehicle fleet is limited by the limited scalability of the state-action space. There has been no attempt to date to combine the aforementioned control mechanisms intelligently.

The above information is presented as background information only to assist with an understanding of the present disclosure. No assertion or admission is made as to whether any of the above, or anything else in the present disclosure, unless explicitly stated, might be applicable as prior art with regard to the present disclosure.
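The background's MDP formulation — an agent observes a state, takes an action, and receives a reward and the next state — can be sketched as a generic interaction loop. The environment and policy below are toy placeholders of my own, not anything from the disclosure:

```python
def run_episode(env, policy, steps=24):
    """Roll a control policy through one episode of an MDP-style environment,
    accumulating reward over a fixed horizon (e.g. 24 hourly control steps)."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        action = policy(state)            # e.g. a charge/discharge set-point
        state, reward = env.step(action)  # observe reward and next state
        total_reward += reward
    return total_reward


class ToyBatteryEnv:
    """Toy environment: reward is 1 when the action matches a simple
    alternating charge(1)/idle(0) target schedule."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        reward = 1.0 if action == (self.t % 2) else 0.0
        self.t += 1
        return self.t, reward


# A policy that tracks the target schedule earns the maximum reward.
assert run_episode(ToyBatteryEnv(), lambda s: s % 2, steps=4) == 4.0
```

In a reinforcement-learning setting, the policy would be updated from the observed (state, action, reward, next state) tuples rather than fixed as here.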