CN-120031292-B - Large public building demand response intelligent regulation and control method based on multi-mode reinforcement learning intelligent agent system
Abstract
The invention discloses an intelligent demand response regulation and control method for large public buildings based on a multi-modal reinforcement learning agent system. The method builds a multi-modal reinforcement learning agent system by introducing multi-modal data processing and multi-agent cooperation mechanisms, deploys the system in a large public building, achieves cooperative control of its agents through a distributed execution mechanism, and uses the system to cooperatively solve a bi-level demand response regulation and control model of the large public building for the electricity-carbon-green certificate markets. The key adjustable equipment is regulated according to the solution, so that the large public building participates in demand response regulation and control in the electricity market, the carbon market and the green certificate market. Through multi-modal data fusion, the invention improves awareness of the building's operation and maintenance state and enhances the building's adaptability to dynamic markets and complex environments.
Inventors
- Zhong Wai
- WANG SONGJIE
- LIN XIAOJIE
- ZHOU YI
- WU YANLING
Assignees
- Zhejiang University (浙江大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20250110
Claims (5)
- 1. A method for intelligent demand response regulation and control of a large public building based on a multi-modal reinforcement learning agent system, characterized by comprising the following steps: S1, determining the demand response regulation range in which the large public building participates and identifying its key adjustable equipment according to the building's operation and maintenance characteristics, and determining the types and respective functions of the relevant agents according to this regulation range and the key adjustable equipment, to obtain a multi-modal reinforcement learning agent system; S2, performing multi-modal data sensing and acquisition for the large public building with intelligent sensors, constructing a multi-modal data set $D$ of the building, and extracting features from $D$ to obtain the building's multi-modal data feature set $F$; S3, centrally training the multi-modal reinforcement learning agent system with a deep reinforcement learning algorithm, using the feature set $F$ and the agents constructed in step S2, and realizing cooperative control of those agents through a distributed execution mechanism; S4, deploying the multi-modal reinforcement learning agent system in the large public building; S5, using the multi-modal reinforcement learning agent system for intelligent demand response regulation and control of the large public building, specifically: establishing a bi-level demand response regulation and control model of the large public building for the electricity-carbon-green certificate markets, solving this model cooperatively with the multi-modal reinforcement learning agent system, regulating and controlling the key adjustable equipment of step S1 according to the solution, and maintaining the multi-modal reinforcement learning agent system;
Step S2 comprises the following steps: step S21, performing multi-modal data sensing and acquisition for the large public building with intelligent sensors, namely temperature sensors, gas sensors, infrared sensors, smart cameras and radio frequency identification (RFID) sensors. The multi-modal data of the large public building are data of diverse types reflecting the operation and maintenance conditions of every part of the building, covering environmental data, energy data, occupant behavior data, electricity market data and carbon market data; these text, audio, video and image data are uploaded over wireless communication to the building's existing integrated energy intelligent management and control platform; step S22, constructing the multi-modal data set $D$ of the large public building from the historical data sets aggregated by that platform together with the building's existing knowledge sets. The knowledge sets cover electricity-carbon coupled demand response, green building energy efficiency optimization, energy system integration and management, renewable energy utilization, energy conversion and storage technology, and the knowledge held in intelligent energy monitoring and scheduling systems; this knowledge is carried by e-books, existing standards, audio and video recordings, case sets compiled during the building's operation and maintenance, or recorded conversations among operation and maintenance staff.
The data set $D$ is expressed as the tuple $D=\{D_1, D_2, \dots, D_n\}$, where $D_i$ is the data set of the $i$th modality and $n$ is the total number of modalities; the data set of each modality is $D_i=\{x_{i,1}^{t}, x_{i,2}^{t}, \dots, x_{i,m_i}^{t}\}$, where $x_{i,k}^{t}$ is the value of the $k$th data point of the $i$th modality at time $t$ and $m_i$ is the total number of data points of the $i$th modality; step S23, processing and fusing the features of $D$ with intelligent algorithms (denoising, time alignment, standardization and normalization, feature extraction, and feature fusion) to obtain the building's multi-modal data feature set $F=\{F_1, F_2, \dots, F_n\}$, where $F_i=\{f_{i,1}^{t}, f_{i,2}^{t}, \dots, f_{i,M_i}^{t}\}$ is the feature set of the $i$th modality, $f_{i,k}^{t}$ is the value of its $k$th feature at time $t$, and $M_i$ is its total number of features; step S24, preliminarily constructing each single agent of the large public building from existing large language models, vision-language models and deep reinforcement learning algorithms, specifically: first, the feature set $F$ is provided as input to the existing large language model and vision-language model to initially construct each agent of the multi-modal reinforcement learning agent system; then each agent is preliminarily trained with a deep reinforcement learning algorithm by defining the global state $s$, the local state space $\mathcal{O}_i$ observed by the $i$th agent, the action space $\mathcal{A}_i$ of the $i$th agent, and the reward function $r_i$ of the $i$th agent.
The global state $s$ describes the operation and maintenance state of the entire large public building, namely the real-time ambient temperature, ambient humidity, equipment operating states, occupant activity levels, and electricity, carbon and green certificate market information; the local state $o_i$ observed by the $i$th agent takes values in the local state space $\mathcal{O}_i$, which is a subset of the global state: $o_i\subseteq s=\{T, H, E, P, M^{\mathrm{e}}, M^{\mathrm{c}}, M^{\mathrm{g}}\}$, where $T$ is the ambient temperature, $H$ the ambient humidity, $E$ the equipment operating state, $P$ the occupant activity level, and $M^{\mathrm{e}}$, $M^{\mathrm{c}}$ and $M^{\mathrm{g}}$ the electricity market, carbon market and green certificate market information. The set of all actions the $i$th agent can perform is its action space $\mathcal{A}_i=\{a_i\}$ with $a_i=(\Delta T_i, \Delta L_i, \Delta Q_i, q_i^{\mathrm{e}}, q_i^{\mathrm{c}}, q_i^{\mathrm{g}})$, where $a_i$ is the joint action of the $i$th agent, $\Delta T_i$, $\Delta L_i$ and $\Delta Q_i$ are the ambient temperature, lighting and heating adjustments of the building, and $q_i^{\mathrm{e}}$, $q_i^{\mathrm{c}}$ and $q_i^{\mathrm{g}}$ are the building's trading volumes in the electricity, carbon and green certificate markets. The reward function is defined from the current state of the $i$th agent and the action it executes, and guides the agent's learning: $r_i(s,a_i)=r_i^{\mathrm{E}}+r_i^{\mathrm{C}}+r_i^{\mathrm{DR}}+r_i^{\mathrm{CM}}+r_i^{\mathrm{GC}}$, where $r_i^{\mathrm{E}}$ is the energy consumption reward, $r_i^{\mathrm{C}}$ the indoor comfort reward, $r_i^{\mathrm{DR}}$ the reward for participating in electricity market demand response, $r_i^{\mathrm{CM}}$ the reward for carbon market trading, and $r_i^{\mathrm{GC}}$ the reward for green certificate market trading. Training the $i$th agent with deep reinforcement learning amounts to learning the policy $\pi_i$ that maximizes the expected cumulative reward:
$\pi_i^{*}=\arg\max_{\pi_i} J(\pi_i)$, $J(\pi_i)=\mathbb{E}_{s_0\sim\rho^{\pi}(\cdot),\,a_{i,t}\sim\pi_i}\left[\sum_{t\ge 0}\gamma_i^{t}\,r_i(s_t,a_{i,t})\right]$, where $J(\pi_i)$ is the performance index of the policy $\pi_i$ learned by the $i$th agent; $\mathbb{E}$ is the expectation over the initial state $s_0$ and the actions drawn from $\pi_i$; $\rho^{\pi}(\cdot)$ is the state distribution under the policy; and $\gamma_i$ is the discount factor of the $i$th agent (applied as $\gamma_i^{t}$ at time $t$), which balances the importance of immediate and future rewards.
- 2. The intelligent demand response regulation and control method for large public buildings based on a multi-modal reinforcement learning agent system according to claim 1, wherein in step S1: the demand response regulation range of the large public building specifically comprises electricity market, carbon market and green certificate market demand response regulation and control, and the key adjustable equipment comprises cold and heat source power equipment, central air conditioning equipment, energy storage equipment, lighting equipment and renewable energy equipment; the multi-modal reinforcement learning agent system comprises an upper-layer agent subsystem, a lower-layer agent subsystem and a communication coordination agent subsystem, through which the upper-layer and lower-layer subsystems operate in global cooperation; the upper-layer agent subsystem comprises an electricity market demand response agent, a carbon market demand response agent and a green certificate market demand response agent, and the lower-layer agent subsystem comprises an environment perception agent, an energy management agent, a user behavior agent, a fault detection and maintenance agent, a strategy generation agent and a multi-modal data fusion agent.
- 3. The intelligent demand response regulation and control method for large public buildings based on a multi-modal reinforcement learning agent system according to claim 1, wherein step S3 specifically comprises the following steps: step S31, defining the global joint action of the multi-modal reinforcement learning agent system as $\mathbf{a}=(a_1, a_2, \dots, a_N)$, where $a_i$ is the joint action of the $i$th agent and $N$ is the number of agents, and the global action space as the Cartesian product of all agent action spaces, $\mathcal{A}=\mathcal{A}_1\times\mathcal{A}_2\times\cdots\times\mathcal{A}_N$, where $\mathcal{A}_i$ is the action space of the $i$th agent; step S32, defining the joint $Q$ function $Q_i(s,\mathbf{a};\phi_i)$, the expected return of the $i$th agent under global state $s$ and global joint action $\mathbf{a}$: $Q_i(s,\mathbf{a};\phi_i)=\mathbb{E}\left[\sum_{t\ge 0}\gamma_i^{t}r_{i,t}\,\middle|\,s_0=s,\ \mathbf{a}_0=\mathbf{a}\right]$, where $\phi_i$ are the Q-network parameters of the $i$th agent.
The policy network $\mu_i(o_i;\theta_i)$ of the $i$th agent is optimized by the policy gradient $\nabla_{\theta_i}J(\theta_i)=\mathbb{E}\left[\nabla_{\theta_i}\mu_i(o_i;\theta_i)\,\nabla_{a_i}Q_i(s,\mathbf{a};\phi_i)\big|_{a_i=\mu_i(o_i;\theta_i)}\right]$, where $\nabla$ denotes the gradient, $\nabla_{a_i}Q_i$ is the gradient of the joint Q function with respect to the action of the $i$th agent, $\theta_i$ are the parameters of the policy network, and $\nabla_{\theta_i}\mu_i$ is the gradient of the policy network with respect to $\theta_i$. The joint Q function is updated by minimizing the mean squared error $L(\phi_i)=\mathbb{E}\left[\left(Q_i(s,\mathbf{a};\phi_i)-y_i\right)^2\right]$ with target value $y_i=r_i+\gamma_i\,Q_i(s',\mathbf{a}';\phi_i')$, where $\mathbf{a}'$ and $s'$ are the global joint action and global state of the multi-modal reinforcement learning agent system in the next state, $\phi_i'$ are the Q-network parameters of the $i$th agent used for the next state, and $\gamma_i$ is the discount factor of the $i$th agent.
The agents are coordinated through the global objective $J=\mathbb{E}\left[R(s,\mathbf{a})\right]$, where $\mathbb{E}$ is the global expectation and $R$ is the global reward function, defined as the weighted sum of all agent rewards $R(s,\mathbf{a})=\sum_{i=1}^{N}w_i r_i$, with $w_i$ the weight of the $i$th agent. In the training stage, the policy and Q function of every agent $i$ are trained and optimized centrally; the policy of each agent $i$ is updated by $\theta_i\leftarrow\theta_i+\alpha\,\nabla_{\theta_i}J(\theta_i)$ and the Q-function parameters by $\phi_i\leftarrow\phi_i-\alpha\,\nabla_{\phi_i}L(\phi_i)$, where $\alpha$ is the learning rate; step S33, during distributed execution, each agent decides its action from its local observation $o_i$ and its policy, $a_i=\mu_i(o_i;\theta_i)$, and these actions are combined to realize cooperative operation of the agents in the multi-modal reinforcement learning agent system.
- 4. The intelligent demand response regulation and control method for large public buildings based on a multi-modal reinforcement learning agent system according to claim 3, wherein step S4 comprises the following steps: step S41, integrating the upper-layer agent subsystem with the integrated energy intelligent management and control platform, whose data interfaces connect through communication protocols to the building management system and the energy market platform to obtain market prices, building energy consumption forecasts and historical transaction records; and step S42, integrating the lower-layer agent subsystem with the equipment controllers by embedded deployment, that is, deploying the trained lower-layer agents on the equipment controllers.
- 5. The intelligent demand response regulation and control method for large public buildings based on a multi-modal reinforcement learning agent system according to claim 4, wherein step S5 comprises the following steps: step S51, establishing the bi-level demand response regulation and control model for the large public building's participation in the electricity-carbon-green certificate markets, consisting of an upper-level economic model and a lower-level day-ahead dispatch model. The upper-level economic model maximizes the profit $\Pi_b$ that large public building $b$ obtains from demand response in the electricity, carbon and green certificate markets; $\Pi_b$ is defined in terms of $\lambda^{\mathrm{e}}_{b,t}$, the electricity purchase price of building $b$ at time $t$; $c_{b,t}$, its marginal cost at time $t$; $P_{b,t}$, its electricity consumption at time $t$; $\lambda^{\mathrm{g}}_{b,t}$, its green certificate price at time $t$; $G_{b,t}$, the green certificate transaction amount it purchases at time $t$; $\lambda^{\mathrm{c}}_{b,t}$, its carbon price at time $t$; and $E_{b,t}$, its carbon emissions at time $t$.
The lower-level day-ahead dispatch model minimizes the cost $C_b$ of the building's energy equipment when the building participates in demand response in the electricity, carbon and green certificate markets: $\min C_b=\sum_{t}\left[C^{\mathrm{DA}}_{b,t}(x_{b,t})+C^{\mathrm{e}}_{b,t}+C^{\mathrm{c}}_{b,t}+C^{\mathrm{g}}_{b,t}\right]$, where $C^{\mathrm{DA}}_{b,t}$ is the day-ahead dispatch cost of building $b$ at time $t$; $x_{b,t}$ is the day-ahead dispatch decision variable of each key adjustable device in building $b$ at time $t$; $C^{\mathrm{e}}_{b,t}$ is the building's total electricity market cost at time $t$, including electricity purchase and demand response costs; $C^{\mathrm{c}}_{b,t}$ is its total carbon market cost at time $t$, including carbon quota purchase costs; and $C^{\mathrm{g}}_{b,t}$ is its total green certificate market cost or benefit at time $t$, depending on whether it buys or sells green certificates. The decision variable is $x_{b,t}=\{x^{\mathrm{CHS}}_{b,t}, x^{\mathrm{AC}}_{b,t}, x^{\mathrm{ES}}_{b,t}, x^{\mathrm{L}}_{b,t}, x^{\mathrm{RE}}_{b,t}\}$, the day-ahead dispatch decisions of the cold and heat source power equipment, air conditioning equipment, energy storage equipment, lighting equipment and renewable energy equipment in building $b$.
The constraints of the lower-level day-ahead dispatch model are the power supply-demand balance constraint, the carbon emission limit constraint, the green certificate transaction limit constraint and the energy equipment output constraint. Power supply-demand balance: $P^{\mathrm{buy}}_{b,t}-P^{\mathrm{sell}}_{b,t}=P^{\mathrm{net}}_{b,t},\ \forall t$, where $P^{\mathrm{buy}}_{b,t}$ and $P^{\mathrm{sell}}_{b,t}$ are the electricity purchased and sold by building $b$ at time $t$, $P^{\mathrm{net}}_{b,t}$ is the net amount it purchases or sells, and $\forall t$ means the constraint holds at every time $t$. Carbon emission limit: $\sum_{t}E_{b,t}\le E^{\mathrm{quota}}_{b}$, where $E_{b,t}$ is the carbon emission of building $b$ at time $t$ and $E^{\mathrm{quota}}_{b}$ is its carbon quota. Green certificate transaction limit: $G_{b,t}\le P^{\mathrm{RE}}_{b,t},\ \forall t$, where $G_{b,t}$ is the green certificate transaction amount and $P^{\mathrm{RE}}_{b,t}$ the renewable generation of building $b$ at time $t$. Energy equipment output: $P_i^{\min}\le P_{i,t}\le P_i^{\max}$, where $P_i^{\min}$ and $P_i^{\max}$ are the lower and upper output limits of the $i$th energy device in the building.
Step S52, solving the upper-level economic model with the upper-layer agent subsystem and the lower-level day-ahead dispatch model with the lower-layer agent subsystem, with the communication coordination agent subsystem handling information exchange and cooperative solution between the two; solving the upper-level model yields its market decision quantities, and solving the lower-level model yields $x_{b,t}$. The upper-level economic model is solved by establishing the mean squared error function $L^{\mathrm{up}}$, the objective function $J^{\mathrm{up}}$ and the policy network gradient $\nabla_{\theta^{\mathrm{up}}}J^{\mathrm{up}}$ of the upper-layer agent subsystem, specifically: $L^{\mathrm{up}}(\phi^{\mathrm{up}})=\mathbb{E}\left[\left(Q^{\mathrm{up}}(s^{\mathrm{up}},\mathbf{a}^{\mathrm{up}};\phi^{\mathrm{up}})-y^{\mathrm{up}}\right)^2\right]$ with $y^{\mathrm{up}}=r^{\mathrm{up}}+\gamma^{\mathrm{up}}Q^{\mathrm{up}}(s'^{\mathrm{up}},\mathbf{a}'^{\mathrm{up}};\phi'^{\mathrm{up}})$, where $L^{\mathrm{up}}$ is the upper-layer mean squared error; $\mathbb{E}$ is the expectation; $Q^{\mathrm{up}}$ is the joint Q function of the upper-layer agent subsystem; $s^{\mathrm{up}}$ is its global state; $\mathbf{a}^{\mathrm{up}}=(a^{\mathrm{e}}, a^{\mathrm{c}}, a^{\mathrm{g}})$ is its joint action, composed of the actions of the electricity, carbon and green certificate market demand response agents; $\phi^{\mathrm{up}}$ are the learnable parameters of $Q^{\mathrm{up}}$; $y^{\mathrm{up}}$ is the target value; $r^{\mathrm{up}}=\Pi_b$ is the reward built from the upper-level economic model; $\gamma^{\mathrm{up}}$ is the discount factor balancing the subsystem's immediate and future rewards; and $Q^{\mathrm{up}}(s'^{\mathrm{up}},\mathbf{a}'^{\mathrm{up}};\phi'^{\mathrm{up}})$ is the joint Q function in the next state. The objective function is $J^{\mathrm{up}}(\theta^{\mathrm{up}})=\mathbb{E}\left[Q^{\mathrm{up}}\left(s^{\mathrm{up}},\mu^{\mathrm{up}}(s^{\mathrm{up}};\theta^{\mathrm{up}});\phi^{\mathrm{up}}\right)\right]$, and the policy of the upper-layer agent subsystem is updated with the policy network gradient $\nabla_{\theta^{\mathrm{up}}}J^{\mathrm{up}}=\mathbb{E}\left[\nabla_{\theta^{\mathrm{up}}}\mu^{\mathrm{up}}(s^{\mathrm{up}};\theta^{\mathrm{up}})\,\nabla_{\mathbf{a}^{\mathrm{up}}}Q^{\mathrm{up}}(s^{\mathrm{up}},\mathbf{a}^{\mathrm{up}};\phi^{\mathrm{up}})\big|_{\mathbf{a}^{\mathrm{up}}=\mu^{\mathrm{up}}(s^{\mathrm{up}};\theta^{\mathrm{up}})}\right]$, where $\nabla_{\theta^{\mathrm{up}}}J^{\mathrm{up}}$ is the gradient of the objective with respect to the policy network parameters $\theta^{\mathrm{up}}$ and guides policy updates toward maximizing $J^{\mathrm{up}}$; $\mu^{\mathrm{up}}$ is the policy network; $\nabla_{\theta^{\mathrm{up}}}\mu^{\mathrm{up}}$ is its gradient; and $\nabla_{\mathbf{a}^{\mathrm{up}}}Q^{\mathrm{up}}$ is the gradient of the joint Q function.
The lower-level day-ahead dispatch model is solved for $x_{b,t}$ by establishing the mean squared error function $L^{\mathrm{low}}$, the objective function $J^{\mathrm{low}}$ and the policy network gradient $\nabla_{\theta^{\mathrm{low}}}J^{\mathrm{low}}$ of the lower-layer agent subsystem, specifically: $L^{\mathrm{low}}(\phi^{\mathrm{low}})=\mathbb{E}\left[\left(Q^{\mathrm{low}}(s^{\mathrm{low}},\mathbf{a}^{\mathrm{low}};\phi^{\mathrm{low}})-y^{\mathrm{low}}\right)^2\right]$ with $y^{\mathrm{low}}=r^{\mathrm{low}}+\gamma^{\mathrm{low}}Q^{\mathrm{low}}(s'^{\mathrm{low}},\mathbf{a}'^{\mathrm{low}};\phi'^{\mathrm{low}})$, where $L^{\mathrm{low}}$ is the lower-layer mean squared error; $\mathbb{E}$ is the expectation; $Q^{\mathrm{low}}$ is the joint Q function of the lower-layer agent subsystem; $s^{\mathrm{low}}$ is its global state; $\mathbf{a}^{\mathrm{low}}$ is its joint action, composed of the actions of the lower-layer agents; $\phi^{\mathrm{low}}$ are the learnable parameters of $Q^{\mathrm{low}}$; $y^{\mathrm{low}}$ is the target value; $r^{\mathrm{low}}=-C_b$ is the reward built from the lower-level day-ahead dispatch model; $\gamma^{\mathrm{low}}$ is the discount factor balancing the subsystem's immediate and future rewards; and $Q^{\mathrm{low}}(s'^{\mathrm{low}},\mathbf{a}'^{\mathrm{low}};\phi'^{\mathrm{low}})$ is the joint Q function in the next state. The objective function is $J^{\mathrm{low}}(\theta^{\mathrm{low}})=\mathbb{E}\left[Q^{\mathrm{low}}\left(s^{\mathrm{low}},\mu^{\mathrm{low}}(s^{\mathrm{low}};\theta^{\mathrm{low}});\phi^{\mathrm{low}}\right)\right]$, and the policy of the lower-layer agent subsystem is updated with the policy network gradient $\nabla_{\theta^{\mathrm{low}}}J^{\mathrm{low}}=\mathbb{E}\left[\nabla_{\theta^{\mathrm{low}}}\mu^{\mathrm{low}}(s^{\mathrm{low}};\theta^{\mathrm{low}})\,\nabla_{\mathbf{a}^{\mathrm{low}}}Q^{\mathrm{low}}(s^{\mathrm{low}},\mathbf{a}^{\mathrm{low}};\phi^{\mathrm{low}})\big|_{\mathbf{a}^{\mathrm{low}}=\mu^{\mathrm{low}}(s^{\mathrm{low}};\theta^{\mathrm{low}})}\right]$, where $\nabla_{\theta^{\mathrm{low}}}J^{\mathrm{low}}$ is the gradient of the lower-layer objective with respect to the policy network parameters $\theta^{\mathrm{low}}$ and guides policy updates toward maximizing $J^{\mathrm{low}}$; $\mu^{\mathrm{low}}$ is the policy network; $\nabla_{\theta^{\mathrm{low}}}\mu^{\mathrm{low}}$ is its gradient; and $\nabla_{\mathbf{a}^{\mathrm{low}}}Q^{\mathrm{low}}$ is the gradient of the joint Q function. The key adjustable equipment is then regulated according to the solutions of the upper-level and lower-level models, including $x_{b,t}$;
and step S53, maintaining the multi-modal reinforcement learning agent system by monitoring energy use in the building in real time with intelligent sensors and further optimizing the behavior of the system according to this feedback.
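Step S23 of claim 1 names denoising, time alignment, standardization, feature extraction and fusion without fixing an algorithm. The sketch below illustrates only standardization and fusion, under stated assumptions: each modality has already been time-aligned into a matrix with one row per time step $t$ and one column per feature $k$ (the $f_{i,k}^{t}$ of $F_i$), every feature is z-scored over time, and fusion is plain concatenation (early fusion). The function names are hypothetical, and concatenation is only one of many possible fusion schemes.

```python
import statistics
from typing import Dict, List, Sequence

# Hypothetical layout: rows are time steps t, columns are features k of one modality.
Matrix = List[List[float]]


def standardize_columns(x: Sequence[Sequence[float]]) -> Matrix:
    """Z-score each feature column over time; constant columns map to 0.0."""
    out_cols = []
    for col in zip(*x):
        mu = statistics.fmean(col)
        sigma = statistics.pstdev(col)
        out_cols.append([0.0 if sigma == 0 else (v - mu) / sigma for v in col])
    return [list(row) for row in zip(*out_cols)]


def early_fusion(modalities: Dict[str, Sequence[Sequence[float]]]) -> Matrix:
    """Concatenate the standardized features of all modalities at each time step.

    Assumes every modality was already time-aligned to the same number of steps.
    Modalities are concatenated in sorted-name order so the layout is deterministic.
    """
    lengths = {len(m) for m in modalities.values()}
    if len(lengths) != 1:
        raise ValueError("all modalities must cover the same time steps")
    blocks = [standardize_columns(modalities[name]) for name in sorted(modalities)]
    steps = lengths.pop()
    return [[v for block in blocks for v in block[t]] for t in range(steps)]


# Illustrative values only: two time steps, an "energy" modality with two features
# (load, a constant status flag) and an "environment" modality with one feature.
fused = early_fusion({
    "environment": [[20.0], [22.0]],
    "energy": [[100.0, 1.0], [300.0, 1.0]],
})
print(fused)
```

Z-scoring per feature over time keeps features with very different units (kilowatts, degrees, prices) on a comparable scale before they are concatenated into one vector per time step.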
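The per-agent quantities of claim 1 (local observation $o_i\subseteq s$, the five-term reward $r_i$, and the discounted objective $J(\pi_i)$) can be illustrated with a short Python sketch. It assumes the reward terms are summed without weights and estimates $J(\pi_i)$ from a single finite trajectory, which is a Monte Carlo approximation rather than the exact expectation. The state keys and function names are hypothetical.

```python
from typing import Dict, Iterable, Sequence

# Hypothetical keys for the global state s of claim 1: ambient temperature T, humidity H,
# equipment state E, occupant activity P, and electricity / carbon / green-certificate
# market information.
GLOBAL_KEYS = ("T", "H", "E", "P", "M_e", "M_c", "M_g")


def local_observation(s: Dict[str, float], keys: Iterable[str]) -> Dict[str, float]:
    """Project the global state s onto agent i's local observation o_i (a subset of s)."""
    keys = tuple(keys)
    unknown = set(keys) - set(GLOBAL_KEYS)
    if unknown:
        raise KeyError(f"not part of the global state: {sorted(unknown)}")
    return {k: s[k] for k in keys}


def agent_reward(r_energy: float, r_comfort: float, r_dr: float,
                 r_carbon: float, r_green: float) -> float:
    """r_i as an (assumed unweighted) sum of energy, comfort, demand-response,
    carbon-market and green-certificate rewards."""
    return r_energy + r_comfort + r_dr + r_carbon + r_green


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Single-trajectory estimate of sum_t gamma^t * r_t, used to approximate J(pi_i)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Illustrative state; an HVAC-oriented agent might observe only a few of its entries.
s = {"T": 26.0, "H": 0.55, "E": 1.0, "P": 0.7, "M_e": 0.8, "M_c": 60.0, "M_g": 30.0}
o_hvac = local_observation(s, ("T", "H", "P", "M_e"))
```

With $\gamma_i$ close to 1 the agent weights later market settlements almost as heavily as immediate comfort or energy terms; a smaller $\gamma_i$ makes it favor immediate rewards.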
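The critic-side quantities of claim 3 (the global action space as a Cartesian product, the weighted global reward $R=\sum_i w_i r_i$, the target value $y_i$ and the mean squared error $L(\phi_i)$) are sketched below for finite, discretized action sets and scalar Q estimates. The episode-termination flag in `td_target` is a common practice added here as an assumption; the claim does not mention it. All names are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple


def global_action_space(action_spaces: Sequence[Sequence]) -> List[Tuple]:
    """A = A_1 x A_2 x ... x A_N for finite (discretized) per-agent action sets."""
    return list(itertools.product(*action_spaces))


def global_reward(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """R = sum_i w_i * r_i over all agents."""
    if len(rewards) != len(weights):
        raise ValueError("one weight per agent is required")
    return sum(w * r for w, r in zip(weights, rewards))


def td_target(r_i: float, gamma_i: float, q_next: float, done: bool = False) -> float:
    """y_i = r_i + gamma_i * Q_i(s', a'; phi_i'); no bootstrap after a terminal step (assumption)."""
    return r_i if done else r_i + gamma_i * q_next


def mse_loss(q_values: Sequence[float], targets: Sequence[float]) -> float:
    """Batch estimate of L(phi_i) = E[(Q_i(s, a; phi_i) - y_i)^2]."""
    if len(q_values) != len(targets) or not q_values:
        raise ValueError("need equally sized, non-empty batches")
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


# Two illustrative agents: a discretized setpoint change and a buy/hold/sell market action.
joint_actions = global_action_space([(-1, 0, 1), ("buy", "hold", "sell")])
```

The size of the global action space grows multiplicatively with the number of agents (here 3 x 3 = 9), which is one reason centralized critics are paired with decentralized actors during execution.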
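The actor updates in claims 3 and 5 have the deterministic policy gradient form $\nabla_{\theta}J=\mathbb{E}\left[\nabla_{\theta}\mu(o)\,\nabla_{a}Q(s,a)|_{a=\mu(o)}\right]$. The toy sketch below shows this chain rule for a hypothetical linear policy $\mu(o;\theta)=\theta^{\top}o$ and a fixed, hand-written critic $Q(s,a)=-(a-a^{\ast})^2$ whose best action $a^{\ast}$ is known. In the method itself the critic would be the learned network $Q_i(s,\mathbf{a};\phi_i)$, which this example does not model.

```python
from typing import List, Sequence, Tuple


def policy(theta: Sequence[float], obs: Sequence[float]) -> float:
    """Hypothetical linear deterministic policy mu(o; theta) = theta . o (scalar action)."""
    return sum(t * o for t, o in zip(theta, obs))


def dq_da(action: float, best_action: float) -> float:
    """dQ/da for the illustrative critic Q(s, a) = -(a - best_action)^2."""
    return -2.0 * (action - best_action)


def actor_step(theta: Sequence[float],
               batch: Sequence[Tuple[Sequence[float], float]],
               alpha: float) -> List[float]:
    """One ascent step theta <- theta + alpha * grad_theta J, where
    grad_theta J ~= batch mean of grad_theta mu(o) * dQ/da evaluated at a = mu(o),
    and grad_theta mu(o) = o for a linear policy."""
    grad = [0.0] * len(theta)
    for obs, best_action in batch:
        g = dq_da(policy(theta, obs), best_action)
        for j, o in enumerate(obs):
            grad[j] += o * g / len(batch)
    return [t + alpha * gj for t, gj in zip(theta, grad)]


# Each sample pairs an observation with the action the toy critic rewards most.
batch = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
theta = [0.0, 0.0]
for _ in range(200):
    theta = actor_step(theta, batch, alpha=0.1)
print(theta)  # approaches [2.0, -1.0]
```

Because the gradient flows through $\nabla_a Q$, the actor never needs the reward directly; it only follows the critic, which is why critic quality dominates the quality of the learned dispatch policy.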
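A candidate day-ahead schedule can be screened against the four lower-level constraint families of claim 5. The sketch below assumes specific forms for them: power balance as purchases minus sales equal the net traded amount at each $t$, a cumulative carbon cap $\sum_t E_{b,t}\le E^{\mathrm{quota}}_b$, per-period green certificate trades bounded by renewable generation, and box limits on device output. The function and argument names are hypothetical, and the claim does not specify how violations are handled.

```python
from typing import Dict, List, Sequence, Tuple


def check_schedule(
    p_buy: Sequence[float],
    p_sell: Sequence[float],
    p_net: Sequence[float],
    emissions: Sequence[float],
    carbon_quota: float,
    green_traded: Sequence[float],
    re_generation: Sequence[float],
    device_output: Dict[str, Sequence[float]],
    device_limits: Dict[str, Tuple[float, float]],
    tol: float = 1e-9,
) -> List[str]:
    """Return human-readable violations of the (assumed) lower-level constraints."""
    horizon = len(p_buy)
    if any(len(seq) != horizon
           for seq in (p_sell, p_net, emissions, green_traded, re_generation)):
        raise ValueError("all per-period series must share the same horizon")
    violations = []
    # Assumed balance form: purchases minus sales equal the net traded amount.
    for t, (buy, sell, net) in enumerate(zip(p_buy, p_sell, p_net)):
        if abs(buy - sell - net) > tol:
            violations.append(f"power balance violated at t={t}")
    # Assumed cumulative carbon cap over the whole horizon.
    if sum(emissions) > carbon_quota + tol:
        violations.append("carbon quota exceeded")
    # Green certificates traded cannot exceed renewable generation in the same period.
    for t, (g, re_gen) in enumerate(zip(green_traded, re_generation)):
        if g > re_gen + tol:
            violations.append(f"green certificates exceed renewable output at t={t}")
    # Box limits P_min <= P_{i,t} <= P_max for each device.
    for name, series in device_output.items():
        lo, hi = device_limits[name]
        for t, p in enumerate(series):
            if not lo - tol <= p <= hi + tol:
                violations.append(f"{name} output out of bounds at t={t}")
    return violations


# Illustrative two-period schedule that breaks the carbon cap and a storage limit.
violations = check_schedule(
    p_buy=[5.0, 3.0], p_sell=[1.0, 0.0], p_net=[4.0, 3.0],
    emissions=[2.0, 2.0], carbon_quota=3.0,
    green_traded=[1.0, 0.5], re_generation=[1.0, 2.0],
    device_output={"storage": [0.0, 4.0]},
    device_limits={"storage": (-3.0, 3.0)},
)
print(violations)
```

A feasibility screen like this can sit between the lower-layer agents' proposed actions and the equipment controllers, so that a learned policy never dispatches a schedule that violates hard limits.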
Description
Large public building demand response intelligent regulation and control method based on multi-mode reinforcement learning intelligent agent system
Technical Field
The invention belongs to the intersection of large public building energy systems and artificial intelligence, and specifically relates to an intelligent demand response regulation and control method for large public buildings based on a multi-modal reinforcement learning agent system.
Background
Energy systems in large buildings suffer from high energy consumption, a low level of automation, limited intelligent control and complex subsystem structures, and demand response technology is becoming one of the key means of making large public building energy use green, low-carbon and intelligent. Building demand response can adjust a large public building's energy consumption during peak electricity demand, improve energy efficiency through intelligent regulation and control, reduce operating costs and save energy. However, conventional building energy management relies mainly on rule-based static control strategies, which struggle to cope with real-time fluctuations in energy markets and with the complex, changing operating environment inside the building. In addition, with the rapid development of the electricity market, the carbon trading market and the green certificate market, large public buildings need to optimize their demand response across multiple markets under multiple objectives (such as economics, environmental benefit and user comfort). In recent years, artificial intelligence, particularly reinforcement learning, has opened a new path for the intelligent optimization of complex systems.
Reinforcement learning can adaptively optimize control strategies through interactive learning, and multi-modal reinforcement learning further incorporates the multi-source heterogeneous data in a building (such as environmental state, equipment operating state and market price dynamics), offering an efficient approach to demand response in large public buildings. Moreover, multi-agent collaborative reinforcement learning can jointly optimize global and local objectives across the layers of a building (such as the market layer and the equipment layer). This approach can dynamically regulate the building's internal equipment (such as air conditioning, heating and lighting systems) while supporting the building's participation in electricity market, carbon market and green certificate market trading, maximizing energy efficiency, minimizing operating costs and improving environmental benefits. Given these challenges, an urgent open problem is how to use artificial intelligence to cooperatively optimize the operation and maintenance control strategy of the integrated energy system of a large building, so as to improve its operation and maintenance efficiency, safety and reliability and achieve green energy savings, while ensuring safe normal operation and occupant comfort across scenarios.
Disclosure of Invention
The invention provides an intelligent demand response regulation and control method for large public buildings based on a multi-modal reinforcement learning agent system, which aims to overcome the limitations of traditional methods. By introducing multi-modal data processing and multi-agent cooperation mechanisms, it builds an agent system that can adapt to the dynamic electricity-carbon-green certificate markets and complex operating environments, thereby providing an efficient, intelligent solution for the participation of large public buildings in demand response in these markets. To achieve the above purpose, the invention provides the following technical solution: a large public building demand response intelligent regulation and control method based on a multi-modal reinforcement learning agent system, comprising the following steps: S1, determining the demand response regulation range in which the large public building participates and identifying its key adjustable equipment according to the building's operation and maintenance characteristics, and determining the types and respective functions of the relevant agents according to this regulation range and the key adjustable equipment, to obtain a multi-modal reinforcement learning agent system; S2, realizing multi-modal data sensing and acq