CN-122021814-A - Self-adaptive interpretable energy scheduling method based on linear genetic programming and related equipment


Abstract

The embodiments of the application provide a self-adaptive interpretable energy scheduling method based on linear genetic programming (LGP) and related equipment, belonging to the technical field of energy management and evolutionary computation. The method comprises: constructing an energy system environment model and initializing an LGP population, each individual being encoded as a linear instruction sequence; at each decision time, combining the system state and each candidate action into an enhanced input, reusing intermediate results through calculation registers so that the coupling relationship between actions is learned adaptively, transparently outputting the priority of each action and selecting the optimal action; evaluating fitness to drive population evolution; searching through multiple genetic operations; and finally decoding the individual with the best fitness into explicit, human-readable scheduling heuristic rules. The application makes the scheduling strategy inherently interpretable, adapts to scheduling scenarios with different coupling strengths, and markedly improves the practicability and deployability of the method while maintaining high performance.

Inventors

QI RUI
JIA YAHUI
DONG HAOBO
LIN ZHENHONG
JIANG HUAIGUANG
CHEN WEINENG

Assignees

South China University of Technology

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-15

Claims (10)

1. An adaptive interpretable energy scheduling method based on linear genetic programming, the method comprising the following steps:
Step 1, constructing a dynamic energy management system environment model, defining a state space, an action space and an optimization objective, and initializing a linear genetic programming (LGP) population, wherein the population comprises a plurality of individuals and each individual is encoded as a linear instruction sequence;
Step 2, at each decision time, adopting an LGP-based discrete action mapping strategy: combining the system state and each candidate discrete action into an enhanced input, using calculation registers to store and reuse intermediate calculation results, calculating the output value of each candidate action as its priority, and selecting the action with the highest priority as the action actually executed;
Step 3, evaluating fitness: applying the selected actions to the dynamic energy management system environment, calculating the operating cost or revenue of the system, and taking the average objective value over all training scenarios as the fitness of the individual;
Step 4, judging whether the termination condition is met; if so, executing step 8, otherwise executing step 5;
Step 5, applying an elite retention strategy to the population according to fitness, and selecting some of the best individuals to enter the next generation directly;
Step 6, performing crossover and mutation: applying tournament selection to the individuals in the population to select parent individuals for genetic operations, wherein the genetic operations comprise a two-point crossover operation and a multi-faceted mutation operation, and generating offspring individuals;
Step 7, updating the population with the elite individuals and the newly generated offspring individuals, and repeating steps 2 to 7;
Step 8, outputting the individual with the best fitness as the optimal energy scheduling strategy, and converting the linear instruction sequence of that individual into explicit, human-readable scheduling heuristic rules for real-time scheduling of an actual energy system.
2. The method of claim 1, wherein initializing the LGP population in step 1 comprises: initializing a population of a given population size, each individual being initialized as a linear instruction sequence of a given program length; each instruction is represented as a tuple comprising: an operator selected from a predefined function set; source register indices; and a target register index; the registers comprise three sets: an input register set for storing environment state variables; a constant register set for storing predefined constants; and a calculation register set for storing intermediate calculation results and the final output, wherein contextual semantic information is passed on by reusing the calculation registers.
3. The method according to claim 1, wherein storing and reusing intermediate calculation results using calculation registers in step 2 comprises: for a scenario with strongly coupled actions, the evolved instruction sequence coordinates the outputs of different actions by reusing intermediate values held in the same calculation register; for a scenario with weakly coupled actions, the evolved instruction sequence forms structurally separate calculation paths within the same program, so that different actions are calculated separately; the coupling relationship between actions is thereby learned adaptively, without presetting a single-agent or multi-agent architecture.
4. The method of claim 1, wherein the LGP-based discrete action mapping strategy in step 2 comprises: for each candidate discrete action, appending the action to the current state to form an enhanced state; inputting the enhanced state into the LGP program for execution, and taking the scalar value of a designated output register as the priority score of the action; and selecting the action with the highest priority score as the action actually executed.
5. The method according to claim 1, wherein evaluating the fitness in step 3 comprises: randomly selecting a number of dynamic energy management system scenarios from a preset training set as training scenarios; executing the scheduling strategy corresponding to the current individual in each scenario and calculating the accumulated objective function value; and taking the average objective function value over all training scenarios as the fitness of the individual.
6. The method of claim 1, wherein the multi-faceted mutation operation of step 6 comprises: randomly selecting some of the instructions in the individual, and randomly applying one of the following three mutation types to each selected instruction: 1) operator mutation, namely replacing the operator of the instruction with a new operator from the function set; 2) source mutation, namely randomly changing a source register index of the instruction; 3) target mutation, namely randomly changing the target register index of the instruction.
7. The method of claim 6, wherein in the operator mutation, if the new operator requires a different number of parameters, the source registers are randomly regenerated from the pool of all registers.
8. An electronic device comprising a memory storing a computer program and a processor implementing the method of any of claims 1 to 7 when the computer program is executed by the processor.
9. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
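
As a non-authoritative illustration of the register-based representation in claim 2 and the discrete action mapping in claim 4, the following Python sketch executes a linear instruction sequence on an enhanced state and picks the action with the highest priority score. Everything named here is an assumption made for illustration and does not come from the application: the function set `FUNCTIONS`, the two-source instruction layout `(operator, source 1, source 2, target)`, the choice of calculation register 0 as the output register, the protected division, and the toy battery example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical function set (assumption); division is protected against zero.
FUNCTIONS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}

# Assumed instruction layout: (operator, source 1, source 2, target).
Instruction = Tuple[str, int, int, int]


def run_program(program: Sequence[Instruction], inputs: Sequence[float],
                constants: Sequence[float], n_calc: int = 4) -> float:
    """Execute the instructions in order and return calculation register 0."""
    calc = [0.0] * n_calc
    for op, s1, s2, dst in program:
        # Sources index the pool [inputs | constants | calc]; targets index calc only.
        pool = list(inputs) + list(constants) + calc
        calc[dst] = FUNCTIONS[op](pool[s1], pool[s2])
    return calc[0]


def select_action(program: Sequence[Instruction], state: Sequence[float],
                  actions: Sequence[float], constants: Sequence[float]):
    """Score every candidate action on the enhanced state and return the best one."""
    scores = [run_program(program, list(state) + [a], constants) for a in actions]
    best = max(range(len(actions)), key=scores.__getitem__)
    return actions[best], scores


def decode(program: Sequence[Instruction], input_names: Sequence[str],
           constants: Sequence[float]) -> List[str]:
    """Render each instruction as a readable assignment."""
    n_in, n_const = len(input_names), len(constants)

    def name(i: int) -> str:
        if i < n_in:
            return input_names[i]
        if i < n_in + n_const:
            return str(constants[i - n_in])
        return f"r{i - n_in - n_const}"

    return [f"r{dst} = {op}({name(s1)}, {name(s2)})" for op, s1, s2, dst in program]


# Toy example (assumption): state = [price, state of charge], action in {-1, 0, 1}
# meaning discharge, idle, charge. Pool indices: 0 price, 1 soc, 2 action,
# 3 constant 0.5, 4..7 calculation registers r0..r3.
PROGRAM: List[Instruction] = [
    ("sub", 3, 0, 1),  # r1 = 0.5 - price
    ("mul", 5, 2, 0),  # r0 = r1 * action  (r1 is reused as an intermediate)
]

if __name__ == "__main__":
    action, _ = select_action(PROGRAM, [0.8, 0.5], [-1.0, 0.0, 1.0], [0.5])
    print("chosen action at high price:", action)
    print(decode(PROGRAM, ["price", "soc", "action"], [0.5]))
```

In this toy program the priority score is `(0.5 - price) * action`, so a high price favors discharging and a low price favors charging; `decode` prints the same logic as the readable lines `r1 = sub(0.5, price)` and `r0 = mul(r1, action)`.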

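The genetic operations named in steps 5 to 7 of claim 1 and in claim 6 (elite retention, tournament selection, two-point crossover, multi-faceted mutation) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the application's implementation: fitness is treated as a cost to be minimized, parents have equal length, every operator takes two sources (so the source regeneration of claim 7 never triggers), and the names `OPERATORS`, `N_SOURCES`, `N_TARGETS`, the tournament size `k` and the per-instruction mutation `rate` are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Instruction = Tuple[str, int, int, int]    # assumed layout: (operator, src 1, src 2, target)
OPERATORS = ["add", "sub", "mul", "div"]   # hypothetical function set
N_SOURCES = 8                              # hypothetical size of the readable register pool
N_TARGETS = 4                              # hypothetical number of calculation registers


def tournament(pop: Sequence[list], fitness: Sequence[float], k: int,
               rng: random.Random) -> list:
    """Draw k distinct individuals and return the one with the lowest cost."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: fitness[i])]


def two_point_crossover(a: list, b: list, rng: random.Random):
    """Swap the segment between two cut points; assumes equal-length parents."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(prog: Sequence[Instruction], rate: float,
           rng: random.Random) -> List[Instruction]:
    """Apply one of three mutation types to each instruction picked with probability rate."""
    out = []
    for op, s1, s2, dst in prog:
        if rng.random() < rate:
            kind = rng.choice(["operator", "source", "target"])
            if kind == "operator":
                op = rng.choice(OPERATORS)
            elif kind == "source":
                if rng.random() < 0.5:
                    s1 = rng.randrange(N_SOURCES)
                else:
                    s2 = rng.randrange(N_SOURCES)
            else:
                dst = rng.randrange(N_TARGETS)
        out.append((op, s1, s2, dst))
    return out


def next_generation(pop: Sequence[list], fitness: Sequence[float], n_elite: int,
                    rng: random.Random, k: int = 3, rate: float = 0.2) -> List[list]:
    """Keep the n_elite best individuals, then fill up with mutated crossover offspring."""
    order = sorted(range(len(pop)), key=lambda i: fitness[i])
    new_pop = [pop[i] for i in order[:n_elite]]
    while len(new_pop) < len(pop):
        p1 = tournament(pop, fitness, k, rng)
        p2 = tournament(pop, fitness, k, rng)
        for child in two_point_crossover(p1, p2, rng):
            if len(new_pop) < len(pop):
                new_pop.append(mutate(child, rate, rng))
    return new_pop
```

If the objective is revenue rather than cost, the same sketch applies with the sign of the fitness flipped. Passing an explicit `random.Random` instance keeps runs reproducible.
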
Description

Self-adaptive interpretable energy scheduling method based on linear genetic programming and related equipment

Technical Field

The application relates to the technical field of energy management and evolutionary computation, and in particular to a self-adaptive interpretable energy scheduling method based on linear genetic programming (LGP) and related equipment.

Background

With the large-scale integration of renewable energy sources, the volatility and uncertainty of energy systems are becoming increasingly prominent. Traditional model-based optimization methods (e.g., mixed integer programming, stochastic optimization) rely heavily on accurate system models and forecast data, and are difficult to apply in dynamic, nonlinear environments. In recent years, deep reinforcement learning (DRL), a model-free approach, has made significant progress in energy scheduling. However, DRL has two prominent defects. First, the strategy is usually represented by a deep neural network, so the decision process is opaque; such a "black box" model is difficult to trust and deploy in safety-sensitive practical systems. Second, when scheduling problems with different action coupling characteristics are encountered, there is no clear guiding principle for choosing between a single-agent and a multi-agent architecture, so the algorithm may not match the problem structure, which degrades scheduling performance. Prior research has also attempted to use heuristic rules for energy scheduling, but the design of such rules depends on expert experience and adapts poorly to complex, changing dynamic environments. Therefore, a new energy scheduling method is needed that achieves high performance, provides transparent decision logic, and adapts to the coupling relationship between actions.
Disclosure of Invention

The main purpose of the embodiments of the application is to provide a self-adaptive interpretable energy scheduling method based on linear genetic programming and related equipment, so that the decision process is inherently interpretable while scheduling performance is maintained, and different action coupling relationships are handled adaptively. To achieve the above purpose, one aspect of the embodiments of the application provides an adaptive interpretable energy scheduling method based on linear genetic programming, the method comprising:
Step 1, constructing a dynamic energy management system environment model, defining a state space, an action space and an optimization objective, and initializing a linear genetic programming (LGP) population, wherein the population comprises a plurality of individuals and each individual is encoded as a linear instruction sequence;
Step 2, at each decision time, adopting an LGP-based discrete action mapping strategy: combining the system state and each candidate discrete action into an enhanced input, using calculation registers to store and reuse intermediate calculation results (semantic information), calculating the output value of each candidate action as its priority, and selecting the action with the highest priority as the action actually executed;
Step 3, evaluating fitness: applying the selected actions to the dynamic energy management system environment, calculating the operating cost or revenue of the system, and taking the average objective value over all training scenarios as the fitness of the individual;
Step 4, judging whether the termination condition is met; if so, executing step 8, otherwise executing step 5;
Step 5, applying an elite retention strategy to the population according to fitness, and selecting some of the best individuals to enter the next generation directly;
Step 6, performing crossover and mutation: applying tournament selection to the individuals in the population to select parent individuals for genetic operations, wherein the genetic operations comprise a two-point crossover operation and a multi-faceted mutation operation (including operator, source register and target register mutation), and generating offspring individuals;
Step 7, updating the population with the elite individuals and the newly generated offspring individuals, and repeating steps 2 to 7;
Step 8, outputting the individual with the best fitness as the optimal energy scheduling strategy, and converting the linear instruction sequence of that individual into explicit, human-readable scheduling heuristic rules for real-time scheduling of an actual energy system.
In some embodiments, initializing the LGP population in step 1 comprises: initializing a population of a given population size, each individual being initialized as a linear instruction sequence of a given program length; each instruction is represented as a tuple comprising: an operator selected from a predefined function set; source register indices; and a target register index; the registers comprise three sets: an input register set for s