
CN-122009927-A - Multi-elevator intelligent scheduling method and device based on heterogeneous graph and PPO algorithm

CN122009927A

Abstract

The invention discloses a multi-elevator intelligent scheduling method and device based on a heterogeneous graph and the PPO algorithm. The method comprises: constructing an event-driven simulation environment for training an elevator scheduling policy; constructing, at each scheduling step, a heterogeneous graph and a feature vector of the request to be allocated; inputting the heterogeneous graph and the request feature vector into a policy-value network of the PPO algorithm, which outputs an action probability distribution and a state value estimate; sampling from the discrete action space according to the action probability distribution, allocating the request to a candidate elevator, and receiving a reward signal fed back by the environment; collecting experience data and performing multiple mini-batch gradient updates on the policy-value network, thereby optimizing the elevator scheduling policy end to end; and deploying the resulting policy in a real elevator scheduling system to achieve efficient intelligent group control. The invention uses dynamic heterogeneous graph modeling to enhance the state representation and combines it with the PPO algorithm to achieve stable policy learning, thereby effectively reducing passenger waiting time.
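The graph construction named in the abstract can be illustrated with a minimal sketch. It assumes the three edge categories listed in claim 5 (inter-floor connection edges, elevator-floor interaction edges, self-loop edges). The class name, relation names ("adjacent", "serves", "at", "self"), and the single normalized-position feature are hypothetical choices made for illustration, not the feature set of the invention.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Typed nodes and typed directed edges (all names are illustrative)."""
    # node type -> list of per-node feature lists
    node_features: dict = field(default_factory=dict)
    # (source type, relation, destination type) -> list of (src, dst) index pairs
    edges: dict = field(default_factory=dict)

    def add_edge(self, src_type, relation, dst_type, src, dst):
        key = (src_type, relation, dst_type)
        self.edges.setdefault(key, []).append((src, dst))


def build_graph(elevator_floors, num_floors, assignments):
    """Build one graph snapshot for a scheduling step.

    elevator_floors: current floor index of each elevator.
    assignments: (elevator index, floor index) pairs for allocated requests.
    """
    g = HeteroGraph()
    scale = max(num_floors - 1, 1)
    # Assumed feature: normalized vertical position only.
    g.node_features["elevator"] = [[f / scale] for f in elevator_floors]
    g.node_features["floor"] = [[i / scale] for i in range(num_floors)]

    # Inter-floor connection edges: adjacent floors, in both directions.
    for i in range(num_floors - 1):
        g.add_edge("floor", "adjacent", "floor", i, i + 1)
        g.add_edge("floor", "adjacent", "floor", i + 1, i)

    # Elevator-floor interaction edges: allocated requests and current position.
    for e, f in assignments:
        g.add_edge("elevator", "serves", "floor", e, f)
    for e, f in enumerate(elevator_floors):
        g.add_edge("elevator", "at", "floor", e, f)

    # Self-loop edges keep each node's own state available to message passing.
    for e in range(len(elevator_floors)):
        g.add_edge("elevator", "self", "elevator", e, e)
    for i in range(num_floors):
        g.add_edge("floor", "self", "floor", i, i)
    return g


graph = build_graph(elevator_floors=[0, 3], num_floors=5, assignments=[(1, 4)])
```

Keying the edge lists by (source type, relation, destination type) lets a downstream graph neural network apply different weights per node type and edge type, and the graph can be rebuilt at every scheduling step as elevator positions and allocations change.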

Inventors

  • LIANG LIHUA
  • ZHOU YANG
  • CHEN DONGDONG
  • LIU WANBING
  • QIAN ZHIXUE
  • ZHANG XIAODONG
  • WANG HENGLIN
  • HOU SHUAIHAO

Assignees

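  • Zhejiang University of Technology (浙江工业大学)
  • Hengda Fuji Elevator Co., Ltd. (恒达富士电梯有限公司)
  • Zhejiang Academy of Special Equipment Science (浙江省特种设备科学研究院)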

Dates

Publication Date
2026-05-12
Application Date
2025-12-04

Claims (10)

  1. A multi-elevator intelligent scheduling method based on a heterogeneous graph and the PPO algorithm, characterized by comprising the following steps: S1, constructing an event-driven simulation environment for simulating passenger arrival, multi-elevator operation, and the request allocation process; S2, constructing a heterogeneous graph with elevator nodes and floor nodes as entities and the semantic relations among the entities as edges; S3, inputting the heterogeneous graph and the feature vector of the request to be allocated into a policy-value network of the PPO algorithm, and outputting an action probability distribution and a state value estimate; S4, sampling from a discrete action space according to the action probability distribution, allocating the request to be allocated to a candidate elevator, and receiving a reward signal fed back by the environment; S5, executing S2 to S4 in a loop and, after the experience data of one trajectory has been collected, performing gradient updates on the policy-value network based on generalized advantage estimation and the PPO surrogate objective function; S6, executing S2 to S5 in a loop during the simulation run to optimize the elevator scheduling policy end to end, and deploying the resulting policy in a real elevator scheduling system to achieve intelligent group control of multiple elevators.
  2. The multi-elevator intelligent scheduling method based on a heterogeneous graph and the PPO algorithm according to claim 1, wherein the event types of the simulation environment in step S1 include a passenger generation event, an elevator stop event, an elevator door opening event, an elevator door closing event, and a request allocation event.
  3. The multi-elevator intelligent scheduling method based on a heterogeneous graph and the PPO algorithm according to claim 2, wherein the request allocation event is used for allocating a service elevator to a request awaiting allocation; each request allocation event corresponds to one scheduling decision and is defined as a scheduling step that drives the environment state transition; and when the duration of the scheduling step exceeds the minimum allocation interval, a request allocation event is triggered.
  4. The multi-elevator intelligent scheduling method based on a heterogeneous graph and the PPO algorithm according to claim 3, characterized in that already-allocated ride requests are periodically reallocated according to a maximum allocation interval: when the duration of the scheduling step exceeds the maximum allocation interval, a request allocation event is triggered, the request that has waited longest for reallocation is selected for reallocation, and the maximum allocation interval is dynamically adjusted according to the number of currently outstanding requests.
  5. The multi-elevator intelligent scheduling method based on a heterogeneous graph and the PPO algorithm according to claim 1, wherein in step S3 the edges in the heterogeneous graph are directed edges that model the directional interactions between different entities in the elevator system, and include inter-floor connection edges, elevator-floor interaction edges, and self-loop edges; the inter-floor connection edges represent the physical reachability between floors; the elevator-floor interaction edges represent the task allocation and state associations between elevators and floors; and the self-loop edges preserve the state information of each node itself.
  6. The multi-elevator intelligent scheduling method based on a heterogeneous graph and the PPO algorithm according to claim 1, wherein in step S3 the policy-value network consists of a feature extraction network, a policy network, and a value network; the feature extraction network generates a shared comprehensive state representation, which the policy network and the value network each receive as input; the policy network outputs an action probability distribution representing the probability that each candidate elevator is allocated the request; and the value network outputs a state value estimate representing the expected cumulative return of the current system state.
  7. The multi-elevator intelligent scheduling method based on a heterogeneous graph and the PPO algorithm, characterized in that the feature extraction network comprises a feature embedding module, a graph neural network module, a cross-attention module, a global attention pooling module, and a feature transformation and enhancement module, wherein: the feature embedding module embeds the elevator nodes and floor nodes of the heterogeneous graph and the features of the request currently to be allocated; the embedded features and the edges of the heterogeneous graph are input into the graph neural network module, which assigns differentiated attention weights according to node type and edge type and extracts a context-aware embedding of each node; the cross-attention module generates enhanced request features; the global attention pooling module aggregates the context-aware embeddings of the elevator nodes and of the floor nodes separately, generating a graph-level representation of the elevator nodes and a graph-level representation of the floor nodes; and the graph-level representation of the elevator nodes, the graph-level representation of the floor nodes, and the enhanced request features are concatenated into a comprehensive state vector, which is input into the feature transformation and enhancement module to obtain the final shared comprehensive state representation.
  8. The multi-elevator intelligent scheduling method based on a heterogeneous graph and the PPO algorithm according to claim 1, wherein in step S4 the discrete action space consists of all elevators able to participate in scheduling, and each action represents a scheduling decision that allocates the passenger request currently to be allocated to a specific elevator; the reward signal serves as a feedback index for evaluating the quality of the scheduling behavior and is a weighted linear combination of performance indexes in multiple dimensions, specifically comprising a passenger arrival reward, a passenger waiting time penalty, a passenger waiting timeout penalty, an elevator energy consumption penalty, an elevator overload penalty, and an allocation change penalty.
  9. The multi-elevator intelligent scheduling method based on a heterogeneous graph and the PPO algorithm according to claim 1, wherein in step S5 the trajectory is a complete experience sequence formed by the interaction of the policy with the environment over a plurality of consecutive scheduling steps; the generalized advantage estimation computes advantage values from the multi-step reward signals and the state value estimates; the advantage values quantify how much better or worse the executed actions are in relative terms and serve as input to the PPO surrogate objective function to drive the optimization update of the policy network; and the surrogate objective function is constructed from the action probability ratio between the current policy and the old policy and limits the update magnitude through a clipping mechanism.
  10. A multi-elevator intelligent scheduling device based on a heterogeneous graph and the PPO algorithm, comprising a memory and one or more processors, wherein executable code is stored in the memory, characterized in that the one or more processors, when executing the executable code, implement the multi-elevator intelligent scheduling method based on a heterogeneous graph and the PPO algorithm according to any one of claims 1-9.
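
The update rule of claim 9 can be illustrated with a minimal sketch using the standard textbook forms of generalized advantage estimation and the clipped PPO surrogate objective. The function names and the default values of gamma, lam, and clip_eps are assumptions chosen for illustration; the claims do not specify hyperparameters, and a real implementation would compute gradients of this objective with an automatic differentiation library.

```python
import math


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory.

    values[t] is the value estimate V(s_t); last_value is V(s_T), used to
    bootstrap the final step. gamma and lam are assumed defaults.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of current and future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current policy and the old policy.
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any incentive to move the ratio
        # outside the interval [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With gamma and lam both set to 1 and zero value estimates, gae reduces to the undiscounted reward-to-go, which gives a quick sanity check. In the clipped objective, a positive advantage stops contributing once the ratio exceeds 1 + clip_eps, which is the mechanism that limits the update magnitude.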

Description

Multi-elevator intelligent scheduling method and device based on heterogeneous graph and PPO algorithm

Technical Field

The invention relates to the technical field of elevator scheduling, and in particular to a multi-elevator intelligent scheduling method and device based on a heterogeneous graph and the PPO algorithm.

Background

High-rise and super-high-rise buildings have many elevators and high passenger flow density, so the scheduling decisions of an elevator group control system must coordinate multiple elevators with floor requests. Traditional methods mostly adopt heuristic strategies based on fixed rules, such as shortest-waiting-time-first, nearest-elevator-first, or minimum-load-first. These are simple to implement and respond quickly, but in complex passenger flow scenarios they struggle to adjust the response strategy dynamically, which leads to excessive waiting times or a higher rate of empty elevator runs.

In recent years, artificial intelligence techniques such as reinforcement learning, fuzzy control, and ant colony algorithms have gradually been applied to elevator scheduling to improve the decision-making capability of the system. Reinforcement learning is a machine learning method that learns an optimal decision policy through interaction between an agent and an environment: at each time step the agent observes the current state of the environment and selects and executes an action according to its policy; the environment transitions to a new state and feeds back an immediate reward signal; and the agent optimizes its policy by maximizing the long-term cumulative reward, thereby achieving autonomous decision-making in complex tasks.
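
The agent-environment interaction described above can be illustrated with a minimal sketch. The toy environment, its reward (negative travel distance), and all names are hypothetical and serve only to show the observe, act, reward loop; the policy used here is the fixed nearest-elevator-first rule mentioned as a traditional heuristic, not a learned policy.

```python
import random


class ToyDispatchEnv:
    """Hypothetical toy environment: one hall call arrives per step."""

    def __init__(self, num_floors=10, num_elevators=3, seed=0):
        self.num_floors = num_floors
        self.rng = random.Random(seed)
        self.positions = [0] * num_elevators
        self.call = 0

    def reset(self):
        self.positions = [0] * len(self.positions)
        self.call = self.rng.randrange(self.num_floors)
        return (tuple(self.positions), self.call)

    def step(self, action):
        # Assumed reward: negative distance the chosen elevator must travel.
        reward = -abs(self.positions[action] - self.call)
        # The chosen elevator moves to the call floor; a new call arrives.
        self.positions[action] = self.call
        self.call = self.rng.randrange(self.num_floors)
        return (tuple(self.positions), self.call), reward


def nearest_elevator_policy(state):
    """Fixed-rule baseline: pick the elevator closest to the call floor."""
    positions, call = state
    return min(range(len(positions)), key=lambda i: abs(positions[i] - call))


def run_episode(env, policy, steps=20, gamma=0.99):
    """Roll out one episode and return the discounted cumulative reward."""
    state = env.reset()
    discounted_return = 0.0
    for t in range(steps):
        action = policy(state)             # the agent selects an action
        state, reward = env.step(action)   # the environment transitions
        discounted_return += (gamma ** t) * reward
    return discounted_return


episode_return = run_episode(ToyDispatchEnv(seed=0), nearest_elevator_policy)
```

A reinforcement learning method would replace nearest_elevator_policy with a parameterized policy and adjust its parameters to increase the discounted return over many such episodes.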
In the elevator scheduling scenario, the agent corresponds to the scheduling controller and the environment represents the entire elevator operation system. The state comprises information such as the position, running direction, and load state of each elevator and the passenger requests on each floor; an action corresponds to a response instruction for an elevator; and the reward function is usually tied to performance indexes such as waiting time, number of overloads, or energy consumption. Existing reinforcement learning methods mainly comprise value-function-based deep Q-networks and policy-gradient-based methods, such as the advantage actor-critic algorithm and the PPO algorithm. The PPO algorithm limits the update magnitude of the current policy relative to the old policy by introducing a surrogate objective function based on clipping of the policy ratio, supports multiple rounds of gradient updates on a single batch of experience data, and is suitable for complex decision tasks such as robot control and path planning.

In the related art, patent CN120503206A addresses the low training efficiency and difficult convergence of trajectory planning for a seven-degree-of-freedom redundant robotic arm: it adopts the PPO algorithm and designs a staged reward function that guides the arm to complete the target task in two stages. Patent CN120491653A proposes a dynamic obstacle avoidance method for unmanned vehicles based on the PPO algorithm, which captures the temporal motion characteristics of dynamic obstacles by fusing multi-modal environment information and uses a frame stacking technique to enhance the temporal consistency of the state representation. However, existing elevator scheduling methods based on the PPO algorithm still have limitations.
For example, patent CN120328279A constructs the state input from a basic state matrix and extended state quantities, but because it processes this input with a multi-layer perceptron, the structured data must still be flattened into a one-dimensional vector. As a result, the spatial adjacency and dynamic interaction patterns between floors and elevators cannot be captured effectively, and the method is difficult to adapt to the changes in input dimension that come with different building configurations.

In the field of intelligent scheduling, graph structures are often used to characterize structured relationships between entities, where nodes represent functional units in a system and edges represent connections or interactions between nodes. For example, patent CN120706799A proposes an optimal scheduling method for cascade reservoirs that constructs a graph model, extracts local association features with a graph neural network, and generates a scheduling policy in combination with the PPO algorithm, thereby effectively modeling the spatio-temporal dependencies of the system. According to the type complexity of their nodes and edges, graphs can be divided into homogeneous graphs and heterogeneous graphs, wherein all nodes and edges in a homogeneous graph are of the same type, which suits scenes with a single structure, and the heterogeneous graph