
CN-122022335-A - Park intelligent operation scheduling method and system based on reinforcement learning


Abstract

The invention discloses a park intelligent operation scheduling method and system based on reinforcement learning, belonging to the technical field of operation scheduling. The method first constructs an entity-spatio-temporal topological graph fusing equipment, personnel, and space nodes. A spatio-temporal graph convolutional network then fuses the time-series features of the equipment with its physical-connection features, enabling accurate cascade fault prediction. The core scheduling module adopts mask-enhanced deep reinforcement learning: a dynamically generated binary mask vector forcibly masks invalid "task-person" pairing actions caused by skill mismatch or person unavailability, greatly compressing the search space. At the same time, a context-aware dynamic reward function is introduced so that the scheduling policy can adapt to macroscopic environmental changes in the park. The method addresses slow convergence in heterogeneous resource scheduling, the lack of spatial correlation in fault prediction, and rigid scheduling policies, achieving real-time, accurate, and adaptive intelligent scheduling of park operations.

Inventors

  • Zhong Zicheng
  • Chen Jianwen

Assignees

  • 深圳市一应科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-01-30

Claims (9)

  1. A park intelligent operation scheduling method based on reinforcement learning, characterized by comprising the following steps: S100, constructing an entity-spatio-temporal topological graph G = (V, E, A) representing the physical space and logical relationships of the park, based on building information model (BIM) data, Internet of Things (IoT) sensor data, and personnel information data; wherein V denotes the node set, comprising equipment nodes, personnel nodes, and space nodes, E denotes the edge set, and A is the corresponding adjacency matrix; S200, inputting the time-series data of the equipment nodes in the entity-spatio-temporal topological graph G into a spatio-temporal graph convolutional network, fusing the temporal features of the equipment with the spatial topological features carried by the physical connection edges, and outputting a fault probability prediction vector for all equipment in the park over a future time window; S300, fusing the fault probability prediction vector with externally input work-order requests to generate the dynamic task set of the current scheduling cycle, and generating a binary action mask vector M according to the skill requirement of each task in the dynamic task set together with the real-time skill labels and availability states of the personnel nodes in the entity-spatio-temporal topological graph, the mask vector M identifying the invalid actions among all task-person pairing actions; S400, embedding the current state of the entity-spatio-temporal topological graph into a low-dimensional state vector s and inputting it to a reinforcement learning agent with an action-masking mechanism, said reinforcement learning agent incorporating a dynamic reward function R(c) conditioned on the environmental context c, using the action mask vector M to constrain the decision space, computing the decision value of all valid task-person pairing actions, selecting the action a* with the highest decision value, and outputting the optimal scheduling instruction; S500, instruction issuing and closed-loop feedback, namely issuing the optimal scheduling instruction, collecting the actual execution result data of the instruction, including the actual completion time and completion quality of the task, storing the execution result data together with the corresponding state s, action a, and obtained reward in an experience replay pool, and performing continuous online training and model updating of the reinforcement learning agent.
  2. The reinforcement-learning-based park intelligent operation scheduling method of claim 1, wherein S200 specifically comprises: S201, the spatio-temporal graph convolutional network adopts a dual-stream structure comprising a temporal stream and a spatial stream; S202, the temporal stream uses a gated recurrent unit (GRU) to process the time-series data of a single equipment node and extracts temporal dependency features h_t; S203, the spatial stream uses a graph convolutional network (GCN) which, based on the adjacency matrix A, aggregates the features of physically connected neighbor nodes and extracts spatial topology features h_s; S204, an attention mechanism performs weighted fusion of h_t and h_s to generate the final fault probability prediction.
  3. The reinforcement-learning-based park intelligent operation scheduling method of claim 1, wherein in S300 the rules for generating the action mask vector M include: for any action a_{p,t} that assigns person p to handle task t, the corresponding mask value M_{p,t} is set to 0 if either of the following conditions is satisfied, and to 1 otherwise: condition one, the skill label set of person p does not contain the core skill required by task t; condition two, person p is busy or unavailable.
  4. The reinforcement-learning-based park intelligent operation scheduling method of claim 1, wherein in S400 the dynamic reward function R(c) is defined as: R(c) = α(c)·P − β(c)·T − γ(c)·C; where P is the weighted sum of the priorities of the completed tasks, T is the average response time, C is the scheduling cost, and α(c), β(c), γ(c) are dynamic weight coefficients respectively associated with the environmental context c; the environmental context c includes one or more of a weather warning level, a park activity level, and an emergency public event level.
  5. The reinforcement-learning-based park intelligent operation scheduling method of claim 1, wherein in S400 the decision value Q_total is computed as: Q_total(s, a_{p,t}) = Q(s, a_{p,t}) + Sim(p, t)·(L_p / D_t)·e^{−λ·C_{p,t}} + I_{p,t}; where Q(s, a_{p,t}) denotes the raw Q value output by the reinforcement learning network for selecting action a_{p,t} in state s; Sim(p, t) is the cosine similarity between the skill vector of person p and the demand vector of task t, with value range [0, 1]; L_p is the current skill level of person p and D_t is the estimated complexity level of task t; λ is the spatio-temporal attenuation coefficient, a hyperparameter with value greater than 0; C_{p,t} is given by the spatial path cost d(p, t) from the current location of person p to the site of task t and the predicted processing time T_t of task t as: C_{p,t} = ω_1·d(p, t) + ω_2·T_t; where ω_1 and ω_2 are the weight coefficients of the spatial path cost and the predicted processing time, respectively; I_{p,t} is the scheduling inertia compensation term, which takes the value 1 if person p has recently successfully completed a task highly similar to task t, and 0 otherwise.
  6. The reinforcement-learning-based park intelligent operation scheduling method of claim 5, wherein, for the scheduling inertia compensation term I_{p,t} in the dynamic comprehensive decision value formula, the criterion for judging a highly similar task is that the degree of overlap between the new task t and a historical task in the three dimensions of equipment type, fault code, and required skill exceeds a preset threshold.
  7. The reinforcement-learning-based park intelligent operation scheduling method of claim 1, wherein, during the training phase, the target Q value used by the reinforcement learning agent to compute the loss is also corrected using the decision value.
  8. A park intelligent operation scheduling system based on reinforcement learning, configured to implement the reinforcement-learning-based park intelligent operation scheduling method of claim 1, characterized in that the system comprises: a multi-source heterogeneous data graph construction module, configured to access BIM data, IoT sensor data, and personnel data, and to construct and update the entity-spatio-temporal topological graph G in real time; a spatio-temporal topology fusion prediction module, comprising the spatio-temporal graph convolutional network and configured to output the predicted failure probability of the equipment; a mask-enhanced dynamic scheduling module, comprising the reinforcement learning agent with the action-masking mechanism, which integrates the dynamic reward function and the decision value formula and generates the optimal scheduling instruction; and an execution and closed-loop feedback module, configured to issue the optimal scheduling instruction, collect execution result data, and manage the experience replay pool so as to complete continuous training and updating of the reinforcement learning agent.
  9. The reinforcement-learning-based park intelligent operation scheduling system of claim 8, wherein, in the entity-spatio-temporal topological graph G constructed by the multi-source heterogeneous data graph construction module, the physical connection edges are determined according to the actual connection relationships among pipelines, circuits, and air ducts in the BIM.
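The masking rules of claim 3 can be sketched as follows. This is a minimal illustration, not the patent's implementation; the `Person` and `Task` records and their field names are assumptions introduced for the example:

```python
from dataclasses import dataclass

@dataclass
class Person:
    skills: set        # skill labels held by the person (assumed representation)
    available: bool    # False if busy or otherwise unavailable

@dataclass
class Task:
    core_skill: str    # the core skill the task requires

def action_mask(people, tasks):
    """Binary mask M[p][t]: 0 = invalid pairing, 1 = valid (claim 3)."""
    mask = [[0] * len(tasks) for _ in people]
    for i, p in enumerate(people):
        for j, t in enumerate(tasks):
            # Condition 1: person lacks the task's core skill -> masked (0).
            # Condition 2: person is busy or unavailable -> masked (0).
            if t.core_skill in p.skills and p.available:
                mask[i][j] = 1
    return mask
```

For example, an available electrician is a valid match only for tasks whose core skill is electrical work, and an unavailable person is masked for every task.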
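The context-conditioned reward of claim 4 amounts to swapping the weight triple (α, β, γ) according to the environmental context. The sketch below assumes the reconstructed form R(c) = α(c)·P − β(c)·T − γ(c)·C; the concrete context names and weight values are illustrative assumptions, not values from the patent:

```python
# Illustrative weight tables per context level: (alpha, beta, gamma).
# These numbers are assumptions chosen only to demonstrate the mechanism.
CONTEXT_WEIGHTS = {
    "normal":          (1.0, 0.1, 0.05),
    "weather_warning": (1.5, 0.3, 0.02),  # urgency weighted up, cost down
}

def dynamic_reward(context, priority_sum, avg_response_time, schedule_cost):
    """R(c) = alpha(c)*P - beta(c)*T - gamma(c)*C (claim 4, as reconstructed)."""
    alpha, beta, gamma = CONTEXT_WEIGHTS[context]
    return alpha * priority_sum - beta * avg_response_time - gamma * schedule_cost
```

Under a weather warning the same schedule earns a different reward, which is what lets the learned policy shift its priorities when the park context changes.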
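Claims 1 (S400) and 5 describe scoring every valid task-person pairing and picking the best one. The sketch below uses the decision-value structure as reconstructed above (raw Q value plus a skill-match term decayed by spatio-temporal cost, plus an inertia bonus); the exact functional form in the patent's figures is not reproduced here, so treat the formula as an assumption:

```python
import math

def decision_value(q_raw, sim, skill_level, complexity, lam,
                   path_cost, proc_time, w1, w2, inertia):
    """Q_total = Q + Sim * (L_p / D_t) * exp(-lam * C) + I, with
    C = w1 * d(p, t) + w2 * T_t (claim 5, reconstructed form)."""
    c = w1 * path_cost + w2 * proc_time          # combined spatio-temporal cost
    return q_raw + sim * (skill_level / complexity) * math.exp(-lam * c) + inertia

def select_action(q_values, mask):
    """Pick the highest-value action among entries with mask == 1 (S400)."""
    best, best_v = None, -math.inf
    for idx, (q, m) in enumerate(zip(q_values, mask)):
        if m == 1 and q > best_v:
            best, best_v = idx, q
    return best
```

Masked actions are simply never considered, which is how the mask compresses the search space without changing the network itself.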

Description

Park intelligent operation scheduling method and system based on reinforcement learning

Technical Field

The invention relates to the technical field of operation scheduling, and in particular to a park intelligent operation scheduling method and system based on reinforcement learning.

Background

Driven by the current global wave of digital transformation and the "dual-carbon" strategic targets, the intelligent park, as a core carrier of the intelligent transformation of urban space and of industrial upgrading, is undergoing a deep transformation from technology integration to ecosystem reconstruction. A park is not only a physical space where industry gathers, but also a key node for promoting high-quality economic development and realizing the green, low-carbon transition. However, as the number of enterprises and people in a park grows, unprecedented demands are placed on operational management efficiency, service quality, and resource integration, and traditional operation management models can no longer meet the complex needs of modern park development. Although the underlying technologies are mature, existing intelligent-park solutions still have systematic shortcomings in achieving efficient, intelligent collaborative scheduling. First, deep data fragmentation and failed fusion are common problems: subsystems such as security, energy consumption, and traffic often run independently, forming data islands that make global coordination difficult and prevent a comprehensive, in-depth description of the park's operating state. Second, in the key prediction and decision links, existing methods rely on preset static rules or simple statistical models, respond slowly, and cannot effectively cope with dynamic and sudden changes in the park.
Methods that partly introduce machine learning also focus on correlation analysis among data rather than on deeper causal reasoning, which affects decision accuracy. More importantly, existing scheduling strategies often adopt fixed optimization targets, lack elasticity, and cannot dynamically balance and intelligently switch among multiple conflicting objectives such as energy saving, safety, and efficiency according to the real-time situation. Together these defects lead to a low level of refinement and intelligence in resource allocation, and problems such as resource waste, low management efficiency, and untimely service response frequently occur.

Disclosure of the Invention

The invention aims to provide a park intelligent operation scheduling method and system based on reinforcement learning to solve the problems described in the background. To solve these technical problems, the invention provides a park intelligent operation scheduling method based on reinforcement learning, comprising the following steps. S100, constructing an entity-spatio-temporal topological graph G = (V, E, A) representing the physical space and logical relationships of the park, based on building information model (BIM) data, Internet of Things (IoT) sensor data, and personnel information data; wherein V denotes the node set, comprising equipment nodes, personnel nodes, and space nodes; E denotes the edge set, comprising physical connection edges between equipment, spatial adjacency edges between entities, and BIM-derived management attribute edges between personnel and equipment or space; and A is the corresponding adjacency matrix.
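The graph G = (V, E, A) from step S100 can be sketched as follows. The node and edge instances are illustrative assumptions; the patent only fixes the three node classes (equipment, person, space) and the three edge classes (physical, spatial, management):

```python
# Minimal sketch of the entity-spatio-temporal graph G = (V, E, A) from S100.
# All concrete names below (chiller_1, tech_zhang, ...) are hypothetical.
nodes = {
    "chiller_1":  "equipment",
    "pump_1":     "equipment",
    "room_101":   "space",
    "tech_zhang": "person",
}
edges = [
    ("chiller_1", "pump_1",    "physical"),    # pipe connection from BIM
    ("chiller_1", "room_101",  "spatial"),     # located-in adjacency
    ("tech_zhang", "chiller_1", "management"), # maintenance responsibility
]

# Dense adjacency matrix A over a fixed node ordering.
order = list(nodes)
index = {n: i for i, n in enumerate(order)}
A = [[0] * len(order) for _ in order]
for u, v, _etype in edges:
    A[index[u]][index[v]] = 1
    A[index[v]][index[u]] = 1  # edges treated as undirected here
```

A production system would likely keep the edge type on each entry (or one adjacency matrix per edge type) so that the spatial stream can aggregate over physical connection edges only, as claim 9 requires.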
Multi-source heterogeneous data such as BIM, IoT, and personnel information systems are thus integrated into a structured graph model, breaking down the data islands at the root and constructing a comprehensive, accurate, and computable digital-twin base for the park that provides a unified context for all subsequent analysis and decisions. S200, inputting the time-series data of the equipment nodes in the entity-spatio-temporal topological graph G into a spatio-temporal graph convolutional network, fusing the temporal features of the equipment with the spatial topological features carried by the physical connection edges, and outputting the fault probability prediction vector of each piece of equipment in the park over a future time window. This specifically comprises the following steps. S201, the spatio-temporal graph convolutional network adopts a dual-stream structure comprising a temporal stream and a spatial stream; the temporal-stream GRU excels at capturing each device's own operating patterns and degradation trends, while the spatial-stream GCN learns co-occurrence patterns and interactions of the local device population by aggregating the features of neighboring nodes. S202, the temporal stream processes the time-series data of a single equipment node and extracts its temporal dependency features.
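The dual-stream structure described above can be sketched roughly as follows. This is a simplified stand-in, not the patent's network: the temporal stream is reduced to an exponential moving average in place of a trained GRU, the spatial stream to one unweighted GCN-style propagation step, and the attention fusion to a fixed scalar weight:

```python
import numpy as np

def gcn_layer(A, X):
    """One spatial-stream step: average the features of each node's
    physically connected neighbors (plus itself) via the normalized
    adjacency -- a simplified, weight-free GCN propagation."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    return (A_hat @ X) / deg                  # row-normalized aggregation

def temporal_features(series, decay=0.8):
    """Stand-in for the GRU temporal stream: an exponential moving
    average over each device's sensor series (illustrative only)."""
    h = np.zeros(series.shape[0])
    for t in range(series.shape[1]):
        h = decay * h + (1 - decay) * series[:, t]
    return h

def fuse(h_t, h_s, att=0.5):
    """Weighted fusion of temporal and spatial features (S204); a real
    implementation would learn `att` with an attention mechanism."""
    return att * h_t + (1 - att) * h_s
```

The point of the structure is that a device whose own readings look healthy can still receive a high fault probability if its physically connected neighbors are degrading, which is what enables the cascade fault prediction described in the abstract.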