CN-122019071-A - Hypergraph-based crowdsourcing edge cloud task scheduling method and system

Abstract

The invention relates to the field of distributed computing, and in particular to a hypergraph-based crowdsourcing edge cloud task scheduling method and system. The method comprises a pre-scheduling stage in which, when a new job is received, a dynamic hypergraph neural network models the current resource state of the cluster, explicitly capturing high-order dynamic associations among resources, to generate a resource state representation used to determine the initial placement of the new job; and a rescheduling stage in which, in response to a cluster state change, a unified hierarchical resource hypergraph model is constructed that integrates the otherwise discrete multi-layer scheduling constraints into a single model, and resource matching and conflict resolution are performed on that model to reschedule the job. The task scheduling mechanism accurately perceives and models high-order dynamic associations between resources to avoid suboptimal initial placement, and unifies and coordinates multi-level scheduling constraints to resolve conflicts quickly and optimize efficiently at runtime.

Inventors

QIU CHAO
WANG CHENGWEI
SHEN SHIHAO
HOU CHENXUAN
WANG XIAOFEI

Assignees

Tianjin University (天津大学)

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-05

Claims (10)

1. A hypergraph-based crowdsourcing edge cloud task scheduling method, characterized by comprising: a pre-scheduling stage in which, when a new job is received, the current resource state of the cluster is modeled using a dynamic hypergraph neural network, the modeling explicitly capturing high-order dynamic associations among resources to generate a resource state representation for determining the initial placement of the new job; and a rescheduling stage in which, in response to a cluster state change, a unified hierarchical resource hypergraph model is constructed, the hierarchical resource hypergraph model integrating discrete multi-layer scheduling constraints into a single model, and resource matching and conflict resolution are performed based on the hierarchical resource hypergraph model to reschedule the job.
2. The method of claim 1, wherein modeling the current resource state of the cluster using the dynamic hypergraph neural network comprises: dynamically constructing a hypergraph, namely combining node attributes and neighbor correlations to dynamically generate a hyperedge set, wherein a K-nearest neighbor algorithm is used to construct local hyperedges and a K-means clustering algorithm is used to construct global hyperedges; and performing hypergraph convolution, namely learning the dynamically changing cluster state through vertex convolution and hyperedge convolution operations to generate the resource state representation.
3. The method of claim 2, wherein performing hypergraph convolution comprises, when performing hyperedge convolution, computing an attention weight for each hyperedge using a self-attention mechanism, and performing a weighted aggregation of the associated hyperedge features based on the attention weights to update the features of the vertices.
4. The method of claim 1, wherein the pre-scheduling stage further comprises modeling the initial placement determination process as a Markov decision process (MDP), and employing a proximal policy optimization (PPO) algorithm to solve the MDP based on the resource state representation so as to output the initial placement.
5. The method of claim 1, wherein the hierarchical resource hypergraph model comprises a topology hyperedge for encoding the physical topological connections between Pods and the worker nodes they run on, a management hyperedge for capturing the management dependencies between master nodes and the Pods they manage, and a service hyperedge for grouping Pods with similar service capabilities.
6. The method of claim 5, wherein performing resource matching and conflict resolution based on the hierarchical resource hypergraph model comprises performing hierarchical rescheduling in a predetermined order: first, performing topology-level rescheduling by attempting resource matching within the topology hyperedge of the job's origin; if the topology-level rescheduling fails, performing management-level rescheduling by attempting resource matching within the management hyperedge of the job's origin (across different topology hyperedges); and if the management-level rescheduling fails, performing service-level rescheduling by attempting resource matching within other service hyperedges that match the job's service capabilities.
7. A hypergraph-based crowdsourcing edge cloud task scheduling system, characterized by comprising: a pre-scheduling module configured to, upon receipt of a new job, model the current resource state of the cluster using a dynamic hypergraph neural network, the modeling explicitly capturing high-order dynamic associations between resources to generate a resource state representation for determining the initial placement of the new job; and a rescheduling module configured to, in response to a cluster state change, construct a unified hierarchical resource hypergraph model that integrates discrete multi-layer scheduling constraints into a single model, and perform resource matching and conflict resolution based on the hierarchical resource hypergraph model to reschedule the job.
8. The system of claim 7, wherein the pre-scheduling module is configured to construct local hyperedges using a K-nearest neighbor algorithm and global hyperedges using a K-means clustering algorithm, and to generate the resource state representation by performing vertex convolution and self-attention-based hyperedge convolution.
9. The system of claim 7, wherein the hierarchical resource hypergraph model constructed by the rescheduling module comprises a topology hyperedge for encoding the topological connections between Pods and worker nodes, a management hyperedge for capturing the management dependencies between master nodes and Pods, and a service hyperedge for grouping Pods with similar service capabilities.
10. The system of claim 9, wherein the rescheduling module is configured to perform rescheduling in the following order: first attempting a match within the same topology hyperedge; on failure, attempting a match within the same management hyperedge; and on a further failure, attempting a match within other matching service hyperedges.
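The fallback order recited in claims 6 and 10 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Candidate` and `Job` types, their field names, the two-resource fit test, and the tie-break on most free CPU are all assumptions introduced for illustration. Each candidate is tagged with the identifier of the topology, management, and service hyperedge it belongs to, and the service level is simplified to "any candidate whose service hyperedge matches the job's capability".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical placement target tagged with its three hyperedge ids."""
    name: str
    free_cpu: float
    free_mem: float
    topology: str    # topology hyperedge id
    management: str  # management hyperedge id
    service: str     # service hyperedge id (capability group)


@dataclass(frozen=True)
class Job:
    """Hypothetical job that must be moved away from `origin`."""
    cpu: float
    mem: float
    origin: str      # name of the candidate the job was running on
    capability: str  # service capability the job requires


def fits(job: Job, cand: Candidate) -> bool:
    """Assumed resource-matching rule: enough free CPU and memory."""
    return cand.free_cpu >= job.cpu and cand.free_mem >= job.mem


def reschedule(job: Job, candidates: List[Candidate]) -> Optional[Tuple[str, str]]:
    """Try topology, then management, then service scope; return (level, name)."""
    origin = next(c for c in candidates if c.name == job.origin)
    others = [c for c in candidates if c.name != job.origin]
    levels = [
        ("topology", lambda c: c.topology == origin.topology),
        ("management", lambda c: c.management == origin.management),
        ("service", lambda c: c.service == job.capability),
    ]
    for level, in_scope in levels:
        matches = [c for c in others if in_scope(c) and fits(job, c)]
        if matches:
            # Assumed tie-break: prefer the candidate with the most headroom.
            best = max(matches, key=lambda c: (c.free_cpu, c.free_mem))
            return level, best.name
    return None  # no level produced a match
```

Because each level only widens the search scope when the previous one fails, a job stays as close to its origin as the available resources allow, which is the intent of the ordered, hierarchical rescheduling described in the claims.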

Description

Hypergraph-based crowdsourcing edge cloud task scheduling method and system

Technical Field

The invention relates to the field of distributed computing, and in particular to a hypergraph-based crowdsourcing edge cloud task scheduling method and system.

Background

With the convergence of cloud computing and edge computing, the crowdsourcing edge cloud system has emerged as a new distributed computing paradigm. By incorporating edge server resources contributed by third parties, it offers clear advantages such as ultra-low-latency processing. However, nodes in such systems may join or leave voluntarily, which makes the system environment highly dynamic and unstable. To manage such a complex environment, container scheduling systems such as Kubernetes (K8s) are widely used for resource management and task scheduling. The task scheduling mechanism of K8s is critical to guaranteeing job reliability and cluster resource utilization.

However, existing K8s scheduling strategies face serious challenges when applied to highly dynamic crowdsourcing edge cloud scenarios. On the one hand, existing scheduling methods often ignore the high-order dynamic associations that exist among multidimensional resources (such as CPU, GPU, and memory) in a cluster and among different worker nodes, and tend to evaluate resources as isolated individuals. This oversight makes scheduling decisions prone to local optima; for example, a task may be assigned to a node with ample CPU resources but a GPU bottleneck, which reduces job execution efficiency and overall resource utilization. On the other hand, K8s itself has a multi-layer scheduling architecture (e.g., cluster level, node level, and Pod level). In the prior art, the scheduling constraints and policies of each level tend to be discrete and fragmented.
When task scheduling or a cluster state change (such as a node failure) causes policy conflicts between different layers, the system must resolve them through multiple rounds of coordination, which not only introduces significant scheduling delay but may also cause Service Level Agreement (SLA) violations due to slow response, putting system stability at risk. Therefore, a new task scheduling mechanism is needed that overcomes both problems: it must accurately perceive and model the high-order dynamic associations between resources to avoid suboptimal initial placement, and it must unify and coordinate multi-level scheduling constraints to resolve conflicts quickly and optimize efficiently at runtime.

Disclosure of Invention

The main object of the invention is to provide a hypergraph-based crowdsourcing edge cloud task scheduling method and system that solve the above problems in the related art.

To achieve this object, according to one aspect of the invention, there is provided a hypergraph-based crowdsourcing edge cloud task scheduling method, comprising: a pre-scheduling stage in which, when a new job is received, the current resource state of the cluster is modeled using a dynamic hypergraph neural network, the modeling explicitly capturing high-order dynamic associations among resources to generate a resource state representation for determining the initial placement of the new job; and a rescheduling stage in which, in response to a cluster state change, a unified hierarchical resource hypergraph model is constructed, the hierarchical resource hypergraph model integrating discrete multi-layer scheduling constraints into a single model, and resource matching and conflict resolution are performed based on the hierarchical resource hypergraph model to reschedule the job.
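As an illustration of how a hypergraph can capture high-order associations among cluster nodes, the following minimal Python sketch builds the two kinds of hyperedges named in claim 2: local hyperedges from K-nearest neighbors and global hyperedges from K-means clusters. It is a sketch under stated assumptions, not the patented implementation: the function names, the Euclidean distance on per-node feature tuples (for example normalized CPU, GPU, and memory utilization), the plain Lloyd iterations, and the fixed random seed are all hypothetical choices.

```python
import math
import random
from typing import FrozenSet, List, Sequence, Tuple

Feature = Tuple[float, ...]  # assumed per-node feature vector


def knn_hyperedges(features: Sequence[Feature], k: int) -> List[FrozenSet[int]]:
    """One local hyperedge per node: the node plus its k nearest neighbors."""
    edges = []
    for i, fi in enumerate(features):
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(fi, features[j]),
        )
        edges.append(frozenset([i, *others[:k]]))
    return edges


def kmeans_hyperedges(features: Sequence[Feature], n_clusters: int,
                      iters: int = 20, seed: int = 0) -> List[FrozenSet[int]]:
    """One global hyperedge per non-empty K-means cluster (Lloyd iterations)."""
    rng = random.Random(seed)
    centroids = rng.sample(list(features), n_clusters)
    assign = [0] * len(features)
    for _ in range(iters):
        # Assignment step: attach every node to its closest centroid.
        assign = [
            min(range(n_clusters), key=lambda c: math.dist(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [features[i] for i, a in enumerate(assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    clusters = {}
    for i, a in enumerate(assign):
        clusters.setdefault(a, set()).add(i)
    return [frozenset(m) for m in clusters.values()]


def build_hypergraph(features: Sequence[Feature], k: int = 2,
                     n_clusters: int = 2) -> List[FrozenSet[int]]:
    """Union of local and global hyperedges, with exact duplicates removed."""
    edges = knn_hyperedges(features, k) + kmeans_hyperedges(features, n_clusters)
    return list(dict.fromkeys(edges))


if __name__ == "__main__":
    # Hypothetical utilization vectors (cpu, gpu, mem) for six worker nodes.
    nodes = [(0.10, 0.10, 0.20), (0.15, 0.10, 0.25), (0.20, 0.15, 0.20),
             (0.90, 0.80, 0.85), (0.85, 0.90, 0.80), (0.95, 0.85, 0.90)]
    for edge in build_hypergraph(nodes):
        print(sorted(edge))
```

Unlike an ordinary graph edge, each hyperedge here joins an arbitrary number of nodes at once, which is what lets the model express group-level relations such as "these nodes currently share a similar load profile". Rebuilding the hyperedge set whenever the feature vectors change gives the dynamic behavior the method calls for.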
Further, modeling the current resource state of the cluster using the dynamic hypergraph neural network comprises: dynamically constructing a hypergraph, namely combining node attributes and neighbor correlations to dynamically generate a hyperedge set, wherein a K-nearest neighbor algorithm is used to construct local hyperedges and a K-means clustering algorithm is used to construct global hyperedges; and performing hypergraph convolution, namely learning the dynamically changing cluster state through vertex convolution and hyperedge convolution operations to generate the resource state representation.

Further, performing hypergraph convolution comprises, when performing hyperedge convolution, computing an attention weight for each hyperedge using a self-attention mechanism, and performing a weighted aggregation of the associated hyperedge features based on the attention weights to update the features of the vertices.

Further, the pre-scheduling stage further includes modeling the initial placement determination process as a Markov decision process, and solving the