CN-115756814-B - Industrial big data computing resource isolation and quantification system
Abstract
The invention provides an industrial big data computing resource isolation and quantification system which comprises a resource quantification dynamic integration statistics module, a resource isolation dynamic adjustment module, a DAG flow chart task operation resource quantification module, a task creation stage demand resource detection module, a task execution isolation module, a task consumption pressed mechanism module, a high concurrency scene resource dynamic management and control module, a resource logic allocation layer and physical layer decoupling module, and a resource group dynamic monitoring function module. The invention ensures the stability and reliability of the distributed job scheduling and execution system, effectively solves the problems of reasonably allocating and controlling distributed job scheduling according to task execution resources and of monitoring cluster resource usage while tasks run, and can be applied to any future project that uses xIn Plat.
Inventors
- ZHANG YUNLONG
- CAI LIMING
- QING LINXIN
- PENG SHENGJI
- HUANG MING
- TENG YILONG
- ZHOU MING
Assignees
- Shanghai Baosight Software Co., Ltd. (上海宝信软件股份有限公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20210906
Claims (7)
- 1. An industrial big data computing resource isolation and quantification system, characterized by comprising a resource quantification dynamic integration statistics module, a resource isolation dynamic adjustment module, a DAG flow chart task operation resource quantification module, a task creation stage demand resource detection module, a task execution isolation module, a task consumption pressed mechanism module, a high concurrency scene resource dynamic management and control module, a resource logic allocation layer and physical layer decoupling module, and a resource group dynamic monitoring function module; the resource quantification dynamic integration statistics module and the resource isolation dynamic adjustment module work cooperatively and trigger the DAG flow chart task operation resource quantification module to quantify the resources used by the task flow and obtain resource usage data, which are then passed to the task creation stage demand resource detection module for resource judgment; the resource isolation dynamic adjustment module persists the resource group information of the distributed task flow scheduling platform and dynamically modifies the resource configuration of the resource group, the task execution mode, and performance-related influence factors; the DAG flow chart task operation resource quantification module analyzes the component steps of the task flow and, according to the DAG obtained by the analysis and an operation resource optimization algorithm, quantifies the resources required by the whole task flow when it is scheduled and executed; the task creation stage demand resource detection module determines, according to the resource group information corresponding to the workflow, whether the current task flow can run normally in its bound resource group with the resources it requires; the resource logic allocation layer and physical layer decoupling module is responsible for lifecycle management of the resource group and the work group, and decouples the binding relationship between the actual computing nodes and the resource group so that its use is transparent; this function is used to establish resource groups and work groups, and to divide a resource group with suitable resources from the resource pool of the task flow cluster according to the usage scenario; and the high concurrency scene resource dynamic management and control module, according to the label of the resource group in a polled task flow execution command and the current resource state of that resource group, accesses the persistent storage layer over TCP and relies on the atomic mechanism of the DB layer to ensure consistency when the corresponding resources are requested during task flow execution.
- 2. The system for isolating and quantifying industrial big data computing resources according to claim 1, wherein the resource quantification dynamic integration statistics module provides distributed cluster scheduling, computing node resource collection, and quantified statistical integration and presentation functions.
- 3. The system for isolating and quantifying industrial big data computing resources according to claim 1, wherein the task execution isolation module performs slotted consumption on submitted task flow instance trigger commands according to a resource group number.
- 4. The industrial big data computing resource isolation and quantification system of claim 1, wherein the task consumption pressed mechanism module protects the distributed task flow scheduling cluster when a large number of interface-invoked task flows are submitted for scheduled execution.
- 5. The system for isolating and quantifying industrial big data computing resources according to claim 1, wherein the high concurrency scene resource dynamic management and control module dynamically adjusts the resource information, the super-allocation weight factor, and the task flow consumption mode parameters of a resource group.
- 6. The system for isolating and quantifying industrial big data computing resources according to claim 1, wherein the resource logic allocation layer and physical layer decoupling module manages the lifecycle of the resource group and the work group.
- 7. The system for isolating and quantifying industrial big data computing resources according to claim 1, wherein the resource group dynamic monitoring function module monitors the use condition of the resource group resources managed by the whole distributed scheduling platform in real time.
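By way of illustration only, the cooperation recited in claims 1 and 5 (quantify the resources of a task flow from its DAG, then request them from the bound resource group with consistency guaranteed by the DB layer) can be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation. The names `quantify_flow`, `borrow`, `give_back`, the `resource_group` table, and the unit "slots" are hypothetical. The rule "peak demand over groups of steps that become ready together" merely stands in for the operation resource optimization algorithm, which the claims do not specify. An in-memory SQLite database stands in for the persistent storage layer accessed over TCP, and the `overalloc` column is one assumed way of applying the super-allocation weight factor.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+


def quantify_flow(steps, deps):
    """Peak slot demand of a task flow (hypothetical quantification rule).

    steps: {step_name: slots_needed}
    deps:  {step_name: set of prerequisite step names}
    Assumption: steps that become ready together run in parallel, so the
    flow needs the largest per-level sum of slots.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in steps})
    sorter.prepare()  # raises graphlib.CycleError if the steps are not a DAG
    peak = 0
    while sorter.is_active():
        ready = sorter.get_ready()  # steps whose prerequisites are all done
        peak = max(peak, sum(steps[s] for s in ready))
        sorter.done(*ready)
    return peak


def open_store():
    """In-memory stand-in for the persistent storage layer (assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE resource_group ("
        "label TEXT PRIMARY KEY, capacity INTEGER, overalloc REAL, used INTEGER)"
    )
    return db


def borrow(db, label, slots):
    """Reserve slots with one conditional UPDATE; True if the group had room."""
    with db:  # commit on success, roll back on error
        cur = db.execute(
            "UPDATE resource_group SET used = used + ? "
            "WHERE label = ? AND used + ? <= capacity * overalloc",
            (slots, label, slots),
        )
    return cur.rowcount == 1  # zero rows changed means the request was refused


def give_back(db, label, slots):
    """Return previously borrowed slots to the resource group."""
    with db:
        db.execute(
            "UPDATE resource_group SET used = MAX(used - ?, 0) WHERE label = ?",
            (slots, label),
        )


if __name__ == "__main__":
    steps = {"extract": 1, "clean": 2, "aggregate": 2, "report": 1}
    deps = {"clean": {"extract"}, "aggregate": {"extract"},
            "report": {"clean", "aggregate"}}
    demand = quantify_flow(steps, deps)
    db = open_store()
    db.execute("INSERT INTO resource_group VALUES ('rg-timer', 4, 1.5, 0)")
    print(demand, borrow(db, "rg-timer", demand), borrow(db, "rg-timer", demand))
```

With these hypothetical numbers, `clean` and `aggregate` become ready together, so the flow's peak demand is 4 slots. The first request succeeds and the second is refused, because 8 slots would exceed 4 × 1.5 = 6. Putting the check and the increment in a single UPDATE statement is what lets concurrent schedulers rely on the database, rather than on application-level locks, for consistency.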
Description
Industrial big data computing resource isolation and quantification system
Technical Field
The invention relates to the technical field of distributed scheduling systems, in particular to an industrial big data computing resource isolation and quantification system, and especially to a design method and implementation that make industrial big data computing resources isolatable and quantifiable.
Background
With the development of industrial informatization and intelligentization, industrial production and enterprise management processes are increasingly digitized. More and more manual operations can be captured as programs, and a task flow scheduling platform schedules and executes these tasks at set times according to a specified trigger mode. The task flow scheduling platform is one of the core functions of industrial digitization and intelligentization. This brings task scheduling scenarios with more complex workflow analysis and with execution trigger volumes that swing between peaks and valleys. The specific challenges are as follows:
1. Computing resource supervision challenges
In industrial informatization practice, a large number of tasks share the same profile: a single task is short and consumes few resources, but the number of task executions triggered within a short time is large, so the scheduling platform must borrow and return resources rapidly as task flows execute. Current resource control solutions in the industry, however, are geared toward scenarios in which a single task runs for a long time and uses a large amount of resources. Existing resource management schemes therefore cannot satisfy the task flow scheduling platform's need to manage and supervise resource usage while guaranteeing task flow scheduling performance.
2. Computing execution isolation challenges
Task flows are scheduled and executed in different scenarios, and some tasks have strict real-time requirements. For example, timed scheduling tasks should not be affected by tasks triggered through interface calls, and tasks in different workspaces should be isolated from one another by workspace. These requirements place high demands on the task flow scheduling platform and are challenging to implement.
3. Resource quantification challenges for workflows in distributed scenarios
A task flow is composed of steps combined according to different execution dependencies, and steps may run serially or in parallel. To implement resource management, the task flow scheduling platform must therefore first quantify and count execution resources at task flow granularity: it performs DAG analysis on the constituent steps of the task flow and calculates the resource usage required to execute the task flow according to a resource optimization algorithm.
4. Distributed cluster resource quantification challenges
The architecture of the task flow scheduling platform is distributed and decentralized, and supports horizontal scaling of cluster nodes at the cluster level. Such a highly scalable design, however, makes it difficult to quantify and count the resources of the whole cluster: accurate, real-time quantitative statistics of the resources of the task flow scheduling cluster must also account for the cluster's high scalability.
5. Dynamic horizontal scaling challenges for distributed cluster resources
When nodes are added to the cluster horizontally, the resource supervision function of the distributed task flow scheduling platform must reflect the change in cluster resources in real time, and the partitioning of resources must be dynamically modifiable, so that the platform remains highly available and highly scalable across different production environments.
In summary, to implement resource awareness and isolation, the distributed task flow scheduling platform needs to support the following functional items:
(1) Distributed cluster resource quantification statistics
(2) Resource supervision that supports dynamic horizontal scale-out and scale-in
(3) Task flow resource borrowing and returning in high-concurrency scenarios
(4) Task execution isolation at the workspace and task flow levels
(5) Computation of optimal operating resources for task flows composed of complex execution steps
A monitoring device, system and method for data collection