CN-121996364-A - Automatic task scheduling method, device, equipment and medium based on HDFS


Abstract

The application discloses an HDFS-based automatic task scheduling method, device, equipment and medium, and relates to the technical field of data analysis. The method comprises: starting a main operation control script if no occupation mark exists in the current HDFS; determining a plurality of node scripts to be executed according to a service execution time period, and sorting them to obtain ordering indexes; determining whether a preset target file exists in the current HDFS; if the preset target file does not exist, creating it and executing the node script to be executed corresponding to the first ordering index; if the preset target file exists, executing the node script to be executed corresponding to the latest ordering index stored in the preset target file; and, after any node script to be executed finishes executing, writing the next ordering index into the preset target file and returning to the step of determining whether the preset target file exists in the current HDFS, until all the node scripts to be executed have been executed. The method thereby meets the requirements of automatic serial execution and breakpoint resumption of scheduling tasks quickly and conveniently.

Inventors

ZHOU LANTING
WANG GANG
WANG XINGEN
JIANG MINGLI
YU HAIBO

Assignees

浙江邦盛科技股份有限公司 (Zhejiang Bangsheng Technology Co., Ltd.)

Dates

Publication Date: 20260508
Application Date: 20241106

Claims (10)

1. An HDFS-based automatic task scheduling method, comprising: detecting whether an occupation mark of a main operation control script exists in the current HDFS; if the occupation mark does not exist, starting the main operation control script, and traversing a target directory co-located with the main operation control script according to a service execution time period corresponding to a scheduling task, so as to determine a plurality of node scripts to be executed in the target directory that correspond to the service execution time period; sorting the plurality of node scripts to be executed to obtain an ordering index corresponding to each node script to be executed, and determining whether a preset target file exists in the current HDFS; if the preset target file does not exist, creating the preset target file and executing the node script to be executed corresponding to the first ordering index; if the preset target file exists, reading the latest ordering index stored in the current preset target file, and executing the node script to be executed corresponding to the latest ordering index thus read; and, after any node script to be executed finishes executing, writing the corresponding next ordering index into the preset target file, and returning to the step of determining whether the preset target file exists in the current HDFS, until all the node scripts to be executed have been executed.
2. The HDFS-based automatic task scheduling method according to claim 1, wherein, after detecting whether the occupation mark of the main operation control script exists in the current HDFS, the method further comprises: if the occupation mark exists in the current HDFS, ending the automatic task scheduling.
3. The HDFS-based automatic task scheduling method according to claim 1, wherein, if the occupation mark does not exist, starting the main operation control script, and then traversing a target directory co-located with the main operation control script according to a service execution time period corresponding to a scheduling task, so as to determine a plurality of node scripts to be executed in the target directory that correspond to the service execution time period, comprises: if the occupation mark does not exist in the current HDFS, creating the occupation mark, starting the main operation control script, and then determining whether a service execution time period parameter has been passed in for the scheduling task, so as to generate a corresponding confirmation result; if the confirmation result indicates that the service execution time period parameter has been passed in, taking the time period corresponding to the service execution time period parameter as the service execution time period corresponding to the scheduling task; if the confirmation result indicates that the service execution time period parameter has not been passed in, detecting whether a service execution time period mark file corresponding to the scheduling task exists in the current HDFS, and determining the service execution time period according to the generated detection result; and traversing the target directory co-located with the main operation control script according to the service execution time period, so as to determine the plurality of node scripts to be executed in the target directory that correspond to the service execution time period.
4. The HDFS-based automatic task scheduling method according to claim 3, wherein determining the service execution time period according to the generated detection result comprises: if the detection result indicates that the service execution time period mark file exists, taking the time period corresponding to the target service execution time period parameter in the service execution time period mark file as the service execution time period; if the detection result indicates that the service execution time period mark file does not exist, creating the service execution time period mark file and storing the current system time period in it, so that the current system time period is used as the service execution time period.
5. The HDFS-based automatic task scheduling method according to claim 1, wherein sorting the plurality of node scripts to be executed to obtain an ordering index corresponding to each node script to be executed comprises: sorting the plurality of node scripts to be executed based on their file name prefixes to obtain a corresponding script sorting result; and generating the ordering index corresponding to each node script to be executed based on the script sorting result.
6. The HDFS-based automatic task scheduling method according to claim 1, further comprising: after any node script to be executed finishes executing, recording the execution completion time period corresponding to that node script, and updating the service execution time period based on the execution completion time period.
7. The HDFS-based automatic task scheduling method according to any one of claims 1 to 6, further comprising: if all the node scripts to be executed have been executed, deleting the preset target file and the occupation mark, and updating the service execution time period based on a preset date updating rule, wherein the preset date updating rule is a rule for incrementing the service execution time period by a preset time increment threshold.
8. An HDFS-based automatic task scheduling device, comprising: a mark detection module, configured to detect whether an occupation mark of a main operation control script exists in the current HDFS; a directory traversal module, configured to start the main operation control script if the occupation mark does not exist, and then traverse a target directory co-located with the main operation control script according to a service execution time period corresponding to a scheduling task, so as to determine a plurality of node scripts to be executed in the target directory that correspond to the service execution time period; a script sorting module, configured to sort the plurality of node scripts to be executed to obtain an ordering index corresponding to each node script to be executed, and to determine whether a preset target file exists in the current HDFS; a first script execution module, configured to create the preset target file if the preset target file does not exist, and to execute the node script to be executed corresponding to the first ordering index; a second script execution module, configured to read the latest ordering index stored in the current preset target file if the preset target file exists, and to execute the node script to be executed corresponding to the latest ordering index thus read; and a step jump module, configured to write the corresponding next ordering index into the preset target file after any node script to be executed finishes executing, and to return to the step of determining whether the preset target file exists in the current HDFS, until all the node scripts to be executed have been executed.
9. An electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the HDFS-based automatic task scheduling method according to any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the HDFS-based automatic task scheduling method according to any one of claims 1 to 7.
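
The supporting steps recited in claims 3 to 5 and 7 (resolving the service execution time period, sorting by file name prefix, and advancing the period) can be illustrated with a minimal Python sketch. It is not the applicant's implementation. Everything concrete in it is an assumption made for illustration: a local directory stands in for HDFS, the period is a single ISO date, the file name prefix is numeric (for example `01_`), the increment is one day, and the names `resolve_period`, `sort_by_prefix` and `next_period` are hypothetical.

```python
import re
from datetime import date, timedelta
from pathlib import Path


def resolve_period(arg, marker: Path, today: date) -> str:
    """Claims 3-4 (sketch): an explicit parameter wins, then the mark file,
    otherwise the current date is stored in a new mark file and used."""
    if arg:
        return arg
    if marker.exists():
        return marker.read_text().strip()
    marker.write_text(today.isoformat())  # assumption: period stored as an ISO date
    return today.isoformat()


def sort_by_prefix(names):
    """Claim 5 (sketch): order scripts by numeric file name prefix.
    A script's position in the returned list serves as its ordering index."""
    def key(name):
        match = re.match(r"(\d+)", name)
        # Names without a numeric prefix sort last (an assumption).
        return (int(match.group(1)) if match else float("inf"), name)
    return sorted(names, key=key)


def next_period(period: str, step_days: int = 1) -> str:
    """Claim 7 (sketch): advance the period by a preset increment."""
    return (date.fromisoformat(period) + timedelta(days=step_days)).isoformat()
```

Sorting on the parsed integer rather than on the raw string keeps `10_c.sh` after `2_b.sh`; whether the claimed method compares prefixes numerically or lexically is not stated in the claims.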

Description

Automatic task scheduling method, device, equipment and medium based on HDFS

Technical Field

The invention relates to the technical field of data analysis, and in particular to an HDFS-based automatic task scheduling method, device, equipment and medium.

Background

In computer science, task scheduling generally refers to managing and allocating system resources among multiple threads, processes, or data streams. For a scheduled task in an offline scenario, the usual approach is to have a timed task execute a shell command, so that the scheduled task keeps running unaffected when offline. This approach, however, requires deploying an application, depends heavily on application components, and applies to a limited range of scenarios. It can only execute a single script at a specific time point or within a time interval, and cannot execute a plurality of nodes serially when they need to be orchestrated. Operation is complex, and if a scheduling task is suddenly interrupted during execution, it can only be resumed by manual intervention, which increases operation and maintenance cost. The approach is therefore costly to implement and scales poorly. In summary, how to quickly and conveniently meet the requirements of automatic serial execution and breakpoint resumption of scheduling tasks is a technical problem to be solved.

Disclosure of Invention

In view of the above, the present invention aims to provide an HDFS-based automatic task scheduling method, device, equipment and medium that can quickly and conveniently meet the requirements of automatic serial execution and breakpoint resumption of scheduling tasks.
The specific scheme is as follows. In a first aspect, the present application provides an HDFS-based automatic task scheduling method, including: detecting whether an occupation mark of a main operation control script exists in the current HDFS; if the occupation mark does not exist, starting the main operation control script, and traversing a target directory co-located with the main operation control script according to a service execution time period corresponding to a scheduling task, so as to determine a plurality of node scripts to be executed in the target directory that correspond to the service execution time period; sorting the plurality of node scripts to be executed to obtain an ordering index corresponding to each node script to be executed, and determining whether a preset target file exists in the current HDFS; if the preset target file does not exist, creating the preset target file and executing the node script to be executed corresponding to the first ordering index; if the preset target file exists, reading the latest ordering index stored in the current preset target file, and executing the node script to be executed corresponding to the latest ordering index thus read; and, after any node script to be executed finishes executing, writing the corresponding next ordering index into the preset target file, and returning to the step of determining whether the preset target file exists in the current HDFS, until all the node scripts to be executed have been executed. Optionally, after detecting whether the occupation mark of the main operation control script exists in the current HDFS, the method further includes: if the occupation mark exists in the current HDFS, ending the automatic task scheduling.
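
The control flow of the first-aspect method above can be illustrated with a minimal Python sketch. It is not the applicant's implementation. A local directory stands in for HDFS, and the file names `master.lock` (occupation mark) and `progress.idx` (preset target file), the function name `run_schedule`, and the `execute` callback are all hypothetical. The text above does not say how the occupation mark is cleared after an interruption; the sketch assumes it is released on every exit while the preset target file is kept, so that a later run can resume from the stored ordering index.

```python
from pathlib import Path


def run_schedule(store: Path, scripts, execute) -> bool:
    """Run already-sorted `scripts` serially with breakpoint resumption.

    Returns False if an occupation mark already exists, True when every
    script has been executed. `execute` is a caller-supplied callable.
    """
    lock = store / "master.lock"        # hypothetical occupation mark
    progress = store / "progress.idx"   # hypothetical preset target file
    if lock.exists():
        return False                    # another run is active: end scheduling
    lock.touch()                        # check-then-create is not atomic here
    try:
        while True:
            if not progress.exists():
                progress.write_text("0")           # first ordering index
            index = int(progress.read_text())      # latest ordering index
            if index >= len(scripts):
                break                              # all node scripts executed
            execute(scripts[index])                # may raise if interrupted
            progress.write_text(str(index + 1))    # record next ordering index
        progress.unlink()               # finished: remove the preset target file
        return True
    finally:
        lock.unlink()                   # assumption: release the mark on any exit
```

If `execute` raises on the second of three scripts, `progress.idx` still holds `1`, so the next call skips the first script and resumes at the second. Against a real cluster the existence checks and file writes would go through HDFS rather than the local file system.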
Optionally, if the occupation mark does not exist, starting the main operation control script, and then traversing a target directory co-located with the main operation control script according to a service execution time period corresponding to a scheduling task, so as to determine a plurality of node scripts to be executed in the target directory that correspond to the service execution time period, includes: if the occupation mark does not exist in the current HDFS, creating the occupation mark, starting the main operation control script, and then determining whether a service execution time period parameter has been passed in for the scheduling task, so as to generate a corresponding confirmation result; if the confirmation result indicates that the service execution time period parameter has been passed in, taking the time period corresponding to the service execution time period parameter as the service execution time period corresponding to the scheduling task; if the confirmation result indicates that the service execution time period parameter has not been passed in, detecting whether a service execution time period mark file corresponding to the scheduling task exists in the current HDFS, and determining the service execution time period according to the generated detection result; traversing a target directory which is positioned at the same