CN-121979694-A - Big data processing method and system based on cloud computing

Abstract

The embodiments of the application disclose a big data processing method and system based on cloud computing. The method is implemented on a cloud computing platform and comprises: decomposing a big data processing task into a plurality of micro-tasks and distributing the micro-tasks to distributed computing nodes; selecting adjacent nodes for each computing node based on a super-local algorithm, and constructing a local task processing model using only local data and metadata of the adjacent nodes dynamically selected based on a fast game mechanism, wherein the metadata comprises the micro-task data semantic features and task dependency relations of the adjacent nodes, the semantic features are extracted through a natural language processing technology, and the task dependency relations are determined based on the input-output association of the micro-tasks; independently executing the micro-tasks on each computing node according to the local task processing model, and dynamically adjusting the micro-task allocation through a point-to-point local negotiation mechanism; and aggregating the micro-task processing results of each computing node to generate a global data processing result. The method reduces computation and communication overhead.

Inventors

DENG ZHENGWAN
YAO YAO
WANG LIANG
LU JIAN
DONG YUAN

Assignees

江苏商贸职业学院

Dates

Publication Date: 20260505
Application Date: 20260409

Claims (10)

1. A big data processing method based on cloud computing, wherein the method is implemented on a cloud computing platform, the method comprising: decomposing a big data processing task into a plurality of micro-tasks and distributing the micro-tasks to distributed computing nodes; selecting adjacent nodes for each computing node based on a super-local algorithm, and constructing a local task processing model by using local data and metadata of the adjacent nodes dynamically selected based on a fast game mechanism, wherein the metadata comprises micro-task data semantic features and task dependency relations of the adjacent nodes, the semantic features are extracted through a natural language processing technology, and the task dependency relations are determined based on input-output association of the micro-tasks; independently executing the micro-tasks on each computing node according to the local task processing model, and dynamically adjusting the allocation of the micro-tasks through a point-to-point local negotiation mechanism; and aggregating the micro-task processing results of each computing node to generate a global data processing result.
2. The method of claim 1, wherein decomposing the big data processing task into a plurality of micro-tasks comprises: decomposing the task, according to the data types and processing requirements of the big data processing task, into micro-tasks with a granularity smaller than a preset threshold, wherein the decomposition is based on semantic segmentation of the task.
3. The method of claim 1, wherein selecting adjacent nodes for each computing node based on a super-local algorithm comprises: each computing node dynamically selecting the adjacent nodes by calculating a semantic similarity threshold and a dependency chain length of the micro-tasks, wherein the selection mechanism is based on a local topology optimization algorithm and ensures that the number of adjacent nodes does not exceed a preset upper limit.
4. The method of claim 1, wherein constructing the local task processing model using only local data and metadata of the adjacent nodes dynamically selected based on the fast game mechanism comprises: each computing node analyzing the local data and the metadata through the fast game mechanism, based on the correlation of the micro-tasks, to determine the execution priority of the local micro-tasks, wherein the fast game mechanism optimizes task allocation among the nodes without global coordination by calculating the semantic correlation and the dependency weight among the local micro-tasks; and constructing an adaptive local task processing model based on the execution priority and the adjacent nodes, wherein each node dynamically adjusts its cooperation relation with the adjacent nodes according to real-time changes of the micro-tasks and optimizes the local execution path of the micro-tasks.
5. The method of claim 4, wherein the fast game mechanism employs an iterative algorithm based on local Nash equilibrium, wherein each computing node computes an optimal task allocation strategy from the semantic relevance and dependency weights within a limited number of iterations, the number of iterations being limited by a preset threshold.
6. The method of claim 1, wherein dynamically adjusting the allocation of the micro-tasks through a point-to-point local negotiation mechanism comprises: each computing node exchanging micro-task state and processing capacity information with the adjacent nodes through asynchronous message transmission; generating, based on the micro-task state and the processing capacity information, a local negotiation result through a weighted voting mechanism based on micro-task correlation, wherein the weighted voting mechanism determines the allocation proportion of the micro-tasks according to weights assigned from the semantic correlation and the task priority; and dynamically adjusting the allocation proportion of the micro-tasks based on the local negotiation result.
7. The method of claim 1, wherein aggregating the micro-task processing results of each computing node to generate a global data processing result comprises: each computing node generating intermediate result metadata of the micro-task processing and merging it step by step with that of adjacent nodes through a tree aggregation protocol to generate the global data processing result, wherein the tree aggregation protocol reduces global communication overhead by limiting the communication range.
8. A big data processing system based on cloud computing, the system being built on a cloud computing platform, the system comprising: a task decomposition module configured to decompose a big data processing task into a plurality of micro-tasks and distribute the micro-tasks to distributed computing nodes; an adjacent node selection module configured to select adjacent nodes for each computing node based on a super-local algorithm, and to construct a local task processing model by using only local data and metadata of the adjacent nodes dynamically selected based on a fast game mechanism, wherein the metadata comprises micro-task data semantic features and task dependency relations of the adjacent nodes, the semantic features are extracted through a natural language processing technology, and the task dependency relations are determined based on input-output association of the micro-tasks; a task execution module configured to independently execute the micro-tasks on each computing node according to the local task processing model, and to dynamically adjust the allocation of the micro-tasks through a point-to-point local negotiation mechanism; and a result aggregation module configured to aggregate the micro-task processing results of each computing node and generate a global data processing result.
9. An electronic device, comprising: one or more processors; and a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
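
By way of non-limiting illustration, the step-by-step merging recited in claim 7 might be sketched in Python as follows. The function name `tree_aggregate`, the pairwise pairing order, and the use of `Counter` objects as intermediate results are assumptions made for this example and are not specified in the application.

```python
from collections import Counter

def tree_aggregate(partials, merge):
    """Merge partial results pairwise, level by level, until one remains.

    Returns the merged result and the number of merge levels used.
    """
    if not partials:
        raise ValueError("at least one partial result is required")
    level = list(partials)
    levels_used = 0
    while len(level) > 1:
        # Each node merges only with its immediate partner at this level.
        merged = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # an unpaired node carries its result upward
        level = merged
        levels_used += 1
    return level[0], levels_used

# Hypothetical per-node intermediate results (e.g. partial counts per key).
partials = [Counter(a=1), Counter(a=2, b=1), Counter(b=4), Counter(c=3), Counter(a=1, c=1)]
total, levels_used = tree_aggregate(partials, lambda x, y: x + y)
```

In this sketch each node exchanges data with a single partner per level, so five nodes reach a global result in three levels without any node contacting all the others.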

Description

Big data processing method and system based on cloud computing

Technical Field

The application relates to the technical field of cloud computing and big data processing, and in particular to a big data processing method and system based on cloud computing.

Background

With the rapid development of cloud computing and big data technology, distributed computing systems are widely used to process large-scale data tasks. Traditional big data processing methods, such as frameworks based on MapReduce or Spark, typically rely on a centralized scheduling or global coordination mechanism to allocate tasks and optimize resource utilization. However, these approaches face challenges in scenarios with strict real-time requirements or low bandwidth (e.g., edge computing, the Internet of Things), mainly because global coordination causes high communication overhead and latency. Therefore, there is a need for a big data processing method based on localized decisions without global coordination, so as to reduce communication overhead and improve real-time performance and adaptability, especially in a distributed cloud computing environment.

Disclosure of Invention

The application provides a big data processing method and system based on cloud computing, which reduce computation and communication overhead.
The application provides the following scheme:
According to a first aspect, a big data processing method based on cloud computing is provided. The method is implemented on a cloud computing platform and comprises: decomposing a big data processing task into a plurality of micro-tasks and distributing the micro-tasks to distributed computing nodes; selecting adjacent nodes for each computing node based on a super-local algorithm, and constructing a local task processing model using only local data and metadata of the adjacent nodes dynamically selected based on a fast game mechanism, wherein the metadata comprises micro-task data semantic features and task dependency relations of the adjacent nodes, the semantic features are extracted through a natural language processing technology, and the task dependency relations are determined based on input-output association of the micro-tasks; independently executing the micro-tasks on each computing node according to the local task processing model, and dynamically adjusting the micro-task allocation through a point-to-point local negotiation mechanism; and aggregating the micro-task processing results of each computing node to generate a global data processing result.
According to one implementation of the embodiments of the application, decomposing the big data processing task into the plurality of micro-tasks comprises decomposing the task, according to the data type and processing requirements of the big data processing task, into micro-tasks with a granularity smaller than a preset threshold, wherein the decomposition is based on semantic segmentation of the task.
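
By way of non-limiting illustration, one way the task dependency relations might be determined from the input-output association of micro-tasks is sketched below in Python: a micro-task is treated as depending on another when it reads a dataset that the other writes. The `MicroTask` record, its fields, the dataset names, and the helper functions are hypothetical and are not taken from the application; the chain length helper assumes the dependency graph is acyclic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MicroTask:
    """Hypothetical micro-task record: a name plus the datasets it reads and writes."""
    name: str
    inputs: frozenset
    outputs: frozenset

def dependency_edges(tasks):
    """Return (producer, consumer) pairs where the consumer reads an output of the producer."""
    edges = set()
    for producer in tasks:
        for consumer in tasks:
            if producer.name != consumer.name and producer.outputs & consumer.inputs:
                edges.add((producer.name, consumer.name))
    return edges

def chain_length(task_name, edges):
    """Length of the longest dependency chain ending at task_name (acyclic graph assumed)."""
    producers = [p for p, c in edges if c == task_name]
    if not producers:
        return 0
    return 1 + max(chain_length(p, edges) for p in producers)

# Hypothetical three-stage decomposition of a log-processing task.
tasks = [
    MicroTask("parse", frozenset({"raw_logs"}), frozenset({"events"})),
    MicroTask("filter", frozenset({"events"}), frozenset({"clean_events"})),
    MicroTask("count", frozenset({"clean_events"}), frozenset({"totals"})),
]
edges = dependency_edges(tasks)
```

Here `chain_length("count", edges)` evaluates to 2, since `count` depends on `filter`, which in turn depends on `parse`.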
According to one implementation of the embodiments of the application, selecting the adjacent nodes for each computing node based on the super-local algorithm comprises: each computing node dynamically selecting the adjacent nodes by calculating a semantic similarity threshold and a dependency chain length of the micro-tasks, wherein the selection mechanism is based on a local topology optimization algorithm and ensures that the number of adjacent nodes does not exceed a preset upper limit.
According to one implementation of the embodiments of the application, constructing the local task processing model using only local data and metadata of the adjacent nodes dynamically selected based on the fast game mechanism comprises: each computing node analyzing the local data and the metadata through the fast game mechanism, based on the correlation of the micro-tasks, to determine the execution priority of the local micro-tasks, wherein the fast game mechanism optimizes task allocation among the nodes without global coordination by calculating the semantic correlation and the dependency weight among the local micro-tasks; and constructing an adaptive local task processing model based on the execution priority and the adjacent nodes, wherein each node dynamically adjusts its cooperation relation with the adjacent nodes according to real-time changes of the micro-tasks to optimize the local execution path of the micro-tasks.
According to one implementation of the embodiments of the application, the fast game mechanism adopts an iterative algorithm based on local Nash equilibrium, wherein each computing node calculates an optimal task allocation strategy from the semantic relevance and dependency weights within a limited number of iterations, the number of iterations being limited by a preset threshold.
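
By way of non-limiting illustration, the adjacent node selection and the bounded game iteration described above might be sketched in Python as follows. The application does not specify the similarity measure, the utility function, or the update rule, so this sketch makes assumptions throughout: cosine similarity over hypothetical feature vectors, a cap of two neighbors, and generic best-response dynamics (each micro-task moves to the candidate node with the highest utility, taken as relevance minus load) standing in for the iterative algorithm based on local Nash equilibrium. All names and numeric values are hypothetical.

```python
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_neighbors(me, features, chain_len, min_similarity=0.5, max_chain=3, max_neighbors=2):
    """Keep nodes that are similar enough and close in the dependency chain, capped in number."""
    scored = []
    for node, vec in features.items():
        if node == me:
            continue
        sim = cosine(features[me], vec)
        # A missing chain length is treated as "no known dependency path" (assumption).
        if sim >= min_similarity and chain_len.get((me, node), math.inf) <= max_chain:
            scored.append((sim, node))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [node for _, node in scored[:max_neighbors]]

def best_response_allocation(tasks, candidates, cost, relevance, max_rounds=10):
    """Bounded best-response dynamics; stops early when no micro-task wants to move."""
    assignment = {t: candidates[t][0] for t in tasks}
    for round_no in range(1, max_rounds + 1):
        changed = False
        for t in tasks:
            load = defaultdict(float)  # load placed on each node by the other micro-tasks
            for other, node in assignment.items():
                if other != t:
                    load[node] += cost[other]

            def utility(node):
                return relevance.get((t, node), 0.0) - (load[node] + cost[t])

            best = max(candidates[t], key=utility)
            if utility(best) > utility(assignment[t]):
                assignment[t] = best
                changed = True
        if not changed:
            return assignment, round_no
    return assignment, max_rounds

# Hypothetical semantic feature vectors and dependency chain lengths seen from node A.
features = {"A": (1.0, 0.0), "B": (1.0, 0.1), "C": (0.9, 0.5),
            "D": (0.0, 1.0), "E": (1.0, 0.2), "F": (1.0, 0.3)}
chain_len = {("A", "B"): 1, ("A", "C"): 2, ("A", "E"): 5, ("A", "F"): 1}
neighbors = select_neighbors("A", features, chain_len)

tasks = ["t1", "t2", "t3"]
candidates = {t: ["A"] + neighbors for t in tasks}  # node A may keep a task or hand it to a neighbor
cost = {t: 1.0 for t in tasks}
relevance = {("t3", "B"): 1.5}  # t3 is assumed to be semantically close to data held by B
assignment, rounds = best_response_allocation(tasks, candidates, cost, relevance)
```

In this example node E is rejected because its dependency chain is too long and node C is dropped by the cap on the number of neighbors. The iteration uses only information about node A and its selected neighbors, and `max_rounds` plays the role of the preset iteration threshold.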
According to an implementation manner in the embodiment of the application, the dynamic adjustment of the allocation of the micro-tasks throu