CN-121999314-A - Data processing method and system applied to distributed architecture

Abstract

The invention provides a data processing method and system applied to a distributed architecture, and relates to the technical field of data processing. In the method, a server monitors the processing states of a plurality of target annotation data in a database; for first annotation data whose processing state is waiting for post-processing, the first annotation data is issued to a first node; the first node acquires a target processing flow corresponding to the first annotation data; and the first node post-processes the first annotation data using the target processing flow, generates a post-processing result, and feeds the post-processing result back to the server. In this embodiment, the server monitors the processing state of the target annotation data, and the post-processing flow is triggered automatically based on that state. Distributing the first annotation data to the first nodes of the distributed architecture for post-processing decouples the post-processing flow from the main flow and improves the extensibility and practicality of the code.

Inventors

LIU XUECHENG
XIE SHAOXUAN
YAO GUOCAI
NI ZIQIANG

Assignees

Beijing Academy of Artificial Intelligence (北京智源人工智能研究院)

Dates

Publication Date: 20260508
Application Date: 20251224

Claims (11)

1. A data processing method applied to a distributed architecture, wherein the distributed architecture includes a server and a plurality of first nodes, the method comprising: the server monitoring the processing states of a plurality of target annotation data in a database; for first annotation data whose processing state is waiting for post-processing, issuing the first annotation data to a first node; the first node obtaining a target processing flow corresponding to the first annotation data; and the first node post-processing the first annotation data using the target processing flow, generating a post-processing result, and feeding the post-processing result back to the server.
2. The data processing method according to claim 1, wherein after the first annotation data is issued to the first node, the method further comprises changing the processing state of the first annotation data to in post-processing; and/or after the post-processing result of the first annotation data is fed back to the server, the method further comprises the server changing the processing state corresponding to the first annotation data according to the post-processing result.
3. The data processing method of claim 2, wherein the distributed architecture further comprises a plurality of second nodes and the post-processing result comprises a post-processing version code, the method further comprising: the server acquiring a target pre-flow version code and a target post-processing version code corresponding to second annotation data whose processing state is post-processing success; and in response to the target pre-flow version code being greater than the target post-processing version code, issuing the second annotation data to a second node so that the second node post-processes the second annotation data again, wherein the second node and the first node are different nodes in the distributed architecture.
4. The data processing method according to claim 3, further comprising: changing the processing state of the second annotation data to a state indicating that reprocessing is needed.
5. The data processing method according to claim 1, wherein the first node obtaining the target processing flow corresponding to the first annotation data comprises: the first node determining the target processing flow corresponding to the first annotation data from a pre-configured mapping relation according to the machine model and/or the machine version included in the first annotation data; wherein the mapping relation indicates a correspondence among the machine model, the machine version, and a processing flow.
6. The method of claim 4, wherein the distributed architecture further comprises a plurality of third nodes and the post-processing result further indicates whether post-processing of the first annotation data or the second annotation data succeeded, the method further comprising: in response to the post-processing result indicating that post-processing of the first annotation data or the second annotation data succeeded, changing the processing state corresponding to the first annotation data or the second annotation data to post-processing success; and in response to the post-processing result indicating that post-processing of the first annotation data or the second annotation data failed, re-issuing the first annotation data or the second annotation data to a third node so that the third node post-processes the second annotation data, wherein the third node is a node in the distributed architecture different from the first node and the second node.
7. The data processing method according to claim 1, further comprising, before the server monitors the processing states of the plurality of target annotation data in the database: acquiring a plurality of annotation data using the server, and performing the following: hash-matching the original data in the database against the annotation data to determine, from the plurality of annotation data, intermediate annotation data corresponding to the original data; performing timestamp alignment on task labels in the intermediate annotation data according to the data time of the original data to generate the target annotation data; and storing the target annotation data in the database in correspondence with the original data.
8. A data processing system applied to a distributed architecture, characterized by comprising a server and a plurality of first nodes, wherein the server is configured to monitor the processing states of a plurality of data in a database and, for first annotation data waiting for post-processing, issue the first annotation data to a first node; and the first node is configured to acquire a target processing flow corresponding to the first annotation data, post-process the first annotation data using the target processing flow, generate a post-processing result, and feed the post-processing result back to the server.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and running on the processor, characterized in that the processor implements the data processing method according to any of claims 1 to 7 when executing the computer program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the data processing method according to any one of claims 1 to 7.
11. A computer program product comprising a computer program which, when executed by a processor, implements the data processing method according to any one of claims 1 to 7.
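
As an illustrative aid only, the flow lookup of claim 5 and the version comparison of claim 3 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `select_flow` and `needs_reprocessing`, the use of a `(machine model, machine version)` tuple as the mapping key, the fallback order, and the representation of version codes as integers are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical types: a processing flow is modelled as a plain function,
# and the pre-configured mapping is keyed by (machine model, machine version).
Flow = Callable[[dict], dict]
FlowKey = Tuple[Optional[str], Optional[str]]


def select_flow(
    mapping: Dict[FlowKey, Flow],
    model: Optional[str],
    version: Optional[str],
) -> Optional[Flow]:
    """Look up a flow by model and/or version.

    The fallback order (exact match, then model only, then version only)
    is an assumption made for this sketch.
    """
    for key in ((model, version), (model, None), (None, version)):
        if key in mapping:
            return mapping[key]
    return None


def needs_reprocessing(pre_flow_version: int, post_processing_version: int) -> bool:
    """Return True when the pre-flow version code is greater than the
    version code recorded by the last successful post-processing run.

    Integer version codes are an assumption; the claims only require
    that the two codes can be compared.
    """
    return pre_flow_version > post_processing_version
```

In this sketch, a server would call `needs_reprocessing` only for annotation data whose state is post-processing success, and a node would call `select_flow` with the machine model and version carried in the annotation data it receives.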

Description

Data processing method and system applied to distributed architecture
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a data processing method and system applied to a distributed architecture.
Background
In existing data processing methods, data come from different sources. Even for the same kind of data, different acquisition devices produce different output formats and control logic, so it is difficult to process data from different sources with a single, unified post-processing logic. To work around this, the post-processing logic for a specific device is usually hard-coded in the main flow; that is, each main-flow code path applies only to data from a single data source, and the core code must be modified whenever a new device is added or the processing logic changes, so maintenance costs are high. In addition, because the post-processing logic is coded into the main flow, reprocessing cannot be triggered automatically when the version of the data in the database is updated. The post-processed data is then not the latest version and cannot be used for subsequent machine training or business applications.
Therefore, a data processing method is needed that can apply different post-processing flows to data from different data sources and trigger those flows automatically.
Disclosure of Invention
The invention provides a data processing method and system applied to a distributed architecture. A server monitors the processing state of target annotation data, and the post-processing flow is triggered automatically based on that state, so that the post-processing flow can be quickly re-triggered when the target annotation data changes, which keeps the post-processed data usable. In addition, embodiments of the invention distribute the first annotation data to the first nodes of the distributed architecture for post-processing, which decouples the post-processing flow from the main flow. This not only increases data processing efficiency but also allows different target processing flows to be configured for different first annotation data, improving the extensibility and practicality of the code.
The invention provides a data processing method applied to a distributed architecture that comprises a server and a plurality of first nodes. In the method, the server monitors the processing states of a plurality of target annotation data in a database; for first annotation data whose processing state is waiting for post-processing, the first annotation data is issued to a first node; the first node acquires a target processing flow corresponding to the first annotation data; and the first node post-processes the first annotation data using the target processing flow, generates a post-processing result, and feeds the post-processing result back to the server.
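
The monitor, issue, post-process, and feed-back steps described above can be sketched in a few lines. This is a minimal in-process sketch under stated assumptions, not the disclosed implementation: the class names, the in-memory dictionary standing in for the database, the `source` key used to pick a flow, the round-robin choice of node, and the `FAILED` state are all hypothetical, and a real deployment would issue work to remote nodes rather than call them directly.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class State(Enum):
    WAITING = "waiting_for_post_processing"
    IN_PROGRESS = "in_post_processing"
    SUCCESS = "post_processing_success"
    FAILED = "post_processing_failed"  # assumed state, for illustration only


@dataclass
class Annotation:
    record_id: str
    payload: dict
    state: State = State.WAITING


@dataclass
class Result:
    record_id: str
    ok: bool
    output: dict = field(default_factory=dict)


class Node:
    """A worker that obtains a processing flow and applies it."""

    def __init__(self, name: str, flows: Dict[str, Callable[[dict], dict]]):
        self.name = name
        self.flows = flows  # hypothetical mapping: data source -> flow

    def post_process(self, item: Annotation) -> Result:
        flow = self.flows.get(item.payload.get("source", ""))
        if flow is None:
            return Result(item.record_id, ok=False)
        return Result(item.record_id, ok=True, output=flow(item.payload))


class Server:
    """Scans the stored states and issues waiting items to nodes."""

    def __init__(self, database: Dict[str, Annotation], nodes: List[Node]):
        self.database = database
        self.nodes = nodes
        self._next = 0

    def poll_once(self) -> List[Result]:
        results = []
        for item in list(self.database.values()):
            if item.state is not State.WAITING:
                continue
            # Round-robin node selection is an assumption of this sketch.
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            item.state = State.IN_PROGRESS
            result = node.post_process(item)  # result fed back to the server
            item.state = State.SUCCESS if result.ok else State.FAILED
            results.append(result)
        return results
```

Because the server reacts only to the stored processing state, resetting an item to the waiting state is enough to trigger post-processing again, and new flows can be registered on the nodes without touching the server loop.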
Optionally, after the first annotation data is issued to the first node, the method further comprises changing the processing state of the first annotation data to in post-processing; and/or, after the post-processing result of the first annotation data is fed back to the server, the method further comprises the server changing the processing state corresponding to the first annotation data according to the post-processing result.
Optionally, the distributed architecture further comprises a plurality of second nodes, and the post-processing result comprises a post-processing version code. The method further comprises: the server obtaining a target pre-flow version code and a target post-processing version code corresponding to second annotation data whose processing state is post-processing success; and, in response to the target pre-flow version code being greater than the target post-processing version code, issuing the second annotation data to a second node so that the second node post-processes the second annotation data again, wherein the second node and the first node are different nodes in the distributed architecture.
Optionally, the method further comprises changing the processing state of the second annotation data to a state indicating that reprocessing is needed.
Optionally, the first node obtaining the target processing flow corresponding to the first annotation data comprises: determining the target processing flow corresponding to the first annotation data from a pre-configured mapping relation according to a machine model and/or a machine version included in the first annotation data, wherein the mapping relation indicates a correspondence among the machine model, the machine version, and the processing flow.
Optionally, the distribute