CN-121349710-B - Data processing method, distributed system, electronic device and storage medium

CN121349710B

Abstract

Embodiments of the present disclosure provide a data processing method, a distributed system, an electronic device, and a storage medium. The method is applied to a distributed system comprising a master node and a plurality of slave nodes, and comprises the steps that the master node determines task information allocated to each slave node according to the hierarchical attribute of a target model and the computing power states of the plurality of slave nodes, wherein the task information is used for specifying a model layer processed by the corresponding slave node, and each slave node performs weight quantization processing of the corresponding model layer based on the allocated task information to obtain the complete quantization weights of the target model. The method achieves full-link collaborative optimization across computation, memory, and storage, significantly improves the execution efficiency of the weight quantization task and the utilization of hardware resources, and provides technical support for the efficient deployment of large-scale models.

Inventors

  • Request for anonymity
  • Request for anonymity
  • Request for anonymity
  • Request for anonymity

Assignees

  • Shanghai Biren Technology Co., Ltd. (上海壁仞科技股份有限公司)

Dates

Publication Date
2026-05-08
Application Date
2025-12-16

Claims (19)

  1. A data processing method, applied to a distributed system comprising a master node and a plurality of slave nodes, the data processing method comprising: the master node determining task information allocated to each slave node according to a hierarchical attribute of a target model and computing power states of the plurality of slave nodes, wherein the task information is used for specifying a model layer processed by the corresponding slave node; and each slave node performing weight quantization processing of the corresponding model layer based on the allocated task information to obtain complete quantization weights of the target model; wherein the data processing method further comprises: the master node acquiring output file description information generated by each slave node and converting the output file description information into an index file, wherein the output file description information is used for recording model layer range information and storage location information of a corresponding output file, and the index file is used for recording a storage location of the quantization weight of each model layer in the target model.
  2. The data processing method according to claim 1, wherein the hierarchical attribute includes hierarchical structure information, and the master node determining the task information allocated to each slave node according to the hierarchical attribute of the target model and the computing power states of the plurality of slave nodes comprises: the master node determining the model layer processed by each slave node according to the hierarchical structure information of the target model and the computing power states of the plurality of slave nodes.
  3. The data processing method according to claim 2, wherein the master node determining the model layer processed by each slave node according to the hierarchical structure information of the target model and the computing power states of the plurality of slave nodes comprises the master node performing the following operations: determining a first layer number processed by each slave node according to the hierarchical structure information of the target model; sorting the plurality of slave nodes according to the computing power states to obtain a first sorting result, and determining a second layer number processed by each slave node according to the first sorting result; and determining the model layer processed by each slave node according to the first layer number and the second layer number corresponding to each slave node.
  4. The data processing method according to claim 1, wherein the hierarchical attribute includes an intra-layer computation amount, and the master node determining the task information allocated to each slave node according to the hierarchical attribute of the target model and the computing power states of the plurality of slave nodes comprises: the master node determining the model layer processed by each slave node according to the intra-layer computation amount of each model layer of the target model and the computing power states of the plurality of slave nodes.
  5. The data processing method according to claim 4, wherein the computing power state includes a computing capability and a load state, and the master node determining the model layer processed by each slave node according to the intra-layer computation amount of each model layer of the target model and the computing power states of the plurality of slave nodes comprises the master node performing the following operations: sorting all model layers of the target model according to the intra-layer computation amount to obtain a second sorting result; assigning node capability weights to the plurality of slave nodes according to their computing capabilities; and determining the model layer processed by each slave node according to the second sorting result, the load state of each slave node, and the node capability weights.
  6. The data processing method according to claim 1, further comprising: each slave node reading a first weight corresponding to the processed model layer from a storage space by means of memory mapping.
  7. The data processing method according to claim 1, wherein performing the weight quantization processing of the corresponding model layer comprises: performing a quantization operation on the first weight corresponding to the model layer to reduce the precision of the first weight and obtain the quantization weight corresponding to the model layer.
  8. The data processing method according to claim 1, wherein performing the weight quantization processing of the corresponding model layer comprises: performing a mathematical transformation operation on the first weight corresponding to the model layer to obtain a second weight; and performing a quantization operation on the second weight to reduce the precision of the second weight and obtain the quantization weight corresponding to the model layer.
  9. The data processing method according to claim 8, wherein performing the weight quantization processing of the corresponding model layer further comprises: before the mathematical transformation operation is performed, performing an inverse quantization operation on the first weight corresponding to the model layer to improve the precision of the first weight.
  10. The data processing method according to claim 8 or 9, wherein the mathematical transformation operation comprises a linear transformation operation or a rotational transformation operation.
  11. The data processing method according to claim 1, further comprising the plurality of slave nodes each performing the following operations: converting the quantization weights obtained through the weight quantization processing into a target storage format to obtain target quantization weights; and writing the target quantization weights into a storage space.
  12. The data processing method according to claim 11, wherein writing the target quantization weights into the storage space comprises: writing the target quantization weights into an output file and generating output file description information corresponding to the output file, wherein the output files corresponding to the plurality of slave nodes are different from each other.
  13. The data processing method according to claim 1, further comprising: the master node managing task execution states of the plurality of slave nodes.
  14. The data processing method according to claim 13, wherein the master node managing the task execution states of the plurality of slave nodes includes at least one of the following: the master node generating global shared parameters and transmitting the global shared parameters to the plurality of slave nodes so that the plurality of slave nodes perform the weight quantization processing; the master node coordinating task execution progress of the plurality of slave nodes based on a synchronization barrier or data dependency analysis; and the master node acquiring integrity check information generated by the plurality of slave nodes after task execution is finished and verifying a global task execution state based on the integrity check information.
  15. The data processing method according to claim 1, further comprising: the master node monitoring the task execution of each slave node and, in response to detecting that an abnormal slave node exists, reassigning all or part of the tasks of the abnormal slave node to one or more other slave nodes other than the abnormal slave node.
  16. The data processing method according to claim 15, wherein, in response to detecting that the abnormal slave node exists, reassigning all or part of the tasks of the abnormal slave node to one or more other slave nodes other than the abnormal slave node comprises: in response to detecting that a faulty slave node exists, the master node recording a task breakpoint of the faulty slave node and reallocating the task of the faulty slave node to a first slave node, so that the first slave node continues task execution from the task breakpoint.
  17. A distributed system comprising a master node and a plurality of slave nodes, wherein: the master node is configured to determine task information allocated to each slave node according to a hierarchical attribute of a target model and computing power states of the plurality of slave nodes, wherein the task information is used for specifying a model layer processed by the corresponding slave node; each slave node is configured to perform weight quantization processing of the corresponding model layer based on the allocated task information to obtain complete quantization weights of the target model; and the master node is further configured to acquire output file description information generated by each slave node and convert the output file description information into an index file, wherein the output file description information is used for recording model layer range information and storage location information of a corresponding output file, and the index file is used for recording a storage location of the quantization weight of each model layer in the target model.
  18. An electronic device comprising: at least one processor; and at least one memory including one or more computer program modules, wherein the one or more computer program modules are stored in the at least one memory and configured to be executed by the at least one processor, the one or more computer program modules being for implementing the data processing method of any one of claims 1-16.
  19. A non-transitory computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by at least one processor, perform the data processing method of any one of claims 1-16.
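Claims 7 to 9 describe a per-layer pipeline in which a slave node may first perform an inverse quantization operation on a stored weight, optionally apply a mathematical transformation, and then quantize the result to a lower precision. As a minimal sketch of the quantization and inverse-quantization steps, the following Python uses symmetric per-tensor INT8 quantization; the function names and the max-absolute-value scaling scheme are illustrative assumptions, not the patent's concrete method.

```python
def quantize_int8(weights):
    # Symmetric per-tensor INT8 quantization: the largest absolute
    # weight maps to 127; each value is scaled, rounded, and clamped.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid a zero scale
    return [max(-128, min(127, round(w / scale))) for w in weights], scale

def dequantize_int8(quantized, scale):
    # Inverse quantization: recover an approximation of the original,
    # higher-precision weights (cf. claim 9).
    return [q * scale for q in quantized]

q, s = quantize_int8([0.5, -1.27, 0.02])
# q == [50, -127, 2]; dequantize_int8(q, s) approximates the input
```

Each recovered value differs from the original by at most one quantization step (the scale), which is the precision loss the mathematical transformation of claim 8 is typically applied to mitigate.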

Description

Data processing method, distributed system, electronic device and storage medium

Technical Field

Embodiments of the present disclosure relate to the field of artificial intelligence, and in particular to a data processing method, a distributed system, an electronic device, and a storage medium.

Background

Weight quantization refers to the process of converting model weights obtained through model training into a low-precision format (e.g., 8-bit floating point FP8, 8-bit integer INT8, 4-bit integer INT4, etc.). It aims to greatly reduce a model's storage volume, memory footprint, and computation cost while losing as little model performance as possible, thereby improving the inference efficiency of the model and supporting deployment on resource-constrained devices. As the parameter counts of large-scale models continue to grow, achieving efficient and stable weight quantization has become an important technical issue.

Disclosure of Invention

The data processing method comprises the steps that the master node determines task information allocated to each slave node according to the hierarchical attribute of a target model and the computing power states of the slave nodes, wherein the task information is used for specifying the model layer processed by the corresponding slave node, and each slave node performs weight quantization processing of the corresponding model layer based on the allocated task information to obtain the complete quantization weights of the target model.
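As a concrete illustration of the layer-wise task allocation described above, the following Python sketch splits a model's layer indices across slave nodes in proportion to their relative compute power, so that the master can hand each node a contiguous range of layers to quantize. The node names, the proportional-split heuristic, and the dict-based interface are hypothetical assumptions; the patent does not prescribe this exact scheme.

```python
def assign_layers(num_layers, node_power):
    # Split layer indices 0..num_layers-1 across nodes proportionally
    # to each node's relative compute power (hypothetical heuristic).
    total = sum(node_power.values())
    assignment, start = {}, 0
    # Stronger nodes are considered first (cf. the first sorting result).
    nodes = sorted(node_power, key=node_power.get, reverse=True)
    for i, node in enumerate(nodes):
        if i == len(nodes) - 1:
            count = num_layers - start  # the last node takes the remainder
        else:
            count = round(num_layers * node_power[node] / total)
        assignment[node] = list(range(start, start + count))
        start += count
    return assignment

plan = assign_layers(32, {"node-a": 4, "node-b": 2, "node-c": 2})
# node-a receives 16 layers, node-b and node-c receive 8 layers each
```

Handing out contiguous ranges keeps each node's output file covering a single model-layer range, which matches the per-file layer range information recorded in the output file description of claim 1.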
In the data processing method provided in at least one embodiment of the present disclosure, the hierarchical attribute includes hierarchical structure information; the master node determining the task information allocated to each slave node according to the hierarchical attribute of the target model and the computing power states of the plurality of slave nodes comprises the master node determining the model layer processed by each slave node according to the hierarchical structure information of the target model and the computing power states of the plurality of slave nodes.

In the data processing method provided in at least one embodiment of the present disclosure, the master node determining the model layer processed by each slave node according to the hierarchical structure information of the target model and the computing power states of the plurality of slave nodes comprises the master node performing the following operations: determining a first layer number processed by each slave node according to the hierarchical structure information of the target model; sorting the plurality of slave nodes according to the computing power states to obtain a first sorting result and determining a second layer number processed by each slave node according to the first sorting result; and determining the model layer processed by each slave node according to the first layer number and the second layer number corresponding to each slave node.

In the data processing method provided in at least one embodiment of the present disclosure, the hierarchical attribute includes an intra-layer computation amount; the master node determining the task information allocated to each slave node according to the hierarchical attribute of the target model and the computing power states of the plurality of slave nodes comprises the master node determining the model layer processed by each slave node according to the intra-layer computation amount of each model layer of the target model and the computing power states of the plurality of slave nodes.
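The intra-layer-computation variant above (cf. claims 4 and 5) can be read as a weighted load-balancing problem: sort the layers by computation amount, then repeatedly give the heaviest remaining layer to the node with the lowest weighted load. A minimal greedy sketch, assuming hypothetical per-layer cost estimates and node capability weights:

```python
import heapq

def balance_by_compute(layer_cost, node_weight):
    # Min-heap of (weighted_load, node); a larger capability weight
    # makes a node accumulate weighted load more slowly.
    heap = [(0.0, node) for node in node_weight]
    heapq.heapify(heap)
    plan = {node: [] for node in node_weight}
    # Second sorting result: layers in descending computation amount.
    for layer in sorted(layer_cost, key=layer_cost.get, reverse=True):
        load, node = heapq.heappop(heap)
        plan[node].append(layer)
        heapq.heappush(heap, (load + layer_cost[layer] / node_weight[node], node))
    return plan

plan = balance_by_compute({"l0": 4, "l1": 3, "l2": 2, "l3": 1}, {"a": 2, "b": 1})
# node "a" (weight 2) absorbs l0, l2, l3; node "b" (weight 1) takes l1
```

This greedy longest-processing-time style heuristic is one plausible reading of "determining the model layer processed by each slave node according to the second sorting result, the load state, and the node capability weights"; the patent leaves the concrete policy open.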
In the data processing method provided in at least one embodiment of the present disclosure, the computing power state includes a computing capability and a load state; the master node determining the model layer processed by each slave node according to the intra-layer computation amount of each model layer of the target model and the computing power states of the plurality of slave nodes comprises the master node performing the following operations: sorting all model layers of the target model according to the intra-layer computation amount to obtain a second sorting result; assigning node capability weights to the plurality of slave nodes according to their computing capabilities; and determining the model layer processed by each slave node according to the second sorting result, the load state of each slave node, and the node capability weights.

In the data processing method provided in at least one embodiment of the present disclosure, the data processing method further includes each slave node reading a first weight corresponding to the processed model layer from the storage space by means of memory mapping.

In the data processing method provided in at least one embodiment of the present disclosure, performing the weight quantization processing of the corresponding model layer includes performing a quantization operation on the first weights corresponding to the