CN-117151242-B - Sample acquisition method, model construction method, device, equipment and storage medium


Abstract

The application provides a sample acquisition method, a model construction method, a device, equipment and a storage medium. The method comprises: obtaining first party data, wherein the first party data comprises a first sample and a first characteristic parameter corresponding to the first sample; carrying out hash processing on the first characteristic parameter corresponding to the first sample to obtain a first characteristic parameter hash value; carrying out mask processing on the first characteristic parameter hash value to obtain a first mask hash value; carrying out exclusive-or processing on the first mask hash value and a second mask hash value to obtain a target mask hash value; carrying out mask processing on the first characteristic parameter hash value according to the target mask hash value to obtain a first target mask hash value; and determining target first party data from the first party data according to the first target mask hash value and a second target mask hash value. Because the mask is derived from hashed values, the method protects the privacy of the model training data and improves model training accuracy.
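
The hash, mask and exclusive-or steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: HMAC-SHA256 from the Python standard library stands in for the AES masking function named in claim 2, each party is assumed to fold its masked hashes into a single 32-byte mask value, the hashed field is treated as a per-sample alignment key, and all function names, keys and sample identifiers are hypothetical.

```python
import hashlib
import hmac


def feature_hash(key: str) -> bytes:
    """Hash processing: SHA-256 digest of a per-sample key (assumed field)."""
    return hashlib.sha256(key.encode("utf-8")).digest()


def mask(value: bytes, secret: bytes) -> bytes:
    """Mask processing. HMAC-SHA256 is a stand-in for the AES function in claim 2."""
    return hmac.new(secret, value, hashlib.sha256).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Exclusive-or of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def party_mask(keys, secret: bytes) -> bytes:
    """Assumption: a party folds its masked hashes into one 32-byte mask value."""
    acc = bytes(32)
    for k in keys:
        acc = xor_bytes(acc, mask(feature_hash(k), secret))
    return acc


def target_masked(keys, target_mask: bytes) -> dict:
    """Re-mask every hash with the shared target mask; map masked value -> key."""
    return {mask(feature_hash(k), target_mask): k for k in keys}


# Hypothetical sample identifiers held by the two parties.
first_keys = ["id-001", "id-002", "id-003"]
second_keys = ["id-002", "id-003", "id-004"]

first_mask = party_mask(first_keys, b"first-party-secret")
second_mask = party_mask(second_keys, b"second-party-secret")

# Both parties can compute the same target mask from the exchanged mask values.
target_mask = xor_bytes(first_mask, second_mask)

first_table = target_masked(first_keys, target_mask)
second_table = target_masked(second_keys, target_mask)

# Equal target mask hash values identify samples held by both parties.
shared = sorted(first_table[h] for h in first_table.keys() & second_table.keys())
```

In this sketch only masked digests are compared, so neither party sees the other's raw keys; whether the matched rows or the remaining rows form the "target first party data" depends on the reading of claim 3 and is left open here.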

Inventors

ZHANG GUOZHENG
XIE JIGANG
LIANG ZHUO
DANG PENGFEI
YANG TAO
ZHANG XIAOPING

Assignees

中国联合网络通信集团有限公司
联通数字科技有限公司
联通西部创新研究院有限公司

Dates

Publication Date: 20260508
Application Date: 20230809

Claims (14)

1. A sample acquisition method for electrical load prediction, applied to a first party in a federated model, the method comprising: acquiring first participant data, wherein the first participant data comprises a first sample and first characteristic parameters corresponding to the first sample, the first sample is different rows of charge data distinguished by an ID (identity), with one power load number corresponding to each row, the first characteristic parameters refer to the charge number generated by a certain power generation mode, the first characteristic parameters specifically refer to the charge number generated by wind power and the charge number generated by hydroelectric power, and at least two first characteristic parameters are provided; carrying out hash processing on a first characteristic parameter corresponding to the first sample to obtain a first characteristic parameter hash value; masking the first characteristic parameter hash value to obtain a first mask hash value; performing exclusive-or processing on the first mask hash value and a second mask hash value to obtain a target mask hash value, wherein the second mask hash value is obtained by a second party sequentially performing hash processing and mask processing on second party data, the second party is a party different from the first party in the federated model, and the second party data comprises a second power load sample and a corresponding second power characteristic parameter; performing mask processing on the first characteristic parameter hash value according to the target mask hash value to obtain a first target mask hash value; and determining target first party data from the first party data according to the first target mask hash value and a second target mask hash value, wherein the second target mask hash value is a mask hash value obtained by the second party masking a second characteristic parameter hash value according to the target mask hash value.
2. The method according to claim 1, wherein masking the first characteristic parameter hash value to obtain a first mask hash value comprises: determining a preset AES encryption function; and carrying out mask processing on the first characteristic parameter hash value according to the AES encryption function to obtain a first mask parameter.
3. The method of claim 2, wherein the determining target first party data from the first party data based on the first target mask hash value and the second target mask hash value comprises: determining the same target mask hash value according to the first target mask hash value and the second target mask hash value; determining a target identical sample according to the identical target mask hash value; determining a target first sample from the first samples according to the target same samples, and determining target first characteristic parameters corresponding to the target first samples, wherein the target first samples are different from the target same samples; and determining the target first participant data according to the target first sample and the target first characteristic parameter.
4. The method of claim 1, wherein after determining target first party data from the first party data based on the first target mask hash value and a second target mask hash value, the method further comprises: determining a first value range of the target first characteristic parameter according to the target first characteristic parameter; sending the first value range to a model training party, so that the model training party determines a target value range after receiving the first value range and a second value range, and obtains feature binning data according to the target value range, wherein the second value range is a value range of a target second characteristic parameter sent by the second party, the target second characteristic parameter is obtained according to a second sample and a target same sample, and the second sample is a sample in the second party data; receiving the feature binning data; obtaining first gradient information according to the feature binning data and the target first participant data; and sending the first gradient information to the model training party so that the model training party carries out federated model training according to the first gradient information and second gradient information, wherein the second gradient information is gradient information obtained by the second party according to the feature binning data and target second party data, and the target second party data is determined from the second party data according to the first target mask hash value and the second target mask hash value.
5. The method of claim 4, wherein the obtaining the first gradient information from the feature binning data and the target first participant data comprises: determining an objective function of a gradient lifting tree and a first derivative and a second derivative of the objective function according to the target first participant data, wherein the objective function of the gradient lifting tree is obtained according to a sample tag value, and the sample tag value is a tag corresponding to the first sample; and obtaining the first gradient information according to the first derivative and the second derivative of the objective function and the feature binning data.
6. A model building method, applied to a model training party, comprising: receiving a first value interval of a first target characteristic parameter sent by a first target participant and a second value interval of a second target characteristic parameter sent by a second target participant, wherein the first target participant and the second target participant are the first participant and the second participant in a sample acquisition method for power load prediction according to claims 1-5, and the first target characteristic parameter and the second target characteristic parameter are target first participant data obtained in the sample acquisition method for power load prediction according to claims 1-5; determining a target data interval of the target characteristic parameters according to the first value interval of the first target characteristic parameter and the second value interval of the second target characteristic parameter; performing feature binning processing on the target data interval to obtain binning gradient interval data; sending the binning gradient interval data to the first target participant and the second target participant, so that the first target participant obtains first gradient information according to the binning gradient interval data and first sample data and sends the first gradient information to the model training party, and the second target participant obtains second gradient information according to the binning gradient interval data and second sample data and sends the second gradient information to the model training party; receiving the first gradient information and the second gradient information, and constructing a gradient lifting tree according to the first gradient information and the second gradient information; and constructing a target model according to the gradient lifting tree.
7. The method of claim 6, wherein determining the target data interval for the target feature parameter based on the first value interval for the first target feature parameter and the second value interval for the second target feature parameter comprises: determining a first maximum value and a first minimum value of the first value interval; determining a second maximum value and a second minimum value of the second value interval; confirming a target maximum value according to the first maximum value and the second maximum value; confirming a target minimum value according to the first minimum value and the second minimum value; and obtaining a target data interval of the target characteristic parameter according to the target maximum value and the target minimum value.
8. The method of claim 6, wherein the receiving the first gradient information and the second gradient information and constructing the gradient lifting tree based on the first gradient information and the second gradient information comprises: receiving the first gradient information and the second gradient information; obtaining total gradient information of the first target participant and the second target participant according to the first gradient information and the second gradient information; determining gain information of each target characteristic parameter according to the total gradient information; and constructing the gradient lifting tree according to the gain information of each target characteristic parameter.
9. The method of claim 8, wherein constructing the gradient lifting tree based on gain information for each of the target feature parameters comprises: determining target gain values in the gain information of the target characteristic parameters; determining split nodes of the gradient lifting tree to be constructed according to the target gain values in the gain information of each target characteristic parameter; and constructing the gradient lifting tree according to the split nodes of the gradient lifting tree to be constructed.
10. The method of claim 6, wherein constructing a target model from the gradient lifting tree comprises: obtaining a current round of predicted values according to the gradient lifting tree; updating the sample label value according to the current round of predicted values to obtain an updated sample label value; acquiring the number of iterations and a previous-round predicted value, wherein the previous-round predicted value is obtained according to the gradient lifting tree of the previous iteration; obtaining model loss according to the previous-round predicted value and the current round of predicted values; determining whether an initial target model converges according to the number of iterations and the model loss; if the initial target model converges, determining the initial target model as a target model, and sending the target model to the first participant and the second participant for short-term charge prediction by the first participant and the second participant; and if the initial target model does not converge, transmitting the updated sample label value to the first participant and the second participant to update the objective function of the gradient lifting tree and the first derivative and the second derivative of the objective function.
11. A sample acquisition device for electrical load prediction, the device comprising: an acquisition module, a first generation module and a data processing module, wherein the acquisition module is used for acquiring first participant data, the first participant data comprises a first sample and first characteristic parameters corresponding to the first sample, the first sample is different rows of charge data distinguished by an ID (identity), with one power load number corresponding to each row, the first characteristic parameters refer to the number of charges generated by a certain generation mode, the first characteristic parameters specifically refer to the number of charges generated by wind power generation and the number of charges generated by water power generation, and at least two first characteristic parameters are provided; a first obtaining module, used for carrying out hash processing on the first characteristic parameters corresponding to the first sample to obtain first characteristic parameter hash values; a second obtaining module, used for carrying out mask processing on the first characteristic parameter hash value to obtain a first mask hash value; a third obtaining module, configured to perform exclusive-or processing on the first mask hash value and a second mask hash value to obtain a target mask hash value, where the second mask hash value is obtained by a second party sequentially performing hash processing and mask processing on second party data, and the second party data includes a second power load sample and a corresponding second power characteristic parameter; a fourth obtaining module, configured to perform mask processing on the first characteristic parameter hash value according to the target mask hash value, to obtain a first target mask hash value; and a first determining module, configured to determine target first party data from the first party data according to the first target mask hash value and a second target mask hash value, where the second target mask hash value is a mask hash value obtained by the second party masking a second characteristic parameter hash value according to the target mask hash value.
12. A model building apparatus, characterized in that the apparatus comprises: a first receiving module, configured to receive a first value interval of a first target feature parameter sent by a first target participant and a second value interval of a second target feature parameter sent by a second target participant, where the first target participant and the second target participant are a first participant and a second participant in a sample acquisition method for power load prediction as set forth in claims 1-5, and the first target feature parameter and the second target feature parameter are target first participant data obtained in a sample acquisition method for power load prediction as set forth in claims 1-5; a second determining module, used for determining a target data interval of the target characteristic parameters according to the first value interval of the first target characteristic parameters and the second value interval of the second target characteristic parameters; a fifth obtaining module, configured to perform feature binning processing on the target data interval to obtain binning gradient interval data; a sending module, used for sending the binning gradient interval data to the first target participant and the second target participant, so that the first target participant obtains first gradient information according to the binning gradient interval data and first sample data and sends the first gradient information to a model training party, and the second target participant obtains second gradient information according to the binning gradient interval data and second sample data and sends the second gradient information to the model training party; a first construction module, used for receiving the first gradient information and the second gradient information and constructing a gradient lifting tree according to the first gradient information and the second gradient information; and a second construction module, used for constructing a target model according to the gradient lifting tree.
13. An electronic device comprising a processor and a memory communicatively coupled to the processor; The memory stores computer-executable instructions; the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 10.
14. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1 to 10.
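
The interval merging, feature binning, gradient aggregation and gain-based split selection recited in claims 5 and 7 to 9 can be illustrated with a small sketch. It is written under several assumptions that the claims do not fix: the target interval is taken as the union of the two value intervals (smallest minimum, largest maximum), bins are equal-width, the objective is squared error (first derivative `pred - label`, second derivative 1), and gain uses a common regularized second-order formula with a hypothetical parameter `lam`. All function names and the toy data are illustrative only.

```python
def merge_intervals(first, second):
    """Target interval spanning both (min, max) value intervals (assumed rule)."""
    return (min(first[0], second[0]), max(first[1], second[1]))


def bin_edges(interval, n_bins):
    """Equal-width bin edges over the target interval (assumed binning scheme)."""
    lo, hi = interval
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(x, edges):
    """Index of the bin containing x; the top edge falls into the last bin."""
    n_bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    if hi == lo:
        return 0
    i = int((x - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def gradient_histogram(values, labels, preds, edges):
    """Per-bin sums of first and second derivatives for squared-error loss."""
    n_bins = len(edges) - 1
    g = [0.0] * n_bins
    h = [0.0] * n_bins
    for x, y, p in zip(values, labels, preds):
        i = bin_index(x, edges)
        g[i] += p - y   # first derivative of 0.5 * (p - y) ** 2
        h[i] += 1.0     # second derivative of the same loss
    return g, h


def best_split(g, h, lam=1.0):
    """Pick the bin boundary with the largest gain (assumed gain formula)."""
    total_g, total_h = sum(g), sum(h)
    best_bin, best_gain = None, 0.0
    left_g = left_h = 0.0
    for i in range(len(g) - 1):
        left_g += g[i]
        left_h += h[i]
        right_g, right_h = total_g - left_g, total_h - left_h
        gain = 0.5 * (left_g ** 2 / (left_h + lam)
                      + right_g ** 2 / (right_h + lam)
                      - total_g ** 2 / (total_h + lam))
        if gain > best_gain:
            best_bin, best_gain = i, gain
    return best_bin, best_gain


# Toy data for one characteristic parameter held by two participants.
first_values, first_labels = [10.0, 20.0, 30.0], [1.0, 1.0, 5.0]
second_values, second_labels = [0.0, 35.0, 40.0], [1.0, 5.0, 5.0]

# Model training party: merge the reported value intervals and bin them.
target = merge_intervals((min(first_values), max(first_values)),
                         (min(second_values), max(second_values)))
edges = bin_edges(target, 4)

# Each participant: local gradient histogram, starting from predictions of 0.
g1, h1 = gradient_histogram(first_values, first_labels, [0.0] * 3, edges)
g2, h2 = gradient_histogram(second_values, second_labels, [0.0] * 3, edges)

# Model training party: total gradient information and split selection.
total_g = [a + b for a, b in zip(g1, g2)]
total_h = [a + b for a, b in zip(h1, h2)]
split_bin, split_gain = best_split(total_g, total_h)
```

Only per-bin gradient sums leave each participant in this sketch, which matches the role of the "gradient information" exchanged in claims 4 and 6; the raw sample values stay local.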

Description

Sample acquisition method, model construction method, device, equipment and storage medium

Technical Field

The present application relates to the field of computer natural language processing technologies, and in particular, to a sample acquisition method, a model construction method, a device, equipment, and a storage medium.

Background

In power load prediction, data are sourced from a plurality of data providers such as power companies, power grid dispatching centers and weather authorities. Current load prediction methods fall into two main categories: traditional classical prediction and artificial intelligence prediction. Both aim to find the rule hidden in the historical data of the electric load, construct a prediction model, and predict the future load by using the model.

Traditional classical prediction comprises: a time series method, which constructs a model of load fluctuation over time by utilizing historical load data and determines a load formula for prediction; a regression analysis method, which constructs a regression equation by utilizing the relation between dependent variables and independent variables; a gray model method, which regards all random processes as fluctuation and changes random quantities into regular data; and a trend extrapolation method, which finds a function to predict the load fluctuation trend by utilizing the load fluctuation rule.

Artificial intelligence prediction comprises: an artificial neural network method, which carries out message transmission by utilizing the relation between neuron structures and constructs or learns different neural networks; a support vector machine, which finds a separating plane to solve binary classification problems and performs global optimization on a statistical basis; a wavelet transform algorithm, which varies the sampling interval according to signal frequency and is combined with a neural network; a fuzzy theory method, which expresses uncertain factors as functions according to the experience of researchers and then converts them into a form that a computer can operate on; decision tree theory; and a random tree classification algorithm, which integrates a plurality of classification trees.

However, the conventional power load prediction method has the problem that data security is poor when training is performed.

Disclosure of Invention

The application provides a sample acquisition method, a model construction method, a device, equipment and a storage medium, which are used for solving the problem that the existing power load prediction method is poor in data security when training is performed.
In a first aspect, the present application provides a sample acquisition method comprising: acquiring first participant data, wherein the first participant data comprises a first sample and first characteristic parameters corresponding to the first sample, and at least two types of the first characteristic parameters are provided; carrying out hash processing on a first characteristic parameter corresponding to the first sample to obtain a first characteristic parameter hash value; masking the first characteristic parameter hash value to obtain a first mask hash value; performing exclusive-or processing on the first mask hash value and a second mask hash value to obtain a target mask hash value, wherein the second mask hash value is obtained by a second party sequentially performing hash processing and mask processing on second party data, and the second party is a party different from the first party in the federated model; masking the first characteristic parameter hash value according to the target mask hash value to obtain a first target mask hash value; and determining target first party data from the first party data according to the first target mask hash value and a second target mask hash value, wherein the second target mask hash value is a mask hash value obtained by the second party masking the second characteristic parameter hash value according to the target mask hash value. In an embodiment of the present application, masking the first characteristic parameter hash value to obtain a first mask hash value includes: determining a preset AES encryption function; and carrying out mask processing on the first characteristic parameter hash value according to the AES encryption function to obtain first mask parameters. In an embodiment of the present application, determining target first participant data from the first participant data according to the first target mask hash value and the second target mask hash value inc