CN-122020941-A - Data set construction method, device, computer, storage medium and program product


Abstract

The embodiments of the present application disclose a data set construction method, apparatus, computer device, storage medium, and program product. The method includes: splitting a first service prediction data set into a first data set to be processed and a second data set to be processed, and constructing a data uncertainty set based on the first data set to be processed; acquiring data poles from the data uncertainty set and forming pole vectors from the data poles; constructing a set mapping model based on the pole vectors, and using the set mapping model to obtain data mapping values corresponding to the second data set to be processed; and acquiring a set size parameter from those data mapping values and constructing a second service prediction data set based on the set size parameter, the second service prediction data set being used for service prediction. With the present application, resource consumption can be reduced and the efficiency of data set construction can be improved.
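The first step summarized above (split the data, then build an uncertainty set from the first part) can be sketched in Python. This is a minimal, non-authoritative sketch. The application does not give the form of the "logarithmic processing", so the code assumes the data splitting parameter is a pair (epsilon, delta) and that the service quantity threshold is the smallest n2 with (1 - epsilon)^n2 <= delta, a rule used in data-driven robust optimization when a set is calibrated on held-out data. The ellipsoid form and the rotation z = Q^T (x - mu) are likewise assumptions. The helper names `service_quantity_threshold`, `split_dataset` and `rotate_to_origin` are hypothetical, and NumPy is assumed.

```python
import math

import numpy as np


def service_quantity_threshold(epsilon: float, delta: float) -> int:
    """Assumed 'logarithmic processing': smallest n2 with (1 - epsilon) ** n2 <= delta."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil(math.log(delta) / math.log(1.0 - epsilon))


def split_dataset(data: np.ndarray, epsilon: float, delta: float):
    """Split rows into (first, second); the second part gets exactly n2 rows.

    Shuffle `data` beforehand if its row order carries information.
    """
    n2 = service_quantity_threshold(epsilon, delta)
    if n2 >= len(data):
        raise ValueError(f"need more than {n2} samples, got {len(data)}")
    return data[:-n2], data[-n2:]


def rotate_to_origin(first: np.ndarray):
    """Ellipsoid from mean/covariance, re-expressed in centred eigen-axis coordinates.

    Assumed initial set: {x : (x - mu)^T Sigma^{-1} (x - mu) <= r^2}.
    In z = Q^T (x - mu) it becomes the axis-aligned {z : sum_i z_i^2 / lambda_i <= r^2}.
    """
    mu = first.mean(axis=0)
    cov = np.atleast_2d(np.cov(first, rowvar=False))
    eigvals, Q = np.linalg.eigh(cov)  # cov = Q @ diag(eigvals) @ Q.T
    rotated = (first - mu) @ Q        # row-wise z^T = (x - mu)^T Q
    return mu, eigvals, Q, rotated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.multivariate_normal([20.0, 5.0], [[4.0, 1.5], [1.5, 1.0]], size=300)
    first, second = split_dataset(samples, epsilon=0.05, delta=0.05)
    mu, eigvals, Q, rotated = rotate_to_origin(first)
    print(len(first), len(second))  # 241 59
```

After rotation the centred data have zero mean and a diagonal covariance equal to the eigenvalues, which is what makes the per-axis pole construction in the later steps straightforward.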

Inventors

YANG PU
DUAN JUNTAO
LI YUANZHENG
ZHOU HAITAO

Assignees

Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技（深圳）有限公司)

Dates

Publication Date: 20260512
Application Date: 20241112

Claims (15)

1. A data set construction method, the method comprising: splitting a first service prediction data set into a first data set to be processed and a second data set to be processed, and constructing a data uncertainty set based on the first data set to be processed; acquiring data poles from the data uncertainty set, and forming pole vectors from the data poles; constructing a set mapping model based on the pole vectors, and acquiring, by using the set mapping model, data mapping values corresponding to the second data set to be processed; and acquiring a set size parameter from the data mapping values corresponding to the second data set to be processed, and constructing a second service prediction data set based on the set size parameter, wherein the second service prediction data set is used for service prediction.
2. The method of claim 1, wherein splitting the first service prediction data set into the first data set to be processed and the second data set to be processed comprises: acquiring a data splitting parameter, and performing logarithmic processing on the data splitting parameter to obtain a service quantity threshold; and splitting, based on the service quantity threshold and the quantity of first service data included in the first service prediction data set, the first service prediction data set into the first data set to be processed and the second data set to be processed, wherein the quantity of first service data included in the second data set to be processed is greater than or equal to the service quantity threshold.
3. The method of claim 1, wherein constructing the data uncertainty set based on the first data set to be processed comprises: constructing an initial uncertainty set based on the data distribution of the first data set to be processed; and rotating the center of the initial uncertainty set to an origin to obtain the data uncertainty set, wherein the origin is the origin of the characteristic axes that describe the data distribution of the first service prediction data set.
4. The method of claim 3, wherein constructing the initial uncertainty set based on the data distribution of the first data set to be processed comprises: acquiring a covariance matrix and a data mean of the first data set to be processed, and combining the covariance matrix and the data mean with an ellipsoid function to obtain the initial uncertainty set; and rotating the center of the initial uncertainty set to the origin to obtain the data uncertainty set comprises: performing eigenvalue decomposition on the covariance matrix to obtain an eigenvalue matrix and a data orthogonal matrix; and rotating the center of the initial uncertainty set to the origin by using the eigenvalue matrix and the data orthogonal matrix to obtain the data uncertainty set.
5. The method of claim 4, wherein acquiring the data poles from the data uncertainty set and forming the pole vectors from the data poles comprises: acquiring N eigenvalues from the eigenvalue matrix, and performing pole coordinate conversion on each of the N eigenvalues to obtain N positive data poles and N negative data poles, N being a positive integer, wherein the N eigenvalues correspond to N eigenvectors (characteristic axes), and each characteristic axis carries one positive data pole and one negative data pole; forming 2^N pole matrices from the N positive data poles and the N negative data poles, wherein each pole matrix comprises N data poles belonging to different characteristic axes; and performing vector conversion on the 2^N pole matrices to obtain 2^N pole vectors.
6. The method of claim 4, wherein constructing the set mapping model based on the pole vectors comprises: acquiring the data orthogonal matrix and the data mean, and forming mapping parameters from the data orthogonal matrix and the data mean according to the manner in which the position of the initial uncertainty set was rotated; and constructing the set mapping model based on the pole vectors and the mapping parameters.
7. The method of claim 1, wherein acquiring the set size parameter from the data mapping values corresponding to the second data set to be processed comprises: sorting the data mapping values corresponding to the second data set to be processed to obtain a data mapping value sequence; constructing a limit determination model based on the data splitting parameter and the quantity of first service data included in the second data set to be processed, and solving the limit determination model to obtain a data position; and determining the data mapping value at the data position in the data mapping value sequence as the set size parameter.
8. The method of claim 1, wherein constructing the second service prediction data set based on the set size parameter comprises: obtaining the pole vectors and the mapping parameters from the set mapping model; and constraining, based on the set size parameter, the product of the pole vectors and the mapping parameters to obtain the second service prediction data set.
9. The method of claim 1, wherein the second service prediction data set is used to represent error values of service prediction for a service scenario; and the method further comprises: performing service prediction on the service scenario to obtain a first service prediction result; and acquiring a service prediction error value from the second service prediction data set, and determining the sum of the first service prediction result and the service prediction error value as a second service prediction result for the service scenario.
10. The method of claim 9, wherein the service scenario is a data center temperature control scenario, and the second service prediction result is used to represent the temperature outside the data center; and the method further comprises: acquiring, in the data center temperature control scenario, a data center temperature value and a temperature variation at a first moment, and acquiring a delay parameter; determining the sum of the second service prediction result and the temperature variation as a temperature parameter; and weighting the data center temperature value at the first moment and the temperature parameter by using the delay parameter to obtain a data center temperature value at a second moment, wherein the first moment is earlier than the second moment.
11. The method of claim 10, wherein acquiring the delay parameter comprises: acquiring the time slot length between the first moment and the second moment, and acquiring the heat capacity and the thermal resistance of the data center; determining temperature influence data according to the time slot length, the heat capacity, and the thermal resistance; and performing exponential processing on the temperature influence data to obtain the delay parameter.
12. A data set construction apparatus, the apparatus comprising: a data splitting module, configured to split a first service prediction data set into a first data set to be processed and a second data set to be processed, and to construct a data uncertainty set based on the first data set to be processed; a pole processing module, configured to acquire data poles from the data uncertainty set and form pole vectors from the data poles; a data mapping module, configured to construct a set mapping model based on the pole vectors and to acquire, by using the set mapping model, data mapping values corresponding to the second data set to be processed; and a data construction module, configured to acquire a set size parameter from the data mapping values corresponding to the second data set to be processed and to construct a second service prediction data set based on the set size parameter, wherein the second service prediction data set is used for service prediction.
13. A computer device, comprising a processor, a memory, and an input/output interface, wherein the processor is connected to the memory and to the input/output interface; the input/output interface is configured to receive and output data; the memory is configured to store a computer program; and the processor is configured to call the computer program to cause the computer device to perform the method of any one of claims 1 to 11.
14. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any of claims 1-11.
15. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 1-11.
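Claims 9 to 11 name the quantities in the temperature update but not the formulas. One plausible reading, assumed here rather than taken from the application, is the discretized first-order RC thermal model: temperature influence data -dt/(R*C), delay parameter a = exp(-dt/(R*C)), and next temperature a*T1 + (1 - a)*(predicted outside temperature + temperature variation). The function names are hypothetical, and SI units are assumed (C in J/K, R in K/W, dt in seconds).

```python
import math


def delay_parameter(slot_length_s: float, heat_capacity: float, thermal_resistance: float) -> float:
    """Assumed claim 11: a = exp(-dt / (R * C)) with C in J/K, R in K/W, dt in s."""
    if slot_length_s < 0 or heat_capacity <= 0 or thermal_resistance <= 0:
        raise ValueError("need dt >= 0 and positive heat capacity / thermal resistance")
    influence = -slot_length_s / (heat_capacity * thermal_resistance)  # temperature influence data
    return math.exp(influence)


def second_prediction(first_prediction: float, error_value: float) -> float:
    """Claim 9: base forecast plus an error value drawn from the second data set."""
    return first_prediction + error_value


def next_temperature(temp_now: float, outside_temp: float, temp_variation: float, a: float) -> float:
    """Assumed claim 10 weighting: a * T(first moment) + (1 - a) * temperature parameter."""
    temperature_parameter = outside_temp + temp_variation
    return a * temp_now + (1.0 - a) * temperature_parameter


if __name__ == "__main__":
    a = delay_parameter(slot_length_s=900.0, heat_capacity=2.0e6, thermal_resistance=1.0e-3)
    outside = second_prediction(first_prediction=30.0, error_value=1.5)
    print(round(next_temperature(24.0, outside, 2.0, a), 3))
```

With a close to 1 (short slots or large R*C) the room temperature changes slowly; with a close to 0 it tracks the temperature parameter almost immediately.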

Description

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data set construction method, apparatus, computer device, storage medium, and program product.

Background

In energy management for data centers, the data center temperature is usually required to stay below a preset value, and controlling that temperature involves uncertain quantities. Currently, a historical data set is typically obtained, an ellipsoidal uncertainty set is constructed from it, and the uncertain quantities are constrained by that ellipsoidal set. However, when the resulting robust optimization problem is solved, the ellipsoidal uncertainty set must be converted into second-order cone constraints, which increases the complexity of the problem and the amount of computation needed to solve it, and thus reduces solution efficiency.

Disclosure of Invention

The embodiments of the present application provide a data set construction method, apparatus, computer device, storage medium, and program product, which can reduce resource consumption and improve the efficiency of data set construction.
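The complexity argument in the Background can be made concrete with the standard robust counterpart of a single linear constraint a^T u <= b (a textbook result, not a formula from the application), where a may depend affinely on the decision variables. For an ellipsoid, the Cauchy-Schwarz inequality gives the worst case in closed form but introduces a Euclidean norm, i.e. a second-order cone constraint; for a polytope given by finitely many vertices, a linear function attains its maximum at a vertex, so the robust constraint becomes a finite list of linear constraints. Reading the pole vectors as the vertices of such a polytope is an interpretation, not something the text states outright.

```latex
\begin{aligned}
&\mathcal{U}_{\mathrm{ell}} = \{\mu + L z : \|z\|_2 \le 1\}:
  &&\max_{u \in \mathcal{U}_{\mathrm{ell}}} a^\top u = a^\top \mu + \|L^\top a\|_2
  \;\Rightarrow\; a^\top \mu + \|L^\top a\|_2 \le b \quad \text{(second-order cone)},\\
&\mathcal{U}_{\mathrm{poly}} = \operatorname{conv}\{w_1, \dots, w_K\}:
  &&\max_{u \in \mathcal{U}_{\mathrm{poly}}} a^\top u = \max_{k} a^\top w_k
  \;\Rightarrow\; a^\top w_k \le b,\; k = 1, \dots, K \quad \text{(linear)}.
\end{aligned}
```

Here L can be taken as r\,\Sigma^{1/2} when the ellipsoid is \{x : (x-\mu)^\top \Sigma^{-1} (x-\mu) \le r^2\}.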
In one aspect, an embodiment of the present application provides a data set construction method, the method including: splitting a first service prediction data set into a first data set to be processed and a second data set to be processed, and constructing a data uncertainty set based on the first data set to be processed; acquiring data poles from the data uncertainty set, and forming pole vectors from the data poles; constructing a set mapping model based on the pole vectors, and acquiring, by using the set mapping model, data mapping values corresponding to the second data set to be processed; and acquiring a set size parameter from the data mapping values corresponding to the second data set to be processed, and constructing a second service prediction data set based on the set size parameter, wherein the second service prediction data set is used for service prediction.

In another aspect, an embodiment of the present application provides a data set construction apparatus, including: a data splitting module, configured to split a first service prediction data set into a first data set to be processed and a second data set to be processed, and to construct a data uncertainty set based on the first data set to be processed; a pole processing module, configured to acquire data poles from the data uncertainty set and form pole vectors from the data poles; a data mapping module, configured to construct a set mapping model based on the pole vectors and to acquire, by using the set mapping model, data mapping values corresponding to the second data set to be processed; and a data construction module, configured to acquire the set size parameter from the data mapping values corresponding to the second data set to be processed and to construct a second service prediction data set based on the set size parameter, the second service prediction data set being used for service prediction.
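The remaining steps of the method (poles, pole vectors, set mapping model, set size parameter, second data set) can be sketched as follows under one possible interpretation; none of these formulas is stated in the application. Assumptions: the poles on axis i are +/- sqrt(lambda_i); the 2^N pole vectors are all sign combinations, i.e. the corners of a box in the rotated frame; the data mapping value of a point is the smallest scale s with x in mu + s * Q * box; the data position comes from a binomial order-statistic rule with the same hypothetical (epsilon, delta) splitting parameter; and the second data set is the 2^N scaled vertices mu + s * Q v. All function names are hypothetical, and NumPy is assumed.

```python
import itertools
import math

import numpy as np


def pole_vectors(eigvals: np.ndarray) -> np.ndarray:
    """All 2**N sign combinations of the poles +/- sqrt(lambda_i), one pole per axis."""
    half_axes = np.sqrt(eigvals)
    signs = np.array(list(itertools.product((1.0, -1.0), repeat=len(eigvals))))
    return signs * half_axes  # shape (2**N, N)


def mapping_values(points, mu, Q, eigvals) -> np.ndarray:
    """Smallest s >= 0 with x in mu + s * Q @ box, box = convex hull of the pole vectors."""
    z = (points - mu) @ Q  # rotate into the eigen-axis frame
    return np.max(np.abs(z) / np.sqrt(eigvals), axis=1)


def data_position(n2: int, epsilon: float, delta: float) -> int:
    """Smallest r (1-based) with P(Binomial(n2, 1 - epsilon) <= r - 1) >= 1 - delta."""
    cumulative = 0.0
    for r in range(1, n2 + 1):
        k = r - 1
        cumulative += math.comb(n2, k) * (1 - epsilon) ** k * epsilon ** (n2 - k)
        if cumulative >= 1 - delta:
            return r
    raise ValueError("second data set is smaller than the service quantity threshold")


def build_second_dataset(first, second, epsilon, delta):
    """Return (set size parameter s, array of 2**N vertices mu + s * Q @ v)."""
    mu = first.mean(axis=0)
    eigvals, Q = np.linalg.eigh(np.atleast_2d(np.cov(first, rowvar=False)))
    if np.any(eigvals <= 0):
        raise ValueError("covariance must be positive definite")
    poles = pole_vectors(eigvals)
    ordered = np.sort(mapping_values(second, mu, Q, eigvals))  # data mapping value sequence
    size = ordered[data_position(len(second), epsilon, delta) - 1]
    vertices = mu + size * poles @ Q.T  # map each scaled pole vector back to data space
    return size, vertices
```

Because the resulting set is a polytope described by its vertices, a robust linear constraint over it reduces to one linear constraint per vertex. The vertex count grows as 2^N, so this form suits low-dimensional uncertainty such as a handful of forecast errors.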
When splitting the first service prediction data set into the first data set to be processed and the second data set to be processed, the data splitting module may be configured to: acquire a data splitting parameter, and perform logarithmic processing on the data splitting parameter to obtain a service quantity threshold; and split, based on the service quantity threshold and the quantity of first service data included in the first service prediction data set, the first service prediction data set into the first data set to be processed and the second data set to be processed, wherein the quantity of first service data included in the second data set to be processed is greater than or equal to the service quantity threshold.

When constructing the data uncertainty set based on the first data set to be processed, the data splitting module may be configured to: construct an initial uncertainty set based on the data distribution of the first data set to be processed; and rotate the center of the initial uncertainty set to an origin to obtain the data uncertainty set, wherein the origin is the origin of the characteristic axes that describe the data distribution of the first service prediction data set.

When constructing the initial uncertainty set based on the data distribution of the first data set to be processed, the data splitting module may be configured to: acquire a covariance matrix and a data mean of the first data set to be processed, and combine the covariance matrix and the data mean with an ellipsoid function to obtain the initial uncertainty set. When rotating the center of the initial uncertainty set to the origin to obtain the data uncertainty set,