CN-122020126-A - Big data offline-real-time fusion characteristic prediction method and system based on parameter transfer, electronic equipment and storage medium
Abstract
The invention provides a big data offline-real-time fusion feature prediction method and system based on parameter transfer, an electronic device and a storage medium, and belongs to the technical field of big data processing and artificial intelligence. The method comprises: generating an offline analysis result from historical data with an offline model and extracting transfer parameters; passing the transfer parameters to a real-time model; having the real-time model use the transfer parameters to expand the features of the real-time data and predict on that data; after the offline model finishes analyzing the data of the same period, fusing the offline prediction result and the real-time prediction result by a confidence-based model fusion method; and having the offline model update the transfer parameters for the real-time model's next period. The invention proposes a parameter transfer mechanism governed by three principles and constructs an offline-real-time cooperative closed loop. Without reducing real-time performance, it uses the deep features of the offline model to improve the accuracy of real-time prediction, addressing the difficulty of reconciling timeliness and accuracy in big data analysis.
Inventors
- BI RAN
- ZHANG YONG
- Bai Panpan
- Jiang Borong
Assignees
- China Academy of Information and Communications Technology (中国信息通信研究院)
Dates
- Publication Date
- 20260512
- Application Date
- 20260212
Claims (10)
- 1. A big data offline-real-time fusion feature prediction method based on parameter transfer, characterized by comprising the following steps: Step S1, an offline model acquires historical data of time period T-1, performs offline feature recognition on the historical data to generate an offline analysis result for time period T-1, and extracts transfer parameters based on the offline analysis result; Step S2, the transfer parameters are passed from the offline model to a real-time model; Step S3, the real-time model acquires real-time data of time period T, performs feature expansion on the real-time data using the transfer parameters, constructs a real-time feature vector, and performs real-time feature prediction to obtain a real-time prediction result and a first confidence for time period T; Step S4, after time period T ends, the offline model acquires the full data of time period T and performs offline feature recognition to obtain an offline prediction result and a second confidence for time period T; Step S5, the real-time prediction result and the offline prediction result of time period T are fused using a confidence fusion model to obtain a final prediction result; and Step S6, the offline model updates the transfer parameters based on the offline prediction result of time period T and passes them to the real-time model for processing the data of time period T+1, and the above steps are executed cyclically.
- 2. The big data offline-real-time fusion feature prediction method based on parameter transfer according to claim 1, wherein in step S1, the extraction of the transfer parameters follows these principles: a weak time correlation principle, namely that the difference between the transfer parameters generated in time period T and those generated in time period T+1 is smaller than a preset threshold, i.e., P(T+1) ≈ P(T); a strong feature correlation principle, namely that the transfer parameters contain rules or thresholds that can be mapped directly onto low-order features of the real-time data; and a real-time calculability principle, namely that the data structure of the transfer parameters meets the real-time computing engine's requirements for low latency and stable resource usage.
- 3. The big data offline-real-time fusion feature prediction method based on parameter transfer according to claim 2, wherein the strong feature correlation principle specifically means that the content of the transfer parameters is an identifier list, a numerical range or a status bit obtained from offline data statistics, and that the real-time model can map the transfer parameters into real-time features through field matching, range filtering or simple logic operations; and the real-time calculability principle specifically means that the transfer parameters use a fixed-length or convergent data structure, and that the calculation logic contains no variable-length loops that require dynamic memory allocation.
- 4. The big data offline-real-time fusion feature prediction method based on parameter transfer according to claim 1, wherein in step S3, performing feature expansion on the real-time data using the transfer parameters comprises: receiving x-dimensional transfer parameters P_1 to P_x passed by the offline model; defining the x-dimensional transfer parameters as x-dimensional expansion features R_1 to R_x of the real-time model; extracting y-dimensional native real-time features from the real-time data; and combining the x-dimensional expansion features with the y-dimensional native real-time features to construct a real-time feature vector B_n of dimension n, wherein n = x + y.
- 5. The big data offline-real-time fusion feature prediction method based on parameter transfer according to claim 1, wherein in step S5, fusing the real-time prediction result and the offline prediction result of time period T specifically comprises: when no validation set can be obtained, fusing by an arithmetic average method: C_final = (C_1 + C_2) / 2; wherein C_final is the final confidence after fusion, C_1 is the first confidence, and C_2 is the second confidence; the final prediction result is determined based on the final confidence C_final.
- 6. The big data offline-real-time fusion feature prediction method based on parameter transfer according to claim 1, wherein in step S5, fusing the real-time prediction result and the offline prediction result of time period T specifically comprises: when a validation set is available, fusing by a weighted average method; calculating, on the validation set, the accuracy ACC_1 and recall RCC_1 of the real-time model and the accuracy ACC_2 and recall RCC_2 of the offline model; calculating a real-time model weight α and an offline model weight β based on the accuracy and the recall [weight formula missing from the source text]; and calculating the final confidence after fusion as C_final = α·C_1 + β·C_2; the final prediction result is determined based on the final confidence C_final.
- 7. The big data offline-real-time fusion feature prediction method based on parameter transfer according to claim 6, wherein the weights α and β are proportional to each model's performance-index product or accuracy on the validation set, and satisfy α + β = 1.
- 8. A big data offline-real-time fusion feature prediction system based on parameter transfer, characterized in that the system employs the method of any of claims 1-7 and comprises: an offline analysis module, used for performing deep feature mining on the data of a historical period, generating an offline prediction result, and extracting transfer parameters that satisfy weak time correlation, strong feature correlation and real-time calculability; a real-time analysis module, used for receiving the transfer parameters, performing feature expansion and real-time prediction in combination with the real-time data stream of the current period, and outputting a real-time prediction result and a confidence; a model fusion module, used for performing confidence-based weighted fusion of the real-time prediction result and the offline prediction result after the offline analysis module finishes analyzing the data of the current period, and outputting a final prediction result; and a parameter transfer interface module, used for periodically passing the transfer parameters updated by the offline analysis module to the real-time analysis module.
- 9. An electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps in a parameter transfer based big data offline-real time fusion feature prediction method according to any of claims 1 to 7 when executing the computer program.
- 10. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of a big data offline-real-time fusion feature prediction method based on parameter delivery as claimed in any of claims 1 to 7.
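The feature expansion of claim 4 and the confidence fusion of claims 5 to 7 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names `expand_features`, `ValidationMetrics`, `fusion_weights` and `fuse_confidence` are hypothetical, and the transfer parameters are assumed to have already been mapped to numeric values. The weight formula is also an assumption: the source text does not reproduce it, so the sketch takes one reading of claim 7 (weights proportional to the product of accuracy and recall, normalised so that α + β = 1).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def expand_features(transfer_params: Sequence[float],
                    native_features: Sequence[float]) -> List[float]:
    """Claim 4: use the x transfer parameters P_1..P_x as expansion features
    R_1..R_x and append the y native features, giving n = x + y dimensions."""
    return list(transfer_params) + list(native_features)


@dataclass
class ValidationMetrics:
    accuracy: float  # ACC measured on the validation set
    recall: float    # RCC measured on the validation set


def fusion_weights(real_time: ValidationMetrics,
                   offline: ValidationMetrics) -> Tuple[float, float]:
    """ASSUMPTION: weights proportional to ACC * RCC, normalised so that
    alpha + beta = 1. The source does not give its exact formula."""
    score_rt = real_time.accuracy * real_time.recall
    score_off = offline.accuracy * offline.recall
    total = score_rt + score_off
    if total == 0.0:
        return 0.5, 0.5  # no usable signal: fall back to equal weights
    alpha = score_rt / total
    return alpha, 1.0 - alpha


def fuse_confidence(c1: float, c2: float,
                    real_time: Optional[ValidationMetrics] = None,
                    offline: Optional[ValidationMetrics] = None) -> float:
    """Fuse the first (real-time) and second (offline) confidence.

    Without validation metrics, use the arithmetic mean (claim 5);
    with them, use a weighted average (claim 6)."""
    if real_time is None or offline is None:
        return (c1 + c2) / 2.0
    alpha, beta = fusion_weights(real_time, offline)
    return alpha * c1 + beta * c2


if __name__ == "__main__":
    # Two transfer parameters (x = 2) plus three native features (y = 3).
    vector = expand_features([1.0, 0.0], [0.3, 0.7, 0.9])
    print(len(vector), vector)

    # Illustrative metric values only; they are not taken from the source.
    rt = ValidationMetrics(accuracy=0.8, recall=0.5)
    off = ValidationMetrics(accuracy=0.9, recall=0.8)
    print(fuse_confidence(0.6, 0.8))           # no validation set
    print(fuse_confidence(0.6, 0.8, rt, off))  # validation set available
```

Under this assumed weighting, the model that scores better on the validation set pulls the fused confidence toward its own value, and the equal-weight fallback makes the weighted path reduce to the arithmetic mean of claim 5 when neither model shows any signal.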
Description
Big data offline-real-time fusion characteristic prediction method and system based on parameter transfer, electronic equipment and storage medium
Technical Field
The invention relates to the technical fields of big data processing, machine learning and data mining, and in particular to a big data offline-real-time fusion feature prediction method and system based on parameter transfer, an electronic device and a storage medium.
Background
In big data feature prediction, processing modes generally fall into two types: offline feature recognition (batch processing) and real-time feature recognition (stream processing). The offline prediction mode has ample computing time and complete historical raw data, so it can perform deep cleaning, processing and feature extraction. It can therefore usually extract richer feature dimensions and achieve higher model accuracy, but it suffers from poor timeliness and can only process data from day T-1 or earlier.
The real-time prediction mode processes the data stream quickly as it is generated, so its timeliness is very high and it can meet service requirements at the millisecond or second level. However, because of its very low latency budget, real-time processing can hardly perform complex multi-table joins or deep mining of historical features, which leads to limited feature dimensions and accuracy that is typically lower than that of offline analysis.
In the prior art, offline and real-time systems are typically separate (for example, the Speed Layer and Batch Layer in the Lambda architecture), or their results are simply concatenated. Without parameter transfer, each system can only be tuned through its own inherent parameters, and it is difficult to improve accuracy substantially without reducing comprehensiveness.
How to carry the high accuracy of the offline model over to the real-time model while preserving the real-time model's timeliness is a technical problem to be solved in the big data field.
Disclosure of Invention
The invention aims to solve the above technical problem and provides a big data offline-real-time fusion feature prediction method based on parameter transfer, which combines the complementary advantages of offline and real-time prediction by constructing a parameter transfer mechanism and a confidence fusion model.
The invention discloses a big data offline-real-time fusion feature prediction method based on parameter transfer, comprising the following steps:
- Step S1, an offline model acquires historical data of time period T-1, performs offline feature recognition on the historical data to generate an offline analysis result for time period T-1, and extracts transfer parameters based on the offline analysis result;
- Step S2, the transfer parameters are passed from the offline model to a real-time model;
- Step S3, the real-time model acquires real-time data of time period T, performs feature expansion on the real-time data using the transfer parameters, constructs a real-time feature vector, and performs real-time feature prediction to obtain a real-time prediction result and a first confidence for time period T;
- Step S4, after time period T ends, the offline model acquires the full data of time period T and performs offline feature recognition to obtain an offline prediction result and a second confidence for time period T;
- Step S5, the real-time prediction result and the offline prediction result of time period T are fused using a confidence fusion model to obtain a final prediction result;
- Step S6, the offline model updates the transfer parameters based on the offline prediction result of time period T and passes them to the real-time model for processing the data of time period T+1, and the above steps are executed cyclically.
Step S6 updates the transfer parameters in order to eliminate the accumulated error caused by the time misalignment between the data processed by the offline model and the data processed by the real-time model.
Preferably, in step S1, the extraction of the transfer parameters follows these principles:
- a weak time correlation principle, namely that the difference between the transfer parameters generated in time period T and those generated in time period T+1 is smaller than a preset threshold, i.e., P(T+1) ≈ P(T);
- a strong feature correlation principle, namely that the transfer parameters contain rules or thresholds that can be mapped directly onto low-order features of the real-time data;
- a real-time calculability principle, namely that the data structure of the transfer parameters meets the real-time computing engine's requirements for low latency and stable resource usage.
Preferably, the strong characteristic corr