US-12627606-B2 - Low complexity cellular traffic prediction
Abstract
The technology described herein is directed towards accurate low-complexity decision tree-based traffic predictor models, such as decision tree regressor models for use by base stations. Each model is rapidly retrained per base station using data relevant to the base station. To improve accuracy, statistically significant feature data is selected by performing hypothesis testing on candidate features to filter out features that cannot satisfy a statistical significance threshold (e.g., p-value). The decision tree regressor model is recursively grown based on the selected features' feature values and their traffic level labels. Predicted traffic level data is determined by traversing the trained decision tree to reach a leaf node associated with the prediction data. Resource allocation can be based on the prediction. In addition to time-series training data, spatial training data can be used. Real time traffic monitoring by a radio unit for operating in an autonomous management mode is also facilitated.
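By way of a non-limiting illustration, the recursive growth and leaf-node traversal summarized above can be sketched in Python as follows. The squared-error (variance-reduction) split criterion, the `max_depth` and `min_samples` stopping rules, the function names, and the sample hour-of-day and traffic values are assumptions made for this sketch and are not taken from the disclosure.

```python
from statistics import mean, median

def sq_error(labels):
    """Sum of squared deviations of the labels from their mean."""
    if not labels:
        return 0.0
    m = mean(labels)
    return sum((y - m) ** 2 for y in labels)

def best_split(rows, labels):
    """Return the (feature_index, threshold) that most reduces squared error, or None."""
    best, best_cost = None, sq_error(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            cost = sq_error(left) + sq_error(right)
            if cost < best_cost - 1e-12:
                best, best_cost = (f, t), cost
    return best

def grow(rows, labels, depth=0, max_depth=2, min_samples=3):
    """Recursively grow the tree; each leaf keeps the traffic labels that reached it."""
    split = None
    if depth < max_depth and len(rows) >= min_samples:
        split = best_split(rows, labels)
    if split is None:
        return {"leaf": list(labels)}
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left],
                     depth + 1, max_depth, min_samples),
        "right": grow([r for r, _ in right], [y for _, y in right],
                      depth + 1, max_depth, min_samples),
    }

def predict(tree, x, reduce=mean):
    """Traverse from the root to the matching leaf and reduce its stored labels."""
    node = tree
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return reduce(node["leaf"])

# Hypothetical training data: one feature (hour of day) and a traffic level label.
rows = [(1,), (2,), (9,), (10,), (18,), (19,), (20,)]
labels = [10.0, 12.0, 60.0, 64.0, 80.0, 84.0, 100.0]
tree = grow(rows, labels)

evening_mean = predict(tree, (19,))            # mean of the matching leaf's labels
evening_median = predict(tree, (19,), median)  # median of the same leaf's labels
```

Because each leaf retains the labels that reached it, the same traversal can yield either a mean or a median prediction, depending on the reducer passed to `predict`.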
Inventors
- Jeebak Mitra
- Nour Mohamed Hussein Kamaly
Assignees
- DELL PRODUCTS L.P.
Dates
- Publication Date
- 20260512
- Application Date
- 20240506
Claims (20)
- 1 . A system, comprising: at least one processor; and at least one memory that stores executable instructions that, when executed by the at least one processor, facilitate performance of operations, the operations comprising: obtaining a time series data representation generated based on measured cellular traffic associated with a first base station; generating training data based on the time series data representation associated with the first base station and spatiotemporal data associated with a second base station, the training data comprising respective cellular traffic level labels for respective datasets of selected features of the cellular traffic and associated feature values for the selected features, wherein the time series data representation corresponds to respective first available candidate features, wherein the spatiotemporal data corresponds to respective second available candidate features, and wherein the generating of the training data comprises performing hypothesis testing on the respective first available candidate features and the respective second available candidate features to determine respective probability values as respective statistical significance values associated with the respective first available candidate features and the respective second available candidate features, and determining whether and which of the respective statistical significance values satisfy a defined threshold statistical significance value to determine whether and which of the respective first available candidate features and the respective second available candidate features are to be selected to be included as part of the training data; constructing a decision tree regressor model based on the training data, the decision tree regressor model comprising respective leaf nodes associated with respective cellular traffic level data corresponding to the respective cellular traffic level labels; obtaining an independent data point comprising an input dataset of feature values, corresponding to the selected features, for which cellular traffic level is to be predicted; and predicting cellular traffic level based on the independent data point, comprising, inputting the input dataset into the decision tree regressor model to traverse the decision tree regressor model based on the input dataset until reaching a matching leaf node of the respective leaf nodes, obtaining, from the matching leaf node, cellular traffic levels that corresponds to a predicted cellular traffic, and determining the predicted cellular traffic level based on the cellular traffic level data from the matching leaf node.
- 2 . The system of claim 1 , wherein the time series data representation corresponds to the respective first available candidate features, and wherein the operations further comprise determining the respective statistical significance values for the respective first available candidate features and the respective second available candidate features, and filtering based on the respective statistical significance values to determine the respective datasets of the selected features.
- 3 . The system of claim 2 , wherein the determining of the respective statistical significance values comprises the performing of the hypothesis testing to determine the respective probability values, as the respective statistical significance values for the respective first available candidate features and the respective second available candidate features, and wherein the filtering based on the respective statistical significance values comprises determining which of the respective first available candidate features and the respective second available candidate features satisfy the defined threshold statistical significance value.
- 4 . The system of claim 2 , wherein the respective available candidate features are based on a day of week name, a day of month, an hour of day, and a minute of hour of day obtained from the time series data representation.
- 5 . The system of claim 4 , wherein the operations further comprise one-hot encoding the day of week name into seven encoded feature values, and wherein the respective first available candidate features comprise the day of month, the hour of the day, the minute of the hour of the day, and the seven encoded feature values.
- 6 . The system of claim 1 , wherein the time series data representation corresponds to radio frequency signal statistics comprising physical resource block usage data collected for a radio unit over a defined timeframe.
- 7 . The system of claim 1 , wherein the cellular traffic level from the matching leaf node comprises multiple cellular traffic level values, and wherein the determining of the predicted cellular traffic level comprises determining a median of the multiple cellular traffic level values, or determining a mean of the multiple cellular traffic level values.
- 8 . The system of claim 1 , wherein the time series data representation corresponds to the cellular traffic of the first base station, and wherein the decision tree regressor model is constructed for the first base station.
- 9 . The system of claim 1 , wherein the time series data representation representative of the cellular traffic corresponds to first cellular traffic of the first base station and second cellular traffic of at least one neighbor base station, comprising the second base station, and wherein the decision tree regressor model is constructed for at least one of the first base station or the second base station.
- 10 . The system of claim 1 , wherein the time series data representation comprises respective time series data representation of respective cellular traffic corresponding to respective base stations, wherein the predicted cellular traffic level comprises respective predicted traffic level values for the respective base stations, wherein the operations further comprise allocating base station resources among the respective base stations based on the respective predicted traffic level values, and wherein the respective base stations comprise the first base station and the second base station.
- 11 . The system of claim 1 , wherein the constructing of the decision tree regressor model comprises recursively building respective branches of the decision tree regressor model, and wherein the recursively building of at least some of the respective branches occurs in parallel.
- 12 . The system of claim 11 , wherein the branches comprise respective branches terminating at respective leaf nodes, and wherein the operations further comprise pruning at least one branch of the respective branches.
- 13 . The system of claim 1 , wherein the operations further comprise assessing performance of the decision tree regressor model based on test data and evaluation metrics data.
- 14 . A method, comprising: obtaining, by a system comprising at least one processor, time series data representative of first cellular traffic level data associated with a first base station; based on the time series data, extracting, by the system, respective first candidate features associated with respective first feature values and respective traffic level data labels; based on spatiotemporal data representative of second cellular traffic level data associated with a second base station, extracting, by the system, respective second candidate features associated with respective second feature values; performing, by the system, hypothesis testing on the respective first candidate features and the respective second candidate features based on the respective first feature values and the respective second feature values to determine respective statistical significance values for the respective first candidate features and the respective second candidate features; based on the respective statistical significance values and a threshold statistical significance value, selecting, by the system, some of the respective first candidate features and some of the respective second candidate features as respective selected features; training, by the system, a model based on the respective selected features, the respective first feature values and the respective second feature values associated with the respective selected features, and the respective traffic level data labels associated with the respective selected features; and determining, by the system using the model, a predicted traffic level associated with the first base station based on a subsequent dataset comprising subsequent feature values of features corresponding to the respective selected features.
- 15 . The method of claim 14 , wherein the training of the model comprises training a decision tree regressor model.
- 16 . The method of claim 14 , wherein the obtaining of the time series data comprises obtaining radio frequency signal statistics representative of traffic for a radio unit of the first base station that is collected over a defined timeframe, wherein the training of the model comprises training a decision tree regressor model, and wherein the method further comprises: adapting, by the system, power usage of the radio unit based on the predicted traffic level.
- 17 . A non-transitory machine-readable medium, comprising executable instructions that, when executed by at least one processor, facilitate performance of operations, the operations comprising: performing hypothesis testing on respective candidate features, based on time-series data representative of first cellular traffic level data associated with a first base station and collected over a time range, and based on spatiotemporal data representative of second cellular traffic level data associated with a second base station, to determine, from the respective candidate features, respective selected features that satisfy a threshold statistical significance value; constructing a decision tree regressor model based on the respective selected features and respective datasets of the selected features, the respective datasets comprising respective feature values and respective traffic level data labels; and determining a predicted future traffic level data value associated with the first base station based on a result of an analysis, using the decision tree regressor model, of a subsequent dataset, comprising respective subsequent feature values, input into the decision tree regressor model.
- 18 . The non-transitory machine-readable medium of claim 17 , wherein the operations further comprise at least one of: collecting the time-series data based on physical resource usage of a radio unit of the first base station, or obtaining the time-series data based on call data records associated with the first base station.
- 19 . The non-transitory machine-readable medium of claim 17 , wherein the performing of the hypothesis testing further comprises performing the hypothesis testing on respective candidate spatiotemporal features based on correlated spatiotemporal data of neighboring base stations, comprising the second base station, to determine at least one of the respective selected features, wherein the neighboring base stations are within a same geographic coordinate area as the first base station, wherein some of the respective candidate features are the respective candidate spatiotemporal features, and wherein the correlated spatiotemporal data comprises the spatiotemporal data.
- 20 . The non-transitory machine-readable medium of claim 17 , wherein the predicted future traffic level data value is a first predicted future traffic level data, and wherein the operations further comprise determining a second predicted future traffic level data value for the second base station, and allocating resources to the first base station and the second base station based on the first predicted future traffic level data value and the second predicted future traffic level data value.
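As a non-limiting illustration of the candidate feature extraction and significance-based filtering recited in the claims, the following Python sketch derives the day of month, hour of day, minute of hour, and seven one-hot day-of-week values from a timestamp, then keeps the candidate features whose p-value satisfies a threshold. The claims do not name a particular hypothesis test; the permutation test on Pearson correlation, the 0.05 threshold, the reading of "satisfy" as p-value at or below the threshold, the function names, and the synthetic traffic labels are all assumptions made for this sketch. Candidate features derived from a second base station would enter as additional columns in the same way.

```python
import random
from datetime import datetime, timedelta

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def extract_features(ts):
    """Map a timestamp to candidate features, one-hot encoding the day of week."""
    feats = {
        "day_of_month": ts.day,
        "hour_of_day": ts.hour,
        "minute_of_hour": ts.minute,
    }
    for i, name in enumerate(DAYS):
        feats["dow_" + name] = 1 if ts.weekday() == i else 0
    return feats

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Two-sided permutation test of the null hypothesis of no association."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

def select_features(samples, labels, alpha=0.05):
    """Keep the candidate features whose p-value is at or below alpha."""
    selected = []
    for name in samples[0]:
        column = [s[name] for s in samples]
        if permutation_p_value(column, labels) <= alpha:
            selected.append(name)
    return selected

# Synthetic example: one week of hourly samples whose traffic depends on the hour.
start = datetime(2024, 1, 1)  # a Monday
samples, labels = [], []
for h in range(7 * 24):
    ts = start + timedelta(hours=h)
    samples.append(extract_features(ts))
    labels.append(10 + 3 * ts.hour)

selected = select_features(samples, labels)
print(selected)
```

In this synthetic data the minute of hour is constant and the day of month is balanced across hours, so neither carries information about the label and both are filtered out, while the hour of day is retained.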
Description
BACKGROUND
Wireless network traffic has increased rapidly, whereby network operators upgrade their networks based on careful planning, with a general goal of providing adequate coverage and capacity as needed. However, with increasing network traffic, the overall network becomes more complex to manage due to new technical features being built into the network and the scale of the network itself. Network management systems often rely on traffic forecasting by leveraging various network measurement data; however, traffic forecasting approaches that have been attempted suffer from various drawbacks.
BRIEF DESCRIPTION OF THE DRAWINGS
The technology described herein is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
FIG. 1 is a block diagram showing an example system/architecture of a cellular network comprised of base stations with associated per-base station low complexity cellular traffic predictors, in accordance with various embodiments and implementations of the subject disclosure.
FIG. 2 is a block diagram/dataflow diagram showing an example pipeline for training a model based on cellular traffic statistics data, in accordance with various embodiments and implementations of the subject disclosure.
FIG. 3 is a block diagram/dataflow diagram showing details of an example pipeline for training a decision tree regressor model based on selected features of time series cellular traffic statistics data, in accordance with various embodiments and implementations of the subject disclosure.
FIG. 4 is a block diagram/dataflow diagram showing details of inputting an independent data point of feature data values into a trained decision tree regressor model to obtain a traffic level prediction value, in accordance with various embodiments and implementations of the subject disclosure.
FIG. 5 is a representation of resource allocation among respective base stations based on respective traffic prediction levels, in accordance with various embodiments and implementations of the subject disclosure.
FIG. 6 is a graphical representation of example actual cellular traffic data, collected for another base station deployment over a time period (e.g., showing a fourteen-day representation in FIG. 6), versus cellular traffic data predicted by a decision tree regressor model (along with autoregressive-based prediction data), in accordance with various embodiments and implementations of the subject disclosure.
FIG. 7 is a graphical representation of example actual cellular traffic data, collected for one base station deployment over a time period (e.g., showing a fourteen-day representation in FIG. 7), versus cellular traffic data predicted by a decision tree regressor model, in accordance with various embodiments and implementations of the subject disclosure.
FIG. 8 is a graphical representation of example actual cellular traffic data, collected for the base station deployment of FIG. 7, versus cellular traffic data predicted by a decision tree regressor model in which training data features were selected based on hypothesis testing, in accordance with various embodiments and implementations of the subject disclosure.
FIG. 9 is a block diagram showing an example of a radio unit operating in an autonomous managed mode for radio unit performance optimization, in accordance with various embodiments and implementations of the subject disclosure.
FIGS. 10 and 11 comprise a flow diagram showing example operations related to constructing a decision tree regressor model based on training data, and predicting cellular traffic level data via the decision tree regressor model, in accordance with various embodiments and implementations of the subject disclosure.
FIG. 12 is a flow diagram showing example operations related to obtaining a predicted cellular traffic level from a model trained with feature data of features selected via hypothesis testing, in accordance with various embodiments and implementations of the subject disclosure.
FIG. 13 is a flow diagram showing example operations related to obtaining a predicted future traffic level data value based on inputting a new dataset of respective feature values into a decision tree regressor model trained with feature data of statistically significant features, in accordance with various embodiments and implementations of the subject disclosure.
FIG. 14 is a block diagram representing an example computing environment into which embodiments of the subject matter described herein may be incorporated.
FIG. 15 depicts an example schematic block diagram of a computing environment with which the disclosed subject matter can interact/be implemented at least in part, in accordance with various embodiments and implementations of the subject disclosure.
DETAILED DESCRIPTION
The technology described herein is generally directed towards a low complexity decision tree-based traffic predictor model that has acc