US-20260128959-A1 - ML-BASED SCHEDULING WITH PER-SIGNAL PATH INFERENCE IN A WIRELESS COMMUNICATION SYSTEM


Abstract

The present subject matter relates to a method comprising: determining, for a specific time unit, an available set of frequency resource units; for each signal path of multiple signal paths, determining at least one feature vector descriptive of a set of devices, the set of frequency resource units and the signal path, performing, per feature vector of the at least one feature vector, an inference pass of a reinforcement machine learning agent model to obtain an individual output using the feature vector as input, and using the at least one individual output to determine an individual scheduling configuration of the set of frequency resource units to the set of devices; and using the individual scheduling configurations to determine a scheduling decision to assign the set of frequency resource units to the set of devices, such that a frequency resource unit can be assigned to one or more signal paths.
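The loop described above can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the toy model, the function names (`make_toy_model`, `build_feature_vector`, `decode`, `schedule`) and the use of raw CQI values as scores are assumptions made for illustration only. The sketch runs one inference pass per signal path, excludes device-to-resource assignments made in earlier passes (as recited in claim 1), and reads the output dimension of claim 9 as F x (D + 1), that is, one extra "leave unassigned" slot per frequency resource unit, which is one possible reading of that claim.

```python
from typing import Callable, List, Optional, Set, Tuple

Model = Callable[[List[float]], List[float]]


def make_toy_model(num_frus: int, num_devices: int, idle_score: float = 5.0) -> Model:
    """Stand-in for the trained model (assumption): it scores each device on
    each frequency resource unit (FRU) by its CQI and gives a fixed score to
    the extra 'leave unassigned' slot."""
    def model(features: List[float]) -> List[float]:
        scores: List[float] = []
        for f in range(num_frus):
            for d in range(num_devices):
                scores.append(features[d * num_frus + f])
            scores.append(idle_score)
        return scores
    return model


def build_feature_vector(cqi: List[List[float]], layer: int) -> List[float]:
    """Flatten per-device, per-FRU CQI values and append the signal path index."""
    return [v for row in cqi for v in row] + [float(layer)]


def decode(scores: List[float], num_frus: int, num_devices: int,
           excluded: Set[Tuple[int, int]]) -> List[Optional[int]]:
    """Per FRU, pick the best-scoring device not assigned in a previous pass,
    or None when the 'leave unassigned' slot scores at least as high."""
    width = num_devices + 1
    config: List[Optional[int]] = []
    for f in range(num_frus):
        row = scores[f * width:(f + 1) * width]
        best, best_score = None, row[num_devices]
        for d in range(num_devices):
            if (f, d) not in excluded and row[d] > best_score:
                best, best_score = d, row[d]
        config.append(best)
    return config


def schedule(model: Model, cqi: List[List[float]], num_layers: int) -> List[List[int]]:
    """One inference pass per signal path; returns the devices assigned to each FRU."""
    num_devices, num_frus = len(cqi), len(cqi[0])
    excluded: Set[Tuple[int, int]] = set()
    decision: List[List[int]] = [[] for _ in range(num_frus)]
    for layer in range(num_layers):
        scores = model(build_feature_vector(cqi, layer))
        for f, d in enumerate(decode(scores, num_frus, num_devices, excluded)):
            if d is not None:
                decision[f].append(d)
                excluded.add((f, d))
    return decision


# cqi[d][f]: hypothetical CQI of device d on FRU f (3 devices, 2 FRUs).
cqi = [[9.0, 3.0], [7.0, 8.0], [2.0, 6.0]]
print(schedule(make_toy_model(2, 3), cqi, num_layers=2))  # [[0, 1], [1, 2]]
```

In this toy run each frequency resource unit ends up shared by two devices, one per signal path, which corresponds to a frequency resource unit being assigned to more than one signal path.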

Inventors

Kalle Petteri Kela
Bryan Liu
Alvaro Valcarce Rial

Assignees

NOKIA SOLUTIONS AND NETWORKS OY

Dates

Publication Date: 20260507
Application Date: 20251021
Priority Date: 20241107

Claims (18)

1. An apparatus for a wireless communication system, the apparatus comprising at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus to: determine, for a specific time unit, an available set of frequency resource units for scheduling a set of devices of the wireless communication system using multiple signal paths for data transmission; for each signal path of the signal paths, determine at least one feature vector descriptive of the set of devices, the set of frequency resource units and the signal path, perform per feature vector of the at least one feature vector an inference pass of a machine learning model to obtain an individual output using the feature vector as input, and use the at least one individual output to determine an individual scheduling configuration of the set of frequency resource units to the set of devices, wherein determine the individual scheduling configuration for a current inference pass is by further using the one or more individual scheduling configurations of one or more previous inference passes respectively, and wherein determine the individual scheduling configuration is such that it excludes previous assignments of devices to frequency resource units in the one or more previous inference passes; use the individual scheduling configurations to determine a scheduling decision to assign the set of frequency resource units to the set of devices, such that a frequency resource unit can be assigned to one or more signal paths.
2. The apparatus of claim 1, wherein the machine learning model comprises a first model and a second model, the first model being configured to perform the one or more inference passes corresponding to a first signal path of the signal paths and the second model is configured to perform the remaining inference passes.
3. The apparatus of claim 1, wherein the one or more inference passes corresponding to the first signal path are performed without user pairing, wherein the remaining inference passes are performed with user pairing.
4. The apparatus of claim 1, wherein the machine learning model comprises a neural network, the neural network comprises an input layer comprising an input block per device of the set of devices, wherein execution of the instructions further causes the apparatus to determine an input vector per device of the set of devices, the input vector being descriptive of the device and the set of frequency resource units, wherein execution of the instructions further causes the apparatus to determine the feature vector by including in a specific arrangement the input vectors within the feature vector, wherein the specific arrangement is descriptive of the signal path.
5. The apparatus of claim 4, the specific arrangement of the input vectors being a random arrangement of the input vectors.
6. The apparatus of claim 4, wherein execution of the instructions further causes the apparatus to sort the set of devices in accordance with a device metric, wherein the specific arrangement of the input vectors is provided according to the sorting.
7. The apparatus of claim 6, the device metric being at least one of: quality of service (QoS), proportional fair metric or lowest spatial cross correlation of the device.
8. The apparatus of claim 1, wherein execution of the instructions further causes the apparatus to determine the scheduling decision in accordance with a multi-user multiple input, multiple output (MU-MIMO) technique, wherein the signal paths represent the MU-MIMO user layers respectively.
9. The apparatus of claim 1, wherein the machine learning model comprises a neural network, the neural network comprising an output layer whose dimension is equal to the number of the set of frequency resource units multiplied by the number of the set of devices plus one.
10. The apparatus of claim 1, wherein the signal paths represent respectively data streams for transmission to the set of devices.
11. The apparatus of claim 1, the feature vector comprising values of features, the features comprising for each device of the set of devices a device related feature, the features further comprising per device and per frequency resource unit a channel related feature, wherein the device related feature comprises at least one of: a buffer status of the device or a past throughput of the device or a user pairing enabling metric, wherein the channel related feature of a device and a frequency resource unit comprises at least: a channel quality indicator (CQI) of a frequency channel to the device which is defined by the frequency resource unit.
12. A method comprising: determining, for a specific time unit, an available set of frequency resource units for scheduling a set of devices of a wireless communication system using multiple signal paths for data transmission; for each signal path of the signal paths, determining at least one feature vector descriptive of the set of devices, the set of frequency resource units and the signal path, performing per feature vector of the at least one feature vector an inference pass of a machine learning model to obtain an individual output using the feature vector as input, and using the at least one individual output to determine an individual scheduling configuration of the set of frequency resource units to the set of devices, wherein determining the individual scheduling configuration for a current inference pass is by further using the one or more individual scheduling configurations of one or more previous inference passes respectively, and wherein determining the individual scheduling configuration is such that it excludes previous assignments of devices to frequency resource units in the one or more previous inference passes; using the individual scheduling configurations to determine a scheduling decision to assign the set of frequency resource units to the set of devices, such that a frequency resource unit can be assigned to one or more signal paths.
13. A computer program product comprising processor executable instructions for causing an apparatus for performing the method of claim 12.
14. The apparatus of claim 1, wherein the apparatus is for training a reinforcement learning agent in accordance with a reinforcement learning algorithm using as environment a wireless communication system, and wherein execution of the instructions further causes the apparatus in each training step to: determine for the training step a time unit, a set of devices, a set of frequency resource units, a signal path of signal paths for transmission in the time unit; use by the reinforcement learning agent a state defined by a feature vector descriptive of the set of devices, the set of frequency resource units and the signal path to perform an action comprising individual actions, wherein each individual action is defined by an assignment of a respective frequency resource unit of the set of frequency units, and determine a reward which is a combination of individual rewards of the individual actions respectively for adapting the reinforcement learning agent based on the reward.
15. The apparatus of claim 14, the reinforcement learning agent comprising a neural network, the reinforcement learning algorithm being a soft actor critic (SAC) algorithm, wherein critic and target networks involved in the SAC algorithm are distributional networks, wherein each network of the critic and target networks is configured to output for an input action and state pair a quantile distribution, wherein the reinforcement learning agent is adapted in each training step using a policy loss function, wherein the policy loss function is defined using for each action-state pair a coefficient-wise minimum of a first Q-value vector having an element per individual action and a second Q-value vector having an element per individual action, wherein each element of the first Q-value vector is a combination of quantiles which are obtained for the associated individual action and the state by one of the critic networks, wherein each element of the second Q-value vector is a combination of quantiles which are obtained for the associated individual action and the state by the other critic network.
16. The apparatus of claim 14, wherein the individual reward of a given individual action on an associated specific frequency resource unit is a combination of a first value and a second value of a performance metric, the first value being evaluated for the given individual action and the second value being evaluated for a previous individual action on the specific frequency unit.
17. The method of claim 12 further comprising training a reinforcement learning agent in accordance with a reinforcement learning algorithm using as environment a wireless communication system, the method comprising in each training step: determining for the training step a time unit, a set of devices, a set of frequency resource units, a signal path of signal paths for transmission in the time unit; using by the reinforcement learning agent a state defined by a feature vector descriptive of the set of devices, the set of frequency resource units and the signal path to perform an action comprising individual actions, wherein each individual action is defined by an assignment of a respective frequency resource unit of the set of frequency units; and determining a reward which is a combination of individual rewards of the individual actions respectively for adapting the reinforcement learning agent based on the reward.
18. A computer program product comprising processor executable instructions for causing an apparatus for performing the method of claim 17.
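The training-related quantities recited in claims 14 to 16 can be illustrated with a short Python sketch. The claims only say "combination" in each case, so the concrete choices below are assumptions made for illustration: a difference for the individual reward of claim 16, a sum for the overall reward of claim 14, and a mean over quantiles for each Q-value element of claim 15. All function names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def individual_rewards(current_metric: Sequence[float],
                       previous_metric: Sequence[float]) -> List[float]:
    """Per frequency resource unit: metric value for the current individual
    action minus the value for the previous individual action on the same
    unit (the difference is an assumed form of 'combination')."""
    if len(current_metric) != len(previous_metric):
        raise ValueError("one metric value per frequency resource unit is expected")
    return [cur - prev for cur, prev in zip(current_metric, previous_metric)]


def total_reward(rewards: Sequence[float]) -> float:
    """Overall reward as the sum of the individual rewards (assumed form)."""
    return sum(rewards)


def q_vector(quantiles_per_action: Sequence[Sequence[float]]) -> List[float]:
    """One Q-value per individual action, taken here as the mean of the
    quantiles a critic network outputs for that action (assumed form)."""
    return [fmean(q) for q in quantiles_per_action]


def coefficientwise_min(q1: Sequence[float], q2: Sequence[float]) -> List[float]:
    """Element-wise minimum of the Q-value vectors of the two critics."""
    if len(q1) != len(q2):
        raise ValueError("both Q-value vectors need one element per individual action")
    return [min(a, b) for a, b in zip(q1, q2)]


# Hypothetical numbers for two frequency resource units.
rewards = individual_rewards([3.0, 1.5], [2.0, 2.0])   # [1.0, -0.5]
print(total_reward(rewards))                            # 0.5
q_a = q_vector([[1.0, 3.0], [2.0, 2.0]])                # [2.0, 2.0]
q_b = q_vector([[0.0, 2.0], [3.0, 5.0]])                # [1.0, 4.0]
print(coefficientwise_min(q_a, q_b))                    # [1.0, 2.0]
```

Taking the minimum per individual action, instead of one scalar minimum per action-state pair, keeps a separate conservative value estimate for every frequency resource unit.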

Description

TECHNICAL FIELD

Various example embodiments relate to telecommunication systems, and more particularly to an apparatus for machine learning (ML) based scheduling in a wireless communication system.

BACKGROUND

In modern wireless communication systems, the demand for higher data rates, efficient resource utilization, and the ability to serve a growing number of devices simultaneously is ever-increasing. As networks become more complex, dynamic resource management becomes critical for ensuring that available spectrum is used efficiently, while maintaining high-quality service for all users. Hence, advanced techniques that account for real-time network conditions, such as varying interference levels, user mobility, and fluctuating traffic loads, may be essential for optimizing performance.

SUMMARY

Example embodiments provide an apparatus (first apparatus) for a wireless communication system, the first apparatus comprising at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the first apparatus to: determine, for a specific time unit, an available set of frequency resource units for scheduling a set of devices of the wireless communication system using multiple signal paths for data transmission; for each signal path of the signal paths, determine at least one feature vector descriptive of the set of devices, the set of frequency resource units and the signal path, perform per feature vector of the at least one feature vector an inference pass of a machine learning model to obtain an individual output using the feature vector as input, and use the at least one individual output to determine an individual scheduling configuration of the set of frequency resource units to the set of devices; use the individual scheduling configurations to determine a scheduling decision to assign the set of frequency resource units to the set of devices, such that a frequency resource unit can be assigned to one or more signal paths.
Example embodiments provide a method (first method) comprising determining, for a specific time unit, an available set of frequency resource units for scheduling a set of devices of a wireless communication system using multiple signal paths for data transmission; for each signal path of the signal paths, determining at least one feature vector descriptive of the set of devices, the set of frequency resource units and the signal path, performing per feature vector of the at least one feature vector an inference pass of a machine learning model to obtain an individual output using the feature vector as input, and using the at least one individual output to determine an individual scheduling configuration of the set of frequency resource units to the set of devices; using the individual scheduling configurations to determine a scheduling decision to assign the set of frequency resource units to the set of devices, such that a frequency resource unit can be assigned to one or more signal paths.
Example embodiments provide a computer program product comprising processor executable instructions for causing an apparatus to perform at least the first method.
Example embodiments provide a non-transitory computer readable medium comprising program instructions that, when executed by an apparatus, cause the apparatus to perform at least the first method.
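One way the feature vector used by the first method could be assembled is sketched below in Python, following the per-device input vectors of claim 4, the sorting of claim 6 and the features of claim 11. This is a sketch under stated assumptions, not the described implementation: the `Device` class, the field names and the particular proportional fair proxy (mean CQI divided by past throughput) are hypothetical, and how the arrangement varies from one signal path to the next is left open here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Device:
    device_id: int
    buffer_status: float      # device related feature
    past_throughput: float    # device related feature
    cqi_per_fru: List[float]  # channel related feature, one CQI per frequency resource unit


def proportional_fair_metric(dev: Device) -> float:
    """Assumed proxy: mean CQI as achievable-rate stand-in over past throughput."""
    rate_proxy = sum(dev.cqi_per_fru) / len(dev.cqi_per_fru)
    return rate_proxy / max(dev.past_throughput, 1e-9)


def input_vector(dev: Device) -> List[float]:
    """Per-device input block: device related features, then per-FRU CQI."""
    return [dev.buffer_status, dev.past_throughput, *dev.cqi_per_fru]


def feature_vector(devices: List[Device]) -> Tuple[List[float], List[int]]:
    """Concatenate the per-device input vectors in sorted order. The order of
    device ids is returned too, so model outputs can be mapped back to devices."""
    ordered = sorted(devices, key=proportional_fair_metric, reverse=True)
    features = [x for dev in ordered for x in input_vector(dev)]
    return features, [dev.device_id for dev in ordered]


# Hypothetical devices, two frequency resource units each.
devices = [
    Device(0, buffer_status=100.0, past_throughput=10.0, cqi_per_fru=[10.0, 10.0]),
    Device(1, buffer_status=50.0, past_throughput=2.0, cqi_per_fru=[8.0, 4.0]),
]
features, order = feature_vector(devices)
print(order)     # [1, 0]
print(features)  # [50.0, 2.0, 8.0, 4.0, 100.0, 10.0, 10.0, 10.0]
```

Device 1 is placed first because its metric (6.0 / 2.0 = 3.0) exceeds that of device 0 (10.0 / 10.0 = 1.0), so it occupies the first input block.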
Example embodiments provide an apparatus (second apparatus) for training a reinforcement learning agent in accordance with a reinforcement learning algorithm using as environment a wireless communication system, the second apparatus comprising at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the second apparatus in each training step to: determine for the training step a time unit, a set of devices, a set of frequency resource units, a signal path of signal paths for transmission in the time unit; use by the reinforcement learning agent a state defined by a feature vector descriptive of the set of devices, the set of frequency resource units and the signal path to perform an action comprising individual actions, wherein each individual action is defined by an assignment of a respective frequency resource unit of the set of frequency units, and determine a reward which is a combination of individual rewards of the individual actions respectively for adapting the reinforcement learning agent based on the reward.
Example embodiments provide a method (second method) for training a reinforcement learning agent in accordance with a reinforcement learning algorithm using as environment a wireless communication system, the second method comprising in each training step: determining for the training step a time unit, a set of devices, a set of frequency resource units, a signal path of signal paths for transmission in the time unit; using by the reinforcement learning agent a state defined by a feature vector descriptive of t