US-20260124545-A1 - SYSTEMS AND METHODS FOR UTILIZING AN AI FRAMEWORK TO DESIGN TIME-AWARE GAMES
Abstract
Systems and methods are described for using a deep learning model to conform a computing session for an electronic game to a time window. The disclosed methods may determine a time window for completing the computing session. The determined time window is input into the deep learning model, which includes a policy network trained to suggest in-game actions, a value network trained to determine a particular outcome from a specific game state, and an upper confidence bound (UCB) for guiding a search algorithm over a data structure. Based on an output of the deep learning model, the disclosed methods may determine an in-game action to perform that will advance the electronic game towards a desired outcome of the computing session within the time window. Thus, computing sessions may be configured to fit into a user's busy schedule, and critical computing resources may be conserved.
Inventors
- Zhiyun Li
- Ning Xu
Assignees
- ADEIA GUIDES INC.
Dates
- Publication Date
- 20260507
- Application Date
- 20241101
Claims (20)
- 1 . A method for conforming a computing session of an electronic game to a time window, the method comprising: initiating the computing session for the electronic game; determining the time window for the computing session; determining a particular action to perform in the electronic game to advance the computing session to a particular outcome within the time window, wherein the determining the particular action comprises: applying a current game state and the time window to a policy network to identify one or more possible actions that may be performed in the electronic game; applying a value network to determine one or more probabilities of reaching the particular outcome from the current game state, and to estimate a number of actions that may be performed from the current game state to reach the particular outcome; and applying outputs of the policy network and the value network to a search algorithm to identify the particular action to perform in the electronic game to advance the computing session to the particular outcome within the time window, wherein the search algorithm is configured for analyzing one or more actions represented by nodes in a data structure, wherein edges between the nodes of the data structure represent transitions from a first game state to a second game state, and wherein the analyzing is constrained by an upper confidence bound (UCB) that includes a time constraint bias.
- 2 . The method of claim 1 , wherein the time window is one of a maximum time duration or a range of time.
- 3 . The method of claim 1 , further comprising: automatically determining the time window by at least one of accessing a calendar associated with a user, or accessing historical data related to previous computing sessions.
- 4 . The method of claim 1 , further comprising: concatenating the current game state and the estimated number of actions to advance the computing session to the particular outcome within the time window; and providing the concatenated current game state and estimated number of remaining actions as an input to the policy network, wherein the policy network is trained using backpropagation and gradient descent, and wherein the policy network is configured to output a probability distribution of a plurality of in-game moves determined to advance the computing session to the particular outcome within the time window.
- 5 . The method of claim 4 , wherein the policy network comprises a first loss function that includes a binary cross-entropy loss between a plurality of predicted in-game move probabilities and a plurality of target in-game move probabilities, and wherein the policy network is trained based at least in part on a plurality of game states, the one or more possible actions that may be performed from a given game state to reach a given outcome, and the one or more probabilities associated with the one or more possible actions.
- 6 . The method of claim 5 , wherein the value network is configured to output a probability of achieving the particular outcome and the estimated number of actions to advance the computing session to the particular outcome within the time window based at least in part on the current game state, wherein the value network comprises a second loss function that includes a binary cross-entropy loss associated with the probability of reaching the particular outcome and a mean squared error loss associated with the estimated number of actions, wherein the value network is trained based at least in part on a plurality of game states, wherein each game state of the plurality of game states is associated with one or more actions that may be performed from a respective game state to reach the given outcome, and wherein each action of the estimated number of actions is associated with an amount of time.
- 7 . The method of claim 1 , further comprising: directing the search algorithm to search the data structure, wherein the data structure comprises a plurality of nodes, and wherein each node of the plurality of nodes is associated with an in-game action of a plurality of in-game actions; successively selecting child nodes of the plurality of nodes; based at least in part on determining that a selected child node is not in a terminal state, adding one or more new child nodes to the data structure; performing a game session simulation from the one or more new child nodes; determining that the one or more new child nodes is in the terminal state; and updating a reward value associated with each node of the plurality of nodes along a path from the one or more new child nodes in the terminal state to a root of the data structure using backpropagation, wherein the reward value is determined by a reward function that includes penalties for exceeding the time window and rewards for finishing within the time window.
- 8 . The method of claim 7 , further comprising: computing the UCB for each node of the plurality of nodes in the data structure by: determining a penalty for each in-game action of the plurality of in-game actions that is expected to extend the computing session beyond the time window; and determining a level of urgency to conclude the computing session within the time window based at least in part on a weight associated with the penalty; and updating the search algorithm to include the UCB for each node of the plurality of nodes in the data structure, wherein the UCB includes an average win rate for each node of the plurality of nodes, wherein the UCB provides an indication, to the search algorithm, of which other nodes of the plurality of nodes that have been visited less frequently should be searched, and wherein a search depth of the search algorithm is dynamically adjusted based on the estimated number of actions.
- 9 . The method of claim 1 , further comprising: determining a level of difficulty of the computing session for the electronic game corresponding to the time window, wherein the level of difficulty is determined based at least in part on historical data related to previous computing sessions or a plurality of crowd-sourced statistics; and updating a deep learning model to include the level of difficulty, wherein the level of difficulty remains consistent throughout the computing session.
- 10 . The method of claim 1 , wherein the particular action to advance the computing session towards the particular outcome within the time window is at least one of: recommending interaction with a portion of content of the electronic game that is determined to be suitable for advancing the computing session towards the particular outcome within the time window, providing dynamic hints for a human player, or providing suggested moves for a computer-based opponent.
- 11 . The method of claim 1 , wherein the electronic game is an electronic turn-based strategy game, an electronic real-time strategy game, an electronic role-playing game, an electronic puzzle game, an electronic board game, or any other game that comprises a physical manifestation of computer-based actions.
- 12 . The method of claim 1 , further comprising: generating for display a user interface, wherein the electronic game is presented on at least a portion of the user interface; receiving a first user-interface input, via the user interface, to begin the computing session; receiving, via the user interface, a notification indicating the time window for advancing the computing session; receiving, via the user interface, a second user-interface input indicating the particular outcome; inputting the time window into a deep learning model trained to conform the computing session to the time window; and performing the particular action to advance the electronic game towards the particular outcome of the computing session within the time window, wherein performing the particular action comprises at least one of: generating for display, via the user interface, a recommendation for interacting with a portion of content of the electronic game that is determined to be suitable for advancing the computing session to the particular outcome within the time window; generating for display, via the user interface, a dynamic hint for a human player; and generating for display, via the user interface, an indication that suggested moves for a computer-based opponent have been provided.
- 13 . A system comprising: a memory; and control circuitry configured to: initiate a computing session for an electronic game; determine a time window for the computing session; determine a particular action to perform in the electronic game to advance the computing session to a particular outcome within the time window, wherein the control circuitry is configured to determine the particular action by: applying a current game state and the time window to a policy network to identify one or more possible actions that may be performed in the electronic game; applying a value network to determine one or more probabilities of reaching the particular outcome from the current game state, and to estimate a number of actions that may be performed from the current game state to reach the particular outcome; and applying outputs of the policy network and the value network to a search algorithm to identify the particular action to perform in the electronic game to advance the computing session to the particular outcome within the time window, wherein the search algorithm is configured for analyzing one or more actions represented by nodes in a data structure, wherein edges between the nodes of the data structure represent transitions from a first game state to a second game state, wherein the data structure is stored in the memory, and wherein the analyzing is constrained by an upper confidence bound (UCB) that includes a time constraint bias.
- 14 - 15 . (canceled)
- 16 . The system of claim 13 , wherein the control circuitry is further configured to: concatenate the current game state and the estimated number of actions to advance the computing session to the particular outcome within the time window; and provide the concatenated current game state and estimated number of remaining actions as an input to the policy network, wherein the policy network is trained using backpropagation and gradient descent, and wherein the policy network is configured to output a probability distribution of a plurality of in-game moves determined to advance the computing session to the particular outcome within the time window.
- 17 . The system of claim 16 , wherein the policy network comprises a first loss function that includes a binary cross-entropy loss between a plurality of predicted in-game move probabilities and a plurality of target in-game move probabilities, and wherein the policy network is trained based at least in part on a plurality of game states, the one or more possible actions that may be performed from a given game state to reach a given outcome, and the one or more probabilities associated with the one or more possible actions.
- 18 . The system of claim 17 , wherein the value network is configured to output a probability of achieving the particular outcome and the estimated number of actions to advance the computing session to the particular outcome within the time window based at least in part on the current game state, wherein the value network comprises a second loss function that includes a binary cross-entropy loss associated with the probability of reaching the particular outcome and a mean squared error loss associated with the estimated number of actions, wherein the value network is trained based at least in part on a plurality of game states, wherein each game state of the plurality of game states is associated with one or more actions that may be performed from a respective game state to reach the given outcome, and wherein each action of the estimated number of actions is associated with an amount of time.
- 19 . The system of claim 13 , wherein the control circuitry is further configured to: direct the search algorithm to search the data structure, wherein the data structure comprises a plurality of nodes, and wherein each node of the plurality of nodes is associated with an in-game action of a plurality of in-game actions; successively select child nodes of the plurality of nodes; based at least in part on determining that a selected child node is not in a terminal state, add one or more new child nodes to the data structure; perform a game session simulation from the one or more new child nodes; determine that the one or more new child nodes is in the terminal state; and update a reward value associated with each node of the plurality of nodes along a path from the one or more new child nodes in the terminal state to a root of the data structure using backpropagation, wherein the reward value is determined by a reward function that includes penalties for exceeding the time window and rewards for finishing within the time window.
- 20 . The system of claim 19 , wherein the control circuitry is further configured to: compute the UCB for each node of the plurality of nodes in the data structure by: determining a penalty for each in-game action of the plurality of in-game actions that is expected to extend the computing session beyond the time window; and determining a level of urgency to conclude the computing session within the time window based at least in part on a weight associated with the penalty; and update the search algorithm to include the UCB for each node of the plurality of nodes in the data structure, wherein the UCB includes an average win rate for each node of the plurality of nodes, wherein the UCB provides an indication, to the search algorithm, of which other nodes of the plurality of nodes that have been visited less frequently should be searched, and wherein a search depth of the search algorithm is dynamically adjusted based on the estimated number of actions.
- 21 . The system of claim 13 , wherein the control circuitry is further configured to: determine a level of difficulty of the computing session for the electronic game corresponding to the time window, wherein the level of difficulty is determined based at least in part on historical data related to previous computing sessions or a plurality of crowd-sourced statistics; and update a deep learning model to include the level of difficulty, wherein the level of difficulty remains consistent throughout the computing session.
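The composite loss recited in claims 6 and 18 combines a binary cross-entropy term on the predicted probability of reaching the target outcome with a mean squared error term on the estimated number of remaining actions. A minimal sketch of such a loss is shown below; the blend weight `alpha` and the function name are assumptions for illustration, not elements of the claims.

```python
import math

def value_network_loss(pred_prob, target_outcome,
                       pred_steps, target_steps, alpha=1.0):
    """Hypothetical composite loss: BCE on outcome probability plus
    MSE on the estimated number of remaining actions (claim 6)."""
    eps = 1e-12  # numerical guard against log(0)
    bce = -(target_outcome * math.log(pred_prob + eps)
            + (1 - target_outcome) * math.log(1 - pred_prob + eps))
    mse = (pred_steps - target_steps) ** 2
    return bce + alpha * mse

# A confident, accurate prediction incurs a small loss; a poor one does not.
good = value_network_loss(0.9, 1, pred_steps=12, target_steps=12)
bad = value_network_loss(0.1, 1, pred_steps=30, target_steps=12)
```

In a real training setup, the two terms would typically be computed batch-wise by an ML framework; the scalar form above only illustrates how the two error signals are combined.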
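Claims 7 and 19 describe a reward function that penalizes exceeding the time window and rewards finishing within it. One plausible shape for such a function, with hypothetical constants (`win_bonus`, `overrun_penalty_per_unit`, and the small in-window bonus are assumptions), is:

```python
def session_reward(won, elapsed, time_window,
                   win_bonus=1.0, overrun_penalty_per_unit=0.05):
    """Hypothetical terminal reward for a simulated session (claim 7):
    reward winning, penalize time over the window, reward fitting in it."""
    reward = win_bonus if won else 0.0
    if elapsed > time_window:
        # Linear penalty per unit of time beyond the window (assumed form).
        reward -= overrun_penalty_per_unit * (elapsed - time_window)
    else:
        reward += 0.1  # small assumed bonus for finishing within the window
    return reward
```

During backpropagation of the search results, this value would be accumulated along the path from the terminal node to the root, so nodes whose subtrees tend to overrun the window accrue lower average rewards.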
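Claims 1, 8, and 20 recite a UCB that includes an average win rate, guidance toward less-frequently visited nodes, and a time constraint bias weighted by a level of urgency. The sketch below combines these in a standard UCT-style score; the exact formula, the exploration constant `c_explore`, and the linear penalty form are illustrative assumptions, since the claims do not fix a formula.

```python
import math

def time_aware_ucb(win_rate, parent_visits, node_visits,
                   expected_overrun, urgency_weight, c_explore=1.41):
    """Hypothetical time-aware UCB: average win rate, plus an exploration
    bonus favoring rarely visited nodes, minus a time constraint bias.

    expected_overrun: estimated time by which choosing this action would
    push the session past the time window (0 if it fits).
    urgency_weight: weight reflecting the urgency to finish in time.
    """
    exploration = c_explore * math.sqrt(math.log(parent_visits) / node_visits)
    time_penalty = urgency_weight * expected_overrun
    return win_rate + exploration - time_penalty

# An action expected to exceed the window scores lower than one that fits.
fits = time_aware_ucb(0.55, parent_visits=100, node_visits=10,
                      expected_overrun=0.0, urgency_weight=0.1)
overruns = time_aware_ucb(0.55, parent_visits=100, node_visits=10,
                          expected_overrun=30.0, urgency_weight=0.1)
```

Because the penalty is subtracted inside the selection score, the search is steered away from branches expected to exceed the window without hard-pruning them, which matches the claims' framing of the UCB as a bias rather than a filter.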
Description
FIELD OF DISCLOSURE
The present disclosure relates to applying machine learning (ML) or artificial intelligence (AI) to manage game play. The present disclosure further relates to adapting and applying an AI or ML model to conform a computing session of an electronic game to conclude, or otherwise reach a particular game state, within a specified time frame or window.
SUMMARY
Artificial intelligence and/or machine learning models are often used as opponents in gaming sessions. Typically, such AI models can be trained to outmatch players easily. AI models, especially in turn-based electronic games, can analyze thousands of in-game moves (or more) in real time, as well as the probable consequences of those moves, and choose the best strategy for any given instant. Similarly, AI-based opponent models can be trained to play at certain levels of competition (e.g., difficulties) such as novice/easy, intermediate/medium, experienced/hard, and beyond/master. In some cases, games may offer AI-based training for players, which gradually elevates the difficulty to encourage player development and replay. However, dynamically changing the difficulty of a gaming session can squander valuable resources that may otherwise be conserved if the AI model did not change. For example, an AI gaming model (even a model initially set to “novice”) requires computing resources such as processors, memory, input/output connections, and more. Using resources for two or more AI-based opponent models, operating at different difficulty levels (e.g., novice and experienced), during one gaming session is inefficient. Computing resources, e.g., local resources and/or shared cloud resources, can be better optimized by scheduling an electronic gaming session with an AI opponent to fit within a particular time window. Moreover, an AI-based opponent behaving inconsistently throughout the electronic gaming session time window may undermine the game experience.
There exists a need to manage computing resources using a specified time window for a gaming session. In some approaches, game developers have created an AI-based opponent that is conditioned to attempt to win the game as quickly as possible. In some approaches, game developers have created an AI-based opponent that is conditioned to take additional time to make in-game moves. In some approaches, game developers have created an AI-based opponent that behaves erratically and jumps between difficulty levels at different times, e.g., playing as an expert one turn and as a novice the next. Some approaches to managing video game time include managing the “units of content” that may be present in a video game. The system selects certain units of content according to a user's available playing time. For instance, a game system may present a particular quest or challenge that is expected to fit into an allotted amount of time. However, this approach lacks the flexibility and granularity required for providing impromptu gaming sessions, since the units of content have to be defined and configured in the initial game design stage. Additionally, it is questionable whether units of content can be skipped without compromising the integrity of the game, unless those units of content are introductory cutscenes or tutorials, which can typically be skipped manually. Furthermore, this technique may not apply to a wider range of games, such as turn-based games, which do not typically have the flexibility to rearrange actions or select only certain actions. Other approaches may dynamically adjust the game's difficulty level. The goal of this approach is to design a game with a difficulty level that is most likely to keep a user engaged for a longer period of time. For example, some players with a higher skill set prefer to play the game at a more challenging difficulty, while beginner players may prefer to play the game at an easier difficulty.
Although a game's difficulty level may be associated with a duration of a gaming session, it does not control the duration of the gaming session directly. For example, if the difficulty level is too low or too high for the user, in either case the game may end early. One of the major issues that arise in this regard is the unpredictability in allocating resources to AI-based opponents. For example, a game of chess can last anywhere from a few minutes to several hours, making it difficult for the game system to gauge how many resources to dedicate to a particular computing session. Additionally, it is often desirable to play a game against an AI-based opponent at higher difficulty levels, but without the game system allocating too many resources to the AI model, so that the game system may utilize the excess resources for other game functions. Switching between different AI models or game difficulty levels may require a game system to expend more resources than are necessary to restrain a computing session to a particular time period. To hel