JP-7856810-B2 - Attentional neural network with short-term memory units


Inventors

  • Andrea Banino
  • Adrià Puigdomènech Badia
  • Jacob Charles Walker
  • Jovana Mitrović
  • Charles Blundell
  • Timothy Anthony Julian Scholtes

Assignees

  • GDM Holding LLC

Dates

Publication Date
2026-05-11
Application Date
2025-03-05
Priority Date
2021-02-05

Claims (19)

  1. A system for performing a task, the system comprising: one or more computers; and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to implement a neural network configured to perform the task, wherein the neural network comprises: an attention subnetwork configured to, at each of a plurality of time steps, receive an attention subnetwork input generated from an input received by the neural network for the time step, and generate an attention subnetwork output based at least on applying an attention mechanism to the attention subnetwork input; a recurrent subnetwork configured to, at each of the plurality of time steps, receive a recurrent subnetwork input generated from the attention subnetwork output, and generate a recurrent subnetwork output by updating a current hidden state of the recurrent subnetwork for the time step, wherein the current hidden state was generated by processing one or more previous recurrent subnetwork inputs, and updating the current hidden state modifies the current hidden state by processing the received recurrent subnetwork input; and an output subnetwork configured to, at each of the plurality of time steps, receive an output subnetwork input generated from the recurrent subnetwork output, and process the output subnetwork input to generate an output for the task.
  2. The system according to claim 1, wherein the neural network further comprises an encoder subnetwork configured to, at each of the plurality of time steps, process the input received by the neural network and generate an encoded representation of the input.
  3. The system according to claim 2, wherein the attention subnetwork input includes the encoded representation of the input.
  4. The system according to any one of claims 1 to 3, wherein the attention mechanism is a masked attention mechanism.
  5. The system according to any one of claims 1 to 4, wherein the recurrent subnetwork comprises one or more long short-term memory (LSTM) layers.
  6. The system according to any one of claims 1 to 5, wherein the output for the task includes a respective numerical probability value for each output item in a set of possible output items, and wherein performing the task includes selecting an output item for the time step from the set of possible output items based on the respective numerical probability values.
  7. The system according to any one of claims 2 to 6, wherein the neural network further comprises a gating layer configured to apply a gating mechanism to i) the encoded representation of the input and ii) the attention subnetwork output to generate the recurrent subnetwork input.
  8. The system according to claim 7, wherein applying the gating mechanism to the encoded representation of the input and the attention subnetwork output comprises applying a gated recurrent unit (GRU) to the encoded representation of the input and the attention subnetwork output.
  9. The system according to any one of claims 1 to 8, wherein in each of the plurality of time steps, the attention subnetwork input includes an encoded representation of the input and an encoded representation of one or more previous inputs received by the neural network for one or more previous time steps.
  10. A method performed by one or more computers, the method comprising: processing, using an attention subnetwork of a neural network configured to perform a task, an attention subnetwork input generated from an input received by the neural network, and generating an attention subnetwork output based on applying an attention mechanism to the attention subnetwork input; processing, using a recurrent subnetwork of the neural network, a recurrent subnetwork input generated from the attention subnetwork output to update a current hidden state of the neural network and generate a recurrent subnetwork output, wherein the current hidden state was generated by processing one or more previous recurrent subnetwork inputs, and updating the current hidden state modifies the current hidden state by processing the received recurrent subnetwork input; and processing, using an output subnetwork of the neural network, an output subnetwork input generated from the recurrent subnetwork output to generate an output for the task.
  11. The method according to claim 10, further comprising processing, by an encoder subnetwork of the neural network, the input received by the neural network to generate an encoded representation of the input.
  12. The method according to claim 11, wherein the attention subnetwork input includes the encoded representation of the input.
  13. The method according to claim 10, wherein the attention mechanism is a masked attention mechanism.
  14. The method according to claim 10, wherein the recurrent subnetwork comprises one or more long short-term memory (LSTM) layers.
  15. The method according to claim 10, wherein the output for the task includes a respective numerical probability value for each output item in a set of possible output items, and wherein performing the task includes selecting an output item for a time step from the set of possible output items based on the respective numerical probability values.
  16. The method according to claim 11, further comprising applying, by a gating layer of the neural network, a gating mechanism to i) the encoded representation of the input and ii) the attention subnetwork output to generate the recurrent subnetwork input.
  17. The method according to claim 16, wherein applying the gating mechanism to the encoded representation of the input and the attention subnetwork output comprises applying a gated recurrent unit (GRU) to the encoded representation of the input and the attention subnetwork output.
  18. The method according to claim 11, wherein the attention subnetwork input includes an encoded representation of the input and an encoded representation of one or more previous inputs received by the neural network.
  19. One or more computer storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform the operations of the respective method of any one of claims 10 to 18.
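
For orientation, the architecture recited in claims 1 to 9 can be sketched in code. The following is a minimal, illustrative PyTorch sketch, not the patented implementation: the module choices, dimensions, and the exact arrangement of the gating are assumptions layered on the claim language (an encoder, a masked attention mechanism, GRU gating of the encoding and the attention output, an LSTM recurrent subnetwork, and an output head producing per-item probabilities).

```python
# Illustrative sketch of the architecture in claims 1-9. All names,
# sizes, and wiring details are assumptions, not taken from the patent.
import torch
import torch.nn as nn

class AttentionLSTMNetwork(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, num_actions: int):
        super().__init__()
        # Encoder subnetwork (claim 2): encodes the per-time-step input.
        self.encoder = nn.Linear(input_dim, hidden_dim)
        # Attention subnetwork (claim 1); a mask makes it causal (claim 4).
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads=4,
                                               batch_first=True)
        # Gating layer (claims 7-8): a GRU cell combines the encoding and
        # the attention output into the recurrent subnetwork input.
        self.gate = nn.GRUCell(hidden_dim, hidden_dim)
        # Recurrent subnetwork (claim 5): LSTM with a persistent hidden state.
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        # Output subnetwork (claim 6): scores over the set of output items.
        self.head = nn.Linear(hidden_dim, num_actions)

    def forward(self, inputs, state):
        # inputs: (batch, time, input_dim); state: (h, c) of the LSTM.
        encoded = torch.relu(self.encoder(inputs))
        t = inputs.shape[1]
        # Causal mask: each step attends only to itself and earlier steps
        # (True entries are disallowed positions).
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        attended, _ = self.attention(encoded, encoded, encoded, attn_mask=mask)
        h, c = state
        logits = []
        for step in range(t):
            # Gating: the attention output is the GRU cell input and the
            # encoding its state -- one plausible reading of claims 7-8.
            recurrent_in = self.gate(attended[:, step], encoded[:, step])
            h, c = self.lstm(recurrent_in, (h, c))
            logits.append(self.head(h))
        # Per-step probabilities over the possible output items (claim 6).
        return torch.softmax(torch.stack(logits, dim=1), dim=-1), (h, c)
```

For example, with hidden_dim=64 an initial state is (torch.zeros(batch, 64), torch.zeros(batch, 64)); the returned state carries the LSTM's hidden state across calls, matching the claims' notion of a current hidden state generated from previous recurrent subnetwork inputs.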

Description

Cross-Reference to Related Applications: This application claims priority to U.S. Provisional Patent Application No. 63/146,361, filed February 5, 2021, the disclosure of which is considered part of, and is incorporated by reference into, the disclosure of this application.

This specification relates to reinforcement learning. In a reinforcement learning system, an agent interacts with an environment by performing actions that the reinforcement learning system selects in response to receiving observations characterizing the current state of the environment. Some reinforcement learning systems select the action to be performed by the agent in response to a given observation based on the output of a neural network.

A neural network is a machine learning model that uses one or more layers of nonlinear units to predict an output for a received input. Some neural networks are deep neural networks that include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with the current values of a respective set of parameters.

References cited:

  • Vaswani et al., "Attention Is All You Need," arXiv:1706.03762
  • Parisotto et al., "Stabilizing Transformers for Reinforcement Learning," arXiv:1910.06764
  • Song et al., "V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control," arXiv:1909.12238
  • Kapturowski et al., "Recurrent Experience Replay in Distributed Reinforcement Learning," International Conference on Learning Representations, 2018
  • DeepMind Lab, https://arxiv.org/abs/1612.03801

Brief description of the drawings:

  • A diagram illustrating an exemplary reinforcement learning system.
  • A flowchart of an exemplary process for controlling an agent.
  • A flowchart of an exemplary process for determining an update to the parameter values of an action selection neural network.
  • A diagram illustrating the process of determining an update to the parameter values of an action selection neural network.
  • A figure showing a quantitative example of the performance gains achievable with the control neural network system described in this specification.

Like reference numbers and designations in the various drawings indicate like elements.

This specification describes a reinforcement learning system that controls an agent interacting with an environment by, at each of multiple time steps, processing data characterizing the current state of the environment at that time step (an "observation") to select an action to be performed by the agent. At each time step, the state of the environment at the time step depends on the state of the environment at the previous time step and the action performed by the agent at the previous time step.

In some implementations, the environment is a real-world environment and the agent is a mechanical agent interacting with the real-world environment, e.g., a robot navigating through the environment or an autonomous or semi-autonomous land, air, or sea vehicle. In these implementations, the observations may include, for example, one or more of images, object position data, and sensor data captured as the agent interacts with the environment, e.g., sensor data from an image sensor, a distance sensor, or a position sensor, or from an actuator.
For example, in the case of a robot, the observations may include data characterizing the current state of the robot, e.g., one or more of joint position, joint velocity, joint force, torque, or acceleration, for example gravity-compensated torque feedback, and the global or relative pose of an item held by the robot. In the case of a robot or another mechanical agent or vehicle, the observations may similarly include one or more of the position, linear or angular velocity, force, torque, or acceleration, and the global or relative pose of one or more parts of the agent. The observations may be defined in one, two, or three dimensions, and may be absolute and/or relative observations. The observations may also include, for example, sensed electronic signals such as motor current or temperature signals, and/or image or video data, for example from a camera or a LIDAR sensor, e.g., data from sensors of the agent or data from sensors located separately from the agent in the environment. In these implementations, the actions may be control inputs to control the robot, e.g., torques for the joints of the robot or higher-level control commands, or to control the autonomous or semi-autonomous land, air, or sea vehicle, e.g., torques to the control surfaces or other control elements of the vehicle or higher-level control commands. In other words, the actions can include, for example, position, velocity, or force/torque/acceleration data for one or more joints of a robot or parts of another mechanical agent.
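
To make the per-time-step control cycle described above concrete (receive an observation, select an action, act), the following is a minimal sketch of that loop. It assumes a hypothetical env object with reset() and step() methods and the illustrative network sketched after the claims; neither interface comes from the patent.

```python
# Minimal sketch of the observation -> action -> environment-step cycle.
# `env` is a hypothetical environment whose reset()/step() return flat
# float observation tensors; it is an assumption for illustration only.
import torch

def run_episode(env, net, hidden_dim=64, max_steps=100):
    obs = env.reset()  # observation characterizing the current state
    state = (torch.zeros(1, hidden_dim), torch.zeros(1, hidden_dim))
    for _ in range(max_steps):
        # One time step: feed the single observation through the network.
        # (A fuller implementation would also attend over stored encodings
        # of previous observations, as in claim 9.)
        probs, state = net(obs.view(1, 1, -1), state)
        # Select an action from the per-item probabilities (claims 6, 15).
        action = torch.multinomial(probs[0, -1], num_samples=1).item()
        # The environment's next state depends on this state and action.
        obs, reward, done = env.step(action)
        if done:
            break
```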