EP-4741006-A1 - SYSTEMS AND METHODS FOR AUTOMATIC RADIOTHERAPY TREATMENT PLAN GENERATION USING REINFORCEMENT LEARNING BOOSTED WITH A MACHINE LEARNING DOSE PREDICTION MODEL

Abstract

Embodiments described herein provide for radiotherapy treatment plan generation using reinforcement learning. A processor can use (202) a machine learning model (e.g., a neural network, a random forest, a support vector machine, etc.) to predict a three-dimensional dose distribution based on the patient's treatment attributes. The processor can generate (204) a cost function with weighted dose-volume objectives from the predicted three-dimensional dose distribution. The processor can determine (206) a first three-dimensional dose distribution that reduces a first cost value based on the weighted dose-volume objectives. The processor can determine (208) a difference between the predicted and determined dose distributions. The processor can adjust (210), using a reinforcement learning agent, the dose-volume objectives of the cost function. The processor can reduce (212) a cost value of the adjusted cost function to a second cost value. If (214) the difference between the first and second cost values meets a given threshold, the processor can generate (216) a radiotherapy treatment plan.
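The numbered steps (202) to (216) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the application: the four-voxel geometry, the single-intensity delivery model, the fixed values returned by `predict_dose`, the quadratic cost, and the reading of "meets a given threshold" as "the cost change is at most the threshold". In particular, `agent_adjust` is a hand-written rule standing in for a trained reinforcement learning agent.

```python
from typing import Dict, List

Dose = List[float]

# Toy geometry (assumption): two target voxels and two organ-at-risk voxels.
STRUCTURES: Dict[str, List[int]] = {"target": [0, 1], "oar": [2, 3]}
# Toy delivery model (assumption): dose = intensity * PROFILE, one free scalar.
PROFILE: Dose = [1.0, 0.9, 0.5, 0.3]


def predict_dose(attributes: dict) -> Dose:
    """Step 202 stand-in: a real model would map treatment attributes to a dose."""
    return [60.0, 58.0, 20.0, 10.0]


def build_objectives(predicted: Dose) -> Dict[str, dict]:
    """Step 204: one weighted objective per structure, level read from the prediction."""
    return {
        name: {"level": sum(predicted[v] for v in vox) / len(vox), "weight": 1.0}
        for name, vox in STRUCTURES.items()
    }


def cost(dose: Dose, objectives: Dict[str, dict]) -> float:
    """Weighted mean squared deviation of each structure from its objective level."""
    total = 0.0
    for name, obj in objectives.items():
        vox = STRUCTURES[name]
        total += obj["weight"] * sum((dose[v] - obj["level"]) ** 2 for v in vox) / len(vox)
    return total


def optimize(objectives: Dict[str, dict]):
    """Steps 206 and 212: closed-form least-squares minimum over the single intensity."""
    num = den = 0.0
    for name, obj in objectives.items():
        vox = STRUCTURES[name]
        num += obj["weight"] * sum(PROFILE[v] * obj["level"] for v in vox) / len(vox)
        den += obj["weight"] * sum(PROFILE[v] ** 2 for v in vox) / len(vox)
    dose = [num / den * p for p in PROFILE]
    return dose, cost(dose, objectives)


def structure_gaps(dose: Dose, predicted: Dose) -> Dict[str, float]:
    """Step 208: mean absolute difference from the prediction, per structure."""
    return {
        name: sum(abs(dose[v] - predicted[v]) for v in vox) / len(vox)
        for name, vox in STRUCTURES.items()
    }


def agent_adjust(objectives: Dict[str, dict], gaps: Dict[str, float]) -> Dict[str, dict]:
    """Step 210 placeholder (NOT a trained agent): up-weight the worst structure."""
    worst = max(gaps, key=gaps.get)
    adjusted = {name: dict(obj) for name, obj in objectives.items()}
    adjusted[worst]["weight"] *= 1.5
    return adjusted


def generate_plan(attributes: dict, threshold: float = 1.0, max_iters: int = 20) -> dict:
    predicted = predict_dose(attributes)
    objectives = build_objectives(predicted)
    dose, prev_cost = optimize(objectives)
    iteration = 0
    for iteration in range(1, max_iters + 1):
        objectives = agent_adjust(objectives, structure_gaps(dose, predicted))
        dose, new_cost = optimize(objectives)
        if abs(prev_cost - new_cost) <= threshold:  # step 214 (assumed reading)
            break
        prev_cost = new_cost
    # Step 216: the "plan" here is just the adjusted objectives and resulting dose.
    return {"objectives": objectives, "dose": dose, "iterations": iteration}


plan = generate_plan({"site": "toy example"})
print(plan["iterations"], [round(d, 2) for d in plan["dose"]])
```

The `max_iters` cap is a safeguard added for the sketch; the abstract itself only describes the threshold test.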

Inventors

BASIRI, Shahab
CZEIZLER, Elena

Assignees

Siemens Healthineers International AG

Dates

Publication Date: 2026-05-13
Application Date: 2025-11-06

Claims (15)

1. A method comprising: executing, by a processor, a dose prediction machine learning model using one or more treatment attributes for a patient to generate a predicted three-dimensional dose distribution for the patient; generating, by the processor, a set of weighted dose-volume objectives of a cost function based on the predicted three-dimensional dose distribution for the patient; determining, by the processor, a first three-dimensional dose distribution that reduces a first cost value of the cost function; determining, by the processor, a difference between the first three-dimensional dose distribution and the predicted three-dimensional dose distribution; adjusting, by the processor using a reinforcement learning agent, the set of weighted dose-volume objectives of the cost function based on the difference between the first three-dimensional dose distribution and the predicted three-dimensional dose distribution; determining, by the processor, a second three-dimensional dose distribution that reduces a second cost value based on the adjusted set of weighted dose-volume objectives of the cost function; and responsive to determining a difference between the first cost value and the second cost value satisfies a threshold, generating, by the processor, a radiotherapy treatment plan for the patient based on the adjusted set of weighted dose-volume objectives.
2. The method of claim 1, further comprising: executing, by the processor, the dose prediction machine learning model using one or more second treatment attributes for a second patient to generate a second predicted three-dimensional dose distribution for the second patient; generating, by the processor, a second set of weighted dose-volume objectives of a second cost function based on the second predicted three-dimensional dose distribution for the second patient; determining, by the processor, a third three-dimensional dose distribution that reduces a third cost value of the second cost function; determining, by the processor, a second difference between the third three-dimensional dose distribution and the second predicted three-dimensional dose distribution; determining, by the processor, a reward value at least according to the difference between the third three-dimensional dose distribution and the second predicted three-dimensional dose distribution; and training, by the processor, the reinforcement learning agent based on the reward value.
3. The method of claim 2, further comprising: receiving, by the processor, third one or more treatment attributes of a third radiotherapy treatment plan for a third patient; generating, by the processor, a third set of weighted dose-volume objectives based on the third one or more treatment attributes for the third patient; executing, by the processor, the trained reinforcement learning agent to adjust the third set of weighted dose-volume objectives; and generating, by the processor, a third radiotherapy treatment plan for the third patient based on the third adjusted set of weighted dose-volume objectives.
4. The method of claim 2 or claim 3, wherein determining the reward value comprises determining, by the processor, the reward value based on a comparison of the difference between the third three-dimensional dose distribution and a second threshold.
5. The method of any of claim 2 to claim 4, wherein determining the reward value comprises: applying, by the processor, a set of criteria to the third three-dimensional dose distribution; and determining, by the processor, the reward value based on the application of the set of criteria to the third three-dimensional dose distribution.
6. The method of any of claim 2 to claim 5, wherein determining the reward value comprises: responsive to determining the third three-dimensional dose distribution is within the threshold of the second predicted three-dimensional dose distribution, applying, by the processor, a set of criteria to the third three-dimensional dose distribution; and determining, by the processor, the reward value based on the application of the set of criteria to the third three-dimensional dose distribution.
7. The method of any preceding claim, wherein generating the set of weighted dose-volume objectives of the cost function comprises assigning, by the processor, one or more weights according to a stored template of weights that indicates weights to apply to different structures of the patient; and/or: wherein generating the set of weighted dose-volume objectives of the cost function comprises assigning, by the processor, one or more weights according to a stored ranked list of targets for the radiotherapy treatment plan.
8. The method of any preceding claim, wherein adjusting the set of weighted dose-volume objectives comprises inserting, by the processor using the reinforcement learning agent, one or more second objectives and corresponding weights into the set of weighted dose-volume objectives, the one or more second objectives corresponding to different structures of the patient; and/or: wherein each objective of the set of weighted dose-volume objectives corresponds to a different structure within the patient and a different reinforcement learning agent of a plurality of reinforcement learning agents, and wherein adjusting the set of weighted dose-volume objectives of the cost function comprises adjusting, by the processor, the set of weighted dose-volume objectives using the plurality of reinforcement learning agents.
9. A system comprising: one or more processors coupled with memory, the memory comprising instructions that, when executed by the one or more processors, cause the one or more processors to: execute a dose prediction machine learning model using one or more treatment attributes for a patient to generate a predicted three-dimensional dose distribution for the patient; generate a set of weighted dose-volume objectives of a cost function based on the predicted three-dimensional dose distribution for the patient; determine a first three-dimensional dose distribution that reduces a first cost value of the cost function; determine a difference between the first three-dimensional dose distribution and the predicted three-dimensional dose distribution; adjust, using a reinforcement learning agent, the set of weighted dose-volume objectives of the cost function based on the difference between the first three-dimensional dose distribution and the predicted three-dimensional dose distribution; determine a second three-dimensional dose distribution that reduces a second cost value based on the adjusted set of weighted dose-volume objectives of the cost function; and responsive to determining a difference between the first cost value and the second cost value satisfies a threshold, generate a radiotherapy treatment plan for the patient based on the adjusted set of weighted dose-volume objectives.
10. The system of claim 9, wherein the instructions further cause the one or more processors to: execute the dose prediction machine learning model using one or more second treatment attributes for a second patient to generate a second predicted three-dimensional dose distribution for the second patient; generate a second set of weighted dose-volume objectives of a second cost function based on the second predicted three-dimensional dose distribution for the second patient; determine a third three-dimensional dose distribution that reduces a third cost value of the second cost function; determine a second difference between the third three-dimensional dose distribution and the second predicted three-dimensional dose distribution; determine a reward value at least according to the difference between the third three-dimensional dose distribution and the second predicted three-dimensional dose distribution; and train the reinforcement learning agent based on the reward value.
11. The system of claim 10, wherein the instructions further cause the one or more processors to: receive third one or more treatment attributes of a third radiotherapy treatment plan for a third patient; generate a third set of weighted dose-volume objectives based on the third one or more treatment attributes for the third patient; execute the trained reinforcement learning agent to adjust the third set of weighted dose-volume objectives; and generate a third radiotherapy treatment plan for the third patient based on the third adjusted set of weighted dose-volume objectives.
12. The system of claim 10 or claim 11, wherein the instructions cause the one or more processors to determine the reward value by determining the reward value based on a comparison of the difference between the third three-dimensional dose distribution and a second threshold.
13. The system of any of claim 10 to claim 12, wherein the instructions cause the one or more processors to determine the reward value by: applying a set of criteria to the third three-dimensional dose distribution; and determining the reward value based on the application of the set of criteria to the third three-dimensional dose distribution; and/or: wherein the instructions cause the one or more processors to determine the reward value by responsive to determining the third three-dimensional dose distribution is within the threshold of the second predicted three-dimensional dose distribution, applying a set of criteria to the third three-dimensional dose distribution; and determining the reward value based on the application of the set of criteria to the third three-dimensional dose distribution.
14. The system of any of claim 9 to claim 13, wherein the instructions cause the one or more processors to generate the set of weighted dose-volume objectives of the cost function by assigning one or more weights according to a stored template of weights that indicates weights to apply to different structures of the patient; and/or: wherein the instructions cause the one or more processors to generate the set of weighted dose-volume objectives of the cost function by assigning one or more weights according to a stored ranked list of targets for the radiotherapy treatment plan.
15. The system of any of claim 9 to claim 14, wherein the instructions cause the one or more processors to adjust the set of weighted dose-volume objectives by inserting, using the reinforcement learning agent, one or more second objectives and corresponding weights into the set of weighted dose-volume objectives, the one or more second objectives corresponding to different structures of the patient; and/or: wherein each objective of the set of weighted dose-volume objectives corresponds to a different structure within the patient and a different reinforcement learning agent of a plurality of reinforcement learning agents, and wherein the instructions cause the one or more processors to adjust the set of weighted dose-volume objectives of the cost function by adjusting the set of weighted dose-volume objectives using the plurality of reinforcement learning agents.
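The reward determination recited in the training claims (a threshold comparison on the difference, followed by a set of criteria applied to the dose distribution) could be sketched as follows. This is one possible reading rather than the claimed implementation: the mean-absolute-difference measure, the negative reward outside the threshold, the fraction-of-criteria-passed score, and the example dose limits are all assumptions introduced for illustration.

```python
from typing import Callable, List, Sequence

Dose = Sequence[float]
Criterion = Callable[[Dose], bool]


def dose_difference(dose: Dose, predicted: Dose) -> float:
    """Mean absolute voxel-wise difference (assumed distance measure)."""
    return sum(abs(a - b) for a, b in zip(dose, predicted)) / len(dose)


def reward(dose: Dose, predicted: Dose, closeness_threshold: float,
           criteria: List[Criterion]) -> float:
    """Assumed reward shape: penalize doses far from the prediction,
    otherwise score the fraction of criteria the dose satisfies."""
    diff = dose_difference(dose, predicted)
    if diff > closeness_threshold:
        return -diff  # outside the threshold: criteria are not evaluated
    if not criteria:
        return 0.0
    return sum(1 for check in criteria if check(dose)) / len(criteria)


# Hypothetical criteria on a four-voxel toy dose: voxels 0-1 are the target,
# voxels 2-3 are an organ at risk. The limits are illustrative values only.
criteria: List[Criterion] = [
    lambda d: min(d[0:2]) >= 55.0,  # target receives at least 55 (toy units)
    lambda d: max(d[2:4]) <= 20.0,  # organ at risk receives at most 20
]

predicted = [60.0, 58.0, 20.0, 10.0]
print(reward([59.0, 57.0, 21.0, 11.0], predicted, 2.0, criteria))
print(reward([50.0, 48.0, 30.0, 20.0], predicted, 2.0, criteria))
```

Gating the criteria behind the closeness check mirrors the "responsive to determining ... is within the threshold" wording; a training loop would feed the returned value to whichever reinforcement learning algorithm is used.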

Description

TECHNICAL FIELD

This application relates generally to generating a radiotherapy treatment plan using reinforcement learning.

BACKGROUND

Radiation therapy treatment planning (RTTP) is a complex process that involves specific guidelines, protocols, and instructions adopted by different medical professionals, such as clinicians, medical device manufacturers, and the like. Typically, identifying and applying guidelines to implement radiation therapy treatment are performed by complex computer models that receive treatment objectives from a treating physician and identify suitable attributes of the RTTP. For instance, the treating physician may identify the treatment modality (e.g., choose between volumetric modulated arc therapy (VMAT) and intensity-modulated radiation therapy (IMRT)). The treating physician may then input various objectives and goals to be achieved via the treatment, such as dose objectives to be achieved for one or more structures of the patient. A software solution may then use various methods to calculate attributes of the patient's treatment, such as determining beam-limiting device angles and radiation-emitting attributes. In the case of IMRT, the beam delivery directions and number of beams are the specifically relevant variables that must be decided, whereas, for VMAT, the software solution may need to choose the number of arcs and their corresponding start and stop angles.

In personalized radiation therapy plan optimization, achieving and/or scoring the trade-offs between target coverage and organ-at-risk (OAR) sparing heavily depends on the formulation of the cost function. Formulating the cost function can require case-specific optimization objectives, which are not known to the planner prior to the optimization. Therefore, the planner needs to find the objectives through an iterative process. Accordingly, generating a personalized plan can be a time-consuming and resource-intensive process, even if all of the components in the plan generation pipeline are fully automated.

SUMMARY

In accordance with a first aspect of the invention, there is provided a method as defined by claim 1. In accordance with a second aspect of the invention, there is provided a system as defined by claim 9. Optional features are defined by the dependent claims.

A computer model can be configured to generate radiotherapy treatment plans using a cost function to determine radiation dose distributions among patient structures. A user can input initial objectives for the cost function, and the computer model can iteratively adjust the objectives to identify or determine an optimal dose distribution for treating a patient. This process can involve a large amount of time and computer resources depending on how close the initial objectives are to the optimal dose distribution and/or the number of iterations of adjustments the computer model performs to identify the optimal dose distribution.

A computer implementing the systems and methods described herein can use machine learning and reinforcement learning techniques to improve efficiency in generating a radiotherapy treatment plan. The computer can do so using a reinforcement learning model and a dose prediction machine learning model. For example, the computer can receive patient treatment attributes (e.g., computed tomography (CT) images, field geometry settings, dose prescriptions, etc.) and use the treatment attributes as input into the dose prediction machine learning model. The computer can execute the dose prediction machine learning model to generate a predicted three-dimensional dose distribution. The computer can use the predicted three-dimensional dose distribution to create a cost function with weighted objectives for different patient structures (e.g., organs, bones, tumors, etc.).
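As a rough illustration of how a predicted dose could seed a cost function with weighted objectives, the sketch below reads one upper dose-volume objective per structure off the prediction and weights it from a stored template. The objective form ("at most a fraction V of the structure may exceed dose D"), the quadratic penalty, the structure names, and the template values are assumptions for illustration only. Lower objectives for targets are omitted to keep the example short.

```python
from typing import Dict, List

StructureDoses = Dict[str, List[float]]  # per-structure voxel doses (toy units)


def dose_at_volume(doses: List[float], volume_fraction: float) -> float:
    """Dose level D such that at most `volume_fraction` of voxels exceed D."""
    ordered = sorted(doses, reverse=True)
    index = min(len(ordered) - 1, int(volume_fraction * len(ordered)))
    return ordered[index]


def upper_objective_penalty(doses: List[float], level: float,
                            volume_fraction: float) -> float:
    """Quadratic penalty on voxels above `level`, ignoring the hottest
    `volume_fraction` of voxels, which the objective allows to exceed it."""
    ordered = sorted(doses, reverse=True)
    allowed = int(volume_fraction * len(ordered))
    return sum(max(0.0, d - level) ** 2 for d in ordered[allowed:]) / len(ordered)


def objectives_from_prediction(predicted: StructureDoses,
                               weight_template: Dict[str, float],
                               volume_fraction: float = 0.5) -> List[dict]:
    """One upper objective per structure; weights come from a stored template,
    defaulting to 1.0 for structures the template does not list."""
    return [
        {
            "structure": name,
            "level": dose_at_volume(doses, volume_fraction),
            "volume_fraction": volume_fraction,
            "weight": weight_template.get(name, 1.0),
        }
        for name, doses in predicted.items()
    ]


def cost(dose: StructureDoses, objectives: List[dict]) -> float:
    """Weighted sum of the per-objective penalties."""
    return sum(
        o["weight"] * upper_objective_penalty(
            dose[o["structure"]], o["level"], o["volume_fraction"])
        for o in objectives
    )


# Hypothetical predicted doses and weight template.
predicted = {"spinal_cord": [30.0, 20.0, 10.0, 5.0],
             "parotid": [26.0, 24.0, 22.0, 20.0]}
objectives = objectives_from_prediction(predicted, {"spinal_cord": 3.0})
hotter = {"spinal_cord": [30.0, 20.0, 14.0, 5.0],
          "parotid": [26.0, 24.0, 22.0, 20.0]}
print(cost(predicted, objectives), cost(hotter, objectives))
```

By construction, the predicted dose satisfies every objective derived from it, so its cost is zero; any candidate dose that is hotter than the prediction outside the allowed volume is penalized in proportion to the structure's weight.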
The computer can execute or apply an optimization algorithm on the cost function containing the objectives to generate a first three-dimensional dose distribution that reduces (e.g., minimizes) the cost function to a first cost value of implementing the first three-dimensional dose distribution (e.g., treating the patient using the first three-dimensional dose distribution). The computer can implement a reinforcement learning agent to adjust the objectives' values and/or weights of the cost function to identify optimal objectives which can be used to generate an optimal plan for treating the patient. For example, the reinforcement learning agent can determine a difference (e.g., determine a distance using a distance function) between the first three-dimensional dose distribution and the predicted three-dimensional dose distribution. The reinforcement learning agent can adjust the objectives' values and/or weights of the cost function based on the difference, such as to make the three-dimensional dose distribution of the cost function closer to the predicted three-dimensional dose distribution. The reinforcement learning agent can further adjust the objectives' values and/or weig