CN-122006144-A - System and method for automated radiotherapy treatment plan generation using reinforcement learning


Abstract

Embodiments described herein provide systems and methods for automated radiotherapy treatment plan generation using reinforcement learning. A processor may use a machine learning model (e.g., a neural network, random forest, or support vector machine) to predict a three-dimensional dose distribution based on treatment attributes for a patient. The processor may generate, from the predicted three-dimensional dose distribution, a cost function with weighted dose-volume targets. Based on the weighted dose-volume targets, the processor may determine a first three-dimensional dose distribution that reduces the cost function to a first cost value. The processor may determine a difference between the predicted dose distribution and the determined dose distribution. The processor may use a reinforcement learning agent to adjust the weighted dose-volume targets of the cost function, and may then reduce the adjusted cost function to a second cost value. The processor may generate a radiotherapy treatment plan if a difference between the first cost value and the second cost value meets a given threshold.
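The optimize / compare / adjust / re-optimize loop summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: dose grids are hypothetical flat lists of voxel doses in Gy, the distance function is assumed to be a root-mean-square voxel difference, and "meets a threshold" is assumed to mean that the absolute change in cost value is at or below the threshold. `toy_optimize`, `toy_agent`, and `DELIVERABLE` are hypothetical stand-ins; in particular, `toy_agent` is a fixed proportional update used in place of a trained reinforcement learning policy.

```python
from typing import Callable, List, Sequence, Tuple

Dose = List[float]  # hypothetical: flattened voxel doses in Gy


def rms_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square voxel-wise difference between two dose grids."""
    if len(a) != len(b):
        raise ValueError("dose grids must have the same number of voxels")
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def plan_loop(
    targets: Dose,
    predicted: Dose,
    optimize: Callable[[Dose], Tuple[Dose, float]],
    adjust: Callable[[Dose, Dose, Dose], Dose],
    threshold: float,
    max_iters: int = 100,
) -> Tuple[Dose, Dose, float, List[float]]:
    """Optimize, let the agent adjust targets, and stop once the cost value settles."""
    dose, cost = optimize(targets)
    distances = [rms_distance(dose, predicted)]
    for _ in range(max_iters):
        targets = adjust(targets, dose, predicted)  # agent action on the targets
        new_dose, new_cost = optimize(targets)      # re-optimize with adjusted targets
        distances.append(rms_distance(new_dose, predicted))
        if abs(cost - new_cost) <= threshold:       # assumed meaning of "meets a threshold"
            return targets, new_dose, new_cost, distances
        dose, cost = new_dose, new_cost
    return targets, dose, cost, distances


# Toy stand-ins (hypothetical, for illustration only).
DELIVERABLE = [60.0, 25.0]  # per-voxel dose the toy "machine" is biased toward


def toy_optimize(targets: Dose) -> Tuple[Dose, float]:
    # Minimizes sum((d - t)^2 + (d - b)^2) over d; the minimizer is d = (t + b) / 2.
    dose = [(t + b) / 2 for t, b in zip(targets, DELIVERABLE)]
    cost = sum((d - t) ** 2 + (d - b) ** 2 for d, t, b in zip(dose, targets, DELIVERABLE))
    return dose, cost


def toy_agent(targets: Dose, dose: Dose, predicted: Dose) -> Dose:
    # Proportional update standing in for a trained RL policy: push each
    # target by the gap between the predicted and the optimized dose.
    return [t + (p - d) for t, d, p in zip(targets, dose, predicted)]
```

With `predicted = [60.0, 15.0]`, calling `plan_loop(list(predicted), predicted, toy_optimize, toy_agent, threshold=0.01)` lowers the second voxel's target step by step until the optimized dose approaches the predicted 15 Gy. Passing the optimizer and the agent as callables keeps the stopping logic independent of any particular optimizer or learned policy.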

Inventors

S. Brasili
E. Chezler

Assignees

Siemens Healthineers International AG (西门子医疗国际股份有限公司)

Dates

Publication Date: 20260512
Application Date: 20251107
Priority Date: 20241111

Claims (20)

1. A method, comprising: executing, by a processor, a dose prediction machine learning model using one or more treatment attributes for a patient to generate a predicted three-dimensional dose distribution for the patient; generating, by the processor, a weighted dose-volume target set of a cost function based on the predicted three-dimensional dose distribution for the patient; determining, by the processor, a first three-dimensional dose distribution that reduces a first cost value of the cost function; determining, by the processor, a difference between the first three-dimensional dose distribution and the predicted three-dimensional dose distribution; adjusting, by the processor, using a reinforcement learning agent, the weighted dose-volume target set of the cost function based on the difference between the first three-dimensional dose distribution and the predicted three-dimensional dose distribution; determining, by the processor, a second three-dimensional dose distribution that reduces a second cost value based on the adjusted weighted dose-volume target set of the cost function; and responsive to determining that a difference between the first cost value and the second cost value meets a threshold, generating, by the processor, a radiation therapy treatment plan for the patient based on the adjusted weighted dose-volume target set.
2. The method of claim 1, further comprising: executing, by the processor, the dose prediction machine learning model using one or more second treatment attributes for a second patient to generate a second predicted three-dimensional dose distribution for the second patient; generating, by the processor, a second weighted dose-volume target set of a second cost function based on the second predicted three-dimensional dose distribution for the second patient; determining, by the processor, a third three-dimensional dose distribution that reduces a third cost value of the second cost function; determining, by the processor, a second difference between the third three-dimensional dose distribution and the second predicted three-dimensional dose distribution; determining, by the processor, a reward value based at least on the second difference between the third three-dimensional dose distribution and the second predicted three-dimensional dose distribution; and training, by the processor, the reinforcement learning agent based on the reward value.
3. The method of claim 2, further comprising: receiving, by the processor, a third one or more treatment attributes for a third radiation therapy treatment plan for a third patient; generating, by the processor, a third weighted dose-volume target set based on the third one or more treatment attributes for the third patient; executing, by the processor, the trained reinforcement learning agent to adjust the third weighted dose-volume target set; and generating, by the processor, the third radiation therapy treatment plan for the third patient based on the adjusted third weighted dose-volume target set.
4. The method of claim 2, wherein determining the reward value comprises determining, by the processor, the reward value based on a comparison of the second difference between the third three-dimensional dose distribution and the second predicted three-dimensional dose distribution with a second threshold.
5. The method of claim 2, wherein determining the reward value comprises: applying, by the processor, a set of criteria to the third three-dimensional dose distribution; and determining, by the processor, the reward value based on the application of the set of criteria to the third three-dimensional dose distribution.
6. The method of claim 2, wherein determining the reward value comprises: applying, by the processor, a set of criteria to the third three-dimensional dose distribution in response to determining that the third three-dimensional dose distribution is within the threshold of the second predicted three-dimensional dose distribution; and determining, by the processor, the reward value based on the application of the set of criteria to the third three-dimensional dose distribution.
7. The method of claim 1, wherein generating the weighted dose-volume target set of the cost function comprises assigning, by the processor, one or more weights according to a stored weight template, the weight template indicating weights to apply to different structures of the patient.
8. The method of claim 1, wherein generating the weighted dose-volume target set of the cost function comprises assigning, by the processor, one or more weights according to a stored ranked list of targets for the radiotherapy treatment plan.
9. The method of claim 1, wherein adjusting the weighted dose-volume target set comprises inserting, by the processor, one or more second targets and corresponding weights into the weighted dose-volume target set using the reinforcement learning agent, the one or more second targets corresponding to different structures of the patient.
10. The method of claim 1, wherein each target in the weighted dose-volume target set corresponds to a different structure within the patient and a different reinforcement learning agent of a plurality of reinforcement learning agents; and wherein adjusting the weighted dose-volume target set of the cost function includes adjusting, by the processor, the weighted dose-volume target set using the plurality of reinforcement learning agents.
11. A system, comprising: one or more processors coupled with a memory, the memory comprising instructions that, when executed by the one or more processors, cause the one or more processors to: execute a dose prediction machine learning model using one or more treatment attributes for a patient to generate a predicted three-dimensional dose distribution for the patient; generate a weighted dose-volume target set of a cost function based on the predicted three-dimensional dose distribution for the patient; determine a first three-dimensional dose distribution that reduces a first cost value of the cost function; determine a difference between the first three-dimensional dose distribution and the predicted three-dimensional dose distribution; use a reinforcement learning agent to adjust the weighted dose-volume target set of the cost function based on the difference between the first three-dimensional dose distribution and the predicted three-dimensional dose distribution; determine a second three-dimensional dose distribution that reduces a second cost value based on the adjusted weighted dose-volume target set of the cost function; and in response to determining that a difference between the first cost value and the second cost value meets a threshold, generate a radiation therapy treatment plan for the patient based on the adjusted weighted dose-volume target set.
12. The system of claim 11, wherein the instructions further cause the one or more processors to: execute the dose prediction machine learning model using one or more second treatment attributes for a second patient to generate a second predicted three-dimensional dose distribution for the second patient; generate a second weighted dose-volume target set of a second cost function based on the second predicted three-dimensional dose distribution for the second patient; determine a third three-dimensional dose distribution that reduces a third cost value of the second cost function; determine a second difference between the third three-dimensional dose distribution and the second predicted three-dimensional dose distribution; determine a reward value based at least on the second difference between the third three-dimensional dose distribution and the second predicted three-dimensional dose distribution; and train the reinforcement learning agent based on the reward value.
13. The system of claim 12, wherein the instructions further cause the one or more processors to: receive a third one or more treatment attributes of a third radiation therapy treatment plan for a third patient; generate a third weighted dose-volume target set based on the third one or more treatment attributes for the third patient; execute the trained reinforcement learning agent to adjust the third weighted dose-volume target set; and generate the third radiation therapy treatment plan for the third patient based on the adjusted third weighted dose-volume target set.
14. The system of claim 12, wherein the instructions cause the one or more processors to determine the reward value based on a comparison of the second difference between the third three-dimensional dose distribution and the second predicted three-dimensional dose distribution with a second threshold.
15. The system of claim 12, wherein the instructions cause the one or more processors to determine the reward value by: applying a set of criteria to the third three-dimensional dose distribution; and determining the reward value based on the application of the set of criteria to the third three-dimensional dose distribution.
16. The system of claim 12, wherein the instructions cause the one or more processors to determine the reward value by: applying a set of criteria to the third three-dimensional dose distribution in response to determining that the third three-dimensional dose distribution is within the threshold of the second predicted three-dimensional dose distribution; and determining the reward value based on the application of the set of criteria to the third three-dimensional dose distribution.
17. The system of claim 11, wherein the instructions cause the one or more processors to generate the weighted dose-volume target set of the cost function by assigning one or more weights according to a stored weight template that indicates weights to apply to different structures of the patient.
18. The system of claim 11, wherein the instructions cause the one or more processors to generate the weighted dose-volume target set of the cost function by assigning one or more weights according to a stored ranked list of targets for the radiotherapy treatment plan.
19. The system of claim 11, wherein the instructions cause the one or more processors to adjust the weighted dose-volume target set by inserting one or more second targets and corresponding weights into the weighted dose-volume target set using the reinforcement learning agent, the one or more second targets corresponding to different structures of the patient.
20. The system of claim 11, wherein each target in the weighted dose-volume target set corresponds to a different structure within the patient and a different reinforcement learning agent of a plurality of reinforcement learning agents; and wherein the instructions cause the one or more processors to adjust the weighted dose-volume target set of the cost function by adjusting the weighted dose-volume target set using the plurality of reinforcement learning agents.
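Claims 2 and 4 to 6 describe deriving a reward value from the difference between an optimized dose distribution and the predicted one, optionally followed by applying a set of criteria. The following is a minimal sketch under assumptions that the claims do not specify: the difference is a root-mean-square voxel distance, a distance beyond the threshold yields a linear penalty, and a distance within it yields the fraction of criteria satisfied. The function name `reward_value`, the penalty shape, and the example criteria and dose limits are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical criterion: a pass/fail clinical check on a flattened dose grid (Gy).
Criterion = Callable[[Sequence[float]], bool]


def rms_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square voxel-wise difference between two dose grids."""
    if len(a) != len(b):
        raise ValueError("dose grids must have the same number of voxels")
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def reward_value(
    dose: Sequence[float],
    predicted: Sequence[float],
    threshold: float,
    criteria: Sequence[Criterion],
    miss_penalty: float = -1.0,
) -> float:
    """Score an optimized dose against the predicted dose and a set of criteria."""
    distance = rms_distance(dose, predicted)
    if distance > threshold:
        # Outside the threshold (cf. claim 4): negative reward that grows with the excess distance.
        return miss_penalty - (distance - threshold)
    if not criteria:
        return 1.0
    # Within the threshold (cf. claims 5 and 6): fraction of criteria the dose satisfies.
    return sum(1 for check in criteria if check(dose)) / len(criteria)


# Example criteria (hypothetical limits, for illustration only).
criteria = [
    lambda d: max(d) <= 70.0,  # global hot-spot limit
    lambda d: d[2] <= 8.0,     # limit for a voxel assumed to lie in an organ at risk
]
```

Gating the criteria on the distance check mirrors claim 6: clinical checks are only scored once the optimized dose is already close to the prediction, so a training signal for the agent stays dominated by the distance term until then.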

Description

System and method for automated radiotherapy treatment plan generation using reinforcement learning

Technical Field

The present application relates generally to generating radiation therapy treatment plans using reinforcement learning.

Background

Radiation therapy treatment planning (RTTP) is a complex process that involves specific guidelines, protocols, and instructions adopted by different medical professionals, such as clinicians, medical device manufacturers, and the like. Typically, the identification and application of guidelines for delivering radiation therapy treatment is performed by a complex computer model that receives treatment objectives from the treating physician and identifies the appropriate attributes of the RTTP. For example, the treating physician may select a treatment modality (e.g., choose between volumetric modulated arc therapy (VMAT) and intensity-modulated radiation therapy (IMRT)). The treating physician may then enter various targets and objectives to be achieved via treatment, such as a dose target to be achieved for one or more structures of the patient. The software solution may then calculate the attributes of the patient treatment using various methods, such as determining beam-limiting device angles and radiation emission attributes. For IMRT, the beam delivery directions and the number of beams are the relevant variables to be decided, whereas for VMAT, the software solution may need to select the number of arcs and their corresponding start and end angles.

In personalized radiotherapy plan optimization, the trade-off between achieving and/or assessing target coverage and sparing organs at risk (OARs) depends largely on how the cost function is constructed. Constructing the cost function may require case-specific optimization objectives that are not known to the planner before optimization.
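To make the idea of a cost function built from weighted dose-volume objectives concrete, here is a minimal sketch. It assumes a simplified quadratic penalty on the violation of each objective's dose-at-volume value D_v (the minimum dose received by the hottest fraction v of a structure); this is one simple formulation chosen for illustration, not necessarily the one used by the described system, and `DVObjective`, `dose_at_volume`, and `dv_cost` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DVObjective:
    structure: str  # e.g. "PTV", "Rectum" (hypothetical structure names)
    kind: str       # "upper": D_v must not exceed `dose`; "lower": D_v must reach `dose`
    dose: float     # Gy
    volume: float   # fraction of the structure volume, 0 < volume <= 1
    weight: float   # relative priority in the cost function


def dose_at_volume(doses: Sequence[float], volume: float) -> float:
    """D_v: the minimum dose received by the hottest `volume` fraction of voxels."""
    if not doses:
        raise ValueError("structure has no voxels")
    ranked = sorted(doses, reverse=True)
    k = max(0, math.ceil(volume * len(ranked)) - 1)
    return ranked[min(k, len(ranked) - 1)]


def dv_cost(dose_by_structure: Dict[str, Sequence[float]], objectives: List[DVObjective]) -> float:
    """Weighted sum of squared dose-volume violations."""
    total = 0.0
    for obj in objectives:
        d_v = dose_at_volume(dose_by_structure[obj.structure], obj.volume)
        if obj.kind == "upper":
            violation = max(0.0, d_v - obj.dose)
        elif obj.kind == "lower":
            violation = max(0.0, obj.dose - d_v)
        else:
            raise ValueError(f"unknown objective kind: {obj.kind}")
        total += obj.weight * violation ** 2
    return total
```

For a four-voxel PTV receiving 58, 60, 61, and 62 Gy with a lower objective of 60 Gy to 75% of the volume, and a rectum receiving 10, 30, 45, and 50 Gy with an upper objective of 40 Gy at 50% of the volume and weight 1.0, only the rectum objective is violated (D50% = 45 Gy), giving a cost of 25.0. Doubling the rectum weight doubles that term, which is how re-weighting shifts the coverage/sparing trade-off.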
Because these objectives are not known in advance, the planner needs to find them through an iterative process. Accordingly, even if all components in the plan generation pipeline are fully automated, generating a personalized plan can still be a time-consuming and resource-intensive process.

Disclosure of Invention

The computer model may be configured to generate a radiation therapy treatment plan using a cost function to determine a radiation dose distribution among the patient's structures. The user may input initial targets for the cost function, and the computer model may iteratively adjust the targets to identify or determine an optimal dose distribution for treating the patient. This process may consume a significant amount of time and computing resources, depending on how close the initial targets are to the optimal dose distribution and/or on the number of adjustment iterations the computer model performs to identify the optimal dose distribution.

A computer implementing the systems and methods described herein may use machine learning and reinforcement learning techniques to improve the efficiency of generating radiation therapy treatment plans. To do so, the computer may use a reinforcement learning model and a dose prediction machine learning model. For example, the computer may receive patient treatment attributes (e.g., computed tomography (CT) images, field geometry settings, dose prescription, etc.) and use the treatment attributes as input to the dose prediction machine learning model. The computer may execute the dose prediction machine learning model to generate a predicted three-dimensional dose distribution. The computer may use the predicted three-dimensional dose distribution to create a cost function with weighted targets for different patient structures (e.g., organs, bones, tumors, etc.).
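The step of creating weighted targets for different structures from the predicted three-dimensional dose distribution, together with the stored weight template and ranked list of targets recited in claims 7 and 8, could look roughly like the sketch below. It assumes that each target's dose level is read off the predicted dose at a template-specified volume fraction, so the initial targets start at a level the prediction suggests is achievable. `WEIGHT_TEMPLATE`, `initial_objectives`, `weights_from_ranking`, and the geometric rank-to-weight rule are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical stored weight template: structure -> (objective kind, volume fraction, weight).
WEIGHT_TEMPLATE: Dict[str, Tuple[str, float, float]] = {
    "PTV": ("lower", 0.95, 100.0),
    "Rectum": ("upper", 0.50, 10.0),
    "Bladder": ("upper", 0.50, 5.0),
}


def dose_at_volume(doses: Sequence[float], volume: float) -> float:
    """D_v: the minimum dose received by the hottest `volume` fraction of voxels."""
    if not doses:
        raise ValueError("structure has no voxels")
    ranked = sorted(doses, reverse=True)
    k = max(0, math.ceil(volume * len(ranked)) - 1)
    return ranked[min(k, len(ranked) - 1)]


def initial_objectives(
    predicted_by_structure: Dict[str, Sequence[float]],
    template: Dict[str, Tuple[str, float, float]],
) -> List[dict]:
    """Seed one weighted dose-volume objective per templated structure from the predicted dose."""
    objectives = []
    for structure, (kind, volume, weight) in template.items():
        doses = predicted_by_structure.get(structure)
        if not doses:
            continue  # structure not present for this patient
        objectives.append({
            "structure": structure,
            "kind": kind,
            "volume": volume,
            # Assumption: the predicted D_v is treated as an achievable goal.
            "dose": dose_at_volume(doses, volume),
            "weight": weight,
        })
    return objectives


def weights_from_ranking(ranked_structures: Sequence[str], base: float = 2.0) -> Dict[str, float]:
    """Hypothetical rank-to-weight rule: each rank is `base` times more important than the next."""
    n = len(ranked_structures)
    return {name: base ** (n - 1 - i) for i, name in enumerate(ranked_structures)}
```

Structures named in the template but absent for a given patient are skipped rather than raising, so one template can serve several anatomical sites; a ranked list such as `["PTV", "Rectum", "Bladder"]` maps to weights 4.0, 2.0, and 1.0 under the default base.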
The computer may apply an optimization algorithm to the cost function containing the targets to generate a first three-dimensional dose distribution that reduces (e.g., minimizes) the cost function to a first cost value, at which the first three-dimensional dose distribution is achieved (e.g., the patient would be treated using the first three-dimensional dose distribution). The computer may implement a reinforcement learning agent to adjust the target values and/or weights of the cost function to identify optimal targets that may be used to generate an optimal plan for treating the patient. For example, the reinforcement learning agent may determine a difference between the first three-dimensional dose distribution and the predicted three-dimensional dose distribution (e.g., determine a distance using a distance function). The reinforcement learning agent may adjust the target values and/or weights of the cost function based on the difference, such as to bring the three-dimensional dose distribution produced by the cost function closer to the predicted three-dimensional dose distribution. The reinforcement learning agent may further adjust the target value and/