CN-121986383-A - Latent space training of operating room data
Abstract
Various disclosed embodiments provide systems and methods for training a machine learning system to anticipate semantic features of operating room data even in the absence of fully tagged data. In particular, embodiments may compress and then reconstruct available operating room data, even when the data is unlabeled, and in so doing teach the encoding and decoding machine learning system to recognize semantic features that are significant to downstream training and applications. In addition, many of the disclosed embodiments accommodate a wide variety of operating room data formats, such as paired visual intensity video operating room data and depth frame video operating room data. Masking, loss, and other disclosed features may further improve the ability of the system to infer semantic dependencies. Using the various disclosed embodiments, machine learning applications may be enabled that would otherwise not be feasible in low-data operating room conditions.
Inventors
- M. A. Jamal
- O. Mohareri
Assignees
- Intuitive Surgical Operations, Inc.
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-08-26
- Priority Date: 2023-08-28
Claims (20)
- 1. A method for preparing a machine learning system configured to receive operating room data, the method comprising: performing a first training session on the machine learning system with a first set of operating room data, the machine learning system comprising: a first portion configured to create a latent space representation from an operating room data input; and a second portion configured to create a reconstructed representation of the operating room data from the latent space representation.
- 2. The method of claim 1, wherein the operating room data input comprises: a first operating room data input of a first modality; and a second operating room data input of a second modality.
- 3. The method of claim 2, wherein the method further comprises: masking at least a portion of the first operating room data input; and masking at least a portion of the second operating room data input.
- 4. The method of claim 3, wherein the first modality is operating room global visual intensity video data, and wherein the second modality is operating room global depth frame video data.
- 5. The method of claim 3, wherein each of the first modality and the second modality is a different one of the following modalities: operating room global depth data; operating room global visual intensity data; operating room kinematic data; operating room audio data; operating room system event data; operating room text data; operating room visual intensity data depicting an interior of a patient; and operating room depth data depicting an interior of a patient.
- 6. The method of claim 3, wherein masking the at least the portion of the first operating room data input produces a first masked representation, wherein the first masked representation comprises a first tensor having at least one dimension corresponding to time, wherein masking the at least the portion of the second operating room data input produces a second masked representation, and wherein the second masked representation comprises a second tensor having at least one dimension corresponding to time.
- 7. The method of claim 6, wherein performing the first training session comprises: providing the first masked representation as input to the machine learning system; and providing the second masked representation as input to the machine learning system.
- 8. The method of claim 7, wherein masking the at least the portion of the first operating room data input comprises masking at least one complete temporal data frame of the first tensor, and wherein masking the at least the portion of the second operating room data input comprises masking at least one complete temporal data frame of the second tensor.
- 9. The method of claim 7, wherein masking the at least the portion of the first operating room data input comprises masking the same first spatial portion of the first tensor across consecutive temporal data frames of the first tensor, and wherein masking the at least the portion of the second operating room data input comprises masking the same second spatial portion of the second tensor across consecutive temporal data frames of the second tensor.
- 10. The method of claim 9, wherein the first spatial portion and the second spatial portion are the same spatial portion.
- 11. The method of claim 3, the method further comprising: modifying the machine learning system with one or more neural network layers; and performing a second training session on the modified machine learning system.
- 12. The method of claim 11, wherein the second training session is directed to an application involving identifying a contextual state in an operating room.
- 13. The method of claim 11, wherein the second training session is directed to an application involving detection of objects present in an operating room.
- 14. The method of any one of claims 2 to 13, wherein the first portion of the machine learning system comprises an encoder neural network, and wherein the second portion of the machine learning system comprises a decoder neural network.
- 15. The method of claim 14, wherein the first portion of the machine learning system and the second portion of the machine learning system are part of a same autoencoder neural network.
- 16. The method of claim 14, wherein the method further comprises: determining a total loss for updating the machine learning system, wherein determining the total loss comprises determining: a first difference between the first operating room data input and a first reconstructed operating room data instance; and a second difference between the second operating room data input and a second reconstructed operating room data instance.
- 17. The method of claim 16, wherein determining the total loss further comprises: determining a contrastive loss based at least in part on a difference determined from: a first latent space representation of the first operating room data input; and a second latent space representation of the second operating room data input.
- 18. The method of claim 17, wherein determining the contrastive loss comprises: determining a first average value of the first latent space representation; determining a second average value of the second latent space representation; and determining the contrastive loss based on the first average value and the second average value.
- 19. The method of claim 18, wherein determining the total loss further comprises: determining a matching loss, wherein determining the matching loss comprises: determining a binary indication of whether the first and second latent space representations correspond.
- 20. The method of claim 19, wherein determining the binary indication comprises: applying a softmax classifier, the softmax classifier itself being updated based at least in part on the total loss.
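Claims 6 through 10 describe masking strategies over spatiotemporal tensors: masking at least one complete temporal data frame, or masking the same spatial portion across consecutive temporal frames (a "tube" through time). A minimal NumPy sketch of both strategies follows; the function names, the zero-fill mask value, and the random patch placement are illustrative assumptions, not details from the patent.

```python
import numpy as np

def tube_mask(frames, patch, rng):
    """Mask the same spatial patch across all consecutive temporal frames
    of a (T, H, W) tensor -- the "tube" masking of claims 9 and 10."""
    t, h, w = frames.shape
    y = int(rng.integers(0, h - patch + 1))  # random patch location
    x = int(rng.integers(0, w - patch + 1))
    masked = frames.copy()
    masked[:, y:y + patch, x:x + patch] = 0.0  # illustrative mask value
    return masked, (y, x)

def frame_mask(frames, frame_idx):
    """Mask at least one complete temporal data frame (claim 8)."""
    masked = frames.copy()
    masked[frame_idx] = 0.0
    return masked
```

In the two-modality setting of claim 10, the same `(y, x)` tube location could be applied to both the visual intensity tensor and the depth tensor.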
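Claims 1 and 14 through 16 describe a first portion (an encoder) mapping an operating room data input to a latent space representation, a second portion (a decoder) producing a reconstructed representation, and per-modality reconstruction differences contributing to a total loss. A toy sketch with untrained linear maps standing in for the neural networks (all names, shapes, and the mean-squared-error difference are hypothetical choices):

```python
import numpy as np

class TinyAutoencoder:
    """Untrained linear stand-in for the encoder ("first portion") and
    decoder ("second portion") networks of claims 14 and 15."""
    def __init__(self, dim_in, dim_latent, rng):
        self.enc = rng.normal(size=(dim_in, dim_latent)) / np.sqrt(dim_in)
        self.dec = rng.normal(size=(dim_latent, dim_in)) / np.sqrt(dim_latent)

    def encode(self, x):  # data input -> latent space representation
        return x @ self.enc

    def decode(self, z):  # latent space representation -> reconstruction
        return z @ self.dec

def reconstruction_loss(x, x_hat):
    """One per-modality difference term of the total loss (claim 16)."""
    return float(np.mean((x - x_hat) ** 2))

def total_reconstruction_loss(x_first, x_second, ae):
    """Sum of the first and second differences of claim 16."""
    return (reconstruction_loss(x_first, ae.decode(ae.encode(x_first)))
            + reconstruction_loss(x_second, ae.decode(ae.encode(x_second))))
```

A real embodiment would use deep encoder and decoder networks and update them by backpropagating this loss; the linear maps here only illustrate the data flow.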
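Claims 17 through 20 add two further terms to the total loss: a contrastive loss computed from the average values of the two modalities' latent space representations, and a matching loss driven by a softmax classifier producing a binary indication of whether the two representations correspond. A sketch under stated assumptions (the squared-distance measure, mean pooling over tokens, and the single-layer classifier parameterization are illustrative, not taken from the patent):

```python
import numpy as np

def contrastive_loss(z_first, z_second):
    """Claims 17-18: compare the average values (mean-pooled vectors)
    of the two latent space representations."""
    m1, m2 = z_first.mean(axis=0), z_second.mean(axis=0)
    return float(np.sum((m1 - m2) ** 2))  # squared distance is an assumption

def softmax(logits):
    e = np.exp(logits - logits.max())  # stabilized softmax
    return e / e.sum()

def match_probability(z_first, z_second, w):
    """Claims 19-20: a softmax classifier over the concatenated pooled
    latents yields a binary match / no-match indication; `w` is a
    hypothetical (2, 2 * dim) weight matrix that would itself be
    updated based on the total loss (claim 20)."""
    feat = np.concatenate([z_first.mean(axis=0), z_second.mean(axis=0)])
    return softmax(w @ feat)  # [P(no match), P(match)]
```

The total loss of claim 19 would then combine the reconstruction differences, the contrastive loss, and the matching loss, with relative weights left unspecified here.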
Description
Latent space training of operating room data

Cross Reference to Related Applications

The present application claims the benefit of and priority to U.S. Provisional Application No. 63/535,060, entitled "LATENT SPACE TRAINING FOR SURGICAL THEATER DATA," filed August 28, 2023, which is incorporated herein by reference in its entirety for all purposes.

Technical Field

Various disclosed embodiments relate to systems and methods for improving machine learning system training based on operating room data.

Background

Recent advances in machine learning, and particularly deep learning, hold great promise for a wide variety of operating room applications. In combination with data acquired from sensors in the operating room, such applications can identify or predict adverse configurations in the operating room, identify or predict adverse patient states, provide guidance for improving team workflow, characterize operating room states in a preferred taxonomy, provide comparative analysis with other hospitals and operating teams, provide guidance for teams transitioning from a non-robotic operating room to a robotic operating room (and vice versa), and the like. These applications not only improve patient prognosis, but also increase the efficiency of the surgical team, help reduce costs, and make healthcare more predictable, consistent, and cost-effective. Unfortunately, the unique conditions of the operating room and the richness of the collected data often make manual tagging and annotation of the data impractical. The scarcity of such labeled data in turn complicates the training of machine learning systems.
For example, if a machine learning system is to identify one of a plurality of surgical tasks from operating room data (such as surgical video), a trained expert annotator familiar with the details of the surgery and tasks must manually review the operating room data and manually annotate each time slice as corresponding to one of the plurality of tasks. Naturally, such human-in-the-loop annotation risks variation in annotators' subjective labeling, is limited by annotator fatigue, and is limited by the number of professionally trained annotators available to review and label such data. In fact, the situation is further complicated by the multimodal nature of much operating room data. It is difficult enough to ask specialized human annotators to segment surgical video data alone into discrete taxonomic states, because video, unlike still images, carries dense time-varying information. Asking those annotators to additionally annotate the distinct auditory data, kinematic data, depth data, etc. acquired in parallel with the video in an operating room, while also recognizing the unique features of those different data modalities, may not be feasible at all. This is particularly unfortunate, as such information-dense data types may be especially useful for machine learning. Moreover, even when annotators do accomplish the difficult task of annotating a sufficient amount of operating room data for initial training, many of the above applications require, or greatly benefit from, subsequent online training of the machine learning system as additional operating room data becomes available. Such new data may be specific to the particular conditions of the healthcare environment in which the machine learning system is deployed (e.g., a particular hospital or operating room in which the system has been deployed).
Thus, training on this new data can greatly facilitate localizing the deployed machine learning system to the specific features of its local environment. Unfortunately, few hospitals have the resources to annotate this newly acquired data for additional training of the machine learning system. Accordingly, there is a need for systems and methods that overcome challenges and difficulties such as those described above. For example, systems are needed that facilitate various downstream operating room analysis applications without always requiring heavy participation by large numbers of human annotators.

Drawings

The various embodiments herein may be better understood by referring to the following detailed description in conjunction with the accompanying drawings, in which like reference numerals identify identical or functionally similar elements: FIG. 1A is a schematic view of various elements that may appear, in some embodiments, in an operating room during a surgical procedure; FIG. 1B is a schematic view of various elements that may appear, in some embodiments, in an operating room during a surgical procedure employing a robotic surgical system; FIG. 2A is a schematic depth map rendered from an example operating room global sensor perspective that may be used in some embodime