KR-102961402-B1 - METHOD AND APPARATUS FOR RECOGNIZING SURGICAL PHASES BASED ON VISUAL MULTIMODALITY

KR 102961402 B1

Abstract

A method and apparatus for recognizing surgical phases based on visual multimodality are disclosed. According to one embodiment of the present disclosure, the method may include: extracting a plurality of visual kinematics-based indices from a surgical image composed of a plurality of frames corresponding to a plurality of surgical phases; acquiring first feature data for the surgical image and second feature data for the plurality of visual kinematics-based indices; acquiring fused third feature data by applying a fusion module, trained to fuse data, to the first feature data and the second feature data; and training a first artificial intelligence (AI) model to recognize each of the plurality of surgical phases based on the third feature data.
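The two-stream pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: the dimensions, the random linear maps standing in for the visual and kinematics backbones, and the single-layer ReLU "fusion module" are assumptions for demonstration, not the patent's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: T frames, visual feature size D_V,
# kinematics feature size D_K, N_PHASES surgical phases.
T, D_V, D_K, N_PHASES = 8, 16, 6, 4

def extract_visual_features(frames):
    """Stand-in for the backbone producing per-frame visual features
    ("first feature data")."""
    return frames @ rng.standard_normal((frames.shape[1], D_V))

def extract_kinematics_features(indices):
    """Stand-in for the encoder of visual kinematics-based indices
    ("second feature data")."""
    return indices @ rng.standard_normal((indices.shape[1], D_K))

def fuse(f_visual, f_kin):
    """Concatenate both streams and apply one ReLU layer as a stand-in
    MLP fusion module ("third feature data")."""
    concat = np.concatenate([f_visual, f_kin], axis=1)
    w = rng.standard_normal((concat.shape[1], D_V))
    return np.maximum(concat @ w, 0.0)

def classify_phases(fused):
    """Linear head mapping fused features to per-frame phase logits."""
    return fused @ rng.standard_normal((fused.shape[1], N_PHASES))

frames = rng.standard_normal((T, 32))      # flattened frame stand-ins
kin_indices = rng.standard_normal((T, 5))  # e.g. speed, path length, ...

fused = fuse(extract_visual_features(frames),
             extract_kinematics_features(kin_indices))
logits = classify_phases(fused)
phases = logits.argmax(axis=1)             # one predicted phase per frame
print(fused.shape, logits.shape, phases.shape)
```

In a trained system the three stages would of course be learned jointly; the sketch only shows how the first, second, and third feature data relate shape-wise.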

Inventors

  • 박보규
  • 지현규
  • 박보경
  • 이지원
  • 최민국

Assignees

  • (주)휴톰 (Hutom Co., Ltd.)

Dates

Publication Date
2026-05-11
Application Date
2022-11-22

Claims (10)

  1. A method for recognizing surgical phases based on visual multimodality, performed by a device, the method comprising: extracting a plurality of visual kinematics-based indices from a surgical image composed of a plurality of frames corresponding to a plurality of surgical phases; acquiring first feature data for the surgical image and second feature data for the plurality of visual kinematics-based indices; acquiring fused third feature data by applying a fusion module, trained to fuse data, to the first feature data and the second feature data; and training a first artificial intelligence (AI) model to recognize each of the plurality of surgical phases based on the third feature data, wherein extracting the plurality of visual kinematics-based indices comprises: acquiring semantic segmentation mask data by inputting the surgical image composed of the plurality of frames into a second AI model trained to perform a semantic segmentation algorithm; and extracting the plurality of visual kinematics-based indices from the semantic segmentation mask data corresponding to one or more surgical instruments included in the surgical image.
  2. (canceled)
  3. The method of claim 1, wherein the plurality of visual kinematics-based indices include movement and interrelationship information of one or more surgical instruments.
  4. The method of claim 3, wherein acquiring the first feature data and the second feature data comprises inputting each of the surgical image and the plurality of visual kinematics-based indices into a third AI model to acquire the first feature data and the second feature data, and wherein the third AI model comprises at least one of a transformer, a convolutional neural network (CNN) model, and a long short-term memory (LSTM) model.
  5. The method of claim 4, wherein acquiring the third feature data comprises: concatenating the first feature data and the second feature data; and acquiring the third feature data by applying the fusion module to the concatenated first feature data and second feature data, wherein the fusion module comprises a multi-layer perceptron-based fusion module.
  6. The method of claim 4, wherein the fusion module applies a stop-gradient algorithm to the first feature data and the second feature data to acquire reinforcement data for strengthening the interaction between the first feature data and the second feature data, and acquires the third feature data by performing a convolution operation on the reinforcement data.
  7. The method of claim 6, further comprising calculating a user's surgical skill score for at least one surgical instrument based on the movement path and movement pattern of the at least one surgical instrument associated with the plurality of visual kinematics-based indices.
  8. The method of claim 7, wherein the first AI model trained based on the third feature data outputs, when the device inputs a specific frame of another surgical image, information about the surgical phase represented by the specific frame.
  9. A device for recognizing surgical phases based on visual multimodality, the device comprising: one or more communication modules; one or more memories; and one or more processors, wherein the one or more processors are configured to: extract a plurality of visual kinematics-based indices from a surgical image composed of a plurality of frames corresponding to a plurality of surgical phases; acquire first feature data for the surgical image and second feature data for the plurality of visual kinematics-based indices; acquire fused third feature data by applying a fusion module, trained to fuse data, to the first feature data and the second feature data; and train a first artificial intelligence (AI) model to recognize each of the plurality of surgical phases based on the third feature data, and wherein, when extracting the plurality of visual kinematics-based indices, the one or more processors input the surgical image composed of the plurality of frames into a second AI model trained to perform a semantic segmentation algorithm to acquire semantic segmentation mask data, and extract the plurality of visual kinematics-based indices from the semantic segmentation mask data corresponding to one or more surgical instruments included in the surgical image.
  10. A computer program stored on a computer-readable recording medium, the program being combined with a hardware device to execute the method for recognizing surgical phases based on visual multimodality of any one of claims 1 and 3 through 8.
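The mask-to-index step of claim 1 can be illustrated with a toy example: deriving simple visual kinematics-based indices from per-frame binary segmentation masks of an instrument. The specific indices (path length and mean speed of the mask centroid) and the helper names are illustrative assumptions, not the patent's actual index definitions.

```python
import numpy as np

def tool_centroids(masks):
    """Per-frame (x, y) centroid of a binary instrument mask, a stand-in
    for the semantic-segmentation output of the second AI model."""
    cents = []
    for m in masks:
        ys, xs = np.nonzero(m)
        cents.append((xs.mean(), ys.mean()))
    return np.array(cents)

def kinematics_indices(centroids):
    """Two illustrative visual kinematics-based indices: total path
    length and mean per-frame speed of the instrument trajectory."""
    steps = np.diff(centroids, axis=0)          # frame-to-frame motion
    speeds = np.linalg.norm(steps, axis=1)      # pixels per frame
    return {"path_length": speeds.sum(), "mean_speed": speeds.mean()}

# Toy 3-frame sequence: a 1-pixel "instrument" moving right 1 px per frame.
masks = np.zeros((3, 5, 5), dtype=bool)
for t in range(3):
    masks[t, 2, 1 + t] = True

idx = kinematics_indices(tool_centroids(masks))
print(idx)  # path_length 2.0, mean_speed 1.0
```

A real system would compute such indices per instrument class from the mask channels and feed them, alongside the image features, into the fusion module.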

Description

Method and Apparatus for Recognizing Surgical Phases Based on Visual Multimodality

The present disclosure relates to a method and apparatus for recognizing surgical phases. More specifically, the present disclosure relates to a method and apparatus for recognizing surgical phases based on visual multimodality.

Accurate recognition and analysis of surgical phases can optimize the progress of a surgery by enabling efficient communication and accurate situational judgment among the parties involved. Accurately recognized surgical phases are also useful for monitoring the patient after surgery and for classifying general surgical procedures to produce educational materials. However, recognizing surgical phases is a challenging task that involves the interaction of surgical instruments and organs within the operating field, as well as activities such as camera cleaning and bleeding management. Although technologies for automatically recognizing surgical phases by analyzing surgical images have previously been studied, they were limited in that they could not account for all of the aforementioned interactions related to the surgical phases.

FIG. 1 is a schematic diagram of a system for implementing a method for recognizing surgical phases based on visual multimodality according to one embodiment of the present disclosure. FIG. 2 is a block diagram illustrating the configuration of a device for recognizing surgical phases based on visual multimodality according to one embodiment of the present disclosure. FIG. 3 is a flowchart illustrating a method for recognizing surgical phases based on visual multimodality according to one embodiment of the present disclosure. FIG. 4 is a diagram showing the overall structure of a method for recognizing surgical phases based on visual multimodality. FIG. 5 is a diagram illustrating a process for extracting feature data from a surgical image to recognize a surgical phase according to one embodiment of the present disclosure. FIG. 6 is a diagram illustrating the process of extracting third feature data through a fusion module according to one embodiment of the present disclosure. FIG. 7 is a diagram illustrating the process by which a device recognizes surgical phases through a trained AI model according to one embodiment of the present disclosure.

Throughout this disclosure, the same reference numerals denote the same components. This disclosure does not describe all elements of the embodiments; content that is common knowledge in the art to which this disclosure pertains, or that overlaps between embodiments, is omitted. The terms 'part', 'module', 'component', and 'block' used in the specification may be implemented in software or hardware, and depending on the embodiment, a plurality of 'parts, modules, components, blocks' may be implemented as a single component, or a single 'part, module, component, block' may include a plurality of components.

Throughout the specification, when a part is described as being "connected" to another part, this includes not only cases where they are directly connected but also cases where they are indirectly connected, and indirect connection includes connection via a wireless communication network. Furthermore, when a part is said to "include" a certain component, this means that, unless specifically stated otherwise, other components are not excluded and additional components may be included. Throughout the specification, when a component is said to be located "on" another component, this includes not only the case where the component is in contact with the other component, but also the case where a further component exists between the two. The terms "first", "second", and so on are used to distinguish one component from another, and the components are not limited by these terms. Singular expressions include plural expressions unless the context clearly indicates otherwise. The identification codes used for each step are for convenience of explanation and do not describe the order of the steps; the steps may be performed in a different order unless a specific order is clearly indicated in the context.

The operating principles and embodiments of the present disclosure will be described below with reference to the attached drawings. In this specification, the term "device according to the present disclosure" includes all of the various devices capable of performing computational processing and providing results to a user. For example, the device according to the present disclosure may include, or take the form of any one of, a computer, a server device, and a portable terminal. Here, the computer may include, for example, a notebook, desktop, laptop, tablet PC, or slate PC equipped with a web browser. The server device is a server that processes information by communicating with an external device, and may include an application server, a computing server, a database server, a