
CN-122023531-A - Three-dimensional positioning method, device, terminal and medium for surgical instrument

CN122023531A

Abstract

The invention provides a three-dimensional positioning method, device, terminal and medium for a surgical instrument. The method comprises: inputting a current frame image and historical multi-frame images, acquired by an endoscope camera and containing the surgical instrument, into an instrument key point detection network based on a Transformer architecture to obtain two-dimensional key point coordinates of the surgical instrument in the images; calculating, based on the two-dimensional key point coordinates in the current frame image and the historical multi-frame images, three-dimensional key point observation coordinates of the surgical instrument in the camera coordinate system by means of a multi-frame triangulation method; and inputting the three-dimensional coordinates into an extended Kalman filter model for temporal fusion to obtain a three-dimensional pose estimate of the surgical instrument. By fusing deep-learning detection, multi-frame triangulation and extended Kalman filtering, the invention achieves three-dimensional positioning of the surgical instrument that remains accurate and robust in abdominal-cavity scenes with complex illumination, frequent occlusion and rapid instrument motion, without requiring a stereo camera or markers.
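
The multi-frame triangulation summarized above can be illustrated with a minimal direct-linear-transform (DLT) sketch: each 2-D observation contributes two linear constraints built from that frame's projection matrix, and the homogeneous 3-D point is recovered by SVD. This is an illustrative sketch, not the patented implementation; the function name, intrinsic matrix and per-frame poses below are assumptions.

```python
import numpy as np

def triangulate_multiframe(K, poses, pixels):
    """Triangulate one key point observed in several frames (DLT).

    K      : (3, 3) camera intrinsic matrix
    poses  : list of (R, t) camera poses, one per frame
    pixels : (N, 2) 2-D key point coordinates, one row per frame
    Returns the 3-D point in the reference coordinate system.
    """
    rows = []
    for (R, t), (u, v) in zip(poses, pixels):
        P = K @ np.hstack([R, t.reshape(3, 1)])   # 3x4 projection matrix
        rows.append(u * P[2] - P[0])              # two linear constraints
        rows.append(v * P[2] - P[1])              # per 2-D observation
    A = np.vstack(rows)
    _, _, Vt = np.linalg.svd(A)                   # homogeneous least squares
    X = Vt[-1]                                    # smallest singular vector
    return X[:3] / X[3]                           # dehomogenize
```

Using more than two frames simply stacks more rows into `A`, which is what makes the estimate robust to a single noisy detection.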

Inventors

  • ZHAO YUQI
  • LIU MINGXIANG
  • WANG LILI
  • TAN JIANI
  • FU MINYUE

Assignees

  • Southern University of Science and Technology (南方科技大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-06

Claims (10)

  1. A method for three-dimensional positioning of a surgical instrument, the method comprising: acquiring a current frame image and historical multi-frame images which are acquired by an endoscope camera and contain a surgical instrument, and inputting the current frame image and the historical multi-frame images into a pre-trained instrument key point detection network to detect two-dimensional key point coordinates of the surgical instrument in the current frame image and in the historical multi-frame images, wherein the instrument key point detection network is a neural network model based on a Transformer architecture; calculating, based on the two-dimensional key point coordinates of the surgical instrument in the current frame image and in the historical multi-frame images, three-dimensional key point observation coordinates of the surgical instrument in the camera coordinate system by means of a multi-frame triangulation method; and inputting the three-dimensional key point observation coordinates into a pre-constructed extended Kalman filter model for temporal fusion to obtain a three-dimensional pose estimate of the surgical instrument, wherein the state quantity of the extended Kalman filter model comprises the three-dimensional pose of the surgical instrument.
  2. The method for three-dimensional positioning of a surgical instrument according to claim 1, wherein the instrument key point detection network is a YOLOX-STrans network, and the YOLOX-STrans network comprises a convolution-based backbone network, a Swin Transformer-embedded feature extraction module, a feature pyramid structure, and a detection head for predicting two-dimensional key point coordinates.
  3. The method according to claim 1, wherein the calculating, based on the two-dimensional key point coordinates of the surgical instrument in the current frame image and in the historical multi-frame images, the three-dimensional key point observation coordinates of the surgical instrument in the camera coordinate system by means of the multi-frame triangulation method comprises: calculating camera pose transformations between temporally adjacent image frames based on the two-dimensional key point coordinates of the surgical instrument in the current frame image and in the historical multi-frame images; constructing a triangulation projection matrix from the camera intrinsic matrix and the camera pose transformations, wherein the camera intrinsic matrix is the matrix determined when the endoscope camera undergoes intrinsic calibration; and performing a multi-frame triangulation operation, based on the projection matrix, on the two-dimensional key point coordinates of the surgical instrument in the continuous multi-frame sequence formed by the current frame image and the historical multi-frame images, to obtain the three-dimensional key point observation coordinates of the surgical instrument in the camera coordinate system.
  4. The method of claim 1, wherein, in performing the multi-frame triangulation operation, the three-dimensional homogeneous coordinates are solved using linear algebra.
  5. The method according to claim 1, wherein the extended Kalman filter model uses an approximately uniform-velocity motion model or a uniform-acceleration motion model as the state prediction model.
  6. The method for three-dimensional positioning of a surgical instrument according to any one of claims 1 to 5, wherein, after the performing temporal fusion on the three-dimensional key point observation coordinates by using the extended Kalman filter model to obtain the three-dimensional pose estimate of the surgical instrument, the method further comprises: calculating an expected pose of the endoscope camera based on the three-dimensional pose estimate of the surgical instrument and a preset view-angle planning strategy, wherein the preset view-angle planning strategy is a strategy for keeping the distal end of the surgical instrument in the central area of the camera imaging plane and limiting the included angle between the camera optical axis and the surgical instrument axis within a preset range; and generating joint control instructions based on the expected pose and sending the joint control instructions to a surgical robot controller, so that the surgical robot controller drives an endoscope mechanical arm according to the joint control instructions to move the endoscope camera to the expected pose.
  7. The method for three-dimensional positioning of a surgical instrument according to claim 6, wherein the joint control instructions comprise joint position control instructions or joint velocity control instructions.
  8. A three-dimensional positioning device for a surgical instrument, the device comprising: a two-dimensional key point detection module, configured to acquire a current frame image and historical multi-frame images which are acquired by an endoscope camera and contain a surgical instrument, and to input the current frame image and the historical multi-frame images into a pre-trained instrument key point detection network so as to detect two-dimensional key point coordinates of the surgical instrument in the current frame image and in the historical multi-frame images; a three-dimensional key point calculation module, configured to calculate, based on the two-dimensional key point coordinates of the surgical instrument in the current frame image and in the historical multi-frame images, three-dimensional key point observation coordinates of the surgical instrument in the camera coordinate system by means of a multi-frame triangulation method; and a three-dimensional pose determination module, configured to input the three-dimensional key point observation coordinates into a pre-constructed extended Kalman filter model for temporal fusion to obtain a three-dimensional pose estimate of the surgical instrument, wherein the state quantity of the extended Kalman filter model comprises the three-dimensional pose of the surgical instrument.
  9. A terminal comprising a memory, a processor, and a surgical instrument three-dimensional positioning program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the surgical instrument three-dimensional positioning method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program which, when executed, implements the steps of the surgical instrument three-dimensional positioning method according to any one of claims 1 to 7.
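
The view-angle planning strategy of claims 6 and 7 can be sketched geometrically: aim the camera optical axis straight at the instrument tip (so the tip projects to the image centre) and, if the angle between the optical axis and the instrument axis exceeds the preset limit, rotate the viewing direction in-plane until the limit is met. The function below is an illustrative sketch, not the patented controller; the function names, frame conventions and the 30° default limit are assumptions.

```python
import numpy as np

def rotate(v, k, theta):
    """Rodrigues rotation of vector v about unit axis k by angle theta."""
    return (v * np.cos(theta) + np.cross(k, v) * np.sin(theta)
            + k * (k @ v) * (1.0 - np.cos(theta)))

def plan_camera_pose(tip, axis, cam_pos, max_angle_deg=30.0):
    """Expected camera pose: look straight at the instrument tip (tip at
    the image centre) while clamping the optical-axis / instrument-axis
    angle. Geometry-only sketch; assumes tip, axis and cam_pos are
    non-degenerate and expressed in one common world frame."""
    z = tip - cam_pos
    z = z / np.linalg.norm(z)                   # optical axis toward the tip
    a = axis / np.linalg.norm(axis)
    ang = np.degrees(np.arccos(np.clip(z @ a, -1.0, 1.0)))
    if ang > max_angle_deg:
        # rotate the optical axis toward the instrument axis, in their plane
        n = np.cross(z, a)
        n = n / np.linalg.norm(n)
        z = rotate(z, n, np.radians(ang - max_angle_deg))
        # keep the original working distance while re-aiming at the tip
        cam_pos = tip - np.linalg.norm(tip - cam_pos) * z
    # complete a right-handed camera frame (x, y image axes, z forward)
    up = np.array([0.0, 0.0, 1.0]) if abs(z[2]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(up, z)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z], axis=1)             # camera-to-world rotation
    return R, cam_pos
```

In a full system, the returned pose would be mapped to joint position or velocity commands for the endoscope arm by inverse kinematics; only the geometric target is sketched here.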

Description

Three-dimensional positioning method, device, terminal and medium for surgical instrument

Technical Field

The invention relates to the technical field of vision and intelligent control for medical robots, and in particular to a three-dimensional positioning method, device, terminal and medium for surgical instruments.

Background

With the development of robot-assisted surgery, the endoscopic vision system has become the only visual source through which doctors observe tissue structures, locate surgical instruments, and perform fine operations. The prior art generally relies on manual stepping or hand-operated equipment to adjust the position of the endoscope camera so as to keep the field of view in a suitable area. This approach has obvious limitations: doctors must frequently switch between operating surgical instruments and controlling the endoscope, incurring an extra cognitive burden; camera adjustment may be delayed or inaccurate, potentially reducing surgical efficiency; and the manual actions disturb the surgical rhythm and interrupt operative continuity. Although existing automatic camera adjustment and intelligent camera-following technologies can solve the problems of adjusting the endoscope camera by manual stepping or hand-operated equipment, they still have drawbacks: stereoscopic vision schemes require complex hardware and are prone to interference; marker-based methods increase sterilization risk and the markers easily fall off; single-frame deep learning yields only two-dimensional positions and is not robust to occlusion; and the lack of a temporal fusion mechanism causes detection jumps that degrade camera control stability.
Disclosure of Invention

The invention aims to provide a three-dimensional positioning method, device, terminal and medium for surgical instruments that maintain high-precision, high-robustness three-dimensional positioning in abdominal-cavity scenes with complex illumination, frequent occlusion and rapid instrument movement, without requiring a stereo camera or markers. The technical solution adopted to solve the technical problem is as follows: In a first aspect, the invention discloses a three-dimensional positioning method for a surgical instrument, the method comprising the following steps: acquiring a current frame image and historical multi-frame images which are acquired by an endoscope camera and contain a surgical instrument, and inputting the current frame image and the historical multi-frame images into a pre-trained instrument key point detection network to detect two-dimensional key point coordinates of the surgical instrument in the current frame image and in the historical multi-frame images, wherein the instrument key point detection network is a neural network model based on a Transformer architecture; calculating, based on the two-dimensional key point coordinates of the surgical instrument in the current frame image and in the historical multi-frame images, three-dimensional key point observation coordinates of the surgical instrument in the camera coordinate system by means of a multi-frame triangulation method; and inputting the three-dimensional key point observation coordinates into a pre-constructed extended Kalman filter model for temporal fusion to obtain a three-dimensional pose estimate of the surgical instrument, wherein the state quantity of the extended Kalman filter model comprises the three-dimensional pose of the surgical instrument.
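
The temporal-fusion step with a uniform-velocity state prediction model can be sketched as follows. For a linear motion model and a direct observation of key-point position, the extended Kalman filter reduces to the standard Kalman recursion; the class below is a deliberately minimal sketch for a single key point, and the noise parameters, time step and class name are illustrative assumptions rather than the patent's filter.

```python
import numpy as np

class ConstantVelocityEKF:
    """Fuse noisy triangulated 3-D key point observations over time.
    State: [x, y, z, vx, vy, vz] with a constant-velocity motion model."""

    def __init__(self, x0, dt=1.0 / 30.0, q=1e-3, r=1e-2):
        self.x = np.hstack([x0, np.zeros(3)])   # start with zero velocity
        self.P = np.eye(6)                      # initial state covariance
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)         # x_{k+1} = x_k + v_k * dt
        self.Q = q * np.eye(6)                  # process noise (assumed)
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
        self.R = r * np.eye(3)                  # observation noise (assumed)

    def step(self, z):
        # predict with the motion model
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # update with the triangulated 3-D observation z
        y = z - self.H @ self.x                 # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]                       # filtered 3-D position
```

The fusion suppresses frame-to-frame detection jumps, and the predict step can bridge short occlusions by propagating the last estimated velocity.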
Optionally, the instrument key point detection network is a YOLOX-STrans network, and the YOLOX-STrans network comprises a convolution-based backbone network, a Swin Transformer-embedded feature extraction module, a feature pyramid structure, and a detection head for predicting two-dimensional key point coordinates. Optionally, the calculating, based on the two-dimensional key point coordinates of the surgical instrument in the current frame image and in the historical multi-frame images, and by means of a multi-frame triangulation method, the three-dimensional key point observation coordinates of the surgical instrument in the camera coordinate system includes: calculating camera pose transformations between temporally adjacent image frames based on the two-dimensional key point coordinates of the surgical instrument in the current frame image and in the historical multi-frame images; constructing a triangulation projection matrix from the camera intrinsic matrix and the camera pose transformations, wherein