
CN-122005085-A - Multi-mode fusion puncture tissue characteristic identification system

CN122005085A

Abstract

The application provides a multi-mode fusion puncture tissue characteristic identification system comprising a model training module. The module acquires unlabeled force, vibration, and impedance signals together with sample data carrying coarse tissue type labels. An encoder network is trained by cross-modal contrastive learning, in which the features of different modal signals acquired at the same moment serve as positive samples and those acquired at different moments serve as negative samples, yielding a robust, modality-independent fused feature representation. A causal discovery algorithm then partitions this fused representation into a causal feature subset and a non-causal feature subset, from which a causal-knowledge-based regularization loss function is constructed; its two terms drive a learnable causal attention vector toward 1 on the dimensions of the causal subset and toward 0 on the dimensions of the non-causal subset. Finally, the encoder network is fine-tuned under supervision with the coarsely labeled data to produce a tissue recognition model. The application can thus build a tissue recognition model with high accuracy and strong generalization without massive, precisely annotated data, addressing the difficulty of training and deploying such models when high-quality labeled data are hard to obtain.
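The abstract compresses the training pipeline into one paragraph. As a reading aid only, the following is a minimal sketch of the same-moment-positive / different-moment-negative contrastive objective it describes, assuming a PyTorch setup in which each modality has already been encoded into a batch of time-aligned embeddings; the function name, the symmetrized InfoNCE form, and the temperature value are assumptions introduced here, not details from the patent.

```python
# Minimal sketch of the cross-modal contrastive objective (InfoNCE-style).
# Assumes z_force, z_vib, z_imp are (B, D) embeddings where row i of each
# tensor comes from the same acquisition moment: positive pairs lie on the
# diagonal of the similarity matrix, everything off-diagonal is a negative.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(z_force, z_vib, z_imp, temperature=0.1):
    loss = 0.0
    pairs = [(z_force, z_vib), (z_force, z_imp), (z_vib, z_imp)]
    for za, zb in pairs:
        za = F.normalize(za, dim=1)
        zb = F.normalize(zb, dim=1)
        logits = za @ zb.t() / temperature       # (B, B) cosine similarities
        targets = torch.arange(za.size(0))       # diagonal = same-moment pairs
        # Symmetrized so each modality is pulled toward the other two.
        loss = loss + F.cross_entropy(logits, targets) \
                    + F.cross_entropy(logits.t(), targets)
    return loss / (2 * len(pairs))
```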

Inventors

  • ZHANG CONGHUI
  • CUI HONGPING

Assignees

  • Shanghai East Hospital (East Hospital Affiliated to Tongji University)

Dates

Publication Date
2026-05-12
Application Date
2026-04-13

Claims (10)

  1. A multi-mode fusion puncture tissue characteristic recognition system, characterized by comprising a model training module, wherein the model training module is configured to: acquire unlabeled force signals, unlabeled vibration signals, unlabeled impedance signals, and sample data with coarse tissue type labels; train an encoder network by cross-modal contrastive learning from the force signal, the vibration signal, and the impedance signal to generate a robust, modality-independent fused feature representation, wherein the cross-modal contrastive learning takes the modal features corresponding to the force signal, the vibration signal, and the impedance signal at the same moment as positive samples, and the modal features corresponding to signals acquired at different moments as negative samples; determine, by a causal discovery algorithm, a causal feature subset and a non-causal feature subset from the modality-independent fused feature representation, the causal feature subset consisting of feature dimensions having causal links to tissue types, the non-causal feature subset consisting of feature dimensions without such causal links; construct a causal-knowledge-based regularization loss function from the causal feature subset and the non-causal feature subset, wherein the causal-knowledge-based regularization loss function comprises a first regularization term that drives the weights of a learnable causal attention vector on the corresponding dimensions of the causal feature subset toward 1, and a second regularization term that drives the weights of the learnable causal attention vector on the corresponding dimensions of the non-causal feature subset toward 0; and perform supervised fine-tuning of the encoder network with the sample data with coarse tissue type labels, the causal-knowledge-based regularization loss function, and the learnable causal attention vector, to generate a tissue recognition model.
  2. The system of claim 1, wherein acquiring the unlabeled force signal, the unlabeled vibration signal, and the unlabeled impedance signal comprises: acquiring the unlabeled force signal collected by an optical fiber Fabry-Perot sensor, wherein the optical fiber Fabry-Perot sensor is integrated on the side wall of the puncture needle at a first distance from the needle tip; acquiring the unlabeled vibration signal excited and collected by a piezoelectric ceramic element, wherein the piezoelectric ceramic element is integrated on the side wall of the puncture needle at a second distance from the needle tip, the second distance being larger than the first distance; and acquiring the unlabeled impedance signal measured by a microelectrode pair, wherein the microelectrode pair comprises a working electrode integrated at the needle tip of the puncture needle and a counter electrode positioned at a third distance behind the working electrode; wherein the piezoelectric ceramic element is positioned between the optical fiber Fabry-Perot sensor and the microelectrode pair.
  3. The system of claim 1, wherein training the encoder network by cross-modal contrastive learning comprises: performing time resampling and amplitude scaling on the force signal to obtain an enhanced force signal; and performing the cross-modal contrastive learning based on the enhanced force signal, the vibration signal, and the impedance signal to train the encoder network.
  4. The system of claim 1, wherein training the encoder network by cross-modal contrastive learning comprises: adding broadband noise with a preset signal-to-noise ratio to the vibration signal to obtain an enhanced vibration signal; and performing the cross-modal contrastive learning based on the force signal, the enhanced vibration signal, and the impedance signal to train the encoder network.
  5. The system of claim 1, wherein the encoder network comprises: a first one-dimensional convolution layer, a second one-dimensional convolution layer, a third one-dimensional convolution layer, a global average pooling layer, and a multi-layer perceptron projection head, connected in sequence; wherein the numbers of convolution kernels of the first, second, and third one-dimensional convolution layers increase in sequence, and the kernel sizes of the first, second, and third one-dimensional convolution layers decrease in sequence.
  6. The system of claim 1, wherein determining the causal feature subset and the non-causal feature subset by a causal discovery algorithm comprises: performing a conditional independence test on the different feature dimensions of the modality-independent fused feature representation, using a permutation test based on the Hilbert-Schmidt independence criterion; and determining the causal feature subset and the non-causal feature subset according to the result of the conditional independence test.
  7. The system of claim 1, wherein constructing the causal-knowledge-based regularization loss function comprises: constructing the first regularization term as the norm of the difference between the weights of the learnable causal attention vector on the corresponding dimensions of the causal feature subset and a first value, the first value being 1; constructing the second regularization term as the norm of the difference between the weights of the learnable causal attention vector on the corresponding dimensions of the non-causal feature subset and a second value, the second value being 0; and constructing the causal-knowledge-based regularization loss function from the first regularization term and the second regularization term.
  8. The system of claim 1, further comprising performing real-time recognition with the tissue recognition model, including: intercepting the force signal, the vibration signal, and the impedance signal acquired in real time with a sliding window of fixed length; and inputting the force signal, the vibration signal, and the impedance signal intercepted by the sliding window into the tissue recognition model; wherein the step size of the sliding window is much smaller than the length of the sliding window.
  9. The system of claim 1, wherein training the encoder network by cross-modal contrastive learning from the force signal, the vibration signal, and the impedance signal comprises: performing time-frequency transformation on the vibration signal and the impedance signal to generate a vibration time-frequency representation and an impedance time-frequency representation; concatenating the vibration time-frequency representation, the impedance time-frequency representation, and the time-series representation of the force signal along the channel dimension to generate a fused multi-channel signal; and inputting the fused multi-channel signal into the encoder network to perform the cross-modal contrastive learning.
  10. The system of claim 8, wherein performing real-time recognition with the tissue recognition model further comprises: acquiring the recognition results and recognition confidences of the tissue recognition model on the signals in consecutive sliding windows; and generating a safety intervention instruction when the recognition result is a specific high-risk tissue type and the recognition confidence exceeds a preset threshold; wherein the safety intervention instruction comprises a visual early-warning instruction sent to augmented reality equipment, and a compliant control instruction or a soft locking instruction sent to the surgical robot motion controller.
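The claims above specify the mechanisms only in prose. The sketches below are illustrative Python reconstructions of four of them (claims 5, 6, 7, and 8/10); they are not the patent's implementation, and every layer size, kernel width, threshold, and identifier is an assumption introduced here for illustration.

A possible encoder in the shape required by claim 5: three one-dimensional convolution layers whose kernel counts increase while their kernel sizes decrease, followed by global average pooling and a multi-layer perceptron projection head (the concrete values 32/64/128 and 15/9/5 are placeholders):

```python
import torch
import torch.nn as nn

class PunctureEncoder(nn.Module):
    """Encoder per claim 5; its input would be the fused multi-channel
    signal of claim 9 (force time series plus vibration/impedance
    time-frequency channels stacked along the channel dimension)."""
    def __init__(self, in_channels=3, embed_dim=128, proj_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            # Kernel count increases (32 -> 64 -> 128) while kernel size
            # decreases (15 -> 9 -> 5), as claim 5 requires.
            nn.Conv1d(in_channels, 32, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),            # global average pooling
        )
        self.projection = nn.Sequential(        # MLP projection head
            nn.Flatten(),
            nn.Linear(128, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, proj_dim),
        )

    def forward(self, x):                       # x: (batch, channels, time)
        return self.projection(self.backbone(x))
```

For claim 6, a permutation test built on the Hilbert-Schmidt independence criterion can be sketched as follows. For brevity this tests marginal (unconditional) dependence between one feature dimension and integer-coded tissue labels; the conditional variant named in the claim would additionally condition on the remaining dimensions:

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate with Gaussian kernels; x and y are length-n."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    def gram(a):
        d2 = np.sum((a[:, None, :] - a[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    return np.trace(H @ gram(x) @ H @ gram(y)) / (n - 1) ** 2

def is_causal_dimension(feature, labels, n_perm=500, alpha=0.05, seed=0):
    """Permutation test: does this feature dimension depend on tissue type?"""
    rng = np.random.default_rng(seed)
    observed = hsic(feature, labels)
    null = np.array([hsic(feature, rng.permutation(labels))
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return p_value < alpha                      # True -> causal feature subset
```

The two-term regularizer of claim 7 then reduces to a few lines; the choice of the L2 norm and the weighting coefficients are assumptions:

```python
import torch

def causal_regularizer(attention, causal_idx, noncausal_idx,
                       lam_causal=1.0, lam_noncausal=1.0):
    """attention: (D,) learnable causal attention vector;
    causal_idx / noncausal_idx: index tensors of the two feature subsets."""
    term1 = torch.norm(attention[causal_idx] - 1.0)   # pull weights toward 1
    term2 = torch.norm(attention[noncausal_idx])      # pull weights toward 0
    return lam_causal * term1 + lam_noncausal * term2
```

Finally, the real-time loop of claims 8 and 10: a fixed-length sliding window whose step is much smaller than its length, with a confidence-gated safety intervention. The window size, step, threshold, and the model/actuator interfaces are all placeholders:

```python
import torch

WINDOW, STEP = 1024, 64        # step << window length, per claim 8

def stream_recognition(model, signals, high_risk_types, conf_threshold=0.9):
    """signals: (channels, T) tensor of synchronized force/vibration/impedance;
    model is assumed to output per-tissue-class logits for one window."""
    model.eval()
    for start in range(0, signals.shape[-1] - WINDOW + 1, STEP):
        window = signals[:, start:start + WINDOW].unsqueeze(0)  # (1, C, W)
        with torch.no_grad():
            probs = torch.softmax(model(window), dim=-1).squeeze(0)
        conf, tissue = probs.max(dim=-1)
        if tissue.item() in high_risk_types and conf.item() > conf_threshold:
            # Claim 10: e.g. AR early warning plus compliant-control or
            # soft-lock commands to the surgical robot motion controller.
            yield ("safety_intervention", start, tissue.item(), conf.item())
        else:
            yield ("recognition", start, tissue.item(), conf.item())
```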

Description

Multi-mode fusion puncture tissue characteristic identification system

Technical Field

The application belongs to the field of ophthalmic tissue characteristic identification, and particularly relates to a multi-mode fusion puncture tissue characteristic identification system.

Background

In fine ophthalmic puncture operations such as glaucoma drainage valve implantation, vitreous cavity injection, and intraocular tumor biopsy, the prior art combines preoperative planning with intraoperative sensing to identify the different biological tissues (such as sclera, ciliary body, vitreous body, and retina) along the puncture path, so as to assist surgical navigation and improve surgical safety. Current clinical practice mainly depends on two parts: static path planning based on preoperative images such as optical coherence tomography (OCT) and ultrasound biomicroscopy (UBM), and the operator's empirical, subjective judgment of the tissue resistance changes encountered by the puncture needle, felt by hand during the operation. The preoperative image provides a "static map" of tissue structure for planning the penetration path. During the operation, the operator perceives and interprets changes in the force signal through the tactile feedback of the hand-held puncture needle, thereby inferring the type of tissue the needle tip is likely to be contacting. In addition, studies have attempted to quantify the operator's "feel" by integrating force sensors on surgical instruments (e.g., puncture needles), converting tissue resistance into a one-dimensional force signal curve for analysis and attempting to provide an objective basis for tissue identification.

However, tissue identification achieved in the above manner has significant drawbacks. First, the preoperative image is static and cannot reflect the dynamic tissue deformation caused by instrument intervention and intraocular pressure changes during the operation, which reduces navigation accuracy. Second, relying on a single sensing mode, i.e., a one-dimensional force signal, provides severely insufficient information: different tissues may exhibit similar mechanical properties (such as hardness), making it difficult for the system to distinguish them specifically and warn accurately. Third, artificial intelligence models such as deep learning networks have strong potential for multi-modal information fusion and pattern recognition, but their supervised training depends heavily on massive, high-precision paired "input signal - output tissue type" labeled samples. In a real ophthalmic puncture surgery scene, the "input signal" is multi-modal time-series data such as force, vibration, and impedance generated in real time during surgery, while the corresponding "output label" (i.e., the exact tissue type contacted by the needle tip) can almost never be verified and annotated in vivo at a "gold standard" level without interfering with the surgery or increasing risk. The extreme scarcity of high-quality training data has therefore become a fundamental obstacle to applying artificial intelligence techniques in this field to achieve accurate, objective, real-time tissue recognition.
Disclosure of Invention

The application aims to overcome the defects in the prior art and provide a multi-mode fusion puncture tissue characteristic identification system. The application provides a multi-mode fusion puncture tissue characteristic recognition system comprising a model training module, wherein the model training module acquires unlabeled force signals, unlabeled vibration signals, unlabeled impedance signals, and sample data with coarse tissue type labels; trains an encoder network by cross-modal contrastive learning from the force signal, the vibration signal, and the impedance signal to generate a robust, modality-independent fused feature representation, wherein the cross-modal contrastive learning takes the modal features corresponding to the force signal, the vibration signal, and the impedance signal at the same moment as positive samples, and the modal features corresponding to signals acquired at different moments as negative samples; determines, by a causal discovery algorithm, a causal feature subset and a non-causal feature subset from the modality-independent fused feature representation, the causal feature subset consisting of feature dimensions having causal links to tissue types, the non-causal feature subset consisting of feature dimensions without such causal links; and constructs a causal-knowledge-based regularization loss function from the causal feature subset and the non-causal feature subset, wherein the causal-knowledge-based regularization loss function