CN-121983059-A - Automatic acquisition and storage method and system for bronchus image data based on voice interaction


Abstract

The invention discloses a method and system for automatically acquiring and storing bronchus image data based on voice interaction, and relates to the technical field of intelligent voice interaction. The method comprises: collecting an operator's voice signal; inputting the voice signal into a pre-trained voice recognition model to generate a text sequence; performing word segmentation and keyword extraction on the text sequence to obtain instruction keywords; matching the instruction keywords against a preset instruction template library to determine control instructions, the control instructions including image acquisition type instructions and image labeling instructions; and processing a real-time video stream according to the control instructions and storing the processing results in a bronchus image database. Introducing voice interaction enables automatic control of image acquisition, labeling and storage, and reduces dependence on manual operation and on cooperation between personnel. The operator can issue instructions directly during the procedure to complete image management, which saves labor cost and improves image acquisition efficiency.

Inventors

TANG FEI
ZHAO DAHAI
ZHA XIANKUI
LV XIAOMEI

Assignees

安徽省胸科医院

Dates

Publication Date: 20260505
Application Date: 20260228

Claims (10)

1. A method for automatically acquiring and storing bronchus image data based on voice interaction, characterized by comprising the following steps: collecting voice signals sent by an operator; inputting the voice signal into a pre-trained voice recognition model, and outputting a corresponding text sequence; performing word segmentation and keyword extraction on the text sequence to obtain instruction keywords; matching the instruction keywords with a preset instruction template library to determine corresponding control instructions, wherein the control instructions comprise image acquisition type instructions and/or image labeling instructions; and processing the real-time video stream according to the control instruction, and storing the processing result in a preset bronchus image database.
2. The method for automatically acquiring and storing bronchus image data based on voice interaction according to claim 1, wherein the image acquisition type instruction comprises a screenshot instruction, the screenshot instruction triggers a screenshot and enhancement processing process, the enhancement processing process is implemented through a pre-trained image enhancement model, and the image enhancement model comprises: a color space conversion module, used for converting an input image into the YUV color space to obtain a Y-channel image, a U-channel image and a V-channel image; a brightness processing branch, used for processing the Y-channel image to obtain a brightness perception feature F_Y; a chrominance processing branch, used for processing the U-channel image and the V-channel image to obtain a chrominance perception feature F_UV; a fusion generation network, used for processing the brightness perception feature and the chrominance perception feature to obtain an enhanced YUV image; and a color space inverse conversion module, used for converting the YUV image back to RGB space to obtain a final enhanced image.
3. The method for automatically acquiring and storing bronchus image data based on voice interaction according to claim 2, wherein the brightness processing branch comprises an initial convolution layer, a global feature extraction module, a local feature extraction module, an adaptive fusion module and a first branch supervision module, wherein: the initial convolution layer processes the input Y-channel image with one layer of 1×1 convolution and expands the channel dimension to obtain a feature map Y_1; the global feature extraction module uses two Swin Transformers to perform global feature extraction on the feature map Y_1 to obtain global features Y_2; the local feature extraction module uses four residual blocks to perform feature extraction on the feature map Y_1 to obtain local features Y_3; the adaptive fusion module performs a weighted summation of the global features Y_2 and the local features Y_3 to obtain a feature map Y_4; and the first branch supervision module uses a residual attention mechanism to fuse the input Y-channel image with the feature map Y_4 to obtain the brightness perception feature F_Y.
4. The method for automatically acquiring and storing bronchus image data based on voice interaction according to claim 3, wherein the operation procedure of the first branch supervision module comprises: processing the feature map Y_4 with one layer of 3×3 convolution and compressing the channel dimension to obtain a feature map F_1; adding the feature map F_1 to the Y-channel image element by element, and then generating an attention weight map A_1 through one layer of 3×3 convolution and a sigmoid function; processing the feature map Y_4 with one layer of 3×3 convolution to obtain a feature map F_2; weighting the feature map F_2 with the attention weight map A_1 to obtain a feature map F_3; processing the input Y-channel image with one layer of 3×3 convolution and expanding the channel dimension to obtain a feature map F_4; and adding the feature maps F_3, F_4 and Y_4 element by element to obtain the brightness perception feature F_Y.
5. The method for automatically acquiring and storing bronchus image data based on voice interaction according to claim 2, wherein the chrominance processing branch comprises a first feature denoising module, a second feature denoising module, a detail enhancement module and a second branch supervision module, wherein: the first feature denoising module and the second feature denoising module have the same structure, and process the U-channel image and the V-channel image in parallel using multi-layer convolution and an attention mechanism to obtain feature maps U_1 and V_1; the detail enhancement module converts the feature map T_1 into frequency domain features using a discrete wavelet transform, enhances the frequency domain features using a residual channel attention module, and reconstructs the enhanced frequency domain features back to the spatial domain to obtain a feature map T_2; and the second branch supervision module uses a residual attention mechanism to fuse the input UV-channel image with the feature map T_2 to obtain the chrominance perception feature F_UV, the second branch supervision module having the same structure as the first branch supervision module.
6. The method for automatically acquiring and storing bronchus image data based on voice interaction according to claim 5, wherein the operation process of either feature denoising module comprises: processing the input image with four layers of 3×3 convolution to obtain multi-scale feature maps X_1, X_2, X_3 and X_4, wherein the convolution stride of the second to fourth layers is 2; processing the feature map X_4 with a multi-head attention mechanism to model global dependence and obtain a feature map X_5; adding the feature maps X_1, X_2, X_3 and X_5 element by element after upsampling to obtain a feature map X_6; processing X_6 with one layer of 3×3 convolution and applying a residual connection to obtain a feature map X_7; and processing the feature map X_7 with two layers of 3×3 convolution to obtain a denoised feature map X_8 and outputting X_8.
7. The method for automatically acquiring and storing bronchus image data based on voice interaction according to claim 5, wherein the operation process of the detail enhancement module comprises: performing a three-level discrete wavelet transform on the input feature map T_1 to obtain the sub-bands of each level; processing the sub-bands of each level with a residual channel attention module to obtain enhanced frequency domain sub-bands; performing an inverse discrete wavelet transform on the third-level enhanced frequency domain sub-bands to obtain an enhanced sub-band; performing an inverse discrete wavelet transform on that enhanced sub-band together with the second-level enhanced frequency domain sub-bands to obtain a further enhanced sub-band; performing an inverse discrete wavelet transform on that further enhanced sub-band together with the first-level enhanced frequency domain sub-bands to obtain a spatial domain feature T_0; and processing the spatial domain feature T_0 with a residual channel attention module to obtain a feature map T_1.
8. The method for automatically acquiring and storing bronchus image data based on voice interaction according to claim 2, wherein the fusion generation network comprises a fusion module and a generation module, wherein: the fusion module uses a multi-level attention mechanism to interactively fuse the brightness perception feature F_Y and the chrominance perception feature F_UV to obtain a feature map Z_1; and the generation module uses multiple 3×3 convolution layers to extract features from the feature map Z_1 to obtain a feature map Z_2, then uses one 3×3 convolution layer and a sigmoid function to compress the number of channels of the feature map Z_2 to 3, and performs inverse normalization to obtain the enhanced YUV image.
9. The method for automatically acquiring and storing bronchus image data based on voice interaction according to claim 8, wherein the operation process of the fusion module comprises: applying a 3×3 convolution to each of the brightness perception feature F_Y and the chrominance perception feature F_UV to adjust the number of channels, obtaining feature maps S_Y and S_UV; adding the feature maps S_Y and S_UV element by element to obtain an initial fusion feature S_1; applying channel attention and spatial attention to the initial fusion feature S_1 in parallel to obtain a channel attention feature S_2 and a spatial attention feature S_3; combining the channel attention feature S_2 and the spatial attention feature S_3 with the initial fusion feature S_1 respectively, and assigning an independent weight to each pixel position through pixel attention to obtain a channel attention weight W_1 and a spatial attention weight W_2; dynamically fusing the channel attention weight W_1 and the spatial attention weight W_2 through learnable parameters to obtain a final refined attention weight W_3; and using the attention weight W_3 to adjust the contribution ratio of the brightness and chrominance features, combined with a residual connection, to obtain the final fusion feature Z_1.
10. A system for automatically acquiring and storing bronchus image data based on voice interaction, the system comprising: a voice acquisition module, used for acquiring voice signals sent by an operator; a voice recognition module, used for inputting the voice signal into a pre-trained voice recognition model and outputting a corresponding text sequence; a semantic analysis module, used for performing word segmentation and keyword extraction on the text sequence to obtain instruction keywords; an instruction matching module, used for matching the instruction keywords with a preset instruction template library to determine corresponding control instructions, wherein the control instructions comprise image acquisition type instructions and/or image labeling instructions; and an instruction execution module, used for processing the real-time video stream according to the control instruction and storing the processing result in a preset bronchus image database.
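
As an illustration of the color space conversion and inverse conversion modules recited in claim 2, the following is a minimal sketch of a per-pixel RGB to YUV round trip. The claims do not name a specific YUV variant, so the BT.601 luma weights and the analog U/V scale factors used here are assumptions. The function `brighten_luma` is a hypothetical stand-in for the learned brightness branch: it only scales the Y channel, to show that editing luminance while leaving U and V untouched preserves the hue of the pixel.

```python
from typing import Tuple

Pixel = Tuple[float, float, float]  # channel values in [0, 1]

# Assumed BT.601 luma weights and analog U/V scale factors.
# The claims only say "YUV"; these constants are not from the source.
KR, KG, KB = 0.299, 0.587, 0.114
U_SCALE, V_SCALE = 0.492, 0.877


def rgb_to_yuv(rgb: Pixel) -> Pixel:
    """Color space conversion: split a pixel into luminance (Y) and chrominance (U, V)."""
    r, g, b = rgb
    y = KR * r + KG * g + KB * b
    u = U_SCALE * (b - y)
    v = V_SCALE * (r - y)
    return y, u, v


def yuv_to_rgb(yuv: Pixel) -> Pixel:
    """Inverse conversion, derived algebraically from rgb_to_yuv."""
    y, u, v = yuv
    r = y + v / V_SCALE
    b = y + u / U_SCALE
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def brighten_luma(rgb: Pixel, gain: float) -> Pixel:
    """Hypothetical stand-in for the brightness branch: scale Y only, keep U and V."""
    y, u, v = rgb_to_yuv(rgb)
    y = min(1.0, y * gain)
    return yuv_to_rgb((y, u, v))


if __name__ == "__main__":
    pixel = (0.8, 0.4, 0.2)
    print(rgb_to_yuv(pixel))
    print(yuv_to_rgb(rgb_to_yuv(pixel)))
    print(brighten_luma(pixel, 1.2))
```

Separating luminance from chrominance in this way is what allows the claimed model to treat brightness enhancement and chrominance denoising as two independent branches before fusing them.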

Description

Automatic acquisition and storage method and system for bronchus image data based on voice interaction

Technical Field

The invention relates to the technical field of intelligent voice interaction, in particular to a method and a system for automatically acquiring and storing bronchus image data based on voice interaction.

Background

Bronchoscopic examination and treatment are important means of diagnosing and treating respiratory diseases. During a clinical procedure, the structures inside the bronchial lumen, the lesion morphology and the operation process usually need to be observed in real time, and key images need to be collected, labeled and stored for subsequent diagnostic analysis, case archiving, and teaching and research. At present, clinical bronchus image acquisition relies on cable transmission. The cables lie scattered and tangled, and problems such as poor connector contact and signal interruption occur easily, which in turn cause image storage failures and data loss. During acquisition, existing equipment is controlled mainly through physical keys or a touch screen operated by hand, and a clinical procedure generally requires two medical staff working together: one performs the bronchoscope intervention, and the other starts and stops video recording, takes image screenshots, stores data and adds labels at the computer. Although this arrangement lightens the workload of the operating physician, image acquisition, labeling and storage remain highly dependent on manual intervention. This increases labor cost, leaves the degree of automation and intelligence low, and leads to complex operation, redundant workflow and low acquisition efficiency.

Disclosure of Invention

The invention aims to solve the problems described in the background art, namely that image acquisition, processing and storage during bronchoscopy depend on manual operation and have a low degree of automation and intelligence, and provides a method and a system for automatically acquiring and storing bronchus image data based on voice interaction.
According to a first aspect of the invention, a method for automatically acquiring and storing bronchus image data based on voice interaction is provided, and the method comprises the following steps: collecting voice signals sent by an operator; inputting the voice signal into a pre-trained voice recognition model, and outputting a corresponding text sequence; performing word segmentation and keyword extraction on the text sequence to obtain instruction keywords; matching the instruction keywords with a preset instruction template library to determine corresponding control instructions, wherein the control instructions comprise image acquisition type instructions and/or image labeling instructions; and processing the real-time video stream according to the control instruction, and storing the processing result in a preset bronchus image database. By implementing this technical scheme, the acquisition, labeling and storage of bronchoscope images are controlled automatically, the need for manual intervention and for cooperation between operators is reduced, and clinical acquisition efficiency is improved.
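
The keyword extraction and template matching steps above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the template library contents, the stop-word list, the command names, and the function names `extract_keywords` and `match_instruction` are all hypothetical. It also assumes the recognized text is English and splits it with a regular expression, whereas a Chinese transcript would need a dedicated word segmenter in place of that step.

```python
import re
from typing import Dict, FrozenSet, List, Optional

# Hypothetical template library: command name -> keywords that must all be present.
TEMPLATE_LIBRARY: Dict[str, FrozenSet[str]] = {
    "START_RECORDING": frozenset({"start", "recording"}),
    "STOP_RECORDING": frozenset({"stop", "recording"}),
    "SCREENSHOT": frozenset({"screenshot"}),
    "LABEL_IMAGE": frozenset({"label"}),
}

# Hypothetical filler words dropped during keyword extraction.
STOP_WORDS = frozenset({"please", "the", "a", "an", "now", "this", "as", "to", "take"})


def extract_keywords(text: str) -> List[str]:
    """Segment the recognized text into words and drop filler words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def match_instruction(keywords: List[str]) -> Optional[str]:
    """Return the command whose required keywords are all present.

    If several templates match, the one requiring the most keywords wins.
    Returns None when no template matches.
    """
    present = set(keywords)
    best: Optional[str] = None
    best_size = 0
    for command, required in TEMPLATE_LIBRARY.items():
        if required <= present and len(required) > best_size:
            best, best_size = command, len(required)
    return best


if __name__ == "__main__":
    for utterance in ("Please take a screenshot now",
                      "Stop the recording",
                      "Label this as left upper lobe"):
        print(utterance, "->", match_instruction(extract_keywords(utterance)))
```

Requiring every keyword of a template to be present, rather than any one of them, keeps "start recording" and "stop recording" from being confused when the shared word "recording" is the only one recognized; in that case the sketch returns no command instead of guessing.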
Optionally, the image acquisition type instruction includes a screenshot instruction, the screenshot instruction triggers a screenshot and enhancement processing process, the enhancement processing process is implemented through a pre-trained image enhancement model, and the image enhancement model includes: a color space conversion module, used for converting an input image into the YUV color space to obtain a Y-channel image, a U-channel image and a V-channel image; a brightness processing branch, used for processing the Y-channel image to obtain a brightness perception feature F_Y; a chrominance processing branch, used for processing the U-channel image and the V-channel image to obtain a chrominance perception feature F_UV; a fusion generation network, used for processing the brightness perception feature and the chrominance perception feature to obtain an enhanced YUV image; and a color space inverse conversion module, used for converting the YUV image back to RGB space to obtain a final enhanced image. By implementing this technical scheme, natural color reproduction is maintained while brightness, contrast and detail definition are improved, which mitigates the color distortion that traditional brightness enhancement may cause.
Optionally, the brightness processing branch includes an initial convolution layer, a global feature extraction module, a local feature extraction module, an adaptive fusion module, and a first branch supervision module, where: the initial convolution layer processes the input Y-channel image with one layer of 1×1 convolution, expands the channel dimension and obtains a c