CN-122023908-A - Chinese herbal medicine image recognition method based on triple receptive fields

Abstract

The invention relates to the technical field of image recognition and discloses a Chinese herbal medicine image recognition method based on triple receptive fields. The method comprises: performing data enhancement on Chinese herbal medicine images, in which a multi-level data enhancement system is built from geometric transformation, color adjustment, and joint spatial-domain and frequency-domain processing based on the Fourier transform; constructing a triple receptive field module; constructing a deep triple receptive field neural network model; and training the neural network with a loss function. The deep triple receptive field model combines dense residual block learning, an attention mechanism, and frequency-domain feature learning. By collecting features from several different receptive fields, it captures the micro-textures and macro-morphology of Chinese herbal medicine images at the same time, so that low-level details and high-level semantic information are effectively integrated, the adaptability of the model to scale changes and its feature learning capability are enhanced, and the accuracy of Chinese herbal medicine image recognition is improved.

Inventors

XIAO QINGGUO
HAN ZHIYUAN
LIU NING
ZHANG XIANCONG
XU MENG
CHEN BOWEN
ZHANG HAN

Assignees

Linyi University (临沂大学)

Dates

Publication Date: 20260512
Application Date: 20260130

Claims (4)

1. A Chinese herbal medicine image recognition method based on triple receptive fields, characterized by comprising the following steps:

S1, performing data enhancement processing on a Chinese herbal medicine image, and constructing a multi-level data enhancement system through geometric transformation, color adjustment, and joint spatial-domain and frequency-domain processing based on the Fourier transform; the image is converted from the spatial domain to the frequency domain and its frequency information is extracted according to the following formula:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \qquad (1)$$

wherein $X(k)$ is the spectrum resulting from the Fourier transform, $x$ is the input spatial-domain image sequence, $x(n)$ is a discrete pixel value of the spatial-domain image, $k$ is the frequency index in the frequency domain and takes the values $0, 1, \dots, N-1$, different values of $k$ correspond to different frequency components, $N$ is the total number of pixels, and $j$ is the imaginary unit;

S2, constructing a triple receptive field module, comprising:

S2.1, a local receptive field based on dense residuals:

$$x_l = H_l([x_0, x_1, \dots, x_{l-1}]) + x_{l-1} \qquad (2)$$

wherein $x_l$ denotes the output of the $l$-th layer, $H_l$ denotes the non-linear transformation function of the $l$-th layer, $[x_0, x_1, \dots, x_{l-1}]$ denotes the outputs of the previous layer or of all preceding layers, and $x_{l-1}$ denotes the output of the previous layer;

S2.2, a global receptive field based on covariance attention: covariance attention calculation, namely processing the normalized features with a covariance attention mechanism to capture long-distance information features of the image; an attention mechanism is constructed from the covariance matrix to capture a global receptive field and extract long-distance information features of the image, and the covariance attention is calculated by formula (3), in which $Q$ and $K$ denote the query matrix and the key matrix respectively and $V$ denotes the value matrix; a Transformer block is constructed by combining covariance attention, the GELU activation function, LayerNorm normalization, and a multi-layer perceptron (MLP), the structure of the block being given by formula (4), in which LayerNorm denotes a layer normalization operation by which the fused features are normalized, MultiHeadAttention denotes a multi-head attention mechanism, and MLP denotes a multi-layer perceptron; the covariance attention output features are further processed by the multi-layer perceptron to extract a higher-level feature representation, and the features processed by the multi-layer perceptron are taken as the output features;

S2.3, a frequency-domain receptive field based on fast Fourier convolution: FFC (Fast Fourier Convolution) is used to obtain a receptive field covering the entire image; the FFC divides the channels into a local branch and a global branch, the local branch updates the local feature map by convolution, and the global branch applies a Fourier transform to the feature map and updates it in the frequency domain; to ensure that the output is real-valued, the FFC applies a real FFT that uses only half of the spectrum and computes the inverse real FFT accordingly;

S3, constructing a deep neural network model for Chinese herbal medicine recognition based on the triple receptive field module: a triple receptive field network integrating local, global, and frequency-domain receptive fields captures multi-scale features and extracts and outputs high-level features; the deep neural network model is built from triple receptive field modules, each layer contains one triple receptive field module, and a multi-layer deep neural network model is built by stacking layers. Unlike in a conventional convolutional neural network model, each layer of the deep neural network model is not a single convolutional layer but a multi-receptive-field module layer comprising a dense residual block constructed from convolutional layers and dense residual learning, a Transformer block constructed from the covariance attention mechanism, and a fast Fourier convolution layer constructed from FFC. Whereas each layer of a conventional neural network model has only the local receptive field learning capability of a convolutional layer, each layer of the deep neural network model constructed by this method can learn from several different receptive fields, has stronger feature representation capability, and effectively improves the performance of the neural network model.
2. The Chinese herbal medicine image recognition method based on triple receptive fields according to claim 1, wherein the data enhancement processing comprises:

geometric transformation, comprising horizontal flipping and vertical flipping, the horizontal flip mirroring the image about its vertical central axis and the vertical flip mirroring the image about its horizontal central axis;

color adjustment, comprising brightness adjustment, contrast adjustment, and saturation transformation; the brightness adjustment uses the linear transformation $I' = \alpha I + \beta$ on the pixel value $I$ to increase or decrease brightness, wherein $\alpha$ is a preset brightness scaling factor and $\beta$ is a preset brightness offset; the contrast adjustment enhances contrast relative to the global mean of the image according to $I' = \mu + c\,(I - \mu)$, wherein $\mu$ is the mean value of the image and $c$ is a preset contrast ratio; the saturation transformation uses $S' = \gamma S$ to adjust the saturation channel in the HSV color space independently and obtain the adjusted saturation value $S'$, wherein $S$ is the original saturation value and $\gamma$ is a preset saturation scaling factor;

frequency-domain enhancement, namely obtaining frequency-domain information of the image based on the Fourier transform and fusing the original input spatial-domain image information with the frequency-domain information to obtain fused features, wherein the fusion is performed by concatenation.
3. The Chinese herbal medicine image recognition method based on triple receptive fields according to claim 1, wherein the triple receptive field module (TRFM) comprises: a dense residual block constructed from convolutional layers and dense residual learning, a Transformer block constructed from the covariance attention mechanism, and a fast Fourier convolution layer constructed from FFC. The dense residual block is realized by dense residual learning, namely formula (2), and has local receptive field learning capability. The core component of the Transformer block is the covariance attention mechanism, which is obtained by the covariance attention calculation of formula (3); a multi-head attention mechanism is realized through several groups of different attention, and the complete Transformer block is then constructed with LayerNorm normalization and an MLP; the Transformer block has long-distance feature learning capability. The fast Fourier convolution (FFC) provides a receptive field covering the whole image: the FFC divides the channels into a local branch and a global branch, the local branch updates the local feature map using conventional convolution, and the global branch applies a Fourier transform to the feature map and updates it in the frequency domain; to ensure that the output is real-valued, a real FFT using only half of the spectrum is applied, and the corresponding inverse real FFT is computed to recover a real-valued signal.
The three components operate in parallel; their features are concatenated and fused using linear projection, with self-learned weight parameters used for the fusion. Because the TRFM embeds three different receptive fields in parallel, a single layer extracts features from several different receptive fields at the same time and has better feature learning and representation capability.
4. The Chinese herbal medicine image recognition method based on triple receptive fields according to claim 1, further comprising: a training and optimization step, in which the model is trained and optimized to improve its accuracy and generalization capability for Chinese herbal medicine image recognition; and an evaluation and verification step, in which the recognition results are evaluated and verified to ensure their reliability and accuracy.
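
The data enhancement operations of claim 1 (step S1) and claim 2 can be illustrated with a minimal Python sketch. It is not the applicants' implementation. The function names, the list-of-rows image layout, pixel values scaled to [0, 1], the clamping helper `clip01`, the contrast form mu + c * (I - mu), and the use of spectrum magnitudes in the concatenation step are all assumptions made for illustration, and `dft` evaluates the discrete Fourier transform by direct summation rather than with a fast algorithm.

```python
import cmath
import colorsys

def hflip(img):
    """Mirror a 2-D image (list of rows) about its vertical central axis."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror a 2-D image about its horizontal central axis."""
    return [list(row) for row in img[::-1]]

def clip01(value):
    """Clamp to [0, 1]; the clamp is an assumption, not stated in the claims."""
    return min(1.0, max(0.0, value))

def adjust_brightness(img, alpha, beta):
    """Linear brightness transform: I' = alpha * I + beta."""
    return [[clip01(alpha * p + beta) for p in row] for row in img]

def adjust_contrast(img, c):
    """Scale each pixel's deviation from the global mean: I' = mu + c * (I - mu)."""
    pixels = [p for row in img for p in row]
    mu = sum(pixels) / len(pixels)
    return [[clip01(mu + c * (p - mu)) for p in row] for row in img]

def scale_saturation(rgb, gamma):
    """Scale only the S channel in HSV space: S' = gamma * S."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(h, clip01(gamma * s), v)

def dft(x):
    """Discrete Fourier transform: X(k) = sum_n x(n) * exp(-2*pi*j*k*n/N)."""
    total = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / total) for n in range(total))
            for k in range(total)]

def frequency_enhance(x):
    """Concatenate spatial values with spectrum magnitudes (assumed fusion input)."""
    return list(x) + [abs(coef) for coef in dft(x)]
```

For example, `hflip([[1, 2], [3, 4]])` returns `[[2, 1], [4, 3]]`, and a constant signal such as `[1, 1, 1, 1]` has essentially all of its spectrum in the k = 0 bin.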

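The parallel structure of the triple receptive field module (step S2 of claim 1 and claim 3) can likewise be sketched in plain Python. Everything below is an illustrative assumption rather than the patented implementation: `mean_relu` is a hypothetical per-token stand-in for the convolutional transformation H_l; `covariance_attention` assumes a channel-by-channel attention map of the form V · softmax(KᵀQ / tau), because formula (3) is not reproduced in this text; `rfft` and `irfft` evaluate the half-spectrum real transform by direct summation and are not fast algorithms; and the fusion uses softmax-normalized scalar weights in place of a learned linear projection.

```python
import cmath
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mean_relu(width):
    """Hypothetical H_l: average the stacked features channel-wise, then ReLU."""
    def h(stacked):
        chunks = [stacked[i:i + width] for i in range(0, len(stacked), width)]
        return [max(0.0, sum(col) / len(chunks)) for col in zip(*chunks)]
    return h

def dense_residual(x0, transforms):
    """Dense residual update x_l = H_l([x_0, ..., x_{l-1}]) + x_{l-1} on one vector."""
    outputs = [list(x0)]
    for h in transforms:
        stacked = [v for feat in outputs for v in feat]  # concatenate all earlier outputs
        outputs.append([a + b for a, b in zip(h(stacked), outputs[-1])])
    return outputs[-1]

def covariance_attention(q, k, v, tau=1.0):
    """Assumed form V @ softmax(K^T Q / tau); q, k, v are tokens x channels."""
    d = len(q[0])
    cols = []
    for j in range(d):
        # Column j of K^T Q: similarity of every key channel i to query channel j.
        scores = [sum(kn[i] * qn[j] for kn, qn in zip(k, q)) / tau for i in range(d)]
        cols.append(softmax(scores))
    return [[sum(vn[i] * cols[j][i] for i in range(d)) for j in range(d)] for vn in v]

def rfft(x):
    """Half spectrum (bins 0 .. N//2) of a real signal, by direct summation."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

def irfft(half, n):
    """Inverse real transform: rebuild the full spectrum by conjugate symmetry."""
    full = list(half) + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return [sum(full[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def frequency_branch(x, weights):
    """Global branch: transform, reweight each frequency bin, transform back."""
    return irfft([w * c for w, c in zip(weights, rfft(x))], len(x))

def triple_receptive_field(x, logits=(0.0, 0.0, 0.0)):
    """Run the three branches in parallel on x (tokens x channels) and fuse them."""
    n, d = len(x), len(x[0])
    local = [dense_residual(row, [mean_relu(d)] * 2) for row in x]
    glob = covariance_attention(x, x, x)  # Q = K = V = x, no learned projections
    cols = [frequency_branch([row[c] for row in x], [1.0] * (n // 2 + 1)) for c in range(d)]
    w = softmax(list(logits))  # stand-in for self-learned fusion weights
    return [[w[0] * local[t][c] + w[1] * glob[t][c] + w[2] * cols[c][t]
             for c in range(d)] for t in range(n)]
```

With every frequency weight set to 1 the frequency branch reproduces its input, so pushing the fusion logits toward the third branch returns the original features; in a trained model the per-bin weights and the fusion weights would be learned.
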
Description

Chinese herbal medicine image recognition method based on triple receptive fields

Technical Field

The invention relates to the technical field of image recognition, in particular to a Chinese herbal medicine image recognition method based on triple receptive fields.

Background

With the continuous development of traditional Chinese medicine and rising global health awareness, Chinese herbal medicine image recognition technology plays an increasingly important role in the modernization of traditional Chinese medicine, quality control, and resource protection. Existing image recognition techniques rely mainly on deep learning models, especially convolutional neural networks (CNNs), which have achieved some success in recognition tasks. However, these techniques still face challenges and limitations in the field of Chinese herbal medicine image recognition.

Firstly, a traditional CNN relies on convolution kernels of fixed size. It can effectively capture local detail features such as leaf veins and serrated edges, but it is weak at modeling the global structure of an image, such as the overall plant shape and the petal distribution, which limits its ability to distinguish similar medicinal materials. Secondly, the frequency domain of a natural image contains rich information such as texture and periodic patterns, but most models learn features only in the spatial domain and neglect the potential of joint frequency-domain and spatial-domain representation, so the frequency information in the image is not fully used. In addition, the computational complexity of the traditional attention mechanism grows quadratically with sequence length, which makes it difficult to capture long-distance dependencies across regions of a Chinese herbal medicine image efficiently, such as the spatial association between leaves and rhizomes, and limits the model's understanding of complex structures.

Furthermore, Chinese herbal medicine image data sets are limited in scale, especially for rare species, and professional labeling requires the participation of traditional Chinese medicine experts, so training data can hardly cover the full life-cycle morphology and complex scenes; data scarcity and high labeling cost have become important factors restricting improvements in model performance. For this reason, a Chinese herbal medicine image recognition method based on triple receptive fields is proposed to solve the above problems.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a Chinese herbal medicine image recognition method based on triple receptive fields, comprising the following steps:

S1, performing data enhancement processing on a Chinese herbal medicine image, and constructing a multi-level data enhancement system through geometric transformation, color adjustment, and joint spatial-domain and frequency-domain processing based on the Fourier transform; the image is converted from the spatial domain to the frequency domain and its frequency information is extracted according to the following formula:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \qquad (1)$$

wherein $X(k)$ is the spectrum resulting from the Fourier transform, $x$ is the input spatial-domain image sequence, $x(n)$ is a discrete pixel value of the spatial-domain image, $k$ is the frequency index in the frequency domain and takes the values $0, 1, \dots, N-1$, different values of $k$ correspond to different frequency components, $N$ is the total number of pixels, and $j$ is the imaginary unit;

S2, constructing a triple receptive field module, comprising:

S2.1, a local receptive field based on dense residuals:

$$x_l = H_l([x_0, x_1, \dots, x_{l-1}]) + x_{l-1} \qquad (2)$$

wherein $x_l$ denotes the output of the $l$-th layer, $H_l$ denotes the non-linear transformation function of the $l$-th layer, $[x_0, x_1, \dots, x_{l-1}]$ denotes the outputs of the previous layer or of all preceding layers, and $x_{l-1}$ denotes the output of the previous layer;

S2.2, a global receptive field based on covariance attention: covariance attention calculation, namely processing the normalized features with a covariance attention mechanism to capture long-distance information features of the image; an attention mechanism is constructed from the covariance matrix to capture a global receptive field and extract long-distance information features of the image, and the covariance attention is calculated by formula (3), in which $Q$ and $K$ denote the query matrix and the key matrix respectively and $V$ denotes the value matrix; a Transformer block is constructed by combining covariance attention, the GELU activation function, LayerNorm normalization, and a multi-layer perceptron (MLP), the structure of the block being given by formula (4), in which LayerNorm denotes a layer normalization operation by which the fused features are normalized, MultiHeadAttention denotes a multi-head attention mechanism, and MLP denotes a multi-la