
CN-122021630-A - Crop pest and disease named entity recognition system based on a gating fusion unit


Abstract

The invention provides a crop pest and disease named entity recognition system based on a gating fusion unit. The system comprises a feature encoding module, a multi-scale feature extraction module, an attention enhancement module, a dynamic gating fusion module and an output decoding module. The feature encoding module converts an input text into a word vector sequence; the multi-scale feature extraction module extracts local features through parallel multi-granularity convolution; the attention enhancement module generates global attention features through a multi-head attention mechanism; the dynamic gating fusion module adaptively fuses the local features and the global attention features through trainable parameters; and the output decoding module decodes and outputs the entity recognition results. By constructing training data through prompt templates and adversarial training, the invention reduces the dependence on manual annotation and improves model robustness. Through the cooperation of multi-scale feature extraction and the attention mechanism, it markedly improves the recognition accuracy of composite entities. Through the dynamic gating fusion mechanism, which adaptively integrates different features, it addresses the insufficient feature fusion of traditional methods.
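The training data construction mentioned in the abstract (and detailed in claim 7) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the template strings, the variant table, the misspelling "late blihgt", the `DIS` label, the `adv_ratio` parameter and all function names are hypothetical, and tokens are split on whitespace for simplicity.

```python
import random

# Hypothetical intent prompt templates; "{entity}" marks the entity slot.
TEMPLATES = [
    "How do I treat {entity} on my crop ?",
    "What causes {entity} ?",
]

# Hypothetical variant table: standard name -> dialect or misspelled forms.
VARIANTS = {"late blight": ["potato blast", "late blihgt"]}


def bioes_tags(n_tokens, label):
    """Return BIOES tags for one entity that spans n_tokens tokens."""
    if n_tokens == 1:
        return [f"S-{label}"]
    return [f"B-{label}"] + [f"I-{label}"] * (n_tokens - 2) + [f"E-{label}"]


def fill(template, entity, label):
    """Fill one template and return (tokens, tags), whitespace-tokenized."""
    before, after = template.split("{entity}")
    ent_tokens = entity.split()
    tokens = before.split() + ent_tokens + after.split()
    tags = (["O"] * len(before.split())
            + bioes_tags(len(ent_tokens), label)
            + ["O"] * len(after.split()))
    return tokens, tags


def build_training_set(entities, label, adv_ratio=0.3, seed=0):
    """Generate labeled samples; swap in a variant for part of them."""
    rng = random.Random(seed)
    samples = []
    for template in TEMPLATES:
        for entity in entities:
            surface = entity
            variants = VARIANTS.get(entity)
            # Adversarial step: replace the standard name with a variant.
            if variants and rng.random() < adv_ratio:
                surface = rng.choice(variants)
            samples.append(fill(template, surface, label))
    return samples
```

Because the tags are generated together with the text, no manual annotation is needed for these samples, which is the effect the abstract attributes to template-based construction.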

Inventors

  • HU ZELIN
  • YANG RUIYU
  • LIU MINJUE
  • ZHANG WENQIANG
  • DING XINRU
  • CHEN REN
  • WANG WENFU

Assignees

  • 赣南师范大学
  • 江西鲜行者数智科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-03

Claims (10)

  1. A crop pest and disease named entity recognition system based on a gating fusion unit, characterized by comprising the following modules connected in sequence:
     a feature encoding module, configured to receive an input text sequence and convert the input text into a corresponding word vector sequence by using a pre-trained language model;
     a multi-scale feature extraction module, whose input end is connected to the output end of the feature encoding module, configured to perform parallel multi-granularity convolution operations on the word vector sequence to extract local features at different scales;
     an attention enhancement module, whose input end is connected to the output end of the feature encoding module, configured to model the global dependencies of the word vector sequence and generate global attention features;
     a dynamic gating fusion module, whose input end is connected to the output ends of both the multi-scale feature extraction module and the attention enhancement module, configured to receive the local features and the global attention features, perform adaptive weighted fusion of them through trainable gating weight parameters, and output a fused feature vector; and
     an output decoding module, whose input end is connected to the output end of the dynamic gating fusion module, configured to perform sequence decoding on the fused feature vector and output the recognition result for crop pest and disease named entities.
  2. The crop pest and disease named entity recognition system based on a gating fusion unit according to claim 1, wherein the multi-scale feature extraction module comprises three parallel convolutional neural networks configured to extract features with 2-gram, 3-gram and 4-gram convolution kernels respectively, and the output of each convolutional neural network is a 256-channel feature map.
  3. The crop pest and disease named entity recognition system based on a gating fusion unit according to claim 2, wherein the multi-scale feature extraction module further comprises a pooling layer configured to apply max pooling and average pooling separately to the feature maps output by the convolutional neural networks, producing max-pooled local features that highlight salient features and average-pooled local features that preserve global distribution information, both of which are input to the dynamic gating fusion module.
  4. The crop pest and disease named entity recognition system based on a gating fusion unit according to claim 1, wherein the attention enhancement module is a multi-head attention mechanism based on rotary position embedding, configured to capture long-distance dependencies and cross-sentence entity associations in the text.
  5. The crop pest and disease named entity recognition system based on a gating fusion unit according to claim 3, wherein the inputs to the dynamic gating fusion module include the max-pooled feature and the average-pooled feature from the multi-scale feature extraction module and the global attention feature from the attention enhancement module, and the dynamic gating fusion module calculates a fusion weight according to formula (1), whose terms are: three trainable gating weight parameters, which are scalar values that the model learns and adjusts automatically during training; the max-pooled feature; the average-pooled feature; the attention feature; an activation function; and the resulting fusion weight coefficient.
  6. The crop pest and disease named entity recognition system based on a gating fusion unit according to claim 1, wherein the output decoding module is a conditional random field decoder and uses the BIOES tagging scheme for sequence tag prediction.
  7. The crop pest and disease named entity recognition system based on a gating fusion unit according to claim 1, further comprising a training data construction module configured to: generate initial annotated data from a plurality of preset intent prompt templates; and perform adversarial data augmentation on the standard entity names in part of the initial annotated data, replacing them with corresponding dialect terms or misspelled variants, to form the final training set.
  8. The crop pest and disease named entity recognition system based on a gating fusion unit according to claim 1, wherein, in the model inference stage, the system further comprises a preprocessing module configured to perform word segmentation and dialect normalization on the input text before the feature encoding module processes it.
  9. The crop pest and disease named entity recognition system based on a gating fusion unit according to claim 1, wherein the model is optimized at deployment time, the optimization including at least one of quantization or pruning.
  10. The crop pest and disease named entity recognition system based on a gating fusion unit according to claim 1, wherein the pre-trained language model is a BERT model.
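The local branch and the gate described in claims 2, 3 and 5 can be illustrated with a small pure-Python sketch. The symbols of formula (1) are not legible in this text, so the gate below, g = sigmoid(w1 * f_max + w2 * f_avg + w3 * f_att), is only one form consistent with the listed terms (three trainable scalars, three features, an activation function, a fusion weight). The way g is then applied (a convex mix of the local and attention features), the use of ReLU, the single channel per kernel width (the claims specify 256) and all function names are assumptions made for illustration.

```python
import math


def ngram_conv(vectors, kernel):
    """Valid 1-D convolution of one n-gram kernel over a token sequence.

    vectors: list of token vectors; kernel: list of n weight vectors.
    Assumes the sequence has at least n tokens.
    """
    n = len(kernel)
    out = []
    for i in range(len(vectors) - n + 1):
        window = vectors[i:i + n]
        s = sum(w * x
                for k_vec, tok_vec in zip(kernel, window)
                for w, x in zip(k_vec, tok_vec))
        out.append(max(0.0, s))  # ReLU (assumed activation)
    return out


def local_features(vectors, kernels):
    """Max-pool and average-pool each kernel's feature map over positions."""
    f_max, f_avg = [], []
    for kernel in kernels:
        fmap = ngram_conv(vectors, kernel)
        f_max.append(max(fmap))
        f_avg.append(sum(fmap) / len(fmap))
    return f_max, f_avg


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(f_max, f_avg, f_att, w1, w2, w3):
    """Element-wise gate from three scalar weights (assumed form)."""
    fused = []
    for m, a, t in zip(f_max, f_avg, f_att):
        g = sigmoid(w1 * m + w2 * a + w3 * t)    # fusion weight coefficient
        local = 0.5 * (m + a)                    # assumed local summary
        fused.append(g * local + (1.0 - g) * t)  # assumed mixing rule
    return fused


# Toy input: four tokens with 1-dimensional embeddings.
tokens = [[1.0], [2.0], [3.0], [4.0]]
# One all-ones kernel per width: 2-gram, 3-gram and 4-gram.
kernels = [[[1.0]] * 2, [[1.0]] * 3, [[1.0]] * 4]
f_max, f_avg = local_features(tokens, kernels)
fused = gated_fusion(f_max, f_avg, [2.0, 2.0, 2.0], 0.0, 0.0, 0.0)
```

With all three gate weights at zero the gate is 0.5 everywhere, so the output is a plain average of the local and attention features; training would move the scalars away from zero to favor one branch over the other.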

Description

Crop pest and disease named entity recognition system based on a gating fusion unit

Technical Field

The invention relates to the technical field of agricultural artificial intelligence, and in particular to a crop pest and disease named entity recognition system based on a gating fusion unit.

Background

Named entity recognition (NER) is a core foundational task in natural language processing that aims to recognize and classify entities of predefined categories in unstructured text. In agricultural informatization and intelligent agriculture, accurately identifying entities related to crop diseases and insect pests, such as disease names, pathogens, pests, symptoms and therapeutic agents, is a key technical prerequisite for constructing agricultural knowledge graphs, developing intelligent question-answering systems and supporting precise plant protection decisions.

At present, existing schemes in this field rely mainly on classical sequence labeling models. Their implementation and inherent drawbacks are as follows.

Schemes based on traditional machine learning and sequence labeling models: early systems and some current ones adopt architectures such as BiLSTM-CRF (a bidirectional long short-term memory network with a conditional random field) or BERT-CRF (Bidirectional Encoder Representations from Transformers with a conditional random field). These models first encode the input text as word vectors, then capture semantic information through context modeling (e.g., BiLSTM), and finally use the CRF layer to constrain the validity of the tag sequence and output the recognition results.
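To make "the validity of the tag sequence" concrete, the sketch below checks the BIOES tagging scheme (the scheme named in claim 6) with hard rules and extracts entity spans from a valid sequence. A real CRF layer learns transition scores instead of applying fixed rules, so this only illustrates the constraint being modeled. The `DIS` and `PEST` labels and all function names are hypothetical.

```python
def split_tag(tag):
    """Split 'B-DIS' into ('B', 'DIS'); the outside tag 'O' has no label."""
    if tag == "O":
        return "O", None
    prefix, label = tag.split("-", 1)
    return prefix, label


def allowed(prev, cur):
    """Return True if tag `cur` may directly follow tag `prev` in BIOES."""
    p, p_label = split_tag(prev)
    c, c_label = split_tag(cur)
    if p in ("B", "I"):
        # An open entity must continue or end with the same label.
        return c in ("I", "E") and c_label == p_label
    # After O, E or S a new entity may start, or the text stays outside.
    return c in ("O", "B", "S")


def is_valid(tags):
    """Check a whole sequence, treating the start like a preceding 'O'."""
    prev = "O"
    for tag in tags:
        if not allowed(prev, tag):
            return False
        prev = tag
    # The sequence must not end in the middle of an entity.
    return split_tag(prev)[0] not in ("B", "I")


def extract_spans(tags):
    """Return (start, end, label) spans, end inclusive, from valid tags."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, label = split_tag(tag)
        if prefix == "S":
            spans.append((i, i, label))
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((start, i, label))
            start = None
    return spans
```

For example, the tags `O B-DIS I-DIS E-DIS O S-PEST` form a valid sequence containing a three-token disease entity and a single-token pest entity, whereas `O I-DIS` is invalid because an entity cannot begin with an inside tag.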
Although such schemes are reasonably effective in general domains, they expose the following technical bottlenecks when applied to highly specialized agricultural pest and disease text.

Annotated data is scarce and costly. Agricultural pest and disease entities are highly specialized, diverse and subject to large regional differences (for example, late blight is called potato blast in certain regions). Data annotation therefore depends on plant protection experts, which leads to long annotation cycles and very high cost. Research shows that annotating a single high-quality pest or disease entity can take up to several minutes, which severely restricts the construction of large-scale, high-quality datasets and is the primary obstacle to improving model performance.

Recognition accuracy for composite entities is low. Composite entities with complex structures and varying lengths (such as tomato brown rugose fruit virus and potato early blight brown spot) are widespread in agricultural text. Traditional models such as BERT-CRF have limited ability to capture the internal structure and long-distance dependencies of such entities. Imprecise entity boundaries and insufficient understanding of internal semantics lead to generally low F1 scores for composite entities (for example, 68.3% on a public test), which does not meet the needs of practical application.

Feature extraction and fusion mechanisms are insufficient. Existing feature extraction methods are limited to a single strategy. For example, the common TextCNN model typically uses only one of max pooling or average pooling, and therefore cannot extract local salient features and global context information effectively at the same time.
A single feature representation strategy cannot comprehensively capture semantic information at different granularities and levels in the text. The model's ability to represent entities, particularly composite entities, is therefore insufficient, and this becomes a technical bottleneck for improving recognition accuracy.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a crop pest and disease named entity recognition system based on a gating fusion unit, so as to solve the problems of scarce annotated data, difficult recognition of composite entities and insufficient feature fusion in the agricultural field.

To achieve the above purpose, the invention adopts the following technical scheme:

A crop pest and disease named entity recognition system based on a gating fusion unit, comprising the following modules connected in sequence:
     a feature encoding module, configured to receive an input text sequence and convert the input text into a corresponding word vector sequence by using a pre-trained language model;
     a multi-scale feature extraction module, whose input end is connected to the output end of the feature encoding module, configured to perform parallel multi-granularity convolution operations on the word vector sequence to extract local features at different scales;
     the input end of the attention enhancement module