CN-121191794-B - Intelligent semantic alignment and reasoning method and device for drug effect-efficacy of traditional Chinese medicine
Abstract
The invention provides an intelligent semantic alignment and reasoning method and device for drug effect-efficacy of traditional Chinese medicine, and relates to the technical field of natural language processing. The method comprises: obtaining a Chinese herbal medicine drug effect text and a corresponding Chinese herbal medicine efficacy text; segmenting the two texts into drug effect sub-word units and efficacy sub-word units; inputting the drug effect sub-word units and the efficacy sub-word units into a fine-tuned BERT model to obtain word vectors; inputting the word vectors into a two-layer bidirectional long short-term memory network to obtain drug effect multi-granularity features and efficacy multi-granularity features; inputting these features into a cross attention layer that dynamically assigns matching weights between drug effect and efficacy; and classifying with a Softmax function to obtain a predicted Chinese herbal medicine efficacy text. By fusing these deep learning models, the invention achieves accurate conversion between drug effect and efficacy and provides new technical support for efficacy analysis in the field of traditional Chinese medicine.
Inventors
- ZHAO YUE
- Fei Chenghao
- LI ZHIYONG
- TANG XIANGYUN
- LIU XUAN
- Jin Guancheng
- ZENG WEIFENG
- TAO CHENGAN
Assignees
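- Minzu University of China (中央民族大学)
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences (中国中医科学院中药研究所)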
Dates
- Publication Date
- 20260508
- Application Date
- 20251029
Claims (9)
- 1. An intelligent semantic alignment and reasoning method for drug effect-efficacy of traditional Chinese medicine, characterized by comprising the following steps: S1, acquiring a Chinese herbal medicine drug effect text and a corresponding Chinese herbal medicine efficacy text to construct a sample data set; S2, segmenting the Chinese herbal medicine drug effect text and the Chinese herbal medicine efficacy text with a tokenizer to obtain drug effect sub-word units and efficacy sub-word units, inputting the drug effect sub-word units and the efficacy sub-word units respectively into a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) model, and obtaining a word vector of each drug effect sub-word unit and a word vector of each efficacy sub-word unit through the word embedding layer, the sentence embedding layer and the position embedding layer of the fine-tuned BERT model; S3, inputting the word vectors of the drug effect sub-word units and the word vectors of the efficacy sub-word units respectively into a two-layer bidirectional long short-term memory network to obtain drug effect multi-granularity features and efficacy multi-granularity features; S4, inputting the drug effect multi-granularity features and the efficacy multi-granularity features into a cross attention layer, dynamically assigning matching weights between drug effect and efficacy, and classifying with a Softmax function to obtain a predicted Chinese herbal medicine efficacy text; S5, training a fusion model built from the BERT model, the two-layer bidirectional long short-term memory network and the cross attention layer according to the predicted Chinese herbal medicine efficacy text and the Chinese herbal medicine efficacy text in the sample data set, to obtain a trained fusion model; S6, inputting a Chinese herbal medicine drug effect text to be inferred into the trained fusion model to obtain a Chinese herbal medicine efficacy text inference result; wherein, in step S3, inputting the word vectors of the drug effect sub-word units and the word vectors of the efficacy sub-word units respectively into the two-layer bidirectional long short-term memory network to obtain the drug effect multi-granularity features and the efficacy multi-granularity features comprises: inputting the word vectors of the drug effect sub-word units and the word vectors of the efficacy sub-word units respectively into a first forward long short-term memory unit and a first backward long short-term memory unit of a first bidirectional long short-term memory layer to obtain a first forward hidden state and a first backward hidden state, and concatenating the first forward hidden state and the first backward hidden state to obtain drug effect local phrase features and efficacy local phrase features; inputting the drug effect local phrase features and the efficacy local phrase features respectively into a second forward long short-term memory unit and a second backward long short-term memory unit of a second bidirectional long short-term memory layer to obtain a second forward hidden state and a second backward hidden state, and concatenating the second forward hidden state and the second backward hidden state to obtain drug effect global semantic features and efficacy global semantic features; and applying layer normalization to the drug effect local phrase features, the efficacy local phrase features, the drug effect global semantic features and the efficacy global semantic features, concatenating the layer-normalized drug effect local phrase features and drug effect global semantic features to obtain the drug effect multi-granularity features, and concatenating the layer-normalized efficacy local phrase features and efficacy global semantic features to obtain the efficacy multi-granularity features.
- 2. The intelligent semantic alignment and reasoning method for drug effect-efficacy of traditional Chinese medicine according to claim 1, wherein the fine-tuned BERT model in S2 is constructed by: training the BERT model with bias correction enabled, with the gradient update decoupled from weight decay, and with weight decay skipped for bias terms and layer normalization parameters, so as to construct the gradient update formula of the Adam algorithm; training the BERT model by gradually freezing encoder blocks in the BERT model; and adjusting the learning rate during the BERT model training process.
- 3. The intelligent semantic alignment and reasoning method for drug effect-efficacy of traditional Chinese medicine according to claim 2, wherein adjusting the learning rate during the BERT model training process comprises: performing iterative training at a learning rate below a preset first threshold, performing warm-up at a learning rate exceeding a preset second threshold, and performing decay at a learning rate below the preset first threshold.
- 4. The intelligent semantic alignment and reasoning method for drug effect-efficacy of traditional Chinese medicine according to claim 2, wherein the gradient update formula of the Adam algorithm is shown in the following formula (1): $$\theta_t = \theta_{t-1} - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda \theta_{t-1} \right) \tag{1}$$ where $\theta_t$ represents the value of the model parameters at step $t$, $t$ represents the time step, $\eta$ represents the learning rate, $m_t$ represents the first moment estimate of the gradient, $\hat{m}_t$ represents the bias-corrected first moment estimate, $v_t$ represents the second moment estimate of the gradient, $\hat{v}_t$ represents the bias-corrected second moment estimate, $\epsilon$ represents a constant for numerical stability, and $\lambda$ represents the weight decay coefficient.
- 5. The intelligent semantic alignment and reasoning method for drug effect-efficacy of traditional Chinese medicine according to claim 1, wherein, in step S4, inputting the drug effect multi-granularity features and the efficacy multi-granularity features into the cross attention layer and dynamically assigning matching weights between drug effect and efficacy comprises: obtaining a query vector from the drug effect multi-granularity features and key and value vectors from the efficacy multi-granularity features, and computing their similarity to obtain weights; normalizing the weights with a Softmax function; and computing the weighted sum of the normalized weights and the corresponding value vectors to obtain the matching weights between drug effect and efficacy.
- 6. The intelligent semantic alignment and reasoning method for drug effect-efficacy of traditional Chinese medicine according to claim 1, wherein the calculation process of the cross attention layer is shown in the following formula (2): $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V \tag{2}$$ where $\mathrm{Attention}(Q, K, V)$ represents the result of the attention interaction between the drug effect features and the efficacy features, $Q$ represents the drug effect features, $K$ and $V$ represent the efficacy features, $\sqrt{d_k}$ represents the scaling factor of the feature dimension, and $K^{T}$ represents the transpose of $K$.
- 7. An intelligent semantic alignment and reasoning apparatus for drug effect-efficacy of traditional Chinese medicine, for implementing the intelligent semantic alignment and reasoning method for drug effect-efficacy of traditional Chinese medicine according to any one of claims 1-6, characterized in that the apparatus comprises: a data acquisition module for acquiring a Chinese herbal medicine drug effect text and a corresponding Chinese herbal medicine efficacy text to construct a sample data set; a word vector construction module for segmenting the Chinese herbal medicine drug effect text and the Chinese herbal medicine efficacy text with a tokenizer to obtain drug effect sub-word units and efficacy sub-word units, inputting the drug effect sub-word units and the efficacy sub-word units respectively into a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) model, and obtaining a word vector of each drug effect sub-word unit and a word vector of each efficacy sub-word unit through the word embedding layer, the sentence embedding layer and the position embedding layer of the fine-tuned BERT model; a feature construction module for inputting the word vectors of the drug effect sub-word units and the word vectors of the efficacy sub-word units respectively into a two-layer bidirectional long short-term memory network to obtain drug effect multi-granularity features and efficacy multi-granularity features; an efficacy prediction module for inputting the drug effect multi-granularity features and the efficacy multi-granularity features into a cross attention layer, dynamically assigning matching weights between drug effect and efficacy, and classifying with a Softmax function to obtain a predicted Chinese herbal medicine efficacy text; a training module for training a fusion model built from the BERT model, the two-layer bidirectional long short-term memory network and the cross attention layer according to the predicted Chinese herbal medicine efficacy text and the Chinese herbal medicine efficacy text in the sample data set, to obtain a trained fusion model; and an output module for inputting a Chinese herbal medicine drug effect text to be inferred into the trained fusion model to obtain a Chinese herbal medicine efficacy text inference result.
- 8. An intelligent semantic alignment and reasoning apparatus, comprising: a processor; and a memory storing computer readable instructions which, when executed by the processor, implement the method of any one of claims 1 to 6.
- 9. A computer readable storage medium having stored therein program code which is callable by a processor to perform the method of any one of claims 1 to 6.
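The cross attention step of claims 5 and 6 can be illustrated with a minimal sketch. It assumes formula (2) is the standard scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, with queries taken from the drug effect features and keys and values taken from the efficacy features. The function names, the toy vectors, and the omission of learned query/key/value projections are assumptions made for illustration; they are not taken from the patent.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross attention: softmax(Q K^T / sqrt(d_k)) V.

    queries: drug effect feature vectors, one per sub-word unit
    keys, values: efficacy feature vectors, one per sub-word unit
    Returns (attended outputs, attention weights), one row per query.
    """
    scale = math.sqrt(len(keys[0]))  # sqrt(d_k), the feature-dimension scaling
    outputs, all_weights = [], []
    for q in queries:
        # Similarity between this query and every key (claim 5, first step).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        # Softmax normalization of the similarities (claim 5, second step).
        weights = softmax(scores)
        # Weighted sum of the value vectors (claim 5, third step).
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy features: 2 drug effect units, 3 efficacy units, dimension 2.
drug_effect = [[1.0, 0.0], [0.0, 1.0]]
efficacy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, weights = cross_attention(drug_effect, efficacy, efficacy)
```

Each row of `weights` sums to 1 and shows how strongly one drug effect unit attends to each efficacy unit. In a trained model the inputs would be the multi-granularity features from step S3 rather than hand-written vectors.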
Description
Intelligent semantic alignment and reasoning method and device for drug effect-efficacy of traditional Chinese medicine
Technical Field
The invention relates to the technical field of natural language processing, in particular to an intelligent semantic alignment and reasoning method and device for drug effect-efficacy of traditional Chinese medicine.
Background
Exchange among traditional medicine systems and the introduction of new medicinal resources offer historic opportunities, especially for the traditional medicine systems of ethnic minorities, which have developed distinctive ways of using medicines and treating disease. Against this background, the introduction of external medicinal resources is strictly constrained by drug registration regulations and modern ethics, so these resources cannot enter the clinical practice of traditional Chinese medicine directly. Existing techniques, however, focus mainly on the druggability of such resources from the viewpoint of modern pharmacy. The lack of systematic evaluation criteria for their traditional medical characteristics and for their "traditional Chinese medicine" conversion has become a key bottleneck restricting the effective introduction of external medicinal resources.
In the field of traditional Chinese medicine, efficacy and drug effect are understood in clearly different ways. Efficacy is a high-level summary of the clinical effects of a traditional Chinese medicine, grounded in traditional Chinese medicine theory, and is generally expressed as effects on diseases, syndromes and symptoms. Drug effect is a pharmacological effect grounded in modern biomedical theory, namely the physiological or biochemical response that a medicine produces in the organism. This semantic difference between traditional and modern medicine makes it difficult to convert between the drug effect and the efficacy of an external medicine.
At present, the prior art offers no effective way to solve the problem of semantic conversion between traditional medicine and modern medicine. In particular, there is still no effective technical means for organically connecting the efficacy expressions of different traditional medicine systems during the "traditional Chinese medicine" conversion of external medicinal resources, or for forming a systematic evaluation model. For medicinal materials such as moringa leaves, the difficulty of converting between drug effect and efficacy clearly hinders communication between traditional medicine and modern pharmacy and greatly restricts the systematic introduction and application of external medicinal resources. Developing a model that converts between the drug effect and the efficacy of external medicines in the context of traditional Chinese medicine, and thereby provides a scientific basis for their "traditional Chinese medicine" conversion, is therefore an important subject for improving the research and development efficiency of traditional Chinese medicine, advancing its modernization, and promoting cross-cultural medical application. Rule matching methods in the prior art match the drug effect and efficacy descriptions of traditional Chinese medicines by constructing a set of manually formulated rules. These rules are typically based on structured information or empirical knowledge in the traditional Chinese medicine literature. For example, certain keywords or phrases are defined to match the relationship between drug effect and efficacy. The drawbacks are that the rules can hardly cover all modes of expression, depend strongly on the text, and lack flexibility and extensibility.
Methods based on statistics and machine learning use conventional machine learning algorithms such as SVM (Support Vector Machine) and random forest, and are typically trained on labeled data. Key features of the drug effect and efficacy texts (such as word frequency, grammatical structure and context information) are extracted through feature engineering, and a classifier is trained on them to perform semantic matching. The drawback is that, for complex natural language expressions, the effect of these machine learning methods is limited, especially when the data volume is insufficient. Methods based on deep learning use NLP (Natural Language Processing) models, such as semantic analysis based on CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory network) and other models. These deep learning models automatically learn semantic features in the text and perform matching and alignment between drug effect and efficacy. The drawbacks are that a large amount of labeled data and computing resources are required and that the black