CN-121996338-A - Man-machine interaction-based large model multi-mode multi-dimensional data augmentation method and device
Abstract
The invention provides a man-machine interaction-based large-model multi-modal multi-dimensional data augmentation method and device. The method comprises the steps of: collecting multi-modal multi-dimensional agricultural data, and preprocessing and formatting it uniformly to obtain processed agricultural data; obtaining a multi-modal demand instruction input by a user through a man-machine interaction interface; calling a data augmentation model based on the multi-modal demand instruction and the processed agricultural data to obtain augmented agricultural data; performing cross-modal consistency verification on the augmented agricultural data; and fusing the verified augmented agricultural data with the processed agricultural data to obtain an augmented agricultural data set. The data augmentation model comprises a large language model for the agricultural domain, a CLIP model for the agricultural domain, an agricultural knowledge graph, and data generators corresponding to a plurality of modality types that are associated through an attention mechanism sharing a semantic space. The method thereby generates semantically consistent, agronomically reasonable, high-quality multi-modal augmented data.
Inventors
- Wu Huarui
- Zhang Yue
Assignees
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences (北京市农林科学院信息技术研究中心)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-05
Claims (10)
- 1. A man-machine interaction-based large model multi-modal multi-dimensional data augmentation method, characterized by comprising the following steps: collecting multi-modal multi-dimensional agricultural data from different data sources, preprocessing and formatting the multi-modal multi-dimensional agricultural data uniformly to obtain processed agricultural data, and obtaining a multi-modal demand instruction input by a user through a man-machine interaction interface; calling a data augmentation model based on the multi-modal demand instruction and the processed agricultural data to obtain augmented agricultural data; performing cross-modal consistency verification on the augmented agricultural data, and fusing the verified augmented agricultural data with the processed agricultural data to obtain an augmented agricultural data set; wherein the data augmentation model comprises a large language model for the agricultural domain, a contrastive language-image pre-training (CLIP) model for the agricultural domain, an agricultural knowledge graph, and data generators corresponding to a plurality of modality types, the data generators corresponding to the plurality of modality types being associated through an attention mechanism sharing a semantic space.
- 2. The man-machine interaction-based large model multi-modal multi-dimensional data augmentation method of claim 1, wherein calling a data augmentation model based on the multi-modal demand instruction and the processed agricultural data to obtain augmented agricultural data comprises: performing feature extraction and fusion on the multi-modal demand instruction through the large language model for the agricultural domain, the CLIP model for the agricultural domain, and the agricultural knowledge graph to obtain a structured constraint; encoding the processed agricultural data as a multi-modal feature vector and encoding the structured constraint as a target constraint boundary; and generating augmented agricultural data through a data generator corresponding to the target modality type based on the multi-modal feature vector and the target constraint boundary.
- 3. The man-machine interaction-based large model multi-modal multi-dimensional data augmentation method of claim 1 or 2, wherein the plurality of modality types include one or more of an image data type, a time-series data type, and a text data type; the data generator corresponding to the image data type consists of a generative adversarial network (GAN) and a CLIP model; the data generator corresponding to the time-series data type consists of a Transformer time-series model and a physical-constraint rule base; and the data generator corresponding to the text data type consists of a large language model fine-tuned with agricultural-domain knowledge.
- 4. The man-machine interaction-based large model multi-modal multi-dimensional data augmentation method of claim 2, wherein performing feature extraction and fusion on the multi-modal demand instruction through the large language model for the agricultural domain, the CLIP model for the agricultural domain, and the agricultural knowledge graph to obtain a structured constraint comprises: extracting features of a text instruction in the multi-modal demand instruction through the large language model for the agricultural domain to obtain a structured entity list and entity-pair relation types; extracting features of an image instruction in the multi-modal demand instruction through the CLIP model for the agricultural domain to obtain a quantified phenotype feature set and a feature credibility report; fusing the structured entity list, the entity-pair relation types, the quantified phenotype feature set, and the feature credibility report based on the agricultural knowledge graph to obtain a fusion result; invoking an agricultural expert rule base to perform logic verification on the fusion result to obtain a verification result; and converting the fusion result and the logic verification result based on a constraint generation algorithm to obtain the structured constraint.
- 5. The man-machine interaction-based large model multi-modal multi-dimensional data augmentation method of claim 4, wherein extracting features of the text instruction in the multi-modal demand instruction through the large language model for the agricultural domain to obtain a structured entity list and entity-pair relation types comprises: fine-tuning the agriculture-specific large language model AgriLLM on an agricultural corpus to obtain the large language model for the agricultural domain; obtaining semantic features of the text instruction through the large language model for the agricultural domain; inputting the semantic features of the text instruction into an entity recognition model based on a conditional random field (CRF) to obtain a structured entity list; inputting the structured entity list into a dual-Transformer model to obtain an initial relation type and an entity association probability for each entity pair in the structured entity list; and screening the initial relation types by the entity association probabilities to obtain the final entity-pair relation types.
- 6. The man-machine interaction-based large model multi-modal multi-dimensional data augmentation method of claim 4, wherein extracting features of the image instruction in the multi-modal demand instruction through the CLIP model for the agricultural domain to obtain a quantified phenotype feature set and a feature credibility report comprises: extracting a region of interest from an original image in the image instruction, and standardizing the extracted image with a bicubic interpolation algorithm to obtain a standardized image; inputting the standardized image into the CLIP model for the agricultural domain, obtained by fine-tuning on agricultural scenes, matching it against preset agricultural text labels, and outputting key phenotype data to form an initial quantified phenotype feature set; scoring the original image with an image quality evaluation model based on ResNet-50 and deriving feature credibility from the scoring result; and checking and screening the initial quantified phenotype feature set according to a preset decision rule and the feature credibility to obtain the final quantified phenotype feature set and the feature credibility report.
- 7. The man-machine interaction-based large model multi-modal multi-dimensional data augmentation method of claim 6, wherein extracting the region of interest from the original image in the image instruction comprises: for each candidate threshold in a candidate grey-level search interval, calculating the between-class variance of the foreground and background corresponding to that candidate threshold; comparing the between-class variances of all candidate thresholds in the search interval, selecting the candidate threshold that maximizes the between-class variance as the optimal segmentation threshold, and performing binarization segmentation on the original image with the optimal segmentation threshold to obtain the region of interest; wherein the candidate grey-level search interval is set based on prior knowledge of the grey-level distributions of target and background in agricultural images.
- 8. The man-machine interaction-based large model multi-modal multi-dimensional data augmentation method of claim 1, wherein the cross-modal consistency verification of the augmented agricultural data comprises: calculating the semantic similarity between image data and text data in the augmented agricultural data using a CLIP model and comparing it with a preset threshold; and/or verifying, based on the agricultural knowledge graph, the logical association between time-series data and image data in the augmented agricultural data.
- 9. The man-machine interaction-based large model multi-modal multi-dimensional data augmentation method of claim 1 or 8, further comprising: acquiring a system verification index of the augmented agricultural data set, and acquiring a user feedback score of the augmented agricultural data set through the man-machine interaction interface; fusing the user feedback score and the system verification index into a reward signal, and generating optimized training samples from the augmented agricultural data set with a preset number of interactions as the iteration period; and optimizing parameters of the data augmentation model based on the reward signal and the optimized training samples through a proximal policy optimization (PPO) algorithm.
- 10. A man-machine interaction-based large model multi-modal multi-dimensional data augmentation device, characterized by comprising: an acquisition processing module for collecting multi-modal multi-dimensional agricultural data from different data sources, preprocessing and formatting the multi-modal multi-dimensional agricultural data uniformly to obtain processed agricultural data, and acquiring a multi-modal demand instruction input by a user through a man-machine interaction interface; a data augmentation module for calling a data augmentation model based on the multi-modal demand instruction and the processed agricultural data to obtain augmented agricultural data; and a verification fusion module for performing cross-modal consistency verification on the augmented agricultural data, and fusing the verified augmented agricultural data with the processed agricultural data to obtain an augmented agricultural data set; wherein the data augmentation model comprises a large language model for the agricultural domain, a contrastive language-image pre-training (CLIP) model for the agricultural domain, an agricultural knowledge graph, and data generators corresponding to a plurality of modality types, the data generators corresponding to the plurality of modality types being associated through an attention mechanism sharing a semantic space.
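The relation-screening step in claim 5 keeps only entity-pair relation types whose association probability is high enough. A minimal sketch of that filter (the tuple layout and the 0.5 threshold are illustrative assumptions; the patent does not fix either):

```python
def screen_relations(candidates, min_prob=0.5):
    """Screen initial relation types by entity association probability.

    candidates: iterable of (head, relation, tail, probability) tuples
    produced by the dual-Transformer stage; the layout is hypothetical.
    Returns the surviving (head, relation, tail) triples.
    """
    return [(h, r, t) for h, r, t, p in candidates if p >= min_prob]

# Illustrative usage with invented agricultural entity pairs:
kept = screen_relations([
    ("cabbage", "grows_in", "loam", 0.92),
    ("cabbage", "hosts", "aphid", 0.31),
])
```

Only the high-probability triple survives; the low-confidence one is discarded before constraint fusion.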
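The threshold selection described in claim 7 is Otsu's method restricted to a prior-based candidate grey-level interval. A minimal pure-Python sketch (the default interval and the sample pixel list are illustrative, not values from the patent):

```python
def otsu_threshold(pixels, search_interval=(0, 255)):
    """Pick the threshold in the candidate interval that maximizes
    the between-class variance of foreground vs. background."""
    lo, hi = search_interval
    n = len(pixels)
    best_t, best_var = lo, -1.0
    for t in range(lo, hi + 1):
        bg = [p for p in pixels if p <= t]   # background class
        fg = [p for p in pixels if p > t]    # foreground class
        if not bg or not fg:
            continue  # one class empty: variance undefined, skip
        w_bg, w_fg = len(bg) / n, len(fg) / n
        mu_bg = sum(bg) / len(bg)
        mu_fg = sum(fg) / len(fg)
        var_between = w_bg * w_fg * (mu_bg - mu_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy image: dark soil background vs. bright plant foreground.
t = otsu_threshold([10] * 50 + [200] * 50)
```

Restricting `search_interval` using prior knowledge of target/background grey levels, as the claim specifies, simply shrinks the loop range and avoids spurious optima outside the plausible band.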
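The CLIP-based branch of the consistency check in claim 8 reduces to comparing an image-text embedding similarity against a preset threshold. A hedged sketch over precomputed embedding vectors (cosine similarity and the 0.75 threshold are assumptions; the patent names neither):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def passes_consistency(img_emb, txt_emb, threshold=0.75):
    """Accept an augmented image/text pair only if their CLIP-style
    embedding similarity reaches the preset threshold."""
    return cosine_similarity(img_emb, txt_emb) >= threshold
```

Pairs failing the check would be dropped before the verified data is fused into the augmented data set.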
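The reward fusion in claim 9 can be sketched as a weighted combination of the user feedback score and the system verification index; the fused scalar then drives PPO updates of the augmentation model. The linear form and the weight of 0.6 below are assumptions, as the patent does not fix the fusion rule:

```python
def fuse_reward(user_score, system_index, alpha=0.6):
    """Fuse the user feedback score and the system verification index
    into one scalar reward signal for policy optimization.
    alpha weights the human signal; 0.6 is an illustrative default."""
    return alpha * user_score + (1.0 - alpha) * system_index

# e.g. a user rating of 1.0 with a system index of 0.5
reward = fuse_reward(1.0, 0.5)
```

With a preset number of interactions as the iteration period, each period's fused rewards and sampled augmented data would form one PPO training batch.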
Description
Man-machine interaction-based large model multi-modal multi-dimensional data augmentation method and device
Technical Field
The invention relates to the technical field of data processing, and in particular to a large-model multi-modal multi-dimensional data augmentation method and device based on man-machine interaction.
Background
In the intelligent agriculture field, the acquisition and application of multi-modal data face obvious bottlenecks. In terms of data scale, data covering the whole growth period of crops is severely insufficient. Taking cabbage as an example, effective samples of key nodes such as the early heading stage and the expansion stage are rare, and a single test field often falls short of a thousand effective samples, which is insufficient to support deep training of a large model. In terms of data quality, field sensors are easily disturbed: soil humidity sensors produce value jumps due to muddy-water adhesion, and unmanned aerial vehicle images are overexposed in backlight; noise of this kind reduces model recognition accuracy. Data diversity is also clearly insufficient: crop phenotype data under extreme climates is scarce, and the growth features of different varieties are unevenly covered, so the model generalizes weakly in complex field scenes and struggles to accurately identify abnormal states caused by rare diseases and pests. Traditional data augmentation methods show obvious adaptation defects in agricultural scenes: their single-modal processing logic can hardly cope with the multi-dimensionally associated characteristics of agricultural data. For example, applying random cropping and rotation to crop images can increase the number of image samples, but cannot associate them with key environmental factors such as soil moisture content and meteorological data, so the augmented data is disconnected from actual conditions.
In cross-modal processing, the spatio-temporal alignment precision is low: the timing error in matching meteorological text with remote-sensing images often exceeds one hour, and the spatial deviation reaches several metres, which cannot meet the real-time linkage requirements of disaster early warning. These limitations make it difficult for the augmented data to support a large model in learning the complex "crop growth-environment-farming" relationships, affecting the performance of tasks such as precision irrigation and pest prediction. In addition, the traditional augmentation process lacks efficient integration of agronomic expert knowledge, so the generated data deviates significantly from field reality. For example, false data may be generated for "fast growth of cabbage at high temperature in winter", ignoring its suitable growth temperature (15-20 °C); or false samples may be generated for "aphids gathering at the head of the cabbage", violating their biological characteristic of parasitizing the underside of leaves. Such "dummy data" can mislead the model into false recognition, such as misjudging normal growth slowdown at low temperature as a pathological condition. Agronomic experience (such as shallow cultivation at the seedling stage and water control at the ball-forming stage) is difficult for a machine to learn and capture automatically, so generated agronomic operation data conflicts with the crop growth stage, ultimately reducing the practical value of intelligent agriculture technology.
Disclosure of Invention
The invention provides a man-machine interaction-based large-model multi-modal multi-dimensional data augmentation method and device to solve the above problems and to efficiently generate high-quality multi-modal augmented data that is semantically consistent and agronomically reasonable.
The invention provides a man-machine interaction-based large model multi-modal multi-dimensional data augmentation method, comprising the following steps: collecting multi-modal multi-dimensional agricultural data from different data sources, preprocessing and formatting the multi-modal multi-dimensional agricultural data uniformly to obtain processed agricultural data, and obtaining a multi-modal demand instruction input by a user through a man-machine interaction interface; calling a data augmentation model based on the multi-modal demand instruction and the processed agricultural data to obtain augmented agricultural data; performing cross-modal consistency verification on the augmented agricultural data, and fusing the verified augmented agricultural data with the processed agricultural data to obtain an augmented agricultural data set; wherein the data augmentation model comprises a large language model for the agricultural domain, a contrastive language-image pre-training (CLIP) model for the agricultural domain, an agricultural knowledge graph, and data generators corresponding to a plurality of modality types, the data generators being associated through an attention mechanism sharing a semantic space.
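The generation step above, in which augmented data must respect a target constraint boundary encoded from the structured constraints, can be illustrated at its simplest as projecting each generated feature into its admissible interval (the function name, the bounds, and the cabbage example are hypothetical):

```python
def clamp_to_constraints(features, bounds):
    """Project each generated feature into its admissible [lo, hi]
    interval derived from the structured constraints.
    Illustrative sketch; the patent does not specify this mechanism."""
    return [min(max(x, lo), hi) for x, (lo, hi) in zip(features, bounds)]

# e.g. a generated (temperature °C, relative humidity) pair forced into
# agronomically plausible ranges for cabbage growth
sample = clamp_to_constraints([27.4, 1.12], [(15.0, 20.0), (0.0, 1.0)])
```

In the patented pipeline the constraint boundary also shapes generation itself rather than only post-correcting outputs; this clamp only shows the boundary's role as a hard admissibility check.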