CN-122023053-A - Quality and management and control element data association fusion method thereof
Abstract
The application discloses a method for associating and fusing agricultural product quality data with its management and control element data. The method comprises: step 1, data collection and preprocessing; step 2, construction of a dedicated agricultural product data warehouse; step 3, construction of intelligent data association rules and data fusion; step 4, visual display and intelligent application of the fused data; and step 5, a system adaptation and expansion mechanism. The method overcomes the defects of the prior art, in which agricultural product quality data and control element data are scattered, weakly associated and not displayed intuitively. By constructing the dedicated data warehouse, designing the intelligent data association rules, and optimizing the data fusion and visual display processes, it provides data support for agricultural product quality control, traceability analysis and marketing.
Inventors
- CHEN ZHIJUN
- SHAO HUA
- LIU YAN
- JIN FEN
Assignees
- 中国农业科学院农业质量标准与检测技术研究所
Dates
- Publication Date
- 20260512
- Application Date
- 20260126
Claims (10)
- 1. A quality and control element data association fusion method is characterized by comprising the following steps: step 1, data collection and preprocessing, namely constructing a multi-source data access channel and a synchronous calibration mechanism based on a unified data base, realizing efficient collection of agricultural product quality data and management and control element data, and simultaneously adopting a self-adaptive preprocessing strategy to eliminate data noise; Step 2, constructing an agricultural product exclusive data warehouse with layering and elastic expansion based on preprocessed data, integrating dynamic partitioning, multidimensional indexing and a safety management and control mechanism, realizing efficient storage, intelligent scheduling and safety management of the data, adopting a three-layer architecture of an operation data storage layer, a data warehouse layer and a data mart layer by the data warehouse, and realizing cross-class rapid migration by newly adding a data adaptation layer; Step 3, constructing and fusing intelligent data association rules, constructing a three-layer association model of basic association, intelligent association and characteristic constraint association by combining agricultural product growth characteristics and big data association analysis technology, optimizing and improving an Apriori algorithm and dynamic weight fusion mechanism, realizing accurate association and deep fusion of quality data and management and control element data, and mining dominant and recessive association relation; step 4, fusing data visual display and intelligent application, constructing a visual system of cross-dimension linkage, intelligent recommendation and scene display based on a fused data set, and adding an intelligent application module at the same time; and 5, a system adaptation and expansion mechanism realizes multi-class agricultural product adaptation and multi-scene expansion by standardizing a 
cross-agricultural product type adaptation flow and expansion interface design.
- 2. The method for associating and fusing quality and control element data thereof according to claim 1, wherein the implementation process of step 1 comprises the following steps: step 1.1, finely classifying data sources; The quality data comprises physicochemical detection data and sensory evaluation data, wherein the physicochemical detection data cover high-precision detection data of a remote laboratory and field data of rapid detection equipment, and two types of labels of the precise detection data and the reference detection data are distinguished, and the sensory evaluation data adopts a standardized scoring system and scores 1-10 points; The management and control element data comprises planting environment parameters, cultivation management data and storage transportation data, wherein the planting environment parameters subdivide real-time dynamic parameters and cycle statistics parameters, the cultivation management data are used for quantitatively recording fertilization quantity and distinguishing nitrogen, phosphorus and potassium and organic fertilizer proportion, and accurate irrigation time to hour, irrigation quantity and pest control measures, and the storage transportation data comprise storage link record real-time temperature and humidity, ventilation frequency and preservative use condition; step 1.2, a multi-source data cooperative access and synchronization mechanism; Access channel optimization: The planting base end carries an edge computing module through an edge gateway, carries out local preprocessing on high-frequency environment data acquired by a sensor, adopts an MQTT protocol to upload with low delay after eliminating obvious abnormal values, has a transmission delay less than or equal to 500ms, supports 5G/4G/Wi-Fi multi-network redundancy switching, and avoids interruption of data transmission; the laboratory end realizes the batch import of quality detection data through RESTfulAPI interfaces of the LIMS laboratory 
information management system, supports data verification, generates detailed error logs when the import fails and supports breakpoint continuous transmission; the warehousing and transportation terminal collects data in real time through an Internet of things terminal, uploads the data by adopting an NB-IoT low-power consumption protocol, supports offline caching for long-distance transportation scenes, and automatically supplements and transmits the data after network recovery, wherein the caching capacity of the data is more than or equal to 10 ten thousand; Timestamp synchronization calibration: the dual mechanism of GPS time service and local clock calibration is adopted, and the time stamp formats of all data are unified; Aiming at the problem of different-place data transmission delay, a delay compensation model is established, namely, according to the real-time monitoring value of the data transmission path length and the network bandwidth, the time stamp is automatically corrected for the data with the transmission delay exceeding 100ms, and the consistency of the time dimension of the planting, detecting and warehousing data of the same batch of agricultural products is ensured.
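The delay compensation described in claim 2 can be illustrated with a minimal Python sketch. The claim names only the inputs (transmission path length and monitored network bandwidth) and the 100 ms trigger; the delay formula below (propagation time plus serialization time), the `Reading` type, its field names, and the signal speed constant are all assumptions made for illustration, not the patent's model.

```python
from dataclasses import dataclass

DELAY_THRESHOLD_MS = 100.0       # from the claim: correct only when delay exceeds 100 ms
SIGNAL_SPEED_KM_PER_MS = 200.0   # assumption: roughly 2/3 of light speed in optical fibre


@dataclass
class Reading:
    """Hypothetical record for one uploaded data point."""
    received_at_ms: float   # arrival time on the unified clock
    path_length_km: float   # monitored transmission path length
    payload_bits: float     # size of the transmitted message
    bandwidth_bps: float    # monitored network bandwidth


def estimated_delay_ms(r: Reading) -> float:
    """Assumed delay model: propagation time plus serialization time."""
    propagation = r.path_length_km / SIGNAL_SPEED_KM_PER_MS
    serialization = r.payload_bits / r.bandwidth_bps * 1000.0
    return propagation + serialization


def corrected_timestamp_ms(r: Reading) -> float:
    """Shift the timestamp back by the estimated delay when it exceeds the threshold."""
    delay = estimated_delay_ms(r)
    if delay > DELAY_THRESHOLD_MS:
        return r.received_at_ms - delay
    return r.received_at_ms
```

Readings whose estimated delay stays at or below 100 ms keep their original timestamp, so only slow links are adjusted.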
- 3. The method for associating and fusing quality and control element data thereof according to claim 2, wherein the implementation process of step 1 further comprises the steps of: step 1.3, self-adaptive pretreatment strategy; And (3) intelligent cleaning: outlier rejection, namely adopting a 3 sigma principle to the continuous environmental parameters, rejecting data outside the mean value plus or minus 3 standard deviation, adopting a box diagram method to the discrete management data, rejecting data outside 1.5 times of the upper and lower quartile ranges, adopting a deviation threshold method to the detection data, and setting an allowable deviation range based on the precision of the detection equipment; Filling the missing value, namely, continuously and stably changing environment data adopts a linear interpolation method, detection data with larger fluctuation adopts a K neighbor filling method, K=5, discrete management data adopts a mode filling method of the same batch and the same growth stage, and filling marks and confidence degrees are marked after filling; dynamic normalization: Adopting an improved min-max standardization algorithm, and introducing a data distribution self-adaptive factor, namely adopting a traditional min-max formula for data conforming to normal distribution, adopting logarithmic transformation for the data of the deviation distribution, and then standardizing the data, so that the standardized data distribution is more uniform, and extreme value influence is avoided; Aiming at dimensionless data, Z-score standardization is adopted, so that cross-index comparison is facilitated; Unified coding system: The four-dimensional coding rule of product coding, batch coding, collection node and collection time is adopted, wherein the product coding is expanded by GB/T7635.1-2022 standard and comprises 6-bit product codes, 4-bit production place codes and 3-bit product codes, the batch coding comprises 4-bit planting year and 2-bit 
planting batch, and the collection node coding distinguishes between a planting base 01, a laboratory 02, a warehouse center 03 and a transportation link 04, so that each piece of data is guaranteed to be unique and traceable.
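The cleaning and normalization rules of claim 3 can be sketched in Python using only the standard library. The function names are hypothetical. Using `log1p` as the logarithmic transformation (which requires values greater than -1) and passing skewness as a flag rather than detecting it are assumptions; the claim does not specify how the distribution is tested.

```python
import math
import statistics


def reject_3sigma(values):
    """Keep values within mean +/- 3 standard deviations (continuous environment data)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= 3 * sd]


def reject_iqr(values):
    """Keep values within 1.5 x IQR of the quartiles (box-plot rule for discrete data)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]


def min_max(values, skewed=False):
    """Min-max scaling; skewed data is log-transformed first (assumed: log1p)."""
    if skewed:
        values = [math.log1p(v) for v in values]
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

Note that the 3 sigma rule only removes a point when the sample is large enough for a z-score above 3 to be possible, which is one reason the claim reserves it for high-frequency environmental parameters.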
- 4. The method for associating and fusing quality and control element data thereof according to claim 1, wherein the implementation process of the step 2 comprises the following steps: step 2.1, optimizing design of a layered architecture; Operation data storage layer: The method has the advantages that the data type and time two-dimensional partition storage is adopted, the time partition is divided according to the day, the type partition is divided according to the quality data, the environmental parameters, the management data and the storage transportation data, the rapid deletion of the expiration data according to the partition is supported, and the automatic archiving of the cold data for more than 5 years is realized; The new data check log storage area is added, the access time, preprocessing operation and abnormal data detail of the original data are recorded, the data tracing and reprocessing are supported, and the log storage period is consistent with the corresponding original data; data warehouse layer: the star model design is characterized in that a core dimension table is expanded into four types of agricultural product core dimension tables, growth stage dimension tables, space dimension tables and equipment dimension tables: The agricultural product core dimension table comprises product_id, product name, production place, variety, batch number, planting start date, harvest date and producer information; A growth stage dimension table, namely dividing subdivision stages according to agricultural product types, and recording time ranges and key quality influence factors of each stage; The space dimension table comprises a planting base land block number, longitude and latitude coordinates, altitude, soil type and administrative division codes, and supports space inquiry; the equipment dimension table is used for recording the serial numbers, the model numbers, the calibration time and the precision parameters of the sensor and the detection 
equipment and is used for tracing the data quality; the fact table is subdivided according to service scenes, including a quality detection fact table, an environment monitoring fact table, a cultivation management fact table and a storage transportation fact table, and multidimensional association with the dimension table is realized through product_id, stage coding, space coding and time stamping, so that joint query across the fact tables is supported; data adaptation layer: Storing characteristic parameter libraries of different agricultural products, including growth periods, key quality indexes, core environment parameters and association rule weights of various agricultural products, adding agricultural product type codes, and adapting to new agricultural product types by calling the layer of parameters without modifying warehouse architecture; Data mart layer: constructing a proprietary data mart according to user roles and application scenes, comprising: aggregating the associated results of the cultivation management data, the environmental parameters and the quality detection data, and providing management and control strategy optimization suggestion data; the supervision department bazaar focuses on the related data of pesticide residue and heavy metal safety index data and planting and storage links and supports traceability checking; integrating the production area advantage data, the quality scoring data and the sensory evaluation data to be used for product propaganda; And (3) maintaining original fusion data in a scientific research analysis bazaar, and supporting custom dimension analysis and association rule mining.
- 5. The method for associating and fusing quality and control element data thereof according to claim 4, wherein the implementation process of step 2 further comprises the steps of: Step 2.2, multidimensional indexing and storage optimization; The core index is that a product_id+a growth stage+a spatial coding joint B+ tree index is added on the basis of the name, the production place and the batch number of agricultural products, so that accurate query according to specific products+specific growth stages+specific plots is supported; Space-time index, namely constructing an R tree space index aiming at a space dimension table, and supporting inquiry based on longitude and latitude ranges; Full text indexing, namely constructing inverted indexes for text fields of agricultural product names, varieties, producing area names and pest control measures, and supporting fuzzy query; Storage elastic expansion: Adopting a distributed storage architecture, supporting node dynamic capacity expansion, automatically triggering a capacity expansion mechanism when the data volume is increased by more than 80% of the current storage capacity, adding storage nodes newly and balancing data distribution, and avoiding shutdown maintenance; the hot and cold data are stored separately, namely, hot data of high-frequency query data in the last 1 year are stored in an SSD solid state hard disk, temperature data in the last 1 to 5 years are stored in an SAS hard disk, cold data in the last 5 years are stored in a low-cost SATA hard disk, and the storage cost is reduced while the hot data query efficiency is ensured; step 2.3, a data security management and control mechanism; classifying access rights, and distributing different rights according to user roles; Performing desensitization processing on privacy information of a producer and business confidential data by adopting a role-based access control RBAC model and combining a data desensitization technology; and the data backup and 
recovery adopts a daily incremental backup, a weekly full backup and a remote disaster recovery backup strategy, the incremental backup data is stored locally, the full backup and the disaster recovery backup are stored in a remote computer room, and the backup data is transmitted and stored in an encrypted mode.
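The hot/warm/cold placement and the capacity trigger in claim 5 reduce to two small decisions, sketched below in Python. The function names are hypothetical, the age is approximated with 365.25-day years, and the expansion rule is read here as "usage exceeds 80% of current capacity", which is one interpretation of the claim wording.

```python
from datetime import date


def storage_tier(record_date: date, today: date) -> str:
    """Map a record's age to the storage medium named in the claim."""
    age_years = (today - record_date).days / 365.25
    if age_years <= 1:
        return "SSD"   # hot data: last 1 year
    if age_years <= 5:
        return "SAS"   # warm data: 1 to 5 years
    return "SATA"      # cold data: older than 5 years


def needs_expansion(used_bytes: float, capacity_bytes: float) -> bool:
    """Assumed trigger: expand when usage exceeds 80% of current capacity."""
    return used_bytes / capacity_bytes > 0.80
```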
- 6. The method for associating and fusing quality and control element data thereof according to claim 1, wherein the implementation process of step 3 comprises the following steps: step 3.1, extracting and optimizing multidimensional features; step 3.2, constructing a three-layer association model of basic association, intelligent association and characteristic constraint association; and step 3.3, a dynamic weight fusion mechanism.
- 7. The method for associating and fusing quality and control element data thereof according to claim 6, wherein the implementation process of step 3.1 comprises the following steps: Text feature extraction: optimizing the TF-IDF algorithm by introducing an agricultural product domain lexicon, including the dedicated terms, variety names, place-of-origin names and quality index terms of various agricultural products, so as to raise the extraction weight of core features; for unstructured text data on cultivation management measures and pest control measures, adopting BERT-based semantic segmentation and feature extraction to extract key actions and convert them into structured features; Numerical feature extraction: adopting an improved PCA algorithm that introduces an agricultural product characteristic weight factor, giving higher weight to key environment parameters affecting quality before dimension reduction, so that the core features retained after dimension reduction, with a contribution rate of at least 90%, are quality-related parameters; and, for quality index data, adopting a feature engineering method to construct derived features that enrich the feature dimensions for association analysis.
- 8. The method for associating and fusing quality and control element data thereof according to claim 6, wherein the implementation process of the step 3.2 comprises the following steps: step 3.2.1, basic association, explicit label precise matching, and establishing one-to-one and one-to-many precise mapping based on a four-dimensional coding rule and a multi-dimensional dimension table: The one-to-one association is that the quality data of the same product_id, batch number and acquisition time are directly associated with the management and control element data; one-to-many association, namely associating the growth stage data of the same batch of agricultural products with a plurality of environmental parameters and management measure data of corresponding stages; Performing consistency verification on the matching result, automatically triggering an alarm and prompting manual verification if the same batch of data has the problem of inconsistent coding and unmatched time stamps, and ensuring 100% of basic association accuracy; And 3.2.2, intelligent association, namely improving an Apriori algorithm to mine the hidden association, and realizing accurate mining of the hidden association between the environmental parameters, the cultivation measures and the quality indexes by embedding the growth cycle characteristics of agricultural products, dynamically adjusting key parameters and optimizing rule screening logic.
- 9. The method for associating and fusing quality and control element data thereof according to claim 8, wherein the implementation process of the step 3.2.2 comprises the following steps: step 3.2.2.1, data preprocessing and basic parameter determination, including data subset splitting, growth cycle weight assignment, dynamic support threshold calculation and growth stage characteristic constraint rule base construction; splitting the data subset, namely splitting the data set of the processed data according to the growth stage of the agricultural product to form a stage exclusive transaction set, wherein the splitting standard is as follows: Dividing the growth stages, namely dividing the whole life cycle into subdivision stages according to the biological characteristics of the target agricultural products, wherein the time range of each stage is determined by the planting start date, the harvesting date and the growth cycle parameters in the agricultural product core dimension table; The transaction set construction comprises the steps that a single plant or a single block x time slice is taken as a minimum transaction unit in each stage, each transaction comprises core control element characteristics and corresponding quality characteristics of the stage, and the core control element characteristics, namely parameters with the contribution rate being more than or equal to 90% and reserved after PCA dimension reduction, ensure the correlation logic of the transaction focus stage-factor-quality; And the growth cycle weight assignment is carried out, the influence weight of each growth stage on the target quality index is verified and determined by adopting an analytic hierarchy process and an agricultural expert, and a basis is provided for the subsequent dynamic parameter adjustment: inviting 5-8 agricultural field experts, and scoring the importance of the influence of each growth stage by 1-10 according to the target quality index, wherein 1 score = little 
influence and 10 scores = great influence; calculating a weight matrix by the analytic hierarchy process, verifying the rationality of the weights through a consistency test, and outputting the weight values $W_1, W_2, \ldots, W_n$, where $n$ is the number of stages and $\sum_{i=1}^{n} W_i = 1$; Dynamic support threshold calculation: the threshold is dynamically adjusted through the basic support, the stage weight and the data density coefficient; Determining the basic support $S_0$: counting the minimum support of the valid association item sets verified by experts in the historical data of the target agricultural product and taking their average as the basic threshold, with a value range of 0.2-0.4, taking 0.3-0.4 when the data are dense and 0.2-0.3 when the data are sparse; Calculating the data density coefficient $K'_i$, which corrects for the imbalance of data volume between stages: $K_i = N_i / N_{\text{total}}$, where $N_i$ is the number of transactions in stage $i$ and $N_{\text{total}}$ is the total number of transactions over the whole period, and $K'_i = 0.8 + (K_i - K_{\min}) \times 0.4 / (K_{\max} - K_{\min})$, which maps the coefficient to the interval $[0.8, 1.2]$ to avoid extreme values; The dynamic support threshold is $S_i = S_0 \times W_i \times K'_i$ with $S_i \in [0.1, 0.5]$, since redundant rules are easily generated when $S_i$ is below 0.1 and key associations are easily missed when $S_i$ is above 0.5; Growth stage characteristic constraint rule base construction: a time-causal-threshold constraint rule base is constructed to pre-filter the item sets in the transaction set, and only item sets conforming to the constraints enter subsequent mining: Time constraint: a control element of a given stage is associated only with the quality indexes of that stage and subsequent stages; Causal constraint: excluding parameter combinations without a causal relation; Threshold constraint: setting a parameter threshold
based on a crop suitability interval; Step 3.2.2.2, mining frequent item sets of each stage, substituting a dynamic support threshold S i of each growth stage into a transaction set of each growth stage, and executing frequent item set generation logic of an Apriori algorithm: Generating frequent 1-item sets, namely scanning a transaction set, counting the support degree of single control elements and quality indexes, and screening item sets with the support degree more than or equal to S i ; generating frequent k-item sets, wherein k is more than or equal to 2, namely performing connection operation on the frequent (k-1) -item sets to generate candidate k-item sets, then scanning the transaction set to calculate the support degree of the candidate k-item sets, and screening item sets with the support degree more than or equal to Si; termination condition, stopping the mining at the stage when a new frequent k-item set cannot be generated; step 3.2.2.3, generating association rules, namely generating candidate association rules from frequent item sets in each stage: Splitting the front item and the rear item for each frequent k-item set to ensure that the front item is a control element and the rear item is a quality index; calculating the confidence coefficient and the lifting coefficient of each candidate rule, wherein the confidence coefficient = supporting coefficient/antecedent supporting coefficient reflects the reliability of the rule, the lifting coefficient = confidence coefficient/postitem supporting coefficient reflects the relevance of the rule, and screening the rules with the confidence coefficient more than or equal to 0.75 and the lifting coefficient more than or equal to 1.2, and entering a rule base; Step 3.2.2.4, rule screening and integration: Pruning optimization is carried out, namely pruning is carried out by introducing a correlation strength screening factor, the correlation strength=confidence coefficient×lifting degree, only rules with 
the correlation strength more than or equal to 1.5 are reserved, and low-strength redundancy rules are removed; Characteristic constraint verification, namely performing constraint verification on the screened rule: Summarizing the rules passing through the verification at each stage to form a recessive association rule base of the target agricultural product, wherein the rule format is unified as [ growth stage ] + [ management and control element combination ] → [ quality index result ] (confidence/promotion degree); step 3.2.2.5, rule iteration optimization, and dynamic iteration is carried out on a rule base by combining a decision tree algorithm, so that rule accuracy is improved: taking the association rule base as an initial training set, taking the control element combination as a characteristic and the quality index result as a label, and training a decision tree model; Testing the model by using the agricultural product full-chain data, and calculating the prediction accuracy of the rule; The method comprises the steps of calculating the support and confidence coefficient of a rule with the prediction accuracy rate of less than 80 percent again, and rejecting if the rule still does not meet the screening standard; and (3) the iteration period is that 1000 pieces of full period data are added every time or 1 iteration is carried out every month, so that the rule base adaptation data change and agricultural product growth characteristic fluctuation are ensured.
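The threshold and screening arithmetic of claim 9 can be sketched in Python as follows. This is not a full Apriori implementation: candidate generation is omitted and only the dynamic support threshold and the rule metrics are shown. The function names are hypothetical, clamping the threshold into [0.1, 0.5] is one reading of the stated range, and returning 1.0 when all stages have equal density is an assumption for the degenerate case.

```python
def density_coefficients(stage_counts):
    """Map each stage's share of transactions K_i onto K'_i in [0.8, 1.2]."""
    total = sum(stage_counts.values())
    k = {stage: n / total for stage, n in stage_counts.items()}
    k_min, k_max = min(k.values()), max(k.values())
    if k_max == k_min:
        return {stage: 1.0 for stage in k}  # assumption: neutral value when all equal
    return {
        stage: 0.8 + (value - k_min) * 0.4 / (k_max - k_min)
        for stage, value in k.items()
    }


def dynamic_support(s0, stage_weight, k_prime):
    """S_i = S_0 x W_i x K'_i, clamped to [0.1, 0.5] (clamping is an assumption)."""
    return min(0.5, max(0.1, s0 * stage_weight * k_prime))


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, b = frozenset(antecedent), frozenset(consequent)
    sup_a = sum(a <= t for t in transactions) / n
    sup_b = sum(b <= t for t in transactions) / n
    sup_ab = sum((a | b) <= t for t in transactions) / n
    confidence = sup_ab / sup_a if sup_a else 0.0
    lift = confidence / sup_b if sup_b else 0.0
    return sup_ab, confidence, lift


def keep_rule(confidence, lift):
    """Screening from the claim: confidence >= 0.75, lift >= 1.2, strength >= 1.5."""
    return confidence >= 0.75 and lift >= 1.2 and confidence * lift >= 1.5
```

Because association strength is the product of confidence and lift, a rule that only just meets both individual cut-offs (0.75 x 1.2 = 0.9) is still pruned by the 1.5 strength filter.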
- 10. The method for associating and fusing quality and control element data thereof according to claim 6, wherein the implementation process of step 3.3 comprises the following steps: Dynamic weight adjustment: building a data quality-credibility mapping model and calculating the credibility weight of each data source in real time: Laboratory detection data: the weight is calculated from the reliability of the calibration state of the detection equipment and the reliability of the number of detections, with a weight range of 0.4-0.8, wherein the calibration state reliability is 0.9 when the last calibration was at most 3 months ago, 0.7 for 3-6 months and 0.4 for more than 6 months, and the detection count reliability is 0.9 when the same index has been detected 3 or more times and 0.7 for 1-2 times; Sensor data: the weight is calculated from the sensor error rate and data continuity, with a weight range of 0.1-0.4, wherein an error rate of at most 2% gives a weight of 0.4, 2-5% gives 0.3, 5-10% gives 0.2, and more than 10% gives 0.1; Manually recorded data: the weight is calculated from the qualification of the recorder and the consistency of cross-verification, with a weight range of 0.1-0.2, wherein a professional technician has a weight of 0.2, a common grower has a weight of 0.1, and records cross-verified by 3 or more people have a weight of 0.2, in contrast to records verified by 1-2 people; Real-time weight updating: the weights are updated once every 24 hours according to the data quality statistics, ensuring that the fusion result is always based on the current highest-quality data; Data conflict resolution: establishing a conflict grading and priority processing mechanism: a first-level conflict is a key index conflict, in which the high-reliability data is used as the
standard, and if the reliability difference is less than or equal to 0.1, the weighted average is adopted to combine with the association rule to predict and correct; the secondary conflict is non-key index conflict, wherein a weighted average method is adopted for fusion, and the weight is calculated according to the current credibility; recording conflict log, namely recording the processing process, basis and result of all conflict data in the log, and supporting tracing and rechecking; Generating a fusion data set, wherein the fused data comprises basic information, associated labels and confidence: The basic information is structured quality data and control element data, the association label marks the association type, the confidence coefficient marks the reliability degree of the association result, and the data is supported to be screened according to the association confidence coefficient so as to meet the requirements of different scenes.
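The weighting and conflict handling of claim 10 can be illustrated with a short Python sketch. The function names are hypothetical. `sensor_weight` uses only the error-rate tiers from the claim and ignores data continuity, and for a first-level conflict with near-equal credibility the sketch falls back to a plain weighted average, leaving out the association-rule correction that the claim also mentions.

```python
def sensor_weight(error_rate):
    """Error-rate tiers from the claim (data continuity is ignored in this sketch)."""
    if error_rate <= 0.02:
        return 0.4
    if error_rate <= 0.05:
        return 0.3
    if error_rate <= 0.10:
        return 0.2
    return 0.1


def weighted_average(readings):
    """readings: list of (value, credibility) pairs."""
    total = sum(weight for _, weight in readings)
    return sum(value * weight for value, weight in readings) / total


def resolve_conflict(readings, key_index):
    """Second-level conflicts are averaged; first-level conflicts trust the best source."""
    if not key_index:
        return weighted_average(readings)
    ranked = sorted(readings, key=lambda r: r[1], reverse=True)
    # Small epsilon guards against float noise such as 0.8 - 0.7 > 0.1.
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] <= 0.1 + 1e-9:
        # Assumption: average all readings; rule-based correction is not modelled.
        return weighted_average(readings)
    return ranked[0][0]
```

In practice each resolution would also be written to the conflict log described in the claim so that the basis and result can be traced and rechecked.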
Description
Quality and management and control element data association fusion method thereof
Technical Field
The invention relates to a method for associating and fusing quality data with its management and control element data, and belongs to the technical field of agricultural product data processing and big data fusion.
Background
With the rapid development of smart agriculture, agricultural product quality control has entered a stage of data-driven fine management, and quality data and control element data have become the core support for quality tracing, control strategy optimization and marketing. In the prior art, agricultural product data acquisition systems have become mature: planting environment parameters such as soil humidity, illumination intensity and air temperature can be acquired in real time through a sensor network deployed at the planting base; physicochemical detection data such as sugar content, acidity and pesticide residue content can be accurately acquired with the high-precision detection equipment of a remote laboratory; on-site data can be acquired with portable rapid detection equipment; and control element data such as cultivation management measures and storage and transportation conditions can be acquired through manual recording or Internet of Things terminals. However, current agricultural product data management and application still faces four major core technical bottlenecks, which severely restrict the full use of the data's value:
First, data are stored in a scattered manner, and full-chain data are severely fragmented.
The planting environment data of agricultural products are stored across multiple planting base terminals, laboratory detection data are archived separately in dedicated management systems, and warehousing and transportation data are scattered across logistics enterprise platforms. A unified data aggregation carrier and a collaborative access mechanism are lacking, so the whole life cycle data of the same agricultural product, from planting to sale, are split apart. Calling data across links requires frequent switching between systems, the operation is cumbersome, data transmission risks interruption and loss, and centralized management and efficient retrieval of full-chain data are difficult to achieve.
Secondly, the data association mode is inefficient, and association accuracy and depth are insufficient. The prior art matches data on simple labels such as product name and batch, and does not design intelligent association rules that take the growth characteristics of agricultural products into account. Mismatches and missed matches occur easily, and hidden associations between combinations of control elements and quality indexes, such as the synergistic effect of illumination, humidity and fertilization amount on fruit quality, are difficult to mine, which limits the reliability and application value of the association results.
In terms of association rule mining, the prior art mostly adopts an unoptimized basic Apriori algorithm with a fixed support threshold and does not take into account the growth cycle characteristics of agricultural products (such as the differing influence of each stage on quality). As a result, implicit associations in key growth stages (such as the causal relation between combinations of environmental factors and quality indexes) are filtered out, while redundant rules for non-key stages are generated in large numbers; growth characteristic constraints (such as time-causal constraints) are also lacking, so mismatched associations without practical significance occur easily. In terms of feature extraction, the prior art adopts a general TF-IDF algorithm and standard PCA dimension reduction without introducing a lexicon or characteristic weight factors specific to the agricultural product field, so core features receive insufficient extraction weight and core parameters strongly related to quality are easily lost after dimension reduction. In terms of data fusion, the prior art mostly adopts fixed weight distribution, for example weighting laboratory data, sensor data and manually recorded data in fixed proportions, and does not consider dynamic changes in data quality, such as the calibration state of detection equipment or the continuity of sensor data, so the fusion results are disturbed by low-quality data and lack accuracy.
Thirdly, the display and application forms after data fusion are limited, and the threshold for use is high. Meanwhile, data application is mostly limited to simple traceability queries, a closed loop of data-decision-quality improvement is not formed, and accurate management and control strategy suggestions cannot be provided for growe