CN-121071761-B - Method, equipment and storage medium for detecting data quality of carbon footprint

Abstract

The application discloses a method, equipment and a storage medium for detecting the data quality of a carbon footprint, and belongs to the technical field of data quality detection. The method comprises: obtaining data to be detected, and retrieving, from a knowledge database, at least one piece of domain knowledge information associated with the data to be detected; calling a data quality discrimination model to identify data feature information of the data to be detected, and determining data quality information of the data to be detected according to the data feature information; combining the data to be detected, the domain knowledge information and the data quality information to generate enhanced prompt information; and inputting the enhanced prompt information into a large language model to obtain a quality detection result of the data to be detected, generated by the large language model based on the enhanced prompt information. Through a collaborative detection mechanism that integrates multi-source information with an artificial intelligence discrimination model, the accuracy and reliability of quality control for multimodal, heterogeneous carbon footprint data are significantly improved.

Inventors

FENG WEI
LIU JIE
LIU LUJING
XUE DENGGAO
LEI HANG

Assignees

深圳先进技术研究院

Dates

Publication Date: 2026-05-12
Application Date: 2025-11-10

Claims (8)

1. A method for detecting the data quality of a carbon footprint, characterized by comprising the following steps: acquiring data to be detected, and retrieving, from a knowledge database, at least one piece of domain knowledge information associated with the data to be detected; performing feature extraction and dimensionality reduction on the data to be detected through an autoencoder to obtain dimension-reduced feature information, wherein a deep neural network is constructed using a stacked autoencoder, high-dimensional input data is compressed into a low-dimensional feature representation by the stacked autoencoder, and the original data is then reconstructed by a decoder to form the dimension-reduced feature information; identifying temporal features or spatial features in the dimension-reduced feature information based on a long short-term memory network and a convolutional neural network, wherein, in a time-series analysis stage, a time-dependency model is established using a gated recurrent unit network, and the pattern of continuous change in the process data within the dimension-reduced feature information is analyzed; performing outlier detection on the data to be detected, the temporal features and the spatial features through an isolation forest algorithm, and identifying abnormal data in the data to be detected; generating data quality information based on the temporal features, the spatial features and the abnormal data; combining the data to be detected, the domain knowledge information and the data quality information to generate enhanced prompt information, which comprises: constructing a data quality detection logic tree based on a data quality detection scenario, wherein the logic tree comprises a plurality of detection nodes and reasoning paths, traversing the logic tree by breadth-first search to form a chain of thought, generating a prompt sentence based on the chain of thought, and combining the data to be detected, the domain knowledge information and the data quality information with the prompt sentence to construct the enhanced prompt information; and inputting the enhanced prompt information into a large language model to obtain a quality detection result of the data to be detected, which is generated by the large language model based on the enhanced prompt information.
2. The method for detecting the data quality of a carbon footprint according to claim 1, wherein before the step of obtaining the dimension-reduced feature information, the method further comprises: determining the data type of the data to be detected, and identifying text-type data and numerical data in the data to be detected; based on the data type, performing type correction on the text-type data in the data to be detected, and converting the text-type data into numerical data; and performing statistical analysis on the numerical data to identify abnormal data.
3. The method for detecting the data quality of a carbon footprint according to claim 1, wherein the step of acquiring the data to be detected and retrieving at least one piece of domain knowledge information associated with the data to be detected in a knowledge database further comprises: acquiring domain knowledge data on the carbon footprint of energy storage batteries, and segmenting the domain knowledge data according to detection requirements through a pre-trained language processing model to form knowledge segments; performing feature encoding on the knowledge segments to generate feature vectors of fixed dimension; and storing the feature vectors in a vector database to form the knowledge database.
4. The method for detecting the data quality of a carbon footprint according to claim 1, wherein before the step of inputting the enhanced prompt information into a large language model to obtain the quality detection result of the data to be detected generated by the large language model based on the enhanced prompt information, the method further comprises: constructing a training data set and a validation data set based on historical data quality detection information; training a model to be trained on the training data set, and verifying the performance of the trained model on the validation data set to obtain performance indexes; and adjusting the model parameters of the model to be trained according to the performance indexes to obtain the large language model.
5. The method for detecting the data quality of a carbon footprint according to claim 1, wherein after the step of inputting the enhanced prompt information into a large language model to obtain the quality detection result of the data to be detected generated by the large language model based on the enhanced prompt information, the method further comprises: acquiring model optimization samples based on system misjudgment cases and labeled data; performing active learning on the model optimization samples through an uncertainty sampling method, and identifying target samples among the model optimization samples; and, based on the target samples, updating the model parameters and knowledge base weights of the large language model through an exponentially weighted moving average method.
6. The method for detecting the data quality of a carbon footprint according to claim 1, wherein the step of acquiring the data to be detected and retrieving at least one piece of domain knowledge information associated with the data to be detected in a knowledge database comprises: encoding the data to be detected into a query vector through a language processing model; retrieving, from a vector database, at least one knowledge segment matching the query vector using an approximate nearest neighbor search algorithm; and taking the retrieved knowledge segments as the domain knowledge information.
7. A data quality detection device for a carbon footprint, characterized in that the device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the data quality detection method for a carbon footprint as claimed in any one of claims 1 to 6.
8. A storage medium, characterized in that the storage medium is a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the data quality detection method of a carbon footprint as claimed in any one of claims 1 to 6.
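
The prompt-construction step recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `DetectionNode` class, the function names, the example questions and the prompt wording are all hypothetical, chosen only to show how a breadth-first traversal of a logic tree could yield an ordered chain of thought that is then combined with the data to be detected, the domain knowledge information and the data quality information.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DetectionNode:
    """One detection node of a (hypothetical) data quality detection logic tree."""
    question: str
    children: list = field(default_factory=list)


def breadth_first_chain(root):
    """Visit detection nodes level by level and return their questions in visit order."""
    chain = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        chain.append(node.question)
        queue.extend(node.children)  # children are visited after the whole current level
    return chain


def build_enhanced_prompt(data, knowledge, quality_info, root):
    """Combine the chain of thought with the three information sources into one prompt."""
    steps = breadth_first_chain(root)
    reasoning = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    knowledge_text = "\n".join(f"- {item}" for item in knowledge)
    return (
        "Reason through the following checks in order:\n" + reasoning + "\n\n"
        "Domain knowledge:\n" + knowledge_text + "\n\n"
        f"Data quality information: {quality_info}\n\n"
        f"Data to be detected: {data}"
    )


# Illustrative tree only; the questions are assumptions, not taken from the patent.
example_tree = DetectionNode(
    "Is the record complete?",
    [
        DetectionNode(
            "Are the units consistent?",
            [DetectionNode("Is the value within the expected range?")],
        ),
        DetectionNode("Does the text description match the numbers?"),
    ],
)

print(build_enhanced_prompt(
    data="cell production energy: 55 kWh per kWh of capacity",
    knowledge=["(retrieved knowledge segment would go here)"],
    quality_info="no outliers flagged",
    root=example_tree,
))
```

Breadth-first order places the broad, top-level checks before the more specific ones beneath them, which is one plausible reason for preferring it over depth-first traversal when the visit order becomes the reasoning order of the prompt.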

Description

Method, equipment and storage medium for detecting data quality of carbon footprint

Technical Field

The present application relates to the technical field of data quality detection, and in particular to a method, equipment and a storage medium for detecting the data quality of a carbon footprint.

Background

In the field of carbon footprint data, full-life-cycle carbon footprint accounting for energy storage batteries mainly relies on methods such as rule-base checking, statistical outlier detection, or supervised machine learning models, which identify anomalies in numerical data through predefined rules, statistical thresholds, or trained classifiers. However, when detecting data quality and identifying abnormal data, the related art is generally designed for structured numerical data, and has difficulty effectively processing and verifying the semantic logic and consistency of unstructured data such as text reports and process descriptions. As a result, the related data quality control methods are insufficiently accurate and reliable for carbon footprint data from diverse sources and in heterogeneous formats, particularly for data sets containing large amounts of unstructured text.

The foregoing is provided merely to facilitate understanding of the technical solutions of the present application and is not an admission that the foregoing is prior art.

Disclosure of Invention

The main object of the application is to provide a method, equipment and a storage medium for detecting the data quality of a carbon footprint, which aim to solve the technical problem that quality detection of battery life-cycle data is insufficiently reliable.
In order to achieve the above object, the present application provides a method for detecting the data quality of a carbon footprint, the method comprising the steps of: acquiring data to be detected, and retrieving, from a knowledge database, at least one piece of domain knowledge information associated with the data to be detected; calling a data quality discrimination model, identifying data feature information of the data to be detected, and determining data quality information of the data to be detected according to the data feature information; combining the data to be detected, the domain knowledge information and the data quality information to generate enhanced prompt information; and inputting the enhanced prompt information into a large language model to obtain a quality detection result of the data to be detected, which is generated by the large language model based on the enhanced prompt information.
In an embodiment, the step of calling the data quality discrimination model, identifying the data feature information of the data to be detected, and determining the data quality information of the data to be detected according to the data feature information includes: performing feature extraction and dimensionality reduction on the data to be detected through an autoencoder to obtain dimension-reduced feature information; identifying temporal features or spatial features in the dimension-reduced feature information based on a long short-term memory network and/or a convolutional neural network; performing outlier detection on the data to be detected and/or the temporal features and/or the spatial features through an isolation forest algorithm, and identifying abnormal data in the data to be detected; and generating the data quality information based on the temporal features, the spatial features, and/or the abnormal data.
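
The outlier detection step of this embodiment can be sketched as follows. This is a minimal, standard-library-only illustration of the isolation forest idea under stated assumptions, not the application's implementation: the function names, the parameter defaults and the example rows are hypothetical, and the sketch scores only raw numeric rows, whereas the embodiment may also feed in the temporal and spatial features. In practice a library implementation such as scikit-learn's `IsolationForest` would normally be used.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(sample, rng, depth, max_depth):
    """Recursively partition the sample with random axis-aligned splits."""
    if depth >= max_depth or len(sample) <= 1:
        return {"size": len(sample)}
    feature = rng.randrange(len(sample[0]))
    values = [row[feature] for row in sample]
    low, high = min(values), max(values)
    if low == high:  # nothing left to split on this feature
        return {"size": len(sample)}
    split = rng.uniform(low, high)
    left = [row for row in sample if row[feature] < split]
    right = [row for row in sample if row[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(point, node, depth=0):
    """Number of splits needed to reach the leaf that contains the point."""
    if "size" in node:
        return depth + c_factor(node["size"])
    branch = "left" if point[node["feature"]] < node["split"] else "right"
    return path_length(point, node[branch], depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Return one score in (0, 1) per row; higher means easier to isolate."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), rng, 0, max_depth)
             for _ in range(n_trees)]
    norm = c_factor(size)
    scores = []
    for row in rows:
        mean_depth = sum(path_length(row, tree) for tree in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# Hypothetical (energy use, emissions) rows; the last one is deliberately extreme.
example_rows = [(10.0 + 0.1 * i, 5.0 + 0.05 * i) for i in range(20)] + [(100.0, 50.0)]
example_scores = anomaly_scores(example_rows)
print(example_scores.index(max(example_scores)))  # index of the most anomalous row
```

Points that sit far from the bulk of the data are separated by very few random splits, so their average path length is short and their score is high; the rows flagged this way would correspond to the abnormal data that the embodiment folds into the data quality information.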
In an embodiment, before the step of performing feature extraction and dimensionality reduction on the data to be detected through the autoencoder to obtain the dimension-reduced feature information, the method further includes: determining the data type of the data to be detected, and identifying text-type data and numerical data in the data to be detected; based on the data type, performing type correction on the text-type data in the data to be detected, and converting the text-type data into numerical data; and performing statistical analysis on the numerical data to identify abnormal data.
In an embodiment, before the step of acquiring the data to be detected and retrieving at least one piece of domain knowledge information associated with the data to be detected in the knowledge database, the method further includes: acquiring domain knowledge data on the carbon footprint of energy storage batteries, and segmenting the domain knowledge data according to detection requirements through a pre-trained language processing model to form knowledge segments; performing feature encoding on the knowledge segments to generate feature vectors of fixed dimension; and storing the feature vectors in a vector database to form the knowledge database.
In an embodiment, the step of combining the data to be detected