CN-122024185-A - Cross-modal data identification method based on noisy learning model and related equipment

Abstract

The application discloses a cross-modal data identification method based on a noisy learning model, and related equipment, and belongs to the technical field of data identification in artificial intelligence. The method comprises: acquiring a cross-modal training data set, wherein the training data set comprises a plurality of sample pairs formed by first modal data and second modal data, and sample labels of the sample pairs; evaluating the sample quality of each sample pair to obtain a first confidence of the first modal data, a second confidence of the second modal data, and a consistency score between the first modal data and the second modal data; dividing the cross-modal training data set into a clean sample subset, a defect sample subset and an abnormal sample subset; iteratively training a preset model to be trained with a learning strategy specific to each subset to obtain a cross-modal identification model; and identifying, through the cross-modal identification model, data matched with data to be queried from a corresponding cross-modal database, so as to ensure the accuracy of the model output.

Inventors

ZHU ZHAOWEI
TANG ZHE
WEI DONGZE
YE JIAYI
DI NA
LI SHAOQING

Assignees

杭州五维数据有限责任公司

Dates

Publication Date: 20260512
Application Date: 20260122

Claims (10)

1. A cross-modal data identification method based on a noisy learning model, characterized by comprising: obtaining a cross-modal training data set, wherein the training data set comprises a plurality of sample pairs consisting of first modal data and second modal data, and sample labels representing the association condition of the sample pairs; evaluating the sample quality of each sample pair to obtain a first confidence of the first modal data, a second confidence of the second modal data, and a consistency score between the first modal data and the second modal data; dividing the cross-modal training data set into a clean sample subset, a defect sample subset and an abnormal sample subset according to the first confidence, the second confidence and the consistency score; iteratively training a preset model to be trained according to the clean sample subset, the defect sample subset and the abnormal sample subset, using the differential learning strategy corresponding to each subset, to obtain a cross-modal identification model; and acquiring data to be queried, and identifying data matched with the data to be queried from a corresponding cross-modal database through the cross-modal identification model.
2. The method of claim 1, wherein the step of evaluating the sample quality of each sample pair to obtain the first confidence of the first modal data, the second confidence of the second modal data, and the consistency score between the first modal data and the second modal data comprises: constructing a modal evaluation network, wherein the modal evaluation network comprises parallel first and second modal coding branches for processing the first modal data and the second modal data, respectively; evaluating the sample quality of each sample pair through a confidence prediction head provided in each branch of the modal evaluation network, and predicting the probability distributions of the first modal data and the second modal data over the corresponding preset data categories; determining the first confidence of the first modal data and the second confidence of the second modal data from the respective prediction probability distributions; and calculating a cosine similarity score from the feature vectors output by the first modal coding branch and the second modal coding branch of the modal evaluation network, and obtaining the consistency score as a comprehensive similarity score based on the cosine similarity score.
3. The method of claim 2, wherein the step of dividing the cross-modal training data set into a clean sample subset, a defect sample subset and an abnormal sample subset according to the first confidence, the second confidence and the consistency score comprises: calculating loss values of the two modal data of each sample pair; fitting a two-component Gaussian mixture model to the loss values to determine the loss distribution and a posterior probability for each sample pair; weighting and fusing the posterior probability with the consistency score to obtain a decoupled confidence; and dividing the cross-modal training data set into the clean sample subset, the defect sample subset and the abnormal sample subset according to the decoupled confidence and the relative ratio of the first confidence to the second confidence.
4. The method of claim 3, wherein the step of dividing the cross-modal training data set into a clean sample subset, a defect sample subset and an abnormal sample subset according to the decoupled confidence and the relative ratio of the first confidence to the second confidence comprises: screening the sample pairs whose decoupled confidence is higher than a preset threshold as a subset to be distinguished; calculating the relative ratio of the first confidence to the second confidence of each sample pair in the subset to be distinguished; assigning a sample pair to the clean sample subset if its relative ratio is within a preset parameter range and its consistency score is higher than a preset consistency standard; assigning a sample pair to the defect sample subset if its relative ratio is within the preset parameter range and its consistency score is lower than the preset consistency standard; and assigning a sample pair to the abnormal sample subset if its relative ratio is within the preset parameter range.
5. The method of claim 4, wherein the step of iteratively training the preset model to be trained according to the clean sample subset, the defect sample subset and the abnormal sample subset and the differential learning strategy corresponding to different subsets to obtain a cross-modal identification model comprises: using the clean sample subset directly for iterative training of the preset model to be trained, and calculating the average direction of the current loss gradient as a clean gradient consensus direction; for the defect sample subset and the abnormal sample subset, according to their corresponding differential learning strategies, calculating the projection component and the orthogonal component of the gradient relative to the clean gradient consensus direction, and correcting the gradient by discarding or scaling the orthogonal component so that the gradient update direction remains consistent with the clean consensus; periodically constructing a dynamic clean subspace from the feature representations corresponding to the clean sample subset, and constraining the feature representations of all sample pairs to align with the dynamic clean subspace through a dynamic clean subspace projection consistency loss; and iteratively training the preset model to be trained according to the defect sample subset and the abnormal sample subset after the gradient update, to obtain the cross-modal identification model.
6. The method of claim 5, wherein, prior to the step of iteratively training the preset model to be trained according to the defect sample subset and the abnormal sample subset after the gradient update to obtain the cross-modal identification model, the method further comprises: determining the high confidence modal data and the low confidence modal data in each sample pair of the defect sample subset; generating a suppression gating value approaching zero based on the confidence of the low confidence modal data; generating an enhancement gating value approaching one based on the confidence of the high confidence modal data; and adjusting, based on the suppression gating value and the enhancement gating value, the gradient contribution of the defect sample subset in the iterative training of the preset model to be trained.
7. The method of claim 5, wherein, prior to the step of iteratively training the preset model to be trained according to the defect sample subset and the abnormal sample subset after the gradient update to obtain the cross-modal identification model, the method further comprises: generating a multi-stage reconstruction label either by arbitrating based on the accumulated confidence of the labels predicted by the model for the abnormal sample subset over historical training periods, or by searching the clean sample subset for the sample pair that is the nearest semantic neighbor of a sample pair in the abnormal sample subset and migrating its label, wherein the multi-stage reconstruction label replaces the original label of the abnormal sample subset in the iterative training of the preset model to be trained.
8. A cross-modal data identification device based on a noisy learning model, characterized by comprising an acquisition module, an evaluation module, a division module, a training module and a recognition module, wherein: the acquisition module is used for acquiring a cross-modal training data set, the training data set comprising a plurality of sample pairs formed by first modal data and second modal data, and sample labels representing the association condition of the sample pairs; the evaluation module is used for evaluating the sample quality of each sample pair to obtain a first confidence of the first modal data, a second confidence of the second modal data, and a consistency score between the first modal data and the second modal data; the division module is used for dividing the cross-modal training data set into a clean sample subset, a defect sample subset and an abnormal sample subset according to the first confidence, the second confidence and the consistency score; the training module is used for iteratively training a preset model to be trained according to the clean sample subset, the defect sample subset and the abnormal sample subset and the differential learning strategies corresponding to the different subsets, to obtain a cross-modal identification model; and the recognition module is used for acquiring data to be queried and identifying data matched with the data to be queried from a corresponding cross-modal database through the cross-modal identification model.
9. A noisy learning model based cross-modal data recognition device comprising a memory, a processor and a noisy learning model based cross-modal data recognition program stored on the memory and executable on the processor, the noisy learning model based cross-modal data recognition program configured to implement the noisy learning model based cross-modal data recognition method steps of any one of claims 1 to 7.
10. A storage medium, wherein a program for realizing a method for identifying cross-modal data based on a noisy learning model is stored on the storage medium, the program for realizing the method for identifying cross-modal data based on a noisy learning model being executed by a processor to realize the steps of the method for identifying cross-modal data based on a noisy learning model according to any one of claims 1 to 7.
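
The quality evaluation and partition steps recited in claims 2 to 4 can be illustrated with a minimal Python sketch. It is not the applicant's implementation, and several details are assumptions made only for illustration: confidence is taken as the largest predicted class probability, the consistency score is the plain cosine similarity, the lower-mean mixture component is treated as the clean one, and the names and default values `alpha`, `threshold`, `ratio_range` and `consistency_standard` are hypothetical. Because the wording of claim 4 for the abnormal case overlaps the other two cases, the sketch treats every remaining pair above the threshold as abnormal, and returns `None` for pairs at or below the threshold, which the claim does not address.

```python
import math

def confidence(probs):
    """Confidence of one modality; assumed here to be the largest class probability."""
    return max(probs)

def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0

def clean_posteriors(losses, iterations=50):
    """Fit a 1-D two-component Gaussian mixture to the losses with EM and return,
    per sample, the posterior of the lower-mean (assumed clean) component."""
    low, high = min(losses), max(losses)
    means = [low, high]
    variances = [max((high - low) ** 2 / 4.0, 1e-6)] * 2
    weights = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in losses]
    for _ in range(iterations):
        # E-step: responsibility of each component for each loss value.
        resp = []
        for x in losses:
            dens = [w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk
            variances[k] = max(var, 1e-6)  # variance floor keeps the density finite
            weights[k] = nk / len(losses)
    clean = 0 if means[0] <= means[1] else 1
    return [r[clean] for r in resp]

def decoupled_confidence(posterior, consistency, alpha=0.5):
    """Weighted fusion of the mixture posterior and the consistency score."""
    return alpha * posterior + (1.0 - alpha) * consistency

def assign_subset(conf1, conf2, consistency, decoupled,
                  threshold=0.5, ratio_range=(0.5, 2.0), consistency_standard=0.6):
    """Return 'clean', 'defect', 'abnormal', or None (case not covered by the claim)."""
    if decoupled <= threshold:
        return None
    ratio = conf1 / conf2 if conf2 > 0.0 else math.inf
    low, high = ratio_range
    if low <= ratio <= high:
        # A score exactly equal to the standard falls into 'defect' in this sketch.
        return "clean" if consistency > consistency_standard else "defect"
    return "abnormal"  # assumption: everything else above the threshold
```

In use, the per-pair loss values would come from the model being trained, `clean_posteriors` would be fused with each pair's consistency score by `decoupled_confidence`, and `assign_subset` would then route the pair to one of the three subsets.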

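The gradient correction recited in claim 5 can likewise be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the claimed implementation: gradients are represented as flat lists of floats, the consensus direction is the normalized mean of the clean-sample gradients, and `orthogonal_scale` is a hypothetical parameter, where 0.0 discards the orthogonal component and a value between 0 and 1 scales it down.

```python
import math

def mean_direction(gradients):
    """Unit vector along the average of the clean-sample gradients."""
    dim = len(gradients[0])
    mean = [sum(g[i] for g in gradients) / len(gradients) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        raise ValueError("clean gradients cancel out; no consensus direction")
    return [x / norm for x in mean]

def correct_gradient(grad, consensus, orthogonal_scale=0.0):
    """Split grad into a projection onto the consensus direction and an orthogonal
    remainder, then keep the projection and shrink the remainder."""
    coeff = sum(g * c for g, c in zip(grad, consensus))
    projection = [coeff * c for c in consensus]
    orthogonal = [g - p for g, p in zip(grad, projection)]
    # The claim does not say how a gradient opposing the consensus (coeff < 0)
    # is handled; this sketch keeps its projection unchanged.
    return [p + orthogonal_scale * o for p, o in zip(projection, orthogonal)]
```

For example, with clean gradients `[1.0, 0.0]` and `[3.0, 0.0]` the consensus direction is `[1.0, 0.0]`, so a defect-sample gradient `[2.0, 5.0]` is corrected to `[2.0, 0.0]` when the orthogonal component is discarded and to `[2.0, 2.5]` when it is scaled by 0.5.
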
Description

Cross-modal data identification method based on noisy learning model and related equipment

Technical Field

The application relates to the technical field of data identification in artificial intelligence, and in particular to a cross-modal data identification method based on a noisy learning model and related equipment.

Background

In the field of machine learning, the quality of training data has a crucial impact on model performance. However, large-scale data sets obtained in practical applications often contain a large number of noisy labels, which may originate from labeling errors, differences in subjective judgment, or errors of automated labeling systems during data acquisition. Traditional machine learning models tend to overfit when trained on data containing noisy labels, that is, the model fits the noise in the training data, so that its generalization performance is significantly reduced.

Prior-art methods for addressing label noise fall mainly into three types: noise modeling methods, sample selection methods and regularization methods. Noise modeling methods attempt to model the distribution of label noise by estimating a noise transition matrix, but conventional methods often require the noise rate or noise type to be preset, and are therefore inflexible in practical applications. Sample selection methods attempt to screen, from the training data, samples that are likely to carry correct labels, for example with a small-loss sample selection strategy, but such methods tend to bias the training samples and to ignore correctly labeled samples that are difficult to learn. Regularization methods improve the robustness of the model to noise by adding constraints or modifying the loss function, for example early stopping and label smoothing, but these methods have limited effectiveness in complex noise scenarios.

In addition, the prior art struggles to cope with complex noise conditions in practical applications. In particular, when the noise proportion is high or the noise is unevenly distributed, its performance drops significantly, so that the generalization capability of the model in real scenes is low and the accuracy of the output result is poor.

Disclosure of Invention

The main purpose of the application is to provide a cross-modal data identification method based on a noisy learning model and related equipment, aiming to solve the technical problem that, when existing methods process noise with a high noise proportion or an uneven distribution, the generalization capability of the model in real scenes is low and the accuracy of the output result is poor.

In order to achieve the above object, the present application provides a cross-modal data identification method based on a noisy learning model, the method comprising the following steps:

acquiring a cross-modal training data set, wherein the training data set comprises a plurality of sample pairs consisting of first modal data and second modal data, and sample labels representing the association condition of the sample pairs;
performing sample quality evaluation on each sample pair to obtain a first confidence of the first modal data, a second confidence of the second modal data, and a consistency score between the first modal data and the second modal data;
dividing the cross-modal training data set into a clean sample subset, a defect sample subset and an abnormal sample subset according to the first confidence, the second confidence and the consistency score;
performing iterative training on a preset model to be trained according to the clean sample subset, the defect sample subset and the abnormal sample subset and the differential learning strategies corresponding to the different subsets, to obtain a cross-modal identification model; and
acquiring data to be queried, and identifying data matched with the data to be queried from a corresponding cross-modal database through the cross-modal identification model.

In an embodiment, the step of performing sample quality evaluation on each sample pair to obtain a first confidence of the first modal data, a second confidence of the second modal data, and a consistency score between the first modal data and the second modal data includes:

constructing a modal evaluation network, wherein the modal evaluation network comprises a first modal coding branch and a second modal coding branch which are parallel and are used for processing the first modal data and the second modal data, respectively;
performing sample quality evaluation on each sample pair through a confidence prediction head arranged in each branch of the modal evaluation network, and predicting the probability distributions of the first modal data and the second modal data over the corresponding preset data categories;
respectively determin