CN-115269781-B - Modal association prediction method, device, equipment, storage medium and program product


Abstract

The application discloses a modal association degree prediction method, device, equipment, storage medium and program product, and relates to the field of machine learning. The method comprises: obtaining a sample content set; extracting second modal feature vectors corresponding to the second modal data of the sample content; determining feature vector centers respectively corresponding to a plurality of sample classifications; determining the distance between each second modal feature vector and the feature vector center of its sample classification, and using that distance as the relevance label between the second modal data and the first modal data in the sample content; and training a candidate relevance recognition model based on the relevance labels to obtain a relevance recognition model. The relevance recognition model is used to recognize the relevance between the first modal data and the second modal data in target content and to extract a semantic feature representation of the target content based on that relevance. The relevance recognition model can thereby better learn the association among the modalities of multi-modal content, which helps enhance the understanding of article semantics.

Inventors

DENG WENCHAO

Assignees

Tencent Technology (Wuhan) Co., Ltd. (腾讯科技（武汉）有限公司)

Dates

Publication Date: 20260505
Application Date: 20220804

Claims (12)

1. A method for predicting modal relevance, the method comprising: acquiring a sample content set, wherein sample content in the sample content set comprises first modal data and second modal data, the sample content is marked with a classification label, the classification label is used for indicating the sample classification to which the sample content belongs, and the first modal data in the sample content is strongly correlated with the sample classification; extracting a second modal feature vector corresponding to the second modal data of the sample content; determining feature vector centers respectively corresponding to a plurality of sample classifications based on the second modal feature vectors corresponding to sample contents belonging to the same sample classification; determining the distance between the second modal feature vector corresponding to the second modal data and the feature vector center of the sample classification corresponding to the second modal data as a relevance label between the second modal data and the first modal data in the sample content; and training a candidate relevance recognition model based on the sample content and the relevance label corresponding to the sample content to obtain a relevance recognition model, wherein the relevance recognition model is used for recognizing the relevance between first modal data and second modal data in target content, extracting a semantic feature representation of the target content based on the relevance, extracting second modal feature vectors corresponding to the second modal data in each sample content respectively, clustering the second modal feature vectors in a feature space according to classification labels to obtain feature vector centers corresponding to each classification, and determining the relevance between the first modal data and the second modal data in the target content according to the distance between the second modal feature vector of the second modal data of the target content and the feature vector center corresponding to the classification label of the target content, wherein the semantic feature representation is used for representing the semantics of the target content.
2. The method of claim 1, wherein training the candidate relevance recognition model based on the sample content and the relevance label corresponding to the sample content to obtain the relevance recognition model comprises: acquiring target sample content in the sample content set, wherein the target sample content comprises target first modal data and target second modal data; inputting the target sample content into the candidate relevance recognition model, and outputting a predicted relevance between the target first modal data and the target second modal data; obtaining a relevance loss value based on the relevance label with which the target sample content is labeled and the predicted relevance, wherein the relevance loss value is used for representing the difference between the relevance label and the predicted relevance; and training the candidate relevance recognition model based on the relevance loss value to obtain the relevance recognition model.
3. The method of claim 2, wherein training the candidate relevance recognition model based on the relevance loss value to obtain the relevance recognition model comprises: iteratively training the candidate relevance recognition model based on the relevance loss values respectively corresponding to the sample contents in the sample content set to obtain the relevance recognition model.
4. The method according to any one of claims 1 to 3, wherein determining feature vector centers respectively corresponding to a plurality of sample classifications based on the second modal feature vectors corresponding to sample contents belonging to the same sample classification comprises: averaging the second modal feature vectors corresponding to the sample contents belonging to the same sample classification to obtain the feature vector centers respectively corresponding to the plurality of sample classifications.
5. The method according to any one of claims 1 to 3, wherein training the candidate relevance recognition model based on the sample content and the relevance label corresponding to the sample content to obtain the relevance recognition model further comprises: acquiring a semantic similarity distribution corresponding to a classification label library based on the semantic similarity relations among the classification labels in the classification label library; acquiring, based on semantic features extracted from target multi-modal content by the relevance recognition model, a classification label content probability distribution corresponding to the target multi-modal content, wherein the classification label content probability distribution is used for indicating the semantic association relations between the semantic features of the target multi-modal content and each classification label; and fusing the semantic similarity distribution and the classification label content probability distribution to obtain multi-level classification sub-labels corresponding to the target multi-modal content.
6. The method of claim 5, wherein acquiring the semantic similarity distribution corresponding to the classification label library based on the semantic similarity relations among the classification labels in the classification label library comprises: performing classification prediction on the target multi-modal content, and determining topic characterization content corresponding to each classification label, wherein the topic characterization content is used for representing the implicit topic semantics corresponding to the classification label; averaging the semantic feature vectors of the topic characterization content to obtain a classification label semantic vector corresponding to the classification label; and obtaining the cosine similarity between the classification label semantic vectors to obtain the semantic similarity distribution.
7. The method of claim 5, wherein acquiring the classification label content probability distribution corresponding to the target multi-modal content comprises: performing classification prediction on the target multi-modal content, and retaining the classification labels whose predicted probability for the target multi-modal content meets a probability requirement; and calculating the co-occurrence probability of classification labels of different granularities to obtain the classification label content probability distribution, wherein the co-occurrence probability refers to the probability that content falls under a classification label of a first granularity and also under a classification label of a second granularity.
8. The method of claim 5, wherein fusing the semantic similarity distribution and the classification label content probability distribution to obtain the multi-level classification sub-labels corresponding to the target multi-modal content comprises: performing weighted fusion on the classification label content probability distribution and the semantic similarity distribution to obtain a hierarchical classification label probability distribution, wherein the multi-level classification sub-labels corresponding to the target multi-modal content have a hierarchical relationship when the value of the hierarchical classification label probability distribution is higher than a preset threshold.
9. A modality relevance prediction device, characterized in that the device comprises: an acquisition module, used for acquiring a sample content set, wherein sample content in the sample content set comprises first modal data and second modal data, the sample content is marked with a classification label, the classification label is used for indicating the sample classification to which the sample content belongs, and the first modal data in the sample content is strongly correlated with the sample classification; an extraction module, used for extracting a second modal feature vector corresponding to the second modal data of the sample content; a determining module, used for determining the feature vector centers respectively corresponding to a plurality of sample classifications based on the second modal feature vectors corresponding to sample contents belonging to the same sample classification; and a training module, used for training a candidate relevance recognition model based on the sample content and the relevance label corresponding to the sample content to obtain a relevance recognition model, wherein the relevance recognition model is used for recognizing the relevance between first modal data and second modal data in target content, extracting a semantic feature representation of the target content based on the relevance, extracting second modal feature vectors corresponding to the second modal data in each sample content respectively, clustering the second modal feature vectors in a feature space according to classification labels to obtain feature vector centers corresponding to each classification, and determining the relevance between the first modal data and the second modal data in the target content according to the distance between the second modal feature vector of the second modal data of the target content and the feature vector center corresponding to the classification label of the target content, wherein the semantic feature representation is used for representing the semantics of the target content.
10. A computer device comprising a processor and a memory having stored therein at least one instruction, at least one program, code set, or instruction set that is loaded and executed by the processor to implement the modality association prediction method of any of claims 1 to 8.
11. A computer readable storage medium having stored therein at least one program loaded and executed by a processor to implement the modality association prediction method of any of claims 1 to 8.
12. A computer program product comprising a computer program which when executed by a processor implements the method of modality relevance prediction as claimed in any one of claims 1 to 8.
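The label-hierarchy steps of claims 6 and 8 can be illustrated with a minimal sketch. It assumes plain Python lists as semantic vectors; the function names, the fusion weight of 0.5, the threshold of 0.5 and the toy co-occurrence probabilities are all hypothetical and are not taken from the claims, which do not fix these values.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors dimension by dimension (claim 6)."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse(similarity, probability, weight=0.5):
    """Weighted fusion of the two distributions (claim 8); weight is an assumption."""
    return {
        pair: weight * similarity[pair] + (1 - weight) * probability.get(pair, 0.0)
        for pair in similarity
    }


def hierarchical_pairs(fused, threshold):
    """Label pairs whose fused score exceeds the preset threshold."""
    return sorted(pair for pair, score in fused.items() if score > threshold)


# Hypothetical topic characterization vectors for each classification label.
topic_vectors = {
    "sports": [[1.0, 0.0], [1.0, 0.0]],
    "football": [[3.0, 4.0]],
    "cooking": [[0.0, 1.0]],
}
label_vectors = {label: mean_vector(vs) for label, vs in topic_vectors.items()}

# Semantic similarity between a coarse label and two candidate fine labels.
similarity = {
    ("sports", fine): cosine(label_vectors["sports"], label_vectors[fine])
    for fine in ("football", "cooking")
}

# Hypothetical co-occurrence probabilities across granularities (claim 7).
probability = {("sports", "football"): 0.8, ("sports", "cooking"): 0.1}

fused = fuse(similarity, probability)
pairs = hierarchical_pairs(fused, threshold=0.5)
```

In this toy setup only the pair ("sports", "football") clears the threshold, so only that pair would be treated as hierarchically related.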

Description

Modal association prediction method, device, equipment, storage medium and program product

Technical Field

The embodiments of the application relate to the field of machine learning, and in particular to a method, a device, equipment, a storage medium and a program product for predicting modal association.

Background

An information flow article contains content in multiple modalities, such as a text modality and a picture modality. Multi-modal pre-training models continue to develop and have begun to be applied to information flow articles to enhance the understanding of article semantics and to improve the accuracy of downstream tasks such as article classification, article label extraction and article quality prediction.

In the related art, the training tasks of a multi-modal pre-training model for information flow articles mainly comprise a mask restoration task for the text modality and a matching task between the text modality and the picture modality.

However, the matching task between the text modality and the picture modality depends on cross-modal data in which the text modality and the picture modality are highly correlated. Data with low correlation impairs the modeling capability of the model, so that the training effect of the model is poor.

Disclosure of Invention

The embodiments of the application provide a method, a device, equipment, a storage medium and a program product for predicting the modal association degree, which can improve the semantic association among the modalities of multi-modal content.
The technical scheme is as follows: in one aspect, a method for predicting a modality association degree is provided, the method comprising: acquiring a sample content set, wherein sample content in the sample content set comprises first modal data and second modal data, and the sample content is marked with a classification label which is used for indicating the sample classification to which the sample content belongs; extracting a second modal feature vector corresponding to the second modal data of the sample content; determining feature vector centers respectively corresponding to a plurality of sample classifications based on the second modal feature vectors corresponding to sample contents belonging to the same sample classification; determining the distance between the second modal feature vector corresponding to the second modal data and the feature vector center of the sample classification corresponding to the second modal data as a relevance label between the second modal data and the first modal data in the sample content; and training a candidate relevance recognition model based on the sample content and the relevance label corresponding to the sample content to obtain a relevance recognition model, wherein the relevance recognition model is used for recognizing the relevance between first modal data and second modal data in target content, and extracting a semantic feature representation of the target content based on the relevance, and the semantic feature representation is used for representing the semantics of the target content.
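The labeling steps of this method can be illustrated with a minimal sketch. It assumes that the second modal feature vectors are already extracted as plain Python lists, that the class center is the per-dimension mean, that the distance is Euclidean, and that the loss is a squared error; the distance metric, the loss form, the function names and the toy data are assumptions for illustration, not details stated in the text.

```python
import math
from collections import defaultdict


def class_centers(vectors, labels):
    """Average the second modal feature vectors of each sample classification."""
    grouped = defaultdict(list)
    for vec, label in zip(vectors, labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def relevance_labels(vectors, labels, centers):
    """Distance from each vector to its own class center, used as the relevance label.

    Euclidean distance is an assumption; the text only says "distance".
    """
    return [math.dist(vec, centers[label]) for vec, label in zip(vectors, labels)]


def relevance_loss(label, predicted):
    """Squared difference between the relevance label and a model prediction (assumed form)."""
    return (label - predicted) ** 2


# Toy second modal (for example, image) feature vectors with classification labels.
second_modal_vectors = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 14.0]]
classification_labels = ["sports", "sports", "food", "food"]

centers = class_centers(second_modal_vectors, classification_labels)
labels = relevance_labels(second_modal_vectors, classification_labels, centers)
```

Here each sample's label is simply its distance to the center of its own class, so no manual relevance annotation is needed; a candidate model would then be trained to reduce `relevance_loss` between these labels and its predictions.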
In another aspect, a device for predicting a modality association degree is provided, the device comprising: an acquisition module, used for acquiring a sample content set, wherein sample content in the sample content set comprises first modal data and second modal data, and the sample content is marked with a classification label which is used for indicating the sample classification to which the sample content belongs; an extraction module, used for extracting a second modal feature vector corresponding to the second modal data of the sample content; a determining module, used for determining the feature vector centers respectively corresponding to a plurality of sample classifications based on the second modal feature vectors corresponding to sample contents belonging to the same sample classification; and a training module, used for training a candidate relevance recognition model based on the sample content and the relevance label corresponding to the sample content to obtain a relevance recognition model, wherein the relevance recognition model is used for recognizing the relevance between the first modal data and the second modal data in target content, and extracting a semantic feature representation of the target content based on the relevance, and the semantic feature representation is used for representing the semantics of the target content. In another aspect, a computer device is provided, where the computer device includes a processor and a memory, where the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, where the at least on