
CN-116935410-B - Data classification method, device and storage medium


Abstract

The application discloses a data classification method, device, and storage medium. The data classification method comprises: obtaining a target image to be classified and the text data in the target image; extracting features from the target image and the text data respectively to obtain image features and text features; processing the image features and the text features using a first attention model to obtain target features; and classifying the target image using the target features to obtain a classification result of the target image. By means of this scheme, the accuracy of the classification result of the target image can be improved.

Inventors

  • Xu Ruifeng
  • Wang Bingbing
  • Huang Shijue
  • Liang Bin
  • Tu Geng

Assignees

  • Harbin Institute of Technology (Shenzhen) (Harbin Institute of Technology Shenzhen Institute of Science and Technology Innovation)

Dates

Publication Date
2026-05-05
Application Date
2023-08-11

Claims (8)

  1. A data classification method, comprising: acquiring a target image to be classified and the text data in the target image; extracting features from the target image and the text data respectively to obtain image features and text features; processing the image features and the text features using a first attention model to obtain target features; and classifying the target image using the target features to obtain a classification result of the target image; wherein classifying the target image using the target features yields a plurality of classification results, each classification result corresponding to one classification task, and classifying the target image using the target features to obtain the classification result of the target image comprises: processing the target features with a plurality of first linear modules, each comprising at least one linear layer, to obtain a plurality of first advanced features, wherein the number of first advanced features equals the number of permutation combinations of the classification tasks, and each first advanced feature corresponds to at least one task combination; for a single task, fusing at least part of the first advanced features corresponding to that single task to obtain a fused feature corresponding to the single task; for a multi-task combination, fusing the first advanced features corresponding to the permutation combinations of each single task in the multi-task combination to obtain a fused feature corresponding to the multi-task combination; updating each fused feature with a plurality of second gating modules, each comprising at least one gating layer, to obtain new first advanced features corresponding to each task combination, wherein the number of second gating modules equals the number of classification-task permutation combinations; for each classification task, fusing the first advanced features related to that classification task to obtain a second advanced feature related to the classification task, wherein the related first advanced features are first processed by a plurality of second linear modules, each comprising at least one linear layer, to obtain a plurality of transformed features; updating each second advanced feature with a plurality of first gating modules, each comprising at least one gating layer, to obtain a third advanced feature corresponding to each classification task, wherein the number of first gating modules equals the number of classification tasks; and obtaining the classification result of each classification task based on the third advanced features.
  2. The method of claim 1, wherein processing the image features and the text features using the first attention model to obtain the target features comprises: determining the query data of a first query-key-value pair in the first attention model using the image features, and determining the key data and value data of the first query-key-value pair using the text features; determining a first candidate feature output by the first attention model based on the first query-key-value pair; and fusing the first candidate feature with the image features and/or the text features to obtain the target features.
  3. The method according to claim 2, wherein the method further comprises: judging whether metaphor information is contained in the target image based on the first candidate feature; and in response to the target image not containing metaphor information, processing the text features using a second attention model to obtain a second candidate feature; wherein fusing the first candidate feature with the image features and/or the text features to obtain the target features comprises: fusing one or both of the image features and the text features with the second candidate feature and the first candidate feature to obtain the target features.
  4. The method according to claim 3, wherein the method further comprises: acquiring a source-domain text and a target-domain text of the target image, wherein the target-domain text represents the category of an object contained in the target image and the source-domain text is part or all of the text data; extracting features from the source-domain text and the target-domain text respectively to obtain source-domain features and target-domain features; and in response to the target image containing metaphor information, processing the source-domain features, the target-domain features, and the text features using a second attention model to obtain a third candidate feature; wherein fusing the first candidate feature with the image features and/or the text features to obtain the target features comprises: fusing one or both of the image features and the text features, one or both of the source-domain features and the target-domain features, the third candidate feature, and the first candidate feature to obtain the target features.
  5. The method of claim 4, wherein processing the text features with the second attention model, in response to the target image not containing metaphor information, to obtain the second candidate feature comprises: determining the query data, key data, and value data of a second query-key-value pair in the second attention model using the text features; and determining the second candidate feature output by the second attention model based on the second query-key-value pair; or wherein processing the source-domain features, the target-domain features, and the text features with the second attention model, in response to the target image containing metaphor information, to obtain the third candidate feature comprises: determining the key data of a third query-key-value pair in the second attention model using the source-domain features, determining the value data of the third query-key-value pair using the target-domain features, and determining the query data of the third query-key-value pair using the text features; and determining the third candidate feature output by the second attention model based on the third query-key-value pair.
  6. The method of claim 3, wherein determining whether metaphor information is contained in the target image based on the first candidate feature comprises: performing metaphor-type prediction on the first candidate feature using a preset classification model to obtain a prediction result; and determining that metaphor information exists in the target image in response to the prediction result being a first preset value, or determining that metaphor information does not exist in the target image in response to the prediction result being a second preset value; wherein the preset classification model is trained on sample images with metaphor-class labels and the sample texts contained in those sample images.
  7. An electronic device comprising a memory and a processor, wherein the memory stores program instructions, and the processor invokes the program instructions from the memory to perform the data classification method of any of claims 1-6.
  8. A computer-readable storage medium storing program instructions which, when executed by a processor, carry out the data classification method according to any of claims 1-6.
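The claims specify only how the query, key, and value data of the first attention model are assigned (claim 2: query from the image features, key and value from the text features), not how attention itself is computed. The following is a minimal sketch, assuming single-head scaled dot-product attention in NumPy; the feature dimensions, random feature values, and concatenation-based fusion are hypothetical choices for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    # Scaled dot-product attention: each query row attends over key/value rows.
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)   # (n_query, n_key)
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ value                # (n_query, d_value)

# Hypothetical inputs: 4 image-patch features, 6 text-token features, dim 8.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(4, 8))
text_feats = rng.normal(size=(6, 8))

# First attention model (claim 2): query from image, key/value from text.
first_candidate = cross_attention(image_feats, text_feats, text_feats)

# Fuse the first candidate feature with the image features; the claims leave
# the fusion operator open, so concatenation is used here as one simple choice.
target_feature = np.concatenate([first_candidate, image_feats], axis=-1)
print(target_feature.shape)  # (4, 16)
```

Swapping which modality supplies the query versus the key/value (as claim 5 does for the second attention model's source-domain, target-domain, and text features) only changes which arrays are passed as `query`, `key`, and `value`.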

Description

Data classification method, device and storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a data classification method, apparatus, and storage medium.

Background

In recent years, many methods for classifying images and for classifying texts have appeared, for example, classifying the animals contained in an image to obtain the categories to which they belong, or classifying a text to determine its format. Because image data and text data belong to different modalities, existing approaches to an image containing text extract only the image data or only the text in the image in isolation; there has been no good way to fuse image data and text data when classifying such an image. Classifying data in a single modality in this way makes the classification result less accurate, so a method is urgently needed that classifies images containing text by fusing the text content with the image data.

Disclosure of Invention

The application provides at least a data classification method, device, and storage medium. The application provides a data classification method comprising: obtaining a target image to be classified and the text data in the target image; extracting features from the target image and the text data respectively to obtain image features and text features; processing the image features and the text features using a first attention model to obtain target features; and classifying the target image using the target features to obtain a classification result of the target image. The application also provides an electronic device comprising a memory and a processor, the processor executing program instructions stored in the memory to implement the data classification method.
The present application provides a computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the above data classification method. According to this scheme, after the target image to be classified and the text data in the target image are acquired, features are extracted to obtain image features and text features; processing the image features and the text features with the first attention model enables information interaction between them, yielding the target features; the target image is then classified using the target features to obtain its classification result. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain its principles. FIG. 1 is a flow chart of an embodiment of the data classification method of the present application; FIG. 2 is a schematic diagram of a target image according to an embodiment of the data classification method of the present application; FIG. 3 is a schematic view of a sub-process of step S14 in the data classification method of the present application; FIG. 4 is a schematic diagram of a data classification method according to an embodiment of the present application; FIG. 5 is a schematic diagram of an embodiment of a data classification apparatus according to the present application; FIG. 6 is a schematic diagram of an embodiment of an electronic device of the present application; FIG. 7 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present application.
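Claim 1's multi-task head (first linear modules per task combination, gating modules that update each fused feature, and per-task fusion into second and third advanced features) is specified only structurally. The sketch below is one hypothetical reading with two classification tasks (so three task combinations), untrained random weights, element-wise sigmoid gates, and mean fusion; the actual layer shapes, gate form, fusion operator, and class counts are not given in the source.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

d = 8
target = rng.normal(size=d)  # target feature from the attention stage

# Two hypothetical tasks A and B give three task combinations: {A}, {B}, {A,B}.
# One "first linear module" per combination, reduced here to a single matrix.
combos = ["A", "B", "AB"]
W1 = {c: rng.normal(size=(d, d)) for c in combos}
first_advanced = {c: W1[c] @ target for c in combos}

# One "second gating module" per combination updates its feature
# (element-wise sigmoid gate as one plausible gating layer).
Wg = {c: rng.normal(size=(d, d)) for c in combos}
updated = {c: sigmoid(Wg[c] @ f) * f for c, f in first_advanced.items()}

def task_head(task, n_classes=3):
    # Fuse all advanced features involving this task (second advanced feature),
    # gate once more (third advanced feature), then classify.
    related = [f for c, f in updated.items() if task in c]
    second = np.mean(related, axis=0)
    Wt = rng.normal(size=(d, d))                 # per-task "first gating module"
    third = sigmoid(Wt @ second) * second
    logits = rng.normal(size=(n_classes, d)) @ third
    return int(np.argmax(logits))

results = {t: task_head(t) for t in ["A", "B"]}
print(results)
```

With trained weights in place of the random matrices, `results` would hold one classification result per task, matching claim 1's "each classification result corresponds to one classification task".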
Detailed Description

The following describes embodiments of the present application in detail with reference to the drawings. In the following description, for purposes of explanation rather than limitation, specific details such as particular system architectures, interfaces, and techniques are set forth in order to provide a thorough understanding of the present application. The term "and/or" merely describes an association relationship between the associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist together, or that B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it. Further, "a plurality" herein means two or more. The term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C. The application provides a data classification method and a data classification device. Application scenarios of the da