CN-122019823-A - Image-text correlation detection method and device, electronic equipment and storage medium


Abstract

The disclosure provides an image-text correlation detection method and device, electronic equipment, and a storage medium, and relates to artificial intelligence fields such as natural language processing, deep learning, and computer vision. The method comprises: parsing a target document to be processed to obtain pictures and structured context information corresponding to each picture; obtaining a hierarchical text vector corresponding to the target document; and, for any picture, determining a target visual vector corresponding to the picture and determining a correlation judgment result between the picture and the target document according to the hierarchical text vector, the target visual vector, and the structured context information. Applying the disclosed scheme can improve, among other things, the accuracy of the resulting correlation judgment result.

Inventors

HAN XINYING

Assignees

北京百度网讯科技有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-29

Claims (20)

1. An image-text correlation detection method, comprising: parsing a target document to be processed to obtain pictures and structured context information corresponding to each picture; acquiring a hierarchical text vector corresponding to the target document; and, for any picture, determining a target visual vector corresponding to the picture, and determining a correlation judgment result between the picture and the target document according to the hierarchical text vector, the target visual vector, and the structured context information.
2. The method of claim 1, wherein the hierarchical text vector comprises a target paragraph vector of each paragraph in the target document, a target chapter vector of each chapter in the target document, and a target document vector of the target document; and the acquiring a hierarchical text vector corresponding to the target document comprises: respectively acquiring an initial paragraph vector of each paragraph, an initial chapter vector of each chapter, and an initial document vector of the target document; and determining the target paragraph vector, the target chapter vector, and the target document vector according to the initial paragraph vector, the initial chapter vector, and the initial document vector.
3. The method of claim 2, wherein the respectively acquiring an initial paragraph vector of each paragraph, an initial chapter vector of each chapter, and an initial document vector of the target document comprises: respectively determining a basic paragraph vector of each paragraph; respectively determining the initial chapter vector of each chapter according to the basic paragraph vectors belonging to the same chapter; determining the initial document vector according to the initial chapter vectors; and respectively adjusting the basic paragraph vectors according to the initial document vector to obtain the initial paragraph vector of each paragraph.
4. The method of claim 2, wherein the determining a target visual vector corresponding to the picture comprises: acquiring a global picture vector of the picture; performing salient region detection on the picture, and respectively acquiring a region vector of each detected salient region; recognizing text information in the picture, and acquiring an information vector of the recognized text information; fusing the global picture vector, the region vectors, and the information vector to obtain an initial visual vector; and determining the target visual vector according to the initial visual vector.
5. The method of claim 4, wherein the determining the target paragraph vector, the target chapter vector, and the target document vector according to the initial paragraph vector, the initial chapter vector, and the initial document vector comprises: directly determining the initial paragraph vector, the initial chapter vector, and the initial document vector as the target paragraph vector, the target chapter vector, and the target document vector; or determining a domain to which the target document belongs, acquiring a domain vector corresponding to the domain, and respectively adjusting the initial paragraph vector, the initial chapter vector, and the initial document vector according to the domain vector to obtain the target paragraph vector, the target chapter vector, and the target document vector; and the determining the target visual vector according to the initial visual vector comprises: directly determining the initial visual vector as the target visual vector, or adjusting the initial visual vector according to the domain vector to obtain the target visual vector.
6. The method of claim 4, wherein the determining a correlation judgment result between the picture and the target document according to the hierarchical text vector, the target visual vector, and the structured context information comprises: determining a comprehensive matching degree evaluation result between the picture and the target document according to the hierarchical text vector, the target visual vector, and the structured context information; and determining the correlation judgment result according to the comprehensive matching degree evaluation result.
7. The method of claim 6, wherein the structured context information comprises a chapter where the picture is located, context paragraphs of the picture, and drawing information of the picture, wherein the context paragraphs comprise M adjacent paragraphs located before the picture and N adjacent paragraphs located after the picture, and M and N are positive integers.
8. The method of claim 7, wherein the determining a comprehensive matching degree evaluation result between the picture and the target document comprises: determining a first-level matching degree evaluation result according to the target document vector and the target visual vector; determining a second-level matching degree evaluation result according to the target chapter vector of the chapter where the picture is located and the target visual vector; determining a third-level matching degree evaluation result according to the target paragraph vector of each context paragraph and the target visual vector; determining a fourth-level matching degree evaluation result according to a drawing vector corresponding to the drawing information, the information vector, and the target visual vector; and determining the comprehensive matching degree evaluation result according to the matching degree evaluation results of all levels.
9. The method of claim 8, wherein the determining a first-level matching degree evaluation result according to the target document vector and the target visual vector comprises: obtaining a first correlation degree between the target document vector and the target visual vector, and determining the first correlation degree as the first-level matching degree evaluation result; and/or the determining a second-level matching degree evaluation result according to the target chapter vector of the chapter where the picture is located and the target visual vector comprises: obtaining a second correlation degree between the target chapter vector of the chapter where the picture is located and the target visual vector, and determining the second correlation degree as the second-level matching degree evaluation result; and/or the determining a third-level matching degree evaluation result according to the target paragraph vector of each context paragraph and the target visual vector comprises: respectively obtaining a third correlation degree between the target paragraph vector of each context paragraph and the target visual vector, and determining the maximum of the third correlation degrees as the third-level matching degree evaluation result; and/or the determining a fourth-level matching degree evaluation result according to the drawing vector corresponding to the drawing information, the information vector, and the target visual vector comprises: obtaining a fourth correlation degree between the drawing vector and the target visual vector, obtaining a fifth correlation degree between the information vector and the target visual vector, and performing weighted addition on the fourth correlation degree and the fifth correlation degree to obtain the fourth-level matching degree evaluation result.
10. The method of claim 9, wherein the determining the comprehensive matching degree evaluation result according to the matching degree evaluation results of all levels comprises: acquiring a consistency penalty value corresponding to the picture; obtaining a fusion result of the matching degree evaluation results of all levels; and determining the comprehensive matching degree evaluation result according to the fusion result and the consistency penalty value.
11. The method of claim 10, wherein the acquiring a consistency penalty value corresponding to the picture comprises: acquiring an average value of the second-level matching degree evaluation result and the third-level matching degree evaluation result, and acquiring a first difference value between the average value and a first tolerance threshold; acquiring an absolute value of a second difference value between the second-level matching degree evaluation result and the third-level matching degree evaluation result; and determining the consistency penalty value according to a comparison result between the absolute value and a second tolerance threshold and a comparison result between the first-level matching degree evaluation result and the first difference value.
12. The method of claim 10, wherein the correlation judgment result comprises a basic judgment result and a correlation probability, the basic judgment result indicating whether the picture is correlated with the target document, and the correlation probability indicating a probability value that the picture is correlated with the target document; and the determining the correlation judgment result according to the comprehensive matching degree evaluation result comprises: determining a target threshold corresponding to the target document according to the matching degree evaluation results of all levels and the domain vector; and fusing the comprehensive matching degree evaluation result, the consistency penalty value, the target threshold, and the matching degree evaluation results of all levels to obtain target reference information, and determining the basic judgment result and the correlation probability according to the target reference information.
13. The method of claim 12, wherein the correlation judgment result further comprises a certainty level and evidence information, the evidence information explaining the reason for the basic judgment result, and the certainty level indicating how certain the basic judgment result is; and the determining the correlation judgment result according to the comprehensive matching degree evaluation result further comprises: determining the certainty level and the evidence information according to the matching degree evaluation results of all levels, the consistency penalty value, the correlation probability, and the target threshold.
14. The method of claim 13, wherein the certainty level is one of high confidence, medium confidence, and low confidence; and the method further comprises: after the correlation judgment result between the picture and the target document is determined, in response to determining that the certainty level is the high confidence, outputting the basic judgment result, the correlation probability, and the evidence information; and in response to determining that the certainty level is the medium confidence or the low confidence, acquiring a manual review result of the target document; in response to determining, according to the manual review result, that neither the basic judgment result nor the evidence information needs to be modified, outputting the basic judgment result, the correlation probability, and the evidence information; and in response to determining, according to the manual review result, that the basic judgment result and/or the evidence information needs to be modified, outputting, among the basic judgment result, the correlation probability, and the evidence information, both the unmodified content and the modified content.
15. The method of claim 12, wherein each correlation degree is calculated by a cross-modal matching network model; and/or the obtaining a fusion result of the matching degree evaluation results of all levels comprises: fusing the matching degree evaluation results of all levels by using a weighted fusion model to obtain the fusion result; and/or the determining a target threshold corresponding to the target document according to the matching degree evaluation results of all levels and the domain vector comprises: generating the target threshold by using a threshold prediction model according to the matching degree evaluation results of all levels and the domain vector; and/or the determining the basic judgment result and the correlation probability according to the target reference information comprises: generating the basic judgment result and the correlation probability according to the target reference information by using a correlation evaluation model.
16. The method of claim 15, further comprising: after the correlation judgment result between the picture and the target document is determined, in response to determining that a hard example mining trigger condition is met, performing hard example mining according to feedback results corresponding to the image-text correlation detection performed within a most recent preset time period, constructing training samples according to the hard example mining result, and optimizing each model by using the training samples.
17. An image-text correlation detection device, comprising a first processing module and a second processing module, wherein the first processing module is configured to parse a target document to be processed to obtain pictures and structured context information corresponding to each picture; and the second processing module is configured to acquire a hierarchical text vector corresponding to the target document, determine, for any picture, a target visual vector corresponding to the picture, and determine a correlation judgment result between the picture and the target document according to the hierarchical text vector, the target visual vector, and the structured context information.
18. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-16.
19. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-16.
20. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 1-16.
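
The multi-level matching of claims 8 to 11 can be illustrated with a minimal Python sketch. It is not the claimed implementation: cosine similarity stands in for the cross-modal matching network model of claim 15, a normalized weighted sum stands in for the weighted fusion model, and the penalty rule, the tolerance thresholds, the penalty step, and all function and parameter names are hypothetical, since the claims state only which quantities are compared, not the direction or size of the penalty.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed stand-in for the cross-modal matching model)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def level_scores(doc_vec: Vector, chapter_vec: Vector,
                 context_paragraph_vecs: Sequence[Vector],
                 drawing_vec: Vector, info_vec: Vector,
                 visual_vec: Vector, drawing_weight: float = 0.5) -> List[float]:
    """Return the four level scores described in claims 8 and 9."""
    s1 = cosine(doc_vec, visual_vec)                  # document level
    s2 = cosine(chapter_vec, visual_vec)              # chapter level
    # Paragraph level: maximum over the M + N context paragraphs.
    s3 = max(cosine(p, visual_vec) for p in context_paragraph_vecs)
    # Fourth level: weighted addition of the fourth and fifth correlation degrees.
    s4 = (drawing_weight * cosine(drawing_vec, visual_vec)
          + (1.0 - drawing_weight) * cosine(info_vec, visual_vec))
    return [s1, s2, s3, s4]


def consistency_penalty(s1: float, s2: float, s3: float,
                        tol1: float = 0.2, tol2: float = 0.3,
                        step: float = 0.1) -> float:
    """Hypothetical penalty rule built from the comparisons named in claim 11."""
    first_difference = (s2 + s3) / 2.0 - tol1   # average minus first tolerance
    local_gap = abs(s2 - s3)                    # |second difference|
    penalty = 0.0
    if local_gap > tol2:          # assumed: chapter and paragraph levels disagree
        penalty += step
    if s1 < first_difference:     # assumed: document level lags the local levels
        penalty += step
    return penalty


def comprehensive_score(scores: Sequence[float], weights: Sequence[float],
                        penalty: float) -> float:
    """Fuse the level scores (normalized weighted sum) and subtract the penalty."""
    fused = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return fused - penalty
```

A caller would compute `level_scores(...)`, pass the first three scores to `consistency_penalty`, and then call `comprehensive_score`. In the claimed method the fusion weights come from a model rather than from fixed constants.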

Description

Image-text correlation detection method and device, electronic equipment and storage medium

Technical Field

The disclosure relates to the technical field of artificial intelligence, in particular to fields such as natural language processing, deep learning, and computer vision, and more specifically to an image-text correlation detection method and device, electronic equipment, and a storage medium.

Background

In scenarios such as document content auditing, quality evaluation of generated content, advertisement delivery compliance auditing, search, and recommendation ranking, image-text correlation detection is usually required, that is, judging whether a picture in a document is correlated with the document content (especially the content of the surrounding paragraphs). For example, it may be determined whether an illustration in an information article matches the paragraph content rather than being clickbait (a "title party") or a "misleading image"; as another example, during document content review, it may be determined whether a picture has been embedded in an unrelated document in order to evade review.

Disclosure of Invention

The disclosure provides an image-text correlation detection method and device, electronic equipment, and a storage medium.

An image-text correlation detection method comprises: parsing a target document to be processed to obtain pictures and structured context information corresponding to each picture; acquiring a hierarchical text vector corresponding to the target document; and, for any picture, determining a target visual vector corresponding to the picture, and determining a correlation judgment result between the picture and the target document according to the hierarchical text vector, the target visual vector, and the structured context information.
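
As an illustration of how the hierarchical text vector might be assembled, the following Python sketch follows the order given in claim 3: basic paragraph vectors, then chapter vectors, then the document vector, then document-aware paragraph vectors. The pooling and adjustment operators are assumptions; the disclosure does not specify them. Mean pooling and linear interpolation are used here only to keep the example small, and `build_hierarchy` and `alpha` are hypothetical names.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors (assumed pooling operator)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_hierarchy(
    chapters: Dict[str, List[Vector]], alpha: float = 0.1
) -> Tuple[Dict[str, List[Vector]], Dict[str, Vector], Vector]:
    """Build initial paragraph, chapter, and document vectors.

    `chapters` maps a chapter name to the basic paragraph vectors it contains.
    `alpha` (hypothetical) controls how strongly each paragraph vector is
    pulled toward the document vector.
    """
    # Chapter vector: pooled from the basic paragraph vectors of that chapter.
    initial_chapter = {name: mean(paras) for name, paras in chapters.items()}
    # Document vector: pooled from the chapter vectors.
    initial_document = mean(list(initial_chapter.values()))
    # Paragraph vectors: basic vectors adjusted according to the document vector.
    initial_paragraphs = {
        name: [
            [(1.0 - alpha) * x + alpha * d for x, d in zip(para, initial_document)]
            for para in paras
        ]
        for name, paras in chapters.items()
    }
    return initial_paragraphs, initial_chapter, initial_document
```

In practice the basic paragraph vectors would come from a text encoder, and the adjustment step could equally be an attention layer or a learned gate; the interpolation above shows only the data flow.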
An image-text correlation detection device comprises a first processing module and a second processing module. The first processing module is configured to parse a target document to be processed to obtain pictures and structured context information corresponding to each picture. The second processing module is configured to acquire a hierarchical text vector corresponding to the target document, determine, for any picture, a target visual vector corresponding to the picture, and determine a correlation judgment result between the picture and the target document according to the hierarchical text vector, the target visual vector, and the structured context information.

An electronic device comprises at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.

A non-transitory computer readable storage medium stores computer instructions for causing a computer to perform the method described above.

A computer program product comprises computer programs/instructions which, when executed by a processor, implement the method described above.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are provided for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 is a flowchart of a first embodiment of the image-text correlation detection method according to the present disclosure;
FIG. 2 is a flowchart of an embodiment of a method of generating a comprehensive matching degree evaluation result according to the present disclosure;
FIG. 3 is a flowchart of a second embodiment of the image-text correlation detection method according to the present disclosure;
FIG. 4 is a schematic diagram of the composition structure of an embodiment 400 of the image-text correlation detection device according to the present disclosure;
FIG. 5 is a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings. The description includes various details of the embodiments to facilitate understanding, and these details should be considered merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness. In addition, it should be understood that the term "and/or" is merely an association relations