CN-122024023-A - Method and computing device for detecting image counterfeiting


Abstract

A method and computing device for detecting image forgery. The method comprises: extracting text information in an image to be detected to obtain a first text; generating image description information of the image to be detected to obtain a second text; and determining, based on the first text and the second text, whether a logical anomaly exists in the image to be detected.

Inventors

ZENG FANWEI
LI JIANSHU
YAO WEIBIN

Assignees

Ant Blockchain Technology (Shanghai) Co., Ltd. (蚂蚁区块链科技(上海)有限公司)

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-20

Claims (10)

1. A method of detecting image forgery, comprising: extracting text information in an image to be detected to obtain a first text; generating image description information of the image to be detected to obtain a second text; and determining whether a logical anomaly exists in the image to be detected based on the first text and the second text.
2. The method of claim 1, wherein determining whether a logical anomaly exists in the image to be detected based on the first text and the second text comprises: analyzing whether a logical anomaly exists between different texts in the first text; and/or analyzing whether a logical anomaly exists between the first text and the second text.
3. The method of claim 2, wherein the method further comprises: performing named entity recognition on the first text to obtain one or more named entities; and wherein analyzing whether a logical anomaly exists between different texts in the first text comprises: analyzing whether named entities whose entity type is a numerical type conform to preset calculation logic; and/or analyzing whether a time sequence conflict exists between named entities whose entity type is a time type.
4. The method of claim 2, wherein the method further comprises: performing named entity recognition on the first text to obtain one or more named entities; and retrieving attribute knowledge of named entities whose entity type is a semantic type to obtain one or more entity attributes; wherein analyzing whether a logical anomaly exists between the first text and the second text comprises analyzing whether a logical anomaly exists between the entity attributes and the second text.
5. The method of claim 4, wherein analyzing whether a logical anomaly exists between the entity attributes and the second text comprises: calculating semantic compatibility between the entity attributes and the second text to obtain a compatibility score; and if the compatibility score is lower than a first threshold, determining that the image to be detected has been semantically tampered with.
6. The method of claim 1, wherein the method further comprises: acquiring a target position, in the image to be detected, of the text causing the logical anomaly; highlighting the image part at the target position in the image to be detected; and displaying the processed image to be detected.
7. The method of claim 1, wherein the image description information includes at least one of: scene features of the image to be detected, and world knowledge related to the image to be detected.
8. The method of claim 1, wherein extracting text information in an image to be detected to obtain a first text, generating image description information of the image to be detected to obtain a second text, and determining whether a logical anomaly exists in the image to be detected based on the first text and the second text comprises: constructing a prompt, wherein the prompt comprises the image to be detected and prompt information, the prompt information sequentially comprises a knowledge preparation step and a logical anomaly analysis step, the knowledge preparation step instructs a multimodal large language model to extract text information in the image to be detected to obtain the first text and to generate image description information of the image to be detected to obtain the second text, and the logical anomaly analysis step instructs the multimodal large language model to analyze whether a logical anomaly exists between different texts in the first text and/or whether a logical anomaly exists between the first text and the second text; and inputting the prompt into the multimodal large language model, so that the multimodal large language model determines whether a logical anomaly exists in the image to be detected.
9. The method of claim 1, wherein the method further comprises: constructing a training data set, and training a multimodal large language model using the training data set, wherein each piece of training data in the training data set comprises an input image and an expected model output, the expected model output comprises a chain of thought and the processing result of each step in the chain of thought, the chain of thought sequentially comprises statements of a knowledge preparation step and a logical anomaly analysis step, the knowledge preparation step instructs the multimodal large language model to extract text information in the image to be detected to obtain the first text and to generate image description information of the image to be detected to obtain the second text, and the logical anomaly analysis step instructs the multimodal large language model to determine whether a logical anomaly exists in the image to be detected based on the first text and the second text.
10. A computing device comprising a memory and a processor, the memory having executable code stored therein, wherein the processor, when executing the executable code, implements the method of any one of claims 1-9.
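
The checks recited in claims 3 and 5 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the specification: the function names, the tolerance, the threshold value of 0.2, and the balance rule used as the "preset calculation logic" (it echoes the bank statement example given in the description). The compatibility score is treated as an input because the claims do not say how it is computed.

```python
from datetime import date


def balance_is_consistent(opening, inflow, outflow, closing, tol=0.005):
    """Hypothetical 'preset calculation logic' for numerical entities:
    opening balance + inflow - outflow should equal the closing balance."""
    return abs(opening + inflow - outflow - closing) <= tol


def has_time_conflict(ordered_dates):
    """Return True if any date is earlier than the one expected to precede it.

    `ordered_dates` lists time-type entities in the order the document
    implies they happened (for example: issue date, then payment date).
    """
    pairs = zip(ordered_dates, ordered_dates[1:])
    return any(later < earlier for earlier, later in pairs)


def is_semantically_tampered(compatibility_score, first_threshold=0.2):
    """Claim 5 decision rule: a score below the first threshold means tampering.
    The default threshold is an arbitrary placeholder."""
    return compatibility_score < first_threshold
```

A statement with amounts 1000, 250, 100 and 1150 passes the balance check, while changing the closing amount to 1950 fails it; a payment dated before its issue date is reported as a time sequence conflict.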

Description

Method and computing device for detecting image counterfeiting

Technical Field

The embodiments of this specification belong to the field of image forensics, and in particular relate to a method and computing device for detecting image forgery.

Background

The rapid development of Artificial Intelligence Generated Content (AIGC) technology has given rise to sophisticated forgeries, which pose a serious threat to social security and information authenticity. Existing image forensics techniques mainly focus on visual-level anomalies (such as JPEG compression blocking artifacts, edge blurring, and noise discontinuities). However, images generated using AIGC techniques tend to be coherent at the pixel level, so conventional image forensics techniques cannot tell from the visual level whether such images are genuine or fake. A technical solution is therefore desired that can detect tamper marks that cannot be found at the visual level and improve the ability to detect image forgery.

Disclosure of Invention

The first aspect of this specification provides a method for detecting image forgery, comprising: extracting text information in an image to be detected to obtain a first text; generating image description information of the image to be detected to obtain a second text; and determining whether a logical anomaly exists in the image to be detected based on the first text and the second text.

A seventh aspect of this specification provides a computing device comprising a memory and a processor, the memory having executable code stored therein, wherein the processor, when executing the executable code, implements the method of the first aspect of this specification.

According to the method provided by the embodiments of this specification, the OCR text originally present in the image to be detected is extracted, an image description text is generated for the image to be detected, and whether a logical anomaly exists in the image to be detected is determined based on the OCR text and the image description text. Deepfake images that are visually flawless (with no Photoshop traces) but logically contradictory can thus be identified, which improves the ability to detect image forgery.

Drawings

In order to illustrate the technical solutions of the embodiments of this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some of the embodiments described in this specification, and a person skilled in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of detecting image forgery in one embodiment;
FIG. 2 is a flow chart of a method of detecting image forgery in one embodiment;
FIG. 3 is a timing diagram of a method for detecting image forgery based on a distributed system architecture in an embodiment of this specification.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of this specification, not all of them. All other embodiments obtained by a person of ordinary skill in the art from this specification without inventive effort are intended to fall within the scope of protection of this specification.
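
As a rough illustration of the three-step flow stated in the Disclosure of Invention (extract a first text, generate a second text, then compare them), the following Python sketch wires the steps together. It is a sketch under stated assumptions rather than the disclosed implementation: `Verdict`, `detect_forgery`, and the stand-in functions `fake_ocr`, `fake_caption`, and `toy_rule` are hypothetical names, and the stand-ins return canned values instead of calling a real OCR engine or captioning model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    anomalous: bool     # True if at least one logical anomaly was reported
    reasons: List[str]  # human-readable explanation for each anomaly


def detect_forgery(
    image: bytes,
    extract_text: Callable[[bytes], List[str]],
    describe_image: Callable[[bytes], str],
    find_anomalies: Callable[[List[str], str], List[str]],
) -> Verdict:
    """Run the three steps in order, with each step supplied by the caller."""
    first_text = extract_text(image)      # step 1: text found in the image
    second_text = describe_image(image)   # step 2: generated image description
    reasons = find_anomalies(first_text, second_text)  # step 3: compare
    return Verdict(anomalous=bool(reasons), reasons=reasons)


# Hypothetical stand-ins so the sketch runs without any model or OCR engine.
def fake_ocr(_image: bytes) -> List[str]:
    return ["Burger Shack"]


def fake_caption(_image: bytes) -> str:
    return "a bank building with marble columns"


def toy_rule(texts: List[str], description: str) -> List[str]:
    """Toy rule: a scene described as a bank should carry bank-like sign text."""
    if "bank" not in description:
        return []
    return [
        f"sign text {t!r} is unexpected for scene: {description}"
        for t in texts
        if "bank" not in t.lower()
    ]


verdict = detect_forgery(b"", fake_ocr, fake_caption, toy_rule)
```

Passing the steps in as callables keeps the flow independent of any particular OCR engine or model, which matches the specification's description of the method at the level of steps rather than components.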
Existing image forensics techniques mainly focus on visual-level anomalies. However, images generated by AIGC techniques (e.g., Stable Diffusion) tend to be coherent at the pixel level, and conventional visual detectors cannot tell whether they are genuine or fake. Take a forged bank transfer statement as an example: if an attacker perfectly replaces the amount digits using the same font, no flaw is visible at the visual level, but if the modified digits result in "opening balance + inflow - outflow not equal to closing balance", this constitutes a logical flaw in the arithmetic. As another example, if an attacker uses the same font to perfectly replace the name on the sign of a bank building in an image with the name of a fast food store, no flaw is visible at the visual level, but the co-occurrence probability between the fast food store name and a bank building is extremely low, and the mutual information of the two approaches 0, which constitutes a semantic logical flaw. In view of this, the embodiments of this specification propose a new method for detecting image forgery, which can determine tamper marks that exist in an image but cannot be found at a visual le