CN-122024024-A - Method for detecting image and computing device

Abstract

Embodiments of the present disclosure provide a method and computing device for detecting an image, the method including determining abnormal visual information based on pixel information of an image to be detected, determining abnormal logic information from the image, and determining an abnormal position of the image based on the abnormal visual information and the abnormal logic information. The method can accurately detect the abnormal position in the image.

Inventors

ZENG FANWEI
LI JIANSHU
YAO WEIBIN

Assignees

蚂蚁区块链科技(上海)有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-20

Claims (12)

1. A method of detecting an image, the method comprising: determining abnormal visual information based on pixel information of an image to be detected; determining abnormal logic information based on semantic conflicts in the image; and determining an abnormal position of the image based on the abnormal visual information and the abnormal logic information.
2. The method of claim 1, wherein the determining abnormal logic information based on semantic conflicts in the image comprises: extracting text information from the image; and determining abnormal logic information based on the text information.
3. The method of claim 2, wherein the extracting text information from the image comprises: extracting text information and corresponding positions from the image; and wherein the determining an abnormal position of the image based on the abnormal visual information and the abnormal logic information comprises: determining target text information based on the abnormal visual information and the abnormal logic information; and determining the position corresponding to the target text information as an abnormal position of the image.
4. The method of claim 2, wherein the method further comprises: identifying the scene type of the image and acquiring domain knowledge corresponding to the scene type; and wherein the determining abnormal logic information based on the text information comprises: determining abnormal logic information based on the domain knowledge and the text information.
5. The method of claim 4, wherein the determining abnormal logic information based on the domain knowledge and the text information comprises: based on the domain knowledge, verifying consistency between quantifiable data in the text information to determine abnormal logic information; and/or verifying semantic consistency between the text information and the domain knowledge to determine abnormal logic information.
6. The method of claim 1, wherein the method further comprises: identifying the scene type of the image and acquiring domain knowledge corresponding to the scene type; and wherein the determining abnormal visual information based on pixel information of the image to be detected comprises: determining abnormal visual information based on the domain knowledge and the pixel information of the image to be detected.
7. The method of claim 1, wherein the method further comprises: identifying the scene type of the image and acquiring knowledge of forgery means corresponding to the scene type; wherein the determining abnormal visual information based on pixel information of the image to be detected comprises: determining abnormal visual information based on the knowledge of forgery means and the pixel information of the image to be detected; and wherein the determining abnormal logic information based on semantic conflicts in the image comprises: determining abnormal logic information based on the knowledge of forgery means and semantic conflicts in the image.
8. The method of claim 1, wherein the determining abnormal visual information based on pixel information of the image to be detected comprises: extracting global visual features based on the pixel information of the image; determining, based on an abnormal feature in the global visual features, a candidate region where the abnormal feature is located; and determining abnormal visual information based on pixel information of the candidate region.
9. The method of claim 1, wherein the determining an abnormal position of the image based on the abnormal visual information and the abnormal logic information comprises: if the region indicated by at least one piece of abnormal visual information and the region indicated by at least one piece of abnormal logic information share a common region, determining the common region as the abnormal position of the image.
10. The method of claim 1, wherein the determining an abnormal position of the image based on the abnormal visual information and the abnormal logic information comprises: scoring each piece of abnormal information to obtain an abnormal confidence corresponding to that abnormal information, wherein the abnormal information is the abnormal visual information and/or the abnormal logic information; screening, based on the abnormal confidence corresponding to the abnormal information, to obtain target abnormal information; and determining the abnormal position of the image based on the target abnormal information.
11. The method of claim 1, wherein the determining abnormal visual information based on pixel information of an image to be detected, the determining abnormal logic information based on semantic conflicts in the image, and the determining an abnormal position of the image based on the abnormal visual information and the abnormal logic information comprise: inputting the image to be detected and a task prompt into a multimodal large model, and outputting, by the multimodal large model, the abnormal position of the image, wherein the task prompt is used to guide the multimodal large model to determine abnormal visual information based on pixel information of the image, determine abnormal logic information based on semantic conflicts in the image, and determine the abnormal position of the image based on the abnormal visual information and the abnormal logic information.
12. A computing device comprising a memory and a processor, the memory having executable code stored therein, wherein the processor, when executing the executable code, implements the method of any one of claims 1-11.
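
Illustrative note (not part of the claims): the two ways of combining findings recited in claims 9 and 10 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation. The `Finding` record, the bounding-box representation `(x1, y1, x2, y2)`, the reading of "common region" as a rectangle intersection, and the `min_conf` threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one piece of abnormal information."""
    kind: str          # "visual" or "logic"
    box: tuple         # (x1, y1, x2, y2) in pixels; assumed representation
    confidence: float  # abnormal confidence in [0, 1]; assumed scale


def intersect(a, b):
    """Return the overlapping rectangle of two boxes, or None if they are disjoint."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def locate_by_overlap(visual, logic):
    """Claim 9 style: keep regions indicated by both a visual and a logic finding."""
    hits = []
    for v in visual:
        for l in logic:
            box = intersect(v.box, l.box)
            if box is not None and box not in hits:
                hits.append(box)
    return hits


def locate_by_confidence(findings, min_conf=0.6):
    """Claim 10 style: screen findings by confidence, highest confidence first."""
    kept = [f for f in findings if f.confidence >= min_conf]
    kept.sort(key=lambda f: f.confidence, reverse=True)
    return [f.box for f in kept]
```

With one visual finding at `(10, 10, 50, 30)` and one logic finding at `(20, 5, 60, 25)`, `locate_by_overlap` returns the shared rectangle `(20, 10, 50, 25)`. The overlap rule favors precision, while the confidence rule lets a single strong cue of either type stand on its own.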

Description

Method for detecting image and computing device

Technical Field

The embodiments of this specification belong to the technical field of image processing, and in particular relate to a method and computing device for detecting an image.

Background

With the rapid development of image processing technology, industries increasingly need to check images for authenticity, compliance, and the like. In fields such as e-commerce, advertising, and digital media, Artificial Intelligence Generated Content (AIGC) is widely used to automatically beautify commodity images, including operations such as background replacement, light and shadow enhancement, and text overlay, so as to improve marketing effect. The generated or edited image then needs to be checked to ensure that it meets display requirements. Meanwhile, malicious attackers use deep generative techniques such as diffusion models and generative adversarial networks to perform high-fidelity, pixel-level tampering on key text-bearing images such as certificates, bills, and contracts, for example replacing monetary amounts, forging signatures, or modifying key wording. These tampered areas are difficult to detect with the naked eye, and the images must be inspected to prevent such attacks. In addition, in the field of medical imaging, automatic detection technology based on computer vision is widely applied to the analysis of scans such as CT (Computed Tomography), MRI (Magnetic Resonance Imaging), and ultrasound to locate abnormal areas such as nodules, tumors, hemorrhage, or inflammation, and it places extremely high requirements on detection accuracy and robustness.
However, images generated or edited by AIGC or other deep learning models do not always remain consistent in content; local semantic conflicts and subtle pixel anomalies are often introduced, that is, the image contains abnormal positions. For example, a physical attribute of the commodity may be inconsistent with the text overlaid on the image (the image shows a black cup, but the text describes a white cup), a component of the commodity may be oriented incorrectly, or a modified character area in a certificate may be inconsistent with the surrounding image area in font style, font rendering effect, edge continuity, and the like. Because such anomalies often lack obvious splicing boundaries and significant noise differences, traditional detection methods that rely on low-level visual cues (e.g., noise distribution and compression artifacts) struggle to find forgery traces at the visual level, making it difficult to stably identify such abnormal positions in images.

The common practice for anomaly detection based on low-level visual cues is to divide the image into local blocks or sliding windows, extract noise residual statistics and compression artifact features in each local region, compare them with the "normal" statistical pattern of the neighborhood or the whole image, and calculate the degree of difference. When certain regions deviate from the background or normal distribution in these low-level statistics, they are judged likely to be abnormal regions, and a corresponding anomaly heat map or region localization result is produced. Such methods tend to be limited in effectiveness in scenes where visual traces are not apparent. In particular, many existing detection models that rely on low-level visual cues are prone to failure when the tampered region is highly consistent with the surrounding content at the pixel level and produces almost no visible artifacts.
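
The conventional block-wise practice described above can be illustrated with a minimal sketch. It is only an illustration of the prior-art baseline, not the method of this specification, and it rests on several assumptions: a grayscale image given as a list of rows, a simple 4-neighbour high-pass filter standing in for the noise residual, non-overlapping blocks instead of sliding windows, and a z-score threshold of 2.0. The function names are hypothetical.

```python
from statistics import mean, pstdev


def residual(img):
    """High-pass residual: each pixel minus the mean of its 4 neighbours (borders clamped)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbours = [
                img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x],
                img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)],
            ]
            out[y][x] = img[y][x] - mean(neighbours)
    return out


def block_scores(img, block=4):
    """Map (block_row, block_col) to the z-score of that block's residual energy."""
    res = residual(img)
    h, w = len(img), len(img[0])
    energy = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            vals = [res[y][x] ** 2
                    for y in range(by, by + block)
                    for x in range(bx, bx + block)]
            energy[(by // block, bx // block)] = mean(vals)
    all_energy = list(energy.values())
    mu = mean(all_energy)
    sigma = pstdev(all_energy) or 1.0  # avoid division by zero on a flat image
    return {k: (v - mu) / sigma for k, v in energy.items()}


def flag_blocks(img, block=4, threshold=2.0):
    """Return blocks whose residual energy deviates strongly from the whole-image statistics."""
    return sorted(k for k, z in block_scores(img, block).items() if z > threshold)
```

A block is flagged only when its noise statistics stand out from the rest of the image. A tampered region whose residual matches the background produces a z-score near zero and is therefore missed, which is the failure mode the passage describes.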
For example, an attacker forges "-100 yuan" in a bank statement into "+10000 yuan" while keeping the font, color, background texture, and compression traces as consistent as possible with the original image. In this case, conventional methods often have difficulty locating or identifying the specific tampered area.

Disclosure of Invention

The invention aims to provide a method and computing device for detecting an image, so as to accurately detect abnormal positions in the image.

A first aspect of the present specification provides a method of detecting an image, the method comprising: determining abnormal visual information based on pixel information of an image to be detected; determining abnormal logic information based on semantic conflicts in the image; and determining an abnormal position of the image based on the abnormal visual information and the abnormal logic information.

A second aspect of the present specification provides a computing device comprising a memory and a processor, the memory having executable code stored therein, wherein the processor, when executing the executable code, implements the method of the first aspect.

According to the technical scheme provide