US-12620199-B2 - Advanced image hashing for illegal image detection
Abstract
One example method includes receiving a digital image, receiving an input list that includes a respective hash for each frame in a group of frames, creating new frames on a particular area of the digital image, where the new frames are created and located based on the information in the input list, obtaining a respective hash for content included within each of the new frames, comparing one of the hashes generated for one of the new frames with a hash from the input list and, when the hashes match, continuing the comparing for all frames of the particular area, and when all hashes from the input list have been checked, determining whether or not the digital image is an illegal image.
Inventors
- Ohad Arnon
- Adriana Bechara Prado
Assignees
- DELL PRODUCTS L.P.
Dates
- Publication Date: 2026-05-05
- Application Date: 2023-12-18
Claims (20)
- 1. A method, comprising: receiving a digital image; receiving an input list that includes a respective hash for each frame in a group of frames; creating new frames on a particular area of the digital image, and the new frames are created and located based on information in the input list; obtaining a respective hash for content included within each of the new frames; comparing one of the hashes generated for one of the new frames with a hash from the input list and, when the hashes match, continue the comparing for all frames of the particular area; and when all hashes from the input list have been checked, determining whether or not the digital image is an illegal image.
- 2. The method as recited in claim 1, wherein the input list includes a reference for each of the frames that indicates where, in a coordinate system, the frame is located.
- 3. The method as recited in claim 1, wherein the hashes in the input list are each a hash of respective content located within the frames.
- 4. The method as recited in claim 1, wherein the creating, the obtaining, and the comparing are performed without accessing any actual content of the digital image.
- 5. The method as recited in claim 1, wherein the digital image is deemed to be an illegal image when a ratio of matches meets or exceeds a defined threshold.
- 6. The method as recited in claim 5, wherein the ratio is a ratio of hash matches to a total number of the new frames.
- 7. The method as recited in claim 1, wherein when the hash generated for the one new frame does not match the hash from the input list, comparing, with a different hash from the input list, another hash generated for another new frame in a different area of the digital image.
- 8. The method as recited in claim 1, wherein one of the new frames overlaps with another of the new frames, and a further new frame does not overlap with any of the new frames.
- 9. The method as recited in claim 1, wherein the particular area of the digital image is an area suspected as possibly embracing illegal content.
- 10. The method as recited in claim 1, wherein the creating, the obtaining, the comparing, and the determining, are performed so as to maintain privacy of content of the digital image.
- 11. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: receiving a digital image; receiving an input list that includes a respective hash for each frame in a group of frames; creating new frames on a particular area of the digital image, and the new frames are created and located based on information in the input list; obtaining a respective hash for content included within each of the new frames; comparing one of the hashes generated for one of the new frames with a hash from the input list and, when the hashes match, continue the comparing for all frames of the particular area; and when all hashes from the input list have been checked, determining whether or not the digital image is an illegal image.
- 12. The non-transitory storage medium as recited in claim 11, wherein the input list includes a reference for each of the frames that indicates where, in a coordinate system, the frame is located.
- 13. The non-transitory storage medium as recited in claim 11, wherein the hashes in the input list are each a hash of respective content located within the frames.
- 14. The non-transitory storage medium as recited in claim 11, wherein the creating, the obtaining, and the comparing are performed without accessing any actual content of the digital image.
- 15. The non-transitory storage medium as recited in claim 11, wherein the digital image is deemed to be an illegal image when a ratio of matches meets or exceeds a defined threshold.
- 16. The non-transitory storage medium as recited in claim 15, wherein the ratio is a ratio of hash matches to a total number of the new frames.
- 17. The non-transitory storage medium as recited in claim 11, wherein when the hash generated for the one new frame does not match the hash from the input list, comparing, with a different hash from the input list, another hash generated for another new frame in a different area of the digital image.
- 18. The non-transitory storage medium as recited in claim 11, wherein one of the new frames overlaps with another of the new frames, and a further new frame does not overlap with any of the new frames.
- 19. The non-transitory storage medium as recited in claim 11, wherein the particular area of the digital image is an area suspected as possibly embracing illegal content.
- 20. The non-transitory storage medium as recited in claim 11, wherein the creating, the obtaining, the comparing, and the determining, are performed so as to maintain privacy of content of the digital image.
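The matching and decision logic of claims 1, 5, and 6 can be sketched as follows. This is a minimal illustration, not the patent's implementation; all names (`is_illegal_image`, `frame_hashes`, etc.) and the example threshold value are assumptions introduced for clarity.

```python
def is_illegal_image(frame_hashes, input_list, threshold=0.8):
    """Decide whether a digital image is illegal per the claimed scheme.

    frame_hashes: hashes computed for the new frames created on the image.
    input_list:   reference hashes distributed for known illegal content.
    threshold:    illustrative value; the patent only requires a defined threshold.
    """
    reference = set(input_list)
    # Count the new-frame hashes that match a hash from the input list.
    matches = sum(1 for h in frame_hashes if h in reference)
    # Claim 6: ratio of hash matches to the total number of new frames.
    ratio = matches / len(frame_hashes)
    # Claim 5: illegal when the ratio meets or exceeds the defined threshold.
    return ratio >= threshold
```

Note that the decision operates purely on hashes, which is consistent with claims 4 and 10: the comparison never touches the actual pixel content of the image.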
Description
FIELD OF THE INVENTION

Embodiments of the present invention generally relate to image analysis. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods for analyzing images to determine whether or not the images are illegal.

BACKGROUND

Illegal content, and especially the phenomenon of child abuse images, is growing at a very high rate. According to the IWF (Internet Watch Foundation) annual report, the total number of category A images in 2022 was twice the amount in 2020. This increase was partly due to criminal sites selling videos and images of such abuse, and it has therefore become a problem that concerns most countries of the world. More and more countries are defining regulations that require service providers, such as ISPs (internet service providers), to identify illegal content, block it, and report it to the authorities. Examples are the EU Digital Services Act and the UK Online Safety Bill. The advent of such regulations introduced an "explicit duty" for firms to design websites and services in a way that mitigates the possibility that their platform will host illegal activity or content.

In order to comply with so many regulations, service providers need to be able to identify illegal content with high accuracy. This is a huge challenge, since technology that provides such capabilities must also comply with various data privacy regulations that prevent data analysis unless the data owner allows the service provider to perform it.

In more detail, aiming to identify and report child abuse images while maintaining data privacy, organizations like the IWF, which report offensive images online, distribute the hash values of offensive images. In this way, to detect abuse images, service providers can compare a distributed hash value with those of the images uploaded to their servers. If a match is found, the image is downloaded from the server and reported to the authorities.

Note, however, that such hash-based technology was created specifically for finding identity between images and is not particularly suited for images of child abuse. Since a hash value is unique to each image, changing one pixel in an image will change its hash value. More specifically, if criminals change a single pixel in the distributed images, it can be difficult to identify the content of the image using the simplistic approach of hash value comparison. On top of these technologies, there are some improvements, such as perceptual hashing, which is less sensitive to pixel changes. Again, however, these technologies have not been designed to detect child abuse images, and therefore, in this domain at least, they suffer from many shortcomings. For example, if the image size changes, the perceptual hash value will also change, so it is very easy to bypass the detection mechanism by simply cropping the image. In addition, using a perceptual hash value may cause many false positives (FPs), which can cause false reports to the authorities, blaming criminal activities on legitimate companies.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which at least some of the advantages and features of the invention may be obtained, a more particular description of embodiments of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, embodiments of the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings.

FIG. 1 discloses aspects of an approach for creating a perceptual hash of an image.
FIG. 2 discloses an example where two different images result in the same perceptual hash.
FIG. 3 discloses a table of image manipulation techniques, and the accuracy with which the different perceptual hashes result in the same hash code as the source image.
FIG. 4 discloses aspects of an example embodiment for processing a possibly suspect image.
FIG. 5 discloses an example method and architecture according to one embodiment.
FIG. 6 discloses an example computing entity configured to perform any of the disclosed methods, processes, and operations.

DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS

Embodiments of the present invention generally relate to image analysis. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods for analyzing images to determine whether or not the images are illegal.

One example embodiment comprises a method for determining whether or not an image is illegal, such as an image that comprises illegal content. In a first phase of the example method, frames, such as squares for example, are created around different selected portions of an image.
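The first phase described above can be sketched as laying frames over an image and computing a simple perceptual hash, here an average hash, for each frame. This is an illustrative sketch only: the image is modeled as a 2D list of grayscale pixel values, and the frame size and the average-hash scheme are assumptions, not details taken from the patent.

```python
def frame_hash(image, x, y, size):
    """Average hash of the size-by-size frame whose top-left corner is (x, y)."""
    pixels = [image[y + dy][x + dx] for dy in range(size) for dx in range(size)]
    avg = sum(pixels) / len(pixels)
    # One bit per pixel: 1 if the pixel is brighter than the frame's average.
    return ''.join('1' if p > avg else '0' for p in pixels)

def hash_frames(image, frame_coords, size=4):
    """Hash every frame placed at the (x, y) coordinates from an input list."""
    return [frame_hash(image, x, y, size) for x, y in frame_coords]
```

A detector could then compare these per-frame hashes against a distributed reference list; because each frame is hashed independently, a single altered pixel disturbs at most the frames that cover it, leaving the remaining frames free to match.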