
CN-115905589-B - Image de-duplication method, terminal device and computer storage medium


Abstract

The application discloses an image de-duplication method, a terminal device and a computer storage medium. The image de-duplication method comprises: obtaining a first image set; reading a plurality of second image sets according to an image index table; and traversing each second image set and executing a first de-duplication operation on the first image set, to obtain a third image set formed by de-duplicating the first image set. The first de-duplication operation comprises traversing each second image of the second image set and executing an image set update operation on the first image set using each second image. The image set update operation comprises obtaining a first similarity between the second image and every first image in the first image set and, when a first similarity is higher than a preset threshold, deleting the corresponding first image from the first image set so as to update the first image set. By splitting the task into segments, the method reduces the number of comparisons in each de-duplication pass, improves de-duplication efficiency, and makes it possible to process massive numbers of images.

Inventors

  • Shang Shouwang
  • ZHOU XIANGMING
  • WU LI
  • HUANG PENG
  • ZHANG PENG
  • CAI DANPING
  • ZHENG CHUNHUANG

Assignees

  • Zhejiang Dahua Technology Co., Ltd. (浙江大华技术股份有限公司)

Dates

Publication Date: 2026-05-08
Application Date: 2022-12-08

Claims (7)

  1. An image deduplication method, characterized in that the image deduplication method comprises:
     acquiring a first image set;
     reading a plurality of second image sets according to an image index table, wherein each second image in the image index table exists in one of the second image sets; and
     traversing each second image set and executing a first deduplication operation on the first image set, to obtain a third image set formed by deduplicating the first image set;
     wherein the first deduplication operation comprises: traversing each second image of the second image set, and performing an image set update operation on the first image set using each second image, the image set update operation comprising: acquiring a first similarity between the second image and every first image in the first image set and, when a first similarity is higher than a preset threshold, deleting the corresponding first image from the first image set so as to update the first image set;
     wherein the acquiring a first image set comprises: dividing the first image set into a plurality of first image subsets, wherein no first image of any first image subset exists in any other first image subset; and determining every two first image subsets as a group of image subsets, and executing a second deduplication operation on each group of image subsets to obtain third image subsets, each formed by merging one group of image subsets, wherein the similarity of any two first images contained in each third image subset is lower than a second preset threshold;
     wherein the second deduplication operation comprises: selecting a first target image of a first target image subset in the group of image subsets, and executing an image subset update operation on the first target image subset, the image subset update operation comprising: based on second similarities between the first target image and all first non-target images of the first non-target image subset in the group of image subsets, deleting from the first target image subset a first target image whose second similarity is higher than the preset threshold, so as to update the first target image subset; and continuing to select unselected first target images in the group of image subsets and execute the image subset update operation until all first target images in the first target image subset have been traversed; and
     wherein, after the third image subsets formed by merging each group of image subsets are obtained, the image deduplication method further comprises: acquiring the number of third image subsets; outputting the third image subset when the number of third image subsets is 1; and, when the number of third image subsets is greater than 1, determining every two third image subsets as a group of image subsets and executing the second deduplication operation on each group of image subsets until the number of third image subsets formed by merging each group of image subsets is 1.
  2. The image deduplication method as claimed in claim 1, wherein, after the first image set is divided into a plurality of first image subsets, the image deduplication method further comprises: performing a third deduplication operation on each first image subset to update each first image subset;
     wherein the third deduplication operation comprises: selecting a first sub-image of the first image subset and executing an inter-set deduplication operation, the inter-set deduplication operation comprising: obtaining third similarities between the first sub-image and the other first sub-images, and deleting from the first image subset any first sub-image whose third similarity is higher than the preset threshold; and continuing to select a first sub-image that has not yet been selected in the first image subset and executing the inter-set deduplication operation until every first sub-image in the first image subset has been selected, wherein the first sub-images that have not yet been selected include first sub-images that have been matched in the image subset with a third similarity lower than the preset threshold.
  3. The image deduplication method as claimed in claim 1, wherein the image deduplication method further comprises: updating the image index table with the third image set.
  4. An image deduplication method, characterized in that the image deduplication method comprises:
     acquiring a first image set, and dividing the first image set into a plurality of first image subsets, wherein no first image of any first image subset exists in any other first image subset; and
     performing a first deduplication operation on each first image subset to update each first image subset;
     wherein the first deduplication operation comprises: selecting a first sub-image of the first image subset and executing an inter-set deduplication operation, the inter-set deduplication operation comprising: obtaining first similarities between the first sub-image and the other first sub-images, and deleting from the first image subset the first sub-images whose first similarity is higher than a preset threshold; and continuing to select a first sub-image that has not yet been selected in the first image subset and executing the inter-set deduplication operation until every first sub-image in the first image subset has been selected, wherein the first sub-images that have not yet been selected include first sub-images that have been matched in the image subset with a first similarity lower than the preset threshold;
     wherein, after the performing a first deduplication operation on each first image subset to update each first image subset, the image deduplication method further comprises: determining every two first image subsets as a group of image subsets, and executing a second deduplication operation on each group of image subsets to obtain third image subsets, each formed by merging one group of image subsets, wherein the similarity of any two first images contained in each third image subset is lower than a second preset threshold; and
     wherein the second deduplication operation comprises: selecting a first target image of a first target image subset in the group of image subsets, and executing an image subset update operation on the first target image subset, the image subset update operation comprising: based on second similarities between the first target image and all first non-target images of the first non-target image subset in the group of image subsets, deleting from the first target image subset a first target image whose second similarity is higher than the preset threshold, so as to update the first target image subset; and continuing to select unselected first target images in the group of image subsets and execute the image subset update operation until all first target images in the first target image subset have been traversed.
  5. The image deduplication method as claimed in claim 4, wherein, after the third image subset formed by merging each group of image subsets is obtained, the image deduplication method further comprises:
     reading a plurality of second image sets according to an image index table, wherein each second image in the image index table exists in one of the second image sets; and
     traversing each second image set and executing a third deduplication operation on the third image subset, to obtain a third image set formed by deduplicating the third image subset;
     wherein the third deduplication operation comprises: traversing each second image of the second image set, and performing an image set update operation on the third image subset using each second image, the image set update operation comprising: acquiring a third similarity between the second image and every first image in the third image subset and, when a third similarity is higher than the preset threshold, deleting the corresponding first image from the third image subset so as to update the third image subset.
  6. A terminal device, comprising a memory and a processor coupled to the memory, wherein the memory is configured to store program data, and the processor is configured to execute the program data to implement the image deduplication method according to any one of claims 1 to 5.
  7. A computer storage medium for storing program data which, when executed by a computer, implements the image deduplication method of any one of claims 1 to 5.
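
The within-subset deduplication recited in claims 2 and 4 (select one sub-image, compare it with the others, delete those above the threshold, then continue with a not-yet-selected survivor) can be sketched roughly as follows. This is only an illustration under stated assumptions: the claims do not name a similarity measure, so `toy_similarity` is a hypothetical stand-in that treats images as plain numbers, and the sketch assumes the selected image is kept while its near-duplicates are deleted. The names `dedup_within_subset` and `toy_similarity` do not come from the patent.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def toy_similarity(a: float, b: float) -> float:
    """Hypothetical similarity in (0, 1]; 1.0 means identical.

    The source does not specify a measure; a real system would compare
    image features or hashes instead of plain numbers.
    """
    return 1.0 / (1.0 + abs(a - b))


def dedup_within_subset(
    subset: Sequence[T],
    threshold: float,
    similarity: Callable[[T, T], float],
) -> List[T]:
    """Deduplicate one subset by repeatedly selecting an unselected image.

    Assumption: the selected image is kept and every other remaining image
    whose similarity to it exceeds the threshold is deleted.
    """
    kept: List[T] = []
    remaining = list(subset)
    while remaining:
        selected = remaining.pop(0)  # next image that has not been selected
        kept.append(selected)
        # Survivors (similarity not above the threshold) stay selectable.
        remaining = [
            other for other in remaining
            if similarity(selected, other) <= threshold
        ]
    return kept
```

For example, calling `dedup_within_subset([1.0, 1.1, 5.0, 5.05, 1.2], 0.8, toy_similarity)` keeps one representative from each cluster of near-equal values. Because deleted images are never selected later, the number of comparisons shrinks as duplicates are removed.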

Description

Image de-duplication method, terminal device and computer storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular to an image deduplication method, a terminal device, and a computer storage medium.

Background

Image databases often contain a large number of images with the same or similar content. These images waste storage resources, and displaying too many repeated images degrades the user experience. For data-driven algorithms (such as neural networks), too many repeated images also increase training time and bias the final training result. The traditional image de-duplication method matches the similarity of every pair of images to be de-duplicated and, if two images are considered similar, deletes one of them. The whole process consumes a lot of time, and as the data size increases, both the time consumption and the memory occupation increase significantly.

Disclosure of Invention

The application provides an image deduplication method, a terminal device and a computer storage medium.
The application adopts the following technical scheme: an image de-duplication method is provided, comprising: acquiring a first image set; reading a plurality of second image sets according to an image index table, wherein each second image in the image index table exists in one of the second image sets; and traversing each second image set and executing a first de-duplication operation on the first image set, to obtain a third image set formed by de-duplicating the first image set. The first de-duplication operation comprises: traversing each second image of the second image set, and performing an image set update operation on the first image set using each second image. The image set update operation comprises: acquiring a first similarity between the second image and every first image in the first image set and, when a first similarity is higher than the preset threshold, deleting the corresponding first image from the first image set so as to update the first image set.
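
The first de-duplication operation described above can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes each image is represented by a feature vector and uses cosine similarity, neither of which the source specifies, and the names `cosine_similarity` and `dedup_against_index` are hypothetical. The second image sets are consumed one at a time, so only one chunk of the indexed images needs to be in memory at once.

```python
from math import sqrt
from typing import Callable, Iterable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Hypothetical similarity measure; the source does not name one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_against_index(
    first_set: Iterable[Vector],
    second_sets: Iterable[Iterable[Vector]],
    threshold: float,
    similarity: Callable[[Vector, Vector], float] = cosine_similarity,
) -> List[Vector]:
    """Remove from first_set every image too similar to an indexed image.

    second_sets may be a lazy iterable (for example a generator that reads
    one chunk of the index table at a time); this is an assumption about
    how the chunks are supplied.
    """
    remaining = list(first_set)
    for second_set in second_sets:      # traverse each second image set
        for second in second_set:       # traverse each second image
            # Image set update: drop first images above the threshold.
            remaining = [
                first for first in remaining
                if similarity(second, first) <= threshold
            ]
    return remaining                    # the "third image set"
```

Each pass compares one indexed image against the surviving first images only, so the first image set shrinks as duplicates are found and later passes do less work.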
Wherein the acquiring a first image set comprises: dividing the first image set into a plurality of first image subsets, wherein no first image of any first image subset exists in any other first image subset; and determining every two first image subsets as a group of image subsets, and executing a second de-duplication operation on each group of image subsets to obtain third image subsets, each formed by merging one group of image subsets, wherein the similarity of any two first images contained in each third image subset is lower than a second preset threshold. The second de-duplication operation comprises: selecting a first target image of a first target image subset in the group of image subsets, and executing an image subset update operation on the first target image subset; the image subset update operation comprises, based on second similarities between the first target image and all first non-target images of the first non-target image subset in the group of image subsets, deleting from the first target image subset a first target image whose second similarity is higher than the preset threshold, so as to update the first target image subset; and continuing to select unselected first target images in the group of image subsets and execute the image subset update operation until all first target images in the first target image subset have been traversed.
Wherein, after the third image subsets formed by merging each group of image subsets are obtained, the image de-duplication method further comprises: acquiring the number of third image subsets; outputting the third image subset when the number of third image subsets is 1; and, when the number of third image subsets is greater than 1, determining every two third image subsets as a group of image subsets and executing the second de-duplication operation on each group of image subsets until the number of third image subsets formed by merging each group of image subsets is 1.
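
The pairwise merging described above resembles a tournament: subsets are paired, each pair is merged with cross-subset duplicates removed, and the process repeats until one subset remains. The sketch below illustrates this under several assumptions that are not in the source: images are plain numbers compared with a hypothetical `toy_similarity`, each input subset is already free of internal duplicates, and when the number of subsets is odd the unpaired subset is carried into the next round unchanged. The function names are illustrative only.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def toy_similarity(a: float, b: float) -> float:
    """Hypothetical similarity in (0, 1]; the source does not name one."""
    return 1.0 / (1.0 + abs(a - b))


def merge_pair(
    target: Sequence[T],
    non_target: Sequence[T],
    threshold: float,
    similarity: Callable[[T, T], float],
) -> List[T]:
    """Second de-duplication operation on one group of two subsets.

    A target image is deleted when its similarity to any non-target image
    exceeds the threshold; the survivors are merged with the non-target
    subset.
    """
    kept = [
        t for t in target
        if all(similarity(t, n) <= threshold for n in non_target)
    ]
    return kept + list(non_target)


def merge_until_one(
    subsets: Sequence[Sequence[T]],
    threshold: float,
    similarity: Callable[[T, T], float],
) -> List[T]:
    """Repeat pairwise merging until a single subset remains."""
    current = [list(s) for s in subsets]
    if not current:
        return []
    while len(current) > 1:
        merged = [
            merge_pair(current[i], current[i + 1], threshold, similarity)
            for i in range(0, len(current) - 1, 2)
        ]
        if len(current) % 2 == 1:
            merged.append(current[-1])  # assumption: odd subset carried over
        current = merged
    return current[0]
```

Within one round the groups are independent of each other, so they could in principle be processed in parallel; the sketch runs them sequentially for simplicity.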
Wherein, after the first image set is divided into a plurality of first image subsets, the image deduplication method further comprises: performing a third deduplication operation on each first image subset to update each first image subset. The third deduplication operation comprises: selecting a first sub-image of the first image subset and executing an inter-set deduplication operation, the inter-set deduplication operation comprising obtaining third similarities between the first sub-image and the other first sub-images, and deleting from the first image subset any first sub-image whose third similarity is higher than the preset threshold; and continuing to select one first sub-image which is not selected in the