US-12620252-B2 - Information source detection using unique watermarks

Abstract

Document source detection uses unique copies to identify sources of leaked documents. The unique copies are generated from an original document, and each includes a unique watermarking of one or more perturbations to a feature of the original document. When an artifact, such as a photo or copy, is derived from one of the unique copies, the unique copy from which it was derived is identified. To identify the unique copy, unique copy keypoints in the unique copies are matched to artifact keypoints in the artifact to align the artifact with a location within the unique copies. Pixel regions in the unique copies that include perturbations are used to identify corresponding pixel regions in the artifact. Pixels in these regions are compared to identify the unique copy from which the artifact was derived, thus identifying a possible source of the leaked document.
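
By way of illustration only, the pixel region comparison summarized above may be sketched as follows. The sketch assumes that alignment has already been performed and reduces to a simple row and column offset, and it uses a sum of squared differences as the similarity metric; both are assumptions made for brevity. The names `crop`, `sum_squared_diff`, and `identify_copy`, as well as the toy images, are hypothetical and do not appear in the disclosure.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]          # grayscale image: rows of 0-255 intensities
Box = Tuple[int, int, int, int]  # (top, left, height, width) in unique-copy coordinates


def crop(image: Image, box: Box, offset: Tuple[int, int] = (0, 0)) -> Image:
    """Cut out a pixel region; `offset` shifts the box into artifact coordinates."""
    top, left, height, width = box
    dy, dx = offset
    return [row[left + dx:left + dx + width]
            for row in image[top + dy:top + dy + height]]


def sum_squared_diff(a: Image, b: Image) -> int:
    """Sum of squared pixel differences between equally sized regions (lower is closer)."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def identify_copy(artifact: Image, copies: Dict[str, Image],
                  boxes: Dict[str, List[Box]], offset: Tuple[int, int]) -> str:
    """Return the id of the copy whose perturbed regions best match the artifact."""
    scores = {
        copy_id: sum(sum_squared_diff(crop(image, box), crop(artifact, box, offset))
                     for box in boxes[copy_id])
        for copy_id, image in copies.items()
    }
    return min(scores, key=scores.get)


# Toy data (hypothetical): a 6x6 original and two copies, each with one perturbed pixel.
original = [[((r * 6 + c) % 7) * 30 for c in range(6)] for r in range(6)]
copy_a = [row[:] for row in original]
copy_a[2][2] = 255
copy_b = [row[:] for row in original]
copy_b[2][4] = 255

# The artifact is copy_b with its first row and column cut off, brightened by 3.
artifact = [[min(255, p + 3) for p in row[1:]] for row in copy_b[1:]]

copies = {"copy_a": copy_a, "copy_b": copy_b}
boxes = {"copy_a": [(1, 1, 3, 3)], "copy_b": [(1, 3, 3, 3)]}  # regions holding perturbations
source = identify_copy(artifact, copies, boxes, offset=(-1, -1))
print(source)  # prints copy_b
```

In this toy case the artifact region that corresponds to the perturbed region of `copy_a` shows unperturbed content, so `copy_a` scores a large difference, while `copy_b` differs from the artifact only by the uniform brightening.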

Inventors

  • Sanjay Krishnan
  • David Wong
  • Chad Voss
  • Troy Batterberry
  • Clayton Huthwaite
  • Colin Saunders
  • Stephen Bianamara
  • Parker Beck

Assignees

  • EchoMark, Inc.

Dates

Publication Date: 2026-05-05
Application Date: 2023-03-07

Claims (20)

  1. A system comprising: at least one processor; and one or more computer storage media storing computer-readable instructions thereon that when executed by the at least one processor cause the at least one processor to perform operations comprising: receiving an artifact derived from a unique copy of an original document; identifying artifact keypoints within the artifact, the artifact keypoints being identified for features that are reproduced between the artifact and the unique copy; matching the artifact keypoints to unique copy keypoints within the unique copy; comparing a first pixel region of the unique copy with a second pixel region of the artifact, the second pixel region of the artifact being located within an area of the artifact that corresponds to the first pixel region in the unique copy, the second pixel region of the artifact being determined from matching the artifact keypoints to the unique copy keypoints; and outputting an indication that the artifact was derived from the unique copy based on comparing the pixels of the first pixel region with the pixels of the second pixel region.
  2. The system of claim 1, wherein identifying artifact keypoints comprises: identifying corner locations formed by content within the artifact; and associating the identified corner locations of the artifact as the artifact keypoints.
  3. The system of claim 1, wherein matching the artifact keypoints to the unique copy keypoints comprises: identifying pixel neighborhoods comprising pixels surrounding the artifact keypoints and the unique copy keypoints; and vectorizing a feature of the pixels in the pixel neighborhoods, wherein the artifact keypoints are matched to the unique copy keypoints based on a vector distance between feature vectors of the pixels within the pixel neighborhoods of the artifact keypoints and feature vectors of the pixels within the pixel neighborhoods of the unique copy keypoints.
  4. The system of claim 3, wherein the vectorized feature comprises a pixel intensity gradient.
  5. The system of claim 1, further comprising performing a rigid transformation to limit an orientation of the artifact relative to the unique copy, wherein the matching is based on the rigid transformation.
  6. The system of claim 1, wherein the unique copy is included as part of a set of unique copies derived from the original document, and the operations further comprise: selecting a subset of unique copies from the set of unique copies based on matching the artifact keypoints to the unique copy keypoints; comparing pixel regions of each unique copy in the subset with corresponding pixel regions of the artifact; and selecting the unique copy from the unique copies within the subset based on comparing the first pixel region with the second pixel region, wherein the indication is output in response to selecting the unique copy.
  7. The system of claim 1, wherein the first pixel region of the unique copy is identified by a bounding box of an area in the unique copy comprising a perturbation made from the original document.
  8. A method performed by one or more processors, the method comprising: generating a set of unique copies of an original document; identifying unique copy keypoints in the unique copies of the set of unique copies; identifying pixel regions of pixels within the unique copies, the pixel regions comprising perturbations made from the original document; determining an artifact is derived from a unique copy of the set of unique copies based on artifact keypoints matching the unique copy keypoints of the unique copy and a comparison of a first pixel region of the unique copy to a second pixel region of the artifact, the second pixel region of the artifact corresponding in location to the first pixel region in the unique copy, wherein the artifact keypoints correspond to features that are reproduced between the artifact and the unique copy; and outputting an indication that the artifact is derived from the unique copy.
  9. The method of claim 8, wherein identifying unique copy keypoints comprises: identifying corner locations formed by content within the unique copies; and associating the identified corner locations of the unique copies as the unique copy keypoints.
  10. The method of claim 8, further comprising: identifying pixel neighborhoods comprising pixels surrounding the unique copy keypoints and the artifact keypoints; vectorizing a feature of the pixels in the pixel neighborhoods; and matching the artifact keypoints to the unique copy keypoints based on a vector distance between feature vectors of the pixels within the pixel neighborhoods of the artifact keypoints and feature vectors of the pixels within the pixel neighborhoods of the unique copy keypoints.
  11. The method of claim 10, wherein the vectorized feature comprises a pixel intensity gradient.
  12. The method of claim 10, further comprising performing a rigid transformation to limit an orientation of the artifact relative to the unique copy, wherein the matching is based on the rigid transformation.
  13. The method of claim 10, further comprising: selecting a subset of unique copies from the set of unique copies based on matching the artifact keypoints to the unique copy keypoints; comparing pixel regions of each unique copy in the subset with pixel regions of the artifact; and selecting the unique copy from the unique copies within the subset based on comparing the first pixel region with the second pixel region, wherein the indication is output in response to selecting the unique copy.
  14. The method of claim 8, wherein the first pixel region of the unique copy is identified by a bounding box of an area in the unique copy comprising a perturbation made from the original document.
  15. One or more computer storage media storing computer readable instructions thereon that, when executed by a processor, cause the processor to perform a method comprising: matching artifact keypoints to unique copy keypoints, the artifact keypoints identified in an artifact derived from a unique copy of an original document, the artifact keypoints corresponding to features that are reproduced between the artifact and the unique copy, the unique copy keypoints identified in the unique copy; identifying a first pixel region of pixels in the unique copy, the first pixel region comprising a perturbation made from the original document; identifying a second pixel region of pixels in the artifact, the second pixel region identified based on the pixels of the second pixel region having a location within the artifact that corresponds to the first pixel region of the unique copy based on matching the artifact keypoints to the unique copy keypoints; and outputting an indication that the artifact was derived from the unique copy based on the first pixel region compared to the second pixel region.
  16. The media of claim 15, further comprising identifying the artifact keypoints and the unique copy keypoints, the artifact keypoints and the unique copy keypoints identified by: identifying corner locations formed by content within the unique copy and the artifact; associating the identified corner locations of the artifact as the artifact keypoints; and associating the identified corner locations of the unique copy as the unique copy keypoints.
  17. The media of claim 15, wherein matching the artifact keypoints to the unique copy keypoints comprises: identifying pixel neighborhoods comprising pixels surrounding the artifact keypoints and the unique copy keypoints; and vectorizing a feature of the pixels in the pixel neighborhoods, wherein the artifact keypoints are matched to the unique copy keypoints based on a vector distance between feature vectors of the pixels within the pixel neighborhoods of the artifact keypoints and feature vectors of the pixels within the pixel neighborhoods of the unique copy keypoints.
  18. The media of claim 17, wherein the vectorized feature comprises a pixel intensity gradient.
  19. The media of claim 15, further comprising performing a rigid transformation to limit an orientation of the artifact relative to the unique copy, wherein the matching is based on the rigid transformation.
  20. The media of claim 15, wherein the first pixel region of the unique copy is identified by a bounding box of an area in the unique copy comprising the perturbation made from the original document.
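
By way of illustration only, the keypoint matching recited in claims 3 and 4 (vectorizing a pixel intensity gradient over a pixel neighborhood and matching by vector distance) may be sketched as follows. The sketch assumes central-difference gradients, a square neighborhood, Euclidean distance, and keypoints that are already known and lie away from the image border; none of these choices is stated in the claims. The names `gradient_descriptor` and `match_keypoints`, the `max_distance` parameter, and the toy images are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of intensities
Point = Tuple[int, int]    # (row, col)


def gradient_descriptor(image: Image, keypoint: Point, radius: int = 1) -> List[float]:
    """Vectorize central-difference intensity gradients around a keypoint.

    The keypoint must lie at least radius + 1 pixels from every image border.
    """
    r0, c0 = keypoint
    vec: List[float] = []
    for r in range(r0 - radius, r0 + radius + 1):
        for c in range(c0 - radius, c0 + radius + 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0  # horizontal gradient
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # vertical gradient
            vec.extend((gx, gy))
    return vec


def match_keypoints(artifact: Image, artifact_kps: List[Point],
                    copy: Image, copy_kps: List[Point],
                    max_distance: float = float("inf")) -> List[Tuple[Point, Point]]:
    """Pair each artifact keypoint with the copy keypoint nearest in descriptor space."""
    copy_desc = [gradient_descriptor(copy, kp) for kp in copy_kps]
    matches = []
    for a_kp in artifact_kps:
        a_desc = gradient_descriptor(artifact, a_kp)
        dists = [math.dist(a_desc, d) for d in copy_desc]
        best = min(range(len(dists)), key=dists.__getitem__)
        if dists[best] <= max_distance:  # drop weak matches when a threshold is given
            matches.append((a_kp, copy_kps[best]))
    return matches


# Toy unique copy (hypothetical): two blocks of content on a blank 9x9 page.
copy_img = [[0.0] * 9 for _ in range(9)]
for r in (2, 3):
    for c in (2, 3):
        copy_img[r][c] = 200.0
for r in (5, 6):
    for c in (5, 6):
        copy_img[r][c] = 100.0
copy_kps = [(2, 2), (6, 6)]  # one corner of each block

# Toy artifact: the copy shifted by one row and column and uniformly brightened by 5.
artifact_img = [[5.0] * 10 for _ in range(10)]
for r in range(9):
    for c in range(9):
        artifact_img[r + 1][c + 1] = copy_img[r][c] + 5.0
artifact_kps = [(3, 3), (7, 7)]

matches = match_keypoints(artifact_img, artifact_kps, copy_img, copy_kps)
print(matches)  # prints [((3, 3), (2, 2)), ((7, 7), (6, 6))]
```

Because the descriptor is built from intensity differences, the uniform brightening of the toy artifact cancels out, and the matched pairs recover the one-pixel shift between the artifact and the copy.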

Description

BACKGROUND

Organizations often create and distribute information. In some cases, this information may be harmful if leaked. Methods that identify the source of document leaks improve the stewardship of information.

SUMMARY

At a high level, the technology relates to generating and detecting unique copies of original documents. More specifically, the technology relates to generating unique copies of original documents and identifying which unique variation of the original was leaked.

Initially, a set of unique copies of a document is generated from an original document. Each unique copy includes a watermarking of one or more perturbations to content in the original document. The unique copies are then individually distributed. If one of the unique copies is leaked, then an artifact can be used to identify the unique copy from which the artifact was derived. The artifact may be any reproduction of the unique copy, in whole or in part, such as a photo or print of the unique copy.

To determine the unique copy, the artifact is visually aligned with the unique copies to determine the location from which the artifact was derived. Artifact keypoints within the artifact are matched to unique copy keypoints in the unique copies to identify the alignment. Pixel regions of the unique copies that are known to have perturbations are compared to corresponding pixel regions in the artifact. The corresponding pixels and pixel region of the artifact used in the comparison are determined from the alignment. Based on the comparison in the pixel regions, the unique copy from which the artifact was derived is identified, which indicates a likely source of the leak.

This summary is intended to introduce a selection of concepts in a simplified form that is further described in the Detailed Description section of this disclosure. The Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be an aid in determining the scope of the claimed subject matter. Additional objects, advantages, and novel features of the technology will be set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the disclosure or learned through practice of the technology.

BRIEF DESCRIPTION OF THE DRAWINGS

The present technology is described in detail below with reference to the attached drawing figures, wherein:

FIG. 1 illustrates an example operating environment in which aspects of the technology can be employed, in accordance with an aspect described herein;
FIG. 2 illustrates an example set of unique copies generated from an original document and artifacts derived therefrom, in accordance with an aspect described herein;
FIG. 3 illustrates an example process of generating unique copies and identifying a unique copy from which an artifact is derived, in accordance with an aspect described herein;
FIG. 4 illustrates an example unique copy, in accordance with an aspect described herein;
FIG. 5 illustrates the unique copy of FIG. 4 having an example set of bounding boxes identifying pixel regions therein, in accordance with an aspect described herein;
FIG. 6 illustrates the unique copy of FIG. 4 having example unique copy keypoints, in accordance with an aspect described herein;
FIG. 7 illustrates an example artifact derived from the unique copy of FIG. 4, in accordance with an aspect described herein;
FIG. 8 illustrates the example artifact of FIG. 7 having example artifact keypoints, in accordance with an aspect described herein;
FIG. 9 is an illustrative matching of the unique copy keypoints of FIG. 6 with the artifact keypoints of FIG. 8, in accordance with an aspect described herein;
FIG. 10 illustrates an overlay of the unique copy of FIG. 6 with the artifact of FIG. 8 based on the matching shown in FIG. 9, in accordance with an aspect described herein;
FIG. 11 illustrates the unique copy of FIG. 4 with the bounding boxes of FIG. 5 overlaid with the artifact of FIG. 7, in accordance with an aspect described herein;
FIG. 12 illustrates an example process of generating a subset of unique copies based on keypoint matching and identifying a unique copy from a subset, in accordance with an aspect described herein;
FIGS. 13-15 are example methods for identifying a unique copy from which an artifact was derived, in accordance with aspects described herein;
FIG. 16 illustrates an example computing device in which aspects of the technology may be employed, in accordance with an aspect described herein; and
FIG. 17 illustrates a table comprising example template matching operations suitable for use as example sub-image similarity metrics for comparing pixel regions, in accordance with an aspect described herein.

DETAILED DESCRIPTION

Existing systems for managing private documents lack robust detection methods for identifying sources of document leaks. In particular, many of these systems fail when a recovered artifact is only a frag