US-12619689-B1 - Data watermarking

Abstract

Systems and methods for tracking and monitoring the distribution and consumption of data, regardless of format, are robust and durable, and can include both human-readable and machine-readable hidden mark data. These extrinsic yet invisible identifiers are tamper resistant and tamper evident, allowing the origin and integrity of valuable data to be ensured. A content management and distributed content distribution network including distributed ledger technology is enabled by smart watermarking using wave synthesis techniques that can survive massive marking data loss or attack.

Inventors

  • Johnathan William Brownlee

Assignees

  • KMG Labs

Dates

Publication Date
2026-05-05
Application Date
2024-09-18

Claims (20)

  1. A method of identifying data, the method comprising: a. generating a marking signal, the marking signal generated by combining marking parameters and mark data with at least one wave synthesis function; b. embedding a representation of the marking signal within host data to yield marked data; c. recovering at a later time a retrieved marking signal from the marked data; d. extracting, from the retrieved marking signal, retrieved mark data by applying an inverse of the at least one wave synthesis function, the at least one wave synthesis function provided the marking parameters; and e. comparing the retrieved mark data with the mark data used to generate the marking signal.
  2. The method of claim 1 wherein the marking signal is a computer-generated hologram, and the marking parameters are holographic recording parameters.
  3. The method of claim 1 wherein the at least one wave synthesis function is a discrete Fourier transform, and the inverse of the at least one wave synthesis function is an inverse discrete Fourier transform.
  4. The method of claim 1 wherein embedding the representation of the marking signal includes at least one of a quantization index modulation of a component of the marking signal, least significant bit encoding, or an output of a neural network.
  5. The method of claim 1 wherein recovering the retrieved marking signal includes reconstructing a computer-generated hologram using holographic parameters as the marking parameters.
  6. The method of claim 1 wherein the mark data comprises: an image, an encryption key, a machine code, a bitstream, a table of key-value pairs, an executable program, an N-dimensional matrix, a uniform resource identifier, a uniform resource locator, a QR code, a barcode, second marking parameters, a second marked data, or a second marking signal.
  7. The method of claim 1 where comparing the retrieved mark data with the mark data includes at least one of: computing a correlation of the retrieved mark data and the mark data, comparing a hash of the retrieved mark data and a hash of the mark data, evaluating a result of executing a machine code or executable program within the retrieved mark data, a bitwise comparison of the retrieved mark data and the mark data, or comparing a spectrum of the retrieved mark data with a spectrum of the mark data.
  8. The method of claim 1 where generating a marking signal comprises generating a quantized kinoform of the mark data.
  9. The method of claim 1 where comparing the retrieved mark data with the mark data includes computing a measure of a change between the mark data and the retrieved mark data.
  10. A system for identifying data, the system comprising: a. a marking signal generator that combines marking parameters and mark data using at least one wave synthesis algorithm to generate a marking signal; b. an encoder that processes the marking signal into an embeddable marking signal that can be embedded in a type of host data; c. an embedder that adds the embeddable marking signal to the host data, outputting marked host data; d. an extractor that extracts the embeddable marking signal from the marked host data, outputting the embeddable marking signal; e. a decoder that reconstructs retrieved mark data from the embeddable marking signal using the marking parameters; and f. a comparer that processes the retrieved mark data and determines a difference between the mark data used to generate the marking signal and the retrieved mark data.
  11. The system of claim 10 wherein the at least one wave synthesis algorithm the marking signal generator uses is one of a Fourier transform, a Laplace transform, a wavelet transform, an iterative Fresnel transform algorithm, a Fienup algorithm, or a Gerchberg-Saxton algorithm.
  12. The system of claim 10 wherein the decoder also applies an error correction to the retrieved mark data.
  13. The system of claim 10 wherein the comparer also compares a result of executing the retrieved mark data as machine code or an executable program.
  14. The system of claim 10 wherein the mark data comprises: an image, an encryption key, a machine code, a bitstream, a table of key-value pairs, an executable program, an N-dimensional matrix, a uniform resource identifier, a uniform resource locator, a QR code, a barcode, second marking parameters, a second marked data, or a second marking signal.
  15. A system for distributing host data comprising: a. a catalog, hosted on one or more catalog nodes, for tracking users requesting host data, the host data identified by a data identifying method including mark data, where the catalog records usage rates of the host data requested by the users; b. a data identification system that tracks in the catalog the host data the users are presently requesting by a comparison of mark data in the catalog with retrieved mark data determining a number of users requesting the host data; and c. distribution nodes that provide to the users the host data, where the number of distribution nodes providing the host data in the catalog is adjusted in response to the number of users requesting the host data determined by the comparison of mark data in the catalog with the retrieved mark data from the users.
  16. The system of claim 15 where the data identification system uses mark data encoded into a marking signal that is embedded in the host data.
  17. The system of claim 15 where the number of users are viewers of a media stream, and the host data is a part of the media stream.
  18. The system of claim 15 where the comparison of mark data in the catalog and the retrieved mark data reported by the users are used to detect a degree of change in the host data received by the users from that stored in the distribution nodes.
  19. The system of claim 15 wherein the distribution nodes comprise a content distribution network for music, video, text, or interactive entertainment data as the host data.
  20. The system of claim 15 wherein the catalog and the catalog nodes further comprise a distributed ledger hosted by one or more of the catalog nodes.
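
As an illustration of how the steps of claims 1, 3, and 4 might fit together, the following is a minimal sketch in Python and not the patented implementation. It assumes the mark data is a short list of bits, the wave synthesis function is a discrete Fourier transform, the marking parameter is a seed for a pseudo-random phase mask, and the embedding is least significant bit encoding of an 8-bit quantized marking signal. All function names, the seed value, and the quantization depth are hypothetical choices made for this example.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, or its inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def phase_mask(n, seed):
    """Assumed marking parameter: a seeded list of unit-magnitude random phases."""
    rng = random.Random(seed)
    return [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n)]

def generate_marking_signal(mark_bits, seed):
    """Step a: combine mark data and marking parameters through a DFT."""
    mask = phase_mask(len(mark_bits), seed)
    return dft([b * m for b, m in zip(mark_bits, mask)])

def quantize(signal, bits=8):
    """Map real and imaginary parts (each within [-n, n]) to integer levels."""
    n, top = len(signal), (1 << bits) - 1
    levels = []
    for z in signal:
        for part in (z.real, z.imag):
            q = round((part + n) / (2 * n) * top)
            levels.append(min(top, max(0, q)))
    return levels

def dequantize(levels, n, bits=8):
    """Inverse of quantize, up to rounding error."""
    top = (1 << bits) - 1
    vals = [lv / top * 2 * n - n for lv in levels]
    return [complex(vals[i], vals[i + 1]) for i in range(0, len(vals), 2)]

def embed_lsb(host, levels, bits=8):
    """Step b: write the quantized signal into the host's least significant bits."""
    stream = [(lv >> i) & 1 for lv in levels for i in range(bits)]
    if len(stream) > len(host):
        raise ValueError("host data too small for this marking signal")
    marked = bytearray(host)
    for i, bit in enumerate(stream):
        marked[i] = (marked[i] & 0xFE) | bit
    return bytes(marked)

def recover_lsb(marked, count, bits=8):
    """Step c: read `count` quantized levels back out of the marked data."""
    stream = [b & 1 for b in marked[:count * bits]]
    return [sum(stream[j * bits + i] << i for i in range(bits)) for j in range(count)]

def extract_mark(signal, seed):
    """Step d: inverse DFT, undo the phase mask, threshold back to bits."""
    mask = phase_mask(len(signal), seed)
    field = dft(signal, inverse=True)
    return [1 if (f / m).real > 0.5 else 0 for f, m in zip(field, mask)]

mark = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]   # hypothetical mark data
seed = 2024                                               # hypothetical marking parameter
host = bytes(random.Random(7).randrange(256) for _ in range(512))  # stand-in host data

signal = generate_marking_signal(mark, seed)
marked = embed_lsb(host, quantize(signal))
levels = recover_lsb(marked, 2 * len(mark))
retrieved = extract_mark(dequantize(levels, len(mark)), seed)

# Step e: a bitwise comparison of retrieved and original mark data.
matches = sum(a == b for a, b in zip(mark, retrieved))
print(f"{matches}/{len(mark)} mark bits match")
```

Least significant bit encoding was chosen here only because it is the simplest of the embedding options listed in claim 4; it changes each host byte by at most one but is not expected to survive lossy processing of the host data.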

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. provisional patent application Ser. 63/539,091 filed Sep. 18, 2023. The entire contents of the identified priority document are hereby incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

Not Applicable

REFERENCE TO SEQUENCE LISTING, A TABLE, OR A COMPUTER PROGRAM LISTING COMPACT DISK APPENDIX

Not Applicable

TECHNICAL FIELD

The present invention relates to digital watermarking of data, in particular the marking and tracking of digital data that resists manipulation or removal.

BACKGROUND OF THE INVENTION

Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.

Digital data is an increasingly valuable and defined object of property, rising above a simple commodity; copyrighted works such as motion pictures, music recordings, printed word, photos and art, timely market data for trading stocks or bonds, algorithms, source code, and news articles from around the world are all valuable data. In many cases, the value of the data is intrinsic and related to where it comes from, such as the authorship of a report, timeliness of a stock price in a particular market, or a genuine licensed copy of a creative work like an album, video performance, book, or song. Who and where valuable data is attributed to strongly determines the potential value of that data, as well as determining that the data was acquired through bona-fide means from its owner or licensee.

Digital watermarking systems are one way that electronically represented data can be embedded with a durable code that is not necessarily perceptible to the user but can be detected through a predetermined detection process. In these systems, an encoder adds a pre-determined watermarking signal to a host media signal represented by the data.
One common example of a digital watermark is a hidden logo or signal added into a host data stream like an image or sound file so the digital watermark can be later recovered as evidence of where that particular host data originated. Although images, music, and video data are commonly watermarked, all structured data can be watermarked with sufficiently flexible schemes; watermarking is simply injecting one signal (the mark) into a second signal (the host data to be marked) in such a way as to make it unobvious when the host data is being consumed in its usual idiom. Plain text, for example, can be watermarked by using word-choice and frequency schemes to provide a host signal that can encode the marking signal.

Watermarking is distinguished from the related topics of steganography and encryption. Steganography seeks to make the hidden signal undetectable by all parties but an intended recipient, whereas encryption seeks to make the valuable host data simply unusable without the use of a key to decrypt and render usable the host data. Simply comparing the mark data retrieved from host data that bears a mark with the original mark measures the authenticity and degree of any adulteration the marked data may have seen in transit.

Finally, with the advent of generative artificial intelligence (AI) and related phenomena such as deep-fakes and generative text, there is a need to mark content that may be used to train diffusion or adversarial content generators. Examples of such generative adversarial network AI (“GAN AI”) services include Stable Diffusion, DALL-E/CLIP, Google's Bard, Midjourney, and ChatGPT and its meta-learning-based platform models such as GPT-3 and GPT-4. A watermarking technique where the majority of the marking signal could be destroyed or replaced, and yet the mark is still discernibly recovered, is of great interest in tracking the input and output to GAN AI models and the content mixed into the digital zeitgeist.
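
The two ideas above, that comparing a retrieved mark with the original measures adulteration and that a mark can remain discernible after a large part of the marking signal is destroyed, can be sketched as follows. This is an illustrative Python sketch under stated assumptions and not the method of the invention: it assumes the marking signal is a DFT of the mark multiplied by a seeded random phase mask, simulates damage by zeroing half of the coefficients, and uses a Pearson correlation as the comparison. The mark, the seed, and the function names are hypothetical.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, or its inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(1)                       # hypothetical seed
mark = [1, 0, 1, 1, 0, 0, 1, 0] * 8          # hypothetical 64-bit mark
n = len(mark)

# Assumed marking parameter: a random phase mask that spreads each mark bit
# across every coefficient of the marking signal.
mask = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n)]
signal = dft([b * m for b, m in zip(mark, mask)])

def reconstruct(sig):
    """Inverse DFT followed by removal of the phase mask."""
    field = dft(sig, inverse=True)
    return [(f / m).real for f, m in zip(field, mask)]

# Simulate an attack that destroys half of the marking signal.
lost = set(rng.sample(range(n), n // 2))
damaged = [0 if k in lost else z for k, z in enumerate(signal)]

corr_intact = pearson(mark, reconstruct(signal))
corr_damaged = pearson(mark, reconstruct(damaged))
print(f"intact: {corr_intact:.3f}  half destroyed: {corr_damaged:.3f}")
```

In this sketch the drop in correlation serves as a rough measure of how much the marking signal was altered; the damaged reconstruction is noisier than the original, so individual bits may not survive a simple threshold even when the correlation remains clearly positive.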
There is therefore a need for systems and methods for marking data so that it can later be attributed to a source or point of release and tracked by efficient and effective watermarking.

SUMMARY OF THE INVENTION

The present invention is a general-purpose watermarking technique using digital holography to generate and embed mark data within the signal(s) of host data via a marking signal generated from the mark data and embedded in the host data itself. The host data is thus altered in order to embed the computer-generated digital hologram of the mark data so that a reading component can analyze the host signal to detect whether a mark is present, and if so to extract and identify that mark. Holography is used because of several properties known in the art of holograms, such as holograms' resistance to digital manipulation, signal loss, rescaling/transcoding, cropping, occlusion, compression and compressed dynamic range; a small part of a hologram represented in a few bits of dynamic range can reconstruct a mark with a high correlation