US-12627645-B2 - Protecting intellectual property using digital signatures


Abstract

Methods of protecting intellectual property using digital signatures include generating reference digital signatures of first content, generating a digital signature of second content, comparing the digital signature of the second content to the reference digital signatures to identify matching reference digital signatures, and selectively performing an action based on the matching reference digital signatures and a policy. The first content may include proprietary information of an organization and/or posts of a machine-learning (ML) model. The second content may include source code intercepted from a transmission directed to an external site, such as a ML model, and/or source code saved to a source code repository. Actions may include, without limitation, initiating an audit of the second content, sending a notification to a user interface indicating that the second content likely contains a portion of the first content, releasing the transmission, and/or terminating the transmission.

Inventors

Damon A. Weinstein
Kathleen E. SIMMONS
Mayur Kadu
Jay RICCO
Jagat Prakashchandra Parekh
Sai Keerthy Kakarla
Matthew Fenwick

Assignees

BLACK DUCK SOFTWARE, INC.

Dates

Publication Date: 2026-05-12
Application Date: 2024-03-11

Claims (18)

1. A method, comprising: generating reference digital signatures of first content; intercepting second content of a network transmission directed to an external network; generating a digital signature of the second content; comparing, by a processing device, the digital signature of the second content to the reference digital signatures of the first content to identify matching digital signatures of the first content; and selectively performing an action based on the matching digital signatures and a policy, wherein selectively performing the action comprises initiating an audit of the second content if a number of the matching digital signatures meets a threshold of the policy.
2. The method of claim 1, wherein: the second content comprises source code; and the external network comprises a generative artificial intelligence model (AIML).
3. The method of claim 1, wherein the initiating an audit comprises: capturing one or more of, the second content, a user identifier associated with the second content, a time-stamp associated with the second content, the digital signature of the second content, the matching digital signatures, and the first content corresponding to the matching digital signatures.
4. The method of claim 1, further comprising one or more of: generating the reference digital signatures and the digital signature of the second content based on a policy that specifies one or more of multiple methods of generating digital signatures; generating the reference digital signatures and the digital signature of the second content based on a policy that specifies one or more levels of granularity at which to generate the digital signatures; generating the reference digital signatures and the digital signature of the second content based on a content leakage risk identification policy; generating the reference digital signatures and the digital signature of the second content based on a policy that specifies content types for which digital signatures are to be generated; updating a database of the reference digital signatures based on a policy that specifies one or more of an update frequency and an update schedule; and comparing the digital signature of the second content to the reference digital signatures of the first content based on a policy that specifies one or more levels of digital signature granularity at which to compare the digital signatures.
5. A method comprising: generating reference digital signatures of first content; intercepting second content of a network transmission directed to an external network; generating a digital signature of the second content; comparing, by a processing device, the digital signature of the second content to the reference digital signatures of the first content to identify matching digital signatures of the first content; and selectively performing an action based on the matching digital signatures and a policy, wherein selectively performing the action comprises sending a notification to a user interface indicating that the second content likely contains a portion of the first content if a number of the matching digital signatures meets a threshold of the policy.
6. A method comprising: generating reference digital signatures of first content; intercepting second content of a network transmission directed to an external network; pausing the network transmission; generating a digital signature of the second content while the network transmission is paused; and comparing, by a processing device, the digital signature of the second content to the reference digital signatures of the first content while the network transmission is paused to identify matching digital signatures of the first content; and selectively performing an action based on the matching digital signatures and a policy.
7. The method of claim 6, wherein the selectively performing an action comprises: releasing the network transmission if a number of the matching digital signatures does not meet a threshold of the policy.
8. The method of claim 6, wherein the selectively performing an action comprises: terminating the network transmission if a number of the matching digital signatures meets a threshold of the policy.
9. The method of claim 6, wherein the selectively performing an action comprises: sending a notification to a user interface if a number of the matching digital signatures meets a threshold of the policy, wherein the notification comprises a notice that the second content likely contains a portion of the first content and one or more of a link to permit a user to release the network transmission and a link to permit the user to terminate the network transmission.
10. The method of claim 6, wherein the selectively performing an action comprises: releasing the network transmission if a number of the matching digital signatures does not meet a first threshold of the policy; initiating an audit of the second content if the number of the matching digital signatures meets the first threshold; and terminating the network transmission if the number of the matching digital signatures meets a second threshold of the policy, wherein the second threshold is higher than the first threshold.
11. The method of claim 10, wherein the selectively performing an action further comprises: sending a notification to a user interface if the number of the matching digital signatures is between the first and second thresholds, wherein the notification comprises a notice that the second content likely contains a portion of the first content, a link to permit a user to release the network transmission, and a link to permit the user to terminate the network transmission.
12. A method comprising: generating multiple reference digital signatures for a unit of first content at multiple respective levels of granularity based on features of the unit of the first content; intercepting second content of a network transmission directed to an external network; generating multiple digital signatures for the second content at the multiple respective levels of granularity based on features of the second content; comparing one or more of the multiple reference digital signatures to one or more of the corresponding multiple digital signatures of the second content based on a policy, wherein the features of the unit of the first content and the features of the second content comprise one or more of snippets of the respective content, metadata of the respective content, and directory structure descriptors of the respective content; and selectively performing an action based on the matching digital signatures and the policy.
13. A non-transitory computer readable medium comprising stored instructions, which when executed by a processor, cause the processor to: generate reference digital signatures of posts of a machine learning (ML) model; generate a digital signature of source code submitted to a source code repository; compare the digital signature of the source code to the reference digital signatures of the posts of the ML model to identify matching digital signatures of the posts of the ML model; and selectively perform an action based on the matching digital signatures and a policy, wherein selectively performing the action comprises initiating an audit of the source code if a number of the matching digital signatures meets a threshold of the policy.
14. The non-transitory computer readable medium of claim 13, wherein the stored instructions, when executed, further cause the processor to initiate the audit by: capturing one or more of, the source code, a user identifier associated with the source code, a time-stamp associated with the source code, the digital signature of the source code, the matching digital signatures, and the ML posts corresponding to the matching digital signatures.
15. The non-transitory computer readable medium of claim 13, wherein the stored instructions, when executed, further cause the processor to selectively perform the action by: initiating an audit of the source code and sending a notification to a user interface indicating that the source code likely contains a portion of the posts of the ML model, if a number of the matching digital signatures meets a threshold of the policy.
16. A non-transitory computer readable medium comprising stored instructions, which when executed by a processor, cause the processor to: generate reference digital signatures of posts of a machine learning (ML) model; generate a digital signature of source code submitted to a source code repository; compare the digital signature of the source code to the reference digital signatures of the posts of the ML model to identify matching digital signatures of the posts of the ML model; and selectively perform an action based on the matching digital signatures and a policy, wherein selectively performing the action comprises sending a notification to a user interface indicating that the source code likely contains a portion of the ML posts if a number of the matching digital signatures meets a threshold of the policy.
17. A system comprising: a memory storing instructions; and a processor, coupled with the memory and to execute the instructions, the instructions when executed cause the processor to: generate reference digital signatures of first content; intercept second content of a network transmission directed to an external network; generate a digital signature of the second content; compare the digital signature of the second content to the reference digital signatures of the first content to identify matching digital signatures of the first content; selectively perform an action based on the matching digital signatures of the first content and a policy; generate reference digital signatures of posts of a machine learning (ML) model; generate a digital signature of source code submitted to a source code repository; compare the digital signature of the source code to the reference digital signatures of the posts of the ML model to identify matching digital signatures of the posts of the ML model; and selectively perform an action based on the matching digital signatures of the posts of the ML model and the policy.
18. The system of claim 17, wherein the action comprises one or more of: releasing the network transmission if a first number of the matching digital signatures of the first content does not meet a first threshold; initiating an audit of the second content if the first number of the matching digital signatures of the first content meets the first threshold; sending a notification to a user interface indicating that the second content likely contains a portion of the first content if the first number of the matching digital signatures of the first content meets the first threshold; terminating the network transmission if the first number of the matching digital signatures of the first content meets a second threshold; initiating an audit of the source code if a second number of the matching digital signatures of the posts of the ML model meets a third threshold; and sending a notification to a user interface indicating that the source code likely contains a portion of the ML posts if the second number of the matching digital signatures of the posts of the ML model meets the third threshold.
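
Editorial illustration (not part of the claims): the tiered thresholds recited in claims 10 and 11 can be sketched roughly as below. This is one possible reading, not the claimed implementation. The names `Policy`, `Action`, and `select_actions` are hypothetical, "meets a threshold" is assumed to mean greater than or equal to, and "between the first and second thresholds" is assumed to mean at or above the first and below the second.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RELEASE = "release"
    AUDIT = "audit"
    NOTIFY = "notify"
    TERMINATE = "terminate"


@dataclass(frozen=True)
class Policy:
    # Hypothetical field names; the claims only require that the second
    # threshold be higher than the first.
    audit_threshold: int
    terminate_threshold: int

    def __post_init__(self):
        if self.terminate_threshold <= self.audit_threshold:
            raise ValueError("terminate_threshold must exceed audit_threshold")


def select_actions(match_count: int, policy: Policy) -> list[Action]:
    """Map a count of matching reference signatures to a list of actions."""
    # Below the first threshold: release the paused transmission.
    if match_count < policy.audit_threshold:
        return [Action.RELEASE]
    # At or above the first threshold: always audit.
    actions = [Action.AUDIT]
    if match_count >= policy.terminate_threshold:
        # At or above the second threshold: terminate.
        actions.append(Action.TERMINATE)
    else:
        # Between the thresholds: notify the user, who may release or terminate.
        actions.append(Action.NOTIFY)
    return actions
```

With a hypothetical `Policy(audit_threshold=3, terminate_threshold=10)`, a count of 2 yields a release, a count of 5 yields an audit plus a notification, and a count of 12 yields an audit plus termination.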

Description

TECHNICAL FIELD

The present disclosure generally relates to a computer security system. In particular, the present disclosure relates to protecting intellectual property through the use of digital signatures.

BACKGROUND

Generative artificial intelligence machine learning (AIML) models, such as large language models (LLMs), are rapidly proliferating. LLMs are language models built upon neural networks containing billions of parameters trained to generate textual content, including computer source code. Publicly-accessible LLMs pose privacy and security risks. For example, an employee of an organization may submit proprietary source code to a publicly-accessible LLM for training purposes and/or to improve the source code. An LLM may, however, provide an indication of and/or disclose information on which the LLM was trained.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure will be understood more fully from the detailed description given below and from the accompanying figures of embodiments of the disclosure. The figures are used to provide knowledge and understanding of embodiments of the disclosure and do not limit the scope of the disclosure to these specific embodiments. Furthermore, the figures are not necessarily drawn to scale.

FIG. 1 is a block diagram of a computing platform for protecting intellectual property through the use of digital signatures, according to an embodiment.
FIG. 2 illustrates a method of protecting intellectual property through the use of digital signatures, according to an embodiment.
FIG. 3 is another block diagram of the computing platform, according to an embodiment.
FIG. 4 illustrates another method of protecting intellectual property through the use of digital signatures, according to an embodiment.
FIG. 5 is another block diagram of the computing platform, according to an embodiment.
FIG. 6 illustrates another method of protecting intellectual property through the use of digital signatures, according to an embodiment.
FIG. 7 is another block diagram of the computing platform, according to an embodiment.
FIG. 8 depicts a diagram of an example computer system in which embodiments of the present disclosure may operate.

DETAILED DESCRIPTION

Aspects of the present disclosure relate to protecting intellectual property through the use of digital signatures.

An organization/entity may generate proprietary information (i.e., intellectual property) related to products, services, and/or customers, which may include designs/inventions (e.g., source code and/or circuit designs), marketing materials, customer lists, employee information, legal documents, and/or other content, and/or which may be embodied as digitized text, images, video, and/or audio. An organization may want to identify proprietary information contained in outgoing correspondence and/or identify internal contributions to proprietary information, such as to identify contributions of a machine-learning (ML) model (e.g., ML-generated source code). An organization may also want to perform automated actions (e.g., initiate audits, issue warnings, and/or block transmissions of content to external networks). The sheer quantity and complexity of proprietary information may make it technically challenging to determine whether such content includes proprietary information, and/or to identify internal contributions (e.g., of a ML model).

In an example, protecting intellectual property through the use of digital signatures includes generating a database of digital signatures of proprietary content, generating a digital signature of a user post directed to an external network, comparing the digital signature of the user post to the database of digital signatures to determine if the user post contains or likely contains proprietary information, and selectively performing one or more actions based on configurable policies if the user post contains or likely contains proprietary information.
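
As a rough editorial illustration of the example above (not the disclosed implementation), the following sketch builds a small signature database from proprietary content and checks a user post against it. The function names, the whitespace tokenization, the window length `k`, and the use of SHA-256 over overlapping token windows (a simple form of shingling) are all assumptions made for illustration.

```python
import hashlib


def shingle_signatures(text: str, k: int = 5) -> set[str]:
    """Return SHA-256 digests of every k-token window of `text`."""
    tokens = text.split()  # assumption: naive whitespace tokenization
    if not tokens:
        return set()
    if len(tokens) < k:
        windows = [" ".join(tokens)]
    else:
        windows = [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
    return {hashlib.sha256(w.encode("utf-8")).hexdigest() for w in windows}


def build_reference_db(documents: dict[str, str], k: int = 5) -> dict[str, set[str]]:
    """Map each signature to the names of the proprietary documents containing it."""
    db: dict[str, set[str]] = {}
    for name, text in documents.items():
        for sig in shingle_signatures(text, k):
            db.setdefault(sig, set()).add(name)
    return db


def matching_signatures(post: str, db: dict[str, set[str]], k: int = 5) -> set[str]:
    """Return the signatures of `post` that also appear in the reference database."""
    return {sig for sig in shingle_signatures(post, k) if sig in db}


# Hypothetical usage: a post that embeds a proprietary snippet.
reference = "def charge ( customer , amount ) : return ledger . debit ( customer , amount )"
db = build_reference_db({"billing.py": reference})
hits = matching_signatures("please improve this: " + reference, db)
print(len(hits), "matching signatures")
```

The number of matching signatures returned here is the quantity a configurable policy would compare against its thresholds before choosing an action.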
The digital signatures may be generated using, for example and without limitation, piecewise hashing, rolling hashes, Merkle trees, Simhash, Minhash, content defined chunking (CDC), differential hashing, shingling, and/or combinations thereof. The one or more actions may be performed by a processing device and include, without limitation, initiating an audit of the user post, terminating transmission of the user post, alerting a user that the user post contains or likely contains proprietary content, and/or permitting the user to terminate transmission of the user post. In another example, protecting intellectual property through the use of digital signatures includes generating a database of digital signatures of posts/outputs of a ML model (e.g., an internal/proprietary ML model and/or an external/public ML model), generating a digital signature of source code submitted to a source code repository and comparing it to the database of digital signatures to quickly and efficiently identify source code that contains or likely contains posts of the ML model, and