EP-4742117-A1 - MODIFICATION OF RESPONSIVE CONTENT THAT IS GENERATED USING GENERATIVE MODEL(S) AND THAT INCLUDES OPT-OUT CONTENT
Abstract
Implementations described herein relate to determining whether to modify segment(s) of responsive content that is generated using a generative model (GM), based on whether the segment(s) include opt-out content. The opt-out content can be associated with a given user or a given entity that has opted-out of: the GM being trained on data, that is associated with the given user or the given entity, since a last training cycle for the GM, or the GM being able to use the data, that is associated with the given user or the given entity, in generating the responsive content. If processor(s) of a system determine that a corresponding segment of the responsive content matches a corresponding segment of the opt-out content, then the processor(s) can modify the corresponding segment of the responsive content to generate modified responsive content, and cause the modified responsive content to be rendered at the client device.
Inventors
- ZHU, Zhenkai
- LI, Yunjie
- NYBERG, Linda Marie
Assignees
- GOOGLE LLC
Dates
- Publication Date: 2026-05-13
- Application Date: 2025-11-04
Claims (15)
- A system comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the at least one processor to be operable to: receive user input that is associated with a client device of a user; generate, based on processing at least the user input and using a generative model (GM), responsive content that is responsive to the user input; determine whether a corresponding segment of the responsive content matches a corresponding segment of opt-out content, the opt-out content being associated with a given user or a given entity that has opted-out of: the GM being trained on data, that is associated with the given user or the given entity, since a last training cycle for the GM, or the GM being able to use the data, that is associated with the given user or the given entity, in generating the responsive content; and in response to determining that a corresponding segment of the responsive content matches a corresponding segment of the opt-out content: modify the corresponding segment of the responsive content to generate modified responsive content; and cause the modified responsive content, in lieu of the responsive content, to be rendered at the client device of the user.
- The system of claim 1, wherein the at least one processor is further operable to: receive, from the given user or the given entity, an indication that the given user or the given entity desires to opt-out of the GM being trained on the data or the GM being able to use the data in generating the responsive content; and in response to receiving the indication that the given user or the given entity desires to opt-out of the GM being trained on the data or the GM being able to use the data in generating the responsive content: determine, for the given user or the given entity, the opt-out content; and store, in one or more databases, the opt-out content.
- The system of claim 2, wherein the instructions to determine the opt-out content for the given user or the given entity comprise instructions to: identify content that is associated with the given user or the given entity; normalize, using one or more normalization operations, the content that is associated with the given user or the given entity to generate normalized content; segment the normalized content into a plurality of normalized content segments; and store, in one or more of the databases, the plurality of normalized content segments as the opt-out content.
- The system of claim 3, wherein the at least one processor is further operable to: normalize, using the one or more normalization operations, the responsive content to generate normalized responsive content; and segment the normalized responsive content into a plurality of normalized responsive content segments.
- The system of claim 4, wherein the instructions to determine whether a corresponding segment of the responsive content matches a corresponding segment of the opt-out content comprise instructions to: determine, based on a comparison of the plurality of normalized content segments to the plurality of normalized responsive content segments, whether a corresponding segment of the responsive content matches a corresponding segment of the opt-out content.
- The system of claim 5, wherein the instructions to determine whether a corresponding segment of the responsive content matches a corresponding segment of the opt-out content based on the comparison of the plurality of normalized content segments to the plurality of normalized responsive content segments comprise instructions to: determine a corresponding quantity of alphanumeric characters in a normalized instance of the corresponding segment of the responsive content that need to be inserted, deleted, and/or substituted to result in a normalized instance of the corresponding segment of the content; determine a corresponding edit distance between the corresponding segment of the responsive content and the corresponding segment of the content as a function of (a) the corresponding quantity of alphanumeric characters in the normalized instance of the corresponding segment of the responsive content that need to be inserted, deleted, and/or substituted to result in the normalized instance of the corresponding segment of the content; and (b) a corresponding total quantity of alphanumeric characters in the normalized instance of the corresponding segment of the content; and in response to determining that the corresponding edit distance between the corresponding segment of the responsive content and the corresponding segment of the content satisfies an edit distance threshold: determine that a corresponding segment of the responsive content matches a corresponding segment of the opt-out content.
- The system of claim 4, wherein the plurality of normalized content segments are stored in one or more of the databases as the opt-out content prior to the user input being received, and wherein the responsive content is normalized and segmented subsequent to the responsive content being generated.
- The system of claim 2, wherein the instructions to determine the opt-out content for the given user or the given entity comprise instructions to: identify content that is associated with the given user or the given entity; process, using a hash function, the content that is associated with the given user or the given entity to generate a plurality of corresponding hashes for the content; and store, in one or more of the databases, the plurality of corresponding hashes for the content as the opt-out content.
- The system of claim 8, wherein the at least one processor is further operable to: prior to processing the content that is associated with the given user or the given entity to generate the plurality of corresponding hashes for the content and using the hash function: normalize, using one or more normalization operations, the content that is associated with the given user or the given entity to generate normalized content; segment the normalized content into a plurality of normalized content segments; and wherein the instructions to process the content that is associated with the given user or the given entity to generate the plurality of corresponding hashes for the content and using the hash function comprise instructions to: process, using the hash function, the plurality of normalized content segments to generate the plurality of corresponding hashes for the content as the opt-out content.
- The system of claim 8 or 9, wherein the at least one processor is further operable to: process, using the hash function, the responsive content to generate a plurality of corresponding hashes for the responsive content.
- The system of claim 10, wherein the at least one processor is further operable to: prior to processing the responsive content to generate the plurality of corresponding hashes for the responsive content and using the hash function: normalize, using the one or more normalization operations, the responsive content to generate normalized responsive content; segment the normalized responsive content into a plurality of normalized responsive content segments; and wherein the instructions to process the responsive content to generate the plurality of corresponding hashes for the responsive content and using the hash function comprise instructions to: process, using the hash function, the plurality of normalized responsive content segments to generate the plurality of corresponding hashes for the responsive content, and/or wherein the instructions to determine whether a corresponding segment of the responsive content matches a corresponding segment of the opt-out content comprise instructions to: determine, based on a comparison of the plurality of corresponding hashes for the content to the plurality of corresponding segments for the responsive content, whether a corresponding segment of the responsive content matches a corresponding segment of the opt-out content, and/or wherein the plurality of corresponding hashes for the content are stored in one or more of the databases as the opt-out content prior to the user input being received, and wherein the plurality of corresponding hashes for the responsive content are generated subsequent to the responsive content being generated.
- The system of any preceding claim, wherein the at least one processor is further operable to: determine, based on the user input, one or more search queries; obtain, based on the one or more search queries, a plurality of search result documents; and wherein one or more of the plurality of search result documents are processed along with the user input and using the GM to generate the responsive content that is responsive to the user input, wherein optionally the at least one processor is further operable to: determine whether one or more of the search result documents are associated with the given user or the given entity that has opted-out of the GM being trained on the data or the GM being able to use the data in generating the responsive content; and in response to determining that one or more of the search result documents are associated with the given user or the given entity that has opted-out of the GM being trained on the data or the GM being able to use the data in generating the responsive content: classify the one or more of the search result documents that are associated with the given user or the given entity as the opt-out content, wherein optionally the at least one processor is further operable to: normalize, using one or more normalization operations, the one or more of the search result documents that are associated with the given user or the given entity to generate normalized content; segment the normalized content into a plurality of normalized content segments; normalize, using the one or more normalization operations, the responsive content to generate normalized responsive content; segment the normalized responsive content into a plurality of normalized responsive content segments; and wherein the instructions to determine whether a corresponding segment of the responsive content matches a corresponding segment of the opt-out content comprise instructions to: determine, based on a comparison of the plurality of normalized content segments to the plurality of normalized responsive content segments, whether a corresponding segment of the responsive content matches a corresponding segment of the opt-out content.
- The system of any preceding claim, wherein the given user or the given entity, prior to the last training cycle for the GM, was opted-in to allow the GM to be trained on the data and/or was opted-in to allow the GM to be able to use the data in generating the responsive content, and/or wherein the at least one processor is further operable to: prior to a next training cycle of the GM: cause the data that is associated with the given user or the given entity to be removed from a GM training dataset that will be utilized to train the GM.
- A method implemented by one or more processors, the method comprising: receiving user input that is associated with a client device of a user; generating, based on processing at least the user input and using a generative model (GM), responsive content that is responsive to the user input; determining whether a corresponding segment of the responsive content matches a corresponding segment of opt-out content, the opt-out content being associated with a given user or a given entity that has opted-out of: the GM being trained on data, that is associated with the given user or the given entity, since a last training cycle for the GM, or the GM being able to use the data, that is associated with the given user or the given entity, in generating the responsive content; and in response to determining that a corresponding segment of the responsive content matches a corresponding segment of the opt-out content: modifying the corresponding segment of the responsive content to generate modified responsive content; and causing the modified responsive content, in lieu of the responsive content, to be rendered at the client device of the user.
- A non-transitory computer-readable storage medium storing computer-readable instructions that, when executed by at least one processor, cause the at least one processor to perform operations, the operations comprising: receiving user input that is associated with a client device of a user; generating, based on processing at least the user input and using a generative model (GM), responsive content that is responsive to the user input; determining whether a corresponding segment of the responsive content matches a corresponding segment of opt-out content, the opt-out content being associated with a given user or a given entity that has opted-out of: the GM being trained on data, that is associated with the given user or the given entity, since a last training cycle for the GM, or the GM being able to use the data, that is associated with the given user or the given entity, in generating the responsive content; and in response to determining that a corresponding segment of the responsive content matches a corresponding segment of the opt-out content: modifying the corresponding segment of the responsive content to generate modified responsive content; and causing the modified responsive content, in lieu of the responsive content, to be rendered at the client device of the user.
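The normalization, segmentation, and edit-distance comparison recited in the claims above can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the claims: the normalization operations (lowercasing, stripping punctuation), the fixed four-word segments, the use of a plain ratio as the "function of" the edit count and the total character count, the 0.2 threshold, and all function names.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (assumed operations)."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def segment(normalized: str, words_per_segment: int = 4) -> list[str]:
    """Split normalized text into consecutive fixed-size word windows (assumed scheme)."""
    words = normalized.split()
    return [
        " ".join(words[i:i + words_per_segment])
        for i in range(0, len(words), words_per_segment)
    ]


def levenshtein(a: str, b: str) -> int:
    """Count the insertions, deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def relative_edit_distance(resp_seg: str, opt_seg: str) -> float:
    """Edit count divided by the opt-out segment's alphanumeric length (assumed ratio)."""
    resp_chars = resp_seg.replace(" ", "")
    opt_chars = opt_seg.replace(" ", "")
    if not opt_chars:
        return float("inf")
    return levenshtein(resp_chars, opt_chars) / len(opt_chars)


def matching_segments(responsive: str, opt_out: str, threshold: float = 0.2) -> list[int]:
    """Return indices of responsive segments within the threshold of any opt-out segment."""
    opt_segs = segment(normalize(opt_out))
    resp_segs = segment(normalize(responsive))
    return [
        i for i, r in enumerate(resp_segs)
        if any(relative_edit_distance(r, o) <= threshold for o in opt_segs)
    ]
```

In this sketch a lower ratio means a closer match, so a segment "satisfies" the threshold when its ratio is at or below it. Fixed word windows only catch overlaps that happen to align on the same boundaries; a production system would likely need overlapping windows or another alignment strategy, which the claims leave open.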
Description
BACKGROUND
Various generative models (GMs) have been proposed that can be used to process image content, video content, audio content, natural language (NL) content (e.g., typed content or spoken content), and/or other input(s), to generate responsive content that is responsive to these input(s). These GMs are typically trained on enormous amounts of diverse data including data from, but not limited to, webpages, images, videos, electronic books, software code, electronic news articles, and machine translation data. Accordingly, in performing various tasks, these GMs leverage the underlying data on which they were trained, and optionally other data, such as user provided documents, search result documents obtained as part of a retrieval augmented generation (RAG) process, and so on, in generating the responsive content.
In many cases, users and/or entities have a right to control their data and how it is utilized. For instance, some jurisdictions across the world have a so-called "right to be forgotten" or a "right to opt-out" that, among other things, gives these users and/or entities the power to request that their data no longer be utilized in training these GMs and/or utilized by these GMs in generating the responsive content.
However, given the enormous amounts of diverse data on which these GMs are typically trained, a duration of time required for a given training cycle to train these GMs can range from weeks to months to years. As a result, and assuming a given entity invokes the right to be forgotten or the right to opt-out, an enormous amount of computational resources would be wasted if a new training cycle, that omits the data associated with the given user or the given entity, was initiated each time that the given user or the given entity invokes the right to be forgotten or the right to opt-out.
Accordingly, there is a need in the art for techniques that address the right to be forgotten or the right to opt-out without requiring initiation of a new training cycle of these GMs each time a given user or a given entity invokes the right to be forgotten or the right to opt-out.
SUMMARY
Some implementations described herein relate to determining whether to modify segment(s) of responsive content, that is generated using a generative model (GM), and based on whether the segment(s) include opt-out content. The opt-out content can be associated with a given user or a given entity that has opted-out of: the GM being trained on data, that is associated with the given user or the given entity, since a last training cycle for the GM, or the GM being able to use the data, that is associated with the given user or the given entity, in generating the responsive content.
Accordingly, processor(s) of a system can: receive user input that is associated with a client device of a user; generate, based on processing at least the user input and using the GM, the responsive content that is responsive to the user input; and determine whether a corresponding segment of the responsive content matches a corresponding segment of the opt-out content. In response to determining that a corresponding segment of the responsive content matches a corresponding segment of the opt-out content, the processor(s) can: modify the corresponding segment of the responsive content to generate modified responsive content; and cause the modified responsive content, in lieu of the responsive content, to be rendered at the client device of the user.
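The receive, generate, match, modify, and render flow summarized above can be sketched end to end in Python. This is a hedged illustration only: it uses the hash-based variant of matching (hashing normalized segments), and the sentence-level segmentation, the SHA-256 hash, the redaction placeholder used as the "modification", the stand-in `generate` callable for the GM, and every function name are assumptions that do not come from the source.

```python
import hashlib
import re
from typing import Callable

# Hypothetical placeholder; the source says only that a matching segment is "modified".
REDACTION = "[content unavailable]"


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (assumed operations)."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def sentence_segments(text: str) -> list[str]:
    """Split text after sentence-ending punctuation (assumed segmentation scheme)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def segment_hash(segment: str) -> str:
    """Hash the normalized form of a segment so formatting differences are ignored."""
    return hashlib.sha256(normalize(segment).encode("utf-8")).hexdigest()


def build_opt_out_index(documents: list[str]) -> set[str]:
    """Precompute hashes for opt-out content before any user input arrives."""
    return {
        segment_hash(seg)
        for doc in documents
        for seg in sentence_segments(doc)
        if normalize(seg)
    }


def respond(
    user_input: str,
    generate: Callable[[str], str],
    opt_out_hashes: set[str],
) -> str:
    """Generate responsive content, then replace any segment found in the opt-out index."""
    responsive = generate(user_input)  # stand-in for the GM call
    modified = [
        REDACTION if segment_hash(seg) in opt_out_hashes else seg
        for seg in sentence_segments(responsive)
    ]
    return " ".join(modified)


if __name__ == "__main__":
    index = build_opt_out_index(
        ["Claims define the scope of protection. Always draft them carefully."]
    )
    fake_gm = lambda _: (
        "Here is a summary. claims define the scope of protection! Hope that helps."
    )
    print(respond("Explain patent claims", fake_gm, index))
```

Exact hash lookup is fast but catches only segments that are identical after normalization; near-verbatim reproductions would need a fuzzier comparison such as an edit-distance threshold.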
By modifying the responsive content in response to determining that it matches the opt-out content, the processor(s) can effectively ensure data security of the opt-out content without having to immediately initiate a new training cycle for the GM and can conserve computational and/or network resources associated with executing the new training cycle for the GM. Further, the processor(s) can remove the opt-out content from a GM training dataset such that, when the new training cycle for the GM is initiated, the GM is not trained based on the opt-out content.
For example, assume that a given user is an author that manages or controls a blog about all things related to patent law, and the blog includes various online articles related to different topics of patent law. Further assume that the given user interacts with a GM responsive content system that is executed by the processor(s) and indicates a desire to opt-out from the GM being trained on data, that is associated with the given user, or the GM being able to use the data that is associated with the given user, in generating the responsive content. In this example, the processor(s) can determine the data that is associated with the given user and store the data that is associated with the given user in an opt-out content database, along with an indication of the given user and/or an indication of the given user's blog. Accordingly, when other users interact with the GM responsive content system to obtain responsive content, the processor(