US-12627715-B1 - Open vocabulary content moderation policies via multi-modal embeddings

US12627715B1

Abstract

Users can create natural language prompts that describe a content moderation policy. A computing system can then process these prompts using a machine-learned multi-modal embedding generation model to create policy embeddings. Then, for an item of content to be screened for the content moderation policy, the system can also process the item of content with the machine-learned multi-modal embedding generation model to create one or more content embeddings. The system can then compare the policy embedding(s) with the content embedding(s) to determine whether or not the content violates the content moderation policy.
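The comparison the abstract describes can be sketched in a few lines. This is an illustrative sketch only: `cosine_similarity`, `moderate`, the threshold value, and the toy four-dimensional vectors are stand-ins for the embeddings a real multi-modal model would produce; none of these names come from the patent itself.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def moderate(policy_embeddings, content_embedding, threshold=0.25):
    """Flag content whose maximum similarity to any policy prompt
    embedding meets or exceeds the threshold."""
    score = max(cosine_similarity(p, content_embedding) for p in policy_embeddings)
    return score >= threshold, score

# Toy vectors standing in for multi-modal model outputs: in practice, the
# policy embeddings come from encoding the natural language prompts and the
# content embedding from encoding the image(s) with the same model.
policy = [np.array([0.9, 0.1, 0.0, 0.1]), np.array([0.8, 0.2, 0.1, 0.0])]
violating = np.array([0.85, 0.15, 0.05, 0.05])
benign = np.array([0.0, 0.1, 0.9, 0.2])

print(moderate(policy, violating))  # high similarity -> flagged
print(moderate(policy, benign))     # low similarity -> not flagged
```

Because both the prompts and the content are embedded by a single model into a shared semantic space, adding a new policy requires only writing a new prompt, not training a new classifier.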

Inventors

  • Enming Luo
  • Wei Qiao
  • Kathleen Louise Warren
  • Chih-Chun Chia
  • Yuan Wang
  • Dongjin Kwon
  • Cyrus Rashtchian
  • Benjamin Max Ewing

Assignees

  • GOOGLE LLC

Dates

Publication Date
2026-05-12
Application Date
2024-05-03

Claims (20)

  1. A computer-implemented method for open vocabulary content moderation, the method comprising: obtaining, by a computing system comprising one or more computing devices, one or more natural language prompts that correspond to a content moderation policy; processing, by the computing system, the one or more natural language prompts with a machine-learned model to generate one or more policy embeddings for the content moderation policy; obtaining, by the computing system, an item of content comprising one or more images; processing, by the computing system, the item of content with the machine-learned model to generate one or more content embeddings for the item of content, wherein a single machine-learned multi-modal embedding generation model is used as the machine-learned model to generate both the one or more policy embeddings for the content moderation policy and the one or more content embeddings for the item of content; and comparing, by the computing system, the one or more policy embeddings with the one or more content embeddings to determine a policy outcome for the item of content.
  2. The computer-implemented method of claim 1, wherein at least one of the one or more natural language prompts comprises a plurality of words.
  3. The computer-implemented method of claim 1, wherein comparing, by the computing system, the one or more policy embeddings with the one or more content embeddings to determine the policy outcome for the item of content comprises: determining, by the computing system, a cosine similarity between the one or more policy embeddings and the one or more content embeddings; and comparing, by the computing system, the cosine similarity to a threshold value to determine the policy outcome for the item of content.
  4. The computer-implemented method of claim 3, wherein: the one or more natural language prompts comprise a plurality of natural language prompts; processing, by the computing system, the one or more natural language prompts with the machine-learned multi-modal embedding generation model to generate the one or more policy embeddings for the content moderation policy comprises processing, by the computing system, the plurality of natural language prompts with the machine-learned multi-modal embedding generation model to generate a plurality of policy embeddings; and determining, by the computing system, the cosine similarity between the one or more policy embeddings and the one or more content embeddings comprises determining, by the computing system, a signed average cosine similarity between the plurality of policy embeddings and the one or more content embeddings.
  5. The computer-implemented method of claim 1, wherein at least one of the one or more natural language prompts comprises a user-generated prompt generated by a user associated with the content moderation policy.
  6. The computer-implemented method of claim 1, wherein at least one of the one or more natural language prompts comprises a model-generated prompt generated by a machine-learned model, wherein the machine-learned model generates the model-generated prompt by re-writing a user-supplied natural language prompt.
  7. The computer-implemented method of claim 1, wherein at least one of the one or more natural language prompts comprises a model-generated prompt generated by a machine-learned model, wherein the machine-learned model generates the model-generated prompt by captioning a user-supplied image.
  8. The computer-implemented method of claim 1, wherein obtaining, by the computing system, the one or more natural language prompts comprises: obtaining, by the computing system, a plurality of candidate natural language prompts; and performing, by the computing system, prompt threshing to select the one or more natural language prompts from the plurality of candidate natural language prompts.
  9. The computer-implemented method of claim 8, wherein performing, by the computing system, prompt threshing comprises: obtaining, by the computing system, a plurality of negative examples, each negative example comprising an example item of content that does not violate the content moderation policy; and for each of the candidate natural language prompts that is an in-scope prompt: comparing, by the computing system, a candidate policy embedding generated from the candidate natural language prompt with a plurality of negative example embeddings generated from the plurality of negative examples to determine a number of false positives.
  10. The computer-implemented method of claim 9, wherein the plurality of negative examples comprise a plurality of images randomly selected from a production dataset.
  11. The computer-implemented method of claim 1, wherein the single machine-learned multi-modal embedding generation model is configured to process different types of data, including the one or more natural language prompts and the item of content, into a shared semantic space to perform a comparison between the one or more policy embeddings and the one or more content embeddings.
  12. A computer system for open vocabulary content moderation, the system comprising: a processor; a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system to: obtain one or more natural language prompts that correspond to a content moderation policy; process the one or more natural language prompts with a machine-learned model to generate one or more policy embeddings for the content moderation policy; obtain an item of content comprising one or more images; process the item of content with the machine-learned model to generate one or more content embeddings for the item of content, wherein a single machine-learned multi-modal embedding generation model is used as the machine-learned model to generate both the one or more policy embeddings for the content moderation policy and the one or more content embeddings for the item of content; and compare the one or more policy embeddings with the one or more content embeddings to determine a policy outcome for the item of content.
  13. The computer system of claim 12, wherein the instructions further cause the system to determine a cosine similarity between the one or more policy embeddings and the one or more content embeddings, and compare the cosine similarity to a threshold value to determine the policy outcome for the item of content.
  14. The computer system of claim 13, wherein the one or more natural language prompts include a plurality of natural language prompts, and the system is configured to process the plurality of natural language prompts to generate a plurality of policy embeddings and determine an average cosine similarity between the plurality of policy embeddings and the one or more content embeddings.
  15. The computer system of claim 12, wherein the instructions further cause the system to accept user-generated prompts, wherein at least one of the one or more natural language prompts is generated by a user associated with the content moderation policy.
  16. The computer system of claim 12, wherein the instructions further cause the system to generate model-generated prompts, wherein at least one of the one or more natural language prompts is generated by a machine-learned model based on a user-supplied natural language prompt.
  17. The computer system of claim 12, wherein the instructions further cause the system to generate model-generated prompts, wherein at least one of the one or more natural language prompts is generated by a machine-learned model based on captioning a user-supplied image.
  18. The computer system of claim 12, wherein the instructions further cause the system to obtain a plurality of candidate natural language prompts and perform prompt threshing to select the one or more natural language prompts from the plurality of candidate natural language prompts.
  19. The computer system of claim 18, wherein the prompt threshing includes comparing candidate policy embeddings generated from the candidate natural language prompts that are in-scope with a plurality of negative example embeddings generated from a plurality of negative examples to determine a number of false positives.
  20. A non-transitory computer-readable medium storing instructions that, when executed by a computer system comprising one or more computing devices, cause the computer system to perform operations for open vocabulary content moderation, the operations comprising: obtaining one or more natural language prompts that correspond to a content moderation policy; processing the one or more natural language prompts with a machine-learned model to generate one or more policy embeddings for the content moderation policy; obtaining an item of content comprising one or more images; processing the item of content with the machine-learned model to generate one or more content embeddings for the item of content, wherein a single machine-learned multi-modal embedding generation model is used as the machine-learned model to generate both the one or more policy embeddings for the content moderation policy and the one or more content embeddings for the item of content; and comparing the one or more policy embeddings with the one or more content embeddings to determine a policy outcome for the item of content.
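Claim 4's "signed average cosine similarity" is not defined in detail in the claims; one plausible reading, sketched below, is that similarities to in-scope prompts contribute positively and similarities to out-of-scope prompts contribute negatively before averaging. The function names, the threshold, and the toy two-dimensional vectors are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def cos(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def signed_average_score(in_scope_embs, out_of_scope_embs, content_emb):
    """Average cosine similarity where in-scope prompt similarities count
    positively and out-of-scope prompt similarities count negatively
    (one plausible reading of the claimed 'signed average')."""
    signed = [cos(p, content_emb) for p in in_scope_embs]
    signed += [-cos(n, content_emb) for n in out_of_scope_embs]
    return sum(signed) / len(signed)

def policy_outcome(score: float, threshold: float = 0.1) -> str:
    """Compare the aggregate score to a threshold, per claim 3."""
    return "violating" if score >= threshold else "non-violating"

# Toy 2-D embeddings: one in-scope prompt, one out-of-scope prompt.
in_scope = [np.array([1.0, 0.0])]
out_of_scope = [np.array([0.0, 1.0])]
close_to_policy = np.array([0.9, 0.1])
far_from_policy = np.array([0.1, 0.9])

print(policy_outcome(signed_average_score(in_scope, out_of_scope, close_to_policy)))
print(policy_outcome(signed_average_score(in_scope, out_of_scope, far_from_policy)))
```

The signed average lets counter-examples pull the score down, so content that merely resembles an explicitly out-of-scope prompt is less likely to be flagged.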

Description

FIELD

The present disclosure relates generally to content moderation. More particularly, the present disclosure relates to the use of a machine-learned multi-modal embedding generation model to enable open vocabulary content moderation.

BACKGROUND

Digital content moderation includes the identification and classification of content that violates various nuanced policies. Traditional content moderation systems often rely on classifiers that are trained on specific datasets labeled according to a single policy. These classifiers struggle to adapt to the multifaceted nature of content that may span multiple policy domains, each with its own set of rules and characteristics. Additionally, the computational burden of processing and classifying large volumes of content against an ever-growing list of policies poses a significant technical hurdle. Existing systems face difficulties in efficiently scaling to handle the increasing complexity and volume of content, which can lead to bottlenecks and delays in content moderation workflows. This technical problem is amplified in environments where real-time or near-real-time content moderation is performed, and/or where the computational resources are finite and must be judiciously utilized.

SUMMARY

Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the embodiments. A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.
One general aspect includes a computer-implemented method for open vocabulary content moderation. The method includes obtaining, by a computing system comprising one or more computing devices, one or more natural language prompts that correspond to a content moderation policy. The method also includes processing, by the computing system, the one or more natural language prompts with a machine-learned multi-modal embedding generation model to generate one or more policy embeddings for the content moderation policy. The method also includes obtaining, by the computing system, an item of content comprising one or more images. The method also includes processing, by the computing system, the item of content with the machine-learned multi-modal embedding generation model to generate one or more content embeddings for the item of content. The method also includes comparing, by the computing system, the one or more policy embeddings with the one or more content embeddings to determine a policy outcome for the item of content. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. One general aspect includes a computer system for open vocabulary content moderation. The computer system includes a processor.
The system also includes a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system to: obtain one or more natural language prompts that correspond to a content moderation policy; process the one or more natural language prompts with a machine-learned multi-modal embedding generation model to generate one or more policy embeddings for the content moderation policy; obtain an item of content comprising one or more images; process the item of content with the machine-learned multi-modal embedding generation model to generate one or more content embeddings for the item of content; and compare the one or more policy embeddings with the one or more content embeddings to determine a policy outcome for the item of content. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. One general aspect includes a non-transitory computer-readable medium storing instructions for performing operations. The operations include obtaining one or more natural language prompts that correspond to a content moderation policy. The operations also include processing the one or more natural language prompts with a machine-learned multi-modal embedding generation model to generate one or more policy embeddings for the content moderation policy. The operations also include obtaining an item of content comprising one or more images, processing the item of content with the machine-learned multi-modal embedding generation model to generate one or more content embeddings for the item of content, and comparing the one or more policy embeddings with the one or more content embeddings to determine a policy outcome for the item of content.
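The "prompt threshing" of claims 8-10 can be sketched as a false-positive filter: each candidate prompt's embedding is scored against embeddings of known-benign (negative) examples, and prompts that fire too often on benign content are discarded. The function names, threshold, false-positive budget, and toy vectors below are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def cos(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def thresh_prompts(candidates, negative_embs, threshold=0.5, max_false_positives=0):
    """Keep only candidate prompts whose embeddings do not exceed the
    similarity threshold on more than max_false_positives known-benign
    examples (e.g., images randomly sampled from a production dataset,
    per claim 10)."""
    kept = []
    for name, emb in candidates.items():
        false_positives = sum(1 for n in negative_embs if cos(emb, n) >= threshold)
        if false_positives <= max_false_positives:
            kept.append(name)
    return kept

# Toy candidate-prompt embeddings and benign-example embeddings.
candidates = {
    "good_prompt": np.array([1.0, 0.0, 0.0]),
    "overbroad_prompt": np.array([0.0, 1.0, 0.0]),
}
negatives = [np.array([0.0, 1.0, 0.1]), np.array([0.1, 0.9, 0.0])]

print(thresh_prompts(candidates, negatives))  # only the selective prompt survives
```

Threshing trades recall for precision at prompt-selection time, so the runtime comparison step inherits a prompt set already vetted against benign production content.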