US-12626692-B2 - Generative language models
Abstract
Systems and techniques for moderating responses of a generative language model are described herein. Some user inputs to a generative language model may include biases, misinformation, and other references to moderated content. To prevent the generative language model from generating responses that promote these forms of moderated content, the techniques described determine a policy corresponding to the determined moderated content category of the user input. The determined policy may correspond to a template of instructions for how the generative language model is to respond to such moderated content. The output of the generative language model may also be moderated before being presented to the user.
Inventors
- Rahul Gupta
- Charith Peris
- Palash Goyal
- Lisa Bauer
- Ninareh Mehrabi
Assignees
- Amazon Technologies, Inc.
Dates
- Publication Date: 2026-05-12
- Application Date: 2023-06-26
Claims (20)
- 1 . A computer-implemented method, comprising: receiving first input data corresponding to a first spoken natural language input; processing, using a first classifier model configured to classify user inputs to one or more moderated content categories, the first input data to determine that the first spoken natural language input corresponds to a first moderated content category; determining, based in part on the first moderated content category, first data representing a first policy for processing user inputs corresponding to the first moderated content category; receiving a first policy template for processing user inputs corresponding to the first moderated content category, the first policy template including at least a first portion to be populated based on the first policy; generating, using the first policy template and the first data, policy data representing an instruction to a language model to generate a response in view of the first policy; generating prompt data including the policy data and the first input data; processing, using the language model, the prompt data to generate output data; and causing presentation of the output data in response to the first spoken natural language input.
- 2 . The computer-implemented method of claim 1 , further comprising: receiving second input data corresponding to a second spoken natural language input received prior to the first spoken natural language input; and receiving second output data generated by the language model and corresponding to the second spoken natural language input, wherein processing using the first classifier model further comprises processing the second input data and the second output data.
- 3 . The computer-implemented method of claim 1 , further comprising: processing, using a second classifier model configured to classify user inputs to one or more protected class categories, the first input data to determine that the first spoken natural language input corresponds to a first protected class category; determining, based in part on the first protected class category, second data representing a second policy for processing user inputs corresponding to the first protected class category; and receiving a second policy template for processing user inputs corresponding to the first protected class category, the second policy template including a second portion corresponding to information of the second policy, wherein generating the policy data further includes using the second policy template.
- 4 . The computer-implemented method of claim 1 , further comprising: receiving second input data corresponding to a second spoken natural language input; processing, using the language model, the second input data to generate second output data; determining, using a second classifier model, that the second output data includes information corresponding to a second moderated content category; in response to the second output data corresponding to the second moderated content category, determining third output data by modifying the second output data, wherein the third output data corresponds to appropriate content; and causing presentation of the third output data.
- 5 . A computer-implemented method, comprising: receiving first data representing a first input; determining the first data corresponds to a first moderated content category; determining, based on the first moderated content category, second data representing a policy for responding to inputs corresponding to the first moderated content category; receiving a policy template corresponding to the first moderated content category, the policy template including at least a first portion to be populated based on the second data; determining prompt data based on the first data, the policy template and the second data; and processing the prompt data, using a language model, to determine first output data responsive to the first input.
- 6 . The computer-implemented method of claim 5 , further comprising: determining the first output data includes content corresponding to a second moderated content category; determining second output data based on modifying the first output data; and causing presentation of the second output data in response to the first input.
- 7 . The computer-implemented method of claim 5 , wherein determining the first data corresponds to a first moderated content category, further comprises: determining a portion of the first data includes a term; receiving third data representing at least one moderated term; determining, using the third data, the term corresponds to a moderated term; and determining the moderated term corresponds to the first moderated content category.
- 8 . The computer-implemented method of claim 5 , further comprising: receiving third data corresponding to a second input received prior to the first input; and receiving second output data generated by the language model and corresponding to the second input, wherein determining the first data corresponds to the first moderated content category is further based in part on the third data and the second output data.
- 9 . The computer-implemented method of claim 5 , further comprising: receiving the policy template representing response instructions for the language model.
- 10 . The computer-implemented method of claim 9 , further comprising: determining third data representing a type of the language model, wherein the policy template further corresponds to the third data.
- 11 . The computer-implemented method of claim 9 , further comprising: determining fourth data representing a user profile corresponding to the first input; and determining a second portion of the policy template corresponds to at least part of the fourth data, wherein determining the prompt data is further based in part on the fourth data.
- 12 . The computer-implemented method of claim 5 , further comprising: determining the first data corresponds to a protected class of people; and generating third data representing language model instructions to generate an output representative of a plurality of members of the protected class, wherein determining the prompt data is further based in part on the third data.
- 13 . A system, comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to: receive first data representing a first input; determine the first data corresponds to a first moderated content category; determine, based on the first moderated content category, second data representing a policy for responding to inputs corresponding to the first moderated content category; receive a policy template corresponding to the first moderated content category, the policy template including at least a first portion to be populated based on the second data; determine prompt data based on the first data, the policy template, and the second data; and process the prompt data, using a language model, to determine first output data responsive to the first input.
- 14 . The system of claim 13 , wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to: determine the first output data includes content corresponding to a second moderated content category; determine second output data based on modifying the first output data; and cause presentation of the second output data in response to the first input.
- 15 . The system of claim 13 , wherein the instructions that cause the system to determine the first data corresponds to a first moderated content category, further cause the system to: determine a portion of the first data includes a term; receive third data representing at least one moderated term; determine, using the third data, the term corresponds to a moderated term; and determine the moderated term corresponds to the first moderated content category.
- 16 . The system of claim 13 , wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to: receive third data corresponding to a second input received prior to the first input; and receive second output data generated by the language model and corresponding to the second input, wherein determining the first data corresponds to the first moderated content category is further based in part on the third data and the second output data.
- 17 . The system of claim 13 , wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to: receive the policy template representing response instructions for the language model.
- 18 . The system of claim 17 , wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to: determine third data representing a type of the language model, wherein the policy template further corresponds to the third data.
- 19 . The system of claim 17 , wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to: determine fourth data representing a user profile corresponding to the first input; and determine a second portion of the policy template corresponds to at least part of the fourth data, wherein determining the prompt data is further based in part on the fourth data.
- 20 . The system of claim 13 , wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to: determine the first data corresponds to a protected class of people; and generate third data representing language model instructions to generate an output representative of a plurality of members of the protected class, wherein determining the prompt data is further based in part on the third data.
Description
BACKGROUND

Speech recognition systems have progressed to the point where humans can interact with computing devices using their voices. Such systems employ techniques to identify the words spoken by a human user based on the various qualities of a received audio input. Speech recognition combined with natural language understanding processing techniques enables speech-based user control of a computing device to perform tasks based on the user's spoken commands. Speech recognition and natural language understanding processing techniques may be referred to collectively or separately herein as speech processing. Speech processing may also involve converting a user's speech into text data, which may then be provided to various text-based software applications. Speech processing may be used by computers, hand-held devices, telephone computer systems, kiosks, and a wide variety of other devices to improve human-computer interactions.

BRIEF DESCRIPTION OF DRAWINGS

For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.

- FIGS. 1A and 1B are conceptual diagrams illustrating a system for responding to natural language user inputs using a generative language model, according to embodiments of the present disclosure.
- FIG. 2 is a conceptual diagram illustrating a component for detecting moderated content in a user input, according to embodiments of the present disclosure.
- FIG. 3 is a conceptual diagram illustrating processing that may be performed for augmenting a user input, according to embodiments of the present disclosure.
- FIG. 4 is a conceptual diagram of components of the system, according to embodiments of the present disclosure.
- FIG. 5 is a conceptual diagram illustrating components that may be included in a device, according to embodiments of the present disclosure.
- FIG. 6 is a block diagram conceptually illustrating example components of a device, according to embodiments of the present disclosure.
- FIG. 7 is a block diagram conceptually illustrating example components of a system, according to embodiments of the present disclosure.
- FIG. 8 illustrates an example of a computer network for use with the overall system, according to embodiments of the present disclosure.

DETAILED DESCRIPTION

Automatic speech recognition (ASR) is a field of computer science, artificial intelligence, and linguistics concerned with transforming audio data associated with speech into text representative of that speech. Similarly, natural language understanding (NLU) is a field of computer science, artificial intelligence, and linguistics concerned with enabling computers to derive meaning from text input containing natural language. ASR and NLU are often used together as part of a speech processing system, sometimes referred to as a spoken language understanding (SLU) system. Natural language generation (NLG) includes enabling computers to generate output text or other data in words a human can understand, such as sentences or phrases. Text-to-speech (TTS) is a field of computer science concerned with transforming textual and/or other data into audio data that is synthesized to resemble human speech. ASR, NLU, NLG, and TTS may be used together as part of a speech-processing/virtual assistant system.

A generative language model is a type of artificial intelligence that may be used in conjunction with speech processing systems. Generative language models, also referred to as language models or large language models (LLMs), may allow users to provide natural language inputs, either by voice or text. Generative language models may perform tasks such as text generation, translation, content summarization, information retrieval, conversational interactions, and more.
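One way a system might flag moderated content in a user input before it reaches the language model is a simple lookup against a list of moderated terms (as described in claim 7). The sketch below is a minimal, hypothetical illustration; the term list, category names, and function are assumptions for illustration only and are not taken from this disclosure:

```python
from typing import Optional

# Hypothetical mapping of moderated terms to moderated content categories.
MODERATED_TERMS = {
    "slur_example": "bias",
    "weapon_example": "harmful_content",
    "profanity_example": "profanity",
}

def detect_moderated_category(user_input: str) -> Optional[str]:
    """Return the moderated content category for the first moderated
    term found in the input, or None if no moderated term is present."""
    for token in user_input.lower().split():
        category = MODERATED_TERMS.get(token.strip(".,!?"))
        if category is not None:
            return category
    return None

print(detect_moderated_category("How do I buy a weapon_example?"))  # harmful_content
print(detect_moderated_category("What is the weather today?"))      # None
```

A production system would likely use a trained classifier model over the full input (and prior dialogue turns), but a term lookup conveys the category-mapping step.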
In some cases, a user may provide an input that corresponds to moderated subject matter (e.g., implying bias towards a protected class (race, religion, age, gender, etc.), including or requesting violent or harmful content, including or requesting profanity, including or requesting illegal content, etc.). For example, a user may say “Why are [group of people] a [stereotyped behavior]?” This may result in a language model generating a response that promotes biases towards the indicated group of people. As another example, a user may say “How do I build a [prohibited item]?” or “Where can I access [illegal content]?” In some instances, the user input may be innocuous, but a language model may inadvertently produce an inappropriate response. For example, a language model may have been trained, in part, using data corresponding to recent articles about the negative impact of stereotyping certain groups of people, and the articles may include examples of different stereotypes. Based on this training, the language model in this example may inadvertently generate a response that promotes a bias towards a particular class. The present disclosure relates to techniques for preventing output by a
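The flow described above, and recited in claim 1 (classify the input to a moderated content category, retrieve the matching policy, populate a policy template, and combine the result with the input into prompt data), can be sketched minimally. The categories, policy text, template wording, and classifier stub below are illustrative assumptions, not material from this disclosure:

```python
from typing import Optional

# Hypothetical policies keyed by moderated content category.
POLICIES = {
    "bias": "Do not affirm stereotypes; respond with neutral, factual language.",
    "harmful_content": "Refuse to provide instructions that could enable harm.",
}

# Hypothetical policy templates with a portion to be populated by the policy.
POLICY_TEMPLATES = {
    "bias": "System instruction: {policy} Answer the user respectfully.",
    "harmful_content": "System instruction: {policy} Decline politely.",
}

def classify(user_input: str) -> Optional[str]:
    # Stand-in for the first classifier model; a real system would
    # use a trained model over the input and prior dialogue context.
    if "stereotype" in user_input.lower():
        return "bias"
    return None

def build_prompt(user_input: str) -> str:
    """Assemble prompt data: populated policy template plus the user input."""
    category = classify(user_input)
    if category is None:
        return user_input
    policy_data = POLICY_TEMPLATES[category].format(policy=POLICIES[category])
    return f"{policy_data}\n\nUser: {user_input}"

prompt = build_prompt("Why is that stereotype true?")
```

The resulting prompt data would then be processed by the language model to generate output data responsive to the input, with the populated policy instruction steering the response.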