US-12625915-B1 - Generating universal web profiles from diverse event data using generative artificial intelligence (AI) models

Abstract

A universal user web profile generation system is provided that utilizes one or more generative artificial intelligence (AI) models to generate universal web profiles for users. For example, the universal user web profile generation system uses a combination of neural networks and generative AI models to distill relevant information from the vast amounts and types of user event data and generate relevant universal user event taxonomies. Upon generating the universal user event taxonomies, the universal user web profile generation system can efficiently and accurately generate user profiles based on user web data that aligns with the universal user event taxonomies, ensuring profile compatibility with most or all downstream processes and services that access the user web profiles. Indeed, the universal user web profile generation system generates universal web profiles for users by consolidating extensive user data into a concise and insightful format.

Inventors

  • Kamal GINOTRA
  • Nabeel Kaushal
  • Dongfei Yu
  • Sedigheh Zolaktaf
  • Andrew James MCNAMARA
  • Jikun Liu

Assignees

  • MICROSOFT TECHNOLOGY LICENSING, LLC

Dates

Publication Date
2026-05-12
Application Date
2024-12-31

Claims (20)

  1. A computer-implemented method for determining topic names from event data using one or more generative artificial intelligence (AI) models, comprising: converting a set of user events with text labels into a set of embeddings in embedding space, wherein the set of embeddings is assigned the text labels corresponding to the set of user events; based on determining a subset of embeddings within a first embedding cluster in the embedding space, identifying a subset of text labels that are assigned to the subset of embeddings; providing a prompt to a generative AI model with prompt instructions to generate a topic name for the first embedding cluster based on the subset of text labels; mapping a topic name embedding of the topic name and first user event embeddings of first user events associated with a first user to the embedding space; and based on determining a correlation between the topic name and a subset of the first user events, generating a user profile for the first user that includes the topic name.
  2. The computer-implemented method of claim 1, further comprising using a text embedding neural network that maps input data into text embeddings within a text embedding space to create the set of embeddings from the set of user events, wherein the set of embeddings includes a set of text embeddings.
  3. The computer-implemented method of claim 1, further comprising using a deep neural network to create the set of embeddings from the set of user events, wherein the deep neural network is a multimodal embedding network that converts data input of different data types into the embedding space.
  4. The computer-implemented method of claim 1, further comprising: identifying the set of user events from user event signal data based on interactions between user event data and a set of users that includes the first user; determining that a first user event signal from the set of user events corresponds to multiple text labels; and selecting a first text label from the multiple text labels to associate with the first user event signal based on a text quality hierarchy.
  5. The computer-implemented method of claim 1, further comprising generating a set of embedding clusters in the embedding space from the set of embeddings including generating the subset of embeddings into the first embedding cluster.
  6. The computer-implemented method of claim 1, wherein identifying the subset of text labels assigned to the subset of embeddings includes selecting one or more of the text labels assigned to embeddings within the first embedding cluster based on a text quality hierarchy.
  7. The computer-implemented method of claim 1, further comprising: providing the subset of text labels to the generative AI model with the prompt, wherein the prompt instructions instruct the generative AI model to generate the topic name for the first embedding cluster based on the subset of text labels; and receiving the topic name for the first embedding cluster from the generative AI model.
  8. The computer-implemented method of claim 1, further comprising: providing a first additional prompt to the generative AI model with first additional instructions to generate a list of topic names without providing the subset of text labels; and providing a second additional prompt to the generative AI model with second additional instructions to generate an additional list of topic names from a previous list of topic names.
  9. The computer-implemented method of claim 1, further comprising: generating a set of topic names for a set of embedding clusters generated based on the set of embeddings; and deduplicating the set of topic names by removing semantically similar topic names.
  10. The computer-implemented method of claim 9, further comprising: identifying a set of semantically similar topic names from the set of topic names, where the set of semantically similar topic names corresponds to multiple embedding clusters; providing the set of semantically similar topic names to the generative AI model with additional instructions to generate a combined topic name for the multiple embedding clusters; and replacing the set of semantically similar topic names with the combined topic name in the set of topic names.
  11. The computer-implemented method of claim 1, further comprising: identifying the first user events associated with the first user from the set of user events; generating, within the embedding space, the first user event embeddings using the first user events associated with the first user; and generating the topic name embedding within the embedding space for the topic name using the first user events associated with the first user.
  12. The computer-implemented method of claim 11, further comprising determining the correlation between the topic name and the subset of the first user events based on embeddings for the subset of the first user events being within a topic assignment threshold of the topic name embedding within the embedding space.
  13. The computer-implemented method of claim 11, further comprising: determining a user embedding that does not meet a topic assignment threshold for each topic name embedding mapped to the embedding space; and discarding the user embedding for being associated with the user profile.
  14. The computer-implemented method of claim 1, further comprising providing access to the user profile of the first user to multiple applications that provide content.
  15. A system comprising: a processor; and a computer memory including instructions that, when executed by the processor, cause the system to carry out operations comprising: identifying a set of user events with text labels corresponding to user interactions from a set of users that includes a first user; converting the set of user events with the text labels into a set of text embeddings in text embedding space, wherein the set of text embeddings is assigned the text labels corresponding to the set of user events; based on determining a subset of text embeddings within a first embedding cluster in the text embedding space, identifying a subset of text labels that are assigned to the subset of text embeddings; providing a prompt to a generative AI model with prompt instructions to generate a topic name for the first embedding cluster based on the subset of text labels; mapping a topic name text embedding of the topic name and first user event text embeddings of first user events associated with the first user to the text embedding space; and based on determining a correlation between the topic name and a subset of the first user events, generating a user profile for the first user that includes the topic name.
  16. The system of claim 15, additional instructions that, when executed by the processor, cause the system to carry out operations comprising: generating a topic description for the topic name using the generative AI model; and removing topic names from a generated list of topics that violate responsible safety and compliance policies.
  17. The system of claim 15, additional instructions that, when executed by the processor, cause the system to carry out operations comprising: generating a list of topic names for the first user based on the first user event text embeddings correlating to topic names on the list of topic names within the text embedding space; ranking the topic names in the list of topic names based on user interest scores assigned to each topic name; and selecting a subset of topic names to remain on the list of topic names based on ranking and diversity of the list of topic names.
  18. The system of claim 17, wherein ranking the topic names is based on topic interaction counts, topic interaction durations, and topic interaction uniqueness.
  19. The system of claim 17, wherein generating the user profile for the first user includes adding the subset of topic names to an additional taxonomy hierarchy level of topic names.
  20. A computer-implemented method for determining topic names for user event signals using one or more generative artificial intelligence (AI) models, comprising: converting a set of user events with text labels into a set of text embeddings in text embedding space using a text embedding model, wherein the set of text embeddings is assigned the text labels corresponding to the set of user events; based on determining a subset of text embeddings within a first embedding cluster in the text embedding space, identifying a subset of text labels that are assigned to the subset of text embeddings; providing a prompt to a generative AI model with instructions to generate a topic name based on the subset of text labels; mapping a topic name embedding of the topic name and first user event embeddings of first user events associated with a first user to the text embedding space; and based on determining a correlation between the topic name and a subset of the first user events, generating a user profile for the first user that includes the topic name.
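
The steps recited in claims 1, 12, and 13 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: embeddings and cluster assignments are taken as given rather than produced by a real embedding model or clustering algorithm, cosine similarity stands in for the unspecified correlation measure, and every function name, the prompt wording, and the 0.8 threshold are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def labels_by_cluster(labels, cluster_ids):
    """Group the text labels assigned to embeddings by their cluster id."""
    grouped = defaultdict(list)
    for label, cid in zip(labels, cluster_ids):
        if label not in grouped[cid]:  # skip repeated labels within a cluster
            grouped[cid].append(label)
    return dict(grouped)


def build_topic_prompt(cluster_labels, max_labels=20):
    """Build a prompt asking a generative model to name one cluster."""
    bullet_list = "\n".join(f"- {label}" for label in cluster_labels[:max_labels])
    return (
        "Generate one short topic name that summarizes the following "
        "user event labels. Reply with the topic name only.\n" + bullet_list
    )


def assign_topics(event_embeddings, topic_embeddings, threshold=0.8):
    """Map each event to its closest topic; drop events below the threshold."""
    assigned = {}
    for event_id, vec in event_embeddings.items():
        best_topic, best_score = None, threshold
        for topic, topic_vec in topic_embeddings.items():
            score = cosine(vec, topic_vec)
            if score >= best_score:
                best_topic, best_score = topic, score
        if best_topic is not None:
            assigned[event_id] = best_topic
    return assigned


# Toy data: four labeled events already placed into two clusters.
labels = ["trail running shoes", "marathon training plan",
          "sourdough starter", "bread flour"]
grouped = labels_by_cluster(labels, [0, 0, 1, 1])
prompt = build_topic_prompt(grouped[0])  # would be sent to a generative model

# Toy 2-D embeddings for two topic names and three events of one user.
topic_embeddings = {"Running": [1.0, 0.0], "Baking": [0.0, 1.0]}
event_embeddings = {"e1": [0.9, 0.1], "e2": [0.1, 0.95], "e3": [0.7, 0.7]}
assigned = assign_topics(event_embeddings, topic_embeddings)
```

In this toy run, event e3 sits roughly halfway between both topics, so it falls below the threshold and is left out of the profile, mirroring the discarding step of claim 13.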

Description

BACKGROUND

Recent years have seen significant growth in both hardware and software within the field of content discovery systems. These systems, which provide personalized content to users, have become integral to enhancing user experiences across various digital platforms. Typically, they leverage algorithms that analyze user data to predict and deliver content tailored to individual preferences. However, despite advancements in machine learning and data processing techniques, current content discovery systems still face several technical shortcomings. For example, the sheer volume and diversity of available data are too vast to process effectively. Current systems are unable to scale to fully utilize this data, often relying on only a fraction of it. This results in inefficiencies and inconsistencies. These problems are further exacerbated when incomplete and inaccurate data are provided to downstream services. These issues, along with others described below, underscore the urgent need for improvements in both efficiency and accuracy within current content discovery systems.

BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description provides specific and detailed implementations accompanied by drawings. Additionally, each of the figures listed below corresponds to one or more implementations discussed in this disclosure.

FIG. 1 illustrates an example overview of implementing a profile generation system that uses a generative artificial intelligence (AI) model to determine topic names from event data for generating universal user web profiles.
FIG. 2 illustrates an example computing environment in which the profile generation system (e.g., universal user web profile generation system) is implemented in a cloud computing system.
FIG. 3 illustrates an example diagram of aggregating event data as part of generating universal user web profiles.
FIG. 4 illustrates an example of creating topic names as part of generating universal user web profiles.
FIG. 5 illustrates an example diagram of refining topic names as part of generating universal user web profiles.
FIG. 6 illustrates an example diagram of assigning curated topic names to users as part of generating universal user web profiles.
FIG. 7 illustrates an example diagram of generating universal user web profiles from the assigned topics.
FIG. 8 illustrates an example series of acts in a computer-implemented method for determining topic names from event data using one or more generative artificial intelligence (AI) models.
FIG. 9 illustrates example components included within a computer system for implementing a profile generation system.

DETAILED DESCRIPTION

This disclosure describes a universal user web profile generation system (profile generation system for short) that utilizes one or more generative artificial intelligence (AI) models to generate universal web profiles for users. For example, the profile generation system uses a combination of neural networks and generative AI models to distill relevant information from the vast amounts and types of user event data and generate relevant universal user event taxonomies. Upon generating the universal user event taxonomies, the profile generation system can efficiently and accurately generate user profiles based on user web data that aligns with the universal user event taxonomies, which ensures profile compatibility with most or all downstream processes and services that access the user web profiles. Indeed, the profile generation system generates universal web profiles for users by consolidating extensive user data into a concise and insightful format.

To illustrate, in various implementations, the profile generation system determines topic names from event data using one or more generative AI models by obtaining event data associated with users, which may include data of various types, and pieces of event data can include one or more text labels.
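
As an illustration of event data in which a piece of data may carry one or more text labels, the following minimal Python sketch models a user event and picks a single label for it. The `UserEvent` class, the label source names, and the `LABEL_QUALITY_ORDER` ranking are hypothetical and not taken from the disclosure; they only show one way a text quality hierarchy, as mentioned in claim 4, could be applied.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence

# Hypothetical ordering of label sources, best quality first (an assumption).
LABEL_QUALITY_ORDER = ("page_title", "search_query", "url_text")


@dataclass
class UserEvent:
    """One user event with zero or more text labels keyed by label source."""
    user_id: str
    event_type: str
    labels: Dict[str, str] = field(default_factory=dict)


def pick_label(event: UserEvent,
               order: Sequence[str] = LABEL_QUALITY_ORDER) -> Optional[str]:
    """Return the first non-empty label in quality order, or None if none exists."""
    for source in order:
        text = event.labels.get(source, "").strip()
        if text:
            return text
    return None


# Example: an event that carries both a URL-derived label and a page title.
event = UserEvent(
    user_id="u1",
    event_type="page_view",
    labels={
        "url_text": "example-com-hiking-boots",
        "page_title": "Best Hiking Boots of the Year",
    },
)
chosen = pick_label(event)  # the page title outranks the URL-derived text
```

Keeping the ordering in a single tuple makes the hierarchy easy to change without touching the selection logic.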
The profile generation system can convert the event data into an embedding space, assigning corresponding text labels to respective embeddings. In addition, the profile generation system may generate embedding clusters and identify text labels from some or all of the embeddings in the cluster. Then, the profile generation system provides a prompt to a generative AI model to generate a topic name for the cluster based on the identified text labels.

In additional implementations, the profile generation system identifies the event data associated with a first user. The profile generation system assigns each piece of the identified event data to one of the curated topic names generated based on the event data from multiple users by correlating the event data for the first user with topic names in an embedding space. The profile generation system may further score and rank the assigned topic names before generating or updating a user web profile with the selected assigned topic names.

Indeed, implementations of the present disclosure provide benefits and s