US-12626290-B2 - System, method, and non-transitory machine-readable medium for self-guided sequence selection and extrapolation

Abstract

Embodiments described herein provide systems and methods for training a sequential recommendation model. Methods include determining a difficulty and quality (DQ) score associated with user behavior sequences from a training dataset. User behavior sequences are sampled during training based on their DQ scores. A meta-extrapolator may also be trained based on user behavior sequences sampled according to DQ score. The meta-extrapolator may be trained with high quality low difficulty sequences. The meta-extrapolator may then be used with an input of high quality high difficulty sequences to generate synthetic user behavior sequences. The synthetic user behavior sequences may be used to augment the training dataset to fine-tune the sequential recommendation model, while continuing to sample user behavior sequences based on DQ score. As the DQ score is based on current model predictions, DQ scores iteratively update during the training process.
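The subset selection described in the abstract (high quality, low difficulty sequences for training the meta-extrapolator; high quality, high difficulty sequences as its input) can be illustrated with a minimal sketch. The function name `split_for_extrapolator`, the threshold parameters `quality_min` and `difficulty_cut`, and the toy scores are all hypothetical and chosen only for illustration; the record does not specify concrete threshold values or data structures.

```python
from typing import Dict, List, Tuple


def split_for_extrapolator(
    scores: Dict[str, Tuple[float, float]],
    quality_min: float,
    difficulty_cut: float,
) -> Tuple[List[str], List[str]]:
    """Return (train_ids, input_ids) for a meta-extrapolator.

    `scores` maps a sequence id to a (difficulty, quality) pair.
    Both thresholds are hypothetical tuning knobs for this sketch.
    """
    train_ids: List[str] = []
    input_ids: List[str] = []
    for seq_id, (difficulty, quality) in scores.items():
        if quality <= quality_min:
            continue  # low quality sequences are used for neither subset
        if difficulty < difficulty_cut:
            train_ids.append(seq_id)  # high quality, low difficulty
        elif difficulty > difficulty_cut:
            input_ids.append(seq_id)  # high quality, high difficulty
    return train_ids, input_ids


# Hypothetical (difficulty, quality) pairs for three toy sequences.
toy_scores = {
    "u1": (0.2, 0.9),  # easy and clean: used to train the extrapolator
    "u2": (0.8, 0.9),  # hard and clean: used as extrapolator input
    "u3": (0.8, 0.1),  # low quality: excluded from both subsets
}
train_ids, input_ids = split_for_extrapolator(
    toy_scores, quality_min=0.5, difficulty_cut=0.5
)
```

In this sketch a sequence whose difficulty equals the cut exactly falls into neither subset; that tie-breaking choice is an assumption, not something the record states.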

Inventors

  • Yongjun Chen
  • Zhiwei Liu
  • Jianguo Zhang
  • Huan WANG
  • Caiming Xiong

Assignees

  • SALESFORCE, INC.

Dates

  • Publication Date: 2026-05-12
  • Application Date: 2022-08-19

Claims (20)

  1. A method for training a sequential recommendation model to automatically generate a sequential recommendation of multiple items in a sequence to a user, comprising: receiving, via a communication interface, a training dataset comprising a plurality of user behavior sequences; augmenting the training dataset with a plurality of synthetic user behavior sequences generated by a trained extrapolator based on a subset of user behavior sequences of the plurality of user behavior sequences; determining, via a DQ score generator, for at least one user behavior sequence from the training dataset: a difficulty score with an inverse relation to a probability of a sequential recommendation model with a first set of parameters correctly recommending an item in the at least one user behavior sequence, and a quality score based on a first prediction generated by the sequential recommendation model with the first set of parameters from the at least one user behavior sequence; determining, via the DQ score generator, a respective combined difficulty and quality (DQ) score for each of the plurality of user behavior sequences based on the respective difficulty and quality scores; sampling one or more user behavior sequences from the plurality of user behavior sequences with a probability proportional to the respective DQ scores; generating, by the sequential recommendation model, a second prediction from the sampled one or more user behavior sequences; training a sequence encoder of the sequential recommendation model in multiple stages, including: training the sequence encoder of the sequential recommendation model based on a training objective to minimize a loss computed via a noise contrastive estimation (NCE) module, wherein the loss is based on a comparison of the second prediction against a ground truth, corresponding to the sampled one or more user behavior sequences, and a noise distribution, resulting in a second set of parameters of the sequential recommendation model; updating the respective combined DQ score for each of the plurality of user behavior sequences based on the sequential recommendation model with the second set of parameters; and training the sequence encoder of the sequential recommendation model with the second set of parameters utilizing the updated respective combined DQ score for each of the plurality of user behavior sequences, resulting in a third set of parameters of the sequential recommendation model; and generating, by the trained sequential recommendation model with the third set of parameters integrated at a recommender system, a next recommended item predicted based on an input user behavior sequence received via a user interface; and displaying, via the user interface, the next recommended item.
  2. The method of claim 1, wherein the difficulty score of the at least one user behavior sequence is based on an accuracy of next-item predictions of the sequential recommendation model for the at least one user behavior sequence.
  3. The method of claim 1, wherein the difficulty score of the at least one user behavior sequence is combined with a previous difficulty score of the at least one user behavior sequence.
  4. The method of claim 1, wherein the quality score of the at least one user behavior sequence is based on a measure of variance of prediction scores of the sequential recommendation model across items in the at least one user behavior sequence.
  5. The method of claim 1, wherein the quality score of the at least one user behavior sequence is combined with a previous quality score of the at least one user behavior sequence.
  6. The method of claim 1, wherein the determining the respective combined DQ score of the at least one user behavior sequence comprises summing a square of the difficulty score with a square of the quality score.
  7. The method of claim 6, wherein the determining the respective combined DQ score of the at least one user behavior sequence further comprises raising the sum of the squares to a predetermined power.
  8. The method of claim 1, wherein the determining the respective combined DQ score of the at least one user behavior sequence comprises weighting the difficulty score and the quality score differently.
  9. The method of claim 1, wherein the difficulty score and the quality score are iteratively updated based on the trained sequential recommendation model.
  10. A system for automatically generating a sequential recommendation of multiple items in a sequence to a user, comprising: a memory that stores a sequential recommendation model; a communication interface that receives a plurality of user behavior sequences; and one or more hardware processors that: receives, via the communication interface, a training dataset comprising a plurality of user behavior sequences; augments the training dataset with a plurality of synthetic user behavior sequences generated by a trained extrapolator based on a subset of user behavior sequences of the plurality of user behavior sequences; determines, via a DQ score generator, for at least one user behavior sequence from the training dataset: a difficulty score with an inverse relation to a probability of a sequential recommendation model with a first set of parameters correctly recommending an item in the at least one user behavior sequence, and a quality score based on predictions of the sequential recommendation model with the first set of parameters; determines, via the DQ score generator, a respective combined difficulty and quality (DQ) score for each of the plurality of user behavior sequences based on the respective difficulty and quality scores; samples one or more user behavior sequences from the plurality of user behavior sequences with a probability proportional to the respective DQ scores; trains a sequence encoder of the sequential recommendation model in multiple stages, including: training the sequence encoder of the sequential recommendation model based on a training objective to minimize a loss computed via a noise contrastive estimation (NCE) module, wherein the loss is based on a comparison of predictions generated by the sequential recommendation model against a ground-truth from the sampled one or more user behavior sequences, and a noise distribution, resulting in a second set of parameters of the sequential recommendation model, updating the respective combined DQ score for each of the plurality of user behavior sequences based on the sequential recommendation model with the second set of parameters, and training the sequence encoder of the sequential recommendation model with the second set of parameters utilizing the updated respective combined DQ score for each of the plurality of user behavior sequences, resulting in a third set of parameters of the sequential recommendation model; generates, by the trained sequential recommendation model with the third set of parameters integrated with a recommender system, a next recommended item predicted based on an input user behavior sequence received via a user interface; and displaying, via the user interface, the next recommended item.
  11. The system of claim 10, wherein the difficulty score of the at least one user behavior sequence is based on an accuracy of next-item predictions of the sequential recommendation model for the at least one user behavior sequence.
  12. The system of claim 10, wherein the difficulty score of the at least one user behavior sequence is combined with a previous difficulty score of the at least one user behavior sequence.
  13. The system of claim 10, wherein the quality score of the at least one user behavior sequence is based on a measure of variance of prediction scores of the sequential recommendation model across items in the at least one user behavior sequence.
  14. The system of claim 10, wherein the quality score of the at least one user behavior sequence is combined with a previous quality score of the at least one user behavior sequence.
  15. The system of claim 10, wherein the determining the respective combined DQ score of the at least one user behavior sequence comprises summing a square of the difficulty score with a square of the quality score.
  16. The system of claim 15, wherein the determining the respective combined DQ score of the at least one user behavior sequence further comprises raising the sum of the squares to a predetermined power.
  17. The system of claim 10, wherein the determining the respective combined DQ score of the at least one user behavior sequence comprises weighting the difficulty score and the quality score differently.
  18. A non-transitory machine-readable medium comprising a plurality of machine-executable instructions which, when executed by one or more processors, are adapted to cause the one or more processors to perform operations comprising: receiving, via a communication interface, a training dataset comprising a plurality of user behavior sequences; determining, for at least one user behavior sequence from the training dataset: a difficulty score with an inverse relation to a probability of a sequential recommendation model with a first set of parameters correctly recommending an item in the at least one user behavior sequence, and a quality score based on a first prediction generated by the sequential recommendation model with the first set of parameters; determining a first set of difficulty and quality (DQ) scores based on a combination of the difficulty score and the quality score, corresponding to the plurality of user behavior sequences, respectively; training a sequence encoder of the sequential recommendation model in multiple stages, including: updating a first set of parameters of the sequence encoder of the sequential recommendation model based on a training objective to minimize a loss computed via a noise contrastive estimation (NCE) module, wherein the loss is based on a comparison of predictions generated by the sequential recommendation model against a ground-truth from the plurality of user behavior sequences sampled based on respective DQ scores of the first set of DQ scores, and a noise distribution, resulting in a second set of parameters of the sequential recommendation model, determining a second set of DQ scores based on the sequential recommendation model with the second set of parameters, and updating the second set of parameters of the sequential recommendation model utilizing the second set of DQ scores, resulting in a third set of parameters of the sequential recommendation model; selecting a first subset of user behavior sequences and a second subset of user behavior sequences from the plurality of user behavior sequences based on the second set of DQ scores; training an extrapolator using the first subset of user behavior sequences; generating, by the trained extrapolator, a plurality of synthetic user behavior sequences based on the second subset of user behavior sequences; and training the sequence encoder of the sequential recommendation model using user behavior sequences sampled from the training dataset with a probability proportional to respective DQ scores of the second set of DQ scores; fine-tuning the sequence encoder of the sequential recommendation model using the plurality of synthetic user behavior sequences; generating, by the fine-tuned recommendation model, a next recommended item predicted based on an input user behavior sequence received via a user interface; and displaying, via the user interface, the next recommended item.
  19. The non-transitory machine-readable medium of claim 18, wherein the first subset of user behavior sequences includes user behavior sequences with a respective quality score above a first predetermined threshold, and a respective difficulty score below a second predetermined threshold.
  20. The non-transitory machine-readable medium of claim 18, wherein the second subset of user behavior sequences includes user behavior sequences with a respective quality score above a first predetermined threshold, and a respective difficulty score above a second predetermined threshold.
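The score combination in claims 6 to 8 and the proportional sampling step in claim 1 can be sketched in a few lines. This is an illustrative sketch only: the function names `dq_score` and `sample_by_dq`, the default power of 0.5, the way the weights are applied to the squared terms, and the toy scores are assumptions, since the claims state only that the squares are summed, optionally raised to a predetermined power, and that the two scores may be weighted differently.

```python
import random
from typing import List, Optional, Sequence


def dq_score(
    difficulty: float,
    quality: float,
    power: float = 0.5,         # assumed value; the claims say "predetermined"
    w_difficulty: float = 1.0,  # assumed weighting scheme for this sketch
    w_quality: float = 1.0,
) -> float:
    """Weighted sum of squared scores, raised to a fixed power."""
    return (w_difficulty * difficulty ** 2 + w_quality * quality ** 2) ** power


def sample_by_dq(
    sequences: Sequence[str],
    scores: Sequence[float],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw k sequences (with replacement), each with probability
    proportional to its DQ score."""
    rng = rng or random.Random()
    return rng.choices(sequences, weights=scores, k=k)


# Hypothetical (difficulty, quality) pairs for three toy sequences.
toy = {"seq_a": (0.3, 0.4), "seq_b": (0.6, 0.8), "seq_c": (0.0, 0.0)}
scores = {name: dq_score(d, q) for name, (d, q) in toy.items()}
batch = sample_by_dq(list(scores), list(scores.values()), k=5,
                     rng=random.Random(0))
```

With the default power of 0.5 the combined score is the Euclidean norm of the (difficulty, quality) pair, so `seq_b` is about twice as likely to be drawn as `seq_a`, and `seq_c`, with a score of zero, is never drawn. Because the claims describe the scores as being updated as the model's parameters change, a training loop following this sketch would recompute `scores` between stages.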

Description

TECHNICAL FIELD

The embodiments relate generally to natural language processing and machine learning systems, and more specifically to systems and methods for self-guided sequence selection and extrapolation.

BACKGROUND

Machine learning systems have been widely used in sequential recommendation tasks. Sequential recommendation provides a sequence of recommended items that capture item relationships and behaviors of users, e.g., recommending a water bottle holder after a user purchases a water bottle. Dataset limitations may pose difficulty in training accurate sequential recommender models, because sequential recommendation data can often be sparse or, on the other hand, rich but redundant and/or noisy. Therefore, there is a need for improved systems and methods for effectively using training datasets for training sequential recommendation models.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified diagram illustrating a method for training a sequential recommendation model using a difficulty and quality (DQ) score according to some embodiments.
FIG. 2 is a simplified diagram illustrating a plot of DQ score and self-guided sequence selection according to some embodiments.
FIG. 3 is a simplified diagram illustrating training and utilizing a meta-extrapolator according to some embodiments.
FIG. 4 illustrates an exemplary input user behavior sequence and a corresponding synthetic user behavior sequence.
FIG. 5A provides an example pseudo-code illustrating an example algorithm for training a sequential recommendation system, according to some embodiments.
FIG. 5B provides an example logic flow diagram illustrating an example algorithm for training a sequential recommendation system, according to some embodiments.
FIG. 6 is a simplified diagram illustrating a computing device implementing the methods described in FIGS. 1-5, according to one embodiment described herein.
FIG. 7 is a simplified block diagram of a networked system suitable for implementing the framework described in FIGS. 1-6 and other embodiments described herein.
FIGS. 8-13 provide example tables illustrating example performance of different summarization models and training methods discussed herein.

Embodiments of the disclosure and their advantages are best understood by referring to the detailed description that follows. It should be appreciated that like reference numerals are used to identify like elements illustrated in one or more of the figures, wherein showings therein are for purposes of illustrating embodiments of the disclosure and not for purposes of limiting the same.

DETAILED DESCRIPTION

As used herein, the term “network” may comprise any hardware or software-based framework that includes any artificial intelligence network or system, neural network or system and/or any training or learning models implemented thereon or therewith.

As used herein, the term “module” may comprise a hardware or software-based framework that performs one or more functions. In some embodiments, the module may be implemented on one or more neural networks.

Sequential recommendation (SR) systems are used to predict a user's interest in items based on the user's historical interactions. Implicit user behavior sequences (such as reviews, clicks, and ratings) are common in building modern recommender systems because of their ubiquity. However, these sequences in the real world can be sparse (scarce), or rich but redundant, or noisy, which makes it challenging to efficiently train an accurate model from imperfect training data.

In view of the need for improved systems and methods for effectively using training datasets for training sequential recommendation models, embodiments described herein provide for training a sequential recommendation model governed by difficulty and quality (DQ) scores that evaluate the training samples.
The difficulty component of a DQ score is based on the accuracy of the model's predictions with its current parameters. The quality component of a DQ score is based on the variance of the current model's prediction accuracy across the members of a sequence. The DQ score is computed in an unsupervised fashion and is dynamically updated along with the model. User behavior sequences can then be sampled during training based on their DQ scores. In this way, high-quality and informative (difficult) sequences may be selected for training the model to achieve improved training performance. Rather than simply removing noisy data from a training dataset, which may exacerbate sparsity problems, input sequences may be sampled intelligently, thereby utilizing all of the data while focusing more on the useful data.

In one implementation, a meta-extrapolator, which may be used to generate additional training user sequences, may also be trained based on user behavior sequences sampled according to DQ score. The meta-extrapolator may be trained with high-quality, low-difficulty sequences. The meta-extrapolator may then be used with an i