JP-2026514212-A - Method and system for extracting product review keywords based on language models

Abstract

This disclosure provides a language model-based product review keyword extraction method, which is performed by at least one processor. The method includes the steps of: collecting product-related review data based on information for identifying the product; generating at least one response data from at least a portion of the review data based on a predetermined set of questions using a language model; and extracting product-related review keywords based on the at least one response data.

Inventors

  • カン ジェウク
  • ソン ボギョン
  • パク ドンジュ
  • チェ ソンジェ
  • クォン ヘナ
  • パク ボヨン
  • キム スヨン
  • パク ドンウク

Assignees

  • ネイバー コーポレーション

Dates

Publication Date
2026-05-07
Application Date
2024-02-28
Priority Date
2023-03-08

Claims (20)

  1. A method for extracting product review keywords based on a language model, the method being executed by at least one processor and comprising: collecting review data related to a product based on information for identifying the product; generating, using a language model, at least one response data from at least a portion of the review data based on a predetermined set of questions; and extracting review keywords related to the product based on the at least one response data.
  2. The product review keyword extraction method according to claim 1, further comprising the step of removing spam review data and promotional review data from review data related to the product using a machine learning model.
  3. The product review keyword extraction method according to claim 1, wherein generating the at least one response data from at least a portion of the review data based on the predetermined set of questions using the language model comprises: determining, using the language model, whether the review data contains answers to at least some of the predetermined set of questions; and, if it is determined that the review data contains answers to at least some of the predetermined set of questions, determining at least a portion of the review data as answer data corresponding to the at least some of the predetermined set of questions.
  4. The product review keyword extraction method according to claim 1, further comprising training the language model using a predetermined training dataset, wherein the predetermined training dataset includes at least one of document data and question data.
  5. The product review keyword extraction method according to claim 4, wherein training the language model using the predetermined training dataset comprises: pseudo-labeling, using a first generative model, at least a portion of the document data as first answer data for a specific question among the question data; and training a second generative model using the specific question and a portion of the first answer data.
  6. The product review keyword extraction method according to claim 5, wherein the remaining portion of the first answer data is accepted answer data, and wherein training the language model using the predetermined training dataset further comprises: training the second generative model using the specific question and the remaining portion of the first answer data; and labeling, through the second generative model, a portion of the first answer data as second answer data for the specific question.
  7. The product review keyword extraction method according to claim 1, wherein extracting the review keywords related to the product based on the at least one response data comprises: post-processing the at least one response data; and extracting at least one review keyword related to the product from the post-processed response data.
  8. The product review keyword extraction method according to claim 7, wherein post-processing the at least one response data comprises removing response data containing duplicate sentences from the at least one response data.
  9. The product review keyword extraction method according to claim 7, wherein post-processing the at least one response data comprises: determining a sentence in the review data that corresponds to at least a portion of the response data, based on a match score between the at least a portion of the response data and the review data; and, if the match score is equal to or greater than a predetermined threshold, replacing the at least a portion of the response data with the sentence from the review data.
  10. The product review keyword extraction method according to claim 9, wherein post-processing the at least one response data further comprises removing the at least a portion of the response data if the match score is less than the predetermined threshold.
  11. The product review keyword extraction method according to claim 7, wherein post-processing the at least one response data comprises removing, from a plurality of response data that are in an inclusion relationship among the at least one response data, the remaining response data other than one of the plurality of response data.
  12. The product review keyword extraction method according to claim 11, wherein post-processing the at least one response data comprises removing, from the plurality of response data, the remaining response data other than the longest response data.
  13. The product review keyword extraction method according to claim 1, wherein extracting the review keywords related to the product based on the at least one response data comprises: converting the answer data for the predetermined set of questions into embedding vectors; and generating at least one group based on the distances between the embedding vectors.
  14. The product review keyword extraction method according to claim 13, wherein extracting the review keywords related to the product based on the at least one response data further comprises extracting a representative keyword from each of the at least one group.
  15. The product review keyword extraction method according to claim 1, wherein the information for identifying the aforementioned product is at least one of a predetermined product name, product number, or catalog ID related to the product.
  16. The product review keyword extraction method according to claim 1, wherein the review data is collected from blogs and online shopping malls, the review data related to some of the predetermined set of questions is collected from the blogs, and the review data related to the remaining questions of the predetermined set of questions is collected from the online shopping malls.
  17. The product review keyword extraction method according to claim 1, wherein the review data related to the aforementioned product is review data generated within a predetermined period.
  18. The product review keyword extraction method according to claim 1, further comprising, after collecting the review data related to the product based on the information for identifying the product, removing predetermined prohibited words or special characters from the review data related to the product.
  19. A program for executing the method described in claim 1 on a computer.
  20. A system comprising: a communication module; a memory; and at least one processor connected to the memory and configured to execute at least one computer-readable program contained in the memory, wherein the at least one program includes instructions for: collecting review data related to a product based on information for identifying the product; generating, using a language model, at least one response data from at least a portion of the review data based on a predetermined set of questions; and extracting review keywords related to the product based on the at least one response data.
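
The post-processing steps of claims 8 to 12 and the grouping step of claim 13 can be illustrated with a minimal Python sketch. It is not the applicant's implementation, and everything concrete in it is an assumption made for illustration: the claims do not define the match score, so `difflib.SequenceMatcher` similarity stands in for it; the threshold values (0.8 and 0.5) are arbitrary; "inclusion relationship" is read as plain substring containment; "duplicate sentences" is read as exact repeats; and `embed` is a toy bag-of-words vector rather than a real embedding model. All function names are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def match_score(a: str, b: str) -> float:
    # Assumed score: difflib character-level similarity in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def ground_in_review(answer, review_sentences, threshold=0.8):
    # Claims 9-10: swap the answer for its best-matching review sentence,
    # or return None (drop it) when the best score is below the threshold.
    if not review_sentences:
        return None
    best = max(review_sentences, key=lambda s: match_score(answer, s))
    return best if match_score(answer, best) >= threshold else None


def drop_duplicates(answers):
    # Claim 8 (one reading): keep only the first copy of each repeated answer.
    return list(dict.fromkeys(answers))


def drop_contained(answers):
    # Claims 11-12: drop any answer that is a substring of a longer answer,
    # so only the longest answer of each inclusion chain survives.
    return [a for a in answers if not any(a != b and a in b for b in answers)]


def post_process(answers, review_sentences, threshold=0.8):
    grounded = (ground_in_review(a, review_sentences, threshold) for a in answers)
    kept = [g for g in grounded if g is not None]
    return drop_contained(drop_duplicates(kept))


def embed(text):
    # Toy bag-of-words vector standing in for a real embedding model.
    return Counter(text.lower().split())


def cosine_distance(u, v):
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def group_answers(answers, max_distance=0.5):
    # Claim 13: greedy grouping; an answer joins the first group whose
    # first member lies within max_distance, otherwise it starts a new group.
    groups = []
    for answer in answers:
        for group in groups:
            if cosine_distance(embed(answer), embed(group[0])) <= max_distance:
                group.append(answer)
                break
        else:
            groups.append([answer])
    return groups
```

For example, with review sentences `["battery lasts all day", "screen is bright", "screen is bright and sharp outdoors"]` and model answers `["battery lasts all day!", "screen is bright", "screen is bright and sharp outdoors", "ships with a free case", "battery lasts all day"]`, `post_process` returns `["battery lasts all day", "screen is bright and sharp outdoors"]`: the unsupported answer is dropped, the repeat is removed, and the shorter contained answer gives way to the longer one. Grounding each answer in a literal review sentence is one way to keep generated text from drifting away from what reviewers actually wrote.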

Description

This disclosure relates to a method and system for extracting keywords from product reviews and, more specifically, to a method and system that use a language model to generate response data from product-related review data based on a predetermined set of questions, and that extract keywords based on the response data.

With the recent growth of online transactions, the types of goods traded online have diversified, and prospective buyers often consult reviews from other buyers before purchasing a product. If a product lacks reviews, potential buyers may hesitate to purchase it, even if it is cheaper than similar products. Product reviews therefore significantly influence purchasing decisions when buying goods online. Buyers can access product reviews through blogs, internet cafes, and product review comments on online shopping malls (or smart stores). However, not all product reviews are reliable: some exaggerate product benefits or are promotional posts made for advertising purposes. Furthermore, as the volume of product reviews searchable online becomes enormous, sellers and prospective buyers may spend considerable time and effort searching for product information that meets their needs.

Embodiments of the present disclosure will be described with reference to the accompanying drawings described below, in which similar reference numbers represent similar elements, but the embodiments are not limited thereto. The drawings, each according to one embodiment of the present disclosure, show:

  • an example of a product review keyword extraction method;
  • a schematic diagram of a configuration in which an information processing system is connected to communicate with multiple user terminals in order to extract review keywords for a product;
  • a block diagram of the internal configuration of a user terminal and the information processing system;
  • an example of a procedure for collecting review data;
  • an example of filtering review data according to its properties;
  • an example of a procedure for extracting review keywords from review data;
  • an example of preprocessing review data;
  • an example of training a language model to generate response data;
  • an example of post-processing the generated response data;
  • an example of review keywords;
  • a flowchart of an example of the method.

The specific details for implementing this disclosure are described below with reference to the attached drawings. However, in the following description, detailed explanations of widely known functions and configurations are omitted where they could unnecessarily obscure the essence of this disclosure.

In the attached drawings, identical or corresponding components are denoted by the same reference numerals. Furthermore, in the following descriptions of embodiments, redundant descriptions of identical or corresponding components may be omitted. However, the omission of a description of a component does not mean that the component is not included in any embodiment.

The advantages and features of the disclosed embodiments, and the methods for achieving them, will become clearer by referring to the embodiments described below together with the accompanying drawings. However, this disclosure is not limited to the embodiments disclosed below and can be realized in a variety of different forms.
These embodiments are provided merely to complete the disclosure and to enable a person of ordinary skill in the art to fully understand the scope of the invention. The terminology used herein is briefly explained below, followed by a detailed description of the disclosed embodiments. The terminology used herein has been selected, as far as possible, from general terms that are currently in wide use, taking into account their function in this disclosure; however, these terms may change depending on the intent of practitioners in the relevant field, case law, and the emergence of new technologies. In certain cases, the applicant has also arbitrarily selected terms, in which case their meaning will be specifically described in the corresponding description of the invention. Therefore, the terminology used in this disclosure should be defined not merely by the names of the terms, but based on the meaning of each term and the overall context of this disclosure. In this specification, a singular expression includes a plural expression unless expli