US-12619644-B2 - Text recommendation method and apparatus, model training method and apparatus, and readable storage medium
Abstract
Provided in the present disclosure are a text recommendation method and apparatus, a model training method and apparatus, and a readable storage medium. The text recommendation method includes: acquiring text retrieval information from a user; when it is determined that there is historical text retrieval information for the user, determining text information of each text in a text set retrieved by using the text retrieval information; performing an embedding representation on the text information of each text based on a self-attention model, and determining a text embedding vector of each text; inputting the text embedding vector of each text into a trained graph convolutional network model, to obtain the probability of interaction between the user and each text in the text set; and screening out, from the text set, target text which meets a preset interaction probability, and recommending the target text to the user.
Inventors
- Ge Ou
- Boran Jiang
- Chao Ji
- Shuqi Wei
- Hongxiang Shen
Assignees
- Beijing Boe Technology Development Co., Ltd.
- BOE TECHNOLOGY GROUP CO., LTD.
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2021-09-18
Claims (16)
- 1 . A text recommendation method, comprising: obtaining text retrieval information from a user; determining text information of each text in a text collection retrieved by using the text retrieval information in a case of determining there is historical text retrieval information of the user, wherein the text collection comprises at least one text; performing an embedding representation on the text information of each text based on a self-attention model to determine a text embedding vector of each text; inputting the text embedding vector of each text into a trained graph convolutional network model to obtain an interaction probability of the user for each text in the text collection, wherein the trained graph convolutional network model is constructed based on a pre-built text knowledge graph and the historical text retrieval information for the user; and screening out a target text meeting a preset interaction probability from the text collection, and recommending the target text to the user; wherein before obtaining the text retrieval information from the user, the method further comprises: constructing the text knowledge graph according to a preset text resource library, wherein the text knowledge graph comprises multiple triplets in a format of head entity-relationship-tail entity, and each of the multiple triplets comprises an entity set comprising a head entity and a tail entity and a relationship set representing a relationship between the head entity and the tail entity; wherein before obtaining the text retrieval information from the user, the method further comprises: obtaining the historical text retrieval information of the user; determining a historical text collection retrieved by the user according to the historical text retrieval information, wherein the historical text collection comprises at least one historical text; determining a text embedding vector corresponding to each of the historical texts based on the self-attention model; and 
inputting the text knowledge graph and the text embedding vector corresponding to each of the historical texts into a graph convolutional network model to be trained, training the graph convolutional network model to be trained, and obtaining the trained graph convolutional network model; wherein after determining the historical text collection retrieved by the user according to the historical text retrieval information, the method further comprises: establishing an interaction triplet for representing an interaction between the user and each historical text according to a preset format of user identification number-text identification number-interaction relationship; and determining text information of each historical text according to the interaction triplet and the text knowledge graph.
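As an illustrative, non-limiting sketch of the final prediction and screening steps of claim 1, the thresholding can be expressed as follows. The sigmoid prediction function, the function name `recommend`, and the example threshold of 0.5 are assumptions for illustration only; the claim itself only requires a prediction function and a preset interaction probability.

```python
import numpy as np

def recommend(text_embeddings, user_vector, texts, threshold=0.5):
    """Rank texts by predicted user-text interaction probability.

    text_embeddings: (n_texts, dim) final text representation vectors
    (per the claim, produced by a self-attention model and a trained
    graph convolutional network, both outside the scope of this sketch).
    """
    # Hypothetical prediction function: sigmoid of the inner product of
    # the user representation vector and each text representation.
    probs = 1.0 / (1.0 + np.exp(-(text_embeddings @ user_vector)))
    # Screen out target texts meeting the preset interaction probability.
    return [(t, float(p)) for t, p in zip(texts, probs) if p >= threshold]
```

Texts whose predicted probability clears the preset value are the target texts recommended to the user.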
- 2 . The method according to claim 1 , wherein after determining the text information of each text in the text collection retrieved by using the text retrieval information, the method further comprises: extracting summary information from the text information of each text, wherein the summary information represents a generalized expression of the text; wherein the performing of the embedding representation on the text information of each text based on the self-attention model to determine the text embedding vector of each text, comprises: performing an embedding representation on the summary information of each text based on the self-attention model to determine a summary embedding vector of each text, and using the summary embedding vector as the text embedding vector of the text.
- 3 . The method according to claim 1 , wherein before performing the embedding representation on the text information of each text based on the self-attention model to determine the text embedding vector of each text, the method further comprises: performing data cleaning on the text information of each text, removing stop words, and obtaining cleaned data; and determining a corresponding word embedding vector according to the cleaned data of the text information of each text.
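A minimal sketch of the cleaning step of claim 3, assuming English text and an illustrative stop-word subset (the claim fixes neither the language nor the stop-word list):

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}  # illustrative subset

def clean_text(text):
    """Data cleaning per claim 3: normalize case, strip punctuation
    and other noise characters, and remove stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation with spaces
    return [w for w in text.split() if w not in STOP_WORDS]
```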
- 4 . The method according to claim 3 , wherein the determining of the corresponding word embedding vector according to the cleaned data of the text information of each text, comprises: segmenting the cleaned data of the text information of each text into words, and converting each of the words in the cleaned data into a corresponding word identification number according to dictionary information of the self-attention model; and using the word identification number as a word embedding vector of the text information.
- 5 . The method according to claim 4 , wherein after converting each word in the cleaned data into the corresponding word identification number, the method further comprises: determining an identification number sequence corresponding to the text information of each text and a corresponding sequence length; and processing the cleaned data corresponding to the text information of each text to adjust the corresponding identification number sequence to a preset sequence length.
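Claims 4 and 5 together describe mapping each word to a dictionary identification number and then adjusting the ID sequence to a preset length. A sketch, in which `vocab`, `pad_id`, and `unk_id` are hypothetical stand-ins for the self-attention model's dictionary information:

```python
def encode_and_pad(tokens, vocab, max_len, pad_id=0, unk_id=1):
    """Map words to dictionary IDs (claim 4), then truncate or pad the
    ID sequence to a preset sequence length (claim 5)."""
    ids = [vocab.get(w, unk_id) for w in tokens]
    ids = ids[:max_len]                     # truncate long sequences
    ids += [pad_id] * (max_len - len(ids))  # pad short sequences
    return ids
```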
- 6 . The method according to claim 1 , wherein the inputting of the text embedding vector of each text into the trained graph convolutional network model to obtain the interaction probability of the user for each text in the text collection, comprises: determining an entity set and a relationship set comprised in each text in the text collection according to the text knowledge graph, wherein the entity set comprises multiple entities, and the relationship set comprises multiple relationships; determining an importance of a target relationship in the relationship set to the user according to a quantity of neighbors of a target entity in the entity set, wherein the larger the quantity of neighbors, the more neighbors connected to the target entity; determining a neighbor representation vector of the target entity according to the importance; aggregating an initial representation vector and the neighbor representation vector of the target entity to determine a first-order entity representation of the target entity, wherein in a case that the target entity is a text entity, the text embedding vector of the text is used as the initial representation vector; obtaining a final representation vector of the target entity after passing through h layers of the trained graph convolutional network model, wherein h is a positive integer; and inputting the final representation vector and a user representation vector representing the user into a prediction function to obtain an interaction probability of the user to a corresponding text.
- 7 . The method according to claim 6 , wherein the importance of the target relationship in the relationship set to the user is determined by the following formula: π_r^u = u · r + α_r, wherein u represents the user representation vector of the user, r represents a vector representation of the target relationship, and α_r represents the quantity of neighbors of the target entity.
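The claim-7 importance score combines the inner product of the user and relationship vectors with a neighbor-count term. A sketch treating that term as a scalar bias `alpha` (an interpretation; the claim only states that α reflects the quantity of neighbors of the target entity):

```python
import numpy as np

def relation_importance(u, r, alpha):
    """Claim-7 importance score: pi_r^u = u . r + alpha, where u is the
    user representation vector, r the relationship vector, and alpha a
    scalar derived from the target entity's neighbor count."""
    return float(np.dot(u, r) + alpha)
```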
- 8 . The method according to claim 6 , wherein the first-order entity representation of the target entity is determined by the following formula: agg = σ(w(v + v_{S(v)}^u) + b), wherein σ represents an activation function, w and b represent trainable parameters, v represents the initial representation vector of the target entity, and v_{S(v)}^u represents the neighbor representation vector of the target entity.
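The claim-8 aggregation can be sketched as below, taking σ to be the sigmoid (an assumption; the claim only names it an activation function) and w, b as trainable parameters:

```python
import numpy as np

def aggregate(v, v_neigh, W, b):
    """Claim-8 first-order aggregation: agg = sigma(W(v + v_neigh) + b),
    where v is the target entity's initial representation and v_neigh
    its neighbor representation; sigmoid is assumed for sigma."""
    z = W @ (v + v_neigh) + b
    return 1.0 / (1.0 + np.exp(-z))
```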
- 9 . A non-transitory computer-readable storage medium, wherein: the computer-readable storage medium stores computer instructions; and the computer instructions, when run on the computer, cause the computer to execute the text recommendation method according to claim 1 .
- 10 . The method according to claim 1 , wherein the determining of the text information of each historical text according to the interaction triplet and the text knowledge graph, comprises: determining the text identification number of each historical text from the interaction triplet; and determining the text information of the historical text from the text knowledge graph according to the text identification number of each historical text.
- 11 . A non-transitory computer-readable storage medium, wherein: the computer-readable storage medium stores computer instructions; and the computer instructions, when run on the computer, cause the computer to execute the text recommendation method according to claim 2 .
- 12 . A model training method, comprising: obtaining historical text retrieval information of a user; determining a historical text collection retrieved by the user according to the historical text retrieval information, wherein the historical text collection comprises at least one historical text; determining a text embedding vector corresponding to each of the historical texts based on a self-attention model; and inputting a pre-built text knowledge graph and the text embedding vector corresponding to each historical text into a graph convolutional network model to be trained, training the graph convolutional network model to be trained, and obtaining a trained graph convolutional network model; wherein after determining the historical text collection retrieved by the user according to the historical text retrieval information, the method further comprises: establishing an interaction triplet for representing an interaction between the user and each historical text according to a preset format of user identification number-text identification number-interaction relationship; and determining text information of each historical text according to the interaction triplet and the text knowledge graph.
- 13 . The method according to claim 12 , wherein an operation of inputting the pre-built text knowledge graph and the text embedding vector corresponding to each historical text into the graph convolutional network model to be trained, training the graph convolutional network model to be trained, and obtaining the trained graph convolutional network model, comprises: determining an entity set and a relationship set comprised in each historical text in the interaction triplet according to the text knowledge graph, wherein the entity set comprises a plurality of entities, and the relationship set comprises a plurality of relationships; determining an importance of a target relationship in the relationship set to the user according to a quantity of neighbors of a target entity in the entity set, wherein the larger the quantity of neighbors, the more neighbors connected to the target entity; determining a neighbor representation vector of the target entity according to the importance; aggregating an initial representation vector and the neighbor representation vector of the target entity to determine a first-order entity representation of the target entity, wherein in a case that the target entity is a text entity, the text embedding vector of the historical text is used as the initial representation vector; obtaining a final representation vector of the target entity after passing through h layers of the graph convolutional network model to be trained, wherein h is a positive integer; inputting the final representation vector and the user representation vector representing the user into a prediction function to obtain a predicted interaction probability of the user to a corresponding historical text; calculating a loss value according to the predicted interaction probability and the interaction relationship of the user to the corresponding historical text; and updating a weight of the graph convolutional network model to be trained by using the loss value to obtain 
the trained graph convolutional network model.
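The loss-and-update step of claim 13 can be sketched as a single training step. Binary cross-entropy and plain gradient descent are assumptions (the claim only requires a loss value and a weight update), and for brevity only the user representation vector is updated here; the graph convolutional network weights would be updated in the same manner.

```python
import numpy as np

def train_step(user_vec, text_vecs, labels, lr=0.1):
    """One illustrative training step per claim 13: predict interaction
    probabilities, compute a binary cross-entropy loss against the
    observed interaction relationships, and apply a gradient update."""
    probs = 1.0 / (1.0 + np.exp(-(text_vecs @ user_vec)))
    eps = 1e-9  # numerical guard for log()
    loss = -np.mean(labels * np.log(probs + eps)
                    + (1 - labels) * np.log(1 - probs + eps))
    # Gradient of the loss with respect to the user vector.
    grad = text_vecs.T @ (probs - labels) / len(labels)
    return user_vec - lr * grad, float(loss)
```

Repeated application of such a step drives the loss down, yielding the trained model.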
- 14 . A model training apparatus, comprising: a second memory and a second processor; wherein the second memory is configured to store computer programs; and the second processor is configured to execute the computer programs in the second memory to perform the model training method according to claim 12 .
- 15 . A non-transitory computer-readable storage medium, wherein: the computer-readable storage medium stores computer instructions; and the computer instructions, when run on the computer, cause the computer to execute the model training method according to claim 12 .
- 16 . A text recommendation apparatus, comprising: a first memory and a first processor; wherein the first memory is configured to store computer programs; and the first processor is configured to execute the computer programs in the first memory to perform: obtaining text retrieval information from a user; determining text information of each text in a text collection retrieved by using the text retrieval information in a case of determining there is historical text retrieval information of the user, wherein the text collection comprises at least one text; determining a text embedding vector for each text based on a self-attention model; inputting the text embedding vector of each text into a trained graph convolutional network model to obtain an interaction probability of the user for each text in the text collection, wherein the trained graph convolutional network model is constructed based on a pre-built text knowledge graph and the historical text retrieval information for the user; and screening out a target text meeting a preset interaction probability from the text collection, and recommending the target text to the user; wherein the first processor is further configured to execute the computer programs in the first memory to perform: constructing the text knowledge graph according to a preset text resource library, wherein the text knowledge graph comprises multiple triplets in a format of head entity-relationship-tail entity, and each of the multiple triplets comprises an entity set comprising a head entity and a tail entity and a relationship set representing a relationship between the head entity and the tail entity; wherein the first processor is further configured to execute the computer programs in the first memory to perform: obtaining the historical text retrieval information of the user; determining a historical text collection retrieved by the user according to the historical text retrieval information, wherein the historical text collection 
comprises at least one historical text; determining a text embedding vector corresponding to each of the historical texts based on the self-attention model; and inputting the text knowledge graph and the text embedding vector corresponding to each of the historical texts into a graph convolutional network model to be trained, training the graph convolutional network model to be trained, and obtaining the trained graph convolutional network model; wherein the first processor is further configured to execute the computer programs in the first memory to perform: establishing an interaction triplet for representing an interaction between the user and each historical text according to a preset format of user identification number-text identification number-interaction relationship; and determining text information of each historical text according to the interaction triplet and the text knowledge graph.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

The disclosure is a National Stage of International Application No. PCT/CN2021/119434, filed on Sep. 18, 2021, the content of which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The present disclosure relates to the field of computer technology, and in particular to a text recommendation method and apparatus, a model training method and apparatus, and a readable storage medium.

BACKGROUND

With the rapid development of Internet technology, people can easily obtain a large amount of information, such as news and commodities, without leaving home. To effectively improve the user experience, a recommender system is often used to screen out, from this large amount of information, the fields and content that the user is interested in, so as to make targeted recommendations to the user.

SUMMARY

The present disclosure provides a text recommendation method and apparatus, a model training method and apparatus, and a readable storage medium for improving the accuracy of text recommendation results.
In a first aspect, embodiments of the present disclosure provide a text recommendation method, which includes: obtaining text retrieval information from a user; determining text information of each text in a text collection retrieved by using the text retrieval information in a case of determining there is historical text retrieval information of the user, wherein the text collection includes at least one text; performing an embedding representation on the text information of each text based on a self-attention model to determine a text embedding vector of each text; inputting the text embedding vector of each text into a trained graph convolutional network model to obtain an interaction probability of the user for each text in the text collection, wherein the trained graph convolutional network model is constructed based on a pre-built text knowledge graph and the historical text retrieval information for the user; and screening out a target text meeting a preset interaction probability from the text collection, and recommending the target text to the user. In a possible implementation, after determining the text information of each text in the text collection retrieved by using the text retrieval information, the method further includes: extracting summary information from the text information of each text, wherein the summary information represents a generalized expression of the text; wherein the performing of the embedding representation on the text information of each text based on the self-attention model to determine the text embedding vector of each text, includes: performing an embedding representation on the summary information of each text based on the self-attention model to determine a summary embedding vector of each text, and using the summary embedding vector as the text embedding vector of the text. 
In a possible implementation, before performing the embedding representation on the text information of each text based on the self-attention model to determine the text embedding vector of each text, the method further includes: performing data cleaning on the text information of each text, removing stop words, and obtaining cleaned data; and determining a corresponding word embedding vector according to the cleaned data of the text information of each text. In a possible implementation, the determining of the corresponding word embedding vector according to the cleaned data of the text information of each text, includes: segmenting the cleaned data of the text information of each text into words, and converting each of the words in the cleaned data into a corresponding word identification number according to dictionary information of the self-attention model; and using the word identification number as a word embedding vector of the text information. In a possible implementation, after converting each word in the cleaned data into the corresponding word identification number, the method further includes: determining an identification number sequence corresponding to the text information of each text and a corresponding sequence length; and processing the cleaned data corresponding to the text information of each text to adjust the corresponding identification number sequence to a preset sequence length. In a possible implementation, before obtaining the text retrieval information from the user, the method further includes: constructing the text knowledge graph according to a preset text resource library, wherein the text knowledge graph includes multiple triplets in a format of head entity-relationship-tail entity, and each of the multiple triplets includes an entity set including a head entity and a tail entity and a relationship set representing a relationship between the head entity and the tail entity. 
In a possible implementation, before obtaining the text retrieval information from the user, the method further includes: obtaining the historical text