
CN-115858950-B - Cross-city pre-training sequence interest point recommendation method and computer readable medium


Abstract

The invention provides a cross-city pre-training sequence point of interest recommendation method and a computer readable medium. The method comprises: obtaining, for each user in each city, multiple groups of point of interest access record sequences with their real point of interest access labels, and category access record sequences with their real category access labels; applying category-based and distance-based data enhancement, respectively, to obtain the enhanced point of interest access record sequences of each user in each city and their corresponding real point of interest access labels; constructing an improved Transformer encoder neural network and a loss function model, and optimizing them to obtain a pre-trained Transformer encoder neural network; constructing a fine-tuning model that incorporates the pre-trained Transformer encoder neural network, together with its loss function model, and optimizing it to obtain an optimized fine-tuning model; and collecting the access record sequence of a target-city user in real time and predicting the user's point of interest access label with the optimized fine-tuning model. The method greatly improves sequential point of interest recommendation on target-city data.
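
The first stage summarized above, turning each user's chronological access history into fixed-length sequences with a next-visit label (claim 1 does this with a sliding window), can be sketched as follows. This is a minimal illustration only: the function name `sliding_windows`, the sample POI ids, and the `category_of` lookup are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Sequence, Tuple


def sliding_windows(records: Sequence[int], length: int) -> List[Tuple[List[int], int]]:
    """Split one user's chronological visits into (sequence, label) pairs.

    Each sequence holds `length` consecutive visits; the label is the visit
    that immediately follows the window.
    """
    pairs = []
    for start in range(len(records) - length):
        window = list(records[start:start + length])
        label = records[start + length]
        pairs.append((window, label))
    return pairs


# Hypothetical data: POI ids visited by one user in one city, in time order,
# and a lookup from POI id to category id.
visits = [3, 7, 7, 1, 4, 9]
category_of: Dict[int, int] = {1: 0, 3: 0, 4: 1, 7: 2, 9: 1}

# POI access record sequences with their next-POI labels.
poi_pairs = sliding_windows(visits, length=3)
# Category access record sequences with their next-category labels.
cat_pairs = sliding_windows([category_of[p] for p in visits], length=3)
```

Both calls share the same window length, so the k-th category sequence lines up with the k-th point of interest sequence.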

Inventors

  • Qian Tieyun
  • Sun Ke

Assignees

  • Wuhan University (武汉大学)

Dates

Publication Date
20260512
Application Date
20221123

Claims (10)

  1. A cross-city pre-training sequence point of interest recommendation method, characterized by comprising the following steps:
     Step 1: introducing the access records of each user in each city at multiple historical moments; dividing the access records with a sliding window to obtain multiple groups of point of interest access record sequences for each user in each city; marking the real point of interest access record label of each group of point of interest access record sequences; obtaining the category of each point of interest access record in those sequences to construct multiple groups of category access record sequences for each user in each city; and marking the real category access record label of each group of category access record sequences;
     Step 2: applying category-based data enhancement to each group of point of interest access record sequences of each user in each city to obtain the corresponding category-enhanced point of interest access record sequences, and constructing the real point of interest access record label of each category-enhanced sequence; applying distance-based data enhancement to each group of point of interest access record sequences of each user in each city to obtain the corresponding distance-enhanced point of interest access record sequences, and constructing the real point of interest access record label of each distance-enhanced sequence;
     Step 3: constructing an improved Transformer encoder neural network; inputting, in parallel, each group of point of interest access record sequences, each group of category access record sequences, and each group of category-enhanced point of interest access record sequences of each user in each city into the improved Transformer encoder neural network for prediction, to obtain, for each of these sequences, the probability distribution over all labels of the city at the next moment; constructing a loss function model from these probability distributions and the real labels of the sequences; and training with the Adam algorithm to obtain the optimized pre-trained Transformer encoder neural network;
     Step 4: constructing a fine-tuning model that incorporates the pre-trained Transformer encoder neural network; selecting a target city from the plurality of cities; inputting each group of point of interest access record sequences and each group of distance-enhanced point of interest access record sequences of each user in the target city into the fine-tuning model for prediction, to obtain, for each of these sequences, the probability distribution over all point of interest labels of the target city; constructing a loss function model from these probability distributions and the real point of interest access record labels of the sequences; and training with the Adam algorithm to obtain the optimized fine-tuning model;
     Step 5: acquiring the access records of a user of the target city at multiple moments in real time; processing them as in step 1 to obtain the real-time user's point of interest access record sequence; inputting the sequence into the optimized fine-tuning model to obtain the probability distribution over all point of interest labels of the target city; and taking the point of interest label with the maximum probability as the predicted point of interest access record label of the real-time user.
  2. The cross-city pre-training sequence point of interest recommendation method of claim 1, wherein:
     the multiple groups of point of interest access record sequences and of category access record sequences of each user in each city in step 1 are, respectively:
     $$S^{y}_{u,i} = \left(s^{y}_{u,i,1}, s^{y}_{u,i,2}, \dots, s^{y}_{u,i,L}\right), \qquad C^{y}_{u,i} = \left(c^{y}_{u,i,1}, c^{y}_{u,i,2}, \dots, c^{y}_{u,i,L}\right),$$
     $$y \in [1, Y], \quad u \in [1, |U^{y}|], \quad j \in [1, L],$$
     where $S^{y}_{u,i}$ is the i-th group point of interest access record sequence of the u-th user in the y-th city; $C^{y}_{u,i}$ is the i-th group category access record sequence of the u-th user in the y-th city; $s^{y}_{u,i,j}$ is the j-th point of interest access record in $S^{y}_{u,i}$, i.e., the point of interest access record of the u-th user in the y-th city at the (i+j)-th historical moment; $c^{y}_{u,i,j}$ is the j-th category access record in $C^{y}_{u,i}$; $Y$ is the number of cities in the multi-city dataset; $|U^{y}|$ is the number of users in the y-th city; $L$ is the length of each group of access record sequences of each user in each city; $n^{y}_{u}$ is the number of point of interest access records of user u, which bounds the group index i; $|P^{y}|$ is the number of points of interest in the y-th city; and $|C|$ is the number of point of interest categories across all cities;
     the real point of interest access record label of $S^{y}_{u,i}$ in step 1 is $p^{y}_{u,i}$, the point of interest access record of the u-th user in the y-th city at the (i+L+1)-th historical moment, which is one of the $|P^{y}|$ points of interest of the y-th city;
     the real category access record label of $C^{y}_{u,i}$ in step 1 is $q^{y}_{u,i}$, the category access record of the u-th user in the y-th city at the (i+L+1)-th historical moment, which is one of the $|C|$ categories.
  3. The cross-city pre-training sequence point of interest recommendation method of claim 2, wherein:
     the category-based data enhancement in step 2 is defined as:
     $$\tilde{S}^{y,\mathrm{cat}}_{u,i} = \mathrm{Aug}_{\mathrm{cat}}\left(S^{y}_{u,i}, \rho_{\mathrm{cat}}\right),$$
     where $\mathrm{Aug}_{\mathrm{cat}}$ denotes the category-based data enhancement method and $\rho_{\mathrm{cat}}$ is the probability with which each item in the input sequence is randomly selected; the input $S^{y}_{u,i}$ is the i-th group point of interest access record sequence of the u-th user in the y-th city; the output $\tilde{S}^{y,\mathrm{cat}}_{u,i}$ is the point of interest access record sequence obtained from $S^{y}_{u,i}$ by category-based data enhancement, whose j-th record is
     $$\tilde{s}^{y,\mathrm{cat}}_{u,i,j} = \begin{cases} v, \quad v \text{ drawn at random from } \mathcal{P}_{\mathrm{cat}}\left(s^{y}_{u,i,j}\right), & \text{if position } j \text{ is selected}, \\ s^{y}_{u,i,j}, & \text{otherwise}, \end{cases}$$
     where j is the index of a point of interest access record of $S^{y}_{u,i}$ that is randomly selected with probability $\rho_{\mathrm{cat}}$; $s^{y}_{u,i,j}$ is the j-th point of interest access record of the input; $\mathcal{P}_{\mathrm{cat}}(s^{y}_{u,i,j})$ is the set of points of interest that carry the same category label as $s^{y}_{u,i,j}$; $v$ is a point of interest randomly selected from that set, which replaces the j-th record of $S^{y}_{u,i}$ to generate the output; and at every position that is not selected, the j-th access record of $\tilde{S}^{y,\mathrm{cat}}_{u,i}$ is the same as the j-th access record of $S^{y}_{u,i}$;
     step 2 marks the real point of interest access record label of the category-enhanced sequence $\tilde{S}^{y,\mathrm{cat}}_{u,i}$;
     the distance-based data enhancement in step 2 is defined as:
     $$\tilde{S}^{y,\mathrm{dist}}_{u,i} = \mathrm{Aug}_{\mathrm{dist}}\left(S^{y}_{u,i}, \rho_{\mathrm{dist}}\right),$$
     where $\mathrm{Aug}_{\mathrm{dist}}$ denotes the distance-based data enhancement method and $\rho_{\mathrm{dist}}$ is the probability with which each item in the input sequence is randomly selected; the input $S^{y}_{u,i}$ is the i-th group point of interest access record sequence of the u-th user in the y-th city; the output $\tilde{S}^{y,\mathrm{dist}}_{u,i}$ is the point of interest access record sequence obtained from $S^{y}_{u,i}$ by distance-based data enhancement, whose j-th record is
     $$\tilde{s}^{y,\mathrm{dist}}_{u,i,j} = \begin{cases} v, \quad v \text{ drawn at random from } \mathcal{P}_{\mathrm{dist}}\left(s^{y}_{u,i,j}\right), & \text{if position } j \text{ is selected}, \\ s^{y}_{u,i,j}, & \text{otherwise}, \end{cases}$$
     where j is the index of a point of interest access record of $S^{y}_{u,i}$ that is randomly selected with probability $\rho_{\mathrm{dist}}$; $\mathcal{P}_{\mathrm{dist}}(s^{y}_{u,i,j})$ is the set of the 20 points of interest geographically nearest to $s^{y}_{u,i,j}$; $v$ is a point of interest randomly selected from that set, which replaces the j-th record of $S^{y}_{u,i}$ to generate the output; and at every position that is not selected, the j-th access record of $\tilde{S}^{y,\mathrm{dist}}_{u,i}$ is the same as the j-th access record of $S^{y}_{u,i}$;
     step 2 marks the real point of interest access record label of the distance-enhanced sequence $\tilde{S}^{y,\mathrm{dist}}_{u,i}$.
  4. The cross-city pre-training sequence point of interest recommendation method of claim 3, wherein: the improved Transformer encoder neural network constructed in step 3 comprises an embedding module, a self-attention module, and a label prediction module, cascaded in that order.
  5. The cross-city pre-training sequence point of interest recommendation method of claim 4, wherein:
     the embedding module comprises, for each city $y \in [1, Y]$, a point of interest embedding matrix $E^{y}_{P} \in \mathbb{R}^{|P^{y}| \times d}$ and a category embedding matrix $E^{y}_{C} \in \mathbb{R}^{|C| \times d}$, together with a global position matrix $E_{\mathrm{pos}}$, each of which is a two-dimensional real matrix, where $Y$ is the number of cities in the multi-city dataset, $|P^{y}|$ is the number of points of interest in the y-th city, $|C|$ is the number of point of interest categories across all cities, and $d$ is the dimension of the embedded representation vectors;
     the input of the embedding module is the i-th group point of interest access record sequence $S^{y}_{u,i}$ of the u-th user in the y-th city, the i-th group category access record sequence $C^{y}_{u,i}$ of the u-th user in the y-th city, and the category-enhanced i-th group point of interest access sequence $\tilde{S}^{y,\mathrm{cat}}_{u,i}$ of the u-th user in the y-th city from step 2;
     the output of the embedding module is: $X^{y}_{u,i}$, the input representation matrix of $S^{y}_{u,i}$ based on the point of interest embedding matrix $E^{y}_{P}$ of city y; $Z^{y}_{u,i}$, the input representation matrix of $C^{y}_{u,i}$ based on the category embedding matrix $E^{y}_{C}$ of city y; $\{Z^{y,k}_{u,i}\}_{k=1}^{Y}$, the set of input representation matrices of $C^{y}_{u,i}$ based on the category embedding matrices of all cities, in which $Z^{y,k}_{u,i}$ is the input representation matrix of $C^{y}_{u,i}$ based on the category embedding matrix $E^{k}_{C}$ of city k; and $\tilde{X}^{y}_{u,i}$, the input representation matrix of $\tilde{S}^{y,\mathrm{cat}}_{u,i}$ based on the point of interest embedding matrix $E^{y}_{P}$ of city y.
  6. The cross-city pre-training sequence point of interest recommendation method of claim 5, wherein: the self-attention module comprises an original self-attention mechanism and an improved self-attention mechanism;
     the original self-attention mechanism has $M$ self-attention layers; its input is the input representation matrices output by the embedding module, namely $X^{y}_{u,i}$ for the i-th group point of interest access record sequence $S^{y}_{u,i}$ of the u-th user in the y-th city, $Z^{y}_{u,i}$ for the i-th group category access record sequence $C^{y}_{u,i}$, and $\tilde{X}^{y}_{u,i}$ for the category-enhanced sequence $\tilde{S}^{y,\mathrm{cat}}_{u,i}$; the output of its M-th self-attention layer is the corresponding output matrices $H^{y}_{u,i}$, $G^{y}_{u,i}$, and $\tilde{H}^{y}_{u,i}$; and its final output is the corresponding final representation vectors $h^{y}_{u,i}$, $g^{y}_{u,i}$, and $\tilde{h}^{y}_{u,i}$, which are the last-row vectors of $H^{y}_{u,i}$, $G^{y}_{u,i}$, and $\tilde{H}^{y}_{u,i}$, respectively;
     the improved self-attention mechanism has $M$ self-attention layers; its input is the input representation matrix $X^{y}_{u,i}$ of $S^{y}_{u,i}$, the input representation matrix set $\{Z^{y,k}_{u,i}\}_{k=1}^{Y}$ of $C^{y}_{u,i}$, and the input representation matrix $\tilde{X}^{y}_{u,i}$ of $\tilde{S}^{y,\mathrm{cat}}_{u,i}$, all output by the embedding module;
     in the m-th self-attention layer of the improved self-attention mechanism, $m \in [1, M]$: $W^{Q}_{m}$, $W^{K}_{m}$, and $W^{V}_{m}$ are the trainable Query, Key, and Value parameters of the layer; softmax is the function that converts a real-valued vector into a vector whose entries sum to 1; $d$ is the dimension of the representation vectors; $Y$ is the number of cities in the multi-city dataset; the layer input for $S^{y}_{u,i}$ is defined separately for each $k \in [1, Y]$, as the input used when $Z^{y,k}_{u,i}$ enters the computation, and the input of the first layer is $X^{y}_{u,i}$; likewise, the layer input for $\tilde{S}^{y,\mathrm{cat}}_{u,i}$ is defined separately for each $k \in [1, Y]$, and the input of the first layer is $\tilde{X}^{y}_{u,i}$; here $Z^{y,k}_{u,i}$ is the input representation matrix of $C^{y}_{u,i}$ based on the category embedding matrix of city k, and $\{Z^{y,k}_{u,i}\}_{k=1}^{Y}$ is the set of such matrices over all cities;
     the output of the M-th self-attention layer of the improved self-attention mechanism is the output matrix set $\{H^{y,k}_{u,i}\}_{k=1}^{Y}$ of $S^{y}_{u,i}$ and the output matrix set $\{\tilde{H}^{y,k}_{u,i}\}_{k=1}^{Y}$ of $\tilde{S}^{y,\mathrm{cat}}_{u,i}$, where $H^{y,k}_{u,i}$ and $\tilde{H}^{y,k}_{u,i}$ are the outputs of the M-th self-attention layer when $S^{y}_{u,i}$ and $\tilde{S}^{y,\mathrm{cat}}_{u,i}$, respectively, are processed with $Z^{y,k}_{u,i}$;
     the final output of the improved self-attention mechanism is the final representation vector set $\{h^{y,k}_{u,i}\}_{k=1}^{Y}$ of $S^{y}_{u,i}$ and the final representation vector set $\{\tilde{h}^{y,k}_{u,i}\}_{k=1}^{Y}$ of $\tilde{S}^{y,\mathrm{cat}}_{u,i}$, where $h^{y,k}_{u,i}$ is the last-row vector of $H^{y,k}_{u,i}$ and $\tilde{h}^{y,k}_{u,i}$ is the last-row vector of $\tilde{H}^{y,k}_{u,i}$.
  7. The cross-city pre-training sequence point of interest recommendation method of claim 6, wherein:
     the input of the label prediction module is: for the i-th group point of interest access record sequence $S^{y}_{u,i}$ of the u-th user in the y-th city, its final representation vector $h^{y}_{u,i}$ and final representation vector set $\{h^{y,k}_{u,i}\}_{k=1}^{Y}$; for the i-th group category access record sequence $C^{y}_{u,i}$, its final representation vector $g^{y}_{u,i}$; and for the category-enhanced sequence $\tilde{S}^{y,\mathrm{cat}}_{u,i}$, its final representation vector $\tilde{h}^{y}_{u,i}$ and final representation vector set $\{\tilde{h}^{y,k}_{u,i}\}_{k=1}^{Y}$, where $Y$ is the number of cities in the multi-city dataset;
     the output of the label prediction module is:
     $\hat{r}^{y}_{u,i}$, the probability distribution over all point of interest labels in city y predicted from $h^{y}_{u,i}$, computed with the embedding vector $e^{y}_{p}$ of each point of interest label p, i.e., the p-th row of the point of interest embedding matrix $E^{y}_{P}$ of city y in the embedding module, $p \in [1, |P^{y}|]$, where $|P^{y}|$ is the number of points of interest in city y;
     $\hat{c}^{y}_{u,i}$, the probability distribution over all category labels in city y predicted from $g^{y}_{u,i}$, computed with the embedding vector $e^{y}_{q}$ of each category label q, i.e., the q-th row of the category embedding matrix $E^{y}_{C}$ of city y in the embedding module, $q \in [1, |C|]$, where $|C|$ is the number of point of interest categories across all cities;
     $\hat{\tilde{r}}^{y}_{u,i}$, the probability distribution over all point of interest labels in city y predicted from $\tilde{h}^{y}_{u,i}$;
     $\{\hat{r}^{y,k}_{u,i}\}_{k=1}^{Y}$, the set of probability distributions over all point of interest labels in city y predicted from the vectors of $\{h^{y,k}_{u,i}\}_{k=1}^{Y}$, in which $\hat{r}^{y,k}_{u,i}$ is predicted from the k-th vector $h^{y,k}_{u,i}$; and
     $\{\hat{\tilde{r}}^{y,k}_{u,i}\}_{k=1}^{Y}$, the set of probability distributions over all point of interest labels in city y predicted from the vectors of $\{\tilde{h}^{y,k}_{u,i}\}_{k=1}^{Y}$, in which $\hat{\tilde{r}}^{y,k}_{u,i}$ is predicted from the k-th vector $\tilde{h}^{y,k}_{u,i}$.
  8. The cross-city pre-training sequence point of interest recommendation method of claim 7, wherein:
     the input of the loss function model in step 3 is: for the i-th group point of interest access record sequence $S^{y}_{u,i}$ of the u-th user in the y-th city, its probability distribution $\hat{r}^{y}_{u,i}$ and probability distribution set $\{\hat{r}^{y,k}_{u,i}\}_{k=1}^{Y}$; for the i-th group category access record sequence $C^{y}_{u,i}$, its probability distribution $\hat{c}^{y}_{u,i}$; for the category-enhanced sequence $\tilde{S}^{y,\mathrm{cat}}_{u,i}$, its probability distribution $\hat{\tilde{r}}^{y}_{u,i}$ and probability distribution set $\{\hat{\tilde{r}}^{y,k}_{u,i}\}_{k=1}^{Y}$; and the labels of $S^{y}_{u,i}$, $C^{y}_{u,i}$, and $\tilde{S}^{y,\mathrm{cat}}_{u,i}$;
     the loss function model constructed in step 3 comprises a loss function for the category access record sequence and a loss function for the point of interest access record sequences, wherein:
     the loss function for processing $C^{y}_{u,i}$ is computed with the sigmoid function from the value of $\hat{c}^{y}_{u,i}$ at the label of $C^{y}_{u,i}$ and the value of $\hat{c}^{y}_{u,i}$ at a category label randomly selected from the category labels and distinct from that label, where $\hat{c}^{y}_{u,i}$ is the probability distribution over all category labels in city y predicted from the final representation vector of $C^{y}_{u,i}$;
     the loss function for processing $S^{y}_{u,i}$ and $\tilde{S}^{y,\mathrm{cat}}_{u,i}$ uses a hyperparameter and the function max, which returns the maximum of its inputs, to combine: the loss computed from the probability distribution $\hat{r}^{y}_{u,i}$; the loss computed from the probability distribution $\hat{\tilde{r}}^{y}_{u,i}$; the loss computed from the probability distribution set $\{\hat{r}^{y,k}_{u,i}\}_{k=1}^{Y}$; and the loss computed from the probability distribution set $\{\hat{\tilde{r}}^{y,k}_{u,i}\}_{k=1}^{Y}$; it is taken over the set of all $S^{y}_{u,i}$ samples and $\tilde{S}^{y,\mathrm{cat}}_{u,i}$ samples in the current input data and uses the number of samples in that set;
     each loss computed from a probability distribution uses the value of that distribution at the label of the corresponding sequence and its value at a point of interest label randomly selected from the $|P^{y}|$ point of interest labels of city y and distinct from that label, where $|P^{y}|$ is the number of points of interest in the y-th city; each loss computed from a probability distribution set is built from the losses of its individual members, the loss of the k-th member being computed in the same way from the k-th probability distribution in the set; and $Y$ is the number of cities in the multi-city dataset.
  9. The cross-city pre-training sequence point of interest recommendation method of claim 8, wherein:
     the target city in step 4 is the t-th city selected from the plurality of cities, $t \in [1, Y]$, where $Y$ is the number of cities in the multi-city dataset;
     the fine-tuning model constructed in step 4 comprises the pre-trained Transformer encoder neural network, a downstream point of interest sequence encoding neural network, and a downstream label prediction module, cascaded in that order;
     the pre-trained Transformer encoder neural network receives the i-th group point of interest access record sequence $S^{t}_{u,i}$ of the u-th user in the target t-th city and, from step 2, the distance-enhanced i-th group point of interest access record sequence $\tilde{S}^{t,\mathrm{dist}}_{u,i}$ of the u-th user in the target t-th city; obtains the output representation matrix $H^{t}_{u,i}$ corresponding to $S^{t}_{u,i}$ and the output representation matrix $\tilde{H}^{t}_{u,i}$ corresponding to $\tilde{S}^{t,\mathrm{dist}}_{u,i}$ from the M-th self-attention layer; and outputs them to the downstream point of interest sequence encoding neural network, where $M$ is the number of self-attention layers in the original self-attention mechanism of the self-attention module of step 3;
     the downstream point of interest sequence encoding neural network is SASRec with $M'$ self-attention layers; its input is the output representation matrices $H^{t}_{u,i}$ and $\tilde{H}^{t}_{u,i}$; its output is the output representation vector $o^{t}_{u,i}$ corresponding to $S^{t}_{u,i}$ and the output representation vector $\tilde{o}^{t}_{u,i}$ corresponding to $\tilde{S}^{t,\mathrm{dist}}_{u,i}$;
     the input of the downstream label prediction module is the output representation vector $o^{t}_{u,i}$ together with the label of $S^{t}_{u,i}$, and the output representation vector $\tilde{o}^{t}_{u,i}$ together with the label of $\tilde{S}^{t,\mathrm{dist}}_{u,i}$;
     the output of the downstream label prediction module is: $\hat{r}^{t}_{u,i}$, the probability distribution over all point of interest labels in the target city t predicted from $o^{t}_{u,i}$; and $\hat{\tilde{r}}^{t}_{u,i}$, the probability distribution over all point of interest labels in the target city t predicted from $\tilde{o}^{t}_{u,i}$; both are computed with the embedding vector $e^{t}_{p}$ of each point of interest label p, i.e., the p-th row of the point of interest embedding matrix $E^{t}_{P}$ of the target city t in the embedding module, $p \in [1, |P^{t}|]$, where $|P^{t}|$ is the number of points of interest in the target city t;
     the loss function model constructed in step 4 is the loss function of the fine-tuning model, i.e., the loss for processing $S^{t}_{u,i}$ and $\tilde{S}^{t,\mathrm{dist}}_{u,i}$; it uses a hyperparameter to combine the loss computed from the probability distribution $\hat{r}^{t}_{u,i}$ with the loss computed from the probability distribution $\hat{\tilde{r}}^{t}_{u,i}$; each of the two losses is computed with the sigmoid function from the value of the distribution at the real label of the sequence and its value at a point of interest label randomly selected from the $|P^{t}|$ point of interest labels of the target city t and distinct from the real label.
  10. A computer readable medium, characterized in that it stores a computer program for execution by an electronic device, and the computer program, when run on the electronic device, causes the electronic device to perform the steps of the method according to any one of claims 1-9.
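
The two data enhancement operations recited in the claims (random replacement by a point of interest of the same category, and random replacement by one of the 20 geographically nearest points of interest) can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names, the sample data, the exclusion of a point of interest from its own candidate set, and the use of planar Euclidean distance in place of geographic distance are all assumptions made for the example.

```python
import math
import random
from typing import Dict, List, Tuple

Coord = Tuple[float, float]


def augment(seq: List[int], prob: float, candidates: Dict[int, List[int]],
            rng: random.Random) -> List[int]:
    """Replace each item, with probability `prob`, by a random candidate POI.

    Items that are not selected, or that have no candidates, are kept as-is.
    """
    out = []
    for poi in seq:
        pool = candidates.get(poi, [])
        if pool and rng.random() < prob:
            out.append(rng.choice(pool))
        else:
            out.append(poi)
    return out


def same_category_candidates(category_of: Dict[int, int]) -> Dict[int, List[int]]:
    """For every POI, list the other POIs carrying the same category label."""
    result = {}
    for p, cat in category_of.items():
        result[p] = [q for q, other in category_of.items() if q != p and other == cat]
    return result


def nearest_candidates(coords: Dict[int, Coord], k: int = 20) -> Dict[int, List[int]]:
    """For every POI, list its k nearest other POIs (Euclidean distance, an assumption)."""
    result = {}
    for p, (px, py) in coords.items():
        others = [q for q in coords if q != p]
        others.sort(key=lambda q: math.hypot(coords[q][0] - px, coords[q][1] - py))
        result[p] = others[:k]
    return result


# Hypothetical city with four POIs, two categories, and planar coordinates.
category_of = {1: 0, 2: 0, 3: 1, 4: 1}
coords = {1: (0.0, 0.0), 2: (0.0, 1.0), 3: (5.0, 5.0), 4: (5.0, 6.0)}
rng = random.Random(0)

cat_enhanced = augment([1, 3, 4], 0.3, same_category_candidates(category_of), rng)
dist_enhanced = augment([1, 3, 4], 0.3, nearest_candidates(coords, k=20), rng)
```

Both variants share one `augment` routine and differ only in the candidate set, which mirrors how the two definitions in claim 3 differ only in the set the replacement is drawn from.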

Description

Cross-city pre-training sequence interest point recommendation method and computer readable medium

Technical Field

The invention relates to the field of sequential interest point recommendation systems, in particular to a cross-city pre-training sequence interest point recommendation method and a computer readable medium.

Background

With the development of Location Based Social Networks (LBSN) such as Gowalla and Foursquare, predicting the next Point of Interest (POI) that a user might access has facilitated a number of intelligent services, such as location-aware personalized services and public safety monitoring. Although users in LBSNs generate large amounts of historical access data each day, data sparsity remains a major issue that hampers effective sequential POI recommendation. In the POI dataset of a single city, sparsity shows up as some POIs being accessed only a small number of times, which can seriously degrade the performance of the recommendation system. To mitigate the data sparsity problem, a viable approach is to fuse POI datasets from different cities into a larger POI dataset. The present invention proposes pre-training on a large POI dataset consolidated from multiple cities to learn general transferable knowledge, and then fine-tuning on each city's POI dataset to accommodate the unique sequence patterns of that city.

The great success of pre-training techniques in the field of Natural Language Processing (NLP) benefits from self-supervised tasks and vast training corpora: the former frees the model from dependence on data labels, and the latter helps the model learn general knowledge. In light of this, researchers have introduced pre-training techniques into the field of recommendation systems.
However, unlike datasets in the NLP domain, datasets of different cities in the field of POI recommendation typically have no shared objects, such as users or POIs, which leaves the datasets of different cities isolated from one another. Consequently, existing work mainly studies how to design more effective self-supervised tasks and does not fully exploit large datasets to learn general knowledge and transfer it. Although part of the work mines temporal information, which makes it possible for the model to learn general knowledge from a large dataset, temporal information is coarse-grained and limits the ability of the model to learn general knowledge across cities. Another possible approach is to use POI category information, which is shared by all cities; the POI category expresses the inherent function of the POI and is fine-grained information. However, existing methods of this kind adopt only an ordinary parameter-sharing strategy, and such a simple setting cannot efficiently embed the transferable knowledge of the POI category level into the POI level.

In order to solve these problems, the invention provides a sequential POI recommendation method based on cross-city pre-training, which makes full use of POI category information, performs pre-training on a cross-city large dataset to learn general transferable knowledge, and then performs fine-tuning on a target city dataset to adapt to the unique context of that city, thereby improving the accuracy of sequential POI recommendation.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a cross-city pre-training sequence interest point recommendation method.
The method makes full use of interest point category information to pre-train on the cross-city large dataset, learning general transferable knowledge and performing knowledge transfer, and then fine-tunes on the target city dataset to adapt to the context of the current city, thereby further improving the accuracy of sequential interest point recommendation.

The technical scheme of the method is a cross-city pre-training sequence interest point recommendation method, which comprises the following steps:

Step 1, introducing access records of multiple historical moments of each user in each city, dividing the access records by a sliding window method to obtain multiple groups of interest point access record sequences of each user in each city, marking real interest point access record labels of the interest point access record sequences of each user in each city, obtaining category access records of each interest point in the multiple groups of interest point access record sequences of each user in each city, constructing multiple groups of category access record sequences of each user in each city, and marking real access record labels of the category access record sequences of each user in each city;

Step 2, sequentially carrying out category-based data enhancement on each group of interest point access record sequences of each user in each city to obtain each group of category-b