CN-122019873-A - Multi-mode fusion space-time cross attention personalized track recommendation method


Abstract

The invention discloses a multi-modal fused spatio-temporal cross-attention personalized trajectory recommendation method. The method generates collaborative embeddings that capture group interaction patterns and personalized user embeddings based on a masked language model. It constructs spatio-temporal data representations by combining multi-granularity time embeddings with quadtree spatial embeddings. It designs a personalized spatio-temporal cross-attention mechanism that fuses timestamp cross-attention with time-interval awareness to strengthen spatio-temporal correlation. A dynamic gating fusion mechanism then deeply integrates the collaborative, personalized and spatio-temporal modalities. Finally, a large language model is fine-tuned with low-rank adaptation to output personalized trajectories that satisfy multiple constraints. The method significantly improves the accuracy, spatio-temporal rationality and personalized adaptability of trajectory recommendation, and is particularly suitable for scenarios such as intelligent travel and urban trip planning.

Inventors

HAN LICHUN
ZHAO YONGHUI
WANG LIMING
ZHU JINXIA
YAN XIAOLING
YANG SHUO
LI CHENGXIAN
CHEN GONG
AN CHEN
XUE ZHIHUI

Assignees

Naval University of Engineering of the Chinese People's Liberation Army (中国人民解放军海军工程大学)

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-19

Claims (10)

1. A multi-modal fused spatio-temporal cross-attention personalized trajectory recommendation method, comprising: S100, acquiring a user set, a POI set and users' historical visit trajectory data, and performing multi-modal feature encoding on the data to generate collaborative embeddings, personalized embeddings, spatio-temporal data embeddings and standardized trajectory sequences; S200, designing a dynamic gating fusion mechanism that adaptively adjusts the weight ratio between the collaborative and personalized embeddings through user-dependent gating and POI-dependent gating, to generate collaborative-personalized fused embeddings; S300, constructing a personalized spatio-temporal cross-attention mechanism that fuses personalized timestamp cross-attention with personalized time-interval awareness and dynamically couples users' personalized preferences with spatio-temporal correlations, to generate spatio-temporally optimized embeddings; S400, concatenating the collaborative-personalized fused embeddings and the spatio-temporally optimized embeddings at the semantic level to construct the input sequence of a large language model, fine-tuning the attention-layer parameters of the large language model based on low-rank adaptation, optimizing the model with a joint loss function, and generating the recommended trajectory with the optimized model.
2. The trajectory recommendation method according to claim 1, wherein in S100, the collaborative embeddings are generated by aggregating user and POI embeddings with a three-layer graph convolution, fusing the multi-scale interaction features of the layers through layer weights, and then applying a two-layer MLP for linear transformation and nonlinear mapping, to generate user and POI collaborative embeddings adapted to the semantic space of the large language model.
3. The trajectory recommendation method according to claim 1, wherein in S100, the personalized embeddings are generated by: designing a masked language model based on the interaction frequency between the user and each POI, in which the intrinsic semantics of a POI are retained when its interaction frequency is high; generating a personalized embedding for each POI in the user's historical trajectory according to a dynamic masking strategy; and averaging the personalized embeddings of all POIs in the user's historical trajectory to generate the user's personalized embedding, thereby accurately characterizing differences in user preferences.
4. The trajectory recommendation method according to claim 1, wherein in S100, the spatio-temporal data embeddings are generated by: time embedding, in which each timestamp is decomposed into five granularities (month, week, day, hour and minute) covering multi-scale time periods and is linearly transformed to generate a time embedding adapted to the large language model; and spatial embedding, in which GPS coordinates are encoded with a quadtree of a preset level, converting continuous two-dimensional geographic coordinates into discrete vectors of a preset dimension that quantize geographic features while preserving spatial adjacency, and a spatial embedding is generated through linear transformation.
5. The trajectory recommendation method according to claim 1, wherein in S100, the standardized trajectory sequences are generated with a fixed-length processing strategy that unifies the historical trajectories of all users into sequences of the same length by truncation or zero padding, providing the model with input in a unified format.
6. The trajectory recommendation method according to claim 1, wherein in S200, the dynamic gating fusion mechanism adaptively adjusts the weights according to the user's historical interaction data: for a new user with scarce historical data, the personalized embedding weight is reduced and the collaborative embedding weight is increased, so that recommendation reliability is ensured by the group patterns of similar users; for an existing user with rich historical data, the personalized embedding weight is increased and the collaborative embedding weight is reduced, to highlight the user's individual preferences; and the gating values are generated through a sigmoid activation function, so that the weights change smoothly and fluctuations in recommendation performance caused by abrupt changes are avoided.
7. The trajectory recommendation method according to claim 1, wherein in S300, the personalized timestamp cross-attention focuses on the periodicity of the user's behavior at similar timestamps, and is implemented by: generating query, key and value vectors, wherein the query vector carries the query intent for a future time, the key vector carries the matching reference of historical times, and the value vector carries POI features; and computing attention scores that measure how well the future-time query matches each historical-time key, with a time-period similarity and masking mechanism constructed to accurately capture the user's personalized time-period preferences.
8. The trajectory recommendation method according to claim 1, wherein in S300, the personalized time-interval awareness enhances the spatio-temporal continuity of neighboring POIs, and is implemented by: constructing a personalized time-interval weight matrix based on the intervals between historical and future timestamps, to capture the association between time intervals and user preferences; obtaining query, key and value vectors and computing cross-attention, with the weight matrix incorporated into the cross-attention computation to strengthen the influence of time intervals on spatial correlation; and introducing residual connections and layer normalization to accelerate convergence of model training and improve the stability of the feature representations.
9. The trajectory recommendation method according to claim 1, wherein S400 is implemented by: concatenating the collaborative-personalized fused embeddings and the spatio-temporally optimized embeddings at the semantic level, and introducing a [CLS] classification token and a [SEP] separator token to construct a sequence conforming to the input format of the large language model; fine-tuning the attention-layer parameters of the LLM with LoRA, which reduces the computational cost of full fine-tuning while retaining the LLM's strong semantic understanding; designing a joint loss function that combines a recommendation loss, a masked language model loss and a spatio-temporal constraint loss, to optimize recommendation accuracy, personalized representation ability and spatio-temporal rationality; and optimizing the model, wherein the optimized model generates the recommended trajectory in three steps: candidate screening, personalized ranking and spatio-temporal optimization.
10. An electronic device, comprising: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the trajectory recommendation method.
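Claim 2 leaves the propagation rule, the layer weights and the MLP activation unspecified. The following is a minimal pure-Python sketch, assuming LightGCN-style symmetric normalization, equal weights over layers 0 to 3, and a ReLU between the two MLP layers. The dimensions, the toy graph and the helper names (`propagate`, `collaborative_embeddings`, `mlp_adapter`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def propagate(emb: List[Vector], edges: Sequence[Tuple[int, int]],
              num_users: int) -> List[Vector]:
    """One graph-convolution layer on the user-POI bipartite graph.

    Assumes LightGCN-style normalization 1/sqrt(d_i * d_j) with no feature
    transform. Node ids: users are 0..num_users-1, POI p is node num_users + p.
    """
    n, dim = len(emb), len(emb[0])
    neighbors: Dict[int, List[int]] = {k: [] for k in range(n)}
    for u, p in edges:
        neighbors[u].append(num_users + p)
        neighbors[num_users + p].append(u)
    out = []
    for node in range(n):
        acc = [0.0] * dim
        for nb in neighbors[node]:
            w = 1.0 / math.sqrt(len(neighbors[node]) * len(neighbors[nb]))
            for d in range(dim):
                acc[d] += w * emb[nb][d]
        out.append(acc)
    return out


def collaborative_embeddings(emb0, edges, num_users,
                             layer_weights=(0.25, 0.25, 0.25, 0.25)):
    """Three propagation layers, then a layer-weighted sum of layers 0..3."""
    layers = [emb0]
    for _ in range(3):
        layers.append(propagate(layers[-1], edges, num_users))
    dim = len(emb0[0])
    return [[sum(w * layers[k][node][d] for k, w in enumerate(layer_weights))
             for d in range(dim)] for node in range(len(emb0))]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def mlp_adapter(x, w1, b1, w2, b2):
    """Two-layer MLP: linear -> ReLU (assumed nonlinearity) -> linear to the LLM width."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy example: 2 users, 3 POIs, 4-d graph embeddings, hypothetical 8-d LLM width.
rng = random.Random(0)
num_users, num_pois, dim, llm_dim = 2, 3, 4, 8
edges = [(0, 0), (0, 1), (1, 1), (1, 2)]  # (user, poi) interactions
emb0 = rand_matrix(num_users + num_pois, dim, rng)
collab = collaborative_embeddings(emb0, edges, num_users)
w1, b1 = rand_matrix(16, dim, rng), [0.0] * 16
w2, b2 = rand_matrix(llm_dim, 16, rng), [0.0] * llm_dim
llm_ready = [mlp_adapter(e, w1, b1, w2, b2) for e in collab]
```

In practice the layer weights could also be learned, and all matrices would be trained jointly with the rest of the model.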
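The frequency-dependent masking and averaging in claim 3 can be sketched as follows. The exponential decay schedule, its parameters `max_rate` and `tau`, and the placeholder POI vectors are assumptions made for illustration. The masked language model encoder that would actually produce the per-POI personalized embeddings is not shown.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_probability(freq: int, max_rate: float = 0.3, tau: float = 3.0) -> float:
    """Masking rate that decays with interaction frequency (hypothetical schedule).

    Frequently visited POIs are rarely masked, so their intrinsic semantics are
    retained; rarely visited POIs are masked more often.
    """
    return max_rate * math.exp(-freq / tau)


def dynamic_mask(trajectory: Sequence[str], freqs: Dict[str, int],
                 rng: random.Random) -> Tuple[List[str], List[Optional[str]]]:
    """Return the masked input and the MLM targets (None where nothing is masked)."""
    masked, targets = [], []
    for poi in trajectory:
        if rng.random() < mask_probability(freqs[poi]):
            masked.append(MASK)
            targets.append(poi)
        else:
            masked.append(poi)
            targets.append(None)
    return masked, targets


def user_personalized_embedding(trajectory: Sequence[str],
                                poi_embeddings: Dict[str, List[float]]) -> List[float]:
    """Mean of the per-POI personalized embeddings over every position in the history."""
    dim = len(poi_embeddings[trajectory[0]])
    total = [0.0] * dim
    for poi in trajectory:
        for d, x in enumerate(poi_embeddings[poi]):
            total[d] += x
    return [x / len(trajectory) for x in total]


history = ["museum", "park", "museum", "cafe", "museum"]
freqs = Counter(history)  # museum: 3, park: 1, cafe: 1
masked, targets = dynamic_mask(history, freqs, random.Random(42))
# Placeholder vectors standing in for the MLM encoder's per-POI outputs.
poi_emb = {"museum": [1.0, 0.0], "park": [0.0, 1.0], "cafe": [1.0, 1.0]}
user_emb = user_personalized_embedding(history, poi_emb)
```

Averaging over positions means repeat visits weigh more in the user vector. Averaging over distinct POIs would be an equally plausible reading of the claim.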
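Below is a minimal sketch of the two encodings in claim 4. It makes three assumptions: "week" means day of the week, the quadtree covers the whole latitude/longitude range, and a bias-free linear projection maps features to a hypothetical 6-dimensional LLM width. `time_granularities` and `quadtree_code` are illustrative names, not taken from the source.

```python
import random
from datetime import datetime
from typing import List


def time_granularities(ts: datetime) -> List[int]:
    """Decompose a timestamp into month, week, day, hour and minute.

    "week" is read here as the day of the week (Monday = 0); that reading is an assumption.
    """
    return [ts.month, ts.weekday(), ts.day, ts.hour, ts.minute]


def quadtree_code(lat: float, lon: float, level: int = 8) -> List[int]:
    """Encode a GPS point as `level` quadrant digits (0-3) by recursive bisection.

    Points in the same cell share a code prefix, so the code keeps coarse
    spatial adjacency while turning continuous coordinates into discrete symbols.
    """
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    code = []
    for _ in range(level):
        lat_mid, lon_mid = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
        quadrant = 0
        if lat >= lat_mid:  # northern half -> +2
            quadrant += 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if lon >= lon_mid:  # eastern half -> +1
            quadrant += 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        code.append(quadrant)
    return code


def linear(x: List[float], weight: List[List[float]]) -> List[float]:
    """Bias-free linear map into the (hypothetical) LLM embedding width."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


rng = random.Random(0)
llm_dim = 6
w_time = [[rng.gauss(0.0, 0.1) for _ in range(5)] for _ in range(llm_dim)]
w_space = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(llm_dim)]
time_feats = [float(v) for v in time_granularities(datetime(2024, 1, 1, 9, 30))]
space_feats = [float(v) for v in quadtree_code(30.52, 114.31)]
time_emb = linear(time_feats, w_time)
space_emb = linear(space_feats, w_space)
```

Feeding raw integers into the projection keeps the example short. A practical encoder would more likely one-hot or embed each field and each quadrant digit before projecting.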
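Claim 5's fixed-length strategy can be illustrated with a short sketch. Right-padding with id 0, keeping the most recent visits when truncating, and the accompanying `padding_mask` helper are assumptions, because the claim does not say which end is padded or truncated.

```python
from typing import List, Sequence

PAD_ID = 0  # zero padding; assumes real POI ids start at 1


def standardize(trajectory: Sequence[int], max_len: int, keep: str = "recent") -> List[int]:
    """Truncate or right-pad a POI-id sequence to exactly max_len.

    When truncating, keep="recent" keeps the last max_len visits and
    keep="oldest" keeps the first ones; which end to keep is an assumption.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(trajectory) >= max_len:
        kept = trajectory[-max_len:] if keep == "recent" else trajectory[:max_len]
        return list(kept)
    return list(trajectory) + [PAD_ID] * (max_len - len(trajectory))


def padding_mask(sequence: Sequence[int]) -> List[int]:
    """1 for real POIs, 0 for padding, so attention can ignore padded slots."""
    return [0 if poi == PAD_ID else 1 for poi in sequence]


batch = [standardize(t, 5) for t in ([12, 7, 3], [4, 8, 15, 16, 23, 42])]
```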
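The sketch below shows one possible form of the user- and POI-dependent gates in claim 6. The gate formula (a linear projection of both embeddings plus a `beta * log1p(history_len)` term) and its parameters are hypothetical. The claim itself fixes only the sigmoid activation and the direction in which the weights should move for new versus existing users.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate_value(collab: Sequence[float], personal: Sequence[float], history_len: int,
               w: Sequence[float], beta: float = 1.0, bias: float = -2.0) -> float:
    """Hypothetical gate: a projection of both embeddings plus a history-size term.

    With beta > 0, more history pushes the gate toward 1 (more personalized weight);
    with little history it stays near sigmoid(bias) (more collaborative weight).
    """
    score = sum(wi * xi for wi, xi in zip(w, list(collab) + list(personal)))
    return sigmoid(score + beta * math.log1p(history_len) + bias)


def gated_fusion(collab: Sequence[float], personal: Sequence[float], g: float) -> List[float]:
    """Convex mix: g * personalized + (1 - g) * collaborative."""
    return [g * p + (1.0 - g) * c for c, p in zip(collab, personal)]


w = [0.0] * 4  # zero projection so the example isolates the history-size effect
collab_u, personal_u = [0.2, 0.4], [1.0, -0.5]
g_new = gate_value(collab_u, personal_u, history_len=0, w=w)   # new user, no history
g_old = gate_value(collab_u, personal_u, history_len=50, w=w)  # existing user, rich history
fused_new = gated_fusion(collab_u, personal_u, g_new)
fused_old = gated_fusion(collab_u, personal_u, g_old)
# A POI-dependent gate has the same form, driven by the POI's interaction count.
collab_p, personal_p = [0.1, 0.3], [0.6, 0.0]
g_poi = gate_value(collab_p, personal_p, history_len=3, w=w)
fused_poi = gated_fusion(collab_p, personal_p, g_poi)
```

Because the sigmoid changes continuously with the history size, the weights shift gradually as a user accumulates interactions rather than jumping at a fixed threshold.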
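The timestamp cross-attention of claim 7 is sketched below for a single future-time query. The ±`window`-hour mask, the linear period-similarity bonus, and the assumption that the query and keys are already projected are illustrative choices, not details from the source.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def hour_distance(h1: int, h2: int) -> int:
    """Circular distance between two hours of the day (23:00 and 01:00 are 2 apart)."""
    d = abs(h1 - h2) % 24
    return min(d, 24 - d)


def masked_softmax(scores: Sequence[float]) -> List[float]:
    """Softmax that gives exactly zero weight to -inf (masked) entries."""
    finite = [s for s in scores if s != NEG_INF]
    if not finite:
        return [0.0] * len(scores)
    m = max(finite)
    exps = [0.0 if s == NEG_INF else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def timestamp_cross_attention(query, keys, values, future_hour, hist_hours, window=2):
    """Cross-attention from one future-time query to historical time keys.

    query/keys are assumed to be already projected (W_Q, W_K not shown); values
    carry POI features. Historical steps outside +/- `window` hours of the future
    hour are masked; the rest get a period-similarity bonus (both hypothetical).
    """
    scale = math.sqrt(len(query))
    scores = []
    for key, hour in zip(keys, hist_hours):
        gap = hour_distance(future_hour, hour)
        if gap > window:
            scores.append(NEG_INF)
        else:
            similarity = 1.0 - gap / 12.0  # 1.0 for the same hour, 0.0 for 12 h apart
            scores.append(sum(q * k for q, k in zip(query, key)) / scale + similarity)
    weights = masked_softmax(scores)
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return context, weights


context, weights = timestamp_cross_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 2.0]],
    future_hour=8,
    hist_hours=[7, 14, 9, 20],
)
```

In this example the 14:00 and 20:00 visits fall outside the 8:00 ± 2 h window and receive zero weight, so only behavior from a similar time of day shapes the context vector.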
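Claim 8 does not give the form of the time-interval weight matrix. This sketch assumes an exponential decay W[i][j] = exp(-|t_i - t_j| / tau) with a per-user scale `tau`, applied by adding log W to the attention logits. A residual connection and a parameter-free layer normalization follow.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def interval_log_weights(future_ts: Sequence[float], hist_ts: Sequence[float],
                         tau: float) -> Matrix:
    """log W[i][j] = -|t_i - t_j| / tau, i.e. W decays exponentially with the time gap.

    tau (hours) stands in for a hypothetical per-user interval scale; the matrix is
    kept in log space so very large gaps do not underflow to log(0).
    """
    return [[-abs(tf - th) / tau for th in hist_ts] for tf in future_ts]


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """LayerNorm without learnable gain/bias (omitted for brevity)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def interval_aware_attention(queries: Matrix, keys: Matrix, values: Matrix,
                             future_ts: Sequence[float], hist_ts: Sequence[float],
                             tau: float = 3.0) -> Tuple[Matrix, Matrix]:
    """Cross-attention whose weights are multiplied by W, plus residual + LayerNorm."""
    scale = math.sqrt(len(queries[0]))
    log_w = interval_log_weights(future_ts, hist_ts, tau)
    outputs, all_attn = [], []
    for i, q in enumerate(queries):
        # Adding log W[i][j] to the logit multiplies the softmax weight by W[i][j].
        logits = [sum(a * b for a, b in zip(q, k)) / scale + log_w[i][j]
                  for j, k in enumerate(keys)]
        m = max(logits)
        exps = [math.exp(s - m) for s in logits]
        total = sum(exps)
        attn = [e / total for e in exps]
        ctx = [sum(a * v[c] for a, v in zip(attn, values)) for c in range(len(values[0]))]
        outputs.append(layer_norm([qi + ci for qi, ci in zip(q, ctx)]))  # residual
        all_attn.append(attn)
    return outputs, all_attn


outputs, attn = interval_aware_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
    future_ts=[10.0],
    hist_ts=[9.0, 2.0],
)
```

With identical keys, the only difference between the two historical steps is the time gap. The step one hour away therefore receives most of the attention.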
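The input layout and joint loss of claim 9 can be sketched as follows. Several parts are assumptions: the token order ([CLS], fused tokens, [SEP], spatio-temporal tokens, [SEP]), the hinge-style spatio-temporal loss, and the loss weights `lam_mlm` and `lam_st`. The source names the three loss terms but does not give their formulas.

```python
import math
from typing import List, Sequence

Vector = List[float]


def build_input_sequence(fused: Sequence[Vector], st_optimized: Sequence[Vector],
                         cls_vec: Vector, sep_vec: Vector) -> List[Vector]:
    """[CLS] fused tokens [SEP] spatio-temporal tokens [SEP] (layout is an assumption)."""
    return [cls_vec] + list(fused) + [sep_vec] + list(st_optimized) + [sep_vec]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """-log softmax(logits)[target], computed with log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def spatiotemporal_loss(hop_km: Sequence[float], max_km: float) -> float:
    """Mean hinge penalty on consecutive-POI distances above max_km (hypothetical form)."""
    return sum(max(0.0, d - max_km) for d in hop_km) / len(hop_km)


def joint_loss(l_rec: float, l_mlm: float, l_st: float,
               lam_mlm: float = 0.5, lam_st: float = 0.1) -> float:
    """Weighted sum of the three losses; the weights are placeholder hyperparameters."""
    return l_rec + lam_mlm * l_mlm + lam_st * l_st


seq = build_input_sequence([[0.1, 0.2]] * 3, [[0.3, 0.4]] * 4, [1.0, 1.0], [0.0, 0.0])
l_rec = cross_entropy([2.0, 0.5, -1.0], target=0)      # next-POI recommendation loss
l_mlm = cross_entropy([0.0, 0.0, 0.0, 0.0], target=2)  # masked-POI reconstruction loss
l_st = spatiotemporal_loss([1.0, 5.0], max_km=3.0)
total = joint_loss(l_rec, l_mlm, l_st)
```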
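For the LoRA step, the standard low-rank adaptation update h = W0 x + (alpha / r) B A x is sketched below in pure Python for a single frozen projection. Which attention matrices are adapted, and the rank and scaling values used here, are assumptions. A real system would more likely rely on a deep-learning framework's LoRA tooling.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W0 (d_out x d_in) plus a trainable low-rank update (alpha / r) * B A.

    A (r x d_in) gets a small random init and B (d_out x r) starts at zero, so the
    adapted layer initially reproduces the frozen pre-trained layer exactly.
    """

    def __init__(self, w0: Matrix, r: int, alpha: float, rng: random.Random):
        self.w0 = w0  # frozen; which attention projection it is (Q, V, ...) is an assumption
        self.r, self.scale = r, alpha / r
        d_out, d_in = len(w0), len(w0[0])
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * dy for y, dy in zip(base, delta)]

    def trainable_params(self) -> int:
        """Only A and B are trained: r * (d_in + d_out) values."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


rng = random.Random(0)
d = 8
w0 = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d)]
layer = LoRALinear(w0, r=2, alpha=4.0, rng=rng)
x = [1.0] * d
```

For this 8 x 8 projection with rank 2, only 32 values are trainable instead of 64. The gap widens quickly at real LLM widths, which is the cost saving the claim refers to.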
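The three-step generation at the end of claim 9 could look roughly like the following. The source does not specify its candidate screening, personalized ranking or spatio-temporal optimization steps. This sketch therefore uses hypothetical stand-ins: model scores, dot-product ranking, and a greedy distance-constrained ordering that ignores time budgets.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance on a sphere of radius 6371 km."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def screen_candidates(model_scores: Dict[str, float], k: int) -> List[str]:
    """Step 1: keep the k POIs the model scores highest."""
    return sorted(model_scores, key=model_scores.get, reverse=True)[:k]


def personalized_rank(candidates: Sequence[str], user_emb: Sequence[float],
                      poi_emb: Dict[str, Sequence[float]]) -> List[str]:
    """Step 2: re-rank by dot product with the user's personalized embedding."""
    return sorted(candidates,
                  key=lambda p: sum(u * v for u, v in zip(user_emb, poi_emb[p])),
                  reverse=True)


def spatiotemporal_order(ranked: Sequence[str], coords: Dict[str, Coord], start: Coord,
                         max_hop_km: float, length: int) -> List[str]:
    """Step 3 (hypothetical greedy heuristic): take the best-ranked POI within one hop."""
    route, here, remaining = [], start, list(ranked)
    while remaining and len(route) < length:
        reachable = [p for p in remaining if haversine_km(here, coords[p]) <= max_hop_km]
        if not reachable:
            break
        nxt = reachable[0]
        route.append(nxt)
        remaining.remove(nxt)
        here = coords[nxt]
    return route


coords = {"A": (30.50, 114.30), "B": (30.51, 114.31), "C": (30.60, 114.40),
          "D": (31.50, 114.30), "E": (30.52, 114.33)}
scores = {"A": 0.90, "B": 0.80, "C": 0.70, "D": 0.95, "E": 0.10}
poi_emb = {"A": [0.9, 0.0], "B": [0.5, 0.0], "C": [0.95, 0.0], "D": [1.0, 0.0], "E": [0.2, 0.0]}
candidates = screen_candidates(scores, k=4)
ranked = personalized_rank(candidates, [1.0, 0.0], poi_emb)
route = spatiotemporal_order(ranked, coords, start=(30.505, 114.305), max_hop_km=20.0, length=3)
```

Here POI "D" ranks first but lies about 110 km away, so the distance constraint drops it and the route becomes C, A, B.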

Description

Multi-mode fusion space-time cross attention personalized track recommendation method

Technical Field

The invention relates to the technical field of trajectory recommendation, and in particular to a multi-modal fused spatio-temporal cross-attention personalized trajectory recommendation method.

Background

With the rapid development of the mobile internet and location-based services, trajectory recommendation has become a core technology for connecting user demands with spatial services. Its core goal is to generate, for each user, a point-of-interest (POI) sequence that reflects personal preferences, follows group collaboration patterns and is spatio-temporally reasonable. In practical scenarios such as intelligent travel and urban trip planning, a user's travel decisions face multiple constraints. The trajectory must match the user's own interests (e.g., photography enthusiasts prefer natural scenery, while parents traveling with children prefer amusement facilities) and the group choices of similar users (e.g., young tourists tend to visit trending "check-in" attractions). It must also ensure the geographic accessibility of the POIs (e.g., a moderate distance between adjacent POIs) and temporal rationality (e.g., a total trip duration that meets expectations).

Existing trajectory recommendation methods can be classified into conventional methods and large language model (LLM)-based methods. Conventional methods, such as matrix factorization and graph convolutional networks (GCNs), can capture some group collaboration patterns or spatio-temporal correlations. However, their performance degrades sharply in cold-start scenarios (new users or new POIs lacking historical interaction data), and they struggle to deeply fuse multi-modal information. LLM-based methods show certain advantages in cold-start scenarios thanks to their strong semantic understanding, but still have clear technical defects:

1. Insufficient multi-modal fusion: existing methods mostly focus on the bimodal fusion of collaborative and textual information and do not model spatio-temporal constraints as an independent modality, so the generated trajectories are prone to spatio-temporal imbalance (e.g., adjacent POIs that are too far apart, or unreasonable time intervals);
2. Limited personalized expression: personalized representations rely on users' historical interactions and are directly concatenated with textual semantics, which fails to capture different users' differentiated preferences for the same POI (e.g., for the same scenic spot, photography enthusiasts prefer early-morning visits while parent-child users prefer afternoon visits);
3. Rigid fusion mechanisms: multi-modal information is fused with a static weight allocation strategy, which cannot adapt to dynamic changes in user behavior (e.g., the different amounts of historical data for new and existing users, and the drift of user preferences over time);
4. Coarse spatio-temporal modeling: multi-scale time periods (e.g., hours, weeks and months) and fine-grained geographic features are not modeled, making it difficult to accurately capture users' spatio-temporal behavior patterns.

Therefore, there is a need for a multi-modal fused spatio-temporal cross-attention personalized trajectory recommendation method that deeply fuses the three modalities of personalization, collaboration and spatio-temporal constraints, and improves the overall performance of trajectory recommendation.

Disclosure of Invention

The invention aims to solve at least one of the technical problems in the prior art, and provides a multi-modal fused spatio-temporal cross-attention personalized trajectory recommendation method.
In a first aspect, an embodiment of the present invention provides a multi-modal fused spatio-temporal cross-attention personalized trajectory recommendation method, comprising: S100, acquiring a user set, a POI set and users' historical visit trajectory data, and performing multi-modal feature encoding on the data to generate collaborative embeddings, personalized embeddings, spatio-temporal data embeddings and standardized trajectory sequences; S200, designing a dynamic gating fusion mechanism that adaptively adjusts the weight ratio between the collaborative and personalized embeddings through user-dependent gating and POI-dependent gating, to generate collaborative-personalized fused embeddings; S300, constructing a personalized spatio-temporal cross-attention mechanism that fuses personalized timestamp cross-attention with personalized time-interval awareness and dynamically couples users' personalized preferences with spatio-temporal correlations, to generate spatio-temporally optimized embeddings; S400, concatenating the collaborative-personalized fused embeddings and the spatio-temporally optimized embeddings at the semantic level to construct the input sequence of a large language model, fine-tuning the attention-layer parameters of the large language model based on low-rank adaptation, and generating a recommen