CN-121979385-A - Exhibition hall interest point prediction method and prediction system based on multi-mode data

CN121979385A

Abstract

The application discloses a method and system for predicting points of interest in an exhibition hall based on multimodal data. The method acquires sensing data collected by sensors, comprising voice data, gesture data and trajectory data generated by a target user in a target exhibition hall, together with the user's interaction data on a plurality of browsed exhibits in the hall. Features are extracted from the sensing data and fused to obtain a multimodal fusion feature, and interest values of the target user for all exhibits in the exhibition hall are determined from that fusion feature. Candidate interest exhibits are then screened from the unbrowsed exhibits according to these interest values and taken as candidate points of interest, and a point-of-interest prediction result for the target user is generated from the candidate points together with the layout information of the exhibition hall. By combining multimodal data through dynamically weighted fusion, the method generates point-of-interest predictions that enable personalized guided tours and optimize exhibition hall operation.

Inventors

  • WANG HAO
  • ZHANG HAIMING
  • LI HUA

Assignees

  • Guangxi Communications Planning, Design & Consulting Co., Ltd. (广西通信规划设计咨询有限公司)

Dates

Publication Date
2026-05-05
Application Date
2025-12-25

Claims (10)

  1. A method for predicting points of interest in an exhibition hall based on multimodal data, the method comprising: acquiring sensing data collected by a sensor, wherein the sensing data comprises voice data, gesture data and trajectory data generated by a target user in a target exhibition hall, and interaction data of the target user on a plurality of browsed exhibits in the target exhibition hall; extracting features of the sensing data, and fusing the features of the sensing data to obtain a multimodal fusion feature; determining interest values of the target user for all exhibits in the target exhibition hall based on the multimodal fusion feature, wherein all exhibits in the target exhibition hall comprise browsed exhibits and unbrowsed exhibits; screening candidate interest exhibits from the unbrowsed exhibits according to the interest values of all exhibits in the target exhibition hall, and taking the candidate interest exhibits as candidate points of interest; and generating a point-of-interest prediction result of the target user in the target exhibition hall based on layout information of the target exhibition hall and the candidate points of interest.
  2. The method for predicting points of interest in an exhibition hall based on multimodal data according to claim 1, wherein extracting the features of the sensing data and fusing them to obtain the multimodal fusion feature comprises: preprocessing the voice data, the gesture data, the trajectory data and the interaction data respectively to obtain processed voice data, processed gesture data, processed trajectory data and processed interaction data; encoding the processed voice data with a pre-trained language model to obtain a text feature vector; performing temporal feature mining on the processed trajectory data with a long short-term memory (LSTM) network to obtain a trajectory feature vector; performing spatial feature extraction on the processed gesture data and the processed interaction data respectively with a preset convolutional neural network to obtain a gesture feature vector corresponding to the processed gesture data and an operation feature vector corresponding to the processed interaction data; and computing the multimodal fusion feature based on the text feature vector, the trajectory feature vector, the gesture feature vector and the operation feature vector.
  3. The method for predicting points of interest in an exhibition hall based on multimodal data according to claim 2, wherein computing the multimodal fusion feature based on the text feature vector, the trajectory feature vector, the gesture feature vector and the operation feature vector comprises: constructing an attention query vector; computing the degree of association between the attention query vector and each of the text feature vector, the trajectory feature vector, the gesture feature vector and the operation feature vector, so as to obtain a weight for each of the four feature vectors; and weighting each feature vector by its corresponding weight and fusing the weighted vectors to obtain the multimodal fusion feature.
  4. The method for predicting points of interest in an exhibition hall based on multimodal data according to claim 1, wherein determining the interest values of the target user for all exhibits in the target exhibition hall based on the multimodal fusion feature comprises: inputting the multimodal fusion feature into an interest-value scoring model to obtain the interest values, output by the model, of the target user for all exhibits in the target exhibition hall, wherein the interest-value scoring model comprises a gradient boosting decision tree model.
  5. The method for predicting points of interest in an exhibition hall based on multimodal data according to claim 4, wherein after the multimodal fusion feature is input into the interest-value scoring model to obtain the interest values of the target user for all exhibits in the target exhibition hall, the method further comprises: when the time span between the current moment and the acquisition moment of the sensing data is greater than a preset time interval, acquiring second data, wherein the second data are the voice data, gesture data, trajectory data and interaction data generated by the target user in the target exhibition hall during that time span; and training the interest-value scoring model based on the second data and the interest values of the target user for all exhibits in the target exhibition hall, so as to obtain an updated interest-value scoring model.
  6. The method for predicting points of interest in an exhibition hall based on multimodal data according to claim 1, wherein screening candidate interest exhibits from the unbrowsed exhibits according to the interest values of all exhibits in the target exhibition hall, and taking the candidate interest exhibits as candidate points of interest, comprises: extracting the interest values of the unbrowsed exhibits from the interest values of all exhibits in the target exhibition hall and ranking them to form an unbrowsed-exhibit interest-value ranking; determining, according to the ranking, the unbrowsed exhibits whose interest values are greater than a preset interest-value threshold as the candidate interest exhibits; and taking the exhibit positions corresponding to the candidate interest exhibits as the candidate points of interest.
  7. The method for predicting points of interest in an exhibition hall based on multimodal data according to claim 1, wherein after generating the point-of-interest prediction result of the target user in the target exhibition hall based on the layout information of the target exhibition hall and the candidate points of interest, the method further comprises: determining a real-time popularity ranking of all exhibits in the target exhibition hall according to the sensing data, so as to generate real-time popularity regions and a real-time trajectory density of the target exhibition hall; determining, according to the real-time popularity ranking, the exhibits whose popularity values are lower than a preset popularity threshold as low-attention exhibits; and generating an optimization scheme for the low-attention exhibits in the target exhibition hall based on the real-time popularity regions and the real-time trajectory density of the target exhibition hall.
  8. A system for predicting points of interest in an exhibition hall based on multimodal data, the system comprising: an acquisition module for acquiring sensing data collected by a sensor, wherein the sensing data comprises voice data, gesture data and trajectory data generated by a target user in a target exhibition hall, and interaction data of the target user on a plurality of browsed exhibits in the target exhibition hall; an extraction module for extracting features of the sensing data and fusing them to obtain a multimodal fusion feature; a determining module for determining interest values of the target user for all exhibits in the target exhibition hall based on the multimodal fusion feature; a screening module for screening candidate interest exhibits from the unbrowsed exhibits according to the interest values of all exhibits in the target exhibition hall, and taking the candidate interest exhibits as candidate points of interest; and a generation module for generating a point-of-interest prediction result of the target user in the target exhibition hall based on the layout information of the target exhibition hall and the candidate points of interest.
  9. An electronic device comprising at least one control processor and a memory communicatively coupled to the at least one control processor, the memory storing instructions executable by the at least one control processor to enable the at least one control processor to perform the method for predicting points of interest in an exhibition hall based on multimodal data according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method for predicting points of interest in an exhibition hall based on multimodal data according to any one of claims 1 to 7.
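For illustration only, the attention-weighted fusion described in claim 3 can be sketched as follows. The patent does not specify how the degree of association or the weights are computed; this sketch assumes scaled dot-product relevance and softmax normalization, which are common choices but are assumptions here, as are the function name and vector dimensions.

```python
import numpy as np

def fuse_modalities(query, features):
    """Attention-weighted fusion of per-modality feature vectors (claim 3 sketch).

    query    : (d,) attention query vector
    features : list of (d,) vectors, e.g. [text, trajectory, gesture, operation]
    Returns the fused (d,) multimodal feature vector.
    """
    d = query.shape[0]
    # Degree of association of each modality with the query
    # (scaled dot product -- an assumption; the claim does not specify).
    scores = np.array([query @ f / np.sqrt(d) for f in features])
    # Softmax turns association scores into fusion weights summing to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum of modality vectors yields the multimodal fusion feature.
    return sum(w * f for w, f in zip(weights, features))
```

Because the weights are recomputed from the current query and feature vectors each time, the fusion is dynamically weighted per user and per moment, matching the "dynamic weighted fusion" described in the abstract.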

Description

Exhibition hall point-of-interest prediction method and prediction system based on multimodal data

Technical Field

The application relates to the technical field of intelligent exhibition hall navigation, and in particular to a method and system for predicting exhibition hall points of interest based on multimodal data.

Background

An exhibition hall is an important venue for cultural dissemination and product display, and the quality of its guide service directly affects visitor experience and display effectiveness. With the development of intelligent technology, traditional passive navigation can no longer satisfy users' needs for personalized information. Intelligent devices are therefore needed to collect user behavior data, accurately predict points of interest, deliver personalized navigation recommendations, and at the same time provide data support for exhibition hall operation optimization, improving both visiting efficiency and the scientific basis of operational decisions. Existing exhibition hall guide modes rely on manual explanation and static display cards and cannot deliver personalized service; electronic guide devices support only basic navigation and exhibit introduction, with a single interaction mode and fixed functions. Existing point-of-interest prediction methods collect single-modality user data and rely on surface-level signals such as dwell time; they are prone to misjudgment caused by environmental interference and have limited analytical dimensions, resulting in insufficient prediction accuracy, making it difficult to provide precise personalized guide services or a scientific decision basis for exhibition hall operation optimization.

Disclosure of Invention

The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The main purpose of the disclosed embodiments is to provide a method and system for predicting exhibition hall points of interest based on multimodal data, which can generate point-of-interest prediction results by combining dynamically weighted fusion of multimodal data with the exhibition layout, thereby realizing personalized guided tours and optimizing exhibition hall operation. A first aspect of an embodiment of the present application provides a method for predicting points of interest in an exhibition hall based on multimodal data, the method including: acquiring sensing data collected by a sensor, wherein the sensing data comprises voice data, gesture data and trajectory data generated by a target user in a target exhibition hall, and interaction data of the target user on a plurality of browsed exhibits in the target exhibition hall; extracting features of the sensing data, and fusing the features to obtain a multimodal fusion feature; determining interest values of the target user for all exhibits in the target exhibition hall based on the multimodal fusion feature, wherein all exhibits comprise browsed exhibits and unbrowsed exhibits; screening candidate interest exhibits from the unbrowsed exhibits according to the interest values of all exhibits, and taking the candidate interest exhibits as candidate points of interest; and generating a point-of-interest prediction result of the target user in the target exhibition hall based on the layout information of the target exhibition hall and the candidate points of interest.
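The screening step described above (and detailed in claim 6) can be sketched minimally as follows. The data structures, function name, and threshold value are illustrative assumptions; the patent specifies only that unbrowsed exhibits whose interest values exceed a preset threshold are ranked and kept as candidates.

```python
def screen_candidates(interest_values, browsed, threshold):
    """Pick candidate points of interest from unbrowsed exhibits (claim 6 sketch).

    interest_values : dict mapping exhibit id -> predicted interest value
    browsed         : set of exhibit ids the target user has already visited
    threshold       : preset interest-value threshold
    Returns candidate exhibit ids ranked by interest value, highest first.
    """
    # Only unbrowsed exhibits are eligible as candidate points of interest.
    unbrowsed = {e: v for e, v in interest_values.items() if e not in browsed}
    # Rank by interest value and keep only those above the threshold.
    ranked = sorted(unbrowsed.items(), key=lambda kv: kv[1], reverse=True)
    return [e for e, v in ranked if v > threshold]
```

The resulting candidate list would then be mapped to exhibit positions and combined with the hall's layout information to produce the final point-of-interest prediction.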
In some embodiments of the present application, extracting the features of the sensing data and fusing them to obtain the multimodal fusion feature includes: preprocessing the voice data, the gesture data, the trajectory data and the interaction data respectively to obtain processed voice data, processed gesture data, processed trajectory data and processed interaction data; encoding the processed voice data with a pre-trained language model to obtain a text feature vector; performing temporal feature mining on the processed trajectory data with a long short-term memory (LSTM) network to obtain a trajectory feature vector; performing spatial feature extraction on the processed gesture data and the processed interaction data respectively with a preset convolutional neural network to obtain a gesture feature vector corresponding to the processed gesture data and an operation feature vector corresponding to the processed interaction data; and computing the multimodal fusion feature based on the text feature vector, the trajectory feature vector, the gesture feature vector, and the operation feature vector. In some embodiments of the application, computing the multimodal fusion feature based on the text feature vector, the trajectory feature vector, the gesture feature vector, and the operation feature vector