CN-121979926-A - Multi-mode target perception retrieval method, device and equipment
Abstract
The invention provides a multi-modal target-aware retrieval method, device, and equipment. The method comprises: acquiring multi-source real-time data input by a user, wherein the multi-source real-time data comprises at least one of text data, image data, and voice data; inputting the multi-source real-time data into a semantic reasoning model for feature fusion processing to obtain a target feature vector; performing situation feature matching on the target feature vector according to the user's membership unit to obtain a situation fusion vector; performing query intention analysis on the situation fusion vector to obtain a user intention result; and performing similarity retrieval on the user intention result in a vector library to obtain a retrieval result. The semantic reasoning model is trained on multi-modal historical data. The invention can improve the efficiency of multi-modal target-aware retrieval, the degree of cross-modal association, feature recognition precision, and the depth of technology association mining.
Inventors
- YANG XIAO
- LUO XINKAI
- GAO TIANMING
- ZHANG XINPENG
Assignees
- 中译文娱科技(青岛)有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251229
Claims (10)
- 1. A multi-modal target-aware retrieval method, comprising: acquiring multi-source real-time data input by a user, wherein the multi-source real-time data comprises at least one of text data, image data, and voice data; inputting the multi-source real-time data into a semantic reasoning model for feature fusion processing to obtain a target feature vector; performing situation feature matching on the target feature vector according to the user's membership unit to obtain a situation fusion vector; performing query intention analysis on the situation fusion vector to obtain a user intention result; and performing similarity retrieval on the user intention result in a vector library to obtain a retrieval result; wherein the semantic reasoning model is trained on multi-modal historical data.
- 2. The multi-modal target-aware retrieval method according to claim 1, wherein inputting the multi-source real-time data into the semantic reasoning model for feature fusion processing to obtain a target feature vector comprises: preprocessing the multi-source real-time data to obtain target multi-source data; inputting the target multi-source data into the semantic reasoning model for feature extraction to obtain target feature data; and performing a linear transformation on the target feature data to obtain the target feature vector.
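The extract-then-project step of claim 2 can be sketched as follows. This is a minimal illustration, not the patented model: the projection matrices, dimensions, and the averaging fusion rule are all assumptions, since the claim leaves the concrete fusion operation open.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality projection matrices, stand-ins for the semantic
# reasoning model's feature-extraction branches (dimensions are illustrative).
W_text = rng.standard_normal((128, 64))
W_image = rng.standard_normal((256, 64))

def extract_and_project(text_feat, image_feat):
    """Linearly project per-modality features into a shared 64-d space,
    then average them to form the target feature vector."""
    t = text_feat @ W_text      # (128,) -> (64,)
    v = image_feat @ W_image    # (256,) -> (64,)
    fused = (t + v) / 2.0       # simple averaging fusion (assumed rule)
    return fused / np.linalg.norm(fused)  # unit-normalise for retrieval

vec = extract_and_project(rng.standard_normal(128), rng.standard_normal(256))
print(vec.shape)  # (64,)
```

Unit-normalising the output makes the later cosine-similarity retrieval a plain dot product; that normalisation is a common engineering choice, not something the claim specifies.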
- 3. The multi-modal target-aware retrieval method according to claim 1, wherein performing situation feature matching on the target feature vector according to the user's membership unit to obtain a situation fusion vector comprises: obtaining a real-time situation according to the user's membership unit; and concatenating and fusing the real-time situation with the target feature vector to obtain the situation fusion vector.
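The "splicing and fusing" of claim 3 reads as vector concatenation of the unit's situation encoding with the target feature vector. A minimal sketch, assuming the situation is already encoded as a fixed-length vector (the encoding itself is not specified by the claim):

```python
import numpy as np

def fuse_situation(target_vec, situation_vec):
    """Concatenate ('splice') the real-time situation features of the
    user's membership unit onto the target feature vector."""
    return np.concatenate([target_vec, situation_vec])

target = np.ones(64)        # stand-in target feature vector
situation = np.zeros(16)    # hypothetical situation encoding
fused = fuse_situation(target, situation)
print(fused.shape)  # (80,)
```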
- 4. The multi-modal target-aware retrieval method according to claim 1, wherein performing query intention analysis on the situation fusion vector to obtain a user intention result comprises: classifying the situation fusion vector according to query intention to obtain an intention classification probability; and obtaining the user intention result from the intention classification probability.
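Claim 4's two steps map onto a standard classification head: softmax over intent logits, then argmax. The intent labels below are invented for illustration; the patent does not enumerate intent classes.

```python
import numpy as np

def softmax(z):
    z = z - z.max()           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

INTENTS = ["parameter_lookup", "image_match", "relation_mining"]  # illustrative

def classify_intent(logits):
    """Turn classifier logits over the situation fusion vector into
    intention classification probabilities, then pick the most likely
    intent as the user intention result."""
    probs = softmax(np.asarray(logits, dtype=float))
    return INTENTS[int(np.argmax(probs))], probs

intent, probs = classify_intent([2.0, 0.5, -1.0])
print(intent)  # parameter_lookup
```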
- 5. The multi-modal target-aware retrieval method according to claim 1, wherein performing similarity retrieval on the user intention result in a vector library to obtain a retrieval result comprises: performing sub-modal similarity calculation in the vector library according to the user intention result to obtain sub-modal similarities; and performing cross-modal weighted fusion on the sub-modal similarities to obtain the retrieval result.
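Claim 5's two stages can be sketched as per-modality cosine similarities fused by a weighted sum. The modality names and weights are assumptions; the claim does not fix the similarity measure or the weighting scheme.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cross_modal_score(query, entry, weights):
    """Compute per-modality (sub-modal) cosine similarities between the
    query and a vector-library entry, then fuse them with cross-modal
    weights into one ranking score."""
    sims = {m: cosine(query[m], entry[m]) for m in query}
    return sum(weights[m] * sims[m] for m in sims), sims

rng = np.random.default_rng(1)
q = {"text": rng.standard_normal(8), "image": rng.standard_normal(8)}
e = {"text": q["text"] + 0.01, "image": rng.standard_normal(8)}
score, sims = cross_modal_score(q, e, {"text": 0.6, "image": 0.4})
```

Entries in the library would be ranked by `score`; here the near-identical text vectors dominate the fused result because of the higher text weight.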
- 6. The multi-modal target-aware retrieval method according to claim 1, wherein the training process of the semantic reasoning model comprises: performing feature labeling on the multi-modal historical data to obtain a feature data set; obtaining, from the feature data set, layer contribution values of a preset reasoning model and the neuron activation rate of each layer; pruning the preset reasoning model according to the layer contribution values and the neuron activation rates to obtain an intermediate reasoning model; and optimizing the intermediate reasoning model according to a semantic constraint loss function to obtain the semantic reasoning model.
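The pruning step of claim 6 can be illustrated as a planning pass over measured statistics: drop whole layers with negligible contribution, and within surviving layers mark rarely-activated neurons for removal. The thresholds and the dictionary-based model representation are illustrative assumptions; the patent fixes neither.

```python
def prune_plan(layer_contrib, activation_rates, contrib_thr=0.05, act_thr=0.1):
    """Decide, per layer, whether to drop the layer (low contribution value)
    or which of its neurons to prune (low activation rate), yielding the
    plan for the intermediate reasoning model. Thresholds are illustrative."""
    plan = {}
    for layer, c in layer_contrib.items():
        if c < contrib_thr:
            plan[layer] = "drop_layer"
        else:
            rates = activation_rates[layer]
            plan[layer] = [i for i, r in enumerate(rates) if r < act_thr]
    return plan

# Hypothetical statistics gathered from the feature data set.
contrib = {"enc1": 0.40, "enc2": 0.02, "head": 0.30}
acts = {"enc1": [0.9, 0.05, 0.7], "enc2": [0.5, 0.5], "head": [0.8, 0.8]}
plan = prune_plan(contrib, acts)
print(plan)  # {'enc1': [1], 'enc2': 'drop_layer', 'head': []}
```

The subsequent fine-tuning with the semantic constraint loss function is an ordinary training loop and is omitted here, since the claim does not define that loss.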
- 7. The multi-modal target-aware retrieval method according to claim 1, wherein the construction process of the vector library comprises: obtaining feature importance coefficients from multi-modal embedded vectors; allocating feature bit widths according to the feature importance coefficients to obtain target embedded vectors; and performing relation extraction and reasoning updates on the target embedded vectors to obtain the vector library.
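The bit-width allocation of claim 7 amounts to importance-aware quantisation: dimensions with higher importance coefficients keep more bits. A minimal sketch, assuming a tercile split into 4/8/16-bit tiers (the patent leaves the allocation rule and the tier widths open):

```python
import numpy as np

def allocate_bit_widths(importance, widths=(4, 8, 16)):
    """Assign a quantisation bit width to each embedding dimension from its
    feature importance coefficient: more important dimensions keep more
    bits. The tercile split is an assumption, not the patented rule."""
    imp = np.asarray(importance, dtype=float)
    lo, hi = np.quantile(imp, [1 / 3, 2 / 3])
    return [widths[0] if x <= lo else widths[2] if x > hi else widths[1]
            for x in imp]

print(allocate_bit_widths([0.9, 0.1, 0.5, 0.95, 0.05, 0.4]))
# → [16, 4, 8, 16, 4, 8]
```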
- 8. A multi-modal target-aware retrieval device, comprising an acquisition module and a processing module, wherein: the acquisition module is configured to acquire multi-source real-time data input by a user, the multi-source real-time data comprising at least one of text data, image data, and voice data; and the processing module is configured to input the multi-source real-time data into a semantic reasoning model for feature fusion processing to obtain a target feature vector, perform situation feature matching on the target feature vector according to the user's membership unit to obtain a situation fusion vector, perform query intention analysis on the situation fusion vector to obtain a user intention result, and perform similarity retrieval on the user intention result in a vector library to obtain a retrieval result, wherein the semantic reasoning model is trained on multi-modal historical data.
- 9. A computing device, comprising: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 7.
- 10. A computer-readable storage medium, wherein the computer-readable storage medium stores a program which, when executed by a processor, implements the method of any one of claims 1 to 7.
Description
Multi-mode target perception retrieval method, device and equipment

Technical Field

The present invention relates to the field of information retrieval technologies, and in particular to a method, an apparatus, and a device for multi-modal target-aware retrieval.

Background

Traditional database retrieval stores the text attributes of equipment (such as model, parameters, and performance indexes) in a structured database and matches keywords (such as 'maximum range of a certain type of tank'); typical applications include armored-equipment manual databases and naval vessel parameter query systems. Such methods support only single text-mode retrieval, cannot process unstructured data such as images and voice, and have limited capability to mine association relationships among equipment. Retrieval systems based on single-mode recognition develop dedicated recognition models for military equipment images (such as satellite photographs of fighter planes and tanks) and realize model retrieval through image feature matching, or develop voice-to-text retrieval systems for voice instructions (such as 'query the parameters of certain military equipment'). Such methods support only single-mode input and cannot realize cross-modal associated retrieval linking text descriptions, equipment images, and performance voice; moreover, the models are not optimized for the professional characteristics of military equipment (such as tank armor thickness or the weapons mounted on a fighter plane), so retrieval precision is limited by general recognition capability.
The existing military equipment information retrieval technology has the following defects. Traditional databases support only text retrieval and cannot process multi-modal data such as military reconnaissance images and spoken equipment commentary, so they can hardly meet actual-combat requirements such as finding equipment parameters from a blurry image or locating an equipment model from battlefield voice descriptions. Unimodal recognition systems lack cross-modal correlation capability; for example, the textual performance description of a certain type of destroyer cannot be correlated with satellite images of the ship or voice records of its combat deployment, which causes information fragmentation.

Disclosure of Invention

The invention aims to provide a multi-modal target-aware retrieval method, device, and equipment that can improve the efficiency of multi-modal target-aware retrieval, the degree of cross-modal association, feature recognition precision, and the depth of technology association mining.
To solve the above technical problems, the technical scheme of the invention is as follows. A multi-modal target-aware retrieval method comprises the following steps: acquiring multi-source real-time data input by a user, wherein the multi-source real-time data comprises at least one of text data, image data, and voice data; inputting the multi-source real-time data into a semantic reasoning model for feature fusion processing to obtain a target feature vector; performing situation feature matching on the target feature vector according to the user's membership unit to obtain a situation fusion vector; performing query intention analysis on the situation fusion vector to obtain a user intention result; and performing similarity retrieval on the user intention result in a vector library to obtain a retrieval result; the semantic reasoning model is trained on multi-modal historical data.

Optionally, inputting the multi-source real-time data into the semantic reasoning model for feature fusion processing to obtain a target feature vector comprises: preprocessing the multi-source real-time data to obtain target multi-source data; inputting the target multi-source data into the semantic reasoning model for feature extraction to obtain target feature data; and performing a linear transformation on the target feature data to obtain the target feature vector.

Optionally, performing situation feature matching on the target feature vector according to the user's membership unit to obtain a situation fusion vector comprises: obtaining a real-time situation according to the user's membership unit; and concatenating and fusing the real-time situation with the target feature vector to obtain the situation fusion vector.
Optionally, performing query intention analysis on the situation fusion vector to obtain a user intention result comprises: classifying the situation fusion vector according to query intention to obtain an intention classification probability; and obtaining the user intention result from the intention classification probability. Optionally, performing similarity retrieval on the user intention result in the vector library to obtain a retrieval result