CN-121996807-A - Multi-mode retrieval tracing method and device and computing equipment


Abstract

The invention discloses a multi-modal retrieval and tracing method, an apparatus, and a computing device. The computing device comprises a first vector library and a second vector library constructed based on a plurality of original files of multiple modalities. The first vector library is adapted to establish a mapping relationship between the metadata and semantic text of each original file and a corresponding vector representation, and the second vector library is adapted to establish a mapping relationship between the feature vector of each original file and the metadata of that original file. The method comprises: acquiring a query question and vectorizing it to obtain a corresponding query vector; obtaining corresponding target metadata and target semantic text from the first vector library based on the query vector; obtaining a corresponding target feature vector from the second vector library based on the target metadata; performing similarity matching in the second vector library based on the target feature vector to obtain a plurality of related feature vectors of original files of multiple modalities; obtaining related metadata corresponding to each related feature vector; and generating a query result and traceability information corresponding to the query question based on the target metadata, the target semantic text, and the related metadata corresponding to each related feature vector. With this method, efficient cross-modal retrieval can be achieved and a complete traceability chain can be provided for each retrieval result, which supports the credibility and interpretability of the retrieval result.

Inventors

CHEN JIAN
QIAO NAN
CHEN GUANGRONG

Assignees

北京并行科技股份有限公司
北京北龙超级云计算有限责任公司

Dates

Publication Date: 20260508
Application Date: 20260109

Claims (10)

1. A multi-modal retrieval and tracing method, executed in a computing device, wherein the computing device comprises a first vector library and a second vector library constructed based on a plurality of original files of multiple modalities, the first vector library is adapted to establish a mapping relationship between the metadata and semantic text of each original file and a corresponding vector representation, and the second vector library is adapted to establish a mapping relationship between a feature vector of each original file and the metadata of that original file, the method comprising: acquiring a query question, and vectorizing the query question to obtain a corresponding query vector; obtaining corresponding target metadata and target semantic text from the first vector library based on the query vector; obtaining a corresponding target feature vector from the second vector library based on the target metadata, performing similarity matching in the second vector library based on the target feature vector to obtain a plurality of related feature vectors of a plurality of original files of multiple modalities, and obtaining related metadata corresponding to each related feature vector; and generating a query result and traceability information corresponding to the query question based on the target metadata, the target semantic text, and the related metadata corresponding to each related feature vector.
2. The method of claim 1, further comprising: extracting semantic text from each original file and acquiring metadata of the original file, wherein the semantic text describes the content of the original file; generating a table file containing a plurality of metadata entries based on the metadata and semantic text of the plurality of original files of multiple modalities, wherein each metadata entry contains the metadata and semantic text of one original file, and the metadata comprises a file path, a file size, and a file type; vectorizing each metadata entry to obtain a corresponding vector representation, and storing the vector representation in the first vector library in association with the metadata entry; and extracting a feature vector from each original file, and storing the feature vector in the second vector library in association with the metadata of the original file.
3. The method of claim 2, wherein extracting a feature vector from each original file and storing the feature vector in the second vector library in association with the metadata of the original file comprises: extracting a feature vector from each original file, and storing the feature vector in the second vector library in association with the file path in the metadata of the original file.
4. The method of claim 2 or 3, wherein obtaining corresponding target metadata and target semantic text from the first vector library based on the query vector comprises: obtaining a corresponding target metadata entry from the first vector library based on the query vector, wherein the target metadata entry comprises the target metadata and the target semantic text.
5. The method of claim 4, wherein generating a query result and traceability information corresponding to the query question based on the target metadata, the target semantic text, and the related metadata corresponding to each related feature vector comprises: generating, by using a large language model, the query result and traceability information corresponding to the query question based on the query question, the target metadata entry, and the related metadata corresponding to each related feature vector; wherein the traceability information comprises the file path of the target original file, the target metadata entry corresponding to the target metadata, the similarity calculation results in the first vector library and the second vector library, and the original file information corresponding to each piece of related metadata.
6. The method of any of claims 1-5, wherein obtaining corresponding target metadata and target semantic text from the first vector library based on the query vector comprises: calculating a first similarity between the query vector and each vector representation in the first vector library to obtain one or more target vector representations with the highest first similarity, and obtaining the target metadata and target semantic text corresponding to each target vector representation; and wherein obtaining a corresponding target feature vector from the second vector library based on the target metadata, and performing similarity matching in the second vector library based on the target feature vector to obtain a plurality of related feature vectors of a plurality of original files of multiple modalities, comprises: matching the target metadata against each piece of metadata in the second vector library to obtain the corresponding target feature vector, and calculating a second similarity between the target feature vector and each feature vector in the second vector library to obtain the plurality of related feature vectors, from original files of multiple modalities, that have the highest second similarity.
7. The method of claim 2, wherein the plurality of original files of multiple modalities includes a text file, an image file, an audio file, and a video file; and extracting a feature vector from each original file comprises: extracting a text feature vector from the text file by using a pre-trained language model; extracting an image feature vector from the image file by using a vision Transformer model; extracting an audio feature vector from the audio file by using an audio spectrum analysis model; and extracting a video feature vector from the video file by using a video frame sequence analysis model and the audio spectrum analysis model.
8. A multi-modal retrieval and tracing apparatus deployed in a computing device adapted to perform the method of any of claims 1-7, the computing device comprising a first vector library and a second vector library constructed based on a plurality of original files of multiple modalities, wherein the first vector library is adapted to establish a mapping relationship between the metadata and semantic text of each original file and a corresponding vector representation, and the second vector library is adapted to establish a mapping relationship between the feature vector of each original file and the metadata of that original file, the apparatus comprising: a vectorization processing module adapted to acquire a query question and vectorize the query question to obtain a corresponding query vector; a first retrieval module adapted to obtain corresponding target metadata and target semantic text from the first vector library based on the query vector; a second retrieval module adapted to obtain a corresponding target feature vector from the second vector library based on the target metadata, perform similarity matching in the second vector library based on the target feature vector to obtain a plurality of related feature vectors of a plurality of original files of multiple modalities, and obtain related metadata corresponding to each related feature vector; and a generation module adapted to generate a query result and traceability information corresponding to the query question based on the target metadata, the target semantic text, and the related metadata corresponding to each related feature vector.
9. A computing device, comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, the program instructions comprising instructions for performing the method of any of claims 1-7.
10. A computer program product comprising computer program instructions which, when executed by a processor, implement the method of any of claims 1-7.
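
The library construction recited in claims 2 and 3 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: `embed_text` is a toy hashed bag-of-words stand-in for a real embedding model, `MetadataEntry` and `build_libraries` are hypothetical names, the in-memory list of entries stands in for the table file, and the per-file feature vectors are assumed to have been extracted already by modality-specific models.

```python
import math
import zlib
from dataclasses import dataclass

DIM = 64  # hypothetical embedding width, chosen only for this sketch


def embed_text(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash() for str.
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class MetadataEntry:
    """One row of the table file: metadata plus semantic text of one original file."""
    file_path: str
    file_size: int
    file_type: str
    semantic_text: str

    def as_text(self) -> str:
        # The whole entry (metadata and semantic text) is what gets vectorized.
        return f"{self.file_path} {self.file_size} {self.file_type} {self.semantic_text}"


def build_libraries(entries, feature_vectors):
    """Build both libraries.

    entries: list of MetadataEntry
    feature_vectors: dict mapping file_path -> precomputed feature vector
    """
    # First library: vector representation stored in association with its entry.
    first_library = [(embed_text(e.as_text()), e) for e in entries]
    # Second library: feature vector stored in association with the file path.
    second_library = {e.file_path: feature_vectors[e.file_path] for e in entries}
    return first_library, second_library


entries = [
    MetadataEntry("docs/a.txt", 120, "text", "report on solar panels"),
    MetadataEntry("img/b.png", 2048, "image", "photo of a rooftop solar array"),
]
features = {"docs/a.txt": [0.1, 0.2], "img/b.png": [0.3, 0.4]}
first_library, second_library = build_libraries(entries, features)
```

Keying the second library by file path mirrors claim 3, so the file path found in the first library can be used directly to look up the feature vector in the second.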

Description

Multi-mode retrieval tracing method and device and computing equipment

Technical Field

The present invention relates to the technical field of multi-modal retrieval, and in particular to a multi-modal retrieval and tracing method, a multi-modal retrieval and tracing apparatus, and a computing device.

Background

Multi-modal data typically includes text data, image data, audio data, and video data. Text data usually consists of character sequences, words, or sentences. Image data is usually represented by pixels and their three-channel color values. Audio data is mainly represented as time-series waveforms or spectrograms. Video data consists of consecutive frame images, which carry both spatial and temporal information, together with the accompanying audio signal. Storing such markedly different data in a unified way and retrieving it efficiently is the main responsibility of a multi-modal knowledge base system.

At present, constructing and searching knowledge bases over data of different modalities faces the following problems:

1) Modal heterogeneity: the representations of different modalities differ considerably, which makes it difficult to construct a unified representation space and to achieve cross-modal alignment and association.
2) Efficient indexing and retrieval: how to design a hybrid retrieval mechanism that ensures both high recall and high precision over massive multi-modal data remains an open problem.
3) Traceability and interpretability: during cross-modal retrieval and reasoning, every result must be traceable to its original data source, and an interpretable evidence chain must be provided to enhance the credibility and auditability of the system.
Therefore, a multi-modal retrieval and tracing method is needed to solve the problems in the above technical solutions.

Disclosure of Invention

To this end, the invention provides a multi-modal retrieval and tracing method and apparatus that solve, or at least alleviate, the above problems.

According to one aspect of the invention, a multi-modal retrieval and tracing method is provided, executed in a computing device. The computing device comprises a first vector library and a second vector library constructed based on a plurality of original files of multiple modalities. The first vector library is adapted to establish a mapping relationship between the metadata and semantic text of each original file and a corresponding vector representation, and the second vector library is adapted to establish a mapping relationship between the feature vector of each original file and the metadata of that original file. The method comprises: acquiring a query question and vectorizing the query question to obtain a corresponding query vector; obtaining corresponding target metadata and target semantic text from the first vector library based on the query vector; obtaining a corresponding target feature vector from the second vector library based on the target metadata; performing similarity matching in the second vector library based on the target feature vector to obtain a plurality of related feature vectors of a plurality of original files of multiple modalities; obtaining related metadata corresponding to each related feature vector; and generating a query result and traceability information corresponding to the query question based on the target metadata, the target semantic text, and the related metadata corresponding to each related feature vector.
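
The two-stage query flow described above can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is assumed as the similarity measure (the text does not name one), `retrieve_with_trace` and the dictionary keys are hypothetical names, excluding the target file from the second-stage match is a choice made for this sketch, and the final generation of a natural-language answer is left out so that only the traceability record is assembled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_with_trace(query_vector, first_library, second_library, top_k=2):
    """first_library: list of (vector, metadata_entry dict);
    second_library: dict mapping file_path -> feature vector."""
    # Stage 1: find the metadata entry whose vector is most similar to the query.
    first_score, target_entry = max(
        ((cosine(query_vector, vec), entry) for vec, entry in first_library),
        key=lambda pair: pair[0],
    )
    # Stage 2: use the target metadata (file path) to fetch the target feature
    # vector, then rank the other files' feature vectors against it.
    target_path = target_entry["file_path"]
    target_feature = second_library[target_path]
    ranked = sorted(
        (
            (cosine(target_feature, feat), path)
            for path, feat in second_library.items()
            if path != target_path  # assumption: skip the target itself
        ),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Traceability record: where the answer came from and how it was matched.
    return {
        "target_file_path": target_path,
        "target_metadata_entry": target_entry,
        "first_similarity": first_score,
        "related_files": [
            {"file_path": path, "second_similarity": score}
            for score, path in ranked[:top_k]
        ],
    }


# Hypothetical toy data: two indexed entries, three files with feature vectors.
first_library = [
    ([1.0, 0.0], {"file_path": "a.txt", "file_type": "text",
                  "semantic_text": "solar panel report"}),
    ([0.0, 1.0], {"file_path": "c.wav", "file_type": "audio",
                  "semantic_text": "interview recording"}),
]
second_library = {
    "a.txt": [1.0, 0.0, 0.0],
    "b.png": [0.9, 0.1, 0.0],
    "c.wav": [0.0, 0.0, 1.0],
}
result = retrieve_with_trace([0.9, 0.1], first_library, second_library, top_k=1)
```

In a fuller pipeline, the returned record together with the query question would be passed to a language model to produce the final answer; that step is outside this sketch.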
Optionally, the multi-modal retrieval and tracing method according to the invention further comprises: extracting semantic text from each original file and acquiring metadata of the original file, wherein the semantic text describes the content of the original file; generating a table file containing a plurality of metadata entries based on the metadata and semantic text of the plurality of original files of multiple modalities, wherein each metadata entry contains the metadata and semantic text of one original file, and the metadata comprises a file path, a file size, and a file type; vectorizing each metadata entry to obtain a corresponding vector representation, and storing the vector representation in the first vector library in association with the metadata entry; and extracting a feature vector from each original file, and storing the feature vector in the second vector library in association with the metadata of the original file.

Optionally, in the multi-modal retrieval and tracing method according to the invention, extracting a feature vector from each original file and storing the feature vector in the second vector library in association with the metadata of the original file comprises: extracting a feature vector from each original file and storing the feature vector in association with a file path in the meta