CN-121996662-A - Multi-mode database-oriented adaptive index structure selection method
Abstract
The invention discloses an adaptive index structure selection method for multi-modal databases. The method analyzes statistical characteristics of each modality, such as data dimension, variance, sparsity and distance distribution, and calculates a hidden dimension for each modality to describe its effective data complexity. Based on preset index adaptation rules, each modality is automatically mapped to the optimal index type in a candidate index structure set. Local indexes are then built automatically according to that index type and bound to a global routing structure. On this basis, a multi-modal query is automatically routed to the corresponding indexes for retrieval according to the index mapping relation. The invention provides an adaptive index structure selection mechanism that requires no manual configuration, which can significantly reduce the index maintenance cost of a multi-modal database and improve query efficiency and scalability in complex retrieval scenarios.
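As a minimal sketch of the analysis and mapping steps summarized above, the following Python code estimates a hidden dimension from pairwise sample distances and applies a simple threshold rule. The text here does not give the estimator or the thresholds: the formula mu^2 / (2 * sigma^2) is one common estimator from the distance-concentration literature and is used purely as an assumption, and the function names, threshold values and rule order are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import mean, pstdev


def hidden_dimension(samples, dist=math.dist):
    """Estimate effective dimension as mu^2 / (2 * sigma^2) over pairwise distances.

    Assumption: this estimator is borrowed from the distance-concentration
    literature; the source text does not state its own formula.
    """
    distances = [dist(a, b) for a, b in combinations(samples, 2)]
    mu, sigma = mean(distances), pstdev(distances)
    if sigma == 0:
        return math.inf  # all pairwise distances identical: fully concentrated
    return mu * mu / (2 * sigma * sigma)


def sparsity(samples):
    """Fraction of zero-valued components across the sample set."""
    total = sum(len(v) for v in samples)
    zeros = sum(1 for v in samples for x in v if x == 0)
    return zeros / total


def select_index_type(hidden_dim, sparse_frac,
                      dim_threshold=10.0, sparsity_threshold=0.9):
    """Map modality statistics to an index type (hypothetical thresholds).

    Simplification: sparsity is checked first, which acts as a fixed priority
    when several rules could apply.
    """
    if sparse_frac > sparsity_threshold:
        return "inverted"
    if hidden_dim < dim_threshold:
        return "tree"
    return "metric_decomposition"


if __name__ == "__main__":
    rng = random.Random(0)
    low_dim = [[rng.random(), rng.random()] for _ in range(100)]
    high_dim = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(100)]
    for name, data in [("low_dim", low_dim), ("high_dim", high_dim)]:
        rho, sp = hidden_dimension(data), sparsity(data)
        print(name, round(rho, 1), select_index_type(rho, sp))
```

In this sketch a dense two-dimensional sample falls below the hypothetical dimension threshold and maps to a tree index, while a dense 64-dimensional Gaussian sample exceeds it and maps to a metric decomposition index. A real system would tune both thresholds and could replace the fixed rule order with performance feedback.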
Inventors
- GAO YUNJUN
- QIU JUNYI
- Zhu Diefan
- QIAN TANG
- ZHENG BOLONG
- ZHOU MINGWEI
- JI SHOULING
Assignees
- Zhejiang University (浙江大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260409
Claims (10)
- 1. An adaptive index structure selection method for a multi-modal database, characterized by comprising the following steps: (1) acquiring a multi-modal vector data set and its modality information, performing statistical feature analysis on a sample set of each modality, and calculating a hidden dimension; (2) mapping each modality to a target index type in a candidate index structure set according to preset index adaptation rules; (3) automatically constructing a local index for each modality based on the mapping result, and associating the local indexes with a global routing structure to form a multi-modal index structure configuration; (4) upon receiving a multi-modal query request, automatically routing the query to the corresponding indexes to execute retrieval according to the multi-modal index structure configuration and the association relation, and fusing the retrieval results.
- 2. The adaptive index structure selection method for a multi-modal database according to claim 1, wherein the statistical feature analysis of the sample set of each modality in step (1) is implemented as follows: S21, acquiring the multi-modal vector data set stored in the multi-modal database, together with modality information describing the data type, metric function and value range of each modality; S22, for any modality, extracting a sample set from the full data or from sampled data; S23, calculating, from the sample set, the number of dimensions and the mean and variance of each dimension of the modality, and estimating the overall variance level and variance distribution of the modality; S24, computing, from the sample set, the sample size, vector sparsity and value-range distribution of the modality to form a density feature that characterizes data density and distribution shape; S25, combining the number of dimensions, the variance statistics, the sample size, the vector sparsity and the density feature into a modality feature parameter vector for the modality.
- 3. The adaptive index structure selection method for a multi-modal database according to claim 1, wherein step (1) calculates a hidden dimension for each modality, which characterizes the effective dimension of the modality under a given distance metric, and is implemented as follows: S31, for any modality, calculating the mean and the standard deviation of the distance distribution from statistics of the distances between points in the modality sample set; S32, estimating the hidden dimension of the modality according to the following formula based on distance concentration theory: [formula not reproduced in the source text]; S33, using the hidden dimension as the effective-dimension indicator of the modality under the given distance function and as a key input parameter for the subsequent index type determination.
- 4. The adaptive index structure selection method for a multi-modal database according to claim 3, wherein step (2) is implemented as follows: S41, predefining a candidate index structure set for the multi-modal database, the candidate index structure set comprising at least one or more of a tree-structure-based index, a metric-decomposition-based index and an inverted index; S42, when the hidden dimension of a modality is lower than a first threshold and its vector sparsity is lower than a second threshold, mapping the modality to the tree-structure-based index type; S43, when the hidden dimension of a modality is higher than the first threshold and its distance distribution has a large variance or shows an obvious concentration trend, mapping the modality to the metric-decomposition-based index type; S44, when the data vectors of a modality are high-dimensional and its vector sparsity is higher than the second threshold, mapping the modality to the inverted index type; S45, for a modality that satisfies several conditions or cannot be decided uniquely by the thresholds, selecting the optimal type from the candidate index structure set according to a preset priority or an online query performance feedback strategy.
- 5. The adaptive index structure selection method for a multi-modal database according to claim 1, wherein step (3) is implemented as follows: S51, for any modality, extracting the data column or feature vector set corresponding to the modality from the multi-modal database according to the target index type obtained by the mapping; S52, calling the index construction interface corresponding to the target index type, and generating a local index instance for the modality in the storage system according to preset index construction parameters; S53, assigning a unique identifier to each local index instance, recording the modality identifier, index type and storage location information in the global routing structure, and generating a mapping table from modalities to index instances; S54, registering the mapping table with the query execution engine so that the query optimizer can perform automatic routing and execution plan generation for multi-modal queries according to the mapping table.
- 6. The adaptive index structure selection method for a multi-modal database according to claim 5, wherein the multi-modal database comprises a global routing structure for coarse-grained routing and local index structures constructed separately for each modality; during a query, candidate partitions are first screened by the global routing structure, and similarity retrieval is then performed by the local index structures.
- 7. The adaptive index structure selection method for a multi-modal database according to claim 1, wherein step (4) is implemented as follows: S71, parsing the multi-modal query request to obtain the set of modalities involved and the query sub-condition of each modality; S72, determining, for each modality, the corresponding local index instance and its node according to the mapping table from modalities to index instances; S73, issuing the per-modality sub-queries to the corresponding local index instances and executing similarity retrieval to obtain a candidate result set for each modality; S74, normalizing the candidate result sets of the modalities and fusing them with weights according to a preset multi-modal fusion strategy to obtain a cross-modal comprehensive similarity score; S75, sorting the candidate objects by the comprehensive similarity score and returning the first k candidate objects as the query result, wherein k is a natural number greater than 1.
- 8. The adaptive index structure selection method for a multi-modal database according to claim 1, wherein the method monitors the performance of the indexes of the different modalities at run time and dynamically adjusts the index adaptation rules accordingly, specifically: S81, collecting performance metrics of each modality index while the system is running, including query latency, throughput and recall, the performance metrics being used to monitor how the different index types perform in the actual query environment; S82, dynamically updating the hidden dimension threshold, the sparsity threshold and the index priority strategy in the index adaptation rules according to the performance metrics, so that the index type selection process adapts to changes in query performance and the index type corresponding to each modality is determined dynamically; S83, when the performance of a modality under its current index structure is detected to be below a preset baseline, triggering a rebuild of the modality index or a switch of index type, and keeping the old and new indexes in parallel use during the switch so that the adaptive index structure selection process does not interrupt the retrieval service.
- 9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to implement the adaptive index structure selection method for a multi-modal database according to any one of claims 1 to 8.
- 10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the adaptive index structure selection method for a multi-modal database according to any one of claims 1 to 8.
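The query path of claim 7 (steps S71 to S75), together with the modality-to-index mapping table of claim 5, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the claimed implementation: the `LocalIndex` record layout, the `search` callable interface, min-max normalization and the equal fusion weights are all hypothetical choices, and the two local indexes are stubs that return fixed candidates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Candidates = Dict[str, float]  # object id -> raw similarity score


@dataclass
class LocalIndex:
    """One row of the modality-to-index mapping table (hypothetical layout)."""
    index_id: str
    index_type: str
    search: Callable[[object, int], Candidates]  # (sub_query, k) -> candidates


def min_max_normalize(scores: Candidates) -> Candidates:
    """Rescale raw scores to [0, 1] so that modalities become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {obj_id: 1.0 for obj_id in scores}
    return {obj_id: (s - lo) / (hi - lo) for obj_id, s in scores.items()}


def route_and_fuse(query: Dict[str, object],
                   mapping_table: Dict[str, LocalIndex],
                   weights: Dict[str, float],
                   k: int) -> List[Tuple[str, float]]:
    fused: Dict[str, float] = {}
    for modality, sub_query in query.items():      # S71: per-modality sub-conditions
        index = mapping_table[modality]            # S72: look up the local index
        candidates = index.search(sub_query, k)    # S73: per-modality retrieval
        for obj_id, score in min_max_normalize(candidates).items():
            # S74: weighted sum of normalized scores
            fused[obj_id] = fused.get(obj_id, 0.0) + weights[modality] * score
    # S75: sort by fused score (ties broken by id) and keep the top k
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Stub indexes that return fixed candidates and ignore their arguments.
mapping_table = {
    "text": LocalIndex("idx-1", "inverted",
                       lambda q, k: {"a": 8.0, "b": 4.0, "c": 0.0}),
    "image": LocalIndex("idx-2", "metric_decomposition",
                        lambda q, k: {"b": 0.75, "c": 0.5, "d": 0.25}),
}
query = {"text": "red shoes", "image": [0.1, 0.2]}
top = route_and_fuse(query, mapping_table, {"text": 0.5, "image": 0.5}, k=2)
print(top)  # [('b', 0.75), ('a', 0.5)]
```

Object `b` ranks first because it appears in both candidate sets, which is the effect the weighted fusion step is meant to capture. Normalizing before fusing matters here because the two stub indexes report scores on different scales.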
Description
Multi-mode database-oriented adaptive index structure selection method
Technical Field
The invention belongs to the technical field of database index management and retrieval, and particularly relates to an adaptive index structure selection method for multi-modal databases.
Background
Multi-modal similarity retrieval is an important foundation of database management and has a profound effect on many fields, such as intelligent recommendation, information retrieval, medical analysis and e-commerce search. In the prior art, different modalities generally use independent index structures, for example tree-structured indexes for low-dimensional dense vectors, metric decomposition structures for high-dimensional vectors, and inverted indexes for sparse text modalities, each of which achieves good query efficiency within its own modality. However, although these index types perform well on a single modality, in multi-modal scenarios they still rely on manual configuration of the index structure and cannot automatically make an optimal selection according to the characteristics of the modality data, which limits their applicability in complex practical scenarios. At present the industry has no unified solution that can automatically analyze the characteristics of multi-modal data and select a suitable index structure. Existing systems usually specify index types manually, so they adapt poorly to the large differences among modalities in dimension, sparsity and distance distribution, and their query performance fluctuates noticeably.
In addition, traditional methods often construct an index for each modality independently and lack unified cross-modal modeling capability. Different modalities are managed separately within the system, so it is difficult to adjust the index strategy dynamically according to data characteristics, and adaptive index replacement based on performance feedback cannot be realized. Current technology therefore cannot meet the index management requirements of multi-modal databases in large-scale, highly variable and highly real-time scenarios. In recent years, academia and industry have tried to combine multiple index structures to support similarity retrieval over multi-modal data. Although these schemes improve retrieval capability to some extent, problems such as fixed index types, high construction cost and high update cost remain, which makes them difficult to adapt to the rapid growth of large-scale, multi-type data. How to design a multi-modal vector index management method that adaptively selects an index structure according to modality statistics, achieves low-cost index construction and adaptive index configuration, and supports efficient and general multi-modal similarity retrieval has therefore become a pressing problem for academia and industry. For example, on an e-commerce platform, product description texts, product images and attribute vectors belong to different modalities, and the platform needs to search across the multi-modal vector sets simultaneously to return more accurate related products. At the same time, products are updated frequently, and manually configuring the index structure of every modality is costly, inefficient and slow to adapt to data changes.
However, existing index structures have obvious limitations in multi-modal scenarios. For example, spatial tree structures such as the R-tree (rectangle tree) are prone to the curse of dimensionality when processing high-dimensional vectors, so that the index degenerates into a linear scan; text indexes based on inverted structures can hardly exploit their filtering advantage on dense vectors or low-sparsity modalities; and metric trees are suitable for medium- and high-dimensional vectors but are unstable in efficiency on sparse modalities and complex distributions. Conventional index structures are therefore typically optimized only locally for a single modality and lack a unified adaptation capability across modalities. In addition, in existing multi-modal vector retrieval methods, the choice of index structure generally depends on manual experience, and one index type is used throughout, which makes it difficult to adapt to the data dimension, sparsity and distance distribution characteristics of different modalities. When the modalities differ greatly, an unsuitable index structure can reduce query efficiency, increase index space usage, and even fail to guarantee the accuracy of retrieval results. In recent years, some studies have attempted to support multi-modal retrieval by combining multiple index structures, but the problem of how to automatically select an index structure according to modality characteristics and execute efficient retrieval on the premise of low index space overhead is no