CN-121979886-A - Adaptive index recommendation method based on deep reinforcement learning
Abstract
The invention discloses an adaptive index recommendation method based on deep reinforcement learning, comprising the following steps: S1, compressing workload data to generate a representative query subset; S2, extracting features from the compressed query subset to generate feature vectors, and training the feature vectors through a deep neural network to generate feature representations that reflect the data characteristics; S3, constructing a deep reinforcement learning model containing an online network and a target network according to the feature representations, and generating an optimal index subset; S4, deploying the trained model into a real database, and dynamically adjusting the index configuration according to the real database's state and query load. The invention can intelligently recommend the optimal index scheme and effectively improve database query efficiency and response speed.
Inventors
- QIAO SHAOJIE
- LUO XIMEI
- LI BO
- WANG ZHENYU
- SONG ZIYU
- HAN NAN
- HE HAN
- TANG MINGJING
- YUAN GUAN
- WU TAO
- GOU HAOSONG
- XIANG CHAOCAN
Assignees
- 成都信息工程大学 (Chengdu University of Information Technology)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-20
Claims (10)
- 1. An adaptive index recommendation method based on deep reinforcement learning, characterized by comprising the following steps: S1, compressing workload data to generate a representative query subset; S2, extracting features from the compressed query subset to generate feature vectors, and training the feature vectors through a deep neural network to generate feature representations that reflect the data characteristics; S3, constructing a deep reinforcement learning model containing an online network and a target network according to the feature representations, and generating an optimal index subset; and S4, deploying the trained model to a real database, and dynamically adjusting the index configuration according to the real-time state and query load of the database.
- 2. The adaptive index recommendation method based on deep reinforcement learning of claim 1, wherein step S1 comprises the following sub-steps: S101, calculating the utility of each query from the queries in the input workload and the optimizer's estimated costs, by the formula B(c_i) = Δ(c_i) / Σ_{j=1}^{n} Δ(c_j), where W = {c_1, c_2, …, c_i, c_j, …, c_n} is the input workload, c_i is a query, B(c_i) is the utility of the query, i.e. the ratio of its estimated cost reduction to the total estimated cost reduction over all queries in the workload, Δ(c_i) is the estimated reduction in execution cost after adding all relevant indexes to query c_i starting from the index-free state, and Σ_{j=1}^{n} Δ(c_j) is the total estimated cost reduction over all queries; S102, calculating the similarity of each query to every other query from their indexable columns, by the formula D(c_i, c_j) = |I(c_i) ∩ I(c_j)| / |I(c_i) ∪ I(c_j)|, where I(c_i) is the set of indexable columns of query c_i and D(c_i, c_j) is the similarity between query c_i and query c_j; the formula measures the degree of overlap of the two queries' indexable columns using set operations, reflecting the similarity relationship between the queries; S103, calculating the degree of influence of each query on every other query from the utilities and similarities, by the formula Infl_{c_i}(c_j) = B(c_i) × D(c_i, c_j), where Infl_{c_i}(c_j) is the impact on query c_j when index optimization is performed for query c_i; S104, obtaining the benefit of each query as the sum of its utility and the influences it exerts on the other queries, by the formula G(c_i) = B(c_i) + Σ_{j≠i} Infl_{c_i}(c_j), where G(c_i) is the benefit of query c_i and Σ_{j≠i} Infl_{c_i}(c_j) is the sum of the effects of query c_i on the other queries; S105, constructing a greedy query-subset compression method from the per-query benefits, and generating the compressed query subset.
- 3. The adaptive index recommendation method based on deep reinforcement learning of claim 2, wherein the greedy query-subset compression method of step S105 specifically comprises the following steps: (1) constructing a max-heap over all queries keyed by their benefits, so that the query with the largest benefit sits at the heap top; (2) popping the heap-top element to obtain the currently selected query; (3) removing the selected query from the set of remaining unselected queries and updating the benefits of the other queries, yielding the updated compressed query subset and the remaining unselected set, where the benefits are updated by G′(c_j) = max(G(c_j) − Infl_{c_i}(c_j), 0), G′(c_j) being the updated benefit of query c_j and Infl_{c_i}(c_j) the impact on query c_j when index optimization is performed for the selected query c_i, so that the updated benefit G′(c_j) is never negative; (4) repeating steps (1) to (3) until the size of the compressed query subset reaches the preset value, and obtaining the compressed query subset (a runnable sketch of this greedy procedure is given after the claims).
- 4. The adaptive index recommendation method based on deep reinforcement learning of claim 1, wherein step S2 comprises the following sub-steps: S201, constructing a high-dimensional query feature vector E_c for each query in the compressed query subset, the vector comprising the query's own features, data-distribution features and database-state features; the query's own features include the query type and the tables, columns, predicates, join types and/or aggregation functions involved; the data-distribution features include the cardinalities, data skew and/or index selectivities of the columns involved; and the database-state features include the current CPU utilization, memory usage, disk I/O and/or the existing index structure, smoothed with a sliding window; S202, applying a nonlinear transformation to the high-dimensional feature vector E_c with an encoding network, mapping the high-dimensional features into a low-dimensional latent space to obtain the low-dimensional latent representation Z_c = g_φ(f_encoder(E_c)), where Z_c is the low-dimensional latent representation of query c, g_φ(·) is the nonlinear transformation of the encoding network's output layer, f_encoder(·) is the encoding network with a multi-layer perceptron structure, and φ denotes the network parameters; S203, reconstructing the query feature from the low-dimensional latent representation Z_c through a decoding network, obtaining the reconstructed query feature E_c′ = h_φ(Z_c), where E_c′ is the reconstructed query feature and h_φ(·) is the decoding network; S204, computing the error between the original input feature E_c and the reconstructed feature E_c′ to obtain the loss function Γ = (1/m) Σ_{c=1}^{m} ‖E_c − E_c′‖², where Γ is the loss function and m is the total number of data samples; and S205, performing back-propagation and gradient optimization on the loss function to obtain the updated network parameters.
- 5. The adaptive index recommendation method based on deep reinforcement learning of claim 4, wherein in step S202 the encoding network is designed as a multi-layer perceptron: the number of input-layer neurons equals the dimension of the high-dimensional query feature vector, the number of hidden-layer neurons is 1/2 of that dimension, and the number of output-layer neurons is the desired low-dimensional latent dimension, mapping the high-dimensional features into the low-dimensional latent space; for activation functions, the hidden layer uses ReLU and the output layer uses a linear activation; in step S203 the decoding network likewise adopts a multi-layer perceptron structure mirroring the encoding network: the number of input-layer neurons equals the low-dimensional latent dimension, the number of hidden-layer neurons is 1/2 of the high-dimensional vector's dimension, and the number of output-layer neurons is restored to the dimension of the high-dimensional query feature vector, reconstructing the low-dimensional latent representation into the high-dimensional query feature; the hidden layer uses ReLU, and the output layer's activation is chosen according to the value range of the high-dimensional query feature vector E_c: if all feature values of E_c lie in the interval (0, 1), a Sigmoid function is used, given by S(x) = 1 / (1 + e^{−x}), where S(·) is the sigmoid function, x is the input value and e is the natural constant, mapping the output into the interval (0, 1) (see the autoencoder sketch after the claims).
- 6. The adaptive index recommendation method based on deep reinforcement learning of claim 1, wherein the deep reinforcement learning model in step S3 comprises an online network and a target network; the online network and the target network are identical in structure, each comprising an input layer, a fully connected layer, a state-value function layer, an advantage function layer and an output layer; the input layer receives the state-representation feature vector obtained in step S2, which comprises the features of the current index set and the query features generated in step S2, and its output feeds the fully connected layer; the output of the fully connected layer feeds both the state-value function layer and the advantage function layer, the state-value function layer estimating the overall value of the current state and the advantage function layer evaluating the advantage of each action relative to the average action; the outputs of the state-value and advantage function layers feed the output layer, whose output serves as the output of the online or target network, namely the value estimate of each action in the current state (a sketch of this dueling structure follows the claims).
- 7. The adaptive index recommendation method based on deep reinforcement learning of claim 1 or 6, wherein step S3 comprises the following specific steps: S301, inputting the state-representation feature vector generated in step S2 into the online network, which outputs the Q-value estimate of each index action in the current state, an index action being the creation of a new index on a specific column, the deletion of an existing index, or the modification of an existing index's physical storage attributes; S302, selecting actions with an ε-greedy policy: with probability ε, randomly selecting an index action to explore new regions of the action space, and with probability 1 − ε, selecting the index action with the largest Q-value estimate as the action to execute, exploiting the knowledge learned so far; S303, executing the selected index action, acquiring the new database state and the real-time query load, running the workload-compression flow of step S1 on the real-time query load to generate a representative query subset, generating a new low-dimensional state-representation feature vector through the same feature-extraction flow as step S2, and simultaneously computing the instant reward; S304, computing the target Q value with the target network from the new state-representation feature vector; S305, forming a sample from the current state, the executed action, the instant reward and the new state, and storing it in the experience replay buffer; S306, once the number of samples in the experience replay buffer reaches a preset threshold, randomly drawing a batch of samples from the buffer for training and computing a loss value for each sample; S307, updating the parameters of the online network from the loss values through the back-propagation algorithm; S308, every K training steps, copying the parameters of the online network to the target network to update it; after training converges or the preset number of training steps is reached, the trained deep reinforcement learning model is obtained and used to generate the recommended index set.
- 8. The adaptive index recommendation method based on deep reinforcement learning of claim 7, wherein in step S302 the exploration rate ε decays as training proceeds, as ε = ε_start + (ε_end − ε_start) × t / T, where ε_start is the initial value, ε_end is the final value, t is the current training step and T is the total number of training steps; in step S303 the instant reward r is computed as follows: if executing the index action shortens query execution time and raises system throughput, r is a positive reward whose value depends on the proportional reduction in query execution time and the proportional improvement in throughput; if query execution time lengthens or throughput degrades, r is a negative reward whose value depends on the degree of degradation; specifically, r = α × (T_old − T_new) / T_old + β × (X_new − X_old) / X_old, where r is the instant reward, T_old and T_new are the average query execution times before and after the action, X_old and X_new are the system throughputs before and after the action, and α and β are weight coefficients dynamically adjusted to business requirements, satisfying α + β = 1 with values in the range [0, 1]; in step S304 the target Q value is computed as Q_target = r + γ × max_{a′} Q_online(s′, a′), where Q_target is the target Q value, r is the instant reward, γ is the discount factor, s′ is the new state, a′ is an action in the new state, and max_{a′} Q_online(s′, a′) takes the maximum of the value estimates over the actions in the new state; in step S306 the loss is computed as L = (1/N) Σ (Q_target − Q_estimation)², where N is the number of samples and Q_estimation is the online network's Q-value estimate of the selected action in the current state (the ε decay, reward and target-Q computations are sketched in code after the claims).
- 9. The adaptive index recommendation method based on deep reinforcement learning of claim 1, wherein step S4 comprises the following specific steps: S401, evaluating the current database state through the online network of the deep reinforcement learning model trained in step S3, obtaining a benefit prediction for each index configuration under the current database state; S402, selecting the index configuration with the maximum predicted benefit according to the prediction results and the current database state; S403, dynamically adjusting the database's index strategy and index set according to the optimal index configuration output by the trained model, updating the database's index configuration in real time; S404, collecting the database's actual running conditions, feeding the real-world effect of the currently recommended index configuration back to the deep reinforcement learning model, and iteratively optimizing the model parameters and training strategy.
- 10. The adaptive index recommendation method based on deep reinforcement learning of claim 9, wherein in step S404 the method for iteratively optimizing the model parameters specifically comprises the following steps: (1) obtaining the actual reward R′ from the query execution times and resource utilization of the index configuration during actual database operation, using the instant-reward formula of step S303; (2) computing the absolute temporal-difference error of the model from the target value Q_target computed in step S304 and the online network's Q-value estimate Q_estimation of the selected action in the current state, and using it as the sample's priority index: ΔR = |Q_target − Q_estimation|, where ΔR is the absolute value of the model's prediction error, reflecting the gap between the model's prediction and the target; the larger the error, the larger the gap between the model's prediction and the actual situation; (3) using the absolute prediction error ΔR as the sampling priority of the sample in experience replay, to steer the model toward effective learning; (4) forming a sample (s, a, R′, s′) from the current database running state s, the index-configuration action a taken by the model, the actual reward R′ and the new state s′ after the action is executed; (5) adding the generated sample to the experience replay buffer to obtain an updated buffer containing more samples that reflect actual running conditions, and evicting the lowest-priority sample if the buffer reaches its capacity limit; (6) assigning each newly generated sample a sampling priority from its deviation ΔR, larger deviations receiving higher priorities and hence larger probabilities of being drawn in experience replay, by the formula P = (ΔR + ε)^η, where ε is a very small constant and η is a hyper-parameter controlling the strength of prioritization, adjusted adaptively according to the training error: if training fluctuates strongly, η is reduced, and if convergence is slow, η is increased; (7) drawing samples from the experience replay buffer according to their sampling priorities to obtain the samples for model training; (8) adjusting the parameters and training strategy of the deep reinforcement learning model with the drawn samples, obtaining the model with updated parameters and training strategy (a minimal prioritized replay buffer is sketched after the claims).
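A minimal Python sketch of the workload-compression procedure of claims 2 and 3. It is an illustration under stated assumptions rather than the patent's implementation: the input helpers (`delta_cost`, `indexable_cols`) are hypothetical, and the Jaccard form of the similarity D(c_i, c_j) and the product form B(c_i) × D(c_i, c_j) of the influence term are reconstructions from the claim text.

```python
import heapq

def compress_workload(queries, delta_cost, indexable_cols, subset_size):
    """Greedy workload compression (claims 2-3): repeatedly pick the query
    with the largest benefit G(c_i) = B(c_i) + sum of its influences."""
    total = sum(delta_cost[q] for q in queries)            # Σ_j Δ(c_j), assumed > 0
    utility = {q: delta_cost[q] / total for q in queries}  # B(c_i), step S101

    def similarity(qi, qj):  # D(c_i, c_j), step S102 (assumed Jaccard overlap)
        a, b = indexable_cols[qi], indexable_cols[qj]
        return len(a & b) / len(a | b) if a | b else 0.0

    # Influence of optimizing c_i on c_j, assumed to be B(c_i) * D(c_i, c_j) (S103)
    infl = {(qi, qj): utility[qi] * similarity(qi, qj)
            for qi in queries for qj in queries if qi != qj}
    # G(c_i) = B(c_i) + sum over j != i of the influence terms (S104)
    gain = {qi: utility[qi] + sum(infl[(qi, qj)] for qj in queries if qj != qi)
            for qi in queries}

    remaining, selected = set(queries), []
    while remaining and len(selected) < subset_size:
        heap = [(-gain[q], q) for q in remaining]  # step (1): max-heap via negation
        heapq.heapify(heap)
        _, best = heapq.heappop(heap)              # step (2): pop the heap top
        selected.append(best)
        remaining.discard(best)
        for qj in remaining:                       # step (3): G'(c_j) = max(G - infl, 0)
            gain[qj] = max(gain[qj] - infl[(best, qj)], 0.0)
    return selected
```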
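A minimal PyTorch sketch of the encoder-decoder of claims 4 and 5: input, a hidden layer of half the input dimension with ReLU, a linear latent layer, then a mirrored decoder with an optional Sigmoid output when the features lie in (0, 1). The concrete sizes (128-dimensional input, 32-dimensional latent space), the learning rate and the random batch are placeholders, not values from the patent.

```python
import torch
import torch.nn as nn

class QueryAutoencoder(nn.Module):
    """Encoder-decoder of claims 4-5 for query feature vectors E_c."""
    def __init__(self, input_dim: int, latent_dim: int, sigmoid_output: bool = False):
        super().__init__()
        hidden = input_dim // 2                      # hidden layer: 1/2 of input dim
        self.encoder = nn.Sequential(                # f_encoder with linear output g_phi
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim))
        layers = [nn.Linear(latent_dim, hidden), nn.ReLU(),
                  nn.Linear(hidden, input_dim)]      # h_phi, mirroring the encoder
        if sigmoid_output:                           # Sigmoid when E_c values are in (0, 1)
            layers.append(nn.Sigmoid())
        self.decoder = nn.Sequential(*layers)

    def forward(self, e_c: torch.Tensor) -> torch.Tensor:
        z_c = self.encoder(e_c)                      # low-dimensional representation Z_c
        return self.decoder(z_c)                     # reconstruction E_c'

# One training step: the loss Γ of claim 4 is the mean squared reconstruction error.
model = QueryAutoencoder(input_dim=128, latent_dim=32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(64, 128)                          # placeholder batch of E_c vectors
loss = nn.functional.mse_loss(model(batch), batch)
optimizer.zero_grad()
loss.backward()                                      # back-propagation (step S205)
optimizer.step()
```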
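The sketch below illustrates the dueling online/target structure of claim 6 together with the ε decay, instant reward and target-Q computations of claims 7 and 8, again in PyTorch. The mean-subtracted combination of the value and advantage streams is the standard dueling-DQN construction and is an assumption here, since the claim only states that the two streams feed the output layer; the target-Q helper follows the formula as printed in claim 8, which maximizes over the online network's estimates.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Claim 6: shared fully connected trunk, a state-value stream V(s) and an
    advantage stream A(s, a); identical structure for online and target nets."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                # overall value of the state
        self.advantage = nn.Linear(hidden, num_actions)  # advantage vs. the average action

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)      # assumed dueling combination

def epsilon_at(t: int, total_steps: int, eps_start=1.0, eps_end=0.05) -> float:
    """Claim 8: linear decay of the exploration rate from eps_start to eps_end."""
    return eps_start + (eps_end - eps_start) * min(t / total_steps, 1.0)

def instant_reward(t_old, t_new, x_old, x_new, alpha=0.5, beta=0.5) -> float:
    """Claim 8: weighted relative time and throughput changes, with alpha + beta = 1."""
    return alpha * (t_old - t_new) / t_old + beta * (x_new - x_old) / x_old

def q_target(reward, next_state, online_net, gamma=0.99):
    """Claim 8, step S304: Q_target = r + gamma * max_a' Q_online(s', a')."""
    with torch.no_grad():
        return reward + gamma * online_net(next_state).max(dim=-1).values

def sync_target(online_net, target_net):
    """Claim 7, step S308: copy online parameters to the target net every K steps."""
    target_net.load_state_dict(online_net.state_dict())
```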
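A minimal prioritized experience replay buffer in the spirit of claim 10. The plain-list storage and linear eviction scan are simplifications for clarity (production implementations typically use a sum-tree), and the default values of ε and η are assumptions.

```python
import random

class PrioritizedReplayBuffer:
    """Claim 10: samples are drawn with priority P = (ΔR + ε)^η, where
    ΔR = |Q_target − Q_estimation| is the absolute TD error."""
    def __init__(self, capacity: int, eps: float = 1e-6, eta: float = 0.6):
        self.capacity, self.eps, self.eta = capacity, eps, eta
        self.samples, self.priorities = [], []

    def add(self, transition, td_error: float):
        """transition = (s, a, R', s'); steps (4)-(6) of claim 10."""
        priority = (abs(td_error) + self.eps) ** self.eta
        if len(self.samples) >= self.capacity:           # step (5): evict lowest priority
            drop = self.priorities.index(min(self.priorities))
            self.samples.pop(drop)
            self.priorities.pop(drop)
        self.samples.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size: int):
        """Step (7): draw with probability proportional to priority (with replacement)."""
        return random.choices(self.samples, weights=self.priorities, k=batch_size)
```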
Description
Adaptive index recommendation method based on deep reinforcement learning

Technical Field

The invention relates to the technical fields of artificial intelligence, deep learning and databases, and in particular to an adaptive index recommendation method based on deep reinforcement learning.

Background

Index recommendation is a key technology for improving the query efficiency of relational databases, and it directly affects the overall performance and user experience of a database system. Because query operations are frequent and consume substantial system resources, constructing an optimal index set has become an important research direction in the database field. Index recommendation intelligently analyzes the query load and data characteristics to automatically generate an index scheme that shortens query execution time as much as possible, avoiding the blindness and lag of manually configured indexes. An efficient index recommendation system can accurately evaluate index benefits, balance maintenance costs and continuously optimize database access paths, ensuring that a database application maintains excellent response speed and user satisfaction even as query patterns keep changing.

Database index recommendation and optimization relies mainly on three kinds of methods: rule-based, statistical and machine-learning-based. Rule-based methods recommend indexes according to the experience of database administrators or predefined rules and offer poor flexibility. Statistical methods recommend indexes by analyzing query logs and database metadata but struggle with complex, changing query patterns. The rise of machine learning, and of deep learning in particular, has brought new ideas to intelligent index recommendation by training models to learn the relationship between queries and indexes. However, the existing methods still have several drawbacks in coping with large-scale query loads, dynamically adjusting index configurations and handling complex query patterns.

The main shortcomings of existing database index recommendation techniques are: low efficiency when processing large-scale query loads, making it difficult to produce an effective index recommendation within a short time; the lack of an ability to dynamically adjust the index configuration, so that no adaptive optimization can be performed as the real-time database state and query load change; and limited handling of complex query patterns, making it hard to accurately capture the latent relationships between query features and indexes, so that recommendation results are inaccurate.

Disclosure of Invention

Aiming at the above deficiencies in the prior art, the adaptive index recommendation method based on deep reinforcement learning solves the problems of low query efficiency and slow response caused by existing database indexes' lack of an ability to dynamically adjust the index configuration.
In order to achieve the above aim, the technical scheme adopted by the invention is an adaptive index recommendation method based on deep reinforcement learning, comprising the following steps: S1, compressing workload data to generate a representative query subset; S2, extracting features from the compressed query subset to generate feature vectors, and training the feature vectors through a deep neural network to generate feature representations that reflect the data characteristics; S3, constructing a deep reinforcement learning model containing an online network and a target network according to the feature representations, and generating an optimal index subset; and S4, deploying the trained model to a real database, and dynamically adjusting the index configuration according to the real-time state and query load of the database.

Further, step S1 includes the following sub-steps: S101, calculating the utility of each query from the queries in the input workload and the optimizer's estimated costs, by the formula B(c_i) = Δ(c_i) / Σ_{j=1}^{n} Δ(c_j), where W = {c_1, c_2, …, c_i, c_j, …, c_n} is the input workload, c_i is a query, B(c_i) is the utility of the query, i.e. the ratio of its estimated cost reduction to the total estimated cost reduction over all queries in the workload, Δ(c_i) is the estimated reduction in execution cost after adding all relevant indexes to query c_i starting from the index-free state, and Σ_{j=1}^{n} Δ(c_j) is the total estimated cost reduction over all queries; S102, calculating the similarity of each query to every other query from their indexable columns, by the formula D(c_i, c_j) = |I(c_i) ∩ I(c_j)| / |I(c_i) ∪ I(c_j)|, where I(c_i) is the set of indexable columns of query c_i and D(c_i, c_j) is the similarity between query c_i and query c_j; the formula measures the degree of overlap of the two queries' indexable columns using set operations, reflecting the similarity relationship between the queries (a small worked example follows below).
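To make the utility and similarity definitions of steps S101 and S102 concrete, here is a tiny worked example with invented numbers; the Jaccard form of D(c_i, c_j) is the same assumption used in the sketches after the claims.

```python
# Worked example of S101-S102 (all numbers invented for illustration).
delta = {"c1": 60.0, "c2": 30.0, "c3": 10.0}        # estimated cost reductions Δ(c_i)
total = sum(delta.values())                          # 100.0
utility = {q: d / total for q, d in delta.items()}   # B(c1)=0.6, B(c2)=0.3, B(c3)=0.1

cols = {"c1": {"a", "b"}, "c2": {"b", "c"}, "c3": {"d"}}  # indexable columns I(c_i)

def jaccard(x: str, y: str) -> float:                # assumed form of D(c_i, c_j)
    return len(cols[x] & cols[y]) / len(cols[x] | cols[y])

print(jaccard("c1", "c2"))  # 1/3: one shared column ("b") out of three distinct
print(jaccard("c1", "c3"))  # 0.0: no overlap, so optimizing c1 does not help c3
```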