CN-121681575-B - Database query statement intelligent conversion and analysis method based on natural language

Abstract

The invention provides a method for intelligent conversion and analysis of database query statements based on natural language, relating to the technical field of natural language processing and database querying. The method comprises: constructing a joint semantic space that fuses language and database schemas; performing multi-level analysis on a natural language query; constructing a multi-hop inference path based on a graph attention propagation mechanism; establishing a mapping between entity relationships and schema elements in combination with historical query records; and generating a formalized query statement under the constraints of the database. The invention achieves accurate conversion from natural language to database queries, improves the accuracy with which query semantics are understood and the relevance of query results, and lowers the technical threshold for users of the database.

Inventors

  • MA CHAOYANG

Assignees

  • 上海熙软科技有限公司

Dates

Publication Date
20260508
Application Date
20260210

Claims (8)

  1. A method for intelligent conversion and analysis of database query statements based on natural language, characterized by comprising the following steps:
     constructing a joint semantic space that integrates a language representation subspace and a database schema representation subspace, and forming, through contrastive learning, a semantic alignment distribution in the joint semantic space between semantic vectors of a natural language query expression and structure vectors of schema elements in a target database structure description, which comprises:
        initializing the language representation subspace and the database schema representation subspace, and uniformly mapping the vector dimensions of the two subspaces to a preset dimension through a shared-parameter projection layer to form the joint semantic space;
        encoding the natural language query expression into a query semantic vector in the language representation subspace through a bidirectional mapping network, encoding a schema element in the target database structure description into a schema structure vector in the database schema representation subspace, and mapping the query semantic vector and the schema structure vector into the joint semantic space through a cross-space projection transformation;
        in the joint semantic space, using the query semantic vector and the schema structure vector as an anchor pair to construct a positive sample triplet, and randomly combining the query semantic vector with a schema structure vector that is semantically unrelated to the anchor pair to construct a negative sample triplet;
        synchronously adjusting the network parameters of the bidirectional mapping network and of the shared-parameter projection layer through a contrastive learning mechanism, so that the vector distance between the query semantic vector and the schema structure vector in the joint semantic space in the positive sample triplet is smaller than the vector distance between the query semantic vector and the schema structure vector in the negative sample triplet, thereby forming the semantic alignment distribution;
     performing multi-level semantic analysis on the natural language query expression, and extracting an intent representation describing a query target and an entity relationship representation describing query conditions, which comprises:
        performing lexical analysis and syntactic analysis on the natural language query expression, and identifying query action words, target object words and condition qualifiers in the natural language query expression;
        taking the main query action word among the query action words as a root node and the auxiliary query action words as child nodes, establishing an intent hierarchy association, and mapping the intent hierarchy association to an intent dominance edge;
        taking the query action word as a central node and the target object word as an associated node, and constructing an intent semantic graph describing the query target by connecting an action-object association edge with the intent dominance edge;
        classifying the condition qualifiers into a numerical constraint type, a category screening type and a relation filtering type, and assigning the corresponding type labels to entity-attribute association edges;
        taking the target object word as an entity node and the condition qualifier as an attribute node, and connecting the entity node and the attribute node through an entity-attribute association edge carrying a type label, so as to construct an entity relationship semantic graph describing the query conditions;
        encoding the intent semantic graph with a heterogeneous graph neural network, setting an association-edge attention weight for the action-object association edge and a dominance-edge attention weight for the intent dominance edge, and aggregating to form the intent representation describing the query target;
        encoding the entity relationship semantic graph with a type-aware graph convolutional network, calculating constraint strength, attribution confidence and path connectivity respectively according to the type labels, and aggregating to form the entity relationship representation describing the query conditions;
     constructing a multi-hop inference path between the schema elements based on a graph attention propagation mechanism, semantically propagating and aggregating the entity relationship representation annotated with the schema elements along the multi-hop inference path, adaptively adjusting the weight of the multi-hop inference path in combination with co-occurrence statistics of the schema elements in historical query execution records, and establishing a context-dependent mapping relationship between the entity relationship representation and the schema elements;
     selecting, as an effective mapping, the mapping relationship whose confidence meets a preset confidence threshold;
     taking the integrity constraints, foreign key dependencies and query grammar rules of the target database as decoding constraints, and generating, based on the intent representation and the effective mapping and within the limits of the decoding constraints, a formalized query statement that conforms to the query language specification of the target database and satisfies semantic consistency.
  2. The method of claim 1, wherein synchronously adjusting the network parameters of the bidirectional mapping network and of the shared-parameter projection layer through the contrastive learning mechanism, so that the vector distance between the query semantic vector and the schema structure vector in the joint semantic space in the positive sample triplet is smaller than the vector distance between the query semantic vector and the schema structure vector in the negative sample triplet, thereby forming the semantic alignment distribution, comprises:
     calculating the margin between the positive sample vector distance and the negative sample vector distance to obtain a distance difference metric value;
     taking the partial derivative of the distance difference metric value with respect to the encoding parameters of the bidirectional mapping network to obtain an encoding parameter gradient, and taking the partial derivative of the distance difference metric value with respect to the mapping parameters of the shared-parameter projection layer to obtain a mapping parameter gradient, wherein the encoding parameter gradient and the mapping parameter gradient together form the gradient update direction of the network parameters;
     when the distance difference metric value indicates that the positive sample vector distance is greater than the negative sample vector distance, adjusting the encoding parameters of the bidirectional mapping network to reduce the positive sample vector distance, and adjusting the mapping parameters of the shared-parameter projection layer to increase the negative sample vector distance;
     synchronously applying the gradient update direction of the network parameters to the bidirectional mapping network and the shared-parameter projection layer through a back-propagation mechanism, and iteratively updating the encoding parameters of the bidirectional mapping network and the mapping parameters of the shared-parameter projection layer until the distance difference metric value converges to a state in which the positive sample vector distance is smaller than the negative sample vector distance, so as to form the semantic alignment distribution.
  3. The method of claim 1, wherein constructing a multi-hop inference path between the schema elements based on a graph attention propagation mechanism, and semantically propagating and aggregating the entity relationship representation annotated with the schema elements along the multi-hop inference path, comprises:
     calculating the semantic association strength between the schema elements, and calculating a path confidence attenuation factor according to the number of intermediate schema elements;
     screening schema element pairs based on the semantic association strength and the path confidence attenuation factor, and identifying the connection path between the starting schema element and the target schema element of a schema element pair as the multi-hop inference path;
     taking the entity relationship representation annotated with the schema elements as an initial semantic representation, sequentially acquiring the semantic representations of the intermediate schema elements along the multi-hop inference path, calculating the semantic matching degree between the initial semantic representation and the semantic representations of the intermediate schema elements, calculating the semantic consistency between the initial semantic representation and the semantic representation of the target schema element, and determining attention propagation weights through a weighted combination of the semantic matching degree and the semantic consistency;
     propagating the initial semantic representation hop by hop from the starting schema element to the target schema element along the multi-hop inference path according to the attention propagation weights, and performing weighted fusion of the initial semantic representation and the semantic representations of the intermediate schema elements during propagation to obtain a propagated semantic representation;
     fusing all propagated semantic representations that reach the same target schema element through an aggregation operation to obtain an aggregated semantic representation corresponding to the target schema element.
  4. The method of claim 1, wherein adaptively adjusting the weight of the multi-hop inference path in combination with co-occurrence statistics of the schema elements in historical query execution records, and establishing a context-dependent mapping relationship between the entity relationship representation and the schema elements, comprises:
     counting the co-occurrence frequency of schema element pairs in the schema element co-occurrence sequences of the historical query execution records, calculating a time attenuation factor according to the timestamps of the historical query execution records, and scaling the co-occurrence frequency by the time attenuation factor before accumulating it to obtain the historical co-occurrence strength of each schema element pair;
     multiplying the historical co-occurrence strengths of the adjacent schema element pairs on the multi-hop inference path to obtain the historical co-occurrence strength of the multi-hop inference path, and fusing this historical co-occurrence strength with the current weight of the multi-hop inference path to obtain a path adjustment coefficient;
     looking up, in a knowledge graph schema layer, the entity-class schema elements and relationship-class schema elements corresponding to the entity type information and relationship type information in the entity relationship representation;
     calculating, based on the path adjustment coefficient, a weighted type similarity between the entity relationship representation and the entity-class schema elements and a weighted relation similarity between the entity relationship representation and the relationship-class schema elements, and concatenating the weighted type similarity and the weighted relation similarity into a context feature vector;
     concatenating and fusing the context feature vector with the entity relationship representation, mapping the result into a schema element annotation vector through a nonlinear transformation network, and establishing the context-dependent mapping relationship between the entity relationship representation and the schema elements.
  5. The method of claim 1, wherein generating, based on the intent representation and the effective mapping and within the limits of the decoding constraints, a formalized query statement that conforms to the query language specification of the target database and satisfies semantic consistency comprises:
     extracting a query target entity based on the intent representation, constructing a table-field mapping table by retrieving table names and field names from the effective mapping according to the query target entity, and extracting inter-table foreign key constraint relationships from the decoding constraints to construct a table join dependency graph;
     determining the set of tables involved according to the table-field mapping table, searching the table join dependency graph for candidate join paths connecting the set of tables, assigning path cost weights to the candidate join paths according to the usage frequency of each foreign key relationship in historical queries and query execution efficiency statistics, and selecting the candidate join path with the minimum weighted sum of path length and path cost weight to generate a table join condition clause;
     extracting from the effective mapping the attribute mapping relationship between the entity relationship representation and database fields, converting the attribute constraints in the attribute mapping relationship into field filtering expressions to generate a filtering condition clause, and converting the intent representation into a desired semantic graph structure and into a vocabulary selection preference vector;
     using an autoregressive sequence generation decoder, restricting the candidate vocabulary space according to the decoding constraints at each decoding time step, weighting the generation probability of each word in the candidate vocabulary space by the vocabulary selection preference vector, selecting the word with the highest generation probability and appending it to the generated query statement fragment, and using the table join condition clause as the join condition and the filtering condition clause as the filter condition to obtain the formalized query statement.
  6. A system for intelligent conversion and analysis of database query statements based on natural language, for implementing the method of any one of claims 1 to 5, comprising:
     a first unit, configured to construct a joint semantic space that integrates a language representation subspace and a database schema representation subspace, and to form, through contrastive learning, a semantic alignment distribution in the joint semantic space between semantic vectors of a natural language query expression and structure vectors of schema elements in a target database structure description;
     a second unit, configured to perform multi-level semantic analysis on the natural language query expression and to extract an intent representation describing a query target and an entity relationship representation describing query conditions;
     a third unit, configured to construct a multi-hop inference path between the schema elements based on a graph attention propagation mechanism, to semantically propagate and aggregate the entity relationship representation annotated with the schema elements along the multi-hop inference path, to adaptively adjust the weight of the multi-hop inference path in combination with co-occurrence statistics of the schema elements in historical query execution records, and to establish a context-dependent mapping relationship between the entity relationship representation and the schema elements;
     a fourth unit, configured to select, as an effective mapping, the mapping relationship whose confidence meets a preset confidence threshold;
     a fifth unit, configured to take the integrity constraints, foreign key dependencies and query grammar rules of the target database as decoding constraints, and to generate, based on the intent representation and the effective mapping and within the limits of the decoding constraints, a formalized query statement that conforms to the query language specification of the target database and satisfies semantic consistency.
  7. An electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of claims 1 to 5.
  8. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any one of claims 1 to 5.
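
As an illustration only, the contrastive adjustment recited in claims 1 and 2 can be sketched with a hinge-style triplet objective. The sketch below is not the claimed implementation: it assumes squared Euclidean distance, a fixed `margin`, and a single shared linear projection `W` standing in for both the bidirectional mapping network and the shared-parameter projection layer. All function names, vectors and hyperparameters are hypothetical.

```python
import random

def matvec(W, x):
    """Apply the shared projection W (list of rows) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def triplet_step(W, q, pos, neg, margin=0.5, lr=0.05):
    """One gradient step on max(0, d_pos - d_neg + margin).

    q   : query semantic vector (assumed already encoded)
    pos : schema structure vector paired with q (positive sample)
    neg : semantically unrelated schema structure vector (negative sample)
    Returns (loss, d_pos, d_neg) measured before the update.
    """
    zq, zp, zn = matvec(W, q), matvec(W, pos), matvec(W, neg)
    d_pos, d_neg = sq_dist(zq, zp), sq_dist(zq, zn)
    loss = max(0.0, d_pos - d_neg + margin)
    if loss > 0.0:
        for i in range(len(W)):
            for j in range(len(W[i])):
                # d/dW of ||W(q-pos)||^2 minus d/dW of ||W(q-neg)||^2
                grad = (2 * (zq[i] - zp[i]) * (q[j] - pos[j])
                        - 2 * (zq[i] - zn[i]) * (q[j] - neg[j]))
                W[i][j] -= lr * grad
    return loss, d_pos, d_neg

def train(q, pos, neg, dim_out=2, steps=200, seed=0):
    """Iterate until the positive distance is below the negative one by the margin."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in q] for _ in range(dim_out)]
    for _ in range(steps):
        loss, _, _ = triplet_step(W, q, pos, neg)
        if loss == 0.0:
            break
    return W

# Hypothetical toy vectors: pos lies close to q, neg does not.
q, pos, neg = [1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.0]
W = train(q, pos, neg)
zq, zp, zn = matvec(W, q), matvec(W, pos), matvec(W, neg)
print(sq_dist(zq, zp) < sq_dist(zq, zn))
```

In this simplified form a single parameter matrix receives the whole gradient, whereas claim 2 distinguishes an encoding parameter gradient from a mapping parameter gradient; a fuller sketch would keep separate parameter sets for the encoder and the projection layer.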

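The history-based path weighting recited in claim 4 can likewise be sketched in a few lines. This is a hedged illustration under stated assumptions, not the claimed implementation: the time attenuation factor is assumed to be an exponential decay with a hypothetical `half_life_days`, every unordered pair of schema elements within one historical record is assumed to count as a co-occurrence, and the fusion with the current path weight is assumed to be a convex combination controlled by a hypothetical `alpha`. Table names are made up.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_strength(records, half_life_days=30.0):
    """Accumulate time-decayed co-occurrence counts per schema element pair.

    records: iterable of (age_in_days, [schema elements used by the query]).
    """
    strength = defaultdict(float)
    for age_days, elements in records:
        decay = 0.5 ** (age_days / half_life_days)  # assumed attenuation form
        for a, b in combinations(sorted(set(elements)), 2):
            strength[(a, b)] += decay
    return strength

def path_strength(path, strength):
    """Product of the strengths of adjacent schema element pairs on a path."""
    result = 1.0
    for a, b in zip(path, path[1:]):
        result *= strength.get(tuple(sorted((a, b))), 0.0)
    return result

def path_adjustment(current_weight, hist_strength, alpha=0.5):
    """Fuse the current path weight with the historical strength (assumed convex mix)."""
    return alpha * current_weight + (1 - alpha) * hist_strength

# Hypothetical history: query age in days and the tables each query touched.
records = [
    (0, ["orders", "customers"]),
    (30, ["orders", "customers", "products"]),
    (60, ["orders", "products"]),
]
strength = cooccurrence_strength(records)
hist = path_strength(["customers", "orders", "products"], strength)
print(hist, path_adjustment(0.8, hist))
```

Because the accumulated strengths are not normalized here, the product along a path can exceed 1; a practical system would probably normalize or squash the values before fusing them with attention-derived weights.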
Description

Database query statement intelligent conversion and analysis method based on natural language

Technical Field

The invention relates to the technical field of natural language processing and database querying, and in particular to a method for intelligent conversion and analysis of database query statements based on natural language.

Background

With the rapid development of information technology, databases have become the infrastructure with which organizations and businesses store, manage and access data. Traditional database queries require users to master a structured query language such as SQL, which sets a high barrier to use for users without a technical background. To make databases more accessible and more widely used, natural language query interfaces, which allow users to interact with databases in everyday language rather than a specialized query language, have become a research hotspot.

The technology for converting natural language into a database query language has evolved from rule-based methods, to methods based on semantic parsing, to the deep learning-based methods of recent years. Early systems relied primarily on manually defined conversion rules and templates, while later techniques increasingly used machine learning to learn the mapping between natural language and query statements automatically from data. With the progress of natural language processing technology, deep learning-based methods have shown strong capabilities in handling complex queries and adapting to different database structures.

However, a significant representational gap remains between semantic understanding and database structure. There is no effective semantic alignment mechanism between the diversity of natural language expressions and the strict structure of database schemas, which lowers accuracy when handling complex query intents, especially when the terms used by the user differ from the table and field names in the database.
The prior art lacks the ability to deeply model complex relationships between database schema elements, and has difficulty handling associated queries and multi-step reasoning that span multiple tables. In particular, when complex structures such as multi-table joins and nested queries are involved, a reasonable query path cannot be constructed.

Disclosure of Invention

Embodiments of the invention provide a method for intelligent conversion and analysis of database query statements based on natural language, which can solve the problems in the prior art.

In a first aspect of an embodiment of the present invention, a method for intelligent conversion and analysis of database query statements based on natural language is provided, including:

constructing a joint semantic space that integrates a language representation subspace and a database schema representation subspace, and forming, through contrastive learning, a semantic alignment distribution in the joint semantic space between semantic vectors of a natural language query expression and structure vectors of schema elements in a target database structure description;

performing multi-level semantic analysis on the natural language query expression, and extracting an intent representation describing a query target and an entity relationship representation describing query conditions;

constructing a multi-hop inference path between the schema elements based on a graph attention propagation mechanism, semantically propagating and aggregating the entity relationship representation annotated with the schema elements along the multi-hop inference path, adaptively adjusting the weight of the multi-hop inference path in combination with co-occurrence statistics of the schema elements in historical query execution records, and establishing a context-dependent mapping relationship between the entity relationship representation and the schema elements;

selecting, as an effective mapping, the mapping relationship whose confidence meets a preset confidence threshold;

taking the integrity constraints, foreign key dependencies and query grammar rules of the target database as decoding constraints, and generating, based on the intent representation and the effective mapping and within the limits of the decoding constraints, a formalized query statement that conforms to the query language specification of the target database and satisfies semantic consistency.

Constructing a joint semantic space that integrates the language representation subspace and the database schema representation subspace, and forming, through contrastive learning, a semantic alignment distribution in the joint semantic space between semantic vectors of the natural language query expression and structure vectors of schema elements in the target database structure description, comprises the following steps: initializing the language representation subspace and the database schema representation subspace, uniformly mapping ve