Search

CN-121979879-A - Method and device for rebuilding association of external keys of database table

CN121979879ACN 121979879 ACN121979879 ACN 121979879ACN-121979879-A

Abstract

The application relates to the technical field of databases and discloses a method and a device for rebuilding association of external keys of a database table. The method comprises the steps of generating column pairs by pairwise combination of a plurality of target data tables, constructing a first set based on all column pairs, pruning the first set based on a target unique key through a target large language model, constructing a second set based on column pairs reserved after pruning, performing foreign key association detection on the column pairs in the second set through the target large language model to obtain a detection result, traversing all column pairs with foreign key association, deleting at least one column pair with foreign key association through the target large language model under the condition that foreign key conflict exists between the column pairs so as to eliminate the foreign key conflict, and reconstructing the foreign key association for the plurality of target data tables under the condition that foreign key conflict does not exist between the column pairs. By adopting the method, the accuracy and generalization of the external key association reconstruction of the database can be improved, and the reconstruction quality is improved.

Inventors

  • TANG ZIJIAN
  • Gan Mingran
  • ZHANG YING
  • CAI SIBO
  • WANG RUOZHU
  • Zhang Tongen
  • JING XIANG
  • HU FENG
  • LIU ZHENJUN
  • YUE SHIBIN

Assignees

  • 北京大学

Dates

Publication Date
20260505
Application Date
20251223

Claims (10)

  1. 1. The method for rebuilding the association of the external keys of the database table is characterized by comprising the following steps: combining the columns of the plurality of target data tables in pairs to generate column pairs, and constructing a first set based on all the column pairs; Pruning the first set, wherein the pruning comprises deleting column pairs of which the referenced columns are not contained in all target unique keys from the first set through a target large language model, wherein the target unique keys are candidate keys which are most likely to be referenced and are determined from all candidate keys corresponding to the target data table; Performing foreign key association detection on column pairs in a second set through the target large language model to obtain a detection result, wherein the detection result is that the column pairs have foreign key association or the column pairs do not have foreign key association; traversing all column pairs with foreign key association, and determining whether foreign key conflict exists between each column pair; deleting at least one column pair with foreign key association through the target large language model under the condition that foreign key conflict exists between the column pairs so as to eliminate the foreign key conflict; And reconstructing foreign key associations for the plurality of target data tables based on all column pairs currently having foreign key associations and having no foreign key conflicts in the case that foreign key conflicts do not exist between the column pairs.
  2. 2. The database table foreign key association reconstruction method of claim 1, wherein deleting column pairs from the first set that are referenced columns that are not included in all target unique keys by a target large language model comprises: Acquiring the minimum unique column combination of each target data table, and determining each minimum unique column combination as a candidate key, wherein the minimum unique column combination is the combination of the minimum columns capable of uniquely identifying one row of data; Intercepting the foremost partial form data from each target data table; Constructing a corresponding first prompting word based on the minimum unique column combination corresponding to each target data table and the partial table data, wherein the first prompting word comprises the following analysis rules: Rule one, can confirm one row of data of the form uniquely, can be referred to; Rule II, the more front the position of the sequence number in the data table is, the more likely it is to be cited; rule three, field name ending with key or id, may be referenced; Rule IV, the data type is an integer or a character string, and can be cited; the text length is short, and the text is suitable for human reading and can be cited; Rule six, the fewer the number of fields, the more likely it is to be referenced; Rule seven, based on entity meaning of business of the affiliated form, the business logic conforming to the business logic can be cited; respectively inputting a first prompt word corresponding to each target data table into the target large language model, so that the target large language model determines a candidate key which is most likely to be referenced from all candidate keys corresponding to the target data table based on the analysis rule, and the candidate key is used as a target unique key corresponding to the target data table; Traversing each column pair in the first set, deleting column pairs for which the referenced column is not contained in all target unique keys.
  3. 3. The method for reconstructing the foreign key association of the database table according to claim 1, wherein the detecting the foreign key association of the column pairs in the second set by the target large language model to obtain a detection result comprises: determining the type of target information required by reconstructing external key association for the plurality of target data tables through the target large language model, wherein the type of target information comprises a maximum value, a minimum value, a line number, a column sequence number and an average character length; inquiring the plurality of target data tables based on the types of the target information to acquire corresponding target information; Constructing a corresponding second prompt word based on the target information and each column pair in the second set; And carrying out foreign key association detection on each column pair through the target large language model based on the second prompt word to obtain a detection result.
  4. 4. The database table foreign key association reconstruction method of claim 3, wherein determining, by the target large language model, a type of target information required to reconstruct foreign key associations for the plurality of target data tables comprises: constructing a first data example based on all column pairs in the second set and table data of the forefront part in a target data table to which two columns in the column pairs belong; inputting the first data example into the target large language model so that the target large language model analyzes the column pairs and corresponding partial table data to obtain the type of target information required for performing foreign key association detection on the column pairs; And inquiring the plurality of target data tables based on the type of the target information, acquiring corresponding target information, and associating the acquired target information with corresponding column pairs in the second set.
  5. 5. The method of claim 3, wherein constructing a corresponding second hint word based on the target information and each column pair in the second set comprises: Based on each column pair and the associated target information, respectively constructing a second prompt word based on a thinking chain, wherein the second prompt word comprises the following analysis rules: The first rule is that the semantic validity of the foreign key relation of the column pair is judged based on the semantic of the data table, and the higher the semantic validity is, the higher the foreign key association possibility is; The second rule is to judge the similarity of the character strings of the two columns and whether the suffix of the character string has id, key or no based on grammar rules, wherein the higher the similarity of the character string is, the greater the possibility of external key association is under the condition that the suffix of the character string has id, key or no; The third rule is that judging is carried out based on the data containing relation, coverage rate, average length difference and out-of-range ratio of two columns in the column pair, wherein the data of the two columns are father subsets, the higher the coverage rate is, the smaller the average length difference is, and the lower the out-of-range ratio is, the greater the possibility of foreign key association is; Based on the second prompt word, performing foreign key association detection on each column pair through the target large language model to obtain a detection result, wherein the detection result comprises: And respectively inputting the second prompt words of each column pair into the target large language model, so that the target large language model comprehensively analyzes the column pairs based on the analysis rules and the associated target information, and generates a corresponding detection result.
  6. 6. The method for rebuilding foreign key association of database table according to claim 1, wherein the foreign key conflict comprises multiple reference conflicts and cyclic reference conflicts, traversing all column pairs with foreign key association, determining whether foreign key conflict exists between each column pair, comprising: Detecting all column pairs with foreign key association, judging whether a situation that one column references a plurality of columns exists or not, judging that multi-reference conflict exists among the column pairs if the situation that one column references a plurality of columns exists, and judging that multi-reference conflict does not exist among the column pairs if the situation that one column references a plurality of columns does not exist; Under the condition that multi-reference conflict does not exist, a directed graph of foreign key association is built based on all column pairs with foreign key association, whether closed loops exist in the directed graph is judged, each node of the directed graph is used for representing one column, each side of the directed graph is connected with two nodes and used for representing reference relations between the two columns, under the condition that closed loops exist in the directed graph, circular reference conflict exists between the column pairs, and under the condition that closed loops do not exist in the directed graph, circular reference conflict does not exist between the column pairs.
  7. 7. The method for rebuilding a foreign key association of a database table according to claim 6, wherein in case of a multi-reference conflict between column pairs, deleting at least one column pair having a foreign key association by the target large language model to eliminate the foreign key conflict, comprises: Constructing a third prompting word based on a thinking chain based on all column pairs corresponding to each multi-reference conflict, wherein the third prompting word comprises the following analysis rules: the first rule is that judging is carried out according to the semantic association strength of the two columns, wherein the higher the semantic association strength is, the higher the score of the column pair is; Judging according to the database design practice, wherein the higher the external key association of the two columns is in accordance with the database design practice, the higher the score of the column pair is; Inputting the third prompt word and all column pairs with multi-reference conflict into the target large language model, so that the target large language model performs the following steps of analyzing each column pair with multi-reference conflict and generating corresponding scores; and deleting the column pair which is currently required to be deleted from the second set.
  8. 8. The method for rebuilding a foreign key association of a database table according to claim 6, wherein in case of a circular reference conflict between column pairs, deleting at least one column pair having a foreign key association by the target large language model to eliminate the foreign key conflict, comprises: based on all column pairs corresponding to each cyclic reference conflict, a fourth prompting word based on a thinking chain is constructed, wherein the fourth prompting word comprises the following analysis rules: the first rule is that judging is carried out according to the semantic association strength of the two columns, wherein the higher the semantic association strength is, the higher the score of the column pair is; Judging according to the database design practice, wherein the higher the external key association of the two columns is in accordance with the database design practice, the higher the score of the column pair is; Inputting all column pairs of the fourth prompt word and the circular reference conflict into the target large language model so that the target large language model performs the following steps of analyzing each column pair with the circular reference conflict and generating corresponding scores; sorting all column pairs according to the score from high to low, determining the column pair with the lowest score as the column pair needing to be deleted currently, and outputting the column pair; deleting the column pairs which need to be deleted currently from the second set; Reconstructing the directed graph associated with the foreign key based on all column pairs in the current second set to determine whether there is a circular reference conflict between the current column pairs.
  9. 9. The method for rebuilding a database table foreign key association according to claim 1, wherein pruning the first set further comprises: Deleting column pairs containing empty columns, and deleting column pairs corresponding to columns in the data table with the latest access time greater than the current time by a first threshold; detecting whether the data types of two columns in each column pair are the same or not, and deleting the column pair corresponding to the two columns with different data types; Deleting column pairs of which the contained data types are floating point numbers, boolean type and binary large objects; and judging whether two columns in each column pair are in a parent-child set relationship or not, and deleting the column pair which is not in the parent-child set relationship.
  10. 10. A database foreign key association reconstruction device, configured to perform a method according to any one of claims 1 to 9, comprising: The preprocessing module is configured to combine the columns of the plurality of target data tables in pairs to generate column pairs, and construct a first set based on all the column pairs; pruning the first set, wherein the pruning comprises deleting column pairs of which the referenced columns are not contained in all target unique keys from the first set through a target large language model, wherein the target unique keys are candidate keys which are most likely to be referenced and are determined from all candidate keys corresponding to the target data table; the association identification module is configured to detect the external key association of the column pairs in the second set through the target large language model to obtain a detection result, wherein the detection result is that the external key association exists in the column pairs or the external key association does not exist in the column pairs; The conflict processing module is configured to traverse all column pairs with foreign key association and determine whether foreign key conflict exists between the column pairs; deleting at least one column pair with foreign key association through the target large language model under the condition that foreign key conflict exists between the column pairs so as to eliminate the foreign key conflict; And the reconstruction module is configured to reconstruct the foreign key association for the plurality of target data tables based on all column pairs which have foreign key association and have no foreign key conflict currently under the condition that no foreign key conflict exists between the column pairs.

Description

Method and device for rebuilding association of external keys of database table Technical Field The application relates to the technical field of databases, in particular to a method and a device for rebuilding the association of external keys of a database table. Background The database is the basis of software application, and the integrated development of the system often involves the connection and circulation among database tables, however, under the actual application scene, the external key association of the database tables is often lost due to various reasons such as document deletion, code variation, library table change, personnel flow and the like, so that the integrated development is difficult to be carried out continuously with high efficiency. When a large database table is faced, the problem of inefficiency exists by manually identifying foreign key association, which wastes time and labor costs, so that a way of automatically reconstructing the foreign key association needs to be found. The related automatic foreign key association reconstruction scheme relies on manually defined heuristic rules, namely, the foreign key association recognition and reconstruction of the database table are performed based on manually defined rules, and as the data types in the database table are various and the foreign key association relationship is complex, the method relying on the heuristic rules cannot capture the semantics of the foreign key association, and the reconstruction accuracy is low. Moreover, since the manually defined rule is often based on a specific feature, the recognition effect is drastically reduced when applied to a database without the feature, and the scheme of manually defining the rule has a problem of low generalization. Particularly, as the size of the database is larger and larger, the external key association relationship of the data table involved in hundreds of data tables of the software system is more complex, which results in poorer reconstruction effect and is difficult to meet the requirement. Disclosure of Invention The embodiment of the application aims to provide a method and a device for reconstructing an external key association of a database table, so as to improve the accuracy and generalization of the external key association reconstruction of the database table. In order to achieve the above purpose, the technical scheme of the application is as follows: in a first aspect, an embodiment of the present application provides a method for rebuilding a table foreign key association of a database, where the method includes: combining the columns of the plurality of target data tables in pairs to generate column pairs, and constructing a first set based on all the column pairs; Pruning the first set, wherein the pruning comprises deleting column pairs of which the referenced columns are not contained in all target unique keys from the first set through a target large language model, wherein the target unique keys are candidate keys which are most likely to be referenced and are determined from all candidate keys corresponding to the target data table; Performing foreign key association detection on column pairs in a second set through the target large language model to obtain a detection result, wherein the detection result is that the column pairs have foreign key association or the column pairs do not have foreign key association; traversing all column pairs with foreign key association, and determining whether foreign key conflict exists between each column pair; deleting at least one column pair with foreign key association through the target large language model under the condition that foreign key conflict exists between the column pairs so as to eliminate the foreign key conflict; And reconstructing foreign key associations for the plurality of target data tables based on all column pairs currently having foreign key associations and having no foreign key conflicts in the case that foreign key conflicts do not exist between the column pairs. Optionally, deleting, from the first set, column pairs for which the referenced columns are not included in all target unique keys, by the target large language model, including: Acquiring the minimum unique column combination of each target data table, and determining each minimum unique column combination as a candidate key, wherein the minimum unique column combination is the combination of the minimum columns capable of uniquely identifying one row of data; Intercepting the foremost partial form data from each target data table; Constructing a corresponding first prompting word based on the minimum unique column combination corresponding to each target data table and the partial table data, wherein the first prompting word comprises the following analysis rules: Rule one, can confirm one row of data of the form uniquely, can be referred to; Rule II, the more front the position of the sequence number in the data table i