CN-114297238-B - Data query method, device and system based on distributed database system

Abstract

The disclosure provides a data query method, device and system based on a distributed database system. The method comprises: in response to a received data query request, sending a first query task to a first working device storing a first table, wherein the data query request is used to obtain data to be queried and the first query task instructs querying, in the first table, an association value corresponding to the data query request; receiving the association value fed back by the first working device; sending, according to the association value, a second query task to a second working device storing a second table, wherein the second query task instructs querying, in the second table, feedback data corresponding to the association value; receiving the feedback data sent by the second working device; and determining and outputting the data to be queried according to the feedback data. This reduces the amount of data scanned and improves data query efficiency.
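The two-step flow described in the abstract can be illustrated with a minimal in-memory sketch. The names `Worker`, `two_phase_query`, `first_predicate` and `join_key` are hypothetical and do not appear in the patent; real working devices would be remote nodes rather than Python objects, so this is only one possible reading of the flow, not the patented implementation.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


class Worker:
    """Hypothetical working device holding one partition of one table."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Scan only the local partition and return the matching rows.
        return [row for row in self.rows if predicate(row)]


def two_phase_query(
    first_workers: Iterable[Worker],
    second_workers: Iterable[Worker],
    first_predicate: Callable[[Row], bool],
    join_key: str,
) -> List[Row]:
    # First query task: collect association values (join-key values)
    # from the first table; the set removes duplicates across workers.
    association_values = set()
    for worker in first_workers:
        for row in worker.query(first_predicate):
            association_values.add(row[join_key])

    # Second query task: fetch only the rows of the second table whose
    # join-key value is one of the collected association values.
    feedback: List[Row] = []
    for worker in second_workers:
        feedback.extend(
            worker.query(lambda row: row[join_key] in association_values)
        )
    return feedback
```

In this sketch only the association values travel between the two phases, so no table is broadcast to the devices holding the other table.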

Inventors

ZHANG QIWEI

Assignees

北京百度网讯科技有限公司

Dates

Publication Date: 20260512
Application Date: 20211223

Claims (20)

1. A data query method based on a distributed database system, the method comprising: in response to a received data query request, determining a first working device storing a first table, and sending a first query task to the first working device, wherein the data query request is used to acquire data to be queried, the first query task instructs querying, in the first table, an association value corresponding to the data query request, and a data association relationship exists between the first table and a second table; receiving the association value fed back by the first working device and, if there are a plurality of first working devices, performing de-duplication and integration on the association values fed back by the first working devices to obtain an association value set; sending a second query task to a second working device storing the second table according to the association value or the association value set, wherein the second query task instructs the second working device to query, in the second table, feedback data corresponding to the association value or the association value set; and receiving the feedback data sent by the second working device and, if there are a plurality of second working devices, performing de-duplication and integration on the feedback data sent by each second working device to obtain summarized data, determining the summarized data as the data to be queried, and outputting the data to be queried; wherein sending the second query task to the second working device storing the second table according to the association value comprises: predicting, according to the association value, a predicate corresponding to the association value; and generating the second query task according to the association value and the predicate corresponding to the association value, and sending the second query task to the second working device; and wherein there are a plurality of association values, and predicting, according to the association values, the predicates corresponding to the association values comprises: predicting a commonality feature of each association value corresponding to data in the second table; and determining the predicates corresponding to the association values according to the commonality feature.
2. The method of claim 1, wherein sending a second query task to a second working device storing the second table according to the association value comprises: performing syntax analysis and semantic analysis on the data query request to obtain a processed data query request; generating execution logic information according to the processed data query request, wherein the execution logic information is information related to the execution logic for executing the data query; and generating a second query task comprising the association value and the execution logic information, and sending the second query task to the second working device.
3. The method of claim 1, wherein, in response to the received data query request, sending a first query task to a first working device storing a first table comprises: determining a join key and a predicate between the first table and the second table, generating the first query task according to the join key and the predicate, and sending the first query task to the first working device.
4. The method of claim 1, wherein the first table is a left table and the second table is a right table.
5. The method of claim 1, wherein the first table is a right table and the second table is a left table, the content of the right table being greater than or equal to the content of the left table.
6. A data query method for a distributed database system, comprising: receiving a first query task sent by a scheduling device, wherein the first query task is determined according to a data query request, the data query request is used to acquire data to be queried, the first query task instructs querying, in a first table, an association value corresponding to the data query request, and a data association relationship exists between the first table and a second table; and performing query processing on the first table in a working device according to the first query task to obtain the association value and, if the working device is one of a plurality of first working devices, feeding back the association value to the scheduling device so that the scheduling device performs de-duplication and integration on the association values fed back by the first working devices to obtain an association value set, wherein the association value or the association value set is used to generate a second query task, and the second query task is used to acquire and output the data to be queried; wherein the second query task is generated according to the association value and a predicate corresponding to the association value, and the predicate corresponding to the association value is predicted according to the association value; and wherein there are a plurality of association values, and the predicates corresponding to the association values are determined according to common characteristics of the association values corresponding to the data in the second table.
7. The method of claim 6, further comprising: receiving a second query task sent by the scheduling device; and querying a second table in the working device according to the second query task to obtain feedback data and sending the feedback data to the scheduling device, wherein the feedback data is used to determine the data to be queried.
8. The method of claim 7, wherein the second query task is generated according to the association value and execution logic information, the execution logic information is generated according to a processed data query request obtained by performing syntax analysis and semantic analysis on the data query request, and the execution logic information is information related to the execution logic for executing the data query.
9. The method of claim 6, wherein the first query task is generated according to a join key and a predicate determined between the first table and the second table.
10. The method of claim 6, wherein the first table is a left table and the second table is a right table.
11. The method of claim 6, wherein the first table is a right table and the second table is a left table, the content of the right table being greater than or equal to the content of the left table.
12. A data query apparatus based on a distributed database system, comprising: a first sending unit, configured to, in response to a received data query request, determine a first working device storing a first table and send a first query task to the first working device, wherein the data query request is used to acquire data to be queried, the first query task instructs querying, in the first table, an association value corresponding to the data query request, and a data association relationship exists between the first table and a second table; a first receiving unit, configured to receive the association value fed back by the first working device and, if there are a plurality of first working devices, perform de-duplication and integration on the association values fed back by the first working devices to obtain an association value set; a second sending unit, configured to send a second query task to a second working device storing the second table according to the association value or the association value set, wherein the second query task instructs the second working device to query, in the second table, feedback data corresponding to the association value or the association value set; a second receiving unit, configured to receive the feedback data sent by the second working device; a determining unit, configured to determine the data to be queried according to the feedback data; and an output unit, configured to output the data to be queried; wherein there are a plurality of second working devices, and the determining unit comprises: a summarizing subunit, configured to summarize the feedback data sent by each second working device to obtain summarized data; and a first determining subunit, configured to determine the summarized data as the data to be queried; wherein the second sending unit comprises: a prediction subunit, configured to predict, according to the association value, a predicate corresponding to the association value; a first generation subunit, configured to generate the second query task according to the association value and the predicate corresponding to the association value; and a first sending subunit, configured to send the second query task to the second working device; and wherein there are a plurality of association values, and the prediction subunit comprises: a prediction module, configured to predict a commonality feature of each association value corresponding to data in the second table; and a determining module, configured to determine the predicates corresponding to the association values according to the commonality feature.
13. The apparatus of claim 12, wherein the second sending unit comprises: a processing subunit, configured to perform syntax analysis and semantic analysis on the data query request to obtain a processed data query request; a second generation subunit, configured to generate execution logic information according to the processed data query request, wherein the execution logic information is information related to the execution logic for executing the data query; a third generation subunit, configured to generate a second query task comprising the association value and the execution logic information; and a second sending subunit, configured to send the second query task to the second working device.
14. The apparatus of claim 12, wherein the first sending unit comprises: a second determining subunit, configured to determine a join key and a predicate between the first table and the second table; a fourth generation subunit, configured to generate the first query task according to the join key and the predicate; and a third sending subunit, configured to send the first query task to the first working device.
15. The apparatus of claim 12, wherein the first table is a left table and the second table is a right table.
16. The apparatus of claim 12, wherein the first table is a right table and the second table is a left table, the content of the right table being greater than or equal to the content of the left table.
17. A data query apparatus for a distributed database system, comprising: a third receiving unit, configured to receive a first query task sent by a scheduling device, wherein the first query task is determined according to a data query request, the data query request is used to acquire data to be queried, the first query task instructs querying, in a first table, an association value corresponding to the data query request, and a data association relationship exists between the first table and a second table; a first query unit, configured to perform query processing on the first table in a working device according to the first query task to obtain the association value, wherein the association value is used to generate a second query task, and the second query task is used to acquire and output the data to be queried; and a feedback unit, configured to feed back the association value to the scheduling device; wherein the feedback unit is further configured to, if the working device is one of a plurality of first working devices, feed back the association value to the scheduling device so that the scheduling device performs de-duplication and integration on the association values fed back by the first working devices to obtain an association value set, the association value set being used to generate the second query task; wherein the second query task is generated according to the association value and a predicate corresponding to the association value, and the predicate corresponding to the association value is predicted according to the association value; and wherein there are a plurality of association values, and the predicates corresponding to the association values are determined according to common characteristics of the association values corresponding to the data in the second table.
18. The apparatus of claim 17, further comprising: a fourth receiving unit, configured to receive a second query task sent by the scheduling device; a second query unit, configured to perform query processing on a second table in the working device according to the second query task to obtain feedback data, wherein the feedback data is used to determine the data to be queried; and a third sending unit, configured to send the feedback data to the scheduling device.
19. The apparatus of claim 18, wherein the second query task is generated according to the association value and execution logic information, the execution logic information is generated according to a processed data query request obtained by performing syntax analysis and semantic analysis on the data query request, and the execution logic information is information related to the execution logic for executing the data query.
20. The apparatus of claim 17, wherein the first query task is generated according to a join key and a predicate determined between the first table and the second table.
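The claims state that a predicate is predicted from a commonality feature of the association values but do not say what that feature is. The sketch below is therefore an assumption, not the claimed method: it treats the shared value range of integer association values as the common feature and emits a SQL-style predicate string. The function name `predict_predicate` and the `max_in_list` threshold are hypothetical.

```python
from typing import Sequence


def predict_predicate(column: str, values: Sequence[int], max_in_list: int = 4) -> str:
    """Derive a SQL-style predicate from integer association values.

    Assumption: the "commonality feature" is either the exact value
    list (for a few values) or the range that all values share.
    """
    if not values:
        raise ValueError("at least one association value is required")
    unique = sorted(set(values))  # de-duplicate before deriving anything
    if len(unique) == 1:
        return f"{column} = {unique[0]}"
    if len(unique) <= max_in_list:
        return f"{column} IN ({', '.join(str(v) for v in unique)})"
    # Many values: fall back to the range they share. This range is a
    # superset of the values, so exact matching is still needed afterwards.
    return f"{column} BETWEEN {unique[0]} AND {unique[-1]}"
```

Under this assumption, a range predicate lets a working device skip partitions of the second table that lie entirely outside the range, while the association values themselves still decide which rows match.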

Description

Data query method, device and system based on distributed database system

Technical Field

The disclosure relates to the field of big data and cloud services in the technical field of artificial intelligence, and in particular to a data query method, device and system based on a distributed database system.

Background

In database technology, for example in a distributed database system, a table-join query (also referred to as a join query) is typically one of the most resource-consuming operations.

In a distributed database system, a table-join query usually involves two tables, a left table and a right table; the right table is usually queried first and the left table second. At present, table-join queries are usually performed with dynamic filtering (also referred to as dynamic partition pruning). With dynamic partition pruning, the right table is broadcast to the working devices storing the left table, and those working devices then query the left table to obtain the corresponding data.

However, because dynamic partition pruning requires this broadcast, it consumes considerable resources.

Disclosure of Invention

The disclosure provides a data query method, device and system based on a distributed database system for reducing resource consumption.
According to a first aspect of the present disclosure, there is provided a data query method based on a distributed database system, including: in response to a received data query request, sending a first query task to a first working device storing a first table, wherein the data query request is used to obtain data to be queried, the first query task instructs querying, in the first table, an association value corresponding to the data query request, and a data association relationship exists between the first table and a second table; receiving the association value fed back by the first working device, and sending a second query task to a second working device storing the second table according to the association value, wherein the second query task instructs querying, in the second table, feedback data corresponding to the association value; and receiving the feedback data sent by the second working device, and determining and outputting the data to be queried according to the feedback data.

According to a second aspect of the present disclosure, there is provided a data query method for a distributed database system, including: receiving a first query task sent by a scheduling device, wherein the first query task is determined according to a data query request, the data query request is used to acquire data to be queried, the first query task instructs querying, in a first table, an association value corresponding to the data query request, and a data association relationship exists between the first table and a second table; and performing query processing on the first table in a working device according to the first query task to obtain an association value and feeding the association value back to the scheduling device, wherein the association value is used to generate a second query task, and the second query task is used to acquire and output the data to be queried.
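The working-device side of the second aspect, together with the de-duplication and integration step that the claims assign to the scheduling device, might look like the following sketch. `FirstQueryTask`, `handle_first_query_task` and `merge_association_values` are hypothetical names, and the task is modelled as a join key plus a row filter purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class FirstQueryTask:
    """Hypothetical task message: which key to extract and which rows qualify."""

    join_key: str
    predicate: Callable[[Row], bool]


def handle_first_query_task(local_rows: List[Row], task: FirstQueryTask) -> List[Any]:
    # Working device: query the local first-table partition and keep
    # one copy of each join-key value, in first-seen order.
    seen = set()
    association_values: List[Any] = []
    for row in local_rows:
        value = row[task.join_key]
        if task.predicate(row) and value not in seen:
            seen.add(value)
            association_values.append(value)
    return association_values


def merge_association_values(per_worker: List[List[Any]]) -> List[Any]:
    # Scheduling device: de-duplicate and integrate the values fed back
    # by every first working device (assumes the values are sortable).
    return sorted({value for values in per_worker for value in values})
```

The merged list would then be used to build the second query task.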
According to a third aspect of the present disclosure, there is provided a data query device based on a distributed database system, including: a first sending unit, configured to, in response to a received data query request, send a first query task to a first working device storing a first table, wherein the data query request is used to obtain data to be queried, the first query task instructs querying, in the first table, an association value corresponding to the data query request, and a data association relationship exists between the first table and a second table; a first receiving unit, configured to receive the association value fed back by the first working device; a second sending unit, configured to send a second query task to a second working device storing the second table according to the association value, wherein the second query task instructs querying, in the second table, feedback data corresponding to the association value; a second receiving unit, configured to receive the feedback data sent by the second working device; a determining unit, configured to determine the data to be queried according to the feedback data; and an output unit, configured to output the data to be queried. According to a fourth aspect of the present disclosure, there is provided a data query apparatus for a distributed dat