
CN-121979875-A - Data cleaning method, device, equipment, storage medium and program product


Abstract

The application provides a data cleaning method, device, equipment, storage medium and program product, applicable to the technical fields of big data and financial technology. In response to receiving a plurality of candidate database requests, the method screens the candidate database requests to obtain a target database request that matches batch job features, where the batch job features indicate the running conditions of the database requests to be screened. It then screens a plurality of candidate temporary tables based on the target database request and a global view to obtain a target temporary table, where the global view indicates the dependency relationships among the candidate temporary tables. Finally, in response to a completion instruction for the target database request, it executes a cleaning operation on the target temporary table.
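The three steps in the abstract can be pictured as an event-driven loop. The following is a minimal Python sketch under stated assumptions, not the applicant's implementation:

- `DbRequest`, `TempTable`, `TempTableCleaner` and their fields are hypothetical names.
- The batch-job test and the cleaning action are passed in as plain callables.
- The dependency-based search over the global view is simplified to a request-identifier lookup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class DbRequest:
    # Hypothetical request fields; the patent does not define a request schema.
    request_id: str
    source: str
    app_id: str


@dataclass
class TempTable:
    name: str
    request_id: str  # identifier of the request that created the table


class TempTableCleaner:
    """Hypothetical sketch: match batch requests, pick their temporary tables,
    and clean those tables when the request completes."""

    def __init__(self,
                 is_batch_job: Callable[[DbRequest], bool],
                 global_view: Iterable[TempTable],
                 clean: Callable[[str], None]) -> None:
        self.is_batch_job = is_batch_job
        self.global_view = list(global_view)
        self.clean = clean
        # request_id -> names of the temporary tables to clean on completion
        self.targets: Dict[str, List[str]] = {}

    def on_requests(self, requests: Iterable[DbRequest]) -> None:
        for req in requests:
            # Step 1: keep only requests that match the batch job features.
            if not self.is_batch_job(req):
                continue
            # Step 2: select the temporary tables recorded for this request
            # (simplified: no dependency traversal).
            self.targets[req.request_id] = [
                t.name for t in self.global_view if t.request_id == req.request_id
            ]

    def on_completed(self, request_id: str) -> None:
        # Step 3: on the completion instruction, clean each target table once.
        for name in self.targets.pop(request_id, []):
            self.clean(name)


# Usage: only the batch request's table is cleaned.
cleaned: List[str] = []
cleaner = TempTableCleaner(
    is_batch_job=lambda r: r.source == "batch",
    global_view=[TempTable("tmp_a", "r1"), TempTable("tmp_b", "r2")],
    clean=cleaned.append,
)
cleaner.on_requests([DbRequest("r1", "batch", "app1"),
                     DbRequest("r2", "online", "app2")])
cleaner.on_completed("r1")
cleaner.on_completed("r2")
print(cleaned)  # ['tmp_a']
```

Popping the entry in `on_completed` means each request's table list is cleaned at most once, even if a completion instruction arrives twice.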

Inventors

  • ZHU JIANG

Assignees

  • Industrial and Commercial Bank of China Limited (中国工商银行股份有限公司)

Dates

Publication Date
2026-05-05
Application Date
2026-01-23

Claims (11)

  1. A data cleaning method, comprising: in response to receiving a plurality of candidate database requests, screening the plurality of candidate database requests to obtain a target database request that matches batch job features, wherein the batch job features indicate the running conditions of the database requests to be screened; screening a plurality of candidate temporary tables based on the target database request and a global view to obtain a target temporary table, wherein the global view indicates the dependency relationships among the candidate temporary tables; and in response to receiving a completion instruction for the target database request, executing a cleaning operation on the target temporary table.
  2. The method according to claim 1, further comprising: in response to a transaction-state trigger instruction indicated by a query request, extracting a calculation result from a temporary table that matches the query request and storing the calculation result in an application layer; and in response to the application layer receiving the calculation result, executing a clear-table command on the temporary table and rolling back the transaction indicated by the query request.
  3. The method according to claim 1, wherein screening the plurality of candidate database requests to obtain a target database request that matches the batch job features comprises: extracting keywords from the plurality of candidate database requests to obtain a plurality of candidate job features, wherein the candidate job features comprise at least one of a source, an application identifier, a duration, and a database mode; screening the plurality of candidate job features based on the running conditions to obtain a target job feature, wherein the running conditions comprise at least one of a source matching condition, an application identifier matching condition, a duration matching condition, and a database mode matching condition; and taking the candidate database request corresponding to the target job feature as the target database request.
  4. The method according to claim 3, wherein extracting keywords from the plurality of candidate database requests to obtain a plurality of candidate job features comprises: performing word segmentation on each of the candidate database requests based on a preset service dictionary to obtain a plurality of groups of vocabulary entries, wherein the service dictionary provides word segmentation references for the database field; and screening the plurality of groups of vocabulary entries based on a plurality of preset recognition strategies to obtain the plurality of candidate job features, wherein each recognition strategy has a matched feature type, and the recognition strategies indicate the judgment conditions corresponding to the respective feature types.
  5. The method according to claim 1, wherein screening based on the target database request and the global view to obtain the target temporary table comprises: screening in the global view based on the request identifier indicated by the target database request to obtain a plurality of candidate temporary tables, wherein the global view comprises a plurality of temporary tables, each having a matched request identifier; and searching among the temporary tables based on the dependency relationships of the candidate temporary tables to obtain the target temporary table.
  6. The method according to claim 5, wherein the global view is generated by taking the temporary tables as nodes, the basic information of each temporary table as node attributes, and the dependency relationships among the temporary tables as edges; wherein the basic information of a temporary table includes at least one of its creation time, request identifier, table name, and table type, and the dependency relationships include at least one of a direct dependency, an indirect dependency, and a structural dependency.
  7. The method according to claim 1, wherein the cleaning operation comprises clean-and-analyze, full cleaning, standard cleaning, automatic cleaning, or cleaning of a specified object.
  8. A data cleaning device, comprising: a matching module, configured to, in response to receiving a plurality of candidate database requests, screen the plurality of candidate database requests to obtain a target database request that matches batch job features, wherein the batch job features indicate the running conditions of the database requests to be screened; a screening module, configured to screen a plurality of candidate temporary tables based on the target database request and a global view to obtain a target temporary table, wherein the global view indicates the dependency relationships among the candidate temporary tables; and a cleaning module, configured to execute a cleaning operation on the target temporary table in response to receiving a completion instruction for the target database request.
  9. An electronic device, comprising: one or more processors; and a memory for storing one or more computer programs, characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program or instructions which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 7.
  11. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
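Claims 5 and 6 describe the global view as a graph. The sketch below models it in Python under assumptions the claims leave open:

- `TableInfo` and `GlobalView` are hypothetical names.
- Edges point from a table to the table it depends on.
- "Searching based on the dependency relationships" is read as a breadth-first expansion from the candidate tables along those edges. This is one possible interpretation, not the applicant's stated algorithm.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TableInfo:
    # Node attributes listed in claim 6.
    name: str
    request_id: str
    created_at: datetime
    table_type: str


class GlobalView:
    """Hypothetical graph: temporary tables are nodes, dependencies are edges."""

    def __init__(self) -> None:
        self.nodes: Dict[str, TableInfo] = {}
        # table name -> [(name of the table it depends on, dependency kind)]
        self.edges: Dict[str, List[Tuple[str, str]]] = {}

    def add_table(self, info: TableInfo) -> None:
        self.nodes[info.name] = info
        self.edges.setdefault(info.name, [])

    def add_dependency(self, table: str, depends_on: str, kind: str) -> None:
        # kind: "direct", "indirect", or "structural" (claim 6)
        self.edges.setdefault(table, []).append((depends_on, kind))

    def candidates(self, request_id: str) -> List[str]:
        # Claim 5, first step: screen nodes by request identifier.
        return [n for n, info in self.nodes.items() if info.request_id == request_id]

    def targets(self, request_id: str) -> List[str]:
        # Claim 5, second step (assumed reading): breadth-first expansion
        # from the candidates along dependency edges, in discovery order.
        order: List[str] = []
        seen = set()
        queue = deque(self.candidates(request_id))
        while queue:
            name = queue.popleft()
            if name in seen:
                continue
            seen.add(name)
            order.append(name)
            for dep, _kind in self.edges.get(name, []):
                if dep not in seen:
                    queue.append(dep)
        return order


t0 = datetime(2026, 1, 23, 9, 0)
view = GlobalView()
view.add_table(TableInfo("tmp_orders", "r1", t0, "local"))
view.add_table(TableInfo("tmp_summary", "r1", t0, "local"))
view.add_table(TableInfo("tmp_rates", "r0", t0, "global"))
view.add_dependency("tmp_summary", "tmp_orders", "direct")
view.add_dependency("tmp_orders", "tmp_rates", "structural")
print(view.targets("r1"))  # ['tmp_orders', 'tmp_summary', 'tmp_rates']
```

The expansion follows edges regardless of which request owns a table. As a result, `tmp_rates` (created by request `r0`) is pulled in. A real cleaner would presumably first check that no other running request still uses such a table.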

Description

Technical Field

The present application relates to the fields of big data and financial technology, and more particularly to a data cleaning method, apparatus, device, storage medium, and program product.

Background

In a database, frequent data operations generate a large number of dead tuples: rows that have been marked as deleted or obsolete but still occupy storage space. These dead tuples cause the table space to keep expanding, increase the cost of data scans, and degrade query performance.

In the related art, an automatic cleaning mechanism uses a background thread to scan database tables periodically and count their dead tuples. Once a preset threshold condition is met, it automatically runs a cleaning operation and reclaims the space the dead tuples occupy. However, a trigger based on a preset threshold fits temporary tables poorly: a temporary table starts with zero rows and grows in a predictable pattern, but its final size is uncertain. As a result, the conventional automatic cleaning mechanism cannot promptly and accurately clean up the dead tuples a temporary table has generated when the table reaches the end of its use period.

Disclosure of Invention

In view of the foregoing, embodiments of the present application provide a data cleaning method, apparatus, device, storage medium, and program product.
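The application does not name a particular database, but PostgreSQL's autovacuum daemon is a well-known threshold-triggered cleaning mechanism of the kind described above. Per its documentation, a table is vacuumed when its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`, with defaults of 50 and 0.2. The same documentation notes that autovacuum cannot access temporary tables at all, so those must be cleaned by explicit commands in the owning session. The Python sketch below uses the formula only as an illustrative analogue; the function name is hypothetical.

```python
def needs_vacuum(dead_tuples: int, reltuples: float,
                 base_threshold: int = 50, scale_factor: float = 0.2) -> bool:
    """Threshold trigger modeled on PostgreSQL's documented autovacuum rule:
    vacuum when dead tuples exceed base_threshold + scale_factor * reltuples."""
    return dead_tuples > base_threshold + scale_factor * reltuples


# A small temporary table: 40 dead tuples never reach the fixed base threshold.
print(needs_vacuum(dead_tuples=40, reltuples=0))                # False
# A large one: 150,000 dead tuples stay under 50 + 0.2 * 1,000,000 = 200,050.
print(needs_vacuum(dead_tuples=150_000, reltuples=1_000_000))   # False
```

In both cases the decision depends on tuple counts at scan time, not on when the table's user has finished with it. That is the gap the application targets by cleaning on the request's completion instruction instead.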
According to a first aspect of the application, a data cleaning method is provided. The method comprises three steps:

1. In response to receiving a plurality of candidate database requests, screening the candidate database requests to obtain a target database request that matches batch job features, where the batch job features indicate the running conditions of the database requests to be screened.
2. Screening a plurality of candidate temporary tables based on the target database request and a global view to obtain a target temporary table, where the global view indicates the dependency relationships among the candidate temporary tables.
3. In response to a completion instruction for the target database request, executing a cleaning operation on the target temporary table.

According to an embodiment of the application, the method further comprises the following. In response to a transaction-state trigger instruction indicated by a query request, the calculation result is extracted from the temporary table that matches the query request and stored in the application layer. Once the application layer has received the calculation result, a clear-table command is executed on the temporary table and the transaction indicated by the query request is rolled back.
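The transaction-state path described above (extract the result, hand it to the application layer, clear the temporary table, roll back) can be illustrated with Python's built-in `sqlite3` module. This is a sketch under these assumptions:

- SQLite stands in for whatever database the application targets.
- The function and table names are hypothetical.
- A returned Python list plays the role of the application layer.
- `DELETE FROM` without a `WHERE` clause substitutes for a clear-table command, because SQLite has no `TRUNCATE` statement.

```python
import sqlite3
from typing import List, Tuple


def query_via_temp_table(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Hypothetical query request that computes its result in a temp table."""
    conn.execute("BEGIN")
    conn.execute("CREATE TEMP TABLE tmp_calc (account TEXT, total INTEGER)")
    conn.execute(
        "INSERT INTO tmp_calc SELECT account, SUM(amount) FROM txns GROUP BY account"
    )
    # Extract the calculation result; the returned list stands in for the
    # application layer.
    result = conn.execute(
        "SELECT account, total FROM tmp_calc ORDER BY account"
    ).fetchall()
    # Clear the temporary table, then roll back the whole transaction so the
    # table creation and its rows are undone as well.
    conn.execute("DELETE FROM tmp_calc")
    conn.execute("ROLLBACK")
    return result


# isolation_level=None disables the module's implicit BEGINs, so the explicit
# BEGIN/ROLLBACK above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE txns (account TEXT, amount INTEGER)")
conn.executemany("INSERT INTO txns VALUES (?, ?)", [("A", 10), ("A", 5), ("B", 7)])

rows = query_via_temp_table(conn)
print(rows)  # [('A', 15), ('B', 7)]
leftover = conn.execute(
    "SELECT COUNT(*) FROM sqlite_temp_master WHERE name = 'tmp_calc'"
).fetchone()[0]
print(leftover)  # 0: the temp table did not survive the rollback
```

The result survives because it was copied out before the rollback, while the temporary table and its rows do not.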
According to an embodiment of the application, screening the plurality of candidate database requests to obtain a target database request that matches the batch job features comprises three steps. First, keywords are extracted from the candidate database requests to obtain a plurality of candidate job features, which comprise at least one of a source, an application identifier, a duration, and a database mode. Second, the candidate job features are screened based on the running conditions to obtain the target job features, where the running conditions comprise at least one of a source matching condition, an application identifier matching condition, a duration matching condition, and a database mode matching condition. Third, the candidate database request corresponding to the target job features is taken as the target database request.

According to an embodiment of the application, extracting keywords from the candidate database requests to obtain the candidate job features comprises two steps. First, word segmentation is performed on each candidate database request based on a preset service dictionary to obtain a plurality of groups of vocabulary entries, where the service dictionary provides word segmentation references for the database field. Second, each group of vocabulary entries is screened based on a plurality of preset recognition strategies to obtain the candidate job features, where each recognition strategy has a matched feature type and the recognition strategies indicate the judgment conditions corresponding to those feature types.
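The keyword-extraction and matching steps above can be sketched as follows. The application does not specify a request format, dictionary contents, or strategy rules, so everything concrete here is a hypothetical assumption:

- The request text is a whitespace-separated description.
- The service dictionary holds two-word terms.
- Segmentation uses forward maximum matching, a standard dictionary-based technique chosen here for illustration.
- Each recognition strategy reads the entry that follows a marker term.
- The running conditions are simple predicates that must all hold.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical service dictionary of multi-word domain terms.
SERVICE_DICTIONARY = {("batch", "scheduler"), ("application", "name"),
                      ("search", "path"), ("run", "time")}
MAX_TERM_WORDS = 2

# Hypothetical recognition strategies: feature type -> (marker entry, converter).
STRATEGIES: Dict[str, Tuple[str, Callable[[str], object]]] = {
    "source": ("from", str),
    "app_id": ("application name", str),
    "db_mode": ("search path", str),
    "duration": ("run time", int),
}

# Hypothetical running conditions; every listed condition must hold.
RUNNING_CONDITIONS: Dict[str, Callable[[object], bool]] = {
    "source": lambda v: v == "batch scheduler",
    "app_id": lambda v: str(v).startswith("EOD_"),
    "duration": lambda v: int(v) >= 3600,
}


def segment(text: str) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary term."""
    words = text.split()
    entries: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if n == 1 or phrase in SERVICE_DICTIONARY:
                entries.append(" ".join(phrase))
                i += n
                break
    return entries


def extract_features(entries: List[str]) -> Dict[str, object]:
    """Apply each recognition strategy: the value is the entry after its marker."""
    features: Dict[str, object] = {}
    for feature, (marker, convert) in STRATEGIES.items():
        if marker in entries:
            pos = entries.index(marker)
            if pos + 1 < len(entries):
                features[feature] = convert(entries[pos + 1])
    return features


def is_batch_job(text: str) -> bool:
    features = extract_features(segment(text))
    return all(name in features and check(features[name])
               for name, check in RUNNING_CONDITIONS.items())


batch = "from batch scheduler application name EOD_SETTLE search path fin run time 7200"
online = "from web portal application name QRY_BAL run time 2"
print(segment(batch))
# ['from', 'batch scheduler', 'application name', 'EOD_SETTLE',
#  'search path', 'fin', 'run time', '7200']
print(is_batch_job(batch), is_batch_job(online))  # True False
```

A production service dictionary would be far larger, and the application does not say which segmentation algorithm it uses. The strategy and condition tables only show how feature types map to judgment conditions.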
According to an embodiment of the application, screening in the global view based on the target database request to obtain the target temporary table comprises screening in the global view based on the request identifier indicated by the target database request to obtain a plurality of candidate temporary tables, the global v