CN-119759729-B - Automatic detection method, system, equipment and storage medium for data development code

CN119759729B

Abstract

The application provides an automatic detection method, system, device, and storage medium for data development code. The method comprises: obtaining the data development code to be detected; judging whether the code complies with a code review rule; if not, modifying the code; if so, establishing a data lineage relationship based on the code; performing a quality check on the code; if the quality check fails, locating and modifying the erroneous code based on the data lineage relationship; and if the quality check passes, the code passes detection. The application detects data development code quickly and automatically, improves the efficiency of finding erroneous code in it, and allows the lineage between fields and indicators to be inspected at the business level, so that business-level quality checks on the data development code can be carried out quickly.
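The control flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `detect` and `DetectionResult` and the four stage callables are hypothetical, and the sketch reports findings instead of modifying the code itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class DetectionResult:
    passed: bool
    stage: str  # "code_review", "quality_check", or "done"
    findings: List[str] = field(default_factory=list)


def detect(
    code: str,
    review: Callable[[str], List[str]],
    build_lineage: Callable[[str], Any],
    quality_check: Callable[[str], List[str]],
    locate: Callable[[Any, List[str]], List[str]],
) -> DetectionResult:
    # Step 1: code review. Any violation sends the code back for modification.
    violations = review(code)
    if violations:
        return DetectionResult(False, "code_review", violations)

    # Step 2: lineage is built only for code that passed the review.
    lineage = build_lineage(code)

    # Step 3: quality check. Failures are traced through the lineage.
    failures = quality_check(code)
    if failures:
        return DetectionResult(False, "quality_check", locate(lineage, failures))

    return DetectionResult(True, "done")


# Hypothetical stand-ins for the four stages, for illustration only.
result = detect(
    "insert into dwd_orders select order_id from ods_orders",
    review=lambda code: [] if code == code.lower() else ["keywords must be lowercase"],
    build_lineage=lambda code: {"dwd_orders.order_id": ["ods_orders.order_id"]},
    quality_check=lambda code: ["dwd_orders.order_id has nulls"],
    locate=lambda lineage, failures: lineage["dwd_orders.order_id"],
)
print(result)
```

Passing the stages in as callables keeps the ordering (review, then lineage, then quality check) separate from how each stage is actually implemented.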

Inventors

  • Xing Longqiang
  • Zou Guosheng
  • Wang Qi
  • Dong Ziyu
  • Chen Liang
  • Li Bo
  • Chen Peng

Assignees

  • 国药控股数字科技(上海)有限公司

Dates

Publication Date
2026-05-08
Application Date
2024-11-29

Claims (6)

  1. An automatic detection method for data development code, characterized in that the method comprises the following steps:
     acquiring data development code to be detected;
     judging whether the data development code to be detected complies with a code review rule; if not, modifying the data development code to be detected; if so, establishing a data lineage relationship based on the data development code to be detected; and
     performing a quality check on the data development code to be detected; if the quality check fails, locating and modifying the erroneous code based on the data lineage relationship; if the quality check passes, the data development code to be detected passes detection;
     wherein establishing the data lineage relationship based on the data development code to be detected comprises: determining a task type corresponding to the data development code to be detected; performing structural parsing on the data development code to be detected based on the task type to obtain a plurality of SQL statements of different categories; and establishing a field-level data lineage relationship based on the plurality of SQL statements of different categories;
     wherein establishing the field-level data lineage relationship comprises: establishing a first corresponding relation and a second corresponding relation based on the plurality of SQL statements of different categories, the first corresponding relation being the corresponding relation between a source table and a target table in the SQL statements, and the second corresponding relation being the corresponding relation between a source field and a target field in the SQL statements; and establishing the field-level data lineage relationship based on the first corresponding relation and the second corresponding relation;
     and wherein obtaining the plurality of SQL statements of different categories comprises:
     when the task type is determined to be a data lake ingestion type or an ETL type, performing Spark SQL code parsing on the data development code to be detected to obtain a plurality of SQL statements, and classifying the SQL statements based on their grammar structures and semantic information to obtain the plurality of SQL statements of different categories;
     when the task type is FineBI, performing SQL code parsing on the data development code to be detected to obtain a plurality of SQL statements, obtaining the grammar structures, the semantic information, and the functions used in the SQL statements, and classifying the SQL statements based on the grammar structures, the semantic information, and the functions used to obtain the plurality of SQL statements of different categories.
  2. The automatic detection method for data development code according to claim 1, wherein the method comprises: acquiring specification statements of a target type based on a data development specification, wherein the target type corresponds to the type of the data development code to be detected; and extracting the conditional variables, logic variables, and grammar structures in the specification statements of the target type to generate the code review rule corresponding to the type of the data development code to be detected.
  3. The automatic detection method for data development code according to claim 1, wherein performing the quality check on the data development code to be detected comprises: determining a quality check rule based on development requirements; acquiring a run result of the data development code to be detected; and judging whether the run result conforms to the quality check rule; if not, locating and modifying the erroneous code based on the data lineage relationship; if so, the quality check passes.
  4. An automatic detection system for data development code, the system comprising:
     an acquisition module, used for acquiring data development code to be detected;
     a code review module, used for judging whether the data development code to be detected complies with a code review rule; if not, modifying the data development code to be detected; if so, establishing a data lineage relationship based on the data development code to be detected; and
     a quality check module, used for performing a quality check on the data development code to be detected; if the quality check fails, locating and modifying the erroneous code based on the data lineage relationship; if the quality check passes, the data development code to be detected passes detection;
     wherein establishing the data lineage relationship based on the data development code to be detected comprises: determining a task type corresponding to the data development code to be detected; performing structural parsing on the data development code to be detected based on the task type to obtain a plurality of SQL statements of different categories; and establishing a field-level data lineage relationship based on the plurality of SQL statements of different categories;
     wherein establishing the field-level data lineage relationship comprises: establishing a first corresponding relation and a second corresponding relation based on the plurality of SQL statements of different categories, the first corresponding relation being the corresponding relation between a source table and a target table in the SQL statements, and the second corresponding relation being the corresponding relation between a source field and a target field in the SQL statements; and establishing the field-level data lineage relationship based on the first corresponding relation and the second corresponding relation;
     and wherein obtaining the plurality of SQL statements of different categories comprises:
     when the task type is determined to be a data lake ingestion type or an ETL type, performing Spark SQL code parsing on the data development code to be detected to obtain a plurality of SQL statements, and classifying the SQL statements based on their grammar structures and semantic information to obtain the plurality of SQL statements of different categories;
     when the task type is FineBI, performing SQL code parsing on the data development code to be detected to obtain a plurality of SQL statements, obtaining the grammar structures, the semantic information, and the functions used in the SQL statements, and classifying the SQL statements based on the grammar structures, the semantic information, and the functions used to obtain the plurality of SQL statements of different categories.
  5. An electronic device, characterized by comprising a processor and a memory; the memory is used for storing a computer program; the processor is configured to execute the computer program stored in the memory, so that the electronic device performs the automatic detection method for data development code according to any one of claims 1 to 3.
  6. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by an electronic device, implements the automatic detection method for data development code according to any one of claims 1 to 3.
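The first and second corresponding relations in claims 1 and 4 can be pictured as two lists of pairs, one at table level and one at field level. The sketch below is a minimal illustration under stated assumptions: the function names `build_field_lineage` and `trace_upstream`, the tuple representation, and the sample tables are all hypothetical and are not taken from the patent. It combines the two relations into a field-level lineage map and walks that map upstream from a field that failed a quality check, which is one plausible way to narrow down where the erroneous code may sit.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Field = Tuple[str, str]  # (table name, column name); a hypothetical representation


def build_field_lineage(
    table_map: List[Tuple[str, str]],
    field_map: List[Tuple[Field, Field]],
) -> Dict[Field, Set[Field]]:
    """Combine table-level and field-level pairs into target -> sources edges."""
    allowed = set(table_map)
    upstream: Dict[Field, Set[Field]] = defaultdict(set)
    for source, target in field_map:
        # A field pair is only accepted if its tables are linked at table level.
        if (source[0], target[0]) not in allowed:
            raise ValueError(f"no table-level relation for {source} -> {target}")
        upstream[target].add(source)
    return upstream


def trace_upstream(upstream: Dict[Field, Set[Field]], start: Field) -> Set[Field]:
    """Return every field that feeds `start`, directly or indirectly."""
    seen: Set[Field] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in upstream.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical three-layer example: ods -> dwd -> ads.
table_map = [("ods.orders", "dwd.orders"), ("dwd.orders", "ads.sales")]
field_map = [
    (("ods.orders", "amount"), ("dwd.orders", "amount")),
    (("dwd.orders", "amount"), ("ads.sales", "total_amount")),
]
upstream = build_field_lineage(table_map, field_map)
suspects = trace_upstream(upstream, ("ads.sales", "total_amount"))
print(sorted(suspects))
```

Storing the edges from target to source makes the upstream walk a plain graph traversal; the `seen` set also keeps the walk from looping if the mappings happen to contain a cycle.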

Description

Automatic detection method, system, equipment and storage medium for data development code

Technical Field

The application belongs to the technical field of big data, and in particular relates to an automatic detection method, system, device, and storage medium for data development code.

Background

With the development of big data technology, data development has become an important part of enterprise data processing. Some automated code inspection methods require no human involvement at any point in the inspection process, for example automated code inspection tools on the market such as SonarQube and Checkstyle, which are mainly used for code quality analysis in general-purpose programming languages. Existing automated code review tools are typically not designed for data development scenarios and therefore may not meet the requirements of specific business scenarios. For example, the specific requirements of data development for medical big data call for an additional manual code quality review based on the business scenario, which is time-consuming and laborious and is also prone to human error.

Disclosure of Invention

The application aims to provide an automatic detection method, system, device, and storage medium for data development code that can automatically locate the erroneous part of the data development code, allow the lineage between fields and indicators to be inspected at the business level, and quickly verify the business quality of the data development code.

In a first aspect, the application provides an automatic detection method for data development code, the method comprising: acquiring data development code to be detected; judging whether the data development code to be detected complies with a code review rule; if not, modifying the data development code to be detected; if so, establishing a data lineage relationship based on the data development code to be detected; and performing a quality check on the data development code to be detected; if the quality check fails, locating and modifying the erroneous code based on the data lineage relationship; if the quality check passes, the data development code to be detected passes detection.

In one implementation of the first aspect, specification statements of a target type are acquired based on a data development specification, wherein the target type corresponds to the type of the data development code to be detected; the conditional variables, logic variables, and grammar structures in the specification statements of the target type are extracted, and the code review rule corresponding to the type of the data development code to be detected is generated.

In one implementation of the first aspect, establishing the data lineage relationship based on the data development code to be detected comprises: determining a task type corresponding to the data development code to be detected; performing structural parsing on the data development code to be detected based on the task type to obtain a plurality of SQL statements of different categories; and establishing a field-level data lineage relationship based on the plurality of SQL statements of different categories.

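As a rough illustration of parsing code by task type into categorized SQL statements, the sketch below swaps in naive keyword matching for a real Spark SQL or FineBI parser. The category names, the `task_type` values, and the helper names are hypothetical, and the split on semicolons assumes no semicolons occur inside string literals or comments.

```python
import re
from typing import Dict, List

# Hypothetical categories keyed by the leading keyword of a statement.
CATEGORY_BY_KEYWORD = {
    "SELECT": "query", "WITH": "query",
    "INSERT": "write",
    "CREATE": "ddl", "DROP": "ddl", "ALTER": "ddl",
}
# Keywords that may precede "(" without being function calls (heuristic list).
NON_FUNCTIONS = {"IN", "EXISTS", "VALUES", "AND", "OR", "NOT", "ON", "AS", "OVER"}


def split_statements(script: str) -> List[str]:
    # Naive split: assumes ";" never appears inside literals or comments.
    return [part.strip() for part in script.split(";") if part.strip()]


def classify(statement: str) -> str:
    first_word = statement.split(None, 1)[0].upper()
    return CATEGORY_BY_KEYWORD.get(first_word, "other")


def functions_used(statement: str) -> List[str]:
    names = {m.upper() for m in re.findall(r"\b([A-Za-z_]\w*)\s*\(", statement)}
    return sorted(names - NON_FUNCTIONS)


def analyze(script: str, task_type: str) -> List[Dict[str, object]]:
    results = []
    for statement in split_statements(script):
        entry: Dict[str, object] = {"sql": statement, "category": classify(statement)}
        # Only the FineBI branch also records which functions a statement uses.
        if task_type == "finebi":
            entry["functions"] = functions_used(statement)
        results.append(entry)
    return results


etl = analyze("CREATE TABLE dwd_t (a INT); INSERT INTO dwd_t SELECT a FROM ods_t;", "etl")
bi = analyze("SELECT SUM(x), COALESCE(y, 0) FROM s WHERE z IN (1, 2)", "finebi")
print([e["category"] for e in etl], bi[0]["functions"])
```

A production version would rely on a proper SQL parser for the grammar structure and semantic information; the keyword table here only stands in for that step.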
In one implementation of the first aspect, obtaining the plurality of SQL statements of different categories comprises: when the task type is determined to be a data lake ingestion type or an ETL type, performing Spark SQL code parsing on the data development code to be detected to obtain a plurality of SQL statements; acquiring the grammar structures and semantic information of the SQL statements; and classifying the SQL statements based on the grammar structures and the semantic information to obtain the plurality of SQL statements of different categories.

In one implementation of the first aspect, obtaining the plurality of SQL statements of different categories comprises: when the task type is FineBI, performing SQL code parsing on the data development code to be detected to obtain a plurality of SQL statements; acquiring the grammar structures, the semantic information, and the functions used in the SQL statements; and classifying the SQL statements based on the grammar structures, the semantic information, and the functions used to obtain the plurality of SQL statements of different categories.

In one implementation of the first aspect, establishing the field-level data lineage relationship based on the plurality of SQL statements of different categories comprises: establishing a first corresponding relation and a second corresponding relation based on the plurality of SQL statements of different categories, wherein the first corresponding relation is the corresponding relation between a source table and a target table in the SQL statements, and the second corresponding relatio