CN-122018967-A - Data analysis method, device, computer equipment and storage medium

Abstract

The embodiments of the application disclose a data analysis method, a data analysis device, computer equipment, and a storage medium. The method comprises: obtaining the corresponding header file from the kernel source code file of each version; obtaining a plurality of lines of data to be parsed from the header file, together with an annotation variable used for data annotation identification and a structure variable used for data structure identification; parsing the lines of data to be parsed line by line according to the annotation variable and the structure variable to obtain a plurality of groups of structure data under each version of the kernel source code file and the structure data name of each group; associating each group of structure data under each version of the kernel source code file, the corresponding structure data name, and the version name of the corresponding kernel source code file to form a second mapping relation; and, according to the target version name and the target structure data name of the structure data to be analyzed, finding the corresponding groups of target data to be analyzed under the second mapping relation. This improves the efficiency of analyzing differences between the data.

Inventors

ZHOU FEIZHOU
TU KANG

Assignees

Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技（深圳）有限公司)

Dates

Publication Date: 2026-05-12
Application Date: 2024-11-08

Claims (14)

1. A data analysis method, comprising: acquiring kernel source code files of multiple versions, and acquiring a corresponding header file from the kernel source code file of each version; acquiring a plurality of lines of data to be parsed from the header file, and acquiring, for each line of data to be parsed, an annotation variable used for data annotation identification and a structure variable used for data structure identification; parsing the plurality of lines of data to be parsed line by line according to the annotation variable and the structure variable to obtain a plurality of groups of structure data under each version of the kernel source code file and a structure data name of each group of structure data; associating each group of structure data under each version of the kernel source code file with the corresponding structure data name to form a first mapping relation, and associating the first mapping relation with the version name of the corresponding version of the kernel source code file to form a second mapping relation; and acquiring a target version name and a target structure data name of structure data to be analyzed, and searching, under the second mapping relation and according to the target version name and the target structure data name, for the corresponding plurality of groups of target data to be analyzed, for analysis.
2. The data analysis method according to claim 1, wherein the acquiring a plurality of lines of data to be parsed from the header file comprises: matching target characters in the data of the header file according to a first regular expression; and performing character replacement on the target characters according to a preset character to obtain the plurality of lines of data to be parsed.
3. The data analysis method according to claim 1, wherein the parsing the plurality of lines of data to be parsed line by line according to the annotation variable and the structure variable to obtain a plurality of groups of structure data under each version of the kernel source code file and a structure data name of each group of structure data comprises: determining a first state value of the annotation variable of a current line of data to be parsed among the plurality of lines of data to be parsed; when the first state value of the annotation variable is a first preset value, filtering unstructured data out of the current line of data to be parsed to obtain preprocessed data to be parsed; parsing the preprocessed data to be parsed according to the structure variable to obtain structure data and the structure data name corresponding to the structure data, grouping the structure data, and determining the next line of data to be parsed as the current line of data to be parsed, until every line of data to be parsed has been traversed, to obtain the plurality of groups of structure data under each version of the kernel source code file and the structure data name of each group of structure data; and when the first state value of the annotation variable is a second preset value and the current line of data to be parsed does not contain an annotation ending symbol, skipping the current line of data to be parsed and determining the next line of data to be parsed as the current line of data to be parsed, until every line of data to be parsed has been traversed, to obtain the plurality of groups of structure data under each version of the kernel source code file and the structure data name of each group of structure data.
4. The data analysis method according to claim 3, further comprising, after the determining a first state value of the annotation variable of a current line of data to be parsed: when the first state value of the annotation variable is the second preset value and the current line of data to be parsed contains the annotation ending symbol, updating the first state value to the first preset value, and executing the step of filtering the unstructured data out of the current line of data to be parsed.
5. The data analysis method according to claim 3, wherein the parsing the preprocessed data to be parsed according to the structure variable to obtain structure data and the structure data name corresponding to the structure data, and grouping the structure data, comprises: determining a second state value of the structure variable of the preprocessed data to be parsed; when the second state value is zero and the preprocessed data to be parsed matches a second regular expression corresponding to a preset data structure, determining the preprocessed data to be parsed as structure data, acquiring the structure data name of the structure data, and grouping the structure data according to the structure data name; and incrementing the second state value to obtain an updated second state value, and determining the updated second state value as the second state value of the structure variable of the next line of data to be parsed.
6. The data analysis method according to claim 5, wherein the parsing the preprocessed data to be parsed according to the structure variable to obtain structure data and the structure data name corresponding to the structure data, and grouping the structure data, further comprises: when the second state value is greater than zero and the preprocessed data to be parsed matches the second regular expression corresponding to the preset data structure, determining the preprocessed data to be parsed as structure data, determining the structure data name corresponding to the previous line of data to be parsed as the structure data name of the structure data, and grouping the structure data according to the structure data name; when the preprocessed data to be parsed contains a start locator, incrementing the second state value to obtain an updated second state value, and determining the updated second state value as the second state value of the structure variable of the next line of data to be parsed; and when the preprocessed data to be parsed contains an end locator and does not contain the start locator, decrementing the second state value to obtain an updated second state value, and determining the updated second state value as the second state value of the structure variable of the next line of data to be parsed.
7. The data analysis method according to claim 3, wherein before the filtering unstructured data out of the current line of data to be parsed to obtain preprocessed data to be parsed, the method further comprises: when a macro definition character exists in the current line of data to be parsed, discarding the current line of data to be parsed, and determining the next line of data to be parsed as the current line of data to be parsed; and wherein the filtering unstructured data out of the current line of data to be parsed to obtain preprocessed data to be parsed comprises: when no macro definition character exists in the current line of data to be parsed, acquiring an annotation identifier corresponding to the current line of data to be parsed; and deleting the annotation data corresponding to the current line of data to be parsed according to the annotation identifier, and determining the data remaining after the annotation data is deleted as the preprocessed data to be parsed.
8. The data analysis method according to claim 7, wherein the deleting the annotation data corresponding to the current line of data to be parsed according to the annotation identifier comprises: when the annotation identifier is a first-type annotation identifier, deleting the annotation identifier and the annotation data following the annotation identifier in the current line of data to be parsed; and when the annotation identifier is a second-type annotation identifier and comprises both an annotation start identifier and an annotation end identifier, deleting the annotation start identifier, the annotation end identifier, and the annotation data between them in the current line of data to be parsed.
9. The data analysis method according to claim 8, wherein the deleting the annotation data corresponding to the current line of data to be parsed according to the annotation identifier further comprises: when the annotation identifier is a second-type annotation identifier and contains the annotation start identifier but not the annotation end identifier, deleting the annotation start identifier and the annotation data following it in the current line of data to be parsed; and updating the first state value to the second preset value to obtain an updated first state value, and determining the updated first state value as the first state value of the annotation variable of the next line of data to be parsed.
10. The data analysis method according to claim 1, wherein the searching, under the second mapping relation and according to the target version name and the target structure data name, for the corresponding plurality of groups of target data to be analyzed, for analysis, comprises: searching, according to the target version name and the target structure data name, for a plurality of groups of target data to be analyzed that are stored in a database under the second mapping relation; deleting space characters from each group of target data to be analyzed to generate preprocessed data to be analyzed, and determining a check value of each group of preprocessed data to be analyzed; generating a data analysis table according to the target data to be analyzed, the target version name, and the target structure data name, and generating corresponding color marks in the data analysis table according to the similarity between the check values corresponding to the groups of target data to be analyzed; and analyzing the corresponding target data to be analyzed according to the color marks and the data analysis table.
11. A data analysis device, comprising: a first acquisition module, configured to acquire kernel source code files of multiple versions and acquire a corresponding header file from the kernel source code file of each version; a second acquisition module, configured to acquire a plurality of lines of data to be parsed from the header file, and to acquire, for each line of data to be parsed, an annotation variable used for data annotation identification and a structure variable used for data structure identification; a parsing module, configured to parse the plurality of lines of data to be parsed line by line according to the annotation variable and the structure variable to obtain a plurality of groups of structure data under each version of the kernel source code file and a structure data name of each group of structure data; an association module, configured to associate each group of structure data under each version of the kernel source code file with the corresponding structure data name to form a first mapping relation, and to associate the first mapping relation with the version name of the corresponding version of the kernel source code file to form a second mapping relation; and an analysis module, configured to acquire a target version name and a target structure data name of structure data to be analyzed, and to search, under the second mapping relation and according to the target version name and the target structure data name, for the corresponding plurality of groups of target data to be analyzed, for analysis.
12. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the data analysis method of any one of claims 1 to 10.
13. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the data analysis method according to any of claims 1 to 10 when executing the computer program.
14. A computer program product comprising a computer program or instructions which, when executed by a processor, implements the data analysis method of any one of claims 1 to 10.
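
The line-by-line parsing described in claims 3 to 9 can be pictured as a small two-variable state machine. The following Python sketch is illustrative only and is not the applicant's implementation. It assumes that "annotation" refers to C source comments, that the first-type and second-type annotation identifiers are `//` and `/* ... */`, that the macro definition character is `#`, and that the start and end locators are `{` and `}`. The names `STRUCT_START`, `strip_comments`, and `parse_structs` are hypothetical, and the pattern standing in for the second regular expression only recognizes struct definitions whose opening brace is on the same line. Two simplifications are made: the macro check runs after comment stripping so the comment state stays correct, and the structure variable is updated by counting braces rather than by a single increment or decrement per line.

```python
import re

# Hypothetical stand-in for the "second regular expression": a struct
# definition whose opening brace sits on the same line as its name.
STRUCT_START = re.compile(r"^\s*struct\s+(\w+)\s*\{")


def strip_comments(line, in_comment):
    """Remove comment text from one line; return (code, in_comment)."""
    out = []
    i = 0
    while i < len(line):
        if in_comment:
            end = line.find("*/", i)
            if end == -1:
                return "".join(out), True    # comment continues on next line
            i = end + 2
            in_comment = False
        elif line.startswith("//", i):
            break                            # drop the rest of the line
        elif line.startswith("/*", i):
            in_comment = True
            i += 2
        else:
            out.append(line[i])
            i += 1
    return "".join(out), in_comment


def parse_structs(lines):
    """Return {struct name: [cleaned source lines]} for one header file."""
    groups = {}
    in_comment = False   # annotation variable (first state value)
    depth = 0            # structure variable (second state value)
    current = None       # name of the struct currently being collected
    for raw in lines:
        code, in_comment = strip_comments(raw.rstrip("\n"), in_comment)
        code = code.strip()
        if not code or code.startswith("#"):
            continue                         # empty line or macro line
        if depth == 0:
            match = STRUCT_START.match(code)
            if match is None:
                continue                     # not structure data
            current = match.group(1)
            groups[current] = [code]
            depth = code.count("{") - code.count("}")
        else:
            groups[current].append(code)     # belongs to the open struct
            depth += code.count("{") - code.count("}")
    return groups


# Made-up header content used only to exercise the sketch.
header = """\
/* task descriptor
 * struct fake { int x; };
 */
#define MAX 4
struct task {        // process description
    int pid;         /* id */
    struct { int a; } inner;
    /* multi
       line */ long state;
};
int unrelated;
struct file {
    char *name;
};
""".splitlines()

groups = parse_structs(header)
```

In this sample, the struct mentioned inside the block comment and the `#define` line are ignored, and `groups` holds one entry per top-level struct (`task` and `file`), each with its comment-free lines. String literals containing comment markers and multi-line macros with backslash continuations are not handled.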

Description

Data analysis method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular to a data analysis method, a data analysis device, a computer device, and a storage medium.

Background

With the development of computer technology, many open-source operating systems, such as the Linux operating system, have emerged. These operating systems are updated iteratively by continuously updating their kernel source code. The kernel source code of an operating system contains structure data, which is built from related data structures and is used to implement functions such as process description, address description, and file description. The structure data differs between versions of the kernel source code. To carry out software development or system development for an operating system, the differences between the structure data in different versions of the kernel source code need to be analyzed, and subsequent development is then guided by those differences.

In the related art, the history change record of the structure data is searched through a distributed version control system. However, the history change record only reveals the differences between files of different versions; it cannot determine the differences between the structure data under different versions of the kernel source code. Those differences then have to be determined manually, so the efficiency of difference analysis between structure data is low.

Disclosure of Invention

The embodiments of the application provide a data analysis method, a data analysis device, computer equipment, and a storage medium, which can improve the efficiency of difference analysis between the structure data under different versions of kernel source code.
In order to achieve the above object, one aspect of the embodiments of the present application provides a data analysis method, including: acquiring kernel source code files of multiple versions, and acquiring a corresponding header file from the kernel source code file of each version; acquiring a plurality of lines of data to be parsed from the header file, and acquiring, for each line of data to be parsed, an annotation variable used for data annotation identification and a structure variable used for data structure identification; parsing the plurality of lines of data to be parsed line by line according to the annotation variable and the structure variable to obtain a plurality of groups of structure data under each version of the kernel source code file and a structure data name of each group of structure data; associating each group of structure data under each version of the kernel source code file with the corresponding structure data name to form a first mapping relation, and associating the first mapping relation with the version name of the corresponding version of the kernel source code file to form a second mapping relation; and acquiring a target version name and a target structure data name of structure data to be analyzed, and searching, under the second mapping relation and according to the target version name and the target structure data name, for the corresponding plurality of groups of target data to be analyzed, for analysis.
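
The two mapping relations and the lookup step can be pictured as a nested dictionary: struct name to struct text (first mapping), wrapped in version name to first mapping (second mapping). The Python sketch below is an illustration under stated assumptions, not the applicant's implementation. It keeps the mapping in memory rather than in a database, uses SHA-256 as the check value, and replaces color marks with plain text labels. All function names, version names, and struct contents are hypothetical. The whitespace-insensitive check value follows the comparison step set out in claim 10.

```python
import hashlib
import re


def checksum(struct_lines):
    """Check value of a struct with all whitespace removed, so that
    formatting-only changes are not reported as differences."""
    compact = re.sub(r"\s+", "", "".join(struct_lines))
    return hashlib.sha256(compact.encode("utf-8")).hexdigest()


def lookup(second_mapping, versions, struct_name):
    """Return [(version, struct lines)] for versions defining the struct."""
    return [
        (version, second_mapping[version][struct_name])
        for version in versions
        if struct_name in second_mapping.get(version, {})
    ]


def analysis_table(second_mapping, versions, struct_name):
    """Build rows of (version, struct name, mark), comparing each
    version's check value with that of the preceding version."""
    rows = []
    previous = None
    for version, lines in lookup(second_mapping, versions, struct_name):
        digest = checksum(lines)
        if previous is None:
            mark = "baseline"
        elif digest == previous:
            mark = "unchanged"
        else:
            mark = "changed"
        rows.append((version, struct_name, mark))
        previous = digest
    return rows


# Second mapping: version name -> (struct name -> struct lines).
# The inner dictionaries play the role of the first mapping.
second_mapping = {
    "v5.4": {"task": ["struct task {", "int pid;", "};"]},
    "v5.10": {"task": ["struct task {", "int  pid;", "};"]},
    "v6.1": {"task": ["struct task {", "int pid;", "long state;", "};"]},
}

table = analysis_table(second_mapping, ["v5.4", "v5.10", "v6.1"], "task")
```

Here the second version differs from the first only in spacing, so it is marked `unchanged`, while the third version adds a field and is marked `changed`. A real table would carry the struct text alongside each mark so the reader can inspect the actual difference.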
In order to achieve the above object, another aspect of the embodiments of the present application provides a data analysis device, including: a first acquisition module, configured to acquire kernel source code files of multiple versions and acquire a corresponding header file from the kernel source code file of each version; a second acquisition module, configured to acquire a plurality of lines of data to be parsed from the header file, and to acquire, for each line of data to be parsed, an annotation variable used for data annotation identification and a structure variable used for data structure identification; a parsing module, configured to parse the plurality of lines of data to be parsed line by line according to the annotation variable and the structure variable to obtain a plurality of groups of structure data under each version of the kernel source code file and a structure data name of each group of structure data; an association module, configured to associate each group of structure data under each version of the kernel source code file with the corresponding structure data name to form a first mapping relation, and to associate the first mapping relation with the version name of the corresponding version of the kernel source code file to form a second mapping relation; and an analysis module, configured to acquire a target version name and a target structure data name of structure data to be analyzed, and to search corresponding multiple groups of target data to be analyzed under the second mapping relation according to the target version name and the target s