
CN-120872406-B - Method and device for analyzing code file and nonvolatile storage medium


Abstract

The application discloses a method and a device for analyzing a code file, and a nonvolatile storage medium. The method comprises: receiving a code file to be identified; traversing the code file and identifying a valid statement set in the code file, wherein the valid statement set comprises the statements in the code file other than code comment statements; identifying the identifier segments of each statement in the valid statement set, wherein the identifier segments are determined according to a plurality of predefined characters; determining, from the identifier segments, the target identifier segments that satisfy a preset function judging rule; determining each target identifier segment as a called target function; and determining the position of the target identifier segment in the code file as the position of the target function. The method addresses the technical problem in the related art that function positioning is inefficient and inaccurate because extracting the specific positions of functions in code relies solely on manual retrieval.
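
The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: comments are taken to start with `#` and run to the end of the line, the "predefined characters" are taken to be letters, digits and underscore, and the placeholder rule `followed_by_paren` stands in for the preset function judging rule, which the abstract does not spell out. All function and class names are hypothetical.

```python
import re
from typing import Callable, Iterator, List, NamedTuple, Tuple

# Assumption: an identifier segment is a maximal run of these characters.
IDENT_RE = re.compile(r"[A-Za-z0-9_]+")


class Segment(NamedTuple):
    text: str       # the identifier segment itself
    line: int       # 1-based line number in the code file
    column: int     # 0-based column where the segment starts
    statement: str  # the valid statement the segment was found in


def valid_statements(source: str, comment_prefix: str = "#") -> Iterator[Tuple[int, str]]:
    """Yield (line_number, statement) pairs with comment text removed.

    Simplification: everything after comment_prefix is dropped, so a prefix
    that appears inside a string literal is also treated as a comment.
    """
    for lineno, raw in enumerate(source.splitlines(), start=1):
        code = raw.split(comment_prefix, 1)[0]
        if code.strip():
            yield lineno, code


def identifier_segments(source: str, comment_prefix: str = "#") -> Iterator[Segment]:
    """Yield every identifier segment of every valid statement, with its position."""
    for lineno, statement in valid_statements(source, comment_prefix):
        for match in IDENT_RE.finditer(statement):
            yield Segment(match.group(), lineno, match.start(), statement)


def followed_by_paren(seg: Segment) -> bool:
    """Placeholder judging rule (an assumption): the segment is followed by '('."""
    rest = seg.statement[seg.column + len(seg.text):]
    return rest.lstrip().startswith("(")


def locate_functions(
    source: str,
    is_target: Callable[[Segment], bool] = followed_by_paren,
    comment_prefix: str = "#",
) -> List[Tuple[str, int, int]]:
    """Return (name, line, column) for each segment accepted by the rule."""
    return [
        (seg.text, seg.line, seg.column)
        for seg in identifier_segments(source, comment_prefix)
        if is_target(seg)
    ]


source = "# helper(1)\nx = compute(y)\nprint(x)  # show(x)\n"
print(locate_functions(source))  # [('compute', 2, 4), ('print', 3, 0)]
```

Passing the rule as a callable keeps the traversal separate from the judging rule, so a stricter rule can be substituted without changing how statements and positions are collected.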

Inventors

  • ZHANG MIN
  • LU YIXIAO

Assignees

  • Peking University (北京大学)

Dates

Publication Date
20260508
Application Date
20250709

Claims (10)

  1. A method for analyzing a code file, comprising: receiving a code file to be identified; traversing the code file and identifying a valid statement set in the code file, wherein the valid statement set comprises the statements in the code file other than code comment statements; identifying an identifier segment of each statement in the valid statement set, wherein the identifier segment is determined according to a plurality of predefined characters; determining, from the identifier segments, a target identifier segment that satisfies a preset function judging rule, determining the target identifier segment as a called target function, and determining the position of the target identifier segment in the code file as the position of the target function; wherein determining the target identifier segment that satisfies the preset function judging rule from the identifier segments comprises: verifying a first-type identifier segment in a first mode to obtain the target identifier segment, wherein the first-type identifier segment is an identifier segment that meets the preset code environment requirement in the preset function judging rule, and the first mode is a preset function identification mode corresponding to the preset code environment requirement; and verifying a second-type identifier segment in a second mode to obtain the target identifier segment, wherein the second-type identifier segment is an identifier segment other than the first-type identifier segment, and the second mode is a function identification mode predefined in the preset function judging rule for the second-type identifier segment; wherein verifying the first-type identifier segment in the first mode to obtain the target identifier segment comprises: for each first-type identifier segment, determining as a scope the character area that starts at a first preset keyword and ends at a second preset keyword, wherein the first preset keyword is used for defining a function and the second preset keyword is used for indicating the end of the function; for each scope, removing the characters that match character names in a preset variable declaration list to obtain an initial target identifier segment, wherein each scope corresponds to one preset variable declaration list; and determining an initial target identifier segment that does not belong to a function call context as the target identifier segment; and wherein verifying the second-type identifier segment in the second mode to obtain the target identifier segment comprises: determining that the second-type identifier segment is the target identifier segment when all of the following conditions are met: the second-type identifier segment does not start with a digit; the second-type identifier segment does not belong to a preset global variable or a preset functional character declared by the code file; and the second-type identifier segment does not belong to a preset variable declared in the statement corresponding to the second-type identifier segment.
  2. The method according to claim 1, further comprising: forming a target function set from all target functions determined from all identifier segments in the valid statement set; determining function call relations among the target functions in the target function set according to a preset script file, wherein the preset script file indicates all function names contained in the code file; and constructing a function call tree based on the function call relations and the target function set, wherein the function call tree indicates call paths and hierarchical relations among the target functions.
  3. The method of claim 1, wherein the code file to be identified comprises a class definition file, and the method further comprises: identifying, from the identifier segments of the class definition file, a first field and a second field that conform to a first preset format, and identifying a preset symbol from the class definition file; identifying, from the class definition file, a target field that conforms to a second preset format, wherein the second preset format comprises the first field, the preset symbol and the second field arranged in that order; in the case that the target field is determined not to be a variable declaration and the first field is a declared variable, determining a first function that defines the first field, and determining the class of the first function; and when the class of the first function has the same name as any one of the defined classes, determining the defined class with that name as a target class, and determining the first field as an object of the target class, wherein the target field is a called field.
  4. The method of claim 1, wherein the code file to be identified comprises a class definition file, and the method further comprises: identifying, from the identifier segments of the class definition file, a first field and a second field that conform to a first preset format, and identifying a preset symbol from the class definition file; identifying, from the class definition file, a target field that conforms to a second preset format, wherein the second preset format comprises the first field, the preset symbol and the second field arranged in that order; and in the case that the target field is determined to be a variable declaration, determining that the target field is not a called field, and storing the target field as a variable.
  5. The method of claim 1, wherein the code file to be identified comprises a class definition file, and the method further comprises: identifying, from the identifier segments of the class definition file, a first field and a second field that conform to a first preset format, and identifying a preset symbol from the class definition file; identifying, from the class definition file, a target field that conforms to a second preset format, wherein the second preset format comprises the first field, the preset symbol and the second field arranged in that order; and in the case that the target field is determined not to be a variable declaration and the first field is not a declared variable, determining the first field as a class name, wherein the target field is a field called through the class name.
  6. The method of claim 1, wherein the code file to be identified comprises a class definition file, and the method further comprises: identifying, from the identifier segments of the class definition file, a first field and a second field that conform to a first preset format, and identifying a preset symbol from the class definition file; identifying, from the class definition file, a target field that conforms to a second preset format, wherein the second preset format comprises the first field, the preset symbol and the second field arranged in that order; and in the case that the target field is determined not to be a variable declaration and the first field is a declared variable, and either a first function that defines the first field cannot be determined, or the first function is determined but its class differs from the name of every defined class, determining that the first field is of an undefined class or is a directly input variable.
  7. An apparatus for analyzing a code file, comprising: a receiving module, configured to receive a code file to be identified; a first identification module, configured to traverse the code file and identify a valid statement set in the code file, wherein the valid statement set comprises the statements in the code file other than code comment statements; a second identification module, configured to identify an identifier segment of each statement in the valid statement set, wherein the identifier segment is determined according to a plurality of predefined characters; and a determining module, configured to determine, from the identifier segments, a target identifier segment that satisfies a preset function judging rule, determine the target identifier segment as a called target function, and determine the position of the target identifier segment in the code file as the position of the target function; wherein determining the target identifier segment that satisfies the preset function judging rule from the identifier segments comprises: verifying a first-type identifier segment in a first mode to obtain the target identifier segment, wherein the first-type identifier segment is an identifier segment that meets the preset code environment requirement in the preset function judging rule; and verifying a second-type identifier segment in a second mode to obtain the target identifier segment, wherein the second-type identifier segment is an identifier segment other than the first-type identifier segment, and the second mode is a function identification mode predefined in the preset function judging rule for the second-type identifier segment; wherein verifying the first-type identifier segment in the first mode to obtain the target identifier segment comprises: for each first-type identifier segment, determining as a scope the character area that starts at a first preset keyword and ends at a second preset keyword, wherein the first preset keyword is used for defining a function and the second preset keyword is used for indicating the end of the function; for each scope, removing the characters that match character names in a preset variable declaration list to obtain an initial target identifier segment, wherein each scope corresponds to one preset variable declaration list; and determining an initial target identifier segment that does not belong to a function call context as the target identifier segment; and wherein verifying the second-type identifier segment in the second mode to obtain the target identifier segment comprises: determining that the second-type identifier segment is the target identifier segment when all of the following conditions are met: the second-type identifier segment does not start with a digit; the second-type identifier segment does not belong to a preset global variable or a preset functional character declared by the code file; and the second-type identifier segment does not belong to a preset variable declared in the statement corresponding to the second-type identifier segment.
  8. A nonvolatile storage medium in which a program is stored, wherein the program, when run, controls a device in which the nonvolatile storage medium is located to perform the method for analyzing a code file according to any one of claims 1 to 6.
  9. An electronic device comprising a memory and a processor, the processor being configured to execute a program stored in the memory, wherein the program, when run, performs the method for analyzing a code file according to any one of claims 1 to 6.
  10. A computer program product comprising computer instructions which, when executed by a processor, implement the method for analyzing a code file according to any one of claims 1 to 6.
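
Claims 3 to 6 together describe a four-way classification of a target field of the form "first field, preset symbol, second field". The decision order can be illustrated with the following Python sketch, which is only one possible reading of the claims. The inputs are assumptions: whether the target field is a variable declaration, the set of declared variables, a mapping from each variable to the class of the function that defines it (absent when that function cannot be determined), and the set of defined class names are all supplied by the caller. Every name in the code is hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class FieldKind(Enum):
    VARIABLE = "variable declaration, stored as a variable"          # claim 4
    CALL_ON_OBJECT = "called field on an object of a defined class"  # claim 3
    CALL_ON_CLASS_NAME = "field called through a class name"         # claim 5
    UNDEFINED_OR_INPUT = "undefined class or directly input variable"  # claim 6


def classify_target_field(
    first_field: str,
    is_variable_declaration: bool,
    declared_variables: Set[str],
    class_of_defining_function: Dict[str, str],
    defined_classes: Set[str],
) -> Tuple[FieldKind, Optional[str]]:
    """Classify a target field; return its kind and the target class, if any."""
    # Claim 4: a variable declaration is never a called field.
    if is_variable_declaration:
        return FieldKind.VARIABLE, None

    # Claim 5: the first field is not a declared variable, so treat it as a class name.
    if first_field not in declared_variables:
        return FieldKind.CALL_ON_CLASS_NAME, first_field

    # Claim 3: the defining function is known and its class matches a defined class.
    cls = class_of_defining_function.get(first_field)
    if cls is not None and cls in defined_classes:
        return FieldKind.CALL_ON_OBJECT, cls

    # Claim 6: defining function unknown, or its class matches no defined class.
    return FieldKind.UNDEFINED_OR_INPUT, None


# Hypothetical example: "solver" was produced by a function of class "Solver".
kind, target = classify_target_field(
    first_field="solver",
    is_variable_declaration=False,
    declared_variables={"solver", "mesh"},
    class_of_defining_function={"solver": "Solver"},
    defined_classes={"Solver", "Grid"},
)
print(kind.name, target)  # CALL_ON_OBJECT Solver
```

The variable-declaration test comes first because claim 4 applies regardless of what the first field is; the remaining three branches all presuppose a non-declaration.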

Description

Method and device for analyzing code file and nonvolatile storage medium

Technical Field

The present application relates to the field of computers, and in particular to a method and a device for analyzing a code file, and a nonvolatile storage medium.

Background

With the continuous development of software technology, the scale and complexity of software keep increasing. In particular, with the rise of the open-source software movement, a large amount of code is shared through open-source platforms, and the number of software developers has grown greatly. However, as the amount of code increases dramatically, code maintenance becomes more and more difficult; in particular, when a software error is encountered, it is not easy to locate the function in which the error occurred. How to efficiently locate functions in code is therefore a problem to be solved.

Function positioning is usually based on syntax analysis of the source code file: the source code is traversed to determine where each function appears. In the related art, positioning a function requires manually browsing the whole code file to perform the syntax analysis, and the user must manually enter the name of the function to be located in order to search for it. Structural analysis and extraction of complex or long functions cannot be achieved, and both efficiency and accuracy are low when function structures are analyzed and extracted.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The embodiments of the application provide a method and a device for analyzing a code file, and a nonvolatile storage medium, which at least solve the technical problem in the related art that function positioning is inefficient and inaccurate because extracting the specific positions of functions in code relies solely on manual retrieval.

According to one aspect of the embodiments of the application, a method for analyzing a code file is provided, comprising: receiving a code file to be identified; traversing the code file and identifying a valid statement set in the code file, wherein the valid statement set comprises the statements in the code file other than code comment statements; identifying an identifier segment of each statement in the valid statement set, wherein the identifier segment is determined according to a plurality of predefined characters; determining, from the identifier segments, a target identifier segment that satisfies a preset function judging rule; determining the target identifier segment as a called target function; and determining the position of the target identifier segment in the code file as the position of the target function.

According to some embodiments of the application, determining the target identifier segment that satisfies the preset function judging rule from the identifier segments comprises verifying a first-type identifier segment in a first mode to obtain the target identifier segment, wherein the first-type identifier segment is an identifier segment that meets the preset code environment requirement in the preset function judging rule, and the first mode is a preset function identification mode corresponding to the preset code environment requirement.

According to some embodiments of the application, determining the target identifier segment that satisfies the preset function judging rule from the identifier segments comprises verifying a second-type identifier segment in a second mode to obtain the target identifier segment, wherein the second-type identifier segment is an identifier segment other than the first-type identifier segment, and the second mode is a function identification mode predefined in the preset function judging rule for the second-type identifier segment.

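
The split between the two verification modes can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation. The second-mode check encodes the three conditions recited in claim 1 (no leading digit, not a preset global variable or preset functional character, not a variable declared in the same statement); the sets passed to it, the environment test and the first-mode check are caller-supplied placeholders, and all names are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def second_mode_is_target(
    segment: str,
    global_names: Set[str],         # assumed: preset global variables of the code file
    functional_words: Set[str],     # assumed: preset functional characters (e.g. keywords)
    statement_variables: Set[str],  # assumed: variables declared in the same statement
) -> bool:
    """Apply the three second-mode conditions to one identifier segment."""
    if not segment or segment[0].isdigit():
        return False  # starts with a digit
    if segment in global_names or segment in functional_words:
        return False  # preset global variable or functional character
    if segment in statement_variables:
        return False  # variable declared in the corresponding statement
    return True


def select_targets(
    segments: Iterable[str],
    meets_environment_requirement: Callable[[str], bool],
    first_mode: Callable[[str], bool],
    second_mode: Callable[[str], bool],
) -> List[str]:
    """Route each segment to the first or second mode and keep the accepted ones."""
    targets = []
    for segment in segments:
        # First-type segments meet the code environment requirement;
        # every other segment is second-type.
        check = first_mode if meets_environment_requirement(segment) else second_mode
        if check(segment):
            targets.append(segment)
    return targets


# Hypothetical example in which no segment is first-type.
result = select_targets(
    ["2x", "total", "if", "i", "solve"],
    meets_environment_requirement=lambda s: False,
    first_mode=lambda s: False,
    second_mode=lambda s: second_mode_is_target(s, {"total"}, {"if"}, {"i"}),
)
print(result)  # ['solve']
```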
According to some embodiments of the application, verifying the first-type identifier segment in the first mode to obtain the target identifier segment comprises: for each first-type identifier segment, determining as a scope the character area that starts at a first preset keyword and ends at a second preset keyword, wherein the first preset keyword is used for defining a function and the second preset keyword is used for indicating the end of the function; for each scope, removing the characters that match character names in a preset variable declaration list to obtain an initial target identifier segment, wherein each scope corresponds to one preset variable declaration list; and determining an initial target identifier segment that does not belong to a function call context as the target identifier segment.

According to some embodiments of the application, the second-type identifier segment is verified in the second mode to obtain the target identifier segment, wherein the seco