
CN-121979842-A - Code index method, device, equipment and medium


Abstract

The invention relates to the technical field of data query and discloses a code indexing method, apparatus, device and medium. The method acquires a list of source code files from a preset project directory, performs syntax analysis on each file, creates index information for each file in the list based on the analysis result, and stores the index information in a preset index database. Change events on all source code files are monitored in real time, and the index information in the index database is updated according to the event type. After the query conditions in a target user's query request are parsed, a matching result is first sought in the memory buffer of the updated index database; if data meeting the query conditions is found, it is output directly to the user. Otherwise, a query statement is constructed from the query request, matching code fragments are searched for using the index information in the index database, and an index result is generated and output to the user. The invention improves the efficiency and accuracy of code indexing.
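The two-stage lookup described in the abstract (memory buffer first, then a constructed query over the index) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `CodeIndex`, its `snippets` and `buffer` fields, and the use of a case-insensitive substring test as the "query statement" are hypothetical stand-ins, and caching fallback results in the buffer is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CodeIndex:
    """Hypothetical stand-in for the index database and its memory buffer."""

    snippets: dict[str, str]  # snippet id -> source text
    buffer: dict[str, list[str]] = field(default_factory=dict)  # condition -> cached hits

    def query(self, condition: str) -> list[str]:
        # Stage 1: answer from the in-memory buffer if it already holds matches.
        cached = self.buffer.get(condition)
        if cached:
            return cached
        # Stage 2: "construct a query statement" (here: a case-insensitive
        # substring predicate) and scan the indexed snippets with it.
        needle = condition.lower()
        hits = sorted(sid for sid, text in self.snippets.items() if needle in text.lower())
        # Assumption: remember non-empty results so the next identical request hits stage 1.
        if hits:
            self.buffer[condition] = hits
        return hits


if __name__ == "__main__":
    index = CodeIndex({"a.py:1": "def load_config(path):", "b.py:3": "class Loader:"})
    print(index.query("load"))  # scans the snippets, then caches the result
    print(index.query("load"))  # answered from the buffer
```

Because the buffer is consulted first, a real implementation would also have to invalidate buffered entries whenever change events update the index; the sketch omits that step.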

Inventors

  • CAO YUANFENG
  • ZHENG WENZHE
  • YAN WEI

Assignees

  • China Merchants Financial Technology Co., Ltd. (招商局金融科技有限公司)

Dates

Publication Date
2026-05-05
Application Date
2025-12-30

Claims (10)

  1. A code indexing method, comprising: acquiring a plurality of source code files from a preset project directory to obtain a file list, and performing syntax analysis on each source code file in the file list to obtain a syntax analysis result; creating index information for each source code file in the file list according to the syntax analysis result, and storing each source code file and its index information in a preset index database; monitoring change events of all source code files in the file list in real time, and updating the index information in the index database according to the change events to obtain an updated index database; parsing the query conditions in a previously acquired query request of a target user; executing a query operation in the updated index database according to the query conditions; if data meeting the query conditions is found, outputting the query result to the target user; and if no data meeting the query conditions is found, constructing a query statement from the query request, searching the source code files of the updated index database, using its index information, for code fragments matching the query statement to obtain an index result, and outputting the index result to the target user.
  2. The code indexing method of claim 1, wherein performing syntax analysis on each source code file in the file list to obtain a syntax analysis result comprises: identifying the programming language type of each source code file in the file list, and selecting a parser corresponding to that programming language type from a preset parser library; dividing the source code in each source code file into lexical units using the parser to obtain a lexical unit set; constructing a syntax tree from the lexical unit set; searching each source code file, according to the syntax tree, for syntax nodes that conform to a preset syntax query rule, and extracting the structural information of each syntax node to obtain a code structure information set; and constructing the syntax analysis result of each source code file from the code structure information set and the syntax tree.
  3. The code indexing method of claim 2, wherein constructing the syntax analysis result of each source code file from the code structure information set and the syntax tree comprises: converting the syntax tree into an abstract syntax tree according to the code structure information set and a preset syntax tree conversion rule; using the abstract syntax tree to extract, from each source code file, the code fragments that conform to the grammar rules of the abstract syntax tree, to obtain a code fragment set; analyzing the dependency relationships among the code fragments in the code fragment set; and packaging the code structure information set, the code fragment set and the dependency relationships, and outputting the package as the syntax analysis result.
  4. The code indexing method of claim 1, wherein creating index information for each source code file in the file list according to the syntax analysis result comprises: de-duplicating the code fragment set contained in the syntax analysis result to obtain a de-duplicated code fragment set; generating a content hash key, a semantic key, a position key, a type key and a language type key for each code fragment in the de-duplicated code fragment set to obtain a multidimensional index key for each code fragment; converting the dependency relationships contained in the syntax analysis result into a graph structure to obtain a dependency graph; calculating a relevance score between the code fragments in the de-duplicated code fragment set; and converting the multidimensional index keys and the relevance scores into structured data and outputting it as the index information of each source code file in the file list.
  5. The code indexing method of claim 1, wherein updating the index information in the index database according to the change events to obtain an updated index database comprises: acquiring the content hash value of each source code file in the file list after the change event; executing a query in the index database using the unique identifier and the new content hash value of each source code file, and determining the change type of the change event from the query result; if the change type is file creation, identifying the newly added file from the comparison result, generating its index information and writing it into the index database to obtain an updated index database; if the change type is file modification, identifying the modified file from the comparison result, regenerating its index information and writing it into the index database to obtain an updated index database; and if the change type is file deletion, identifying the deleted file from the comparison result and deleting its index information from the index database to obtain an updated index database.
  6. The code indexing method of claim 5, wherein determining the change type of the change event according to the query result comprises: when no source code file with the same unique identifier as a file in the file list is found in the index database, determining that the change type is file creation; when a source code file with the same unique identifier is found in the index database but its content hash value differs from that of the file in the file list, determining that the change type is file modification; and when the index database contains a source code file whose unique identifier is not present in the file list, determining that the change type is file deletion.
  7. The code indexing method of claim 1, wherein parsing the query conditions in the previously acquired query request of the target user comprises: extracting the character strings in the query request to obtain a query string set; standardizing the format of the query strings in the query string set to obtain a standardized string set; identifying logical operators, range delimiters and wildcards in the standardized string set according to a preset query grammar to obtain a query operator identification result; constructing a query abstract syntax tree from the query operator identification result; searching the query abstract syntax tree for query conditions based on preset query condition type definitions to obtain a query condition set; and packaging and formatting the query condition set to obtain the query conditions.
  8. A code indexing apparatus, comprising: a syntax analysis module, configured to acquire a plurality of source code files from a preset project directory to obtain a file list, and to perform syntax analysis on each source code file in the file list to obtain a syntax analysis result; an index creation module, configured to create index information for each source code file in the file list according to the syntax analysis result, and to store each source code file and its index information in a preset index database; an index updating module, configured to monitor change events of all source code files in the file list in real time, and to update the index information in the index database according to the change events to obtain an updated index database; a query analysis module, configured to parse the query conditions in a previously acquired query request of a target user; and a user query module, configured to execute a query operation in the updated index database according to the query conditions, to output the query result to the target user if data meeting the query conditions is found, and, if no such data is found, to construct a query statement from the query request, search the source code files of the updated index database, using its index information, for code fragments matching the query statement to obtain an index result, and output the index result to the target user.
  9. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the code indexing method of any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the code indexing method of any one of claims 1 to 7.
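Claims 2 and 3 describe a parsing pipeline: language detection, tokenization, a syntax tree, structured extraction, then code fragments and their dependencies. As a rough sketch only, Python's standard `ast` module can play the role of one parser in the "preset parser library". `ast.parse` tokenizes and builds the tree internally, so the separate lexical-unit and concrete-syntax-tree stages of the claims collapse into one call here. `LANGUAGE_BY_SUFFIX`, `Fragment`, `detect_language` and `parse_python` are hypothetical names, and only same-file calls by plain name are treated as dependencies.

```python
import ast
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical "parser library": file suffix -> language name. Only Python is
# actually parsed below, using the standard-library ast module.
LANGUAGE_BY_SUFFIX = {".py": "python", ".js": "javascript", ".java": "java"}


@dataclass
class Fragment:
    kind: str        # "function" or "class"
    name: str
    start_line: int
    end_line: int
    source: str      # exact source text of the definition
    calls: set[str] = field(default_factory=set)  # plain names called inside it


def detect_language(filename: str) -> str:
    return LANGUAGE_BY_SUFFIX.get(Path(filename).suffix, "unknown")


def parse_python(source: str) -> dict:
    """Return fragments, imports and intra-file dependencies for one Python file."""
    tree = ast.parse(source)  # tokenizes the source and builds the AST in one step
    fragments: list[Fragment] = []
    imports: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            calls = {
                sub.func.id
                for sub in ast.walk(node)
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name)
            }
            fragments.append(Fragment(
                kind="class" if isinstance(node, ast.ClassDef) else "function",
                name=node.name,
                start_line=node.lineno,
                end_line=node.end_lineno,
                source=ast.get_source_segment(source, node),
                calls=calls,
            ))
        elif isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
    # Dependency edges (caller, callee) between fragments defined in this file.
    defined = {f.name for f in fragments}
    dependencies = sorted(
        (f.name, callee) for f in fragments for callee in f.calls
        if callee in defined and callee != f.name
    )
    return {"fragments": fragments, "imports": sorted(imports), "dependencies": dependencies}
```

A multi-language system of the kind the claims describe could instead use a grammar-driven parser generator whose query patterns play the part of the "preset syntax query rule"; the standard-library choice here just keeps the sketch self-contained.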
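Claim 4 names the index keys and the relevance score but not how they are computed. The sketch below fills those gaps with illustrative assumptions only: de-duplication uses a SHA-256 hash of whitespace-normalized fragment text, the "semantic key" is the sorted set of non-keyword identifiers, the position key is `file:start-end`, the dependency graph is an adjacency map, and the relevance score is the Jaccard similarity of identifier sets. `build_index` and the input dictionary shape are hypothetical.

```python
import hashlib
import keyword
import re
from itertools import combinations

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def normalize(code: str) -> str:
    # Collapse all whitespace so formatting-only differences do not defeat de-duplication.
    return " ".join(code.split())


def identifiers(code: str) -> set[str]:
    # Identifier tokens minus Python keywords (a crude "semantic" signature).
    return {t for t in IDENTIFIER.findall(code) if not keyword.iskeyword(t)}


def build_index(fragments: list[dict], dependencies: list[tuple[str, str]]) -> dict:
    """fragments: dicts with keys file, start, end, kind, language, source (assumed shape)."""
    seen: set[str] = set()
    entries = []
    for frag in fragments:
        digest = hashlib.sha256(normalize(frag["source"]).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same normalized body already indexed: keep only the first copy
        seen.add(digest)
        entries.append({
            "content_hash": digest,
            "semantic_key": " ".join(sorted(identifiers(frag["source"]))),
            "position_key": f"{frag['file']}:{frag['start']}-{frag['end']}",
            "type_key": frag["kind"],
            "language_key": frag["language"],
        })
    # Dependency graph as an adjacency map: caller -> set of callees.
    graph: dict[str, set[str]] = {}
    for caller, callee in dependencies:
        graph.setdefault(caller, set()).add(callee)
    # Hypothetical relevance score: Jaccard similarity of the identifier sets.
    relevance = {}
    for a, b in combinations(entries, 2):
        ta, tb = set(a["semantic_key"].split()), set(b["semantic_key"].split())
        union = ta | tb
        relevance[(a["position_key"], b["position_key"])] = len(ta & tb) / len(union) if union else 0.0
    return {"keys": entries, "dependency_graph": graph, "relevance": relevance}
```

Hashing the normalized text means two copies of a function that differ only in indentation or line breaks collapse into one entry, which is one plausible reading of the "standardized de-duplication" the background asks for.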
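The decision rule of claims 5 and 6 (an unknown identifier means creation, a known identifier with a different content hash means modification, an indexed identifier missing from the file list means deletion) can be sketched as below. The sketch assumes the file path serves as the unique identifier and SHA-256 as the content hash, and it compares a whole snapshot at once rather than reacting to individual events; in practice the events would come from a file-system watcher such as Linux inotify. `classify_changes` and `apply_changes` are hypothetical names, and the stored "index information" is reduced to the hash alone.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_changes(index: dict[str, str], files: dict[str, str]) -> dict[str, list[str]]:
    """Compare the stored index with the current file list.

    index: unique identifier (assumed: file path) -> stored content hash
    files: unique identifier -> current file content
    """
    changes: dict[str, list[str]] = {"created": [], "modified": [], "deleted": []}
    for file_id, text in files.items():
        stored = index.get(file_id)
        if stored is None:
            changes["created"].append(file_id)   # identifier not in the index
        elif stored != content_hash(text):
            changes["modified"].append(file_id)  # same identifier, different hash
    # Identifier still in the index but no longer in the file list.
    changes["deleted"] = [file_id for file_id in index if file_id not in files]
    return changes


def apply_changes(index: dict[str, str], files: dict[str, str],
                  changes: dict[str, list[str]]) -> None:
    # Regenerate entries for created/modified files (here the "index info" is just the hash).
    for file_id in changes["created"] + changes["modified"]:
        index[file_id] = content_hash(files[file_id])
    for file_id in changes["deleted"]:
        del index[file_id]
```

Comparing hashes rather than timestamps means a file that is saved without any content change is not reindexed.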
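Claim 7 splits query parsing into string extraction, format standardization, operator recognition, a query abstract syntax tree and condition extraction. The sketch below walks through those stages for a small query language of its own. The `field:value` prefix, the `lo..hi` range syntax, `*`/`?` wildcards, implicit AND between adjacent terms and the operator precedence are all assumptions, since the patent does not publish its query grammar; `Condition`, `Parser` and `parse_query` are hypothetical names.

```python
import re
from dataclasses import dataclass

OPERATORS = {"AND", "OR", "NOT"}
RANGE = re.compile(r"^(\d+)\.\.(\d+)$")  # assumed range syntax: lo..hi


@dataclass(frozen=True)
class Condition:
    field: str    # "text" when the term has no "field:" prefix
    kind: str     # "keyword", "wildcard" or "range"
    value: object


def tokenize(query: str) -> list[str]:
    # Extract parentheses and whitespace-separated strings, then standardize
    # logical operators to upper case.
    raw = re.findall(r"\(|\)|[^\s()]+", query.strip())
    return [t.upper() if t.upper() in OPERATORS else t for t in raw]


def classify(term: str) -> Condition:
    field_name, _, value = term.rpartition(":")
    field_name = field_name or "text"
    match = RANGE.match(value)
    if match:
        return Condition(field_name, "range", (int(match.group(1)), int(match.group(2))))
    if "*" in value or "?" in value:
        return Condition(field_name, "wildcard", value)
    return Condition(field_name, "keyword", value)


class Parser:
    """Recursive descent: NOT binds tighter than AND, AND tighter than OR."""

    def __init__(self, tokens: list[str]):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        node = self.parse_or()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_not()
        # An explicit AND, or two adjacent terms (treated as an implicit AND).
        while self.peek() not in (None, "OR", ")"):
            if self.peek() == "AND":
                self.take()
            node = ("AND", node, self.parse_not())
        return node

    def parse_not(self):
        if self.peek() == "NOT":
            self.take()
            return ("NOT", self.parse_not())
        if self.peek() == "(":
            self.take()
            node = self.parse_or()
            if self.take() != ")":
                raise SyntaxError("missing ')'")
            return node
        token = self.take()
        if token is None or token in OPERATORS or token == ")":
            raise SyntaxError(f"expected a search term, got {token!r}")
        return ("TERM", classify(token))


def collect_conditions(node) -> list[Condition]:
    # Walk the query AST left to right and gather the leaf conditions.
    if node[0] == "TERM":
        return [node[1]]
    return [c for child in node[1:] for c in collect_conditions(child)]


def parse_query(query: str):
    tree = Parser(tokenize(query)).parse()
    return tree, collect_conditions(tree)
```

For example, `parse_query("name:load* and (line:10..40 OR config) not test")` yields an AND-rooted tree whose leaves are a wildcard on `name`, a range on `line`, and the keywords `config` and `test`.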

Description

Technical Field

The present invention relates to the field of data query technologies, and in particular to a code indexing method, apparatus, device and medium.

Background

In scenarios such as software development, code reuse, project maintenance and open-source collaboration, efficient retrieval of source code files is key to improving research and development efficiency. As software projects keep growing, the number of source code files increases, programming languages diversify (e.g. JavaScript, Python, Java), and the dependencies between files become increasingly complex, exposing several pain points in traditional source code retrieval. Traditional retrieval is based on full-text keyword matching: it relies only on string similarity and performs no deeper analysis of the syntactic structure or semantics of the source code. As a result, search results often contain many irrelevant fragments, and code that follows specific syntactic patterns (such as function definitions or class inheritance) is hard to locate precisely. Traditional schemes also use a static indexing mechanism: when project files are created, modified or deleted, the index is not updated in real time and must be rebuilt manually, which increases maintenance cost, and the resulting index lag degrades retrieval accuracy.
In addition, existing search tools offer limited support for complex queries: they have difficulty parsing multi-condition queries that combine logical operators, range definitions and wildcards, and they take into account neither the dependencies between code fragments (such as function calls and module imports) nor relevance ranking, so results are unordered and of limited practical use. Code fragments are also stored repeatedly, with no standardized de-duplication mechanism, which further reduces search efficiency.

Disclosure of Invention

The invention provides a code indexing method, a code indexing apparatus, a computer device and a medium, to solve the problem that existing code indexing methods are inefficient and inaccurate.

In a first aspect, a code indexing method is provided, comprising: acquiring a plurality of source code files from a preset project directory to obtain a file list, and performing syntax analysis on each source code file in the file list to obtain a syntax analysis result; creating index information for each source code file in the file list according to the syntax analysis result, and storing each source code file and its index information in a preset index database; monitoring change events of all source code files in the file list in real time, and updating the index information in the index database according to the change events to obtain an updated index database; parsing the query conditions in a previously acquired query request of a target user; executing a query operation in the updated index database according to the query conditions; if data meeting the query conditions is found, outputting the query result to the target user; and if no data meeting the query conditions is found, constructing a query statement from the query request, searching the source code files of the updated index database, using its index information, for code fragments matching the query statement to obtain an index result, and outputting the index result to the target user.

In a second aspect, a code indexing apparatus is provided, comprising: a syntax analysis module, configured to acquire a plurality of source code files from a preset project directory to obtain a file list, and to perform syntax analysis on each source code file in the file list to obtain a syntax analysis result; an index creation module, configured to create index information for each source code file in the file list according to the syntax analysis result, and to store each source code file and its index information in a preset index database; an index updating module, configured to monitor change events of all source code files in the file list in real time, and to update the index information in the index database according to the change events to obtain an updated index database; a query analysis module, configured to parse the query conditions in a previously acquired query request of a target user; and a user query module, configured to execute a query operation in the updated index database according to the query conditions, to output the query result to the target user if data meeting the query conditions is found, and to construct a query statement from the query request if the