CN-122001780-A - Database traffic discovery method, device, equipment and storage medium
Abstract
The application discloses a database traffic self-discovery method, device, equipment and storage medium, applied to a preset traffic self-discovery service and relating to the technical field of databases. The method comprises: analyzing database traffic data in a user network in a current period; after the current period ends, sorting the initial combinations by TCP connection count; inputting the data packet captured when a target combination that meets a target condition first establishes a TCP connection into a pre-trained target large AI model, so as to classify and archive the network assets corresponding to the target combination; and determining the next period as the new current period and returning to the step of collecting database traffic data in the user network in the current period, until classification and archiving has been completed for all periods. Misidentification of assets is thereby avoided.
Inventors
- ZHANG DONGSONG
- ZHANG HAICHUAN
Assignees
- 杭州安恒信息技术股份有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260210
Claims (10)
- 1. A database traffic self-discovery method, applied to a preset traffic self-discovery service, comprising: collecting database traffic data in a user network in a current period, and parsing the database traffic data to obtain target information, wherein the target information comprises an IP address, a port number and TCP connection information; determining whether the TCP connection information corresponding to the database traffic data indicates that a TCP connection is made for the first time in the current period, and determining, based on the result, whether an initial combination is in a target set, wherein the target set comprises a preset database information set and a preset non-database information set; if the initial combination is not in the target set, adding the initial combination to a preset set to be identified; after the current period ends, sorting all initial combinations in the preset set to be identified by their TCP connection counts to obtain a sorted combination list, and selecting from the sorted combination list a target combination that meets a target condition; and acquiring the data packet captured when the target combination first establishes a TCP connection, converting the data packet into text in a target format, inputting the text into a pre-trained target large AI model to obtain a corresponding identification result, classifying and archiving the network assets corresponding to the target combination based on the identification result, determining the next period as the new current period, and returning to the step of collecting database traffic data in the user network in the current period, until classification and archiving has been completed for all periods.
- 2. The database traffic discovery method according to claim 1, wherein determining whether the TCP connection information corresponding to the database traffic data indicates that a TCP connection is made for the first time in the current period, and determining, based on the result, whether the initial combination is in the target set, comprises: determining whether the TCP connection information corresponding to the database traffic data indicates that a TCP connection is made for the first time in the current period; and if so, incrementing the TCP connection count in the TCP connection information and determining whether the initial combination is in the target set.
- 3. The database traffic discovery method according to claim 1, wherein determining whether the TCP connection information corresponding to the database traffic data indicates that a TCP connection is made for the first time in the current period further comprises: if the TCP connection information does not indicate that a TCP connection is made for the first time in the current period, determining the next period as the new current period and returning to the step of collecting database traffic data in the user network in the current period.
- 4. The database traffic discovery method according to claim 2, wherein, after the current period ends, sorting all initial combinations in the preset set to be identified by their TCP connection counts to obtain a sorted combination list, and selecting from the sorted combination list a target combination that meets a target condition, comprises: after the current period ends, sorting all initial combinations in the preset set to be identified from high to low by their TCP connection counts to obtain a sorted combination list; determining the first initial combination in the sorted combination list as a candidate combination, and determining whether the TCP connection count in the TCP connection information of the candidate combination reaches a target connection count threshold; and if the TCP connection count reaches the target connection count threshold, determining the candidate combination as the target combination.
- 5. The database traffic discovery method according to claim 4, wherein, after determining whether the TCP connection count in the TCP connection information of the candidate combination reaches the target connection count threshold, the method further comprises: if the TCP connection count does not reach the target connection count threshold, determining the next period as the new current period and returning to the step of collecting database traffic data in the user network in the current period.
- 6. The database traffic discovery method according to any one of claims 1 to 5, wherein the training process of the target large AI model comprises: collecting version information for databases of different types, determining the traffic data packets produced when the databases operate normally, and converting the traffic data packets into hexadecimal target text; and constructing training samples based on the target text, the corresponding database type and the version information, and training a large AI model using the training samples and low-rank adaptation (LoRA) until the identification accuracy of the large AI model reaches a target accuracy threshold, so as to obtain the target large AI model.
- 7. The database traffic discovery method according to claim 6, wherein acquiring the data packet captured when the target combination first establishes a TCP connection, converting the data packet into text in a target format, inputting the text into a pre-trained target large AI model to obtain a corresponding identification result, and classifying and archiving the network assets corresponding to the target combination based on the identification result, comprises: acquiring the data packet captured when the target combination first establishes a TCP connection, converting the data packet into hexadecimal text, and inputting the text into the pre-trained target large AI model to obtain a corresponding identification result; if the identification result indicates that the target combination is a database asset, adding the target database type and target version information given for the target combination in the identification result to an asset list of a database auditing device, deleting the target combination from the preset set to be identified, and adding the target combination to the preset database information set; and if the identification result indicates that the target combination is not a database asset, deleting the target combination from the preset set to be identified, and adding the target combination to the preset non-database information set.
- 8. A database traffic self-discovery device, applied to a preset traffic self-discovery service, comprising: a traffic data parsing module, configured to collect database traffic data in a user network in a current period and parse the database traffic data to obtain target information, wherein the target information comprises an IP address, a port number and TCP connection information; an initial combination determining module, configured to determine whether the TCP connection information corresponding to the database traffic data indicates that a TCP connection is made for the first time in the current period, and to determine, based on the result, whether an initial combination is in a target set, wherein the target set comprises a preset database information set and a preset non-database information set, and the initial combination is constructed from the IP address and the port number; an initial combination adding module, configured to add the initial combination to a preset set to be identified if the initial combination is not in the target set; a target combination selecting module, configured to sort, after the current period ends, all initial combinations in the preset set to be identified by their TCP connection counts to obtain a sorted combination list, and to select from the sorted combination list a target combination that meets a target condition; and a network asset classification module, configured to acquire the data packet captured when the target combination first establishes a TCP connection, convert the data packet into text in a target format, input the text into a pre-trained target large AI model to obtain a corresponding identification result, classify and archive the network assets corresponding to the target combination based on the identification result, determine the next period as the new current period, and return to the step of collecting database traffic data in the user network in the current period, until classification and archiving has been completed for all periods.
- 9. An electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the database traffic discovery method according to any one of claims 1 to 7.
- 10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the database traffic discovery method according to any one of claims 1 to 7.
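As an illustration of the hexadecimal conversion and the archiving branches described in claim 7, the following is a minimal Python sketch, not the patented implementation. The names packet_to_hex_text, archive and identify are hypothetical, and identify stands in for the pre-trained model: it is assumed to return a (database type, version) pair for a database asset and None otherwise. The asset list is modelled as a plain dictionary.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Combo = Tuple[str, int]  # (IP address, port number); assumed shape of a combination


def packet_to_hex_text(payload: bytes) -> str:
    """Convert raw packet bytes into hexadecimal text, e.g. b'\\x0a\\x01' -> '0a01'."""
    return payload.hex()


def archive(
    combo: Combo,
    payload: bytes,
    identify: Callable[[str], Optional[Tuple[str, str]]],  # hypothetical model wrapper
    pending: Set[Combo],        # preset set to be identified
    known_db: Set[Combo],       # preset database information set
    known_non_db: Set[Combo],   # preset non-database information set
    asset_list: Dict[Combo, Dict[str, str]],  # stand-in for the audit device asset list
) -> None:
    """Classify one target combination and move it to the matching set."""
    result = identify(packet_to_hex_text(payload))
    # In both branches the combination leaves the set to be identified.
    pending.discard(combo)
    if result is not None:
        db_type, version = result
        asset_list[combo] = {"type": db_type, "version": version}
        known_db.add(combo)
    else:
        known_non_db.add(combo)
```

Because every classified combination ends up in one of the two known sets, it is not submitted to the model again in later periods.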
Description
Database traffic discovery method, device, equipment and storage medium
Technical Field
The present invention relates to the field of database technologies, and in particular to a database traffic discovery method, device, equipment and storage medium.
Background
In database auditing, users often have so many database assets that compiling a complete asset list is difficult. A database auditing device receives mirrored traffic that covers all of the user's database assets, so it can address this difficulty by automatically identifying assets from the traffic and adding them for auditing. Conventional database auditing products first collect database traffic; experts then analyze the traffic and summarize the characteristics of each kind of database traffic. Finally, the summarized characteristics are compared against the collected user service traffic to determine the type of the database traffic. Traffic characteristics derived by manual analysis are not very accurate, so a large number of assets can be misidentified in real user service traffic, which makes the audit service unstable, produces many error logs, and wastes the resources of the audit device. How to avoid misidentifying assets in real user traffic scenarios is therefore a problem that urgently needs to be solved.
Disclosure of Invention
In view of the above, the present invention aims to provide a database traffic discovery method, device, equipment and storage medium that can avoid misidentification of assets in real user traffic scenarios.
The specific scheme is as follows: In a first aspect, the present application provides a database traffic self-discovery method, applied to a preset traffic self-discovery service, comprising: collecting database traffic data in a user network in a current period, and parsing the database traffic data to obtain target information, wherein the target information comprises an IP address, a port number and TCP connection information; determining whether the TCP connection information corresponding to the database traffic data indicates that a TCP connection is made for the first time in the current period, and determining, based on the result, whether an initial combination is in a target set, wherein the target set comprises a preset database information set and a preset non-database information set; if the initial combination is not in the target set, adding the initial combination to a preset set to be identified; after the current period ends, sorting all initial combinations in the preset set to be identified by their TCP connection counts to obtain a sorted combination list, and selecting from the sorted combination list a target combination that meets a target condition; and acquiring the data packet captured when the target combination first establishes a TCP connection, converting the data packet into text in a target format, inputting the text into a pre-trained target large AI model to obtain a corresponding identification result, classifying and archiving the network assets corresponding to the target combination based on the identification result, determining the next period as the new current period, and returning to the step of collecting database traffic data in the user network in the current period, until classification and archiving has been completed for all periods.
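The per-period bookkeeping in the first aspect can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: DiscoveryState, observe and select_target are hypothetical names, the flag is_new_connection is assumed to be derived from the TCP connection information, and the target condition is assumed to be the connection count threshold applied to the top-ranked combination.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Tuple

Combo = Tuple[str, int]  # initial combination built from (IP address, port number)


@dataclass
class DiscoveryState:
    known_db: set = field(default_factory=set)        # preset database information set
    known_non_db: set = field(default_factory=set)    # preset non-database information set
    pending: set = field(default_factory=set)         # preset set to be identified
    counts: Counter = field(default_factory=Counter)  # TCP connection count per combination


def observe(state: DiscoveryState, ip: str, port: int, is_new_connection: bool) -> None:
    """Handle one parsed traffic record inside the current period."""
    if not is_new_connection:
        return  # assumption: records that do not indicate a first connection are skipped
    combo = (ip, port)
    state.counts[combo] += 1
    # Only combinations outside the target set go to the set to be identified.
    if combo not in state.known_db and combo not in state.known_non_db:
        state.pending.add(combo)


def select_target(state: DiscoveryState, threshold: int) -> Optional[Combo]:
    """At period end, rank pending combinations from high to low and check the first one."""
    ranked = sorted(state.pending, key=lambda c: (-state.counts[c], c))
    if ranked and state.counts[ranked[0]] >= threshold:
        return ranked[0]
    return None  # no candidate this period; the caller moves on to the next period
```

Ties in the ranking are broken by the combination itself so that the ordering is deterministic; the source does not specify a tie-breaking rule.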
Optionally, determining whether the TCP connection information corresponding to the database traffic data indicates that a TCP connection is made for the first time in the current period, and determining, based on the result, whether the initial combination is in the target set, comprises: determining whether the TCP connection information corresponding to the database traffic data indicates that a TCP connection is made for the first time in the current period; and if so, incrementing the TCP connection count in the TCP connection information and determining whether the initial combination is in the target set. Optionally, determining whether the TCP connection information corresponding to the database traffic data indicates that a TCP connection is made for the first time in the current period further comprises: if the TCP connection information does not indicate that a TCP connection is made for the first time in the current period, determining the next period as the new current period and returning to the step of collecting database traffic data in the user network in the current period. Optionally, after the current period ends, sorting all initial combinations in the preset set to be identified based on the connection times corresponding to the T