
CN-122020666-A - Cross-source vulnerability association method based on identifier pattern recognition and bidirectional atomic linking


Abstract

The invention discloses a cross-source vulnerability association method based on identifier pattern recognition and bidirectional atomic linking. A three-channel candidate discovery mechanism supplements hidden aliases and delayed references, which raises the proportion of identical vulnerabilities aligned across different sources and addresses the missed associations that the prior art suffers when identifiers are missing or named heterogeneously. Bidirectional atomic linking, concurrent retry, and a symmetry-audit self-healing mechanism work together so that the symmetry invariant still holds under concurrent writes; this reduces the probability of unidirectional edges and lost updates and ensures that association relations are reachable in both directions and logically consistent. A field-level atomic idempotent update strategy avoids global locks and suits the parallel execution of distributed collection and batch-processing tasks, making deployment in large-scale production environments practical.

Inventors

  • HOU SHIYU
  • ZHU YANTAO
  • LI YUQI
  • LI KUN
  • LI DUO
  • LIANG YANAN
  • Zhao Jiaxuan
  • Bian Jizong
  • YIN SHUO

Assignees

  • First Research Institute of the Ministry of Public Security (公安部第一研究所)

Dates

Publication Date
2026-05-12
Application Date
2026-02-03

Claims (9)

  1. A cross-source vulnerability association method based on identifier pattern recognition and bidirectional atomic linking, characterized by comprising the following steps:
     S1, vulnerability information collection: after vulnerability information is collected from an external data source, generating a corresponding vulnerability record document for each piece of vulnerability information in a vulnerability document library of a storage layer, wherein the fields of the vulnerability record document comprise a vulnerability main identifier, a source identifier, a tag array, an alias array, a description text, an associated identifier set, and a last-modified time;
     S2, normalization: normalizing the original vulnerability main identifier contained in the vulnerability information and performing a format validity check, wherein if the check fails, processing ends and the failure reason is recorded;
     S3, identifier type identification: matching the normalized original vulnerability main identifier against an identifier type pattern library to obtain its identifier type, and writing the normalized identifier into the vulnerability main identifier field of the vulnerability record document as the standard ID v in a unified key space; if the vulnerability information from a data source contains both that source's original vulnerability main identifier and a CVE identifier, and the CVE identifier is ultimately selected as the standard ID in the unified key space, the original vulnerability main identifier is written into the tag array of the vulnerability record document;
     S4, candidate discovery:
     S4.1, reading the tag array field of the vulnerability record document and parsing the external identifier tags in it to obtain a candidate set C1;
     S4.2, if the alias array field of the vulnerability record document is not empty, reading the identifiers in the alias array field and adding the normalized identifiers to a candidate set C2;
     S4.3, performing cascaded pattern scanning on the description text field of the vulnerability record document, extracting substrings that conform to an identifier format, normalizing them, and adding them to a candidate set C3;
     S5, merging to obtain a candidate set C = C1 ∪ C2 ∪ C3, and de-duplicating and de-noising the candidate set C by removing identifiers identical to v in the vulnerability main identifier field of the vulnerability record document and removing identifiers that fail the format validity check;
     S6, bidirectional atomic association writing, sequentially executing the following steps for a candidate c:
     (1) A→B write: updating the vulnerability record document corresponding to v in the storage layer; if the associated identifier set field is empty, initializing it and adding the candidate c to it; if the associated identifier set already exists but does not contain c, adding c to it;
     (2) B→A write: performing the same update in the storage layer on the vulnerability record document corresponding to the candidate c, writing v into the associated identifier set of that document and updating its last-modified field;
     S7, output and recording: outputting the processing result, including the number of newly added associations, the list of failed candidates, and the failure reasons, and writing an audit log and a statistical counter, wherein the statistical counter counts successful writes, failed writes, retries, the distribution of failure reasons, and the number of tasks to be compensated.
  2. The method according to claim 1, wherein in step S2 the normalization includes case normalization and removal of invisible characters and spaces.
  3. The method according to claim 1, wherein in steps S6 (1) and (2) the vulnerability record document is updated by field-level scripted atomic updates: the full vulnerability record document is not read during writing, and only a conditional update is performed on the associated identifier set field.
  4. The method according to claim 1, wherein in step S6, if either of (1) and (2) fails, retry logic is triggered to retry the incomplete write operation; if the write still fails after a limited number of retries, the incomplete write is recorded as a "task to be compensated", to be completed by a subsequent symmetry audit.
  5. The method according to claim 1, wherein in step S7 the processing result includes the number of newly added associations, the list of failed candidates, and the failure reasons, and an audit log and a statistical counter are written.
  6. The method according to claim 4, further comprising step S8 of periodically performing a symmetry audit task:
     S8.1, scanning the set of all vulnerability record documents whose associated identifier set is not empty;
     S8.2, for each element in the associated identifier set of a vulnerability record document A, checking whether the associated identifier set of the corresponding vulnerability record document B contains the vulnerability main identifier of A; if it does not, writing the vulnerability main identifier of A into the associated identifier set of B and recording the number of audit repairs;
     S8.3, outputting audit metrics, which comprise the number of asymmetric association edges detected, the repair success rate, and the distribution by source, wherein an asymmetric association edge is an association that exists in only one direction, that is, the associated identifier set of A contains B but the associated identifier set of B does not contain A.
  7. The method according to claim 1, wherein, when no vulnerability list is specified, the most recently collected vulnerability information is pulled according to a time window, and automatic association is performed on each piece of vulnerability information according to steps S2-S7.
  8. The method according to claim 1, further comprising outputting and continuously tracking the following evaluation metrics:
     (1) Association coverage, calculated as:
         coverage = count(exists(linked_ids)) / count(total)
         where count(exists(linked_ids)) is the number of vulnerability records for which a cross-source association has been established, and count(total) is the total number of vulnerability records;
     (2) Symmetry failure rate, calculated as:
         asymmetry_rate = count({(a, b) | b ∈ linked_ids(a) and a ∉ linked_ids(b)}) / count(all_links)
         which measures the proportion of asymmetric association edges among all association edges;
     (3) Self-healing repair rate, calculated as:
         healing_rate = repaired_asym_links / detected_asym_links
         where detected_asym_links is the total number of asymmetric association edges detected in the symmetry audit, and repaired_asym_links is the number of asymmetric association edges successfully repaired by a write in the current symmetry audit;
     (4) Concurrent-conflict retry success rate, calculated as:
         retry_ok = retries_succeeded / retries_total
         where retries_total is the total number of retries triggered by concurrent write conflicts, and retries_succeeded is the number of those retries that ended in a successful write.
  9. A system for implementing the method of any of claims 1-8, comprising an access layer, a processing layer, a core service layer, and a storage layer; the access layer provides an interface and feeds the vulnerability records to be processed into the processing layer; the processing layer performs normalization, type identification, candidate discovery, and de-duplication and de-noising on the vulnerability records; the core service layer performs bidirectional atomic association writing on the candidate set output by the processing layer and periodically executes the symmetry audit; the storage layer stores the vulnerability record documents and audit logs.
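
The steps recited in claims 1, 2, and 6 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two-entry pattern library, the field names (`id`, `tags`, `aliases`, `description`, `linked_ids`), and all function names are hypothetical, and a plain in-memory dict stands in for the storage layer, so the field-level atomic update of claim 3 is only imitated by an idempotent set insertion. Creating a missing document for a candidate is likewise an assumption; the claims do not say how that case is handled.

```python
import re
import unicodedata

# Illustrative pattern library (assumption: the patent does not list its patterns).
ID_PATTERNS = {
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "CNVD": re.compile(r"\bCNVD-\d{4}-\d{4,}\b"),
}

def normalize(raw):
    """S2: upper-case and drop whitespace plus invisible (format/control) characters."""
    kept = [ch for ch in raw
            if not ch.isspace() and unicodedata.category(ch) not in ("Cf", "Cc")]
    return "".join(kept).upper()

def id_type(ident):
    """S3: return the type whose pattern matches the whole identifier, else None."""
    for name, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(ident):
            return name
    return None

def discover_candidates(doc):
    """S4-S5: merge the three channels, then drop v itself and invalid identifiers."""
    v = doc["id"]
    c1 = {normalize(t) for t in doc.get("tags") or []}      # S4.1 tag array
    c2 = {normalize(a) for a in doc.get("aliases") or []}   # S4.2 alias array
    text = (doc.get("description") or "").upper()
    c3 = {m.group(0) for p in ID_PATTERNS.values() for m in p.finditer(text)}  # S4.3
    return {c for c in (c1 | c2 | c3) if c != v and id_type(c) is not None}

def link_bidirectional(store, v, c):
    """S6: write v -> c, then c -> v. set.add is idempotent, so repeats are harmless.
    Assumption: a missing document is created on the fly."""
    for src, dst in ((v, c), (c, v)):
        doc = store.setdefault(src, {"id": src})
        doc.setdefault("linked_ids", set()).add(dst)

def symmetry_audit(store):
    """S8: repair every edge a -> b whose reverse edge is missing; return repair count."""
    repaired = 0
    for a_id, doc in list(store.items()):
        for b_id in list(doc.get("linked_ids", ())):
            b_doc = store.setdefault(b_id, {"id": b_id})
            b_links = b_doc.setdefault("linked_ids", set())
            if a_id not in b_links:
                b_links.add(a_id)
                repaired += 1
    return repaired
```

In a real deployment the two writes in `link_bidirectional` would be separate storage operations that can fail independently, which is why the claims pair them with retries and the periodic audit.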
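
The evaluation metrics of claim 8 reduce to a few ratios over the stored documents. The sketch below assumes a hypothetical in-memory `store` that maps each vulnerability ID to a document with an optional `linked_ids` set; the function names and the convention of returning 0.0 for an empty denominator are assumptions, since the claim does not define that case.

```python
def ratio(numerator, denominator):
    """Shared helper; returns 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0

def coverage(store):
    """Share of records that have at least one cross-source association."""
    linked = sum(1 for doc in store.values() if doc.get("linked_ids"))
    return ratio(linked, len(store))

def asymmetry_rate(store):
    """Share of edges (a, b) with b in linked_ids(a) but a not in linked_ids(b)."""
    edges = [(a, b) for a, doc in store.items() for b in doc.get("linked_ids", ())]
    asymmetric = [(a, b) for a, b in edges
                  if a not in store.get(b, {}).get("linked_ids", ())]
    return ratio(len(asymmetric), len(edges))

def healing_rate(repaired_asym_links, detected_asym_links):
    """Share of detected asymmetric edges repaired in the current audit."""
    return ratio(repaired_asym_links, detected_asym_links)

def retry_ok(retries_succeeded, retries_total):
    """Share of conflict-triggered retries that ended in a successful write."""
    return ratio(retries_succeeded, retries_total)
```

The last two functions take counters as arguments because, per claim 1, those counts come from the statistical counter and the audit output rather than from the documents themselves.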

Description

Cross-source vulnerability association method based on identifier pattern recognition and bidirectional atomic linking

Technical Field

The invention relates to the technical field of network security, in particular to a cross-source vulnerability association method based on identifier pattern recognition and bidirectional atomic linking.

Background

Cross-source vulnerability association (Cross-Origin Vulnerability Correlation) is a technique for integrating and analyzing security vulnerability information from different sources and of different dimensions in order to uncover a wider attack surface or advanced threats. By connecting seemingly isolated vulnerability data, it reveals cross-system, cross-application, or cross-organization attack paths that an attacker might exploit.

Currently, "alignment" or "referencing" of cross-source vulnerabilities is typically accomplished in one of the following ways:

1. Centralized mapping table scheme: mappings of the form 'ID_A -> ID_B' are recorded in a separate relation table.
2. One-way reference field scheme: 'references/aliases' fields are set in the vulnerability record to store URLs or external IDs, but reverse write-back is not enforced.
3. Graph construction scheme: a security knowledge graph is built with vulnerabilities, assets, attack events, and the like as nodes, and association relations are inferred; alignment of vulnerability IDs often depends on existing data or manual rules.

The prior art has at least the following defects:

1. Unidirectional associations lead to unreachability: a reference is recorded on one side only, so there is no guarantee that the corresponding record can be retrieved from the other side.
2. Concurrent writes lead to lost updates: under distributed collection and parallel tasks, a traditional read-modify-write is easily overwritten, and association relations are lost.
3. Hidden aliases are difficult to discover automatically: external IDs often appear in description text, tags, or bulletin documents, and coverage is insufficient without systematic extraction and de-noising strategies.
4. Quality metrics such as 'proportion associated', 'distribution by source', and 'symmetry broken' cannot be quantified continuously, which makes closed-loop optimization difficult.

Disclosure of Invention

In view of the defects of the prior art, the invention aims to provide a cross-source vulnerability association method based on identifier pattern recognition and bidirectional atomic linking.

To achieve the above purpose, the invention adopts the following technical scheme:

A cross-source vulnerability association method based on identifier pattern recognition and bidirectional atomic linking comprises the following steps:

S1, vulnerability information collection: after vulnerability information is collected from an external data source, generating a corresponding vulnerability record document for each piece of vulnerability information in a vulnerability document library of a storage layer, wherein the fields of the vulnerability record document comprise a vulnerability main identifier, a source identifier, a tag array, an alias array, a description text, an associated identifier set, and a last-modified time;

S2, normalization: normalizing the original vulnerability main identifier contained in the vulnerability information and performing a format validity check, wherein if the check fails, processing ends and the failure reason is recorded;

S3, identifier type identification: matching the normalized original vulnerability main identifier against an identifier type pattern library to obtain its identifier type, and writing the normalized identifier into the vulnerability main identifier field of the vulnerability record document as the standard ID v in a unified key space; if the vulnerability information from a data source contains both that source's original vulnerability main identifier and a CVE identifier, and the CVE identifier is ultimately selected as the standard ID in the unified key space, the original vulnerability main identifier is written into the tag array of the vulnerability record document;

S4, candidate discovery:

S4.1, reading the tag array field of the vulnerability record document and parsing the external identifier tags in it to obtain a candidate set C1;

S4.2, if the alias array field of the vulnerability record document is not empty, reading the identifiers in the alias array field and adding the normalized identifiers to a candidate set C2;

S4.3, performing cascaded pattern scanning on the description text field of the vulnerability record document, extracting substrings that conform to an identifier format, normalizing the sub-s