CN-122027354-A - Industrial Internet component fingerprint library construction method
Abstract
The invention belongs to the field of industrial Internet security and asset identification and relates to a method for constructing an industrial Internet component fingerprint library. It addresses the sparse fingerprint features and low identification rate of approaches that rely on a single data source. The method acquires industrial exchange traffic, control logs and component firmware metadata, and associates and aligns them into a multi-source data set. It extracts protocol behavior fingerprints, operation semantic fingerprints and static gene fingerprints and corrects them against one another. It then maps the corrected fingerprints into a feature matrix and weights each fingerprint type by its stability. When different data sources conflict, it resolves the conflict using protocol, operation semantic and firmware structure constraint relations to generate fusion fingerprints, and it clusters the fusion fingerprints to extract the common features of similar components as fingerprint library entries. Finally, it performs a difference comparison when a component's state or configuration changes, so that the fingerprint library is updated dynamically or a new branch is established. The invention improves the accuracy, robustness and dynamic adaptability of industrial Internet component fingerprints.
Inventors
- ZHOU XIAOJUN
- Men Jiaping
Assignees
- Beijing Zhongguancun Laboratory (北京中关村实验室)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-04-10
Claims (10)
- 1. A method for constructing an industrial Internet component fingerprint library, characterized by comprising the following steps: acquiring industrial exchange traffic, control logs and component firmware metadata, and associating and aligning the industrial exchange traffic with the control logs according to industrial protocol transaction information to form a multi-source data set corresponding to the same component instance; extracting protocol behavior fingerprints, operation semantic fingerprints and static gene fingerprints from the industrial exchange traffic, the control logs and the component firmware metadata in the multi-source data set, respectively, and correcting the three fingerprints according to the mutual verification relationships among them; mapping the corrected protocol behavior, operation semantic and static gene fingerprints into a fingerprint feature matrix, determining the weight of each fingerprint type according to its degree of stability in the industrial field, and, when different data sources point to different component categories, performing conflict resolution based on protocol constraint relations, operation semantic constraint relations and firmware structure constraint relations to generate the fusion fingerprint of the corresponding component instance; clustering and fusing the fusion fingerprints, extracting the common fingerprint features that remain consistent among similar component instances, and forming fingerprint library entries; and, when a component state change or configuration change is detected, re-acquiring the multi-source data set for the changed component, comparing it with the fingerprint library entry, and either updating the fingerprint library entry or establishing a new fingerprint branch according to the difference result.
- 2. The method for constructing an industrial Internet component fingerprint library according to claim 1, wherein associating and aligning the industrial exchange traffic with the control logs according to industrial protocol transaction information to form a multi-source data set corresponding to the same component instance comprises: extracting the industrial protocol transaction identifiers and transaction timing relationships carried in the industrial exchange traffic, matching, according to the transaction identifiers, the transaction records in the control logs that belong to the same industrial control process, and establishing an initial anchoring relationship between the industrial exchange traffic and the control logs at the transaction layer; based on the initial anchoring relationship, checking the communication endpoint attributes in the industrial exchange traffic for consistency with the component instance identity information recorded in the control logs, and performing timestamp alignment and transaction state synchronization on the traffic and logs that pass the check, to form multi-source association records indexed by component instance; and, within the multi-source association records, using the component firmware metadata as the identity verification benchmark to cross-verify the component instances pointed to by the industrial exchange traffic and the control logs, and consolidating the verified multi-source data into the multi-source data set.
- 3. The method for constructing an industrial Internet component fingerprint library according to claim 1, wherein extracting protocol behavior fingerprints, operation semantic fingerprints and static gene fingerprints from the industrial exchange traffic, control logs and component firmware metadata in the multi-source data set, respectively, comprises: parsing the industrial exchange traffic, extracting the transition paths of the protocol state machine in the handshake, keep-alive and termination phases, and extracting the value sequences of key fields and the message interaction timing from those transition paths, to form the protocol behavior fingerprint representing the component's communication behavior pattern; parsing the control logs, identifying the operation types and the semantic labels of the operated objects, and aggregating the semantic labels into a control logic chain in operation time order, to form the operation semantic fingerprint representing the component's control logic characteristics; and parsing the component firmware metadata and extracting the file system hierarchy, the specific version identifier and the offset positions of key function fragments in the binary, to form the static gene fingerprint representing the component's inherent attributes.
- 4. The method for constructing an industrial Internet component fingerprint library according to claim 1, wherein correcting according to the mutual verification relationships among the protocol behavior fingerprint, the operation semantic fingerprint and the static gene fingerprint comprises: taking the firmware version identifier and component model parsed from the static gene fingerprint as references, checking the protocol version field and the state machine transition paths in the protocol behavior fingerprint for consistency, and removing redundant state machine paths that are inconsistent with the firmware attributes; taking the firmware function modules parsed from the static gene fingerprint as references, matching the set of operation types in the operation semantic fingerprint against them, and filtering out operation semantic records that exceed the firmware's functional boundary and therefore do not reflect the component's own behavior; cross-verifying the timing between the communication cycle characteristics parsed from the protocol behavior fingerprint and the control logic trigger timing parsed from the operation semantic fingerprint, and synchronously correcting fingerprint fragments whose timing deviation exceeds the range allowed in the industrial field; and aggregating the valid features of the verified and corrected protocol behavior, operation semantic and static gene fingerprints to form the three corrected fingerprints.
- 5. The method for constructing an industrial Internet component fingerprint library according to claim 1, wherein mapping the corrected protocol behavior fingerprint, operation semantic fingerprint and static gene fingerprint into a fingerprint feature matrix comprises: constructing a feature matrix framework with component instances as the row index and the protocol behavior, operation semantic and static gene dimensions as the column domains; mapping the state machine transition paths, key field value sequences and communication cycle characteristics in the corrected protocol behavior fingerprint to the respective feature columns of the protocol behavior dimension to form a protocol behavior feature vector; mapping the control logic chain, operation type set and operation timing relationships in the corrected operation semantic fingerprint to the respective feature columns of the operation semantic dimension to form an operation semantic feature vector; mapping the firmware version identifier, file system structure features and key function offset positions in the corrected static gene fingerprint to the respective feature columns of the static gene dimension to form a static gene feature vector; and aggregating the three feature vectors under the shared row index to construct a fingerprint feature matrix containing the mapping relationships of all three fingerprint types.
- 6. The method for constructing an industrial Internet component fingerprint library according to claim 1, wherein determining the weights of the protocol behavior fingerprint, operation semantic fingerprint and static gene fingerprint according to their degree of stability in the industrial field comprises: evaluating the stability of each of the three fingerprint types, namely the protocol behavior fingerprint by its resistance to disturbance in the industrial network environment, the operation semantic fingerprint by the change period of the corresponding control logic, and the static gene fingerprint by the upgrade frequency and replacement probability of the corresponding firmware, and assigning a first stability weight or a second stability weight according to the evaluation result, the first stability weight being greater than the second stability weight.
- 7. The method for constructing an industrial Internet component fingerprint library according to claim 6, wherein evaluating the stability of the three fingerprint types and assigning the first or second stability weight specifically comprises: for the protocol behavior fingerprint, assigning the first stability weight when the communication link disturbance is less than or equal to the empirical field-disturbance reference value and the message format solidification degree is greater than or equal to the empirical solidification reference value, and assigning the second stability weight when the communication link disturbance exceeds the empirical field-disturbance reference value or the message format solidification degree is below the empirical solidification reference value; for the operation semantic fingerprint, assigning the first stability weight when the change period is greater than or equal to the empirical change-period reference value, and the second stability weight when it is shorter; and for the static gene fingerprint, assigning the first stability weight when the upgrade frequency is less than or equal to the empirical upgrade-frequency reference value and the replacement probability is less than or equal to the empirical replacement-probability reference value, and assigning the second stability weight when the upgrade frequency exceeds the empirical upgrade-frequency reference value or the replacement probability exceeds the empirical replacement-probability reference value.
- 8. The method for constructing an industrial Internet component fingerprint library according to claim 1, wherein performing conflict resolution based on the protocol constraint relationship, the operation semantic constraint relationship and the firmware structure constraint relationship when different data sources point to different component categories, and generating the fusion fingerprint of the corresponding component instance, comprises: when at least two of the industrial exchange traffic, the control logs and the component firmware metadata point to inconsistent component categories, using the firmware structure constraint relationship as the reference constraint to eliminate component categories that do not match the firmware structure features; cross-verifying against the protocol constraint relationship and the operation semantic constraint relationship, retaining component categories that satisfy both, and marking component categories that satisfy only one of them as categories to be verified; for each category to be verified, tracing the context of the industrial control process through the transaction-layer anchoring relationship between the industrial exchange traffic and the control logs, and performing secondary confirmation using the consistency of the control logic in that context as the verification basis; and merging the confirmed component categories with the retained component categories, determining the result as the final category of the component instance, and binding it with the corrected protocol behavior, operation semantic and static gene fingerprints to generate the fusion fingerprint of the corresponding component instance.
- 9. The method for constructing an industrial Internet component fingerprint library according to claim 1, wherein clustering and fusing the fusion fingerprints, extracting the common fingerprint features that remain consistent among similar component instances, and forming fingerprint library entries comprises: using the model identifier and firmware version identifier of each component instance as classification anchors, assigning the fusion fingerprints of component instances that share the same model identifier and the same firmware version identifier to the same cluster; within each cluster, computing occurrence frequencies for each feature dimension of the protocol behavior, operation semantic and static gene fingerprints, and marking feature dimensions whose occurrence frequency exceeds a frequency threshold as candidate common features; checking the candidate common features against the corresponding stability weights in the fusion fingerprints, removing candidates whose stability weight is below a weight threshold, and taking the remaining features as the common fingerprint features that remain consistent among similar component instances; and packaging the common fingerprint features into a fingerprint library entry indexed by the model identifier and firmware version identifier, and establishing a mapping between the fingerprint library entry and the corresponding component instances.
- 10. The method for constructing an industrial Internet component fingerprint library according to claim 1, wherein updating the fingerprint library entry or establishing a new fingerprint branch according to the difference result comprises: comparing the multi-source data set re-acquired for the changed component with the common fingerprint features in the fingerprint library entry to generate a difference feature vector; when the feature variation amplitude in the difference feature vector is below the empirical difference-amplitude reference value, incorporating the difference feature vector as incremental features into the corresponding fingerprint dimensions of the fingerprint library entry, thereby completing the update of the entry; and when the feature variation amplitude is greater than or equal to the empirical difference-amplitude reference value, constructing a new fusion fingerprint from the difference feature vector and establishing a new fingerprint branch indexed by the current state identifier of the component instance, while retaining the original fingerprint library entry and its historical fingerprint branches.
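Claim 2 does not fix a data model for the alignment step. The following is a minimal sketch, assuming traffic and log records that already carry a parsed transaction identifier (as the MBAP header of Modbus/TCP does). It walks through the three sub-steps: anchoring on the transaction identifier, the endpoint consistency and timestamp checks, and cross-verification against firmware metadata. `TrafficRecord`, `LogRecord`, `align` and the `max_skew` tolerance are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrafficRecord:
    txn_id: int       # transaction identifier carried in the protocol header
    timestamp: float  # capture time in seconds
    endpoint: str     # communication endpoint, e.g. "10.0.0.5:502"


@dataclass
class LogRecord:
    txn_id: int       # transaction identifier recorded by the control log
    timestamp: float  # log time in seconds
    instance_id: str  # component instance identity recorded in the log
    endpoint: str     # endpoint the log attributes to that instance


def align(traffic, logs, firmware_meta, max_skew=0.5):
    """Build a multi-source data set indexed by component instance.

    firmware_meta: {instance_id: model}, used as the identity benchmark (hypothetical schema).
    max_skew: largest tolerated traffic/log clock difference in seconds (assumed value).
    """
    logs_by_txn = {}
    for rec in logs:
        logs_by_txn.setdefault(rec.txn_id, []).append(rec)

    dataset = {}
    for pkt in traffic:
        # Initial anchoring at the transaction layer: same transaction identifier.
        for rec in logs_by_txn.get(pkt.txn_id, []):
            # Consistency check: the packet endpoint must match the logged identity.
            if rec.endpoint != pkt.endpoint:
                continue
            # Timestamp alignment: reject pairs whose clocks disagree too much.
            skew = rec.timestamp - pkt.timestamp
            if abs(skew) > max_skew:
                continue
            # Cross-verification against firmware metadata: unknown instances are dropped.
            if rec.instance_id not in firmware_meta:
                continue
            dataset.setdefault(rec.instance_id, []).append(
                {"txn_id": pkt.txn_id, "time": pkt.timestamp, "skew": skew,
                 "model": firmware_meta[rec.instance_id]}
            )
    return dataset
```

Indexing the logs by transaction identifier first keeps the anchoring step linear in the number of records rather than comparing every packet with every log line.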
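The three extractors in claim 3 can be sketched as follows, assuming the traffic has already been parsed into per-message dicts and that the key function fragments are known byte signatures. Neither assumption comes from the patent. The phase names, the `function_code` key field, and the functions `protocol_fingerprint`, `operation_fingerprint` and `static_gene_fingerprint` are all hypothetical.

```python
def protocol_fingerprint(messages, key_field="function_code"):
    """Protocol behavior fingerprint from parsed messages.

    Each message is a dict with 'phase', 'state', 'time' and optional protocol fields
    (hypothetical schema; a real parser would populate it from captured traffic).
    """
    paths = {}
    for phase in ("handshake", "keepalive", "termination"):
        states = [m["state"] for m in messages if m["phase"] == phase]
        # Collapse consecutive repeats so the path records state transitions only.
        paths[phase] = tuple(s for i, s in enumerate(states) if i == 0 or s != states[i - 1])
    key_values = [m[key_field] for m in messages if key_field in m]
    times = [m["time"] for m in messages]
    intervals = [round(b - a, 3) for a, b in zip(times, times[1:])]
    return {"paths": paths, "key_values": key_values, "intervals": intervals}


def operation_fingerprint(entries):
    """Control logic chain: 'operation:object' labels in time order."""
    return tuple(f"{e['op']}:{e['object']}" for e in sorted(entries, key=lambda e: e["time"]))


def static_gene_fingerprint(firmware, file_paths, version, signatures):
    """Static gene fingerprint: file system layout, version and key fragment offsets.

    signatures: {name: bytes} for key function fragments (assumed known in advance).
    An offset of -1 means the fragment was not found in the binary.
    """
    return {
        "fs_layout": tuple(sorted(file_paths)),
        "version": version,
        "offsets": {name: firmware.find(sig) for name, sig in signatures.items()},
    }
```

Signature search with `bytes.find` is the simplest stand-in for locating key function fragments. A production extractor would more likely disassemble the image or unpack the file system first.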
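The mutual correction in claim 4 can be illustrated with the sketch below. It assumes a hypothetical lookup table `valid_paths` mapping each firmware version to its legitimate state paths, and it measures timing by the median gap between events. The patent leaves open what "synchronous correction" does to a deviating fragment. Here it is assumed to mean resynchronizing the control-logic trigger period to the protocol cycle. For brevity, the protocol version field check is omitted; only state paths are checked.

```python
from statistics import median


def correct_fingerprints(protocol_fp, op_records, static_fp, valid_paths, max_dev):
    """Cross-correct the three fingerprints (illustrative sketch).

    valid_paths: {firmware_version: set of state-path tuples} (hypothetical lookup table).
    max_dev: largest timing deviation in seconds the field tolerates (assumed value).
    """
    allowed = valid_paths.get(static_fp["firmware_version"], set())
    # 1. Keep only state-machine paths consistent with the firmware version.
    paths = {phase: p for phase, p in protocol_fp["paths"].items() if p in allowed}

    # 2. Drop operations outside the firmware's function boundary.
    ops = [r for r in op_records if r["op"] in static_fp["functions"]]

    # 3. Timing cross-check: protocol communication cycle vs control-logic trigger period.
    intervals = protocol_fp["intervals"]
    cycle = median(intervals) if intervals else None
    times = sorted(r["time"] for r in ops)
    gaps = [b - a for a, b in zip(times, times[1:])]
    trigger = median(gaps) if gaps else None
    resynced = False
    if cycle is not None and trigger is not None and abs(cycle - trigger) > max_dev:
        trigger = cycle  # assumed correction: resynchronize to the protocol cycle
        resynced = True

    return {"paths": paths, "ops": ops, "cycle": cycle,
            "trigger_period": trigger, "resynced": resynced, "static": static_fp}
```

The static gene fingerprint is used as the anchor in steps 1 and 2 because, per claim 4, the firmware attributes are the reference against which the behavioral fingerprints are judged.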
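A minimal sketch of the feature matrix in claim 5 follows: one row per component instance, with column domains grouped by dimension. The column names mirror the features listed in the claim. The numeric encoding is an assumption not given in the patent: non-numeric values are encoded by the CRC-32 of their `repr`, and missing values become NaN. `COLUMNS`, `encode` and `build_matrix` are hypothetical names.

```python
import zlib

# Column domains per dimension, following the features named in claim 5.
COLUMNS = {
    "protocol": ["state_paths", "key_values", "comm_cycle"],
    "semantic": ["logic_chain", "op_types", "op_timing"],
    "static": ["fw_version", "fs_layout", "func_offsets"],
}


def encode(value):
    """Map a feature value to a float: numbers pass through, missing values become NaN,
    anything else is encoded by the CRC-32 of its repr (deterministic across runs)."""
    if value is None:
        return float("nan")
    if isinstance(value, (int, float)):
        return float(value)
    return float(zlib.crc32(repr(value).encode("utf-8")))


def build_matrix(fingerprints):
    """fingerprints: {instance_id: {dimension: {column: value}}}.

    Returns (row_ids, col_ids, matrix) with one row per component instance.
    """
    col_ids = [(dim, col) for dim, cols in COLUMNS.items() for col in cols]
    row_ids = sorted(fingerprints)
    matrix = [
        [encode(fingerprints[r].get(dim, {}).get(col)) for dim, col in col_ids]
        for r in row_ids
    ]
    return row_ids, col_ids, matrix
```

A CRC code only supports equality comparison between categorical values (with a small chance of collisions). It carries no notion of distance, so any downstream similarity measure should treat those columns as categorical.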
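Claims 6 and 7 reduce to threshold rules against empirical reference values, which the patent does not quantify. The sketch below encodes those rules directly. The concrete weights `w1=1.0` and `w2=0.5`, the `ReferenceValues` container and the observation keys are hypothetical. Only the ordering `w1 > w2` comes from claim 6.

```python
from dataclasses import dataclass


@dataclass
class ReferenceValues:
    """Empirical reference values; claim 7 leaves their magnitudes to field experience."""
    disturbance: float      # on-site disturbance reference
    solidification: float   # message-format solidification reference
    change_period: float    # control-logic change-period reference
    upgrade_freq: float     # firmware upgrade-frequency reference
    replace_prob: float     # firmware replacement-probability reference


def stability_weights(obs, ref, w1=1.0, w2=0.5):
    """Assign the first (w1) or second (w2) stability weight to each fingerprint type."""
    if not w1 > w2:
        raise ValueError("the first stability weight must exceed the second")
    # Protocol: low disturbance AND a well-fixed message format.
    protocol_stable = obs["disturbance"] <= ref.disturbance and obs["solidification"] >= ref.solidification
    # Operation semantics: control logic changes rarely.
    semantic_stable = obs["change_period"] >= ref.change_period
    # Static gene: firmware is rarely upgraded AND rarely replaced.
    static_stable = obs["upgrade_freq"] <= ref.upgrade_freq and obs["replace_prob"] <= ref.replace_prob
    return {
        "protocol": w1 if protocol_stable else w2,
        "semantic": w1 if semantic_stable else w2,
        "static": w1 if static_stable else w2,
    }
```

The "and" conditions for the first weight, combined with the "or" conditions for the second, make the two cases exhaustive and mutually exclusive, exactly as worded in claim 7.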
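The conflict-resolution order in claim 8 can be sketched with set operations. Here firmware structure is applied first as the reference constraint, then protocol and operation semantic constraints, and finally secondary confirmation for categories that satisfy only one of the two. The constraint relations are modeled as precomputed sets of compatible categories, and the context check as a caller-supplied predicate. Both are simplifying assumptions. The claim also does not say what happens if more than one category survives, so this sketch returns `None` in that case. `resolve_category` and `fuse` are hypothetical names.

```python
def resolve_category(candidates, fw_ok, proto_ok, sem_ok, context_confirms):
    """Resolve conflicting component categories (illustrative sketch).

    candidates: {source: category}; fw_ok, proto_ok, sem_ok: sets of categories that satisfy
    the firmware structure, protocol and operation semantic constraints (assumed precomputed).
    context_confirms: callable(category) -> bool, the secondary confirmation on process context.
    Returns the final category, or None when no single category survives.
    """
    categories = set(candidates.values())
    if len(categories) == 1:
        return next(iter(categories))  # sources agree: nothing to resolve
    # Firmware structure is the reference constraint.
    categories &= fw_ok
    retained = {c for c in categories if c in proto_ok and c in sem_ok}
    # Categories satisfying exactly one of the two constraints are "to be verified".
    pending = {c for c in categories if (c in proto_ok) != (c in sem_ok)}
    confirmed = {c for c in pending if context_confirms(c)}
    final = retained | confirmed
    return next(iter(final)) if len(final) == 1 else None


def fuse(category, protocol_fp, semantic_fp, static_fp):
    """Bind the final category to the three corrected fingerprints."""
    return {"category": category, "protocol": protocol_fp,
            "semantic": semantic_fp, "static": static_fp}
```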
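Claim 9 can be illustrated as grouping by the (model, firmware version) anchor, a frequency filter, and a stability-weight filter. The sketch below assumes each fusion fingerprint exposes its features as a set of `(dimension, feature)` pairs plus a per-dimension weight. It also takes the minimum weight across cluster members as the weight of a candidate feature. That aggregation rule is an assumption, since the claim does not specify one. `build_entries` is a hypothetical name.

```python
from collections import Counter, defaultdict


def build_entries(fused, freq_threshold, weight_threshold):
    """Cluster fusion fingerprints by (model, firmware version) and keep stable common features.

    Each fused fingerprint is a dict with 'model', 'fw_version', 'instance',
    'features' (a set of (dimension, feature) pairs) and 'weights' ({dimension: weight}).
    """
    clusters = defaultdict(list)
    for fp in fused:
        clusters[(fp["model"], fp["fw_version"])].append(fp)

    entries = {}
    for key, members in clusters.items():
        n = len(members)
        counts = Counter(f for m in members for f in m["features"])
        # Candidate common features: occurrence frequency above the threshold.
        candidates = {f for f, c in counts.items() if c / n > freq_threshold}
        # Drop candidates whose dimension weight (minimum over members, an assumption)
        # is below the weight threshold.
        common = {
            f for f in candidates
            if min(m["weights"].get(f[0], 0.0) for m in members) >= weight_threshold
        }
        entries[key] = {"common": common, "instances": [m["instance"] for m in members]}
    return entries
```

Because the cluster key is exact (model and firmware version), this "clustering" is really a group-by. A distance-based clustering would only be needed if those identifiers were missing or unreliable.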
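The update-or-branch decision in claim 10 is sketched below. The patent does not define "feature variation amplitude". This sketch assumes it is the fraction of features whose value changed, and treats the difference feature vector as the dict of changed features. `apply_change` and the library layout (`common` and `branches` keys) are hypothetical. The original entry is left untouched whenever a branch is created, as the claim requires.

```python
def apply_change(library, key, state_id, observed, ref_amplitude):
    """Update a fingerprint library entry or open a new branch (illustrative sketch).

    library: {key: {"common": {feature: value}, "branches": {state_id: {...}}}}.
    observed: {feature: value} re-acquired for the changed component.
    ref_amplitude: empirical difference-amplitude reference value (assumed, in [0, 1]).
    """
    entry = library[key]
    common = entry["common"]
    # Difference feature vector: features whose value differs from the entry.
    diff = {k: v for k, v in observed.items() if common.get(k) != v}
    # Assumed amplitude: share of all known features that changed.
    amplitude = len(diff) / max(len(set(common) | set(observed)), 1)
    if amplitude < ref_amplitude:
        common.update(diff)  # incremental update of the existing entry
        return "updated"
    # Large change: new fingerprint branch keyed by the current state; original entry kept.
    entry.setdefault("branches", {})[state_id] = {**common, **diff}
    return "branched"
```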
Description
Industrial Internet component fingerprint library construction method

Technical Field

The invention belongs to the field of industrial Internet security and asset identification, and particularly relates to a method for constructing an industrial Internet component fingerprint library.

Background

In an industrial Internet environment, components come in many types, covering devices such as PLCs, SCADA systems, HMIs and industrial routers. Industrial protocols are highly proprietary and firmware versions iterate in complex ways, which poses great challenges for component identification and asset management. In the prior art, fingerprint libraries are commonly built in one of two ways: rule matching or learning from a single data source. The former typically relies on manually extracted features and static rule bases, such as port features, banner information or fixed URL paths. Such methods struggle to cover unknown components, recognize variant components poorly, and are costly to maintain. The latter performs feature clustering on network traffic packets or device logs alone. Because data in the industrial field are scattered and fragmented across sources, a single data source tends to yield sparse features and incomplete characterization. In particular, it is difficult to distinguish the subtle differences between firmware versions and configuration states of the same model.
Therefore, the prior art lacks a fingerprint library construction scheme that can integrate multi-source information such as industrial exchange traffic, control logs and component firmware metadata and jointly model component identity, communication behavior, control semantics and firmware structure. As a result, the accuracy, stability and extensibility of existing fingerprint libraries are insufficient, and they struggle to meet the practical requirements of component identification, state tracking and dynamic updating in industrial Internet scenarios.

Disclosure of Invention

To solve the above problems in the prior art, namely the sparse fingerprint features and low recognition rate of traditional single-data-source approaches, the invention provides an industrial Internet component fingerprint library construction method comprising the following steps: acquiring industrial exchange traffic, control logs and component firmware metadata, and associating and aligning the industrial exchange traffic with the control logs according to industrial protocol transaction information to form a multi-source data set corresponding to the same component instance; extracting protocol behavior fingerprints, operation semantic fingerprints and static gene fingerprints from the industrial exchange traffic, the control logs and the component firmware metadata in the multi-source data set, respectively, and correcting the three fingerprints according to the mutual verification relationships among them; mapping the corrected protocol behavior, operation semantic and static gene fingerprints into a fingerprint feature matrix, determining the weight of each fingerprint type according to its degree of stability in the industrial field, and, when different data sources point to different component categories, performing conflict resolution based on protocol constraint relations, operation semantic constraint relations and firmware structure constraint relations to generate the fusion fingerprint of the corresponding component instance; clustering and fusing the fusion fingerprints, extracting the common fingerprint features that remain consistent among similar component instances, and forming fingerprint library entries; and, when a component state change or configuration change is detected, re-acquiring the multi-source data set for the changed component, comparing it with the fingerprint library entry, and either updating the fingerprint library entry or establishing a new fingerprint branch according to the difference result.

Furthermore, associating and aligning the industrial exchange traffic with the control logs according to industrial protocol transaction information to form a multi-source data set corresponding to the same component instance comprises: extracting the industrial protocol transaction identifiers and transaction timing relationships carried in the industrial exchange traffic, matching, according to the transaction identifiers, the transaction records in the control logs that belong to the same industrial control process, and establi