US-12621225-B2 - Estimation device, estimation method, and estimation program

Abstract

An estimation apparatus includes processing circuitry configured to extract a predetermined number of similar normal pieces of packet data with a relatively high similarity to abnormal packet data from among a plurality of normal pieces of packet data based on a natural language processing model, and extract same-length packet data with the same packet length as the abnormal packet data from the similar normal packet data extracted, and compare the abnormal packet data with the same-length packet data for each byte to estimate an abnormal byte location.

Inventors

  • Yuki Yamanaka
  • Tomohiro Nagai

Assignees

  • NTT, INC.

Dates

Publication Date
20260505
Application Date
20211217

Claims (6)

  1. An estimation method, comprising: receiving an abnormal packet data identified as abnormal; extracting N normal pieces of packet data from a plurality of normal pieces of packet data by identifying the N normal pieces of packet data having the highest similarity scores related to the abnormal packet data, wherein the scores are calculated based on a natural language processing model and N is an integer; determining whether a number of the extracted N normal pieces of packet data have a same packet length as the abnormal packet data and whether the number exceeds a threshold value; determining that the number exceeds the threshold value; in response to a determination that the number exceeds the threshold value, extracting same-length packet data having the same packet length as the abnormal packet data from the N normal pieces of packet data and comparing the abnormal packet data with the same-length packet data for each byte to estimate an abnormal byte location; and determining that the number does not exceed the threshold value; in response to a determination that the number does not exceed the threshold value, calculating an edit distance between the abnormal packet data and the N similar normal pieces of packet data to estimate an inserted or deleted byte location.
  2. The estimation apparatus according to claim 1, wherein the processing circuitry is further configured to perform one-dimensional abnormality detection for treating each byte of the abnormal packet data and the same-length packet data as a value and comparing values.
  3. The estimation apparatus according to claim 1, wherein N similar normal pieces of vector data with highest similarity scores relative to abnormal vector data obtained by converting the abnormal packet data using the natural language processing model are specified from among a plurality of normal pieces of vector data obtained by converting the plurality of normal pieces of packet data using the natural language processing model for converting the packet data into vector data in which each vector representing characteristics of a value of each byte of the packet data is associated with a byte, and the normal packet data before conversion of the N similar normal pieces of vector data is extracted as the N normal pieces of packet data.
  4. The estimation apparatus according to claim 1, wherein the processing circuitry is further configured to use Bidirectional Encoder Representations from Transformers (BERT) as the natural language processing model.
  5. An estimation method, comprising: receiving an abnormal packet data identified as abnormal; extracting N normal pieces of packet data from a plurality of normal pieces of packet data by identifying the N normal pieces of packet data having highest similarity scores relative to the abnormal packet data, wherein the scores are calculated based on a natural language processing model and N is an integer; determining whether a number of the extracted N normal pieces of packet data have a same packet length as the abnormal packet data and whether the number exceeds a threshold value; in response to a determination that the number exceeds the threshold value, extracting same-length packet data having the same packet length as the abnormal packet data from the N normal pieces of packet data and comparing the abnormal packet data with the same-length packet data for each byte to estimate an abnormal byte location; and in response to a determination that the number does not exceed the threshold value, calculating an edit distance between the abnormal packet data and the N similar normal pieces of packet data to estimate an inserted or deleted byte location.
  6. A non-transitory computer-readable recording medium storing therein an estimation program that causes a computer to execute a process comprising: receiving an abnormal packet data identified as abnormal; extracting N normal pieces of packet data from a plurality of normal pieces of packet data by identifying the N normal pieces of packet data having highest similarity scores relative to the abnormal packet data, wherein the scores are calculated based on a natural language processing model and N is an integer; determining whether a number of the extracted N normal pieces of packet data have a same packet length as the abnormal packet data and whether the number exceeds a threshold value; in response to a determination that the number exceeds the threshold value, extracting same-length packet data having the same packet length as the abnormal packet data from the N normal pieces of packet data and comparing the abnormal packet data with the same-length packet data for each byte to estimate an abnormal byte location; and in response to a determination that the number does not exceed the threshold value, calculating an edit distance between the abnormal packet data and the N similar normal pieces of packet data to estimate an inserted or deleted byte location.
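
The branch recited in the method claims (per-byte comparison when enough same-length packets exist, edit distance otherwise) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `edit_ops` and `estimate` are hypothetical, the per-byte rule (flag a byte whose value never appears at that offset in the same-length normal packets) is an assumption, and so is the choice to align against the single closest normal packet in the edit-distance branch, since the claims do not fix either detail.

```python
def edit_ops(normal: bytes, abnormal: bytes) -> list[tuple[str, int]]:
    """Levenshtein alignment of `normal` to `abnormal`.

    Returns the non-matching operations as (kind, offset in `abnormal`).
    For "deleted", the offset is where the missing byte would have been.
    """
    n, m = len(normal), len(abnormal)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if normal[i - 1] == abnormal[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitute
                           dp[i][j - 1] + 1,         # byte inserted in abnormal
                           dp[i - 1][j] + 1)         # byte deleted from normal
    # Backtrace from the bottom-right corner to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and normal[i - 1] == abnormal[j - 1]
                and dp[i][j] == dp[i - 1][j - 1]):
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("substituted", j - 1))
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ops.append(("inserted", j - 1))
            j -= 1
        else:
            ops.append(("deleted", j))
            i -= 1
    ops.reverse()
    return ops


def estimate(abnormal: bytes, similar_normals: list[bytes], threshold: int):
    """Pick the estimation branch based on the count of same-length packets."""
    same_length = [p for p in similar_normals if len(p) == len(abnormal)]
    if len(same_length) > threshold:
        # Assumed rule: a byte is suspicious if no same-length normal packet
        # carries the same value at that offset.
        positions = [i for i, value in enumerate(abnormal)
                     if all(p[i] != value for p in same_length)]
        return ("bytewise", positions)
    # Assumed rule: align against the normal packet with the smallest distance.
    best = min(similar_normals, key=lambda p: len(edit_ops(p, abnormal)))
    return ("edit_distance", edit_ops(best, abnormal))
```

With `threshold=1`, two same-length normal packets select the per-byte branch, while a set of similar packets that contains no same-length packet falls through to the edit-distance branch and reports the offset of the inserted or deleted byte.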

Description

CROSS-REFERENCE TO RELATED APPLICATION

The present application is based on PCT filing PCT/JP2021/046840, filed on Dec. 17, 2021, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present invention relates to an estimation apparatus, an estimation method, and an estimation program.

BACKGROUND ART

An abnormality detection system or intrusion detection system (OT-IDS: Operational Technology Intrusion Detection System) has attracted attention in a communication network of an operational technology (OT) in an industrial system, a building system, and the like. In packets transmitted or received through such a communication network, unexpected operations, such as a temperature setting value being changed by one digit due to unauthorized rewriting, may cause a serious accident. Therefore, it is desirable to be able to detect unauthorized rewriting of one byte of a payload corresponding to content of the communication without fail. Therefore, precise analysis of payload content is essential in an abnormality detection system for a network control system in an industrial system and a building system.

As a technology for detailed analysis of payload content, for example, a technology for applying a natural language processing technology such as Bidirectional Encoder Representations from Transformers (BERT) to packet analysis, extracting information from a payload of any protocol, and performing abnormality detection is provided. Further, a technology for estimating an abnormal byte location as more information when detecting an abnormality has been proposed. This is a technology for searching for a normal packet that is most similar to the detected abnormal packet using, for example, BERTScore and comparing the normal packet with the abnormal packet in a high-dimensional space encoded by BERT.
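
The similarity search described above can be sketched as follows. This is a minimal illustration only: the per-byte vectors would come from a BERT encoder in practice, whereas the hypothetical `toy_embed` function below is a placeholder that maps each byte to a two-dimensional vector so that the example runs without a model. The scoring follows the basic BERTScore F1 definition (greedy cosine matching, no IDF weighting), and the names `bertscore_f1` and `top_n_similar` are assumptions introduced for this sketch.

```python
import math


def toy_embed(packet: bytes) -> list[list[float]]:
    """Placeholder for a BERT encoder: one small vector per byte (assumption)."""
    return [[(b >> 4) + 1.0, (b & 0x0F) + 1.0] for b in packet]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bertscore_f1(cand: list[list[float]], ref: list[list[float]]) -> float:
    """Greedy-matching F1: each vector is paired with its best match on the other side."""
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def top_n_similar(abnormal: bytes, normals: list[bytes], n: int) -> list[int]:
    """Return indices of the n normal packets with the highest similarity score."""
    abnormal_vecs = toy_embed(abnormal)
    scores = [bertscore_f1(abnormal_vecs, toy_embed(p)) for p in normals]
    ranked = sorted(range(len(normals)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]
```

Calling `top_n_similar(abnormal, normals, 1)` corresponds to searching for the single most similar normal packet; a larger `n` yields a candidate set for further comparison.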
CITATION LIST

Non Patent Literature

[NPL 1] Yuki Yamanaka, Masanori Yamada, Tomokatsu Takahashi, Tomohiro Nagai, “Feature Extraction of Packet Payload Using BERT,” 2021 JSAI Annual Conference (35th).

SUMMARY OF INVENTION

Technical Problem

However, the related art for estimating an abnormal byte location works well only under limited circumstances, and it may be difficult to accurately estimate an abnormal byte location for some actual abnormal communications.

Solution to Problem

In order to solve the above-described problem and achieve the object, an estimation apparatus includes: processing circuitry configured to: extract a predetermined number of similar normal pieces of packet data with a relatively high similarity to abnormal packet data from among a plurality of normal pieces of packet data based on a natural language processing model; and extract same-length packet data with the same packet length as the abnormal packet data from the similar normal packet data extracted, and compare the abnormal packet data with the same-length packet data for each byte to estimate an abnormal byte location.

Advantageous Effects of Invention

According to the present invention, it is possible to accurately estimate the abnormal byte location in a communication protocol packet.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of an information processing apparatus according to an embodiment.
FIG. 2 is a block diagram illustrating details of a question generation unit.
FIG. 3 is a block diagram of a machine learning apparatus that learns a question generation model.
FIG. 4 is a diagram illustrating an example of question-answer learning data.
FIG. 5 is an image diagram of learning data for learning the question generation model.
FIG. 6 is a diagram illustrating an example of question sentence creation in the information processing apparatus according to the embodiment.
FIG. 7 is a flowchart of question generation processing in the information processing apparatus according to the embodiment.
FIG. 8 is a flowchart of machine learning processing in the machine learning apparatus according to the embodiment.
FIG. 9 is a diagram illustrating results of an experiment using the information processing apparatus according to the embodiment.
FIG. 10 is a diagram illustrating an example of a computer that executes an information processing program.

DESCRIPTION OF EMBODIMENTS

Hereinafter, embodiments of the estimation apparatus, estimation method, and estimation program disclosed in the present application will be described in detail on the basis of the drawings. The estimation apparatus, estimation method, and estimation program disclosed in the present application are not limited to the following embodiments.

[Estimation Apparatus]

An estimation apparatus 1 according to an embodiment of the present invention will be described with reference to FIG. 1. When an abnormal packet is input, the estimation apparatus 1 estimates and outputs an abnormal byte in the abnormal packet. The estimation apparatus 1 compares the abnormal packet determined to be abnormal by another system with a normal packet determined t