JP-7856671-B2 - Anomaly detection system and method
Inventors
- Andrei M. Manolache
- Florin M. Brad
- Alexandru Novac
- Elena Burceanu
Assignees
- Bitdefender IPR Management Ltd.
Dates
- Publication Date
- 20260511
- Application Date
- 20220328
- Priority Date
- 20210409
Claims (20)
- A computer-implemented anomaly detection method comprising employing at least one hardware processor of a computer system to perform the steps of: in response to selecting a training token sequence from a training corpus of token sequences and selecting a transformation from a predetermined set of sequence transformations, generating a modified token sequence by applying the selected transformation to the training token sequence; executing a sequence analyzer having a set of adjustable parameters and configured to determine a transformation prediction indicator according to the modified token sequence, wherein the transformation prediction indicator indicates a likelihood that the selected transformation was applied to generate the modified token sequence; in response to determining the transformation prediction indicator, adjusting at least one parameter of the set of adjustable parameters according to the transformation prediction indicator; and in response to adjusting the at least one parameter, using the sequence analyzer to determine whether a target token sequence is anomalous.
- The method according to claim 1, wherein applying the selected transformation comprises replacing a selected token of the training token sequence with an alternative token.
- The method according to claim 2, further comprising employing the at least one hardware processor to perform the steps of: executing a token generator having another set of adjustable parameters and configured to generate the alternative token according to the training token sequence; and in response to determining the transformation prediction indicator, adjusting another parameter of the other set of adjustable parameters according to the transformation prediction indicator.
- The method according to claim 1, wherein applying the selected transformation comprises an item selected from a group consisting of: deleting a selected token from the training token sequence, inserting an additional token into the training token sequence, and reordering a selected subset of tokens of the training token sequence.
- The method according to claim 1, wherein: the sequence analyzer is further configured to determine a token prediction indicator according to the modified token sequence, the token prediction indicator indicating a likelihood that a selected token of the modified token sequence was altered by the application of the selected transformation; and adjusting the at least one adjustable parameter further comprises adjusting the at least one adjustable parameter according to the token prediction indicator.
- The method according to claim 1, wherein the training token sequence and the target token sequence comprise text formulated in a natural language, the method further comprising, in response to determining whether the target token sequence is anomalous, employing the at least one hardware processor to determine, when the target token sequence is anomalous, that an author of the target token sequence differs from an author of the training token sequence.
- The method according to claim 1, wherein the training token sequence and the target token sequence comprise text formulated in a natural language, the method further comprising, in response to determining whether the target token sequence is anomalous, employing the at least one hardware processor to determine, when the target token sequence is anomalous, that a subject of the target token sequence differs from a subject of the training token sequence.
- The method according to claim 1, wherein the training token sequence and the target token sequence comprise text formulated in a natural language, the method further comprising, in response to determining whether the target token sequence is anomalous, employing the at least one hardware processor to determine, when the target token sequence is anomalous, that the target token sequence was machine-generated.
- The method according to claim 1, wherein the training corpus comprises text fragments selected according to a selection criterion, the method further comprising, in response to determining whether the target token sequence is anomalous, employing the at least one hardware processor to determine, when the target token sequence is anomalous, that the target token sequence does not satisfy the selection criterion.
- The method according to claim 1, wherein the training token sequence and the target token sequence comprise sequences of computing events, the method further comprising, in response to determining whether the target token sequence is anomalous, employing the at least one hardware processor to determine, when the target token sequence is anomalous, that the target token sequence is indicative of a computer security threat.
- A computer system comprising at least one hardware processor configured to: in response to selecting a training token sequence from a training corpus of token sequences and selecting a transformation from a predetermined set of sequence transformations, apply the selected transformation to the training token sequence to generate a modified token sequence; execute a sequence analyzer having a set of adjustable parameters and configured to determine a transformation prediction indicator according to the modified token sequence, wherein the transformation prediction indicator indicates a likelihood that the selected transformation was applied to generate the modified token sequence; in response to determining the transformation prediction indicator, adjust at least one parameter of the set of adjustable parameters according to the transformation prediction indicator; and in response to adjusting the at least one parameter, execute the sequence analyzer to determine whether a target token sequence is anomalous.
- The computer system according to claim 11, wherein applying the selected transformation comprises replacing a selected token of the training token sequence with an alternative token.
- The computer system according to claim 12, wherein the at least one hardware processor is further configured to: execute a token generator having another set of adjustable parameters and configured to generate the alternative token according to the training token sequence; and in response to determining the transformation prediction indicator, adjust another parameter of the other set of adjustable parameters according to the transformation prediction indicator.
- The computer system according to claim 11, wherein applying the selected transformation comprises an item selected from a group consisting of: deleting a selected token from the training token sequence, inserting an additional token into the training token sequence, and reordering a selected subset of tokens of the training token sequence.
- The computer system according to claim 11, wherein: the sequence analyzer is further configured to determine a token prediction indicator according to the modified token sequence, the token prediction indicator indicating a likelihood that a selected token of the modified token sequence was altered by the application of the selected transformation; and adjusting the at least one adjustable parameter further comprises adjusting the at least one adjustable parameter according to the token prediction indicator.
- The computer system according to claim 11, wherein the training token sequence and the target token sequence comprise text formulated in a natural language, and wherein the at least one hardware processor is further configured, in response to determining whether the target token sequence is anomalous, to determine, when the target token sequence is anomalous, that an author of the target token sequence differs from an author of the training token sequence.
- The computer system according to claim 11, wherein the training token sequence and the target token sequence comprise text formulated in a natural language, and wherein the at least one hardware processor is further configured, in response to determining whether the target token sequence is anomalous, to determine, when the target token sequence is anomalous, that a subject of the target token sequence differs from a subject of the training token sequence.
- The computer system according to claim 11, wherein the training token sequence and the target token sequence comprise text formulated in a natural language, and wherein the at least one hardware processor is further configured, in response to determining whether the target token sequence is anomalous, to determine, when the target token sequence is anomalous, that the target token sequence was machine-generated.
- The computer system according to claim 11, wherein the training corpus comprises text fragments selected according to a selection criterion, and wherein the at least one hardware processor is further configured, in response to determining whether the target token sequence is anomalous, to determine, when the target token sequence is anomalous, that the target token sequence does not satisfy the selection criterion.
- The computer system according to claim 11, wherein the training token sequence and the target token sequence comprise sequences of computing events, and wherein the at least one hardware processor is further configured, in response to determining whether the target token sequence is anomalous, to determine, when the target token sequence is anomalous, that the target token sequence is indicative of a computer security threat.
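For illustration only, the four kinds of sequence transformation recited in the claims above (replacing, deleting, inserting, and reordering tokens) might be sketched in Python as follows. All function names, the `vocab` argument, and the uniform random choice of positions are assumptions made for this sketch; the claims do not prescribe how tokens or positions are selected, and the claimed token generator is replaced here by a plain random draw from a vocabulary.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: (tokens, rng, vocab) -> new token list.
Transform = Callable[[Sequence[str], random.Random, Sequence[str]], List[str]]


def replace_token(tokens, rng, vocab):
    """Replace one randomly chosen token with a token drawn from vocab."""
    out = list(tokens)  # assumes a non-empty sequence
    out[rng.randrange(len(out))] = rng.choice(vocab)
    return out


def delete_token(tokens, rng, vocab):
    """Delete one randomly chosen token (vocab is unused)."""
    out = list(tokens)  # assumes a non-empty sequence
    del out[rng.randrange(len(out))]
    return out


def insert_token(tokens, rng, vocab):
    """Insert one token drawn from vocab at a random position."""
    out = list(tokens)
    out.insert(rng.randrange(len(out) + 1), rng.choice(vocab))
    return out


def reorder_span(tokens, rng, vocab):
    """Shuffle a randomly chosen contiguous span of at least two tokens."""
    out = list(tokens)
    if len(out) < 2:
        return out
    start = rng.randrange(len(out) - 1)
    end = rng.randrange(start + 2, len(out) + 1)
    span = out[start:end]
    rng.shuffle(span)
    out[start:end] = span
    return out


# The "predetermined set of sequence transformations" (names are illustrative).
TRANSFORMS: Dict[str, Transform] = {
    "replace": replace_token,
    "delete": delete_token,
    "insert": insert_token,
    "reorder": reorder_span,
}


def apply_random_transformation(tokens, rng, vocab) -> Tuple[str, List[str]]:
    """Select a transformation and return its name with the modified sequence."""
    name = rng.choice(sorted(TRANSFORMS))
    return name, TRANSFORMS[name](tokens, rng, vocab)
```

The returned name serves as the training label that a sequence analyzer would be asked to predict from the modified sequence. Note that in this sketch a replacement or a shuffle may occasionally leave the sequence unchanged.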
Description
[0001] The present invention relates to artificial intelligence, and more particularly to systems and methods for automatically detecting anomalies in data, for applications in natural language processing and computer security.
[0002] Artificial intelligence (AI) and machine learning technologies are increasingly being used to process large amounts of data, particularly in application areas such as pattern recognition, automatic classification, and anomaly detection. Anomaly detection involves identifying specimens that deviate significantly from a standard or "norm" collectively defined by a reference group. Anomaly detection can present considerable technical challenges in the case of complex data, where the meaning and boundaries of normality may not be clear or defined beforehand. Modern artificial intelligence systems (e.g., deep neural networks) have been shown to cope well with such challenges because they can automatically infer sophisticated models from data.
[0003] However, implementing machine learning to train anomaly detectors presents its own set of technical challenges. With some conventional approaches, training can be extremely computationally expensive, may require very large training corpora, and may be unstable and/or inefficient. Therefore, there is considerable interest in developing novel detector architectures and novel methods of training anomaly detectors for natural language processing and computer security applications.
[0004] According to one embodiment, a computer-implemented anomaly detection method includes employing at least one hardware processor of a computer system, in response to selecting a training token sequence from a training corpus of token sequences and selecting a transformation from a predetermined set of sequence transformations, to apply the selected transformation to the training token sequence to generate a modified token sequence.
The method further includes executing a sequence analyzer having a set of adjustable parameters and configured to determine a transformation prediction indicator according to the modified token sequence, wherein the transformation prediction indicator indicates a likelihood that the selected transformation was applied to generate the modified token sequence. The method further includes, in response to determining the transformation prediction indicator, adjusting at least one parameter of the set of adjustable parameters according to the transformation prediction indicator, and in response to adjusting the at least one parameter, using the sequence analyzer to determine whether a target token sequence is anomalous.
[0005] In another embodiment, a computer system comprises at least one hardware processor configured, in response to selecting a training token sequence from a training corpus of token sequences and selecting a transformation from a predetermined set of sequence transformations, to apply the selected transformation to the training token sequence to generate a modified token sequence. The at least one hardware processor is further configured to execute a sequence analyzer having a set of adjustable parameters and configured to determine a transformation prediction indicator according to the modified token sequence, wherein the transformation prediction indicator indicates a likelihood that the selected transformation was applied to generate the modified token sequence. The at least one hardware processor is further configured, in response to determining the transformation prediction indicator, to adjust at least one parameter of the set of adjustable parameters according to the transformation prediction indicator, and in response to adjusting the at least one parameter, to use the sequence analyzer to determine whether a target token sequence is anomalous.
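The training and detection steps summarized above can be pictured with the following minimal Python sketch. It is not the claimed implementation: the sequence analyzer is stood in for by a small softmax classifier over hashed bigram counts, the transformation set is reduced to three illustrative labels (including an "identity" label that is an assumption of this sketch), and the anomaly score, one minus the probability assigned to "identity" for the unmodified target sequence, is likewise an assumption. All names (`DIM`, `LABELS`, `train`, `anomaly_score`) are hypothetical.

```python
import math
import random
import zlib
from typing import List, Sequence

DIM = 64                                  # size of the hashed feature vector
LABELS = ["identity", "delete", "swap"]   # illustrative transformation set


def transform(tokens: Sequence[str], label: str, rng: random.Random) -> List[str]:
    """Apply the named transformation (sequences need at least two tokens)."""
    out = list(tokens)
    if label == "delete":
        del out[rng.randrange(len(out))]
    elif label == "swap":                 # reorder two adjacent tokens
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def features(tokens: Sequence[str]) -> List[float]:
    """Count token bigrams into DIM buckets using a deterministic hash."""
    vec = [0.0] * DIM
    padded = ["<s>", *tokens, "</s>"]
    for a, b in zip(padded, padded[1:]):
        vec[zlib.crc32(f"{a} {b}".encode()) % DIM] += 1.0
    return vec


def predict(weights: List[List[float]], tokens: Sequence[str]) -> List[float]:
    """Return one probability per label: the transformation prediction."""
    x = features(tokens)
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(corpus, steps: int = 2000, lr: float = 0.1, seed: int = 0):
    """Adjust the analyzer's parameters by SGD on cross-entropy loss."""
    rng = random.Random(seed)
    weights = [[0.0] * DIM for _ in LABELS]
    for _ in range(steps):
        tokens = rng.choice(corpus)            # select a training sequence
        k = rng.randrange(len(LABELS))         # select a transformation
        modified = transform(tokens, LABELS[k], rng)
        x = features(modified)
        probs = predict(weights, modified)
        for c, row in enumerate(weights):      # gradient of softmax loss
            grad = probs[c] - (1.0 if c == k else 0.0)
            for d in range(DIM):
                row[d] -= lr * grad * x[d]
    return weights


def anomaly_score(weights, tokens: Sequence[str]) -> float:
    """Assumed score: 1 - P(identity) for the unmodified target sequence."""
    return 1.0 - predict(weights, tokens)[LABELS.index("identity")]
```

In use, one might call `train` on a corpus of normal token sequences (text tokens or computing events) and flag a target sequence when `anomaly_score` exceeds a threshold chosen on held-out data. The threshold and its selection are outside what the text above specifies.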
[0006] In another embodiment, a non-transitory computer-readable medium stores instructions which, when executed by at least one hardware processor of a computer system, cause the computer system, in response to selecting a training token sequence from a training corpus of token sequences and selecting a transformation from a predetermined set of sequence transformations, to apply the selected transformation to the training token sequence to generate a modified token sequence. The instructions further cause the computer system to execute a sequence analyzer having a set of adjustable parameters and configured to determine a transformation prediction indicator according to the modified token sequence, the transformation prediction indicator indicating a likelihood that the selected transformation was applied to generate the modified token sequence. The instructions further cause the computer system, in response to determining the transformation prediction indicator, to adjust at least one parameter of the set of adjustable parameters according to the transformation prediction indicator, and to use the sequence analyzer to determine whether a target token sequence is anomalous in response to adjusting at l