US-12627695-B1 - Malware detection using generative artificial intelligence and threat knowledge database
Abstract
Cyber security events are detected at modules on a vehicle and are passed to a cyber security platform. A generative AI model generates a natural language summary of each event and then groups these events based upon the natural language summaries. The model produces a natural language summary for each group. Groups are linked together based upon their natural language summaries and the model produces a natural language summary for each set of linked groups. The model attempts to match the summary of each set of linked groups to incident cases in a threat knowledge database; a match generates an incident alert. A partial match generates an early warning indicating which group of events is not present in the set of linked groups. If the summary matches no incident case, no alert is output, thus reducing false positives. The AI model uses retrieval augmented generation.
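The two-level grouping described in the abstract (events into groups, then groups into sets of linked groups) can be illustrated with a minimal sketch. It is not the patented method: the generative AI model is replaced here by a simple word-overlap (Jaccard) score, and the function names, thresholds, and sample summaries are all hypothetical.

```python
import re

def tokens(text):
    """Lowercased word set of a summary (stand-in for model-judged similarity)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Word-overlap similarity between two summaries, in the range 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def group_by_similarity(summaries, threshold):
    """Greedy grouping: a summary joins the first group whose first member
    is similar enough; otherwise it starts a new group."""
    groups = []
    for s in summaries:
        for g in groups:
            if jaccard(s, g[0]) >= threshold:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups

def summarize(members):
    """Placeholder for a model-generated summary: joins the distinct members."""
    return "; ".join(dict.fromkeys(members))

def link_groups(groups, threshold):
    """Second level: group the group summaries into sets of linked groups."""
    return group_by_similarity([summarize(g) for g in groups], threshold)

# Hypothetical per-event summaries, as the model might produce them.
event_summaries = [
    "repeated failed login on infotainment ECU",
    "repeated failed login on infotainment ECU again",
    "unexpected CAN frame from gateway ECU",
]
groups = group_by_similarity(event_summaries, threshold=0.6)
linked_sets = link_groups(groups, threshold=0.05)
```

The same grouping routine is reused at both levels because the abstract describes both steps as operating on natural language summaries; a real system would presumably use the model's own similarity judgment rather than word overlap.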
Inventors
- Shih-Han Hsu
- Chih-Wei Su
- Wei-Jen Chang
Assignees
- VicOne Corporation
Dates
- Publication Date
- 20260512
- Application Date
- 20240702
Claims (15)
- 1 . A method of generating a cyber security early warning, said method comprising: inputting, into a generative AI model, a plurality of cyber security events originating at a vehicle and generating a natural language summary of each event; placing said events into groups based upon similarities between said natural language summaries of said events and generating a natural language summary of each group; placing said groups into sets of linked groups based upon similarities between said natural language summaries of said groups and generating a natural language summary of each set of linked groups; partially matching a natural language summary of one of said sets of linked groups to a natural language incident case in a threat database, wherein said incident case includes a group of related events that does not match one of said groups in said one of said sets of linked groups; and outputting an early warning for said vehicle based upon said incident case, thereby improving operation of a cybersecurity system by enabling early identification of malware activity while suppressing generation of false-positive malware detection alerts.
- 2 . The method as recited in claim 1 further comprising: using said generative AI model to perform said placing said events, said placing said groups, and said matching.
- 3 . The method as recited in claim 2 wherein said generative AI model uses retrieval augmented generation (RAG) to perform said placing said events, said placing said groups and said partially matching.
- 4 . The method as recited in claim 1 wherein said cyber security events are in detection logs obtained from cyber security modules of said vehicle.
- 5 . The method as recited in claim 1 wherein said placing said groups is performed based upon predefined similarities or based upon similarities suggested by said generative AI model.
- 6 . The method as recited in claim 1 wherein said output early warning includes a natural language summary of said incident case indicating that said group of related events is not present in said vehicle.
- 7 . The method as recited in claim 1 wherein said placing said events into groups further comprises: aggregating cyber security events from the same source that occur at different times.
- 8 . The method as recited in claim 1 wherein said plurality of cyber security events are occurring at the vehicle, and wherein said placing steps, said partially matching, and said outputting are performed in response to occurrence of said plurality of cyber security events.
- 9 . A method of not generating a cyber security incident alert, said method comprising: inputting, into a generative AI model, a plurality of cyber security events originating at a vehicle and generating a natural language summary of each event; placing said events into groups based upon similarities between said natural language summaries of said events and generating a natural language summary of each group; placing said groups into sets of linked groups based upon similarities between said natural language summaries of said groups and generating a natural language summary of each set of linked groups; attempting to match a natural language summary of one of said sets of linked groups to natural language incident cases in a threat database; determining that none of said sets of linked groups matches any incident cases in said threat database; and outputting an indication in response to said determining that said cyber security events of said vehicle do not match any incident cases in said threat database, thereby improving operation of a cybersecurity system by suppressing generation of false-positive malware detection alerts.
- 10 . The method as recited in claim 9 further comprising: using said generative AI model to perform said placing said events, said placing said groups, and said matching.
- 11 . The method as recited in claim 10 wherein said generative AI model uses retrieval augmented generation (RAG) to perform said placing said events, said placing said groups and said attempting to match.
- 12 . The method as recited in claim 9 wherein said cyber security events are in detection logs obtained from cyber security modules of said vehicle.
- 13 . The method as recited in claim 9 wherein said placing said groups is performed based upon predefined similarities or based upon similarities suggested by said generative AI model.
- 14 . The method as recited in claim 9 wherein said attempting uses an asset list of components of said vehicle to determine that said cyber security events of said vehicle do not match any incident cases in said threat database.
- 15 . The method as recited in claim 9 wherein said plurality of cyber security events are occurring at the vehicle, and wherein said placing steps, said attempting to match, said determining, and said outputting are performed in response to occurrence of said plurality of cyber security events.
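The three outcomes recited in the abstract and claims (incident alert on a match, early warning on a partial match, no alert when nothing matches) can be sketched as a small decision routine. This is an illustration under stated assumptions, not the claimed implementation: the claims match natural language summaries using a generative AI model, whereas this sketch compares sets of hypothetical group labels, and it treats any overlap short of a full match as a partial match. The names `IncidentCase`, `classify`, and `evaluate` are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IncidentCase:
    name: str
    event_groups: frozenset  # hypothetical labels for the case's related event groups

def classify(observed_groups, case):
    """Compare one set of linked groups against one incident case.

    Returns (outcome, missing), where missing holds the case's event
    groups that were not observed on the vehicle.
    """
    observed = frozenset(observed_groups)
    present = case.event_groups & observed
    missing = case.event_groups - observed
    if not present:
        return "no_match", missing
    if missing:
        return "early_warning", missing   # partial match
    return "incident_alert", missing      # full match

def evaluate(observed_groups, cases):
    """Return the strongest outcome across all cases as (outcome, case name, missing)."""
    rank = {"no_match": 0, "early_warning": 1, "incident_alert": 2}
    best = ("no_match", None, frozenset())
    for case in cases:
        outcome, missing = classify(observed_groups, case)
        if rank[outcome] > rank[best[0]]:
            best = (outcome, case.name, missing)
    return best

# Hypothetical threat database with a single incident case.
cases = [
    IncidentCase(
        name="case-A",
        event_groups=frozenset({"login_bruteforce", "firmware_tamper", "can_injection"}),
    )
]
partial = evaluate({"login_bruteforce", "can_injection"}, cases)
full = evaluate({"login_bruteforce", "firmware_tamper", "can_injection"}, cases)
none = evaluate({"tire_pressure_warning"}, cases)
```

Returning the missing groups alongside the outcome mirrors the early warning of claim 6, which indicates the group of related events that is not present in the vehicle.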
Description
FIELD OF THE INVENTION
The present invention relates generally to the detection of malicious events in a computing environment. More specifically, the present invention relates to using artificial intelligence to identify the relevance between detected events and known malicious incidents.
BACKGROUND OF THE INVENTION
A connected vehicle can communicate with devices or systems that are external to the vehicle. Most new vehicles on the market today are connected in that they have components that can perform external communication by wireless or wired connection. A connected vehicle may also have sensors for receiving sensed data of its physical environment.
Connected vehicles typically have a plurality of electronic control units (ECUs) that perform various functions. For example, a connected vehicle may have an ECU for a central gateway, an ECU for in-vehicle information and entertainment, an ECU for engine management, etc. ECUs are computers with software and hardware components. More particularly, an ECU has a processor that executes software components, such as an operating system, application programs, and firmware.
Cyber security, within the context of connected vehicles, is the protection of automotive electronic systems, communication networks, control algorithms, software, users, and underlying data from malicious attacks, damage, unauthorized access, or manipulation. Connected vehicles are susceptible to cyber attacks, which include unauthorized intrusion, malware infection, etc. Unfortunately, traditional information technology (IT) cybersecurity measures are not readily adaptable to connected vehicles because a typical ECU is not as powerful as computers employed in the general IT environment. Furthermore, connected vehicles have different attack surfaces than general IT environments.
For instance, even though detection logs of suspect events (perhaps indicating a cyber attack) are produced continuously from the ECUs and other locations within the vehicle, it can prove difficult to sort through the large quantity of suspect events in order to generate actionable alerts that are not false positives. Current techniques rely upon determining the similarity of detected suspect events using techniques such as text embedding, TLSH, and other algorithms. De-duplication can then be used to remove certain similar events; however, a significant human effort is still needed to investigate similar events, provide a category, and finally provide feedback in the form of related incident cases that may indicate actual malware. Unfortunately, the current techniques result in too many false positives (i.e., a high FPR), leading to inactionable alerts and alert fatigue on the part of human operators and other recipients.
Accordingly, an improved system and methods are desirable.
SUMMARY OF THE INVENTION
To achieve the foregoing, and in accordance with the purpose of the present invention, a technique is disclosed that uses generative artificial intelligence in order to match detected suspect events with known incidents in a threat knowledge database.
In general, the invention is able to group similar detected events into groups, to link related groups, and then to match sets of linked groups to one or more actual incidents in a threat knowledge database, in order to output actionable alerts having a low false-positive rate.
In a first embodiment, events are placed into groups, groups are placed into sets of linked groups based upon similarities between natural language summaries of the groups, a natural language summary of one of the sets of linked groups is matched to a natural language incident case in a threat database, and an incident alert for a vehicle is output based upon the incident case.
In a second embodiment, events are placed into groups, groups are placed into sets of linked groups based upon similarities between natural language summaries of the groups, a natural language summary of one of the sets of linked groups is partially matched to a natural language incident case in a threat database, and an early warning for a vehicle is output based upon the incident case.
In a third embodiment, events are placed into groups, groups are placed into sets of linked groups based upon similarities between natural language summaries of the groups, an attempt is made to match a natural language summary of one of the sets of linked groups to natural language incident cases in a threat database, and no alert for a vehicle is output based upon the attempted matching.
These and other features of the present disclosure will be readily apparent to persons of ordinary skill in the art upon reading the entirety of this disclosure, which includes the accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention, together with further advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:
FIG. 1 shows a block diag