EP-4085363-B1 - CODE-BASED MALWARE DETECTION

Inventors

  • EL-MOUSSA, FADI

Dates

Publication Date
20260506
Application Date
20201218

Claims (8)

  1. A computer implemented method of detecting malware in a received software component (214) comprising: generating a profile (208) for the malware (204) by the steps of: a) accessing (302) machine code for the malware; b) identifying (304) a subset of the machine code for the malware as a logical subroutine (206) of the malware; c) extracting (306) one or more features of the logical subroutine of the malware as the profile (208), the features comprising one or more of: a number of processor registers used in the logical subroutine; an identification of registers used in the logical subroutine; a stack size used in the logical subroutine; and a location or range of locations of a memory region accessed in the logical subroutine, accessing (310) machine code for the received software component (214) to identify (312) a plurality of logical subroutines thereof; extracting (316) one or more features (218) of each logical subroutine of the received software component for comparison (318) with the profile to detect the malware in the received software component (214).
  2. The method of claim 1 wherein a feature of a logical subroutine includes an identification of one or more operating system application programming interface calls in the logical subroutine.
  3. The method of any preceding claim wherein identifying a logical subroutine in machine code includes one or more of: identifying a series of machine code instructions accessed via a jump, branch or conditional machine code instruction; identifying a series of machine code instructions collocated in the machine code; identifying a series of machine code instructions collocated in the machine code and bounded by subroutine identifiers; and executing the machine code and monitoring the execution to trace execution paths through the machine code wherein a repeated series of machine code instructions within an execution path is determined to correspond to a logical subroutine of the machine code.
  4. The method of any preceding claim wherein identifying a logical subroutine in machine code includes disassembling the machine code to an assembler language representation of the machine code.
  5. The method of any preceding claim wherein detection of the malware in the received software component is based on identity of one or more of: a number of registers used in the logical subroutine of each of the received software component and the malware; a stack size used in the logical subroutine of each of the received software component and the malware; a location or range of locations of a memory region accessed in the logical subroutine of each of the received software component and the malware; and an identification of one or more operating system application programming interface calls in the logical subroutine of each of the received software component and the malware.
  6. The method of any preceding claim wherein detection of the malware in the received software component is based on score determined by the comparison in which the score is based on a degree of similarity of any or all of: a number of registers used in the logical subroutine of each of the received software component and the malware; a stack size used in the logical subroutine of each of the received software component and the malware; a location or range of locations of a memory region accessed in the logical subroutine of each of the received software component and the malware; and an identification of one or more operating system application programming interface calls in the logical subroutine of each of the received software component and the malware.
  7. A computer system including a processor (102) and memory (104) storing computer program code for performing the steps of the method of any preceding claim.
  8. A computer program element comprising computer program code to, when loaded into a computer system and executed thereon, cause the computer to perform the steps of a method as claimed in any of claims 1 to 6.
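Claims 5 and 6 distinguish detection based on identity of features from detection based on a similarity score. As a minimal sketch only, a score-based comparison might look like the following. The per-feature measures (`ratio`, `jaccard`, `range_overlap`), the equal weighting, the dictionary layout, the sample values and the `THRESHOLD` of 0.8 are all illustrative assumptions and are not taken from the claims.

```python
def ratio(a: int, b: int) -> float:
    """Similarity of two non-negative counts in [0, 1]; 1.0 when equal."""
    return 1.0 if a == b else min(a, b) / max(a, b)


def jaccard(a: set, b: set) -> float:
    """Shared members divided by all members; 1.0 when both sets are empty."""
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def range_overlap(a: tuple, b: tuple) -> float:
    """Overlap of two inclusive address ranges relative to their combined span."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    span = max(a[1], b[1]) - min(a[0], b[0]) + 1
    return max(0, high - low + 1) / span


def similarity(profile: dict, candidate: dict) -> float:
    """Unweighted mean of the four per-feature similarities (an assumption)."""
    parts = [
        ratio(profile["register_count"], candidate["register_count"]),
        ratio(profile["stack_size"], candidate["stack_size"]),
        range_overlap(profile["memory_range"], candidate["memory_range"]),
        jaccard(profile["api_calls"], candidate["api_calls"]),
    ]
    return sum(parts) / len(parts)


# Hypothetical feature values for a malware subroutine and two candidates.
profile = {"register_count": 4, "stack_size": 16,
           "memory_range": (0x404000, 0x404008),
           "api_calls": {"CreateFileW", "WriteFile"}}
close = {"register_count": 4, "stack_size": 32,
         "memory_range": (0x404000, 0x404008),
         "api_calls": {"CreateFileW", "WriteFile"}}
unrelated = {"register_count": 2, "stack_size": 0,
             "memory_range": (0x500000, 0x500010),
             "api_calls": set()}

THRESHOLD = 0.8  # hypothetical detection threshold
detected = similarity(profile, close) >= THRESHOLD
```

With these sample values the `close` candidate differs from the profile only in stack size, so it still scores above the assumed threshold, whereas a comparison based on strict identity would reject it.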

Description

The present invention relates to the detection of malicious software code.
Traditional malware detection is based on the generation of signatures of malware code, such as by hashing all or part of known malware, to provide a suitable and efficient basis for comparison at malware scanning time. This suffers from missed detection due to minor changes to malware: a single bit change in malware can result in an entirely different signature and non-detection. Existing approaches to address this challenge can involve modularising malware into smaller components for which signatures are generated, such that the granularity of signature generation can be finer. This permits detection of malware where there is wholesale identity within any particular module, in dependence on module size, though malware adapts to include minor adjustments throughout its content to undermine any such granular signature generation. Accordingly, it is beneficial to provide improvements in the detection of malware.
In SEXTON, JOSEPH ET AL: "Subroutine based detection of APT malware", JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES, SPRINGER PARIS, vol. 12, no. 4, 21 December 2015, pages 225-233 (XP036077218), executable code is disassembled and classified using opcode sequences.
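The fragility of hash-based signatures described in the background above can be illustrated with a short sketch. It assumes SHA-256 as the signature function and uses a few hypothetical bytes as a stand-in for malware machine code; neither choice comes from the patent.

```python
import hashlib


def signature(code: bytes) -> str:
    """Whole-component signature: SHA-256 hex digest of the code bytes."""
    return hashlib.sha256(code).hexdigest()


def flip_bit(code: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of the code with exactly one bit inverted."""
    mutated = bytearray(code)
    mutated[byte_index] ^= 1 << bit
    return bytes(mutated)


# Hypothetical stand-in for malware machine code (not a real sample).
original = bytes.fromhex("5589e583ec10c745fc00000000c9c3")
variant = flip_bit(original, byte_index=5, bit=0)

# Count how many bits differ between the two byte strings.
changed_bits = sum(bin(a ^ b).count("1") for a, b in zip(original, variant))

# The signatures no longer match even though only one bit differs.
signatures_match = signature(original) == signature(variant)
```

A scanner that compares only `signature(...)` values would treat `variant` as unknown code, which is the missed detection the background describes.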
According to a first aspect of the present invention, there is provided a computer implemented method of detecting malware in a received software component comprising: generating a profile for the malware by the steps of: a) accessing machine code for the malware; b) identifying a subset of the machine code for the malware as a logical subroutine of the malware; c) extracting one or more features of the logical subroutine of the malware as the profile, the features comprising one or more of: a number of processor registers used in the logical subroutine; an identification of registers used in the logical subroutine; a stack size used in the logical subroutine; and a location or range of locations of a memory region accessed in the logical subroutine, accessing machine code for the received software component to identify a plurality of logical subroutines thereof; extracting one or more features of each logical subroutine of the received software component for comparison with the profile to detect the malware in the received software component.
Preferably, a feature of a logical subroutine includes an identification of one or more operating system application programming interface calls in the logical subroutine.
Preferably, identifying a logical subroutine in machine code includes one or more of: identifying a series of machine code instructions accessed via a jump, branch or conditional machine code instruction; identifying a series of machine code instructions collocated in the machine code; identifying a series of machine code instructions collocated in the machine code and bounded by subroutine identifiers; and executing the machine code and monitoring the execution to trace execution paths through the machine code wherein a repeated series of machine code instructions within an execution path is determined to correspond to a logical subroutine of the machine code.
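The method of the first aspect can be sketched as follows, as an illustration under stated assumptions rather than the patented implementation. The sketch assumes the machine code has already been disassembled to an x86-style assembler listing, treats a label and a closing `ret` as the subroutine identifiers, reads stack size from `sub esp, N`, and considers only absolute addresses in square brackets as memory accesses. The listings, the names `Features`, `split_subroutines` and `extract_features`, and the use of strict equality for the comparison are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


@dataclass(frozen=True)
class Features:
    registers: FrozenSet[str]                # identification of registers used
    stack_size: int                          # bytes reserved via "sub esp, N"
    memory_range: Optional[Tuple[int, int]]  # lowest/highest absolute address

    @property
    def register_count(self) -> int:
        return len(self.registers)


def split_subroutines(listing: str) -> Dict[str, List[str]]:
    """Group collocated instructions bounded by a label and a closing ret."""
    subs: Dict[str, List[str]] = {}
    name = None
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            name = line[:-1]
            subs[name] = []
        elif name is not None:
            subs[name].append(line)
            if line.split()[0] == "ret":
                name = None
    return subs


def extract_features(instructions: List[str]) -> Features:
    registers, stack_size, addresses = set(), 0, []
    for ins in instructions:
        ins = ins.lower()
        registers.update(t for t in re.findall(r"\b[a-z]+\b", ins) if t in REGISTERS)
        grow = re.fullmatch(r"sub\s+esp,\s*(0x[0-9a-f]+|\d+)", ins)
        if grow:
            stack_size += int(grow.group(1), 0)
        addresses += [int(a, 16) for a in re.findall(r"\[(0x[0-9a-f]+)\]", ins)]
    memory_range = (min(addresses), max(addresses)) if addresses else None
    return Features(frozenset(registers), stack_size, memory_range)


# Hypothetical disassembly of a known malware subroutine.
MALWARE = """
decode:
    push ebp
    mov ebp, esp
    sub esp, 0x10
    mov eax, [0x404000]
    xor eax, ecx
    mov [0x404008], eax
    leave
    ret
"""

# Hypothetical received component: different instructions, same features.
RECEIVED = """
start:
    mov eax, ebx
    ret
sub_401000:
    push ebp
    mov ebp, esp
    sub esp, 0x10
    mov ecx, [0x404000]
    add ecx, eax
    nop
    mov [0x404008], ecx
    leave
    ret
"""

profile = extract_features(split_subroutines(MALWARE)["decode"])
matches = [name for name, ins in split_subroutines(RECEIVED).items()
           if extract_features(ins) == profile]
```

Because the comparison uses extracted features rather than the instruction bytes, `sub_401000` is matched even though its instructions differ from those of `decode`.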
Preferably, identifying a logical subroutine in machine code includes disassembling the machine code to an assembler language representation of the machine code.
Preferably, detection of the malware in the received software component is based on identity of one or more of: a number of registers used in the logical subroutine of each of the received software component and the malware; a stack size used in the logical subroutine of each of the received software component and the malware; a location or range of locations of a memory region accessed in the logical subroutine of each of the received software component and the malware; and an identification of one or more operating system application programming interface calls in the logical subroutine of each of the received software component and the malware.
Preferably, detection of the malware in the received software component is based on a score determined by the comparison, in which the score is based on a degree of similarity of any or all of: a number of registers used in the logical subroutine of each of the received software component and the malware; a stack size used in the logical subroutine of each of the received software component and the malware; a location or range of locations of a memory region accessed in the logical subroutine of each of the received software component and the malware; and an identification of one or more operating system application programming interface calls in the logical subroutine of each of the received software component and the malware.
According to a second aspect of the present invention, there is provided a computer system including a processor and memory storing computer program code for performing the steps of the method set out above.
According to a third aspect of the present invention, there is provided a computer system including a processor and memo