US-12619728-B2 - Detecting patient-zero exfiltration attacks on websites using taint tracking
Abstract
An execution environment has been designed that detects likely data exfiltration by using taint tracking and abstract execution. The execution environment is instrumented to monitor for use of functions identified as having functionality for transferring data out of an execution environment. In addition, heuristics-based rules are defined to mark or “taint” objects (e.g., variables) that are likely targets for exfiltration. With taint tracking and control flow analysis, the execution environment tracks the tainted objects through multiple execution paths of a code sample. After comprehensive code coverage, logged use of the monitored functions is examined to determine whether any tainted objects were passed to the monitored functions. If so, the logged use will indicate a destination or sink for the tainted source. Each tainted source-sink association can be examined to verify whether the exfiltration was malicious.
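The flow summarized in the abstract (mark likely sources, monitor data-sending functions, then pair tainted sources with destinations) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the naming-pattern rule `SENSITIVE_NAME`, the `Tainted` wrapper, the `monitored` decorator, the stand-in `send_beacon` function, and the example URLs are all hypothetical names introduced here.

```python
import functools
import re

# Hypothetical naming-pattern rule; a real rule set would be far broader.
SENSITIVE_NAME = re.compile(r"card|cvv|passw|ssn", re.IGNORECASE)


class Tainted(str):
    """String that remembers the sensitive source it was derived from."""

    def __new__(cls, value, source):
        obj = super().__new__(cls, value)
        obj.source = source
        return obj

    def __add__(self, other):
        # Taint propagates when the tainted value is the left operand...
        return Tainted(str.__add__(self, other), self.source)

    def __radd__(self, other):
        # ...and when it is the right operand.
        return Tainted(str.__add__(other, self), self.source)


SINK_LOG = []  # every call to a monitored function is recorded here


def monitored(fn):
    """Instrument a data-sending function so that each use is logged."""
    @functools.wraps(fn)
    def wrapper(destination, payload):
        SINK_LOG.append((fn.__name__, destination, payload))
        return fn(destination, payload)
    return wrapper


@monitored
def send_beacon(destination, payload):
    """Stand-in for a function that sends data out; performs no network I/O."""
    return True


def mark_sources(objects):
    """Taint every object whose name matches the heuristic rule."""
    return {
        name: Tainted(value, name) if SENSITIVE_NAME.search(name) else value
        for name, value in objects.items()
    }


def associations(log):
    """Pair each tainted source with the destination it was sent to."""
    return [
        (payload.source, destination)
        for _, destination, payload in log
        if isinstance(payload, Tainted)
    ]


env = mark_sources({"card_number": "4111111111111111", "theme": "dark"})
send_beacon("https://collector.example/p", "n=" + env["card_number"])
send_beacon("https://cdn.example/log", "t=" + env["theme"])
print(associations(SINK_LOG))
# [('card_number', 'https://collector.example/p')]
```

Only the call that carried a value derived from `card_number` yields a source-sink association; the untainted `theme` value is logged but not reported. Deciding whether the reported destination is malicious is a separate step that this sketch does not attempt.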
Inventors
- William Russell Melicher
- Mohamed Yoosuf Mohamed Nabeel
- Oleksii Starov
Assignees
- PALO ALTO NETWORKS, INC.
Dates
- Publication Date: 2026-05-05
- Application Date: 2023-11-20
Claims (20)
- 1. A method comprising: taint tracking each sensitive source of a first code sample through multiple execution paths with abstract execution of the first code sample to one or more exfiltration sinks, wherein taint tracking each sensitive source comprises; identifying each object in the first code sample that is a sensitive source based, at least in part, on heuristics-based rules; marking each sensitive source for taint tracking; identifying predictable flow sections and branch points with code flow analysis of the first code sample; concretely executing the predictable flow sections and abstractly executing/interpreting the branch points to cover alternative execution paths; for each sensitive source tracked to an exfiltration sink, indicating an association of the sensitive source and the exfiltration sink; and generating an indication of each association as a possible malicious exfiltration.
- 2. The method of claim 1, wherein marking each sensitive source for taint tracking comprises one of instantiating a data structure with an entry for each sensitive source and setting a flag for each sensitive source in a data structure that tracks values of objects during execution for each execution path covered with the abstract execution.
- 3. The method of claim 1, wherein the heuristics-based rules indicate at least one of a naming pattern, document object model type source, and a cookie type source.
- 4. The method of claim 1, wherein the taint tracking comprises executing the first code sample in an execution environment with instrumentation to monitor use of any one of a plurality of functions identified as having functionality for conveying data outside of the execution environment.
- 5. The method of claim 4 further comprising, for each marked sensitive source, determining whether the sensitive source is passed to one of the plurality of functions.
- 6. The method of claim 5, wherein the exfiltration sink is a destination specified in those of the plurality of functions determined to have been passed a sensitive source.
- 7. The method of claim 1 further comprising, for each association, determining whether the exfiltration sink is malicious.
- 8. The method of claim 1 further comprising at least one of crawling websites to obtain a plurality of code samples, receiving files with at least some of the plurality of code samples, and receiving one or more uniform resource locators (URLs) and visiting the URLs to obtain at least some of the plurality of code samples, wherein the plurality of code samples includes the first code sample.
- 9. A non-transitory machine-readable medium having program code stored thereon, the program code comprising instructions to: obtain a first code sample; identify each object in the first code sample that has one or more specified characteristics of an object that could be a sensitive source; taint track each identified object through multiple execution paths of the first code sample in a secure execution environment, wherein the instructions to taint track each identified object through multiple execution paths comprise instructions to, identify predictable flow sections and branch points with code flow analysis of the first code sample; concretely execute the predictable flow sections and abstractly execute/interpret the branch points to cover alternative execution paths; based on the taint tracking, identify each instance of any one of a plurality of monitored functions that has been passed any taint tracked object, wherein each of the plurality of monitored functions would send data outside of an execution environment; determine a destination of each identified monitored function instance; and indicate each determined destination as an exfiltration sink and the corresponding one of the taint tracked objects.
- 10. The non-transitory machine-readable medium of claim 9, wherein the program code further comprises instructions to deobfuscate the first code sample prior to taint tracking.
- 11. The non-transitory machine-readable medium of claim 9, wherein the program code further comprises instructions to analyze each indicated exfiltration sink to determine whether the exfiltration sink is malicious.
- 12. The non-transitory machine-readable media of claim 9, wherein the program code further comprises instructions to at least one of crawl websites to obtain at least some of a plurality of code samples, receive files with at least some of the plurality of code samples, and receive one or more uniform resource locators (URLs) and visit the URLs to obtain at least some of the plurality of code samples, wherein the plurality of code samples includes the first code sample.
- 13. The non-transitory machine-readable media of claim 9, wherein the one or more specified characteristics correspond to at least one of a naming pattern, being a document object model component, and being a cookie.
- 14. The non-transitory, machine-readable medium of claim 9, wherein the instructions to identify each object that has one or more specified characteristics of an object that could be a sensitive source comprise instructions to tag or mark an identifier of each identified object for taint tracking.
- 15. An apparatus comprising: a processor; and a non-transitory machine-readable medium having instructions stored thereon, the instructions executable by the processor to cause the apparatus to: obtain a first code sample; identify each object in the first code sample that has one or more specified characteristics of an object that could be a sensitive source; taint track each identified object through multiple execution paths of the first code sample in a secure execution environment, wherein the instructions to taint track each identified object through multiple execution paths comprise instructions to, identify predictable flow sections and branch points with code flow analysis of the first code sample; concretely execute the predictable flow sections and abstractly execute/interpret the branch points to cover alternative execution paths; based on the taint tracking, identify each instance of any one of a plurality of monitored functions that has been passed any taint tracked object, wherein each of the plurality of monitored functions would send data outside of an execution environment; determine a destination of each identified monitored function instance; and indicate each determined destination as an exfiltration sink and the corresponding one of the taint tracked objects.
- 16. The apparatus of claim 15, wherein the machine-readable medium further comprises instructions executable by the processor to cause the apparatus to deobfuscate the first code sample prior to taint tracking.
- 17. The apparatus of claim 15, wherein the machine-readable medium further comprises instructions executable by the processor to cause the apparatus to analyze each indicated exfiltration sink to determine whether the exfiltration sink is malicious.
- 18. The apparatus of claim 15, wherein the machine-readable medium further comprises instructions executable by the processor to cause the apparatus to at least one of crawl websites to obtain at least some of a plurality of code samples, receive files with at least some of the plurality of code samples, and receive one or more uniform resource locators (URLs) and visit the URLs to obtain at least some of the plurality of code samples, wherein the plurality of code samples includes the first code sample.
- 19. The apparatus of claim 15, wherein the one or more specified characteristics correspond to at least one of a naming pattern, being a document object model component, and being a cookie.
- 20. The apparatus of claim 15, wherein the instructions to identify each object that has one or more specified characteristics of an object that could be a sensitive source comprise instructions executable by the processor to cause the apparatus to tag or mark an identifier of each identified object for taint tracking.
Description
BACKGROUND

The disclosure generally relates to anti-malware arrangements and malware detection (e.g., CPC G06F 21/56).

Data exfiltration is the unauthorized transfer of data from a host, usually conducted in a discreet manner. Phishing and malware are sometimes used to carry out this type of attack. Examples of malware for data exfiltration include skimmers. A skimmer or web skimmer is malware embedded in web payment pages that skims customer data. To avoid detection, malicious actors employ obfuscation techniques.

Obfuscation techniques are employed for malicious and legitimate purposes. Developers use software packing to conserve space and use data and code obfuscation to protect intellectual property. Some websites use obfuscation to prevent plagiarism. Malicious actors use obfuscation or malware obfuscation to evade detection.

One example of a malicious use of an obfuscation technique is randomization, which is the random change in code elements without changing semantics. Another technique is encoding obfuscation. Encoding obfuscation can be done by converting code into escaped American Standard Code for Information Interchange (ASCII) characters, using a custom encoding function and attaching the decoding function, and using standardized encryption and decryption methods. Another malware obfuscation technique is logic structure obfuscation. Logic structure obfuscation changes the logic structure to manipulate execution paths without affecting original semantics of the code.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the disclosure may be better understood by referencing the accompanying drawings.

FIG. 1 depicts a diagram of an execution environment that employs taint tracking through multiple execution paths of code samples to detect likely data exfiltration.
FIG. 2 is a flowchart of example operations for taint tracking sensitive sources across multiple execution paths of a code sample to detect possible malicious exfiltration.
FIG. 3 is a flowchart of example operations for running a code sample in an instrumented execution environment and obtaining a taint tracking report.
FIG. 4 depicts an example computer system with an exfiltration detection tool.

DESCRIPTION

The description that follows includes example systems, methods, techniques, and program flows to aid in understanding the disclosure and not to limit claim scope. Well-known instruction instances, protocols, structures, and techniques have not been shown in detail for conciseness.

Terminology

This description uses the term “exfiltration sink” to refer to a destination of an exfiltration. The term is sometimes used in the art to refer to an exfiltrating function and the destination. However, this description narrows the meaning to only the destination to avoid confusion and overloading the term. An “exfiltration endpoint” refers to a function instance and the sink associated with the function instance.

Use of the phrase “at least one of” preceding a list with the conjunction “and” should not be treated as an exclusive list and should not be construed as a list of categories with one item from each category, unless specifically stated otherwise. A clause that recites “at least one of A, B, and C” can be infringed with only one of the listed items, multiple of the listed items, and one or more of the items in the list and another item not listed.

Overview

An execution environment has been designed that detects likely data exfiltration by using taint tracking and abstract execution. The execution environment is instrumented to monitor for use of functions identified as having functionality for transferring data out of an execution environment. In addition, heuristics-based rules are defined to mark or “taint” objects (e.g., variables) that are likely targets for exfiltration. With taint tracking and control flow analysis, the execution environment tracks the tainted objects through multiple execution paths of a code sample.
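The idea of tracking objects through multiple execution paths can be illustrated with a toy sketch. It is a simplification under stated assumptions, not the execution environment the disclosure describes: the three-instruction mini language (`set`, `send`, `branch`), the `UNKNOWN` marker, and the example URLs are hypothetical. Straight-line statements run concretely; a branch whose condition cannot be predicted is forked so that both arms are covered.

```python
UNKNOWN = object()  # marks a condition whose value cannot be predicted


def explore(stmts, env, trace, paths):
    """Run stmts concretely; fork at branches whose condition is UNKNOWN."""
    for i, stmt in enumerate(stmts):
        kind = stmt[0]
        if kind == "set":
            env[stmt[1]] = stmt[2]
        elif kind == "send":
            # Record (destination, object name) instead of sending anything.
            trace.append((stmt[1], stmt[2]))
        elif kind == "branch":
            _, cond, then_body, else_body = stmt
            value = env.get(cond, UNKNOWN)
            if value is UNKNOWN:
                arms = [then_body, else_body]  # cover both alternatives
            else:
                arms = [then_body if value else else_body]  # predictable
            rest = stmts[i + 1:]
            for arm in arms:
                # Each path gets its own copy of the state and the trace.
                explore(arm + rest, dict(env), list(trace), paths)
            return paths
    paths.append(trace)
    return paths


program = [
    ("set", "debug", False),
    # Predictable branch: `debug` is known, so only one arm runs.
    ("branch", "debug", [("send", "https://logs.example/d", "theme")], []),
    # Unpredictable branch: `is_checkout_page` depends on the visited page.
    ("branch", "is_checkout_page",
     [("send", "https://collector.example/p", "card_number")], []),
    ("set", "done", True),
]

paths = explore(program, {}, [], [])
for number, trace in enumerate(paths):
    print(number, trace)
```

Two paths are explored here. The send of `card_number` shows up only on the path that takes the checkout branch, which a single concrete run on a non-checkout page would miss.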
After comprehensive code coverage, logged use of the monitored functions is examined to determine whether any tainted objects were passed to the monitored functions. If so, the logged use will indicate a destination or sink for the tainted source. Each tainted source-sink association can be examined (e.g., with machine learning or against a list of known malicious websites) to verify whether the exfiltration was malicious.

Example Illustrations

FIG. 1 depicts a diagram of an execution environment that employs taint tracking through multiple execution paths of code samples to detect likely data exfiltration. An execution environment 105 performs concrete and abstract interpretation/execution of code samples. The execution environment 105 identifies points in control flow of a code sample that allow for alternative execution paths. The execution environment 105 then selectively applies abstract execution to these control flow points for comprehensive code coverage. This allows evaluation of code paths that can evade st