US-20260127287-A1 - DETECTING BACKDOORS IN BINARY SOFTWARE CODE
Abstract
Systems, methods, and software can be used to detect backdoors in binary software code. In some aspects, a method comprises: obtaining, by a server, binary software code corresponding to source code; generating, by the server, a backdoor abstraction of the binary software code; and generating, by the server, a backdoor risk assessment based on the backdoor abstraction of the binary software code.
Inventors
- Neil David Jonathan DUGGAN
- VINCENZO Kazimierz MARCOVECCHIO
- Adam John Boulton
Assignees
- BLACKBERRY LIMITED
Dates
- Publication Date
- 20260507
- Application Date
- 20260105
Claims (20)
- 1. A method, comprising: obtaining, by a server, binary software code corresponding to source code; generating, by the server, a backdoor abstraction of the binary software code by identifying at least one string indicative of a potential backdoor in the binary software code; and generating, by the server, a backdoor risk assessment based on the backdoor abstraction of the binary software code, wherein generating the backdoor risk assessment comprises: determining, by the server, that none of the source code corresponds to the potential backdoor in the backdoor abstraction of the binary software code; and in response to determining that none of the source code corresponds to the potential backdoor, determining that a risk is inserted during a generation of the binary software code corresponding to the source code and including the risk in the backdoor risk assessment.
- 2. The method of claim 1, wherein generating, by the server, the backdoor abstraction of the binary software code comprises: determining, by the server, that the binary software code comprises the at least one string; and including, by the server, at least one potential backdoor representation corresponding to the at least one string in the backdoor abstraction of the binary software code.
- 3. The method of claim 1, wherein the at least one string comprises at least one of: a Uniform Resource Locator (URL); a hardcoded user credential; or a high entropy string.
- 4. The method of claim 1, wherein identifying the at least one string indicative of the potential backdoor in the binary software code comprises: determining a context of the at least one string, wherein the context of the at least one string comprises at least one of a location of the at least one string or a library call associated with the at least one string; and determining, based on the context of the at least one string, that the at least one string indicates the potential backdoor in the binary software code.
- 5. The method of claim 1, wherein generating, by the server, the backdoor risk assessment based on the backdoor abstraction of the binary software code comprises: comparing the backdoor abstraction of the binary software code and a backdoor abstraction of the source code.
- 6. The method of claim 5, comprising: determining that the potential backdoor is in the backdoor abstraction of the binary software code and not in the backdoor abstraction of the source code; and in response to determining that the potential backdoor is in the backdoor abstraction of the binary software code and not in the backdoor abstraction of the source code, determining that the risk is inserted during the generation of the binary software code.
- 7. The method of claim 1, comprising: storing, by the server, the backdoor abstraction of the binary software code as a baseline; obtaining, by the server, additional binary software code; generating, by the server, an additional backdoor abstraction of the additional binary software code; and generating, by the server, an additional backdoor risk assessment based on the baseline and the additional backdoor abstraction.
- 8. A non-transitory computer-readable medium containing instructions which, when executed, cause a computing device to perform operations comprising: obtaining, by a server, binary software code corresponding to source code; generating, by the server, a backdoor abstraction of the binary software code by identifying at least one string indicative of a potential backdoor in the binary software code; and generating, by the server, a backdoor risk assessment based on the backdoor abstraction of the binary software code, wherein generating the backdoor risk assessment comprises: determining, by the server, that none of the source code corresponds to the potential backdoor in the backdoor abstraction of the binary software code; and in response to determining that none of the source code corresponds to the potential backdoor, determining that a risk is inserted during a generation of the binary software code corresponding to the source code and including the risk in the backdoor risk assessment.
- 9. The non-transitory computer-readable medium of claim 8, wherein generating, by the server, the backdoor abstraction of the binary software code comprises: determining, by the server, that the binary software code comprises the at least one string; and including, by the server, at least one potential backdoor representation corresponding to the at least one string in the backdoor abstraction of the binary software code.
- 10. The non-transitory computer-readable medium of claim 8, wherein the at least one string comprises at least one of: a Uniform Resource Locator (URL); a hardcoded user credential; or a high entropy string.
- 11. The non-transitory computer-readable medium of claim 8, wherein identifying the at least one string indicative of the potential backdoor in the binary software code comprises: determining a context of the at least one string, wherein the context of the at least one string comprises at least one of a location of the at least one string or a library call associated with the at least one string; and determining, based on the context of the at least one string, that the at least one string indicates the potential backdoor in the binary software code.
- 12. The non-transitory computer-readable medium of claim 8, wherein generating, by the server, the backdoor risk assessment based on the backdoor abstraction of the binary software code comprises: comparing the backdoor abstraction of the binary software code and a backdoor abstraction of the source code.
- 13. The non-transitory computer-readable medium of claim 12, the operations comprising: determining that the potential backdoor is in the backdoor abstraction of the binary software code and not in the backdoor abstraction of the source code; and in response to determining that the potential backdoor is in the backdoor abstraction of the binary software code and not in the backdoor abstraction of the source code, determining that the risk is inserted during the generation of the binary software code.
- 14. The non-transitory computer-readable medium of claim 8, the operations comprising: storing, by the server, the backdoor abstraction of the binary software code as a baseline; obtaining, by the server, additional binary software code; generating, by the server, an additional backdoor abstraction of the additional binary software code; and generating, by the server, an additional backdoor risk assessment based on the baseline and the additional backdoor abstraction.
- 15. A computer-implemented system, comprising: one or more computers; and one or more computer memory devices interoperably coupled with the one or more computers and having tangible, non-transitory, machine-readable media storing one or more instructions that, when executed by the one or more computers, perform one or more operations comprising: obtaining, by a server, binary software code corresponding to source code; generating, by the server, a backdoor abstraction of the binary software code by identifying at least one string indicative of a potential backdoor in the binary software code; and generating, by the server, a backdoor risk assessment based on the backdoor abstraction of the binary software code, wherein generating the backdoor risk assessment comprises: determining, by the server, that none of the source code corresponds to the potential backdoor in the backdoor abstraction of the binary software code; and in response to determining that none of the source code corresponds to the potential backdoor, determining that a risk is inserted during a generation of the binary software code corresponding to the source code and including the risk in the backdoor risk assessment.
- 16. The computer-implemented system of claim 15, wherein generating, by the server, the backdoor abstraction of the binary software code comprises: determining, by the server, that the binary software code comprises the at least one string; and including, by the server, at least one potential backdoor representation corresponding to the at least one string in the backdoor abstraction of the binary software code.
- 17. The computer-implemented system of claim 15, wherein the at least one string comprises at least one of: a Uniform Resource Locator (URL); a hardcoded user credential; or a high entropy string.
- 18. The computer-implemented system of claim 15, wherein identifying the at least one string indicative of the potential backdoor in the binary software code comprises: determining a context of the at least one string, wherein the context of the at least one string comprises at least one of a location of the at least one string or a library call associated with the at least one string; and determining, based on the context of the at least one string, that the at least one string indicates the potential backdoor in the binary software code.
- 19. The computer-implemented system of claim 15, wherein generating, by the server, the backdoor risk assessment based on the backdoor abstraction of the binary software code comprises: comparing the backdoor abstraction of the binary software code and a backdoor abstraction of the source code.
- 20. The computer-implemented system of claim 19, the operations comprising: determining that the potential backdoor is in the backdoor abstraction of the binary software code and not in the backdoor abstraction of the source code; and in response to determining that the potential backdoor is in the backdoor abstraction of the binary software code and not in the backdoor abstraction of the source code, determining that the risk is inserted during the generation of the binary software code.
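Claims 2 to 4 describe building the backdoor abstraction from strings found in the binary software code, such as URLs, hardcoded user credentials, and high entropy strings. The claims do not say how such strings are detected, so the following Python is only a sketch under stated assumptions. Printable ASCII runs stand in for string extraction, and the regular expressions, the 16-character minimum length, and the 4.0 bits-per-character entropy threshold are all illustrative choices. The names `PotentialBackdoor` and `build_backdoor_abstraction` are hypothetical. The context check of claim 4 (string location or associated library call) is not modeled.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass

# Illustrative assumptions only; the disclosure gives no concrete patterns or values.
PRINTABLE_RUN_RE = re.compile(rb"[\x20-\x7e]{6,}")  # printable ASCII runs of 6+ bytes
URL_RE = re.compile(r"https?://\S+")
CREDENTIAL_RE = re.compile(r"(?i)\b(?:password|passwd|pwd|user(?:name)?)\s*[=:]\s*\S+")
MIN_HIGH_ENTROPY_LEN = 16
ENTROPY_THRESHOLD = 4.0  # bits per character


@dataclass(frozen=True)
class PotentialBackdoor:
    kind: str    # "url", "credential", or "high_entropy"
    value: str   # the string as found in the binary
    offset: int  # byte offset of the enclosing printable run


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(text).values())


def classify(text: str) -> tuple[str, str] | None:
    """Map a candidate string to (kind, value), or None if it looks benign."""
    if m := URL_RE.search(text):
        return "url", m.group()
    if CREDENTIAL_RE.search(text):
        return "credential", text
    if len(text) >= MIN_HIGH_ENTROPY_LEN and shannon_entropy(text) >= ENTROPY_THRESHOLD:
        return "high_entropy", text
    return None


def build_backdoor_abstraction(binary: bytes) -> list[PotentialBackdoor]:
    """Collect strings in the binary that may indicate a backdoor."""
    findings = []
    for run in PRINTABLE_RUN_RE.finditer(binary):
        result = classify(run.group().decode("ascii"))
        if result is not None:
            kind, value = result
            findings.append(PotentialBackdoor(kind, value, run.start()))
    return findings
```

Entropy alone also flags compressed data and legitimate keys, so a context check like the one in claim 4 (string location or associated library call) is the kind of refinement that would narrow such findings.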
Description
CLAIM OF PRIORITY
This application claims priority under 35 USC § 120 to U.S. patent application Ser. No. 17/736,417, filed on May 4, 2022, entitled “DETECTING BACKDOORS IN BINARY SOFTWARE CODE”, the entire contents of which are hereby incorporated by reference.
TECHNICAL FIELD
The present disclosure relates to detecting backdoors in binary software code.
BACKGROUND
In some cases, software services can be provided by executable binary software code. The binary software code is computer software in a binary format. The computer software can be application software, system software (e.g., an operating system or a device driver), or a component thereof. The binary software code can also be referred to as binary code or executable code.
DESCRIPTION OF DRAWINGS
FIG. 1 is a schematic diagram showing an example communication system that detects backdoors in binary software code, according to an implementation.
FIG. 2 is a flowchart showing an example method for detecting backdoors in binary software code, according to an implementation.
FIG. 3 is a high-level architecture block diagram of a computer, according to an implementation.
Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
In some implementations, an attacker can insert malicious code on a supplier's build server(s) and subvert the build process to insert a backdoor into the product software release. This type of attack does not target the input source code; instead, the malicious code is injected into the build process at the later stages of compilation and build. As a result, the source code may appear normal, while the binary software code generated by the build process may include the backdoor. This makes the backdoor hard to detect in the released product that includes the binary software code.
In some cases, a server can inspect the binary software code for code segments that indicate potential backdoors. The server can collect and classify these potential backdoors in a backdoor abstraction. For each potential backdoor in the backdoor abstraction, the server can parse the source code and determine whether any portion of the source code corresponds to the potential backdoor. If no source code corresponds to the potential backdoor, the server can flag this mismatch so that the potential backdoor can be investigated to determine whether the build process has been compromised. For example, if the binary analysis finds a Hypertext Transfer Protocol Secure (HTTPS) connection to a given Uniform Resource Locator (URL), the source code is searched for corresponding legitimate code that uses an HTTPS application programming interface (API) and the given URL. FIGS. 1-3 and associated descriptions provide additional details of these implementations.
Techniques described herein produce one or more technical effects. In some cases, the techniques can improve the efficiency of identifying backdoors in binary software code. For example, the techniques do not attempt to recreate the original source code from the binary software code and then compare the recreated source code with the original source code wholesale, an approach that is likely time-consuming and can produce a high number of false positives. Instead, the techniques leverage binary inspection and knowledge of the characteristics of backdoors to classify the potential backdoors as an abstraction, and then backtrack to the source code to determine whether any portion of the source code corresponds to the potential backdoors. This increases the speed of identifying backdoors in binary software code and reduces the number of false positives. In some cases, the techniques can enhance the security of the binary software code by identifying backdoors inserted during the build process.
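The backtracking step, in which each potential backdoor in the abstraction is checked against the source code and unmatched findings are flagged, can be illustrated with a short Python sketch. It assumes the backdoor abstraction is already available as a list of findings and that the source tree is loaded as a mapping from file path to text. A finding counts as matched only if its exact string appears in some source file; following the HTTPS example above, a URL finding additionally requires one of a hypothetical list of HTTPS API identifiers in the same file. `HTTPS_API_HINTS`, `has_source_match`, and `assess_backdoor_risk` are illustrative assumptions, not part of the disclosure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PotentialBackdoor:
    kind: str   # "url", "credential", or "high_entropy" (illustrative categories)
    value: str  # string recovered from the binary software code


# Hypothetical identifiers treated as evidence of HTTPS API usage in a source file.
HTTPS_API_HINTS = ("HttpsURLConnection", "curl_easy_setopt", "SSL_connect", "requests.get")


def has_source_match(item: PotentialBackdoor, sources: dict[str, str]) -> bool:
    """Return True if some source file plausibly accounts for the finding."""
    for text in sources.values():
        if item.value not in text:
            continue
        if item.kind != "url":
            return True
        # A URL also needs HTTPS-related API usage in the same file.
        if any(hint in text for hint in HTTPS_API_HINTS):
            return True
    return False


def assess_backdoor_risk(
    abstraction: list[PotentialBackdoor], sources: dict[str, str]
) -> list[PotentialBackdoor]:
    """Return findings with no corresponding source code.

    Each returned finding is a candidate risk inserted while the binary
    was being generated, and should be investigated.
    """
    return [item for item in abstraction if not has_source_match(item, sources)]


if __name__ == "__main__":
    sources = {"net.c": 'curl_easy_setopt(h, CURLOPT_URL, "https://api.example.com/v1");'}
    abstraction = [
        PotentialBackdoor("url", "https://api.example.com/v1"),
        PotentialBackdoor("url", "https://evil.example.net/c2"),
    ]
    for risk in assess_backdoor_risk(abstraction, sources):
        print(f"unmatched {risk.kind}: {risk.value}")
```

Plain substring matching misses strings that the source assembles at run time (for example, through concatenation or encoding), so a practical tool would likely need language-aware parsing; the sketch only shows the shape of the mismatch check.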
If a backdoor is inserted during the build process, the backdoor is included in the binary software code but not in the source code, so checking the source code alone cannot identify such a backdoor. By identifying potential backdoors in the binary software code, the techniques can detect backdoors inserted at the later stages of the build process.
FIG. 1 is a schematic diagram showing an example communication system 100 that provides data communications for detecting backdoors in binary software code, according to an implementation. At a high level, the example communication system 100 includes a software developer device 102 that is communicatively coupled with a software service platform 106 and a client device 108 over a network 110. In some cases, the software developer device 102 can be part of a software developer environment that includes multiple devices, servers, and cloud computing platforms. The software developer device 102 represents an application, a set of applications, software, software modules, hardware, or any combination thereof, that can be configured to submit the source code and/or the bina