US-12625957-B2 - Known deployed file metadata repository and analysis engine
Abstract
A known-deployed file metadata repository (KDFMR) and analysis engine enumerates reference lists of files stored on a software delivery point (SDP) and compares the enumerated list of files and associated metadata to previously stored values in the KDFMR. If newly stored or modified files are identified, the analysis engine acquires the files from the SDP. Each file is analyzed to determine whether the file is an atomic file or a container file and metadata is generated or extracted. Each file stored in a container file is recursively extracted and analyzed, where metadata is generated for each extracted file and each container file. The KDFMR periodically analyzes the files stored on the SDP for differences to maintain the currency of the KDFMR data with respect to files stored on the SDP. Storage or modification of files on the SDP triggers analysis of the associated file. KDFMR data is updated with metadata determined based on sandbox detonation of files and/or identified artifacts of known-deployed files.
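The enumerate-and-compare step described above can be illustrated with a minimal sketch. It assumes the SDP listing and the previously stored KDFMR values can both be reduced to simple per-path records; the `FileRecord` type, its fields, and the `diff_enumeration` function are hypothetical names introduced only for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file record: logical path plus basic metadata."""
    path: str      # logical path on the SDP
    size: int      # size in bytes
    mtime: float   # last-modified timestamp


def diff_enumeration(enumerated, stored):
    """Compare an enumerated SDP listing with previously stored values.

    `enumerated` is an iterable of FileRecord; `stored` maps path -> FileRecord.
    Returns (new_paths, modified_paths), each sorted.
    """
    new, modified = [], []
    for rec in enumerated:
        prior = stored.get(rec.path)
        if prior is None:
            # Path not seen before: a newly stored file.
            new.append(rec.path)
        elif (prior.size, prior.mtime) != (rec.size, rec.mtime):
            # Path known, but metadata differs: a modified file.
            modified.append(rec.path)
    return sorted(new), sorted(modified)
```

Only the paths returned by such a comparison would need to be acquired from the SDP for further analysis; unchanged files are skipped.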
Inventors
- Dan E. Summers
- Jeffrey Texada
- Matthew E Kelly
- Steven DiMaria
Assignees
- BANK OF AMERICA CORPORATION
Dates
- Publication Date
- 20260512
- Application Date
- 20240219
Claims (20)
- 1. A known-deployed file metadata analysis server comprising: a processor; and non-transitory memory storing instructions that, when executed by the processor, causes the known-deployed file metadata analysis server to: enumerate, based on an indication received from a software distribution point (SDP) computing system that indicates at least one file has been created, available files stored on the SDP computing system, wherein the files stored on the SDP comprise an indication that the file is “known-deployed” and wherein the indication that the file is known-deployed comprises information identifying an associated file as being both approved and having been centrally-deployed from the SDP; compare enumerated files to logical paths associated with the SDP computing system to identify one or more new files; retrieve, from the SDP computing system via a network, the one or more new files; extract metadata from each of the one or more new files; identify a match of metadata of a file of the one or more new files and metadata stored in a data store comprising information stored of “known-good” files; enrich the matched metadata with an indication that the file is “known-deployed”, wherein the indication comprises artifact metadata only available during dynamic execution of a particular entry of an artifact by the known-deployed file metadata analysis server, and wherein the artifact metadata comprises an indication of the SDP and an indication that the file was only introduced to a host through methods associated with approved software distribution practices from the SDP and wherein artifact file metadata is enriched with labels to identify a functionality associated with the artifact; and trigger synchronization of stored metadata to an external computing system based on enrichment of the matched metadata; and cause, based on the synchronization, analysis of newly saved files by the SDP using synchronized metadata to identify potentially malicious content.
- 2. The known-deployed file metadata analysis server of claim 1, wherein the instructions further cause the known-deployed file metadata analysis server to: calculate a first cryptographic hash of a topmost container of a container file and a second cryptographic hash of an immediate second container adjacent the topmost container; and halt recursive file extraction and metadata generation for the container file based on an indication of a match between the first cryptographic hash or the second cryptographic hash to metadata stored in the data store.
- 3. The known-deployed file metadata analysis server of claim 1, wherein the instructions further cause the known-deployed file metadata analysis server to schedule analysis of the files stored on the SDP on a periodic basis.
- 4. The known-deployed file metadata analysis server of claim 1, wherein the instructions further cause the known-deployed file metadata analysis server to trigger analysis of the files stored on the SDP based on an indication that a modified file has been saved.
- 5. The known-deployed file metadata analysis server of claim 1, wherein the instructions further cause the known-deployed file metadata analysis server to trigger analysis of the files stored on the SDP based on an indication that a new file has been saved.
- 6. The known-deployed file metadata analysis server of claim 5, wherein the instructions further cause the known-deployed file metadata analysis server to trigger analysis of the new file based on the indication that the new file has been saved.
- 7. The known-deployed file metadata analysis server of claim 1, wherein the instructions further cause the known-deployed file metadata analysis server to enrich the file metadata stored in the data store with semantic labels to identify whether a file is known to be used for adversary purposes.
- 8. The known-deployed file metadata analysis server of claim 1, wherein the instructions further cause the known-deployed file metadata analysis server to enrich the file metadata stored in the data store with semantic labels to identify whether a file serves a specific purpose within an enterprise computing environment and wherein the specific purpose corresponds to a production network for support of release software products.
- 9. A method comprising: identifying, based on a signal indicating storage of a file a software distribution point (SDP) computing system, wherein the signal is automatically generated by the SDP computing system based on saving of at least one file, wherein the files stored on the SDP comprises an indication that the file is “known-deployed” that comprises information identifying an associated file as being approved and centrally-deployed from the SDP; retrieving, based on a comparison of enumerated files on the SDP computing system to logical paths associated with the SDP computing system, one or more new files; extracting, by a known deployed file metadata analysis engine, metadata from each of the one or more new files; identifying, by the known deployed file metadata analysis engine, a match of metadata of the one or more new files and metadata stored in a data store comprising information stored of “known-good” files that comprises an indication of one or more of an associated internal development group and a trusted vendor providing “safe” applications; enriching, by the known deployed file metadata analysis engine, the matched metadata with an indication that the file is “known-deployed” and comprising artifact metadata only available during dynamic execution of a particular entry of an artifact by the known deployed file metadata analysis engine and comprising an indication of the SDP and that the file was introduced to a host through methods associated with approved software distribution practices and from the SDP; and triggering, by the known deployed file metadata analysis engine, synchronization of stored metadata to an external computing system based on the enrichment of the matched metadata; and causing, based on the synchronization, analysis of newly saved files by the SDP using synchronized metadata to identify potentially malicious content.
- 10. The method of claim 9, further comprising recursively extracting metadata from each file stored in a container file of the one or more new files by: calculating a first cryptographic hash of a topmost container of the container file and a second cryptographic hash of an immediate second container adjacent the topmost container; and halting, by the known deployed file metadata analysis engine, recursive file extraction and metadata generation for the container file based on an indication of a match between the first cryptographic hash or the second cryptographic hash to metadata stored in the data store.
- 11. The method of claim 9, comprising scheduling, by the known deployed file metadata analysis engine, analysis of the files stored on the SDP on a periodic basis.
- 12. The method of claim 9, comprising triggering, by the known deployed file metadata analysis engine, analysis of the files stored on the SDP based on an indication that a modified file has been saved.
- 13. The method of claim 9, comprising triggering, by the known deployed file metadata analysis engine, analysis of the files stored on the SDP based on an indication that a new file has been saved.
- 14. The method of claim 9, comprising: receiving, via a computing network, an electronic message comprising a hash value of a file; identifying, by the known deployed file metadata analysis engine based on receipt of the electronic message, metadata associated with files stored in a known-deployed file data store, wherein the files correspond to a match of the received hash value; sending, via the computing network by the known deployed file metadata analysis engine and based on an identified match, a first electronic response message comprising an indication of a confirmed match to the received hash value; and sending, via the computing network by the known deployed file metadata analysis engine and based on a failure to identify a match to the received hash value, a second electronic response message comprising an indication of a failure to match the received hash value.
- 15. The method of claim 9, comprising enriching the file metadata stored in the data store with semantic labels to identify whether a file is known to be used for adversary purposes.
- 16. The method of claim 9, comprising enriching the file metadata stored in the data store with semantic labels to identify whether a file serves a specific purpose within an enterprise computing environment corresponding to one of a release purpose or a development purpose.
- 17. A system comprising: a software distribution point (SDP) computing system comprising memory storing a plurality of enumerated files; a known-deployed file metadata analysis server comprising: a processor; and non-transitory memory storing instructions that, when executed by the processor, causes the known-deployed file metadata analysis server to: compare enumerated files to logical paths associated with the SDP computing system to identify one or more new files; extract metadata from each of the one or more new files, wherein the files stored on the SDP comprises an indication that the file is “known-deployed” that comprises information identifying an associated file as being approved and centrally-deployed from the SDP; identify a match of metadata of a file of the one or more new files and metadata stored in a data store comprising information stored of “known-good” files; enrich the matched metadata with an indication that the file is “known-deployed”, wherein the indication comprises artifact metadata only available during dynamic execution of a particular entry of an artifact by a known deployed file metadata analysis engine and comprising an indication of the SDP and that the file was introduced to a host through methods associated with approved software distribution practices; and trigger synchronization of stored metadata to an external computing system based on the enrichment of the matched metadata; and cause, based on the synchronization, analysis of newly saved files by the SDP using synchronized metadata to identify potentially malicious content.
- 18. The system of claim 17, wherein the instructions further cause the known-deployed file metadata analysis server to: calculate a first cryptographic hash of a topmost container of a container file and a second cryptographic hash of an immediate second container adjacent the topmost container; and halt recursive file extraction and metadata generation for the container file based on an indication of a match between the first cryptographic hash or the second cryptographic hash to metadata stored in the data store.
- 19. The system of claim 17, wherein the instructions further cause the known-deployed file metadata analysis server to trigger analysis of the files stored on the SDP based on an indication that a modified file has been saved.
- 20. The system of claim 17, wherein the instructions further cause the known-deployed file metadata analysis server to enrich the file metadata stored in the data store with semantic labels to identify whether a file is known to be used for adversary purposes.
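The recursive extraction with hash-based halting recited in claims 2, 10, and 18 can be illustrated with a minimal sketch. Several assumptions are made here and none of them come from the claims: ZIP archives stand in for container files, SHA-256 stands in for the cryptographic hash, a plain set of hex digests stands in for the data store, and the names `sha256_hex`, `analyze_file`, and `max_depth` are hypothetical. The sketch also simplifies the claimed behavior by checking the hash at every nesting level rather than only for the topmost container and the container immediately below it.

```python
import hashlib
import io
import zipfile


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def analyze_file(name, data, known_hashes, records, depth=0, max_depth=8):
    """Generate metadata for `data`, recursing into ZIP containers.

    Recursion halts for any file whose hash is already in `known_hashes`
    (a stand-in for metadata already held in the data store).
    `max_depth` is an assumed guard against deeply nested archives.
    """
    digest = sha256_hex(data)
    if digest in known_hashes:
        # Already catalogued: skip metadata generation and do not descend.
        return
    is_container = zipfile.is_zipfile(io.BytesIO(data))
    records.append({
        "name": name,
        "sha256": digest,
        "size": len(data),
        "container": is_container,
    })
    if not is_container or depth >= max_depth:
        return
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # "!" separates the container name from the member path.
            analyze_file(f"{name}!{info.filename}", zf.read(info),
                         known_hashes, records, depth + 1, max_depth)
```

Halting on a known hash avoids re-extracting and re-hashing every member of a container that has already been analyzed, which is the practical motivation for checking container hashes before descending.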
Description
CROSS REFERENCE TO RELATED APPLICATION(S)
This application is a continuation of and claims priority to U.S. application Ser. No. 17/319,638 entitled “Known-Deployed File Metadata Repository and Analysis Engine” filed on May 13, 2021, which is incorporated by reference in its entirety.
BACKGROUND
Attackers continually adapt their methods in an attempt to keep ahead of enterprise network security measures. Because sophisticated means of intercepting encrypted files are currently available, perpetrators may focus on alternative ways of avoiding data security. Often, cyber-security detection and response processes involve differentiating permissible, expected, or otherwise benign behaviors from various observed behaviors that are generated by an adversary and/or by an insider threat. On a host endpoint computing device under review, this may be difficult. Often, this may require security systems to determine which artifacts (e.g., binary files, scripts, and the like) have been introduced to the system by an outside malicious user, or which artifacts have been leveraged for malign intent, such as by using so-called “Living off the Land Binaries and Scripts” (LOLBAS), against a background of pre-existing and non-relevant artifacts.
SUMMARY
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosure. The summary is not an extensive overview of the disclosure. It is neither intended to identify key or critical elements of the disclosure nor to delineate the scope of the disclosure. The following summary merely presents some concepts of the disclosure in a simplified form as a prelude to the description below.
Aspects of the disclosure provide effective, efficient, scalable, and convenient technical solutions that address and overcome the technical problems associated with identification and deployment of files that are known to be approved and centrally deployed within an enterprise computing environment. A known-deployed file metadata repository (KDFMR) and analysis engine enumerates reference lists of files stored on a software delivery point (SDP) and compares the enumerated list of files and associated metadata to previously stored values in the KDFMR. If newly stored or modified files are identified, the analysis engine acquires the files from the SDP. Each file is analyzed to determine whether the file is an atomic file or a container file and metadata is generated or extracted. Each file stored in a container file is recursively extracted and analyzed, where metadata is generated for each extracted file and each container file. The KDFMR periodically analyzes the files stored on the SDP for differences to maintain the currency of the KDFMR data with respect to files stored on the SDP. Storage or modification of files on the SDP triggers analysis of the associated file. KDFMR data is updated with metadata determined based on sandbox detonation of analyzed files. These features, along with many others, are discussed in greater detail below.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure is illustrated by way of example, and not by way of limitation, in the accompanying figures, in which like reference numerals indicate similar elements and in which:
FIG. 1 shows an illustrative computing environment implementing a known-deployed file repository and analysis system in accordance with one or more aspects described herein;
FIG. 2 shows an illustrative method for generation and use of a known-deployed file repository in accordance with one or more aspects described herein;
FIG. 3 shows an illustrative operating environment in which various aspects of the disclosure may be implemented in accordance with one or more aspects described herein; and
FIG. 4 shows an illustrative block diagram of workstations and servers that may be used to implement the processes and functions of certain aspects of the present disclosure in accordance with one or more aspects described herein.
DETAILED DESCRIPTION
In the following description of various illustrative embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which is shown, by way of illustration, various embodiments in which aspects of the disclosure may be practiced. It is to be understood that other embodiments may be utilized, and structural and functional modifications may be made, without departing from the scope of the present disclosure. It is noted that various connections between elements are discussed in the following description. These connections are general and, unless specified otherwise, may be direct or indirect, wired or wireless, and the specification is not intended to be limiting in this respect. As used throughout this disclosure, computer-executable “software and data” can include one or more: algorithms, applications, application program interfaces (APIs), attachments, big data, da