US-12625779-B2 - Automated reconstruction and attribution of data modifications
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for automated data modification reconstruction and attribution. One of the methods includes determining two or more net modifications between a first backup and a second backup; attributing, for each of the two or more net modifications, the net modification to an entity from a plurality of entities; determining, for an event of interest and using first data that indicates the net modifications, a likelihood that a modification during the event of interest is attributable to a first entity; determining whether the likelihood satisfies a likelihood criterion; and performing, in response to determining that the likelihood satisfies the likelihood criterion, an action for the event of interest using second information for the first entity and the modification.
Inventors
- Eoghan Casey
- Jason K. S. Choy
Assignees
- SALESFORCE, INC.
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2023-09-28
Claims (20)
- 1. A method comprising: determining, using a first backup of information that was stored in a storage system during a first time period and a second backup of information that was stored in the storage system during a second time period, two or more net modifications between the first backup and the second backup; attributing, for each of the two or more net modifications, the net modification to an entity from a plurality of entities and that made a most recent change reflected in the net modification, wherein the attributing is performed based on a schema indicating which entities of the plurality of entities have access to one or more accounts and one or more sets of information stored in the storage system; determining, for an event of interest and using first data that indicates the net modifications attributed to each of the corresponding entities, a likelihood that a modification during the event of interest is attributable to a first entity from the plurality of entities, wherein the likelihood that the modification is attributable to the first entity is calculated based on a ratio of contributing and detracting attribution factors corresponding to the first entity, and wherein the event of interest comprises an access to information in the storage system; determining whether the likelihood satisfies a likelihood criterion; and performing, in response to determining that the likelihood satisfies the likelihood criterion, an action for the event of interest using the second backup of information for the first entity and the modification.
- 2. The method of claim 1, wherein determining the likelihood comprises: detecting, from the plurality of entities and using the first data that indicates the net modifications attributed to each of the corresponding entities, a subset of entities that each made at least one modification from the two or more net modifications between the first backup and the second backup; and determining, for two or more entities in the subset of entities, a corresponding likelihood that the modification during the event of interest is attributable to the corresponding entity.
- 3. The method of claim 2, wherein the likelihood criterion comprises a likelihood threshold or a highest likelihood from the corresponding likelihoods for the subset of entities.
- 4. The method of claim 2, wherein detecting the subset of entities comprises detecting the subset of entities that each likely made at least one modification during the event of interest.
- 5. The method of claim 1, further comprising: determining, for the event of interest and using the first data that indicates the net modifications attributed to each of the corresponding entities, a second likelihood that a second modification during the event of interest is attributable to a second entity from the plurality of entities; determining whether the second likelihood satisfies the likelihood criterion; and using a result of the determination whether the second likelihood satisfies the likelihood criterion, determining whether to perform a second action for the event of interest using third data for the second entity and the second modification.
- 6. The method of claim 1, wherein performing the action comprises: restoring, to a current version of the storage system and using the second backup of information for the first entity and the modification, information that was modified by the modification.
- 7. The method of claim 1, comprising: determining, for each of one or more entity attribution factors, whether the corresponding factor applies to the likelihood that the modification is attributable to the first entity, wherein determining the likelihood that the modification during the event of interest is attributable to the first entity from the plurality of entities uses a result of the determination, for each of the one or more entity attribution factors, whether the corresponding factor applies to the likelihood that the modification is attributable to the first entity.
- 8. The method of claim 7, wherein determining whether the corresponding factor applies includes: determining a value that represents the one or more entity attribution factors; and determining the likelihood that the modification during the event of interest is attributable to the first entity from the plurality of entities using the value that represents the one or more entity attribution factors.
- 9. The method of claim 7, further comprising: selecting, using at least one of a system for the backup or a context for the backup and from a plurality of entity attribute factors, the one or more entity attribution factors.
- 10. The method of claim 7, wherein determining whether the corresponding factor applies includes: determining at least one of whether a single entity has been given permission to access an account, whether the account was accessed from a single device during the event of interest, or whether access to the account is limited to the single entity.
- 11. The method of claim 7, wherein determining, for each of the one or more entity attribution factors, whether the corresponding factor applies to the likelihood that the modification is attributable to the first entity comprises: determining whether one or more other modifications likely performed by the first entity satisfy a similarity threshold for the modification.
- 12. A non-transitory computer-readable medium having instructions stored thereon that are capable of causing a computing device to implement operations comprising: determining, using a first backup of information that was stored in a storage system during a first time period and a second backup of information that was stored in the storage system during a second time period, two or more net modifications between the first backup and the second backup; attributing, for each of the two or more net modifications, the net modification to an entity from a plurality of entities and that made a most recent change reflected in the net modification, wherein the attributing is performed based on a schema indicating which entities of the plurality of entities have access to one or more accounts and one or more sets of information stored in the storage system; determining, for an event of interest and using first data that indicates the net modifications attributed to each of the corresponding entities, a likelihood that a modification during the event of interest is attributable to a first entity from the plurality of entities, wherein the likelihood that the modification is attributable to the first entity is calculated based on a ratio of contributing and detracting attribution factors corresponding to the first entity, and wherein the event of interest comprises an access to information in the storage system; determining whether the likelihood satisfies a likelihood criterion; and performing, in response to determining that the likelihood satisfies the likelihood criterion, an action for the event of interest using second information for the first entity and the modification.
- 13. The non-transitory computer-readable medium of claim 12, wherein determining the likelihood comprises: detecting, from the plurality of entities and using the first data that indicates the net modifications attributed to each of the corresponding entities, a subset of entities that each made at least one modification from the two or more net modifications between the first backup and the second backup; and determining, for two or more entities in the subset of entities, a corresponding likelihood that the modification during the event of interest is attributable to the corresponding entity.
- 14. The non-transitory computer-readable medium of claim 13, wherein the likelihood criterion comprises a likelihood threshold or a highest likelihood from the corresponding likelihoods for the subset of entities.
- 15. The non-transitory computer-readable medium of claim 13, wherein detecting the subset of entities comprises detecting the subset of entities that each likely made at least one modification during the event of interest.
- 16. The non-transitory computer-readable medium of claim 12, wherein the operations further comprise: determining, for the event of interest and using the first data that indicates the net modifications attributed to each of the corresponding entities, a second likelihood that a second modification during the event of interest is attributable to a second entity from the plurality of entities; determining whether the second likelihood satisfies the likelihood criterion; and using a result of the determination whether the second likelihood satisfies the likelihood criterion, determining whether to perform a second action for the event of interest using third data for the second entity and the second modification.
- 17. The non-transitory computer-readable medium of claim 12, wherein performing the action comprises: restoring, to a current version of the storage system and using the second backup of information for the first entity and the modification, information that was modified by the modification.
- 18. The non-transitory computer-readable medium of claim 12, wherein the operations comprise: determining, for each of one or more entity attribution factors, whether the corresponding factor applies to the likelihood that the modification is attributable to the first entity, wherein determining the likelihood that the modification during the event of interest is attributable to the first entity from the plurality of entities uses a result of the determination, for each of the one or more entity attribution factors, whether the corresponding factor applies to the likelihood that the modification is attributable to the first entity.
- 19. The non-transitory computer-readable medium of claim 18, wherein determining whether the corresponding factor applies includes: determining a value that represents the one or more entity attribution factors; and determining the likelihood that the modification during the event of interest is attributable to the first entity from the plurality of entities using the value that represents the one or more entity attribution factors.
- 20. A system comprising one or more computers and one or more storage devices on which are stored instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: determining, using a first backup of information that was stored in a storage system during a first time period and a second backup of information that was stored in the storage system during a second time period, two or more net modifications between the first backup and the second backup; attributing, for each of the two or more net modifications, the net modification to an entity from a plurality of entities and that made a most recent change reflected in the net modification, wherein the attributing is performed based on a schema indicating which entities of the plurality of entities have access to one or more accounts and one or more sets of information stored in the storage system; determining, for an event of interest and using first data that indicates the net modifications attributed to each of the corresponding entities, a likelihood that a modification during the event of interest is attributable to a first entity from the plurality of entities, wherein the likelihood that the modification is attributable to the first entity is calculated based on a ratio of contributing and detracting attribution factors corresponding to the first entity, and wherein the event of interest comprises an access to information in the storage system; determining whether the likelihood satisfies a likelihood criterion; and performing, in response to determining that the likelihood satisfies the likelihood criterion, an action for the event of interest using second information for the first entity and the modification.
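For illustration only, the likelihood and criterion steps recited in the claims above might be sketched as follows. The claims state that the likelihood is based on a ratio of contributing and detracting attribution factors but do not give a formula; the form `contributing / (contributing + detracting)` used here is an assumption, as are the names `FactorTally`, `likelihood`, `satisfies_threshold`, and `most_likely_entity`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FactorTally:
    """Hypothetical count of attribution factors found for one entity."""
    contributing: int  # factors that support attributing the modification
    detracting: int    # factors that weigh against the attribution


def likelihood(tally: FactorTally) -> float:
    """Assumed ratio: contributing / (contributing + detracting)."""
    total = tally.contributing + tally.detracting
    if total == 0:
        return 0.0  # no evidence either way; treated as zero in this sketch
    return tally.contributing / total


def satisfies_threshold(value: float, threshold: float) -> bool:
    """One possible likelihood criterion: a fixed likelihood threshold."""
    return value >= threshold


def most_likely_entity(tallies: Dict[str, FactorTally]) -> Optional[str]:
    """Another possible criterion: the entity with the highest likelihood."""
    if not tallies:
        return None
    return max(tallies, key=lambda entity: likelihood(tallies[entity]))


if __name__ == "__main__":
    tallies = {
        "entity-a": FactorTally(contributing=3, detracting=1),
        "entity-b": FactorTally(contributing=1, detracting=3),
    }
    for name, tally in tallies.items():
        print(name, likelihood(tally))
    print("highest:", most_likely_entity(tallies))
```

The two helper functions correspond to the two criteria named in claims 3 and 14 (a likelihood threshold or a highest likelihood); an action such as a restore would follow only when the chosen criterion is met.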
Description
BACKGROUND
Various systems can create a backup of information stored on the system that captures a historical snapshot in time. For instance, a system that includes a database can back up data or metadata from the database. Similarly, a system that includes digital files can back up information from the system.
SUMMARY
Sometimes an event can impact information stored in electronic form over a network, e.g., in cloud environments. Some examples of events of interest can include data destruction or corruption, information theft or encryption, or a combination of these. These types of events can generally be referred to as information accesses, which can include information modifications. Although some of the examples described throughout this specification generally refer to information modifications, similar examples apply to other types of information access.
When an event occurs, the observed modifications (e.g., additions, alterations, or deletions) indicate what information was impacted, and can be used to at least partially reconstruct what happened, including whether the event was intentional or inadvertent and whether it directly or indirectly affected the information. In some examples, the observed modifications can be used to restore information affected by an event of interest as part of the reconstruction process.
A reconstruction system can reconstruct an event of interest. A reconstructed event can include details indicating one or more operations performed on information stored in memory, an attribution of one or more operations to one or more entities (e.g., users, businesses, automations, robots, animals), a score indicating a likelihood that the attribution of one or more operations to an entity is correct, or a combination of these. Operations can be performed by entities through accounts.
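As a minimal sketch, a reconstructed event of the kind described above could be represented as a list of attributed operations, each carrying a score. The class and field names (`Attribution`, `ReconstructedEvent`, `for_entity`) and the sample values are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribution:
    """One operation attributed to an entity (hypothetical layout)."""
    entity: str     # e.g., a user or an automation
    account: str    # the account through which the operation was performed
    operation: str  # e.g., "addition", "alteration", "deletion"
    score: float    # likelihood, from 0 to 1, that the attribution is correct


@dataclass
class ReconstructedEvent:
    """An event of interest together with its attributed operations."""
    name: str
    attributions: List[Attribution] = field(default_factory=list)

    def for_entity(self, entity: str) -> List[Attribution]:
        """Return the operations attributed to a single entity."""
        return [a for a in self.attributions if a.entity == entity]


if __name__ == "__main__":
    # Two entities acting through the same shared account.
    event = ReconstructedEvent(
        name="bulk deletion",
        attributions=[
            Attribution("entity-a", "shared-admin", "deletion", 0.8),
            Attribution("entity-b", "shared-admin", "alteration", 0.4),
        ],
    )
    print(event.for_entity("entity-a"))
```

Keeping the entity and the account as separate fields reflects the point that operations are performed by entities through accounts, so one account can appear under more than one entity.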
Because some accounts can be accessible by more than one entity, and some entities can switch accounts, the systems and techniques described in this specification can use contextual data to determine which of the one or more entities are responsible for a given modification.
The reconstruction system can extract details about modifications of information stored in cloud environments. A cloud environment can store information for one or more components, e.g., storage systems. Although discussed from the perspective of historical information for a single component, the reconstruction system can perform similar operations for multiple different components that use the same cloud environment, different cloud environments, or a combination of both. Although some examples are discussed with respect to a cloud environment, similar processes can be performed for other types of backup systems.
History details extracted from a cloud environment can include a specific modification to certain information and can identify one or more accounts or entities that performed operations causing the specific modification, e.g., by an identifier of an account.
In general, data, which can include metadata, about modifications of information stored in cloud environments do not necessarily provide a full context of what occurred leading up to one or more information modifications. For example, details about modifications of cloud information may not include whether a given entity switched to another account before modifying information. In some cases, given discrete backups of information, a system does not have complete knowledge of all individual modifications to the information that occurred between backups but rather only two different instances of the information, which could have been modified by multiple entities at multiple different times between the two backups, e.g., using corresponding accounts.
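The limitation of discrete backups can be illustrated with a small sketch. It assumes, purely for illustration, that each backup is a flat mapping from a record identifier to a value; the function name `net_modifications` and the sample records are hypothetical.

```python
from typing import Any, Dict, Tuple

# Hypothetical backup layout: record identifier -> stored value.
Snapshot = Dict[str, Any]
# (kind of change, value in first backup, value in second backup)
Change = Tuple[str, Any, Any]


def net_modifications(first: Snapshot, second: Snapshot) -> Dict[str, Change]:
    """Compare two backups and report only the net change per record."""
    changes: Dict[str, Change] = {}
    for key in first.keys() | second.keys():
        if key not in first:
            changes[key] = ("added", None, second[key])
        elif key not in second:
            changes[key] = ("deleted", first[key], None)
        elif first[key] != second[key]:
            changes[key] = ("altered", first[key], second[key])
    return changes


if __name__ == "__main__":
    first_backup = {"rec-1": "open", "rec-2": "open", "rec-4": "draft"}
    # Between the backups, rec-4 may have been changed and then changed
    # back; the second backup cannot show that.
    second_backup = {"rec-1": "closed", "rec-3": "open", "rec-4": "draft"}
    for key, change in sorted(net_modifications(first_backup, second_backup).items()):
        print(key, change)
```

In this sketch `rec-4` produces no net modification even if it was edited several times between the backups, which is why a comparison of two backups alone does not yield a complete timeline or identify who made each intermediate change.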
As a result, a system might not have a complete timeline of all modifications to the information. Given this uncertainty, an entity that caused an event of interest might not be evident from the stored data about cloud information modifications.
To more accurately determine a source of a modification or other information access, e.g., an entity that caused an event of interest, systems described in this specification can reconstruct an entity-centric sequence of modification events within an information storage system. The systems can use such entity-centric reconstruction to link one or more modifications to an entity. For instance, instead of a timeline that indicates modifications made to particular information at particular times, the reconstruction system can attribute modifications to specific entities with access to one or more accounts associated with the modifications. The reconstruction system can create a sequence of events that shows which modifications to information were likely made by which entity and the likelihood that each attribution is accurate. The reconstruction system can generate, for any particular entity or account, a sequence of events using one or more of: (i) past modifications of information performe