US-20260127143-A1 - RAPID DATABASE SIZING USING METADATA

US20260127143A1US 20260127143 A1US20260127143 A1US 20260127143A1US-20260127143-A1

Abstract

Described herein are systems and techniques to facilitate a rapid database system that automates estimating a database footprint for a database implemented across multiple environments. The system may receive inputs from a user and/or automatically generate inputs. The system may determine database size using various logic-based extraction methods. For instance, operational database resources may be sized using logic-based extraction to search a database catalog of a production database, non-production databases may be sized using logic-based extraction to search database catalog metadata, development database resources may be sized using fuzzy logic queries generated based on production database information (e.g., since development database names are likely to resemble, but be different than, production database names), and associated mirror databases are also determined and measured during these steps. Outputs from the queries are used to generate a database footprint estimate and cost information, which may be transmitted to a user device for display.

Inventors

Eugene Scray

Assignees

STATE FARM MUTUAL AUTOMOBILE INSURANCE COMPANY

Dates

Publication Date: 20260507
Application Date: 20240229

Claims (20)

1 . A method of performing database sizing across dispersed environments, comprising: identifying, by a processor, a database type associated with one or more databases in the dispersed environments; determining, by the processor, first data indicating a first average database size of the database type in a first environment at a first location, wherein the first environment comprises one or more first databases including first set of mirrors, wherein at least one mirror within the first set of mirrors comprises a copy of a database of the database type; determining, by the processor, second data indicating a second average database size of the database type in a second environment at a second location, the second environment being different from the first environment, wherein the second environment comprises one or more second databases including a second set of mirrors; determining, by the processor, third data associated with the database type for one or more third databases in a third environment at a third location; determining, by the processor and based in part on the first data, the second data, and the third data, fourth data identifying redundant copies of databases of the database type across the first environment, the second environment, and the third environment, the redundant copies including the at least one mirror; determining, by the processor and based in part on the fourth data, a ratio between physical space and unused space for individual databases of the database type; generating, by the processor and based on the first data, the second data, the third data, the fourth data, and the ratio, result data comprising: a database footprint estimate indicative of a size of computer memory occupied by the database type across the first environment, the second environment, and the third environment, and instructions to generate a user interface; and transmitting, by the processor, the result data to a user device for display via the user interface.
2 . The method of claim 1 , wherein determining the first data comprises: querying, by the processor, the one or more first databases for a database size; receiving, at the processor and from one or more database catalogs associated with each of the one or more first databases, respective database sizes for each of the one or more first databases; and determining, by the processor and based on the respective database sizes, the first average database size.
3 . The method of claim 1 , wherein determining the second data comprises: querying, by the processor, the one or more second databases for a database size; receiving, at the processor and from one or more database catalogs associated with each of the one or more second databases, respective database sizes for each of the one or more second databases; and determining, by the processor and based on the respective database sizes, the second average database size.
4 . The method of claim 1 , wherein determining the third data comprises: querying, by the processor and using fuzzy logic, the one or more third databases for respective database sizes; receiving, at the processor and from one or more database catalogs associated with each of the one or more third databases, the respective database sizes; and generating, by the processor and based on the respective database sizes, a third average database size.
5 . The method of claim 1 , wherein determining the fourth data comprises: querying, by the processor, the first environment for first metadata indicating first redundancy within the first environment; receiving, at the processor and from the first environment, the first metadata; querying, by the processor, the second environment for second metadata indicating second redundancy within the second environment; receiving, at the processor and from the second environment, the second metadata; querying, by the processor, the third environment for third metadata indicating third redundancy within the third environment; receiving, at the processor and from the third environment, the third metadata; and determining, by the processor and based on aggregating the first metadata, the second metadata, and the third metadata, the fourth data.
6 . The method of claim 1 , wherein determining the fourth data comprises: querying, by the processor, a fourth environment for metadata indicating redundancy of the database type across the first environment, the second environment, and the third environment; receiving, at the processor and from the fourth environment, the metadata; and determining, by the processor and based on the metadata, the fourth data.
7 . The method of claim 1 , wherein generating the result data comprises correlating, by the processor and based on using a composite algorithm, the first data, the second data, the third data, the fourth data, and the ratio.
8 . The method of claim 1 , wherein the first environment, the second environment, and the third environment each correspond to different geographical locations.
9 . The method of claim 1 , wherein the database type comprises one or more of a name associated with a particular database or a logical partition associated with one or more environments.
10 . (canceled)
11 . A non-transitory computer-readable medium comprising instructions that, when executed by one or more processors of a database system, cause the one or more processors to: identify an input comprising an indication of a database type corresponding to a target database or a target logical partition across a plurality of environments; determine, based in part on the input, first data indicating a first average database size of the database type in a first environment of the plurality of environments, wherein the first environment comprises one or more first databases including a first set of mirrors; determine second data indicating a second average database size of the database type in a second environment of the plurality of environments, wherein the second environment comprises one or more second databases including a second set of mirrors; determine third data associated with the database type for one or more third databases in a third environment of the plurality of environments; determine, based in part on the first data, the second data, and the third data, fourth data associated with a fourth environment, the fourth environment comprising a mainframe environment that includes the first environment, the second environment, and the third environment, wherein the fourth data indicates a redundancy associated with the database type in the fourth environment; generate, based on the first data, the second data, the third data, and the fourth data, result data comprising: a database footprint estimate indicative of a size of computer memory occupied by the database type across the plurality of environments, and instructions to generate a user interface; and transmit the result data to a user device for display via the user interface.
12 . The non-transitory computer-readable medium of claim 11 , wherein the fourth data further comprises a ratio between physical space and unused space for each database of the database type across the first environment, the second environment, and the third environment.
13 . The non-transitory computer-readable medium of claim 11 , wherein the mainframe environment includes metadata associated with a plurality of databases, the plurality of databases including a plurality of mirrors, including the first set of mirrors and the second set of mirrors.
14 . The non-transitory computer-readable medium of claim 11 , wherein generating the result data comprises correlating, based on using one or more algorithms, the first data, the second data, the third data, and the fourth data.
15 . The non-transitory computer-readable medium of claim 11 , wherein the result data further comprises one or more of: an indication of a reduction in the database footprint estimate over a previous database footprint size, a potential cost savings value, or a cost associated with maintaining the database type across the plurality of environments.
16 . A database storage system comprising: one or more processors; and a non-transitory memory storing computer-executable instructions that, when executed, cause the one or more processors to perform operations comprising: identify a database type associated with one or more databases of the database storage system; determine first data indicating a first average database size of a database type in a first environment at a first location, wherein the first environment comprises one or more first databases including a first set of mirrors, wherein at least one mirror within the first set of mirrors comprises a copy of a database of the database type; determine second data including a second average database size of the database type in a second environment at a second location, wherein the second environment comprises one or more second databases comprising a second set of mirrors; determine third data associated with the database type for one or more third databases in a third environment at a third location; determine, based in part on the first data, the second data, and the third data, fourth data identifying redundant copies of databases of the database type across the first environment, the second environment, and the third environment, the redundant copies including the at least one mirror; determine, based in part on the fourth data, a ratio between physical space and unused space for individual databases of the database type across the first environment, the second environment, and the third environment; generate, based on the first data, the second data, the third data, the fourth data, and the ratio, result data comprising a database footprint estimate, the database footprint estimate indicative of a size of computer memory occupied by the database type across the first environment, the second environment, and the third environment; and transmit the result data to a user device for display via a user interface.
17 . The database storage system of claim 16 , wherein the operations to determine the first data further comprise: querying the one or more first databases for a database size; receiving, from one or more database catalogs associated with each of the one or more first databases, respective database sizes for each of the one or more first databases; and generating, based on the respective database sizes, the first average database size.
18 . (canceled)
19 . (canceled)
20 . A system for generating a database footprint estimate, the system comprising: means for determining first data indicating a first average size of a database type in a first environment at a first location, wherein the first environment comprises one or more first databases including a first set of mirrors, wherein at least one mirror within the first set of mirrors comprises a copy of a database of the database type; means for determining second data indicating a second average database size of the database type in a second environment at a second location, the second environment being different from the first environment, wherein the second environment comprises one or more second databases including a second set of mirrors; means for determining third data associated with the database type for one or more third databases in a third environment at a third location; means for determining fourth data associated with a fourth environment comprising a mainframe environment that includes the first environment, the second environment, and the third environment, the fourth data identifying redundant copies of the database type; means for generating, based on one or more of the first data, the second data, the third data, and the fourth data, result data comprising the database footprint estimate. wherein the database footprint estimate is indicative of a size of computer memory occupied by the database type in the fourth environment; and means for transmitting the result data to a user device for display via a user interface.

Description

BACKGROUND Large organizations may maintain many databases implemented across multiple different physical computing resources. There are often many different copies and versions of such databases maintained in testing environments, production environments, and non-production environments for a variety of purposes, such as redundancy, backup, development, testing, and production use. As an example, a large organization may have a particular database maintained in a production environment with versions replicated across multiple production and non-production environments. Within each environment, mirrors (e.g., copies) of the particular database can be maintained for various purposes. Accordingly, across all the environments, thousands of instances of the particular database can be maintained. Therefore, determining the true footprint of a particular database (e.g., quantity and cost of computing resources used by the database, including all related types and versions of that database) can be a complex task involving many different factors. Currently, estimating a database footprint is a largely manual process. In particular, in current techniques, a user must manually estimate values associated with allocated, used, and unused resources in various environments. The user must then estimate a total database footprint then estimated by a user. This is not only time and labor intensive, but often results in an inaccurate database footprint estimate. Inaccurate database footprint estimates may impact computing resource management and planning, ultimately affecting production services. For instance, inaccurate database footprints may result in an organization paying for space that they do not need. Inaccurate database footprint estimates may also result in a company purchasing too little of space, resulting in reliability problems. Further, inaccurate database footprint estimates result in companies lacking an accurate cost accounting of database use, leading to inefficient and ineffective infrastructure management. Moreover, while some techniques enable other data to be pulled (e.g., by scanning tables), doing so may take an extended amount of time (e.g., days, weeks, months, etc.). This is not only labor intensive and costly, but current techniques do not consider mirrors, unused and unallocated space, and more, and thus, still result in inaccurate database footprint estimates. The examples of the present disclosure are directed to overcoming these deficiencies and providing a way to accurately and efficiently size a complex database system. SUMMARY Techniques described herein are directed to providing a streamlined and accurate determination of a true footprint of a complex database system. Techniques described herein implement a rapid database sizing system that automates estimating a database footprint for a database implemented across multiple environments and/or in multiple geographically dispersed computing resources. The system generates a database sizing interface with various controls and interface elements that accept parameters from a user to be used in identifying and sizing database resources. Automated processing is then executed based on these parameters to perform the various measurements required to determine the sizing of various types of database resources. These processes will vary depending on the type of resource being sized. For example, operational database resources may be sized using logic-based extraction to search a database catalog of a production database administration service, while non-production databases may be sized using logic-based extraction to search database catalog metadata (i.e., outside of a database administration service). Development database resources may be sized using fuzzy logic queries generated based on production database information (e.g., since development database names are likely to resemble, but be different than, production database names). Associated mirror databases are also determined and measured during these steps. Further processes are used to determine a ratio of the development database sizes to the production database sizes and allocated space to used space. Based on the determined sizing data, an algorithm that integrates a pricing parameter (may be provided in the initial user-provided parameters) may be executed to determine a cost of maintaining the database, including a cost per unit of data storage resources. The resulting database footprint data may be provided to the user on the same or an updated interface. The algorithm used to perform the footprint calculation may vary and the inputs may be weighted or otherwise adjusted based on various criteria. For example, the techniques described herein may relate to a method for determining, by a processor of a database storage system, first data comprising an average size of a database type associated with one or more first databases in a first environment; determining, by the processor, secon