
US-12619546-B2 - System and method for data storage, transfer, synchronization, and security using automated model monitoring and training with a load-adaptive cache

US 12619546 B2

Abstract

A system and method for efficient data storage, transfer, synchronization, and security using automated model monitoring and training. The system analyzes test datasets to detect data drift, retraining encoding and decoding algorithms as needed. New data sourceblocks are created and assigned codewords, compiling an updated codebook for distribution to connected devices. A novel dyadic distribution subsystem simultaneously compresses and encrypts data by transforming input streams into a dyadic distribution. This process generates a compressed main data stream and a secondary stream of transformation information, which are combined into a secure output. The system includes a network device manager for optimizing codebook distribution based on device resource usage. Operating in both lossless and lossy modes, the system offers flexible, efficient, and secure data handling across various network configurations.
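As an illustrative sketch of the dyadic-distribution idea described above (this is a toy, not the patented algorithm; the function names and the rounding rule are assumptions of this example), the transformation can be thought of as rounding an empirical symbol distribution to the nearest powers of 1/2 and recording the per-symbol corrections, which stand in for the "secondary stream of transformation information":

```python
from collections import Counter
import math

def nearest_dyadic(p):
    """Round probability p to the nearest power of 1/2 (a 'dyadic' probability)."""
    if p <= 0:
        return 0.0
    k = max(1, round(-math.log2(p)))
    return 2.0 ** -k

def dyadic_profile(data: bytes):
    """Quantize the empirical symbol distribution toward a dyadic one.

    Returns (dyadic_probs, adjustments): the target dyadic probability per
    symbol, plus the correction each empirical frequency needs -- a toy
    stand-in for the secondary stream of transformation information.
    """
    counts = Counter(data)
    total = len(data)
    dyadic = {s: nearest_dyadic(c / total) for s, c in counts.items()}
    adjustments = {s: (c / total) - dyadic[s] for s, c in counts.items()}
    return dyadic, adjustments

probs, adj = dyadic_profile(b"abracadabra")
# 'a' occurs 5/11 of the time; the nearest dyadic probability is 1/2.
```

The appeal of a dyadic target is a known information-theoretic fact: a Huffman code over a distribution whose probabilities are all powers of 1/2 has zero redundancy, so forcing data toward such a distribution pairs naturally with compression.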

Inventors

  • Joshua Cooper
  • Grant Fickes
  • Charles Yeomans

Assignees

  • AtomBeam Technologies Inc.

Dates

Publication Date
2026-05-05
Application Date
2025-03-24

Claims (10)

  1. A computer system comprising: a hardware memory, wherein the computer system is configured to execute software instructions stored on non-transitory machine-readable storage media that: processes received data to generate a test dataset; analyzes the test dataset to determine a probability distribution; retrains encoding and decoding algorithms using the test dataset and the probability distribution; applies the retrained algorithms to generate one or more data units from the test dataset; associates each of the one or more data units with a corresponding identifier; and stores the one or more data units and their corresponding identifiers in an updated data structure; monitors system performance metrics and resource utilization across system components; analyzes temporal and spatial relationships in data access patterns; calculates likelihood scores for potential prefetch operations based on tracked percentages of previously prefetched data units that were actually used, wherein each likelihood score represents a probability that a corresponding data unit will be accessed; dynamically adjusts cache parameters based on current performance metrics; maintains multiple cache levels with different performance characteristics for different data types; and executes prefetch operations for data units whose likelihood scores exceed a threshold, wherein the prefetch operations are selected from the potential prefetch operations based on prediction models that incorporate the calculated likelihood scores and available system resources.
  2. The computer system of claim 1, wherein the computing device is a cloud-based computing device.
  3. The computer system of claim 1, wherein the computer system is further configured to: receive device data from at least one of a plurality of network connected devices; store received device data in a storage device operating on the memory; analyze device data to monitor network connected device resource consumption and time periods of device downtime; and forward device data to a codebook update engine.
  4. The computer system of claim 3, wherein the computer system is further configured to: receive updated codebooks; store updated codebooks in a cache; receive device data from the network device manager; and publish updated codebooks to network connected devices associated with the received device data.
  5. The computer system of claim 1, wherein the computer system is further configured to: analyze an input data stream to determine its properties; create a transformation matrix based on the properties of the input data; transform the input data into a dyadic distribution; generate a main data stream of transformed data and a secondary data stream of transformation information; compress the main data stream; and combine the compressed main data stream and the secondary data stream into an output stream.
  6. A computer-implemented method comprising the steps of: processing received input data to generate a test dataset; analyzing the test dataset to determine a probability distribution; retraining encoding and decoding algorithms using the test dataset and the probability distribution; applying the retrained algorithms to generate one or more data units from the test dataset; associating each of the one or more data units with a corresponding identifier; and storing the one or more data units and their corresponding identifiers in an updated data structure; monitoring system performance metrics and resource utilization across system components; analyzing temporal and spatial relationships in data access patterns; calculating likelihood scores for potential prefetch operations based on tracked percentages of previously prefetched data units that were actually used, wherein each likelihood score represents a probability that a corresponding data unit will be accessed within a predetermined time window; dynamically adjusting cache parameters based on current performance metrics; maintaining multiple cache levels with different performance characteristics for different data types; and executing prefetch operations for data units whose likelihood scores exceed a threshold, wherein the prefetch operations are selected from the potential prefetch operations based on prediction models that incorporate the calculated likelihood scores and adjust the threshold value based on available system resources.
  7. The method of claim 6, wherein the method is performed on a cloud-based computing device.
  8. The method of claim 6, further comprising the steps of: receiving device data from at least one of a plurality of network connected devices; storing received device data in a storage device; analyzing device data to monitor network connected device resource consumption and time periods of device downtime; and forwarding device data to a codebook update engine.
  9. The method of claim 8, further comprising the steps of: receiving updated codebooks; storing updated codebooks in a cache; receiving device data from a network device manager; and publishing updated codebooks to network connected devices associated with the received device data.
  10. The method of claim 6, further comprising the steps of: analyzing an input data stream to determine its properties; creating a transformation matrix based on the properties of the input data; transforming the input data into a dyadic distribution; generating a main data stream of transformed data and a secondary data stream of transformation information; compressing the main data stream; and combining the compressed main data stream and the secondary data stream into an output stream.
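The likelihood-based prefetching recited in claims 1 and 6 can be sketched as follows. This is a minimal illustration, assuming a history map of (times a unit was prefetched, times that prefetch was actually used) per data unit; the function names and the capacity-based cutoff standing in for "available system resources" are assumptions of this example, not claim language:

```python
def likelihood_scores(prefetch_history):
    """Score each data unit by the fraction of its past prefetches that
    were actually used (the 'tracked percentages' of the claims)."""
    return {
        unit: used / issued
        for unit, (issued, used) in prefetch_history.items()
        if issued > 0
    }

def select_prefetches(prefetch_history, threshold, capacity):
    """Select units whose likelihood score exceeds the threshold, highest
    first, bounded by available cache capacity (a simple stand-in for
    'available system resources')."""
    scores = likelihood_scores(prefetch_history)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [unit for unit, score in ranked if score > threshold][:capacity]

# history: unit -> (times prefetched, times the prefetched unit was used)
history = {"blk1": (10, 9), "blk2": (10, 4), "blk3": (5, 5), "blk4": (8, 7)}
selected = select_prefetches(history, threshold=0.6, capacity=2)
# blk3 (1.0) and blk1 (0.9) win; blk4 (0.875) is cut by capacity,
# blk2 (0.4) falls below the threshold.
```

A production system would, per the claims, feed these scores into richer prediction models and adjust the threshold dynamically; the fixed threshold here is deliberately simplified.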

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

Priority is claimed in the application data sheet to the following patents or patent applications, each of which is expressly incorporated herein by reference in its entirety:

Ser. No. 18/939,537
Ser. No. 18/161,080

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention is in the field of computer data storage and transmission, and in particular to statistical analysis of datasets for automated algorithm training.

Discussion of the State of the Art

As computers become an ever-greater part of our lives, and especially in the past few years, data storage has become a limiting factor worldwide. Prior to about 2010, the growth of data storage far exceeded the growth in storage demand. In fact, it was commonly considered at that time that storage was not an issue, and perhaps never would be, again. In 2010, however, with the growth of social media, cloud data centers, high tech and biotech industries, global digital data storage accelerated exponentially, and demand hit the zettabyte (1 trillion gigabytes) level. Current estimates are that data storage demand will reach 175 zettabytes by 2025. By contrast, digital storage device manufacturers produced roughly 1 zettabyte of physical storage capacity globally in 2016. We are producing data at a much faster rate than we are producing the capacity to store it. In short, we are running out of room to store data, and need a breakthrough in data storage technology to keep up with demand.

The primary solutions available at the moment are the addition of physical storage capacity and data compression. As noted above, the addition of physical storage will not solve the problem, as storage demand has already outstripped global manufacturing capacity. Data compression is also not a solution. A rough average compression ratio for mixed data types is 2:1, representing a doubling of storage capacity.
However, as the mix of global data storage trends toward multimedia data (audio, video, and images), the space savings yielded by compression either decrease substantially, as is the case with lossless compression, which retains all original data in the set, or come at the cost of data degradation, as is the case with lossy compression, which selectively discards data in order to increase compression. Even assuming a doubling of storage capacity, data compression cannot solve the global data storage problem. The method disclosed herein, on the other hand, works the same way with any type of data.

Transmission bandwidth is also increasingly becoming a bottleneck. Large data sets require tremendous bandwidth, and we are transmitting more and more data every year between large data centers. On the small end of the scale, we are adding billions of low-bandwidth devices to the global network, and data transmission limitations impose constraints on the development of networked computing applications, such as the "Internet of Things". Furthermore, as quantum computing becomes more and more imminent, the security of data, both stored data and data streaming from one point to another via networks, becomes a critical concern as existing encryption technologies are placed at risk.

What is needed is a fundamentally new approach to data storage and transmission that allows for dramatically more storage versus existing methods on the same physical storage device, and that supports automated system efficacy monitoring and model training.

SUMMARY OF THE INVENTION

The inventor has developed a system and method for data storage, transfer, synchronization, and security using automated model monitoring and training with a load-adaptive cache. New data sourceblocks may be processed and assigned new codewords, which are compiled into an updated codebook which may be distributed back to encoding and decoding systems and devices.
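The sourceblock-to-codeword update flow described in the summary can be sketched minimally. This is a hypothetical illustration, not the patented encoder: real codewords would be variable-length bit strings produced by the trained encoding algorithm, whereas sequential integers are used here purely to show the bookkeeping of extending and redistributing a codebook:

```python
import itertools

def extend_codebook(codebook, new_sourceblocks):
    """Assign the next unused codewords to previously unseen sourceblocks
    and return an updated codebook (sourceblock -> codeword).

    Integer codewords are a placeholder; a trained encoder would emit
    variable-length bit strings instead.
    """
    updated = dict(codebook)
    next_codeword = itertools.count(len(codebook))
    for block in new_sourceblocks:
        if block not in updated:
            updated[block] = next(next_codeword)
    return updated

book = {b"\x00\x01": 0, b"\xff\xff": 1}
book = extend_codebook(book, [b"\xab\xcd", b"\x00\x01"])
# the existing sourceblock keeps codeword 0; the new block gets codeword 2
```

The updated codebook would then be published to the connected encoding and decoding devices, as the summary describes, so that both ends stay synchronized on the same sourceblock-to-codeword mapping.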
Additionally, the inventor has developed a novel approach for simultaneous compression and encryption of data using a dyadic distribution-based algorithm, which can be integrated with the existing system to provide enhanced efficiency and security.

According to a first preferred embodiment, a system for data storage, transfer, synchronization, and security using automated model monitoring and training with a load-adaptive cache, comprising: a computing device comprising a processor and a memory; a codebook training module comprising a first plurality of programming instructions that, when operating on the processor, cause the processor to: receive data; process the data to generate a test dataset; retrieve at least one probability distribution associated with a previous training dataset; analyze the test dataset to determine at least one new probability distribution; retrain encoding and decoding algorithms using the test dataset; apply the retrained algorithms to generate one or more data units from the test dataset; associate each of the one or more data units with a correspond