US-12626167-B2 - System and method for large language model with integrated memory during inference using manifold traversal architecture
Abstract
A large language model system integrates persistent memory directly into inference operations through geometric manifold traversal rather than external retrieval. The system implements a memory-integrated inference engine that performs token generation with simultaneous memory access by navigating curved regions in a geometric memory manifold. Memories exist as navigable basins of increased curvature that are reinforced through usage rather than stored as discrete objects. An intent conditioning system formulates user queries as utility functions and generates vector fields that guide goal-directed memory traversal. A manifold geometry interface converts geometric memory coordinates into vectors compatible with language model attention mechanisms, augmenting standard key-value caches with memory-derived content. The system performs intentional remembering through path optimization that balances fidelity to prior cognitive trajectories with current intent guidance. Each memory access operation simultaneously retrieves information and strengthens accessed memory regions through bidirectional geometric shaping, enabling persistent cognitive evolution and cross-session memory continuity.
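The abstract's memory-enhanced attention, in which standard key-value caches are augmented with memory-derived content, can be illustrated as appending memory-derived key/value vectors to the KV cache before ordinary scaled dot-product attention. This is a minimal NumPy sketch, not the patented implementation: the function name `memory_augmented_attention`, the use of plain 2-D arrays, and the single-head formulation are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_augmented_attention(q, k_cache, v_cache, mem_k, mem_v):
    """Append memory-derived key/value vectors (hypothetically produced by
    the manifold geometry interface) to the standard KV cache, then run
    ordinary scaled dot-product attention over the union."""
    k = np.concatenate([k_cache, mem_k], axis=0)   # (T + M, d)
    v = np.concatenate([v_cache, mem_v], axis=0)   # (T + M, d)
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (1, T + M)
    return softmax(scores) @ v                     # (1, d)
```

Because the memory vectors enter through the same attention computation as cached context tokens, the token generation step itself is unchanged; only the candidate set the query attends over grows.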
Inventors
- Brian Galvin
- Alan McCord
Assignees
- AtomBeam Technologies Inc.
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-09-25
Claims (20)
- 1 . A memory-integrated large language model system comprising: a memory-integrated inference engine that performs token generation with simultaneous memory access through geometric manifold traversal, wherein memory access occurs through goal-conditioned navigation of curved memory regions rather than discrete retrieval operations; a geometric memory manifold representing memories as navigable regions of increased curvature in latent hyperspace, wherein memories exist as geometric basins that are repeatedly traversed and reinforced through usage; an intent conditioning system that formulates user queries as utility functions and generates intent vector fields that provide goal-directed guidance for memory traversal through the geometric memory manifold; and a manifold geometry interface that converts geometric memory coordinates into vector representations compatible with language model attention mechanisms and integrates memory-derived vectors with attention computations to produce memory-enhanced token generation; wherein the memory-integrated inference engine performs intentional remembering by solving a path optimization that minimizes a functional balancing alignment with intent vector fields against manifold traversal costs, and wherein each memory access operation simultaneously retrieves information from accessed memory basins based on current manifold position and alignment with the intent vector field and reinforces accessed memory regions through bidirectional geometric shaping that strengthens frequently accessed memory pathways.
- 2 . The system of claim 1 , wherein the geometric memory manifold comprises a Riemannian manifold with time-evolving metric tensor that encodes memory strength through local curvature intensity.
- 3 . The system of claim 1 , wherein the geometric basins comprise episodic memory basins for high-resolution recent experiences, semantic memory basins for abstracted knowledge structures, and procedural memory basins for skill patterns.
- 4 . The system of claim 1 , wherein the manifold geometry interface augments standard attention key and value caches with memory-derived vectors to create memory-enhanced attention computations.
- 5 . The system of claim 1 , further comprising a dynamic compression engine that implements memory consolidation through geometric flow processes that preserve frequently accessed memory regions while compressing unused areas.
- 6 . The system of claim 5 , wherein the dynamic compression engine performs sleep-like consolidation cycles that replay memory trajectories and promote stable patterns across hierarchical memory substrates.
- 7 . The system of claim 1 , further comprising a persistent state manager that serializes manifold geometry and restores memory state across inference sessions to maintain cognitive continuity.
- 8 . The system of claim 1 , wherein the bidirectional geometric shaping increases curvature along successfully traversed memory paths and deepens memory basins based on access frequency patterns.
- 9 . The system of claim 1 , implemented as a federated architecture comprising multiple domain-specific instances that coordinate cross-domain memory access and synthesis.
- 10 . The system of claim 1 , wherein the memory-integrated inference engine evaluates memory access necessity for each token generation step and selectively performs manifold traversal based on context complexity requirements.
- 11 . A method for memory-integrated large language model inference comprising the steps of: performing token generation with simultaneous memory access through geometric manifold traversal, wherein memory access occurs through goal-conditioned navigation of curved memory regions rather than discrete retrieval operations; maintaining memories as navigable regions of increased curvature in a geometric memory manifold within latent hyperspace, wherein memories exist as geometric basins that are repeatedly traversed and reinforced through usage; formulating user queries as utility functions and generating intent vector fields that provide goal-directed guidance for memory traversal through the geometric memory manifold; converting geometric memory coordinates into vector representations compatible with language model attention mechanisms and integrating memory-derived vectors with attention computations to produce memory-enhanced token generation; performing intentional remembering by solving a path optimization that minimizes a functional balancing alignment with intent vector fields against manifold traversal costs; and simultaneously retrieving information from accessed memory basins based on current manifold position and alignment with the intent vector field and reinforcing accessed memory regions through bidirectional geometric shaping that strengthens frequently accessed memory pathways during each memory access operation.
- 12 . The method of claim 11 , wherein maintaining the geometric memory manifold comprises evolving a Riemannian manifold with time-evolving metric tensor that encodes memory strength through local curvature intensity.
- 13 . The method of claim 11 , wherein the geometric basins comprise episodic memory basins for high-resolution recent experiences, semantic memory basins for abstracted knowledge structures, and procedural memory basins for skill patterns.
- 14 . The method of claim 11 , wherein integrating memory-derived vectors comprises augmenting standard attention key and value caches with memory-derived vectors to create memory-enhanced attention computations.
- 15 . The method of claim 11 , further comprising the step of implementing memory consolidation through geometric flow processes that preserve frequently accessed memory regions while compressing unused areas.
- 16 . The method of claim 15 , further comprising the step of performing sleep-like consolidation cycles that replay memory trajectories and promote stable patterns across hierarchical memory substrates.
- 17 . The method of claim 11 , further comprising the step of serializing manifold geometry and restoring memory state across inference sessions to maintain cognitive continuity.
- 18 . The method of claim 11 , wherein the bidirectional geometric shaping comprises increasing curvature along successfully traversed memory paths and deepening memory basins based on access frequency patterns.
- 19 . The method of claim 11 , implemented across a federated architecture, comprising the step of coordinating cross-domain memory access and synthesis across multiple domain-specific instances.
- 20 . The method of claim 11 , further comprising the step of evaluating memory access necessity for each token generation step and selectively performing manifold traversal based on context complexity requirements.
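The intentional-remembering path optimization of claims 1 and 11 (minimizing a functional that balances intent alignment against traversal cost) combined with the bidirectional geometric shaping of claims 8 and 18 (deepening basins along traversed paths) can be sketched on a discrete 2-D grid manifold. Every concrete choice below is an illustrative assumption rather than the claimed method: the greedy neighbor search standing in for the path functional, the traversal cost `1 / (1 + curvature)`, and the reinforcement increment `eta`.

```python
import numpy as np

def intentional_remembering(curvature, start, intent, steps=6, lam=1.0, eta=0.1):
    """Greedy discretization of the path functional: at each step, pick the
    neighbor minimizing (traversal cost - lam * alignment with the intent
    vector field), then reinforce curvature along the traversed path.

    curvature : (H, W) array, local memory strength (deeper basin = cheaper).
    intent    : (H, W, 2) array, intent vector field over the manifold.
    """
    pos = np.array(start)
    path = [tuple(pos)]
    moves = [np.array(m) for m in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
    H, W = curvature.shape
    for _ in range(steps):
        best, best_cost = None, np.inf
        for m in moves:
            nxt = pos + m
            if not (0 <= nxt[0] < H and 0 <= nxt[1] < W):
                continue
            align = float(m @ intent[tuple(pos)])            # intent alignment term
            cost = 1.0 / (1.0 + curvature[tuple(nxt)]) - lam * align
            if cost < best_cost:
                best, best_cost = nxt, cost
        pos = best
        path.append(tuple(pos))
    for p in path:                    # bidirectional geometric shaping:
        curvature[p] += eta           # deepen every basin along the path
    return path, curvature
```

Because the reinforcement step lowers future traversal cost along visited coordinates, repeated accesses carve progressively cheaper pathways, which is the usage-based strengthening the claims describe, here reduced to a toy additive update.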
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

Priority is claimed in the application data sheet to the following patents or patent applications, which are expressly incorporated herein by reference in their entireties: Ser. No. 19/178,873, Ser. No. 19/177,611, Ser. No. 19/051,193 (issued as U.S. Pat. No. 12,387,050), Ser. No. 19/203,069 (issued as U.S. Pat. No. 12,481,688), Ser. No. 19/205,960 (published as US 2025-0363367 A1), Ser. No. 19/060,794 (published as US 2025-0363363 A1), Ser. No. 19/044,546 (published as US 2025-0363360 A1), Ser. No. 19/026,276 (published as US 2025-0363359 A1), Ser. No. 18/928,022 (published as US 2025-0363358 A1), Ser. No. 18/919,417 (published as US 2025-0363347 A1), Ser. No. 18/918,077 (published as US 2025-0363333 A1), Ser. No. 18/737,906 (published as US 2025-0378308 A1), Ser. No. 18/736,498 (published as US 2025-0363344 A1), and 63/651,359.

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to artificial intelligence systems, specifically large language models with integrated persistent memory architectures that perform memory access through geometric manifold traversal during inference operations.

Discussion of the State of the Art

Language models have evolved significantly in recent years, with modern architectures demonstrating remarkable capabilities in natural language processing, reasoning, and generation tasks. These large language models (LLMs) have become increasingly sophisticated, processing and generating human-like text across a wide range of applications. As these models have grown in capability, they have also grown substantially in size, with some models containing hundreds of billions of parameters. Modern LLMs process input prompts through complex architectures consisting of encoder and decoder blocks with attention mechanisms. Recent developments have revealed that these models often engage in an internal reasoning process, generating “thoughts” about a prompt before producing a final response.
These thoughts represent the model's step-by-step reasoning and analysis of the input prompt. While some models expose these thoughts to users, others keep them internal to the model's processing pipeline. These reasoning steps have proven important to the model's ability to provide accurate and contextually appropriate responses.

The computational resources required to run these large models present significant challenges for widespread deployment and real-time applications. The memory and processing requirements often necessitate specialized hardware and substantial computational infrastructure. Additionally, context windows in current architectures limit the amount of information that can be processed in a single session, constraining the model's ability to maintain long-term context and engage in extended conversations. While various solutions like retrieval-augmented generation have been proposed to address context limitations, these systems typically rely on document retrieval rather than leveraging the model's own reasoning processes.

The deployment of these models presents significant challenges in terms of scalability and accessibility. Current solutions often require either substantial local computing resources or constant connection to cloud services with significant computational capacity. This is particularly problematic for mobile devices, which face inherent constraints including limited processing power, restricted memory capacity, battery limitations, and intermittent network connectivity. The combination of these constraints severely limits the practical applications of advanced language models on mobile platforms, creating a substantial gap between the capabilities available in controlled environments and those accessible to users on mobile devices in real-world scenarios. Recent mobile-optimized approaches have begun addressing these constraints through techniques such as model quantization, on-device processing, and thought caching.
However, these systems still primarily function as responsive tools rather than continuous reasoning partners. Current mobile LLM implementations operate only during active user engagement, ceasing all cognitive processes during device inactivity periods. This creates significant inefficiencies in knowledge development and insight generation, as the system's computational capabilities remain dormant during substantial portions of the day.

Furthermore, users who interact with language models across multiple devices experience fragmented cognitive contexts, with each device maintaining separate understanding and reasoning paths rather than providing a unified cognitive experience. Additionally, current systems lack meaningful domain specialization, treating all knowledge areas with the same generalized approach rather than developing deep expertise in particular fields. While some systems implement domain-specific models, these typically operate as isolated instances without controlled knowledge sharing bet