CN-121996789-A - Adaptive fine tuning RAG fusion large language model system with long-term and short-term memory collaboration
Abstract
The invention discloses an adaptive fine-tuning RAG fusion large language model system with long-term and short-term memory collaboration, relating to the technical field of large language models. The system comprises a short-term memory module and a long-term memory fine-tuning processing module. The short-term memory module comprises a text data blocking unit, an embedding vector unit, a short-term memory buffer and a vector index unit; the long-term memory fine-tuning processing module comprises an adaptive-routing LoRA activation classifier, a plurality of LoRA modules, a preference loss calculation unit, a feature extractor and a multi-stage processing architecture. The invention addresses shortcomings of existing large language models in memory-architecture layering, causal traceability of memory updates, and coordination of online and offline learning.
Inventors
- Hu Tianyu
- Qi Jueyu
- Tang Zhixing
- Pan Yuchen
- Ma Huimin
Assignees
- University of Science and Technology Beijing (北京科技大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2025-12-17
Claims (6)
- 1. An adaptive fine-tuning RAG fusion large language model system with long-term and short-term memory collaboration, characterized by comprising a short-term memory module and a long-term memory fine-tuning processing module, wherein the short-term memory module comprises a text data blocking unit, an embedding vector unit, a short-term memory buffer and a vector index unit, and the long-term memory fine-tuning processing module comprises an adaptive-routing LoRA activation classifier, a plurality of LoRA modules, a preference loss calculation unit, a feature extractor and a multi-stage processing architecture; the text data blocking unit is used for dividing input text data into blocks to obtain a plurality of text data blocks; the embedding vector unit is used for vectorizing the text data blocks to obtain a plurality of embedding vectors; the short-term memory buffer is used for storing the plurality of embedding vectors; the vector index unit is used for retrieving the embedding vectors in the short-term memory buffer based on a preset retrieval mode, deleting embedding vectors whose retrieval frequency is below a first preset threshold, and sending embedding vectors whose retrieval frequency is above a second preset threshold to the long-term memory fine-tuning processing module; each LoRA module corresponds to a downstream task; the feature extractor is used for extracting hidden-layer features of all layers; the adaptive-routing LoRA activation classifier is used for inputting the hidden-layer features into a dynamic multi-layer perceptron in a training stage to obtain LoRA weights, and for activating the corresponding LoRA module based on the LoRA weights in an inference stage; the preference loss calculation unit is used for learning a mapping from the data distribution to the activation states of the plurality of LoRA modules based on a preset preference loss function in the training stage; and the multi-stage processing architecture is used for obtaining a target prediction result by adopting a multi-stage decision mechanism in the inference stage.
- 2. The system of claim 1, wherein the short-term memory module further comprises a vector extraction module for performing high-order feature extraction and vector extraction on the plurality of embedded vectors.
- 3. The system of claim 1, wherein the adaptive-routing LoRA activation classifier is a dynamic multi-layer perceptron architecture.
- 4. The system of claim 1, wherein the preset preference loss function in the preference loss calculation unit includes a mean square error loss function.
- 5. The system of claim 1, wherein the long-term memory fine-tuning processing module reserves expansion capacity for the LoRA modules during the training phase using a sparse initialization strategy.
- 6. The system of claim 1, wherein the multi-stage processing architecture comprises three stages: hierarchical feature aggregation, cross-layer hidden-feature enhancement, and task-aware projection.
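The two-threshold retrieval-frequency policy of claim 1 (delete cold embeddings below a first threshold, promote hot embeddings above a second threshold to the long-term fine-tuning module) and the softmax-weighted LoRA routing of claims 1 and 3 can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation; all names (`ShortTermMemory`, `T_EVICT`, `T_PROMOTE`, `route_lora`) and the concrete threshold values are assumptions not given in the patent.

```python
# Illustrative sketch of claim 1's short-term memory buffer; names and
# thresholds are hypothetical, not from the patent.
import math
from collections import defaultdict

T_EVICT = 1    # "first preset threshold": delete below this retrieval frequency
T_PROMOTE = 5  # "second preset threshold": promote above this retrieval frequency

class ShortTermMemory:
    def __init__(self):
        self.buffer = {}              # block_id -> embedding vector
        self.hits = defaultdict(int)  # block_id -> retrieval count
        self.promoted = []            # embeddings sent to long-term fine-tuning

    def store(self, block_id, embedding):
        self.buffer[block_id] = embedding

    def retrieve(self, block_id):
        if block_id in self.buffer:
            self.hits[block_id] += 1
            return self.buffer[block_id]
        return None

    def maintain(self):
        """Apply claim 1's two-threshold policy over the whole buffer."""
        for block_id in list(self.buffer):
            freq = self.hits[block_id]
            if freq < T_EVICT:
                del self.buffer[block_id]  # cold: delete from short-term memory
            elif freq > T_PROMOTE:
                # hot: hand off to the long-term memory fine-tuning module
                self.promoted.append((block_id, self.buffer.pop(block_id)))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_lora(logits, top_k=1):
    """Inference-stage routing: activate the LoRA module(s) whose
    classifier weight is highest (claims 1 and 3)."""
    weights = softmax(logits)
    return sorted(range(len(weights)), key=lambda i: -weights[i])[:top_k]

mem = ShortTermMemory()
mem.store("a", [0.1, 0.2])
mem.store("b", [0.3, 0.4])
for _ in range(6):
    mem.retrieve("a")  # "a" becomes hot; "b" is never retrieved
mem.maintain()
# mem.promoted now holds ("a", ...); "b" was evicted, so the buffer is empty.
active = route_lora([0.2, 1.5, -0.3])  # -> [1]: activate LoRA module 1
```

A real system would key the buffer by similarity search over the embeddings (claim 1's "preset retrieval mode") rather than by exact block id, and the router would be a trained MLP over hidden-layer features rather than fixed logits.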
Description
Adaptive fine-tuning RAG fusion large language model system with long-term and short-term memory collaboration

Technical Field

The invention relates to the technical field of large language models, in particular to an adaptive fine-tuning RAG fusion large language model system with long-term and short-term memory collaboration.

Background

With the breakthrough development of large language model technology, such models show excellent capability in natural language understanding, dialogue generation, multi-task reasoning and related fields, and are widely applied in intelligent scenarios such as customer service, education, and medical care. However, existing large language models still face two core challenges in practical application. First, the traditional memory management mechanism adopts a fixed-length context window and lacks the capacity to dynamically store a user's long-term preferences and historical interaction information; memory storage therefore follows a one-size-fits-all pattern that cannot prioritize important information. Second, existing fine-tuning methods rely on offline batch training and can hardly capture user behavior characteristics and scenario changes in real time, which leads to lagging personalized service and memory redundancy. How to maintain a model's long-term memory while effectively learning new knowledge during continuous interaction has therefore become a key challenge. Large language models (Large Language Model, LLM) have shortcomings in memory-architecture layering (e.g., short-term emotion caching versus layered storage of a long-term personality profile in emotional companionship), causal traceability of memory updates (e.g., logical backtracking of attack chains in military decision-making), and online-offline learning coordination (e.g., instant fine-tuning for sudden road conditions versus periodic iteration of driving habits in autonomous driving).
In particular, the performance of an agent (Agent) hosting an LLM in a complex environment depends largely on the effectiveness of its information acquisition and memory management mechanisms. Existing studies have largely developed around three core information sources: internally generated information (Inside-trial Information), cross-trial information (Cross-trial Information), and external knowledge (External Knowledge). While these mechanisms have improved the capabilities of agents in their respective areas, they all face significant limitations and challenges in practical applications. The first is internally generated information, which mainly refers to short-term memory generated by an agent during task execution, or task-related context information. For example, MemoChat generates memory by analyzing historical chat logs, helping the agent better understand context in subsequent conversations. Similarly, TiM (Task-internal Memory) generates multiple ideas after a task is completed, forming a short-term memory mechanism that can be quickly invoked in similar tasks. Such methods typically rely on context information within the task and are strongly task-relevant. However, internal information is highly dependent on a specific task (task-specific) and hardly migrates to other task types; these short-term memories fail once the task context changes. For both MemoChat and TiM, memory capacity is limited by the context window size of the underlying model (e.g., the LLM). As task complexity increases and dialogues grow longer, the earliest critical information may be lost. The second is cross-trial information (Cross-trial Information), which refers to information shared and utilized by an agent across different tasks or trials.
Reflexion proposes a framework based on verbal reinforcement learning that allows an agent to record past experiences in verbal form and use them in subsequent tasks, thereby improving task-execution efficiency. Retroformer further improves on Reflexion, enabling the agent to extract cross-task information from past trials more efficiently through fine-tuning. These methods markedly improve the generalization capability and task-execution efficiency of the agent through the sharing and reuse of cross-task information. The third is external knowledge (External Knowledge), which refers to information the agent acquires from external resources (such as knowledge bases or web crawling). Such methods enhance the breadth and depth of the agent's knowledge by combining external knowledge with the task context. For example, some agents assist task execution by accessing an external knowledge base or extracting relevant information from web pages using crawler technology. The introduction of external knowledge enables the agent to handle more complex tasks and make more accurate decisions when internal information is lacking. However, calling APIs, crawling web pages, or searching large knowledge bases is a time-consuming operation. This c