CN-122019724-A - Memory method supporting hybrid retrieval and dynamic update in legal large model applications
Abstract
The invention provides a memory method supporting hybrid retrieval and dynamic update in legal large model applications. The method comprises: obtaining a question entered by a user; performing hybrid retrieval on the question, obtaining related memory information through a secondary retrieval over statute (legal-provision) fields combined with hybrid retrieval over a conventional memory bank; performing memory evolution based on the current user input and the hybrid retrieval result, judging whether the user input matches the retrieval result, updating an existing memory or creating a new memory accordingly, and updating the link relations between memories; and filtering and screening the hybrid retrieval result to obtain the final memory retrieval result. The invention enables the memory system to adaptively optimize itself according to actual usage, combines multiple retrieval methods to meet the legal field's dual requirements of precision and semantic understanding, effectively supports application scenarios such as legal consultation, case analysis, and legal education, and improves the efficiency and accuracy of legal services.
Inventors
- ZHUGE QINGFENG
- Bi Xiangwei
- SHA XINGMIAN
Assignees
- East China Normal University (华东师范大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-02-09
Claims (14)
- 1. A memory method supporting hybrid retrieval and dynamic update in legal large model applications, comprising the steps of: S1, acquiring a question input by a user; S2, performing hybrid retrieval on the user question, obtaining related memory information through a secondary retrieval over statute fields combined with hybrid retrieval over a conventional memory bank; S3, performing memory evolution based on the current user input and the hybrid retrieval result, judging whether the user input matches the retrieval result, updating an existing memory or creating a new memory according to the match, and updating the link relations between memories; and S4, filtering and screening the hybrid retrieval result, including deduplication, relevance filtering, quality evaluation, and ranking optimization, to obtain the final memory retrieval result.
- 2. The memory method supporting hybrid retrieval and dynamic update in legal large model applications according to claim 1, wherein the hybrid retrieval in step S2 specifically comprises the following steps: S201, extracting keywords and related legal information from the question input by the user, using a large language model or a keyword-extraction tool to extract key legal terms and concepts together with related statute numbers and statute names; S202, performing the secondary retrieval over statute fields: for the statute field of each memory in the statute memory bank, encoding the user's question and the statute-field content into vectors with a semantic encoding model, and performing dense retrieval by computing cosine similarity to obtain a statute-field retrieval result; S203, performing dense retrieval and sparse retrieval in parallel over the conventional memory bank using the vectors and keywords, obtaining a dense retrieval result and a sparse retrieval result respectively; S204, dynamically adjusting the weight parameters of sparse and dense retrieval, normalizing the two sets of scores, and merging them to obtain a conventional hybrid retrieval result; S205, dynamically adjusting the weights of the statute-field result and the conventional hybrid result, deduplicating, merging, and ranking the two result sets, and retaining the top Top-K memories, where K is a preset number of retrieval results; S206, for each of the Top-K memories, further retrieving its adjacent memories according to the association relations between memories; S207, deduplicating the Top-K memories and the adjacent memories, and taking the result as the final return value.
- 3. The memory method supporting hybrid retrieval and dynamic update in legal large model applications according to claim 1, wherein the memory evolution in step S3 specifically comprises the following steps: S301, acquiring the current user's input content and the retrieval result; S302, judging with a large language model whether the input content matches the retrieval result: constructing a match-judgment prompt, calling the large language model to analyze the semantic and logical relevance between the user input and the memory content, and deciding whether they match according to the model's judgment; S303, if they match, updating the corresponding memories in the retrieval result, including the memory content, keywords, context description, and classification labels; S304, if they do not match, extracting keywords with the large language model, generating a context description, determining classification labels, creating a new memory object, and adding it to the memory bank; S305, updating the link relations between the memory and its neighbor memories: for an updated memory, recomputing the association degrees and updating its links; for a newly added memory, searching for existing memories that are semantically similar or logically related and establishing link relations.
- 4. The memory method supporting hybrid retrieval and dynamic update in legal large model applications according to claim 1 or 2, wherein each memory in the conventional memory bank includes a statute-field attribute storing the statute number, statute name, and original statute text; during memory creation or update, if the memory content involves legal provisions, the legal information is extracted from the content and stored in the statute field; and the statute memory bank is a subset of the unified memory bank obtained by screening, the screening condition being that the statute field is non-empty, used to support dedicated retrieval based on statute fields.
- 5. The memory method supporting hybrid retrieval and dynamic update in legal large model applications according to claim 2, wherein the dense retrieval and sparse retrieval in step S203 comprise: dense retrieval, in which the query text and memory contents are encoded into vectors with a semantic encoding model and dense retrieval scores are obtained by computing cosine similarity; and sparse retrieval, in which the query text and memory contents are tokenized and BM25 scores are computed from term frequencies and inverse document frequencies using the BM25 algorithm.
- 6. The method according to claim 2, wherein dynamically adjusting the weight parameters of sparse and dense retrieval in step S204 comprises: normalizing the BM25 score and the semantic similarity score respectively, and combining them as hybrid score = α × normalized BM25 score + (1 − α) × normalized semantic similarity score.
- 7. The memory method supporting hybrid retrieval and dynamic update in legal large model applications according to claim 6, wherein the method for adjusting the weight parameter α comprises: computing the proportion of exactly matched keywords in the query, and increasing the sparse-retrieval weight by raising α if that proportion exceeds a preset threshold; and evaluating the semantic complexity of the query with a large language model or a semantic-analysis tool, and increasing the dense-retrieval weight by lowering α if the semantic-complexity score exceeds a preset threshold.
- 8. The method according to claim 2, wherein dynamically adjusting the weights of the statute-field retrieval result and the conventional hybrid retrieval result in step S205 comprises: analyzing the query type with a large language model and a rule engine; if the query explicitly refers to specific legal provisions, raising the weight of the statute-field retrieval result to 0.6–0.8, and if the query concerns case analysis or general legal consultation, raising the weight of the conventional hybrid retrieval result to 0.6–0.8.
- 9. The memory method supporting hybrid retrieval and dynamic update in legal large model applications according to claim 2, wherein the adjacent-memory retrieval in step S206 comprises: for the Top-K memories retrieved in step S205, retrieving the corresponding adjacent memories from the memory bank according to the associated memory identifiers stored in each memory's links attribute; and if the number of adjacent memories is large, limiting each memory to at most M adjacent memories, where M is a preset parameter, preferentially selecting the adjacent memories with higher association degree.
- 10. The memory method supporting hybrid retrieval and dynamic update in legal large model applications according to claim 3, wherein judging in step S302 whether the input content matches the retrieval result comprises: combining the user input and the retrieved memory content into a prompt text, analyzing their semantic relevance, logical relevance, and information consistency with a large language model, and deciding whether they match according to the model's judgment.
- 11. The memory method supporting hybrid retrieval and dynamic update in legal large model applications according to claim 3, wherein updating the corresponding memories in the retrieval result in step S303 comprises: analyzing the new information in the user input with a large language model and integrating it with the existing memory content to update the content; re-extracting keywords from the updated content and updating the keyword list; regenerating the context description and classification labels with the large language model; updating metadata such as the memory's last access time and retrieval count; and updating the stored statute field if the user input contains new legal information.
- 12. The method of claim 3, wherein creating new memory objects and adding them to the memory bank in step S304 comprises: analyzing the user input with a large language model, extracting a keyword list, generating a context description, and determining classification labels; if the input involves legal provisions, extracting the legal information and storing it in the statute field; creating a new memory object with unique identifier, content, keywords, context, labels, timestamp, statute field, and retrieval-count attributes, where the unique identifier is generated with a UUID, the retrieval count is initialized to 0, and the creation time is set to the current time; and adding the newly created memory object to the memory bank and updating the retrieval index.
- 13. The memory method supporting hybrid retrieval and dynamic update in legal large model applications according to claim 3, wherein updating the link relations between a memory and its neighbor memories in step S305 comprises: for an updated memory, computing its semantic similarity to other memories in the memory bank with a semantic encoding model, identifying candidate associated memories combined with keyword-overlap analysis, analyzing semantic and logical relevance with a large language model, deciding whether to establish or update a link relation according to the model's evaluation, and updating the memory's links attribute; for a new memory, computing its semantic similarity to existing memories with the semantic encoding model, selecting memories whose similarity exceeds a preset threshold as candidate neighbors, analyzing semantic and logical relevance with the large language model, deciding whether to establish a link relation according to the model's evaluation, adding the identifiers of the memories the model deems relevant neighbors to the new memory's links attribute, updating the links attributes of those neighbor memories, and thereby establishing bidirectional link relations; and optimizing the link relations, namely analyzing the relevance between memories with a large language model, removing links with low relevance according to the model's judgment, and ensuring that link relations are established for highly relevant memories.
- 14. The memory method supporting hybrid retrieval and dynamic update in legal large model applications according to claim 1, wherein the filtering and screening in step S4 comprise: removing duplicate memory entries according to the memories' unique identifiers; computing each memory's relevance score with respect to the user query by combining semantic similarity and keyword-match degree, and filtering out memories whose relevance score is below a preset threshold; evaluating the quality and completeness of the memories with a large language model and preferentially retaining high-quality memories according to its evaluation; and computing a composite score with a weighted ranking algorithm from the relevance score, the model-evaluated memory quality score, memory recency, retrieval frequency, and other factors, and ranking the scores from high to low.
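The score fusion of claims 5–7 (BM25 sparse scores and dense cosine-similarity scores, each normalized, then merged with the weight α) can be sketched as follows. This is a minimal illustration, not the patented implementation: the tokenization, the BM25 parameters k1 and b, and the min-max normalization are assumptions, and the cosine similarities are taken as given inputs rather than computed by an encoder.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Sparse retrieval: BM25 over pre-tokenized documents (lists of terms)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def normalize(xs):
    """Min-max normalization; a constant score list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]

def hybrid_scores(bm25, dense, alpha=0.5):
    """Claim 6: hybrid = alpha * norm(BM25) + (1 - alpha) * norm(cosine)."""
    nb, nd = normalize(bm25), normalize(dense)
    return [alpha * s + (1 - alpha) * d for s, d in zip(nb, nd)]
```

Per claim 7, α would be raised when the query is dominated by exactly matching keywords and lowered when its semantic complexity is high; the adjustment policy itself is left to a large language model or rule engine.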
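Claims 12 and 13 describe a memory object (UUID identifier, content, keywords, context, labels, timestamp, statute field, retrieval count) and bidirectional link relations stored in a links attribute. A minimal sketch follows; the class and attribute names (`Memory`, `MemoryBank`, `law_field`, `links`) are illustrative assumptions, not taken from the patent:

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Memory:
    content: str
    keywords: list
    context: str = ""
    label: str = ""
    law_field: str = ""                        # statute number/name/text; empty if none
    links: set = field(default_factory=set)    # identifiers of neighbor memories
    retrieval_count: int = 0                   # claim 12: initialized to 0
    created_at: float = field(default_factory=time.time)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

class MemoryBank:
    def __init__(self):
        self.memories = {}                     # id -> Memory

    def add(self, memory):
        self.memories[memory.id] = memory
        return memory.id

    def link(self, id_a, id_b):
        """Claim 13: establish a bidirectional link relation between two memories."""
        self.memories[id_a].links.add(id_b)
        self.memories[id_b].links.add(id_a)

    def law_subset(self):
        """Claim 4: the statute memory bank is the subset with a non-empty statute field."""
        return [m for m in self.memories.values() if m.law_field]
```

The neighbor expansion of claims 2 and 9 then amounts to following the `links` identifiers of each Top-K memory back into the bank, capped at M neighbors per memory.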
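The filtering and screening of claim 14 (deduplication by unique identifier, relevance filtering against a threshold, quality evaluation, weighted composite ranking) can be sketched as below. The weights, the threshold, and the assumption that each per-memory score has already been computed in [0, 1] (the quality score, for example, coming from a large language model) are all illustrative:

```python
def filter_and_rank(results, min_relevance=0.3, weights=(0.5, 0.3, 0.1, 0.1), top_k=5):
    """results: dicts with keys id, relevance, quality, recency, frequency in [0, 1]."""
    seen, unique = set(), []
    for r in results:                          # S4 deduplication by unique identifier
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)
    kept = [r for r in unique if r["relevance"] >= min_relevance]   # relevance filter
    w_rel, w_qual, w_rec, w_freq = weights
    def score(r):                              # weighted composite score
        return (w_rel * r["relevance"] + w_qual * r["quality"]
                + w_rec * r["recency"] + w_freq * r["frequency"])
    return sorted(kept, key=score, reverse=True)[:top_k]
```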
Description
Memory method supporting hybrid retrieval and dynamic update in legal large model applications
Technical Field
The invention relates to the fields of natural language processing and large model applications, and in particular to a memory method supporting hybrid retrieval and dynamic update in legal large model applications.
Background
In recent years, with the rapid development of deep learning, large language models (LLMs) have made breakthrough progress in natural language processing (NLP). Through extensive pre-training and fine-tuning, these models exhibit powerful language understanding and generation capabilities and have been successfully applied in vertical fields such as education, medicine, and finance. In the legal field, large language models show great potential in applications such as intelligent legal consultation, statute retrieval, case analysis, and contract review, providing a new technical path toward intelligent and automated legal services.
However, because legal text is highly specialized, logically strict, and subject to strong normative requirements, existing large language models face significant technical challenges in legal application scenarios. Solving legal problems requires not only accurate language expression but also compliant legal reasoning and valid conclusions, which places strict demands on a model's reasoning capacity and knowledge accuracy. In particular, for memory management, building a memory system that is efficient, accurate, and dynamically updatable has become a key technical problem in legal large model applications.
Existing legal large model applications have the following deficiencies in memory management:
1. The memory-update mechanism is passive: existing systems trigger updates only when new memories are added, lack the ability to dynamically update the memory structure according to user input, and cannot adaptively optimize for actual needs and usage patterns, so the memory structure drifts away from the application scenario, reducing the system's practicality and accuracy.
2. Existing systems lack effective management of memory associations: retrieval returns only directly relevant memories, results cannot be expanded via association relations, semantic and logical associations are underused, and a complete knowledge network is hard to form.
3. Existing systems mostly adopt a single retrieval method: sparse retrieval such as BM25 matches exactly but understands semantics poorly, while dense retrieval captures semantic similarity but is less sensitive to exact keyword matches, making it hard to meet the legal field's dual requirements of precision and semantic understanding.
4. The memory-retrieval strategy is simplistic: existing systems mostly perform a single-pass retrieval, underuse memory metadata, and do not perform a secondary retrieval over statute fields in legal applications, making targeted optimization difficult and leaving stored structured information underexploited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a memory method supporting hybrid retrieval and dynamic update for legal large model applications.
The method introduces a hybrid retrieval mechanism combining sparse and dense retrieval, and effectively improves the recall and precision of memory retrieval through a statute-field secondary retrieval and a memory-evolution mechanism, so that the memory system can adaptively optimize itself for actual usage scenarios. The method efficiently supports application scenarios such as legal consultation, case analysis, and legal education, and improves the efficiency and accuracy of legal services. To achieve the above purpose, the invention provides the following technical solution: a memory method supporting hybrid retrieval and dynamic update in legal large model applications, adopting a hybrid retrieval technique combining sparse and dense retrieval, comprising the following steps: S1, acquiring a question input by a user; S2, performing hybrid retrieval on the user question, obtaining related memory information through a secondary retrieval over statute fields combined with hybrid retrieval over a conventional memory bank; the hybrid retrieval specifically comprises the following steps: S201, extracting keywords and related legal information from the question input by the user, extracting key legal terms, concepts, related statute numbers, and statute names