CN-122021876-A - Multi-context session large language model system
Abstract
The invention relates to the field of data processing, and in particular to a multi-context session large language model system comprising a multi-role context management module, a role information injection module and a role-aware inference generation module. The multi-role context management module is used for converting the dialogue history into a structured sequence containing role identifiers, role background information, dialogue content, standpoints, preferences and timestamps; the role information injection module is used for extracting role metadata and dialogue content from the structured sequence and structurally fusing them through a predefined injection mechanism, so as to construct a large language model input prompt containing explicit role features; and the role-aware inference generation module is internally provided with a large language model based on the Transformer architecture and is used for dynamically adjusting attention weights according to the target role features carried in the input prompt, so as to generate a reply consistent with the target role's standpoint. When processing multi-party conversations, the system can generate replies that are highly consistent with the role's standpoint and logically coherent, with higher intention understanding accuracy.
Inventors
- ZHANG FAN
- LIAO HAIYING
- XIE GAOHUI
- SONG TIANLUN
- WU XIAOBIN
- LIU YEHENG
- WU JIALEI
Assignees
- 广州广电五舟科技股份有限公司
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2025-12-29
Claims (10)
- 1. A multi-context session large language model system, characterized by comprising a multi-role context management module, a role information injection module and a role-aware inference generation module, wherein the multi-role context management module serves as the input end of the system and is used for receiving original multi-party session data, allocating a unique role identifier to each participant, and converting the session history into a structured sequence containing the role identifiers, role background information, session content, standpoints, preferences and timestamps, so as to output standardized multi-role context data; the role information injection module is used for extracting role metadata and dialogue content from the structured sequence and structurally fusing them through a predefined injection mechanism, so as to construct a large language model input prompt containing explicit role features; and the role-aware inference generation module is internally provided with a large language model based on the Transformer architecture and is used for receiving the input prompt and performing inference, wherein an improved attention calculation unit is used during inference to dynamically adjust the attention weights according to the target role features carried in the input prompt, so as to generate a reply consistent with the target role's standpoint.
- 2. The multi-context session large language model system of claim 1, wherein the improved attention calculation unit employs a conditional attention mechanism and specifically comprises a role gating subunit, the role gating subunit being configured to calculate the correlation between each Token in the input sequence and the current target role to generate a gating scalar, and the role-aware inference generation module using the gating scalar to apply a weighted adjustment to the original attention weights, the adjustment formula being: α′_{i,j} = g_r · α_{i,j}, where α_{i,j} is the standard attention weight, g_r is the gating scalar for the target role r, and α′_{i,j} is the final attention weight.
- 3. The multi-context session large language model system of claim 2, wherein the role gating subunit comprises a neural network configured to receive the role embedding vector e_r of the target role and the representation vector h_i of the current Token as input, and to calculate and output a gating scalar with a value between 0 and 1, so as to suppress the weight of non-relevant role information in the attention calculation.
- 4. The multi-context conversational large language model system of claim 1, wherein the role-aware inference generation module further includes a role embedding unit to assign a learnable role embedding vector to each role and add the role embedding vector to a corresponding word embedding vector at an input layer of the model as a comprehensive input representation into the attention calculation unit.
- 5. The multi-context session large language model system of claim 1, wherein the role information injection module employs a structured prompt construction algorithm that converts the structured sequence into a JSON-like hierarchical format and introduces special Tokens into the vocabulary, the special Tokens including at least a tag identifying a role start and a tag identifying an attribute key, the large language model being configured to parse and distinguish dialogue content from role metadata by recognizing the special Tokens.
- 6. The multi-context session large language model system of claim 5, wherein the role-aware inference generation module is further provided with a structure type embedding unit for assigning a type ID to each Token in the input prompt to distinguish whether it belongs to plain text, an object key or an object value, and a hierarchical position encoding unit for encoding the absolute position of each Token in the sequence as well as its depth and sibling order in the JSON-like hierarchy, so as to characterize the logical dependencies of role attributes.
- 7. The multi-context session large language model system of claim 1, wherein the multi-head attention mechanism in the role-aware inference generation module comprises a dedicated role modeling head, namely a specially partitioned attention head for specifically capturing and modeling the interactions and standpoint conflicts between different roles, rather than focusing only on textual semantic associations.
- 8. The multi-context session large language model system of claim 1, further comprising a model fine-tuning module for performing supervised fine-tuning of the large language model based on an augmented data set containing data generated by a mirrored dialogue strategy and a perspective transformation strategy, the mirrored dialogue strategy generating dialogue replies for the same context as set by two different roles, and the perspective transformation strategy requiring the model, given a passage of neutral narrative, to retell or comment on it in the tone of each of the different roles.
- 9. The multi-context session large language model system of claim 1, further comprising an interactive control interface for receiving control instructions input by a user, the instructions comprising defining relationships between roles or specifying the identity of a target role for which a reply is to be generated, the role-aware inference generation module adjusting the target role condition in the inference process in response to the control instructions, so as to generate content with a specific directivity.
- 10. The multi-context session large language model system of any one of claims 1-9, wherein each record in the structured sequence stored by the multi-role context management module is a tuple of the form <role identifier, dialogue content, timestamp>, based on which the system tracks and distinguishes the perspective evolution of different roles over a long context.
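A minimal sketch of the role-gated attention adjustment described in claims 2 and 3, assuming (as the claims state) a small gating network that maps the target role's embedding e_r and each Token representation h_i to a scalar in (0, 1), which rescales the standard attention weights. All concrete names (RoleGate, d_model, the sigmoid-of-linear gate) are illustrative assumptions, not the patent's implementation.

```python
# Sketch of the conditional (role-gated) attention of claims 2-3:
#   alpha'_{i,j} = g_r * alpha_{i,j}, then row re-normalization.
# Names and the gate's exact form are assumptions for illustration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class RoleGate:
    """Gating subunit: g = sigmoid(w . [e_r; h_i] + b), output in (0, 1)."""
    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.02, size=2 * d_model)
        self.b = 0.0

    def __call__(self, e_r, h):  # e_r: (d_model,), h: (seq_len, d_model)
        z = np.concatenate([np.broadcast_to(e_r, h.shape), h], axis=-1) @ self.w + self.b
        return 1.0 / (1.0 + np.exp(-z))  # (seq_len,) gating scalars

def role_gated_attention(q, k, v, e_r, gate):
    d = q.shape[-1]
    alpha = softmax(q @ k.T / np.sqrt(d))              # standard attention weights
    g = gate(e_r, k)                                   # per-Token gate for role r
    alpha_prime = alpha * g[None, :]                   # alpha' = g_r * alpha
    alpha_prime /= alpha_prime.sum(-1, keepdims=True)  # re-normalize each row
    return alpha_prime @ v
```

Tokens weakly correlated with the target role receive gates near 0, so their contribution to the context vector is suppressed before generation, which is the stated purpose of claim 3.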
Description
Multi-context session large language model system
Technical Field
The present invention relates to the field of data processing, and more particularly to a multi-context session large language model system.
Background
With the progress of deep learning technology, large language models based on the Transformer architecture have achieved remarkable results in text generation and multi-turn dialogue tasks, and are widely applied in scenarios such as intelligent customer service, virtual character interaction and meeting assistance. However, the prior art still faces significant challenges in processing session data that involves multiple participants and intricate context. Existing large language models rely primarily on the standard Self-Attention mechanism to handle long text sequences, with the model treating the entire dialogue history as coming from one anonymous or unified user. All dialogue turns are simply spliced chronologically into a linear text sequence as contextual input to the model, which lacks a built-in mechanism to identify and distinguish the different participants in the dialogue. Existing large language models therefore process a de-identified text stream and cannot bind content to a particular speaker or understand it as such. Although the algorithm can capture semantic associations between Tokens, it essentially treats the input data as a homogeneous text stream, lacking explicit modeling and differentiation of the high-level semantic feature of "role identity". In a multi-role scenario, this indiscriminate manner of attention computation has the obvious disadvantage that it is difficult to structurally distinguish the standpoints and perspective evolution of different speakers, so that the model is susceptible to interference from non-relevant role information at inference time.
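The contrast the background draws can be sketched in code: a prior-art-style chronological splice that discards speaker identity, versus a structured sequence that keeps a role identifier, background, standpoint and timestamp per turn. The field names, the `Turn` record and the `<ROLE>`/`<KEY>` markers are illustrative assumptions, not the patent's format.

```python
# Illustrative contrast (assumed names): anonymous splice vs. structured
# multi-role sequence with explicit role identifiers per turn.
from dataclasses import dataclass, asdict
import json

@dataclass
class Turn:
    role_id: str     # unique role identifier
    background: str  # role background information
    content: str     # dialogue content
    standpoint: str  # stated position
    timestamp: int

def naive_splice(turns):
    """Prior-art style: one anonymous text stream, speaker identity lost."""
    return " ".join(t.content for t in turns)

def structured_prompt(turns, target_role):
    """JSON-like prompt keeping explicit role features, with marker tags."""
    records = [asdict(t) for t in turns]
    return (f"<ROLE> target={target_role} "
            f"<KEY> context={json.dumps(records, ensure_ascii=False)}")

turns = [
    Turn("A", "debater (pro)", "AI boosts productivity.", "pro", 1),
    Turn("B", "debater (con)", "AI displaces workers.", "con", 2),
]
```

In the spliced form, the pro and con arguments are indistinguishable from one user contradicting themselves; in the structured form, each statement stays bound to its role identifier and standpoint.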
Due to the lack of an attention focusing mechanism for a specific target role, the model cannot dynamically adjust its attention weights over the context according to the currently set role identity, so that the generated replies often exhibit the following technical problems: 1. Role confusion and viewpoint conflict: the model cannot distinguish the views of different roles. For example, in a debate it treats the arguments of both the pro and con sides as contradictory statements of the same user, so that it may generate a reply that tries to reconcile the contradiction but is confused in standpoint and fails to reflect the adversarial nature of the debate. 2. Intention understanding deviation: the model may erroneously correlate questions or statements posed by different roles. For example, a hypothetical question posed by role A may be mistaken for a confirmation by role B, resulting in a fundamental deviation in understanding the current query. 3. Replies lacking pertinence and directivity: since the addressees and participants of the conversation cannot be identified, the model's replies are generalized and addressed to everyone, and a personalized response for a particular role cannot be achieved (e.g., "Regarding the need user A just raised, my advice is …"). 4. Unsuitability for complex collaboration scenarios: the prior art is wholly inadequate in advanced scenarios that require accurate recording of utterance attribution, summarization of multi-party views, or role playing (such as simulated interviews and courtroom debates).
Disclosure of Invention
In order to solve the technical problems of role confusion, viewpoint conflict and intention understanding deviation that occur when an existing large language model processes session data with multiple participants and complicated context, the invention provides solutions in the following aspects.
In a first aspect, the present invention provides a multi-context session large language model system comprising a multi-role context management module serving as the input end of the system for receiving original multi-party session data, allocating a unique role identifier to each participant, and converting the session history into a structured sequence containing the role identifiers, role background information, session content, standpoints, preferences and timestamps, so as to output standardized multi-role context data; a role information injection module for extracting role metadata and dialogue content from the structured sequence and structurally fusing them through a predefined injection mechanism, so as to construct a large language model input prompt containing explicit role features; and a role-aware inference generation module internally provided with a large language model based on the Transformer architecture for receiving the input prompt and performing inference, wherein an improved attention calculation unit is used during inference