CN-121996767-A - Implementation method for editing personality traits of large language model in fine granularity
Abstract
The invention discloses a method for realizing fine-grained editing of the personality traits of a large language model, belonging to the technical field of artificial intelligence. The method constructs a high-quality question-answer data set of positive-polarity and negative-polarity traits based on the Big Five personality theory, optimized through dual-model verification and human expert review; based on the MoE framework, a gating function dynamically combines the weights of the LoRA expert modules to realize fine-grained editing of personality traits. The invention solves the problems of personality drift, high resource consumption and poor controllability in the prior art; has the advantages of fine-grained control, flexible switching, low storage cost and high interpretability; can be widely applied in scenarios such as role playing and virtual customer service; and provides an efficient solution for personalized interaction with large language models.
Inventors
- ZHANG TONGXUAN
- YANG HAORAN
Assignees
- Tianjin Normal University (天津师范大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-28
Claims (8)
- 1. The method for fine-grained editing of the personality traits of a large language model is characterized by comprising the following steps: S101, acquiring a question-answer data set containing personality-trait-themed scenarios for subsequent model training; S102, carrying out multi-round backtracking verification and quality optimization on the question-answer data set to construct a personality trait question-answer data set; S103, based on the constructed personality trait question-answer data set, adopting the LoRA efficient supervised fine-tuning method to train a plurality of LoRA expert modules, each for a specific personality trait; S104, carrying out feature fusion of the LoRA expert modules of the different personality traits based on the MoE framework, and training a gating function on the corresponding personality trait question-answer data set to learn to dynamically allocate the optimal combination weights of the personality trait LoRA expert modules, thereby realizing controllable editing and accurate adjustment of the personality traits of the base model LLaMA-3-8B-Instruct.
- 2. The method according to claim 1, wherein in step S101, the question-answer data set is generated by using descriptions of the Big Five personality traits as prompt information and invoking the GPT-4o model to generate a corresponding response for each contextual question in the question-answer data set, the generated responses conforming to the expected personality trait settings.
- 3. The method according to claim 2, wherein in step S101, the Big Five personality traits comprise positive-polarity personality traits and negative-polarity personality traits, each of which covers the five trait dimensions of openness, conscientiousness, extraversion, agreeableness, and neuroticism.
- 4. The method for fine-grained editing of the personality traits of a large language model according to claim 2, wherein in step S101, the prompt format for the GPT-4o generated responses is: system instruction, personality trait description, output requirement, and scenario question.
- 5. The method for fine-grained editing of the personality traits of a large language model according to claim 1, wherein in step S102, the multi-round backtracking verification and quality optimization comprise using DeepSeek-V3 to score all responses generated by GPT-4o from 1 to 5 according to how well each response matches the expected sub-trait; a score of 4 or higher indicates high data quality, while a score below 4 indicates that the sub-trait label does not match the sub-trait captured by the model, in which case the data is returned to GPT-4o for response improvement; data scoring below 4 in three consecutive iterations is directly discarded.
- 6. The method of claim 5, wherein in step S102, human expert sampling and review are performed on the screened data: 10% of the samples are extracted from each sub-trait category, data judged valid by 3 or more experts is retained, and when the qualification rate of a sub-trait category is lower than 90%, the generation flow is optimized and the data is reviewed again.
- 7. The method for fine-grained editing of the personality traits of a large language model according to claim 1, wherein in step S103, the LoRA efficient supervised fine-tuning method is implemented by freezing the base weights of the original model LLaMA-3-8B-Instruct and limiting the parameter update to a low-rank decomposition form, which can be expressed as: ΔW₀ = BA; wherein ΔW₀ represents the parameters updated in the training phase, B ∈ ℝ^(d_out×r) and A ∈ ℝ^(r×d_in) represent two low-rank matrices, d_in and d_out represent the dimensions of the input and output features respectively, and the rank r ≪ min(d_in, d_out); only the parameters in the two low-rank matrices A and B are trained. Based on the personality trait question-answer data set, 2m × n LoRA expert modules are trained, each dedicated to learning and expressing one specific sub-trait, where m represents the number of sub-traits corresponding to one personality trait and n represents the number of personality traits the model is to be controlled to simulate.
- 8. The method according to claim 1, wherein in step S104, the feature fusion is implemented by aggregating the LoRA expert modules of the sub-traits into the LLaMA-3-8B-Instruct model in MoE mode and introducing a trainable weight parameter ω into each LoRA layer, so that the input of the layer is multiplied by ω before entering the corresponding LoRA expert module; the LoRA expert module of each sub-trait is treated as an independent expert E_i, whose weight matrix can be expressed as: W_{E_i} = (α/r) · B_i A_i; wherein B_i ∈ ℝ^(d_out×r) and A_i ∈ ℝ^(r×d_in) represent two low-rank matrices, d_in and d_out represent the dimensions of the input and output features respectively, and r is the rank. The gating function is responsible for learning and allocating the optimal combination weights for simulating the target personality; for the Big Five scenario the trait set is {O, C, E, A, N}, wherein O represents openness, C represents conscientiousness, E represents extraversion, A represents agreeableness, and N represents neuroticism. When the model simulates personality type P_i, the forward propagation process can be expressed as: O_{P_i} = W₀x + Σ_{j=1}^{M} G_j(x) · (α/r) · B_j A_j x; wherein O_{P_i} represents the output of the model when simulating personality P_i, W₀ and W_{E_j} represent the base weight of the original model LLaMA-3-8B-Instruct and the specific weights of the LoRA expert modules respectively, M represents the total number of LoRA expert modules used for simulating sub-traits in the current MoE architecture, and x represents the input vector. The gating function G(x) = softmax(MLP(x)) is a multi-layer perceptron network with a 256-neuron hidden layer whose activation function is ReLU; W_G represents the parameter matrix learned in the gating mechanism, the input dimension of the gating network is the model hidden size, and the output dimension is the number of personality traits multiplied by the number of LoRA layers. The gating function dynamically adjusts the output weights G_j(x) according to the input x, scaling the input of the j-th sub-trait LoRA expert module to realize personality simulation, wherein B_j A_j represents the LoRA low-rank weight matrix corresponding to the j-th expert module, and α and r are respectively the constant scaling factor and rank parameter of the LoRA expert modules.
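The MoE-of-LoRA layer described in claims 7 and 8 can be sketched in PyTorch as follows. This is a minimal illustrative sketch, not the patented implementation: the class names, dimensions, initialization, and the softmax over expert scores are assumptions; the patent specifies only a frozen base weight, per-expert low-rank matrices B_j A_j with scale α/r, and an MLP gate with a 256-neuron ReLU hidden layer.

```python
# Minimal sketch of a gated MoE-of-LoRA layer (claims 7-8).
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One low-rank expert: delta-W = B @ A, scaled by alpha / r."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # A in R^{r x d_in}
        self.B = nn.Parameter(torch.zeros(d_out, r))        # B in R^{d_out x r}
        self.scale = alpha / r

    def forward(self, x):
        return self.scale * (x @ self.A.T @ self.B.T)       # (alpha/r) * B A x

class MoELoRALayer(nn.Module):
    """Frozen base weight W0 plus M gated LoRA experts (one per sub-trait)."""
    def __init__(self, d_in, d_out, num_experts, gate_hidden=256):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)              # freeze W0
        self.experts = nn.ModuleList(
            LoRAExpert(d_in, d_out) for _ in range(num_experts)
        )
        # Gating function: MLP with a 256-neuron ReLU hidden layer.
        self.gate = nn.Sequential(
            nn.Linear(d_in, gate_hidden),
            nn.ReLU(),
            nn.Linear(gate_hidden, num_experts),
        )

    def forward(self, x):
        g = F.softmax(self.gate(x), dim=-1)                 # G_j(x) per expert
        out = self.base(x)                                  # W0 x
        for j, expert in enumerate(self.experts):
            out = out + g[..., j:j + 1] * expert(x)         # + G_j(x)*(alpha/r)*Bj Aj x
        return out

# e.g. 5 trait dimensions x 2 polarities = 10 experts
layer = MoELoRALayer(d_in=64, d_out=64, num_experts=10)
y = layer(torch.randn(2, 64))
print(y.shape)  # torch.Size([2, 64])
```

During gating-function training only `self.gate` (and any ω scalars) would be optimized, since the base weights are frozen and the experts were already fine-tuned per sub-trait in step S103.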
Description
Implementation method for editing personality traits of large language model in fine granularity

Technical Field

The invention belongs to the technical field of artificial intelligence, and particularly relates to a method for realizing fine-grained editing of the personality traits of a large language model.

Background

With the development of large language models, they have shown great potential in simulating various personality traits and are widely applied in scenarios such as role playing, virtual customer service, game NPCs, and psychological assistance. In order to make a model better fit specific roles or user preferences, personality-controllable generation technology has become a research hotspot; its core goal is to accurately adjust the model's performance along different personality dimensions while maintaining its core language capability. At present, personality trait control of large language models mainly adopts two technical schemes: the prompt injection method, which guides the model to generate responses of the corresponding style by adding a personality description prompt to the input, and the full-parameter fine-tuning method, which updates all parameters of the model to adapt to the target personality. However, both schemes have obvious defects: the prompt injection method is prone to personality drift in long dialogues and lacks stability against conflicting personality changes, while the full-parameter fine-tuning method can adjust model parameters but suffers from reduced task performance and high cost when storing and switching between different personalities.
With the exponential increase in the training data volume and parameter scale of large language models, the requirements for personality-controlled generation are increasingly diversified, and the contradiction between limited computing and storage resources and the need to efficiently load and switch between various personality traits is increasingly prominent, which brings great challenges to model deployment and application. Therefore, the prior art cannot meet the core requirements of practical applications, and a new technical solution is needed to solve the above problems.

Disclosure of Invention

The invention aims to provide a method for realizing fine-grained editing of the personality traits of a large language model. In order to solve the above technical problems, the technical scheme adopted by the invention comprises the following steps: S101, acquiring a question-answer data set containing personality-trait-themed scenarios for subsequent model training, namely adopting Big Five personality trait descriptions written by human experts as prompt information, and calling the GPT-4o model to generate a corresponding response for each scenario question in the data set, wherein the generated content is required to reflect and conform to the expected personality trait setting. The Big Five personality traits include positive-polarity (+) and negative-polarity (-) personality traits. The prompt format for the GPT-4o generated responses is: system instruction, personality trait description, output requirement, and scenario question.
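The four-part prompt format named above (system instruction, personality trait description, output requirement, scenario question) can be sketched as a simple template function. The wording of each part below is an illustrative assumption; the patent does not publish its exact prompt text.

```python
# Hypothetical sketch of the four-part prompt format used in step S101.
# The concrete wording of each part is an assumption, not the patented text.
def build_prompt(trait: str, polarity: str, scenario: str) -> str:
    system = "You are a role-playing assistant that answers strictly in character."
    trait_desc = f"Personality setting: {polarity} {trait} (Big Five dimension)."
    output_req = "Answer the question below in 2-4 sentences, reflecting this trait."
    question = f"Question: {scenario}"
    # System instruction, trait description, output requirement, scenario question.
    return "\n".join([system, trait_desc, output_req, question])

prompt = build_prompt("extraversion", "positive-polarity",
                      "How do you spend a free weekend?")
print(prompt)
```

A prompt built this way would be sent to GPT-4o once per scenario question, and the same trait description would later label the response for the DeepSeek-V3 scoring pass in step S102.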
S102, carrying out multi-round backtracking verification and quality optimization on the original question-answer data set to construct the personality trait question-answer data set. To ensure the accuracy and consistency of the question-answer data set, the DeepSeek-V3 model is introduced to verify and score the personality traits embodied in the responses. DeepSeek-V3 scores all responses generated by GPT-4o from 1 to 5 according to how well each response fits the expected sub-trait; a score of 4 or higher indicates high data quality, while a score below 4 indicates that the sub-trait label does not match the sub-trait captured by the model, and the data is returned to GPT-4o for response improvement. If the same piece of data scores below 4 in three consecutive iterations, it is directly discarded. Each personality trait data set contains approximately 2000 samples to ensure a balance in the number of samples for each sub-trait category. For a single personality, the total training data for the gating function is 3000 samples, of which 500 samples are randomly extracted from each sub-trait data set and combined into a personality trait data set for training the gating function that allocates the weights of the LoRA expert modules; then 5 human experts sample and evaluate the personality trait data set, and the extraction proportion of