CN-121996487-A - AEC-Q-standard-oriented training method, system and storage medium for a domain-specific large model in the automotive-grade chip testing field
Abstract
The invention relates to the field of large-model applications and discloses a training method, system and storage medium for a domain-specific large model for AEC-Q-standard automotive-grade chip testing. The method adopts a three-stage progressive architecture of continual training, reinforced fine-tuning and specialized fine-tuning. In S1, the AEC-Q standard documents undergo multi-modal parsing and two-stage chunking, continual training is completed under a causal language-model loss, and a knowledge system for the automotive-grade chip testing field is established. In S2, low-rank-adaptation fine-tuning and group relative policy optimization (GRPO) reinforcement learning are executed in sequence on the same error-correction data set, giving the model error-localization and correction-reasoning capability. In S3, the model is fine-tuned on structured question-answer data to achieve accurate question answering. The system comprises a data preparation module, a model training module and a parameter updating module for automatically executing the above steps. According to the invention, training can be completed on a single GPU, general capability and domain performance are both preserved, and the error-correction accuracy and review efficiency of automotive-grade chip test reports are markedly improved.
Inventors
- LEI LILI
- ZHANG JIALONG
- MA CHUNLEI
Assignees
- Beijing National New Energy Vehicle Technology Innovation Center Co., Ltd. (北京国家新能源汽车技术创新中心有限公司)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-23
Claims (10)
- 1. A training method for a domain-specific large model for the AEC-Q-standard automotive-grade chip testing field, characterized by at least comprising the following steps: S1, a continual training stage: continually training a pre-trained large model on domain chunk texts obtained by multi-modal parsing and two-stage chunking of AEC-Q automotive-grade chip test standard documents, so that the model builds a knowledge system for the automotive-grade chip testing field; S2, a reinforced fine-tuning stage: on the basis of the continually trained model, performing parameter-efficient fine-tuning via low-rank adaptation, and then performing reinforcement learning on the same error-correction data set with a group relative policy optimization algorithm, to obtain an automotive-test error-correction model with error-localization and correction-reasoning capability; S3, a specialized fine-tuning stage: fine-tuning the error-correction model on AEC-Q-standard structured question-answer data, so that the model masters the business rule of "question on specified fields, output the corresponding field contents", realizing accurate field-level question answering in the automotive-grade chip testing field.
- 2. The training method according to claim 1, wherein the multi-modal parsing and two-stage chunking in step S1 comprise: S11, converting the AEC-Q standard PDF file into a Word file that retains directory levels, tables and picture metadata, using a multi-modal document parsing tool; S12, in stage one, serializing tables in the Word file into Markdown text, calling a vision model to convert flow charts and architecture diagrams into descriptive text, and generating initial data chunks according to the directory structure; S13, in stage two, merging or splitting the initial data chunks based on the maximum context length of the model, to obtain continual-training samples of 512-1536 tokens.
- 3. The training method according to claim 1 or 2, wherein the continual training in step S1 adopts a causal language-model loss function, specifically: \( L(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\log P(x_{i,t}\mid x_{i,1},\dots,x_{i,t-1};\theta) \); wherein \(L(\theta)\) is the training loss under model parameters \(\theta\), \(N\) is the total number of continual-training samples, \(T\) is the token length of a single sample, \(x_{i,t}\) is the true token at position \(t\) of the \(i\)-th sample, and \(P(x_{i,t}\mid x_{i,1},\dots,x_{i,t-1};\theta)\) is the probability of predicting the \(t\)-th token from the preceding \(t-1\) tokens of the \(i\)-th sample under parameters \(\theta\); all parameters of the model are iteratively updated by minimizing the model's predictive cross-entropy on text sequences in the automotive-grade domain.
- 4. The training method according to claim 2, wherein step S2 at least comprises: S21, data preparation: through a designed workflow, implanting fact-reversal, extreme-data-deviation or core-conclusion-tampering errors into the directory-based chunk data obtained in stage one, and, by reasoning against the original AEC-Q standard text, generating an error-correction data set containing error position indices, error causes and corrected text; S22, model training: performing LoRA low-rank-adaptation fine-tuning on the continually trained model, freezing the backbone parameters and updating only the adapter parameters; then taking the obtained error-correction data set as input and performing reinforcement learning with a group relative policy optimization algorithm, iteratively updating the policy model under a reward function combining a format reward and an accuracy reward, so that model outputs conform to a four-part error-correction framework with accurate localization and correction.
- 5. The training method according to claim 4, wherein the objective function of the group relative policy optimization algorithm in step S22 is: \( J_{\mathrm{GRPO}}(\theta) = \mathbb{E}_{q\sim P(Q),\,\{o_i\}_{i=1}^{G}\sim\pi_{\theta_{\mathrm{old}}}(O\mid q)}\Big[\frac{1}{G}\sum_{i=1}^{G}\Big(\min\Big(\frac{\pi_{\theta}(o_i\mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i\mid q)}A_i,\ \mathrm{clip}\Big(\frac{\pi_{\theta}(o_i\mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i\mid q)},1-\epsilon,1+\epsilon\Big)A_i\Big) - \beta\, D_{\mathrm{KL}}(\pi_{\theta}\,\|\,\pi_{\mathrm{ref}})\Big)\Big] \); wherein \(J_{\mathrm{GRPO}}(\theta)\) is the objective function of the GRPO algorithm, the expectation is taken over automotive-test error-correction query samples \(q\sim P(Q)\), \(\pi_{\theta_{\mathrm{old}}}(o_i\mid q)\) is the old policy, \(\pi_{\theta}(o_i\mid q)\) is the new policy, \(\mathrm{clip}(\cdot,1-\epsilon,1+\epsilon)\) is the clipping operation with hyper-parameter \(\epsilon\), \(A_i\) is the advantage of the \(i\)-th group output, \(\beta\) is the penalty coefficient of the KL divergence, and \(D_{\mathrm{KL}}(\pi_{\theta}\,\|\,\pi_{\mathrm{ref}})\) is the KL divergence between the new policy and the reference policy, specifically: \( D_{\mathrm{KL}}(\pi_{\theta}\,\|\,\pi_{\mathrm{ref}}) = \frac{\pi_{\mathrm{ref}}(o_i\mid q)}{\pi_{\theta}(o_i\mid q)} - \log\frac{\pi_{\mathrm{ref}}(o_i\mid q)}{\pi_{\theta}(o_i\mid q)} - 1 \); wherein \(\pi_{\mathrm{ref}}(o_i\mid q)\) is the probability assigned by the reference policy \(\pi_{\mathrm{ref}}\) to the \(i\)-th error-correction output \(o_i\), and \(\pi_{\theta}(o_i\mid q)\) is the probability assigned by the new policy \(\pi_{\theta}\) to the \(i\)-th error-correction output \(o_i\); and the advantage function is: \( A_i = \frac{R_i - \mathrm{mean}(R_1,R_2,\dots,R_G)}{\mathrm{std}(R_1,R_2,\dots,R_G)} \); wherein \(A_i\) is the advantage corresponding to the \(i\)-th error-correction output, \(R_i\) is the reward obtained by the \(i\)-th error-correction output, \(\mathrm{mean}(R_1,R_2,\dots,R_G)\) is the reward mean of the \(G\) error-correction outputs in the same batch, and \(\mathrm{std}(R_1,R_2,\dots,R_G)\) is their reward standard deviation.
- 6. The training method according to claim 1, wherein step S3 at least comprises: S31, data preparation: business experts construct, from the AEC-Q standard files and test reports, a business model containing the fields test item, test content, test condition and judgment criterion, write the specific test requirements into a database table, and generate structured question-answer pair samples in batches by designating some field contents as questions and the remaining field contents as answers; S32, model training: taking the question-answer pair samples as input, fine-tuning the model obtained in step S2 with the causal language-model loss function, preferentially updating parameters related to field mapping, until the proportion of samples whose model-output answers exactly match the corresponding field contents in the database reaches a preset threshold.
- 7. The training method according to claim 1, wherein the parameter update formula during back-propagation in steps S1-S3 is: \( \theta_{t+1} = \theta_t - \eta\,\nabla_{\theta}L(\theta_t) \); wherein \(\theta_{t+1}\) is the updated model parameter, \(\theta_t\) is the current model parameter before updating, \(\eta\) is the learning rate, and \(\nabla_{\theta}L(\theta_t)\) is the gradient of the loss function \(L(\theta)\) at the current parameters \(\theta_t\).
- 8. A training system for a domain-specific large model for the AEC-Q-standard automotive-grade chip testing field, characterized in that the training system is built on the training method according to any one of claims 1-7 and at least comprises: a data preparation module, for obtaining domain chunk texts after multi-modal parsing and two-stage chunking of the AEC-Q automotive-grade chip test standard documents, and further generating the error-correction data set and the structured question-answer pair samples; a model training module, for sequentially: continually training the pre-trained large model with the chunked texts; performing parameter-efficient fine-tuning on the continually trained model via low-rank adaptation and performing reinforcement learning on the error-correction data set with a group relative policy optimization algorithm, to obtain the automotive-test error-correction model; and fine-tuning the error-correction model with the question-answer pair samples, so that the error-correction model has accurate field-level question-answering capability; and a parameter updating module, for iteratively updating the model parameters through gradient descent and back-propagation of the loss function.
- 9. The training system according to claim 8, wherein the data preparation module comprises: a multi-modal parsing sub-module, for converting the AEC-Q standard PDF into a Word file that retains directory levels, tables and picture metadata; a two-stage chunking sub-module, for serializing tables into Markdown text, converting flow charts/architecture diagrams into descriptive text, and merging or splitting by token length to obtain training samples; an error-correction data set generation sub-module, for implanting fact-reversal, extreme-data-deviation or core-conclusion-tampering errors into the chunk data and outputting an error-correction data set containing error position indices, error causes and corrected text; and a question-answer pair generation sub-module, for generating structured question-answer pair samples in batches from the AEC-Q business-model fields.
- 10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
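As an illustrative aside (not part of the claims; function names are hypothetical), the group-relative advantage normalization and the clipped policy-ratio term described in claim 5 can be sketched in a few lines of Python:

```python
import math

def group_relative_advantages(rewards):
    """Compute A_i = (R_i - mean(R_1..R_G)) / std(R_1..R_G) for one
    group of G sampled error-correction outputs (GRPO-style)."""
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    std = std if std > 0 else 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-output clipped term min(ratio * A_i, clip(ratio, 1-eps, 1+eps) * A_i),
    where ratio = pi_theta(o_i|q) / pi_theta_old(o_i|q)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Normalizing rewards within a group of sampled outputs, rather than against a learned value baseline, is what lets GRPO dispense with a critic model; the clipped term then caps how far a single update can move the policy.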
Description
AEC-Q-standard-oriented training method, system and storage medium for a domain-specific large model in the automotive-grade chip testing field

Technical Field

The invention relates to the field of large-model applications, in particular to a training method, training system and storage medium for a domain-specific large model for the AEC-Q-standard automotive-grade chip testing field.

Background

An automotive-grade chip must pass the AEC-Q series of stress tests before being fitted in a vehicle, and the related test reports strictly follow the standard clauses. However, the prior art still relies on manual comparison of the PDF standards against the reports: on the one hand, AEC-Q documents are long and rich in tables and diagrams, so manual interpretation is inefficient and error-prone; on the other hand, a general-purpose large model has not systematically learned AEC-Q knowledge and is prone to hallucination and misattribution when facing professional questions such as "temperature-cycling conditions" or "failure criteria", and is therefore hard to use directly for report generation or verification. To raise the degree of automation, the industry has attempted to vectorize the AEC-Q text and apply retrieval-augmented generation (RAG). Meanwhile, merely fine-tuning a general model on a simple domain corpus lacks dedicated training on the complete reasoning chain of "error localization - cause analysis - correction result", so reliable error identification and correction cannot be guaranteed in the test-report error-correction scenario. Therefore, a large-model training method dedicated to automotive-grade chip testing, which fully retains the structural information of the AEC-Q standard and possesses systematic error-correction reasoning capability, is needed.
Disclosure of Invention

The invention aims to provide a training method, training system and storage medium for a domain-specific large model in the AEC-Q-standard automotive-grade chip testing field, so as to solve the prior-art problems of lacking professional knowledge, poor error-correction reliability and low field-level question-answering accuracy, which are caused by missing multi-modal information during parsing, insufficient reasoning links and catastrophic forgetting. To achieve this technical aim and effect, the invention discloses a training method for a domain-specific large model for the AEC-Q-standard automotive-grade chip testing field, at least comprising the following steps: S1, a continual training stage: continually training a pre-trained large model on domain chunk texts obtained by multi-modal parsing and two-stage chunking of AEC-Q automotive-grade chip test standard documents, so that the model builds a knowledge system for the automotive-grade chip testing field; S2, a reinforced fine-tuning stage: on the basis of the continually trained model, performing parameter-efficient fine-tuning via low-rank adaptation, and then performing reinforcement learning on the same error-correction data set with a group relative policy optimization algorithm, to obtain an automotive-test error-correction model with error-localization and correction-reasoning capability; S3, a specialized fine-tuning stage: fine-tuning the error-correction model on AEC-Q-standard structured question-answer data, so that the model masters the business rule of "question on specified fields, output the corresponding field contents", realizing accurate field-level question answering in the automotive-grade chip testing field.
Preferably, in step S1, the multi-modal parsing and two-stage chunking comprise: S11, converting the AEC-Q standard PDF file into a Word file that retains directory levels, tables and picture metadata, using a multi-modal document parsing tool; S12, in stage one, serializing tables in the Word file into Markdown text, calling a vision model to convert flow charts and architecture diagrams into descriptive text, and generating initial data chunks according to the directory structure; S13, in stage two, merging or splitting the initial data chunks based on the maximum context length of the model, to obtain continual-training samples of 512-1536 tokens. Preferably, the continual training in step S1 adopts a causal language-model loss function, specifically: \( L(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\log P(x_{i,t}\mid x_{i,1},\dots,x_{i,t-1};\theta) \); wherein \(L(\theta)\) is the training loss under model parameters \(\theta\), \(N\) is the total number of continual-training samples, \(T\) is the token length of a single sample, \(x_{i,t}\) is the true token at position \(t\) of the \(i\)-th sample, and \(P(x_{i,t}\mid x_{i,1},\dots,x_{i,t-1};\theta)\) is the probability of predicting the \(t\)-th token from the preceding \(t-1\) tokens of the \(i\)-th sample under parameters \(\theta\); all parameters of the model are iteratively updated by minimizing the model's predictive cross-entropy on text sequences in the automotive-grade domain. Preferably, the
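As a minimal illustrative sketch (not from the patent; the function name and input layout are assumptions), the causal language-model loss \( L(\theta) = -\frac{1}{N}\sum_i\sum_t \log P(x_{i,t}\mid x_{i,<t};\theta) \) described above can be computed from per-token next-token probabilities as follows:

```python
import math

def causal_lm_loss(next_token_probs):
    """Negative log-likelihood summed over tokens, averaged over N samples.
    next_token_probs[i][t] is the probability the model assigned to the
    true token x_{i,t} given its preceding tokens x_{i,<t}."""
    n = len(next_token_probs)
    total = sum(-math.log(p) for sample in next_token_probs for p in sample)
    return total / n
```

Note that, matching the formula, the loss here is averaged over samples but summed over token positions; practical trainers often also average over tokens, which only rescales the gradient.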