
CN-121996534-A - Blockchain-based large model AI intelligent evaluation method, device, equipment and storage medium


Abstract

The application discloses a blockchain-based large model AI intelligent evaluation method, device, equipment and storage medium, and relates to the technical field of artificial intelligence. The method comprises: acquiring an evaluation request, and determining a target evaluation node by using an AI arbitration node in the blockchain, a preset node reputation library and a model type; invoking each target evaluation node to evaluate the target large model based on reinforced test questions and recording the resulting to-be-processed evaluation result in the blockchain; and invoking a preset dynamic evaluation engine to analyze the recorded preliminary evaluation result and the to-be-processed evaluation result to obtain a fusion analysis result, from which an evaluation report is generated. The method can improve the efficiency of evaluating large models.

Inventors

  • WANG YIMING
  • YIN PING
  • CHEN YANQING
  • AN XIAOBO

Assignees

  • 青岛浪潮海若人工智能有限公司

Dates

Publication Date
2026-05-08
Application Date
2025-10-23

Claims (10)

  1. A blockchain-based large model AI intelligent evaluation method, characterized by comprising the following steps: acquiring an evaluation request corresponding to a target large model, and determining a target evaluation node by using an AI arbitration node in the blockchain, a corresponding preset node reputation library, and a model type corresponding to the target large model; invoking each target evaluation node to perform a preliminary evaluation of the target large model based on a preset static evaluation data set, recording the preliminary evaluation result in the blockchain through a smart contract, then invoking a preset dynamic evaluation engine in the AI arbitration node to perform vulnerability identification on the preliminary evaluation result in the blockchain to obtain a vulnerability identification result, and, if the vulnerability identification result indicates that the target large model has a vulnerability, generating reinforced test questions corresponding to the vulnerability; invoking each target evaluation node to evaluate the target large model based on the reinforced test questions and recording the resulting to-be-processed evaluation result in the blockchain, then invoking a machine learning model in the preset dynamic evaluation engine to perform multi-dimensional fusion analysis on the preliminary evaluation result and the to-be-processed evaluation result recorded in the blockchain to obtain a fusion analysis result; and generating, based on the fusion analysis result, an evaluation report comprising a comprehensive score, a fine-grained capability radar chart, a defect analysis, and improvement suggestions, and recording the evaluation report in the blockchain as a target evaluation result.
  2. The blockchain-based large model AI intelligent evaluation method of claim 1, wherein acquiring the evaluation request corresponding to the target large model and determining the target evaluation node by using the AI arbitration node in the blockchain, the corresponding preset node reputation library, and the model type corresponding to the target large model comprises: acquiring the evaluation request corresponding to the target large model and the model type corresponding to the target large model, determining the corresponding preset node reputation library based on the evaluation request, and acquiring a plurality of evaluation nodes from the preset node reputation library by using the AI arbitration node in the blockchain and the model type, wherein the preset node reputation library records the historical evaluation quality score, response speed, and professional field label of every evaluation node; determining, for each evaluation node, node information comprising its historical evaluation quality score, response speed, and professional field label, matching each item of node information against the model type to obtain a corresponding matching degree score, and setting the evaluation node with the highest matching degree score as the target evaluation node, wherein the professional field labels comprise a code evaluation label and a medical text evaluation label, the historical evaluation quality score is a quality score determined from the accuracy and reliability of historical evaluations, and the response speed is determined from the delay with which the evaluation node processes evaluation tasks; and acquiring the node performance of each evaluation node and updating the preset node reputation library based on the node performance to obtain a new preset node reputation library, wherein the node performance comprises the node's real-time load, response speed, and historical evaluation quality score.
  3. The blockchain-based large model AI intelligent evaluation method of claim 1, wherein invoking each target evaluation node to perform the preliminary evaluation of the target large model based on the preset static evaluation data set and recording the preliminary evaluation result in the blockchain through the smart contract comprises: invoking the target evaluation node to test the target large model based on a locally stored preset static data set, generating a model answer and a preliminary score, packaging the model answer and the preliminary score into a smart contract transaction, and storing the smart contract transaction in the blockchain, wherein the static data set comprises preset test questions covering a plurality of fields and capability dimensions, together with the corresponding standard answers; checking the authenticity and data integrity of the smart contract transaction by using a consensus mechanism in the blockchain to obtain a detection result, and, if the detection result indicates that the check has passed, invoking the target evaluation node to analyze the smart contract transaction to obtain a preliminary analysis result comprising performance indexes of the target large model on specific tasks; and determining the preliminary evaluation result based on the smart contract transaction and the preliminary analysis result, and recording the preliminary evaluation result, a timestamp, and a node identification in the blockchain through the smart contract.
  4. The blockchain-based large model AI intelligent evaluation method of claim 1, wherein invoking the preset dynamic evaluation engine in the AI arbitration node to perform vulnerability identification on the preliminary evaluation result in the blockchain to obtain the vulnerability identification result, and, if the vulnerability identification result indicates that the target large model has a vulnerability, generating the reinforced test questions corresponding to the vulnerability comprises: determining a vulnerability identification threshold, and invoking the preset dynamic evaluation engine in the AI arbitration node to perform vulnerability identification on the preliminary evaluation result in the blockchain to obtain to-be-processed vulnerability identification results, wherein each to-be-processed vulnerability identification result is a vulnerability of the target large model in a preset field, a preset capability, or a preset attribute; judging whether each to-be-processed vulnerability identification result is larger than the preset vulnerability identification threshold, and, if it is, setting that to-be-processed vulnerability identification result as a target vulnerability identification result; and generating, by using a preset large language model and based on the target vulnerability identification result, reinforced test questions that correspond to the target vulnerability identification result and comprise adversarial samples and targeted difficult questions, packaging the reinforced test questions into smart contract transactions, and distributing the smart contract transactions to each evaluation node through the blockchain network.
  5. The blockchain-based large model AI intelligent evaluation method of claim 1, wherein invoking each target evaluation node to evaluate the target large model based on the reinforced test questions and recording the resulting to-be-processed evaluation result in the blockchain comprises: invoking the local resources of each target evaluation node to evaluate the target large model based on the current reinforced test questions, obtaining a current to-be-processed evaluation result comprising a model answer and performance indexes, and judging whether the current to-be-processed evaluation result meets a preset stop condition; and stopping the evaluation of the target large model if the current to-be-processed evaluation result meets the preset stop condition, or, if it does not, returning to the step of invoking the local resources of each target evaluation node to evaluate the target large model based on the current reinforced test questions; wherein the preset stop condition comprises the evaluation time reaching a preset time threshold, the number of evaluations reaching a preset identification threshold, and the resources of the evaluation node being exhausted.
  6. The blockchain-based large model AI intelligent evaluation method of claim 1, wherein invoking the machine learning model in the preset dynamic evaluation engine to perform multi-dimensional fusion analysis on the preliminary evaluation result and the to-be-processed evaluation result recorded in the blockchain to obtain the fusion analysis result comprises: acquiring the preliminary evaluation result and the to-be-processed evaluation result from the blockchain, and invoking a multi-modal learning model in the preset dynamic evaluation engine to evaluate, across the preliminary evaluation result and the to-be-processed evaluation result, the consistency of results for the same or similar questions at different times or from different nodes, to obtain a result consistency evaluation result; invoking a graph neural network in the preset dynamic evaluation engine to evaluate, on the preliminary evaluation result and the to-be-processed evaluation result, the stability of the model's output under question perturbation or adversarial attack, to obtain a robustness evaluation result; invoking a causal inference model in the preset dynamic evaluation engine to evaluate, on the preliminary evaluation result and the to-be-processed evaluation result, tendencies toward specific groups or topics, to obtain a bias evaluation result; and determining the fusion analysis result based on the result consistency evaluation result, the robustness evaluation result, and the bias evaluation result.
  7. The blockchain-based large model AI intelligent evaluation method of any of claims 1-6, wherein generating, based on the fusion analysis result, the evaluation report comprising the comprehensive score, the fine-grained capability radar chart, the defect analysis, and the improvement suggestions, and recording the evaluation report in the blockchain as the target evaluation result comprises: generating, by using a report generation center in the AI arbitration node and based on the fusion analysis result, an evaluation report containing the comprehensive score, the fine-grained capability radar chart, the defect analysis, and the improvement suggestions, and packaging the evaluation report into a smart contract transaction recorded on the blockchain; wherein the comprehensive score is obtained by a weighted calculation using the weights corresponding to the fusion analysis result and reflects the overall performance of the target large model, the fine-grained capability radar chart shows the performance of the target large model in different dimensions, the defect analysis describes the weaknesses of the target large model and their effects, the improvement suggestions are optimization measures corresponding to the defect analysis, and the evaluation report can be queried and verified in the blockchain through an interactive interface and is tamper-proof and traceable.
  8. A blockchain-based large model AI intelligent evaluation device, characterized by comprising: an evaluation node determining module, configured to acquire an evaluation request corresponding to a target large model and determine a target evaluation node by using an AI arbitration node in the blockchain, a corresponding preset node reputation library, and a model type corresponding to the target large model; a reinforced test question determining module, configured to invoke each target evaluation node to perform a preliminary evaluation of the target large model based on a preset static evaluation data set, record the preliminary evaluation result in the blockchain through a smart contract, then invoke a preset dynamic evaluation engine in the AI arbitration node to perform vulnerability identification on the preliminary evaluation result in the blockchain to obtain a vulnerability identification result, and generate reinforced test questions corresponding to the vulnerability if the vulnerability identification result indicates that the target large model has a vulnerability; a fusion analysis result determining module, configured to invoke each target evaluation node to evaluate the target large model based on the reinforced test questions and record the resulting to-be-processed evaluation result in the blockchain, and then invoke a machine learning model in the preset dynamic evaluation engine to perform multi-dimensional fusion analysis on the preliminary evaluation result and the to-be-processed evaluation result recorded in the blockchain to obtain a fusion analysis result; and an evaluation report generation module, configured to generate, based on the fusion analysis result, an evaluation report comprising a comprehensive score, a fine-grained capability radar chart, a defect analysis, and improvement suggestions, and record the evaluation report in the blockchain as a target evaluation result.
  9. An electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the blockchain-based large model AI intelligent evaluation method of any of claims 1-7.
  10. A computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the blockchain-based large model AI intelligent evaluation method of any of claims 1-7.
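
The node selection recited in claim 2 (match each node's historical evaluation quality score, response speed and professional field label against the model type, then take the highest matching degree score) can be illustrated with a minimal sketch. The claims do not state how the matching degree score is computed, so everything concrete here is an assumption made for illustration: the weighted-sum formula, the weights, the latency normalization, the exact-match label rule, and the names `EvalNode`, `matching_score` and `select_target_node`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalNode:
    """One entry of a (hypothetical) node reputation library."""
    node_id: str
    quality: float      # historical evaluation quality score, assumed in [0, 1]
    latency_ms: float   # delay when processing evaluation tasks
    labels: frozenset   # professional field labels, e.g. {"code"}


# Assumed weights and normalization bound; the patent text gives no values.
W_QUALITY, W_SPEED, W_LABEL = 0.5, 0.2, 0.3
MAX_LATENCY_MS = 2000.0


def matching_score(node: EvalNode, model_type: str) -> float:
    """Weighted sum of quality, normalized speed, and label match."""
    speed = max(0.0, 1.0 - node.latency_ms / MAX_LATENCY_MS)
    label = 1.0 if model_type in node.labels else 0.0
    return W_QUALITY * node.quality + W_SPEED * speed + W_LABEL * label


def select_target_node(nodes, model_type: str) -> EvalNode:
    """Return the node with the highest matching degree score."""
    if not nodes:
        raise ValueError("reputation library is empty")
    return max(nodes, key=lambda n: matching_score(n, model_type))


library = [
    EvalNode("node-a", 0.92, 800.0, frozenset({"medical_text"})),
    EvalNode("node-b", 0.85, 300.0, frozenset({"code"})),
    EvalNode("node-c", 0.70, 150.0, frozenset({"code", "medical_text"})),
]

target = select_target_node(library, "code")
print(target.node_id)
```

With these assumed weights, a code model is routed to `node-b`, while a medical text model would be routed to `node-a`: a matching field label outweighs a modest advantage in quality or speed. Updating the reputation library after each task, as the last step of claim 2 describes, would amount to replacing the `quality` and `latency_ms` fields with fresh measurements.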

Description

Blockchain-based large model AI intelligent evaluation method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a blockchain-based large model AI intelligent evaluation method, device, equipment and storage medium.

Background

With the rapid development of large language model (LLM) technology, evaluating model performance objectively, fairly and comprehensively has become important. The prior art has attempted to use blockchains to address the credibility of evaluation. For example, a blockchain-based multi-party consensus evaluation system has multiple evaluating parties (blockchain nodes) evaluate a model and record the data on-chain, relying on the tamper-resistance and distributed consensus of the blockchain to ensure the transparency of the evaluation process and the credibility of the results. However, extensive analysis shows that existing solutions still have the following significant drawbacks:

Static rigidity of the evaluation strategy: the evaluation criteria and data sets used by each evaluating party are preset and static. This one-size-fits-all mode cannot sense the specific behavior of the model under test in real time during evaluation, nor target the specific weaknesses of the model under test (such as logical flaws, factual errors and biases), so the evaluation lacks depth and flexibility.

Lack of intelligence in the node selection mechanism: the nodes that participate in evaluation are selected by a traditional consensus algorithm (such as Raft or DPoS). This ensures decentralization, but it does not fully consider the differences in historical evaluation quality, response speed and professional field among evaluation nodes. As a result, evaluation efficiency is low, or low-quality and mismatched nodes take part in key consensus and degrade the overall evaluation.

Weak comprehensive analysis of results: the final comprehensive evaluation result usually relies on a simple weighted average or summary of scores. It lacks an intelligent analysis layer that can deeply mine, fuse and analyze multi-party evaluation data and identify from it the model's deeper behavior patterns, consistency, robustness and bias, so the insight gained from evaluation is limited.

In view of the above, how to improve the efficiency of evaluating a large model in blockchain-based large model AI intelligent evaluation is a problem to be solved.

Disclosure of Invention

In view of the above, the present invention aims to provide a blockchain-based large model AI intelligent evaluation method, device, equipment and storage medium that can improve the efficiency of evaluating a large model in the blockchain-based large model AI intelligent evaluation process.

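
To make the contrast with a static evaluation strategy concrete, the adaptive part of the claimed method (threshold-based vulnerability identification in claim 4, followed by repeated reinforced evaluation until a stop condition holds in claim 5) can be sketched as below. This is a simplified illustration under stated assumptions, not the patented implementation: a weakness is taken to be `1 - score` per capability dimension, only the time and round limits are modeled (resource exhaustion is omitted), either limit alone is assumed to end the loop, and the names `find_weaknesses` and `run_reinforced_rounds` are hypothetical.

```python
import time


def find_weaknesses(dimension_scores, threshold):
    """Keep dimensions whose weakness (assumed to be 1 - score) exceeds the threshold."""
    return {
        dim: 1.0 - score
        for dim, score in dimension_scores.items()
        if 1.0 - score > threshold
    }


def run_reinforced_rounds(evaluate_round, max_rounds, max_seconds):
    """Evaluate, then check the stop condition, and repeat until it is met.

    evaluate_round is a caller-supplied callable (hypothetical) that runs one
    round of reinforced test questions and returns that round's result.
    """
    results = []
    start = time.monotonic()
    while True:
        results.append(evaluate_round(len(results)))
        timed_out = time.monotonic() - start >= max_seconds
        if len(results) >= max_rounds or timed_out:
            return results


# Hypothetical preliminary scores per capability dimension.
preliminary = {"logic": 0.55, "facts": 0.90, "bias": 0.62}
weak = find_weaknesses(preliminary, threshold=0.3)

# One stub round per call; a real node would query the target model here.
rounds = run_reinforced_rounds(
    lambda i: {"round": i, "targets": sorted(weak)},
    max_rounds=3,
    max_seconds=60.0,
)
print(sorted(weak), len(rounds))
```

The loop evaluates before it checks, which mirrors the order in claim 5: each round yields a to-be-processed evaluation result, and only then is the stop condition tested.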
The specific scheme is as follows:

In a first aspect, the present application provides a blockchain-based large model AI intelligent evaluation method, including:

acquiring an evaluation request corresponding to a target large model, and determining a target evaluation node by using an AI arbitration node in the blockchain, a corresponding preset node reputation library, and a model type corresponding to the target large model;

invoking each target evaluation node to perform a preliminary evaluation of the target large model based on a preset static evaluation data set, recording the preliminary evaluation result in the blockchain through a smart contract, then invoking a preset dynamic evaluation engine in the AI arbitration node to perform vulnerability identification on the preliminary evaluation result in the blockchain to obtain a vulnerability identification result, and, if the vulnerability identification result indicates that the target large model has a vulnerability, generating reinforced test questions corresponding to the vulnerability;

invoking each target evaluation node to evaluate the target large model based on the reinforced test questions and recording the resulting to-be-processed evaluation result in the blockchain, then invoking a machine learning model in the preset dynamic evaluation engine to perform multi-dimensional fusion analysis on the preliminary evaluation result and the to-be-processed evaluation result recorded in the blockchain to obtain a fusion analysis result; and

generating, based on the fusion analysis result, an evaluation report comprising a comprehensive score, a fine-grained capability radar chart, a defect analysis, and improvement suggestions, and recording the evaluation report in the blockchain as a target evaluation result.

Optionally, the obtaining an evaluation request corresponding to the target large model, and then determining the target evaluation node by using