CN-121981612-A - Multi-rater scoring deviation correction method and system based on hierarchical estimation


Abstract

The application provides a multi-rater scoring deviation correction method and system based on hierarchical estimation, and relates to the technical field of assessment. Initial relative severity parameters of the raters are estimated from pairwise comparisons among the raters in the scoring data. With the initial relative severity parameters held as fixed estimates, an item response theory (IRT) model matched to the type of the scoring data is used to estimate initial discrimination parameters of the raters and initial ability parameters of the candidates. The initial relative severity, discrimination, and ability parameters are then used as initial values for a joint optimization that minimizes a pre-constructed joint optimization objective function, yielding final estimates, and a corrected scoring result is generated from the final estimates. With this technical scheme, the ability parameters of the candidates can be determined efficiently in subjective scoring scenarios with multiple raters, so that the scoring data can be corrected more fairly.

Inventors

LIU YIXING
SUN WEN
HAN DONGRAN
WU LIN

Assignees

Beijing University of Chinese Medicine (北京中医药大学)

Dates

Publication Date: 20260505
Application Date: 20260206

Claims (10)

1. A multi-rater scoring deviation correction method based on hierarchical estimation, applicable to evaluation scenarios with limited information and executed by a computing device, the method comprising: obtaining scoring data given by a plurality of raters to a plurality of candidates; estimating an initial relative severity parameter of each rater based on pairwise comparisons among the raters in the scoring data; with the initial relative severity parameters held as fixed estimates, estimating an initial discrimination parameter of each rater and an initial ability parameter of each candidate using an item response theory (IRT) model matched to the type of the scoring data; taking the initial relative severity parameters, the initial discrimination parameters, and the initial ability parameters as initial values, performing joint optimization on a pre-constructed joint optimization objective function containing the severity, discrimination, and ability parameters, so as to minimize the objective function and obtain final estimates of all parameters; and generating a corrected scoring result based on the candidate ability parameters in the final estimates.
2. The multi-rater scoring deviation correction method according to claim 1, wherein estimating the initial relative severity parameter of each rater based on pairwise comparisons among the raters in the scoring data specifically comprises: without introducing candidate ability parameters, constructing, from the score comparison results of any two raters on the same candidate in the scoring data, a pairwise comparison model representing the probability of a severity difference between raters; obtaining a preliminary severity parameter estimate for each rater by minimizing a first negative log-likelihood function of the pairwise comparison model; and centering the preliminary severity parameter estimates so that their mean is zero, thereby obtaining the initial relative severity parameters.
3. The multi-rater scoring deviation correction method according to claim 2, wherein the first negative log-likelihood function is given by the following expression: [expression missing from this text]; where the function term is a function of the vector b of the raters' relative severity parameters; the probability term is the probability, computed from the parameter b, that rater i is more severe than rater j; the summation runs over all pairs of raters that scored the same candidate and satisfy a stated condition [condition missing from this text]; and the first observation indicator variable takes the value 1 when the score of rater i is lower than that of rater j and 0 otherwise, with its value assigned according to a preset rule when the scores are equal.
4. The multi-rater scoring deviation correction method according to claim 3, wherein the pairwise comparison model is given by the following expression: [expression missing from this text]; where the two score terms respectively denote the scores given to the same candidate by rater i and rater j; the probability term denotes the probability that the score of rater i is lower than the score of rater j; and the two severity terms respectively denote the relative severity parameters of raters i and j; and wherein minimizing the first negative log-likelihood function is implemented with an iterative optimization algorithm or a variational inference method.
5. The multi-rater scoring deviation correction method according to claim 1, wherein the scoring data is bounded continuous data and the matched IRT model includes a Beta model given by the following expression: [expression missing from this text]; where the defined terms denote, in the order listed: a position parameter; the discrimination parameter of rater i; the severity parameter of rater i; the ability parameter of candidate j; the probability density function of the observed score value x; a precision parameter; and the Gamma function.
6. The multi-rater scoring deviation correction method according to claim 5, wherein, with the initial relative severity parameters held as fixed estimates, estimating the initial discrimination parameter of each rater and the initial ability parameter of each candidate using the IRT model matched to the type of the scoring data specifically comprises: setting the initial relative severity parameters as known constants in the IRT model; constructing a likelihood function of the candidate ability parameters and the rater discrimination parameters based on the IRT model; and performing joint parameter estimation on the likelihood function using maximum likelihood estimation or conditional maximum likelihood estimation, so as to estimate the initial ability parameter of each candidate and the initial discrimination parameter of each rater simultaneously.
7. The multi-rater scoring deviation correction method according to claim 5, wherein the joint optimization objective function is a second negative log-likelihood function given by the following expression: [expression missing from this text]; where Θ denotes the joint parameter vector comprising all relative severity parameters b, discrimination parameters a, and ability parameters θ; L(Θ) denotes the negative log-likelihood of the observed data given the joint parameter vector Θ; the score term denotes the actual score given to candidate j by rater i; and the density term denotes the probability density function of the Beta distribution.
8. The multi-rater scoring deviation correction method according to claim 7, wherein minimizing the joint optimization objective function is implemented with an iterative optimization algorithm, including but not limited to the L-BFGS-B algorithm, or with a variational inference method; and wherein the iterative process of the joint optimization starts from the initial values of all parameters and takes a relative change in the objective function value smaller than a preset threshold as the convergence condition.
9. The multi-rater scoring deviation correction method according to any one of claims 1 to 8, wherein generating the corrected scoring result based on the candidate ability parameters in the final estimates specifically comprises: mapping the candidate ability parameters to a preset scoring interval through scale conversion to generate corrected scores, and updating the scoring data with the corrected scores; wherein the scale conversion is implemented by a linear mapping or by transformation through an inverse link function.
10. A multi-rater scoring deviation correction system based on hierarchical estimation, comprising: an acquisition module for obtaining scoring data given by a plurality of raters to a plurality of candidates; a first processing module for estimating an initial relative severity parameter of each rater based on pairwise comparisons among the raters in the scoring data; a second processing module for estimating, with the initial relative severity parameters held as fixed estimates, an initial discrimination parameter of each rater and an initial ability parameter of each candidate using an IRT model matched to the type of the scoring data; a third processing module for taking the initial relative severity parameters, the initial discrimination parameters, and the initial ability parameters as initial values and performing joint optimization on a pre-constructed joint optimization objective function containing the severity, discrimination, and ability parameters, so as to minimize the objective function and obtain final estimates of all parameters; and a correction processing module for generating a corrected scoring result based on the candidate ability parameters in the final estimates.
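
The three estimation stages and the final rescaling recited in claims 1 to 9 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation. The expressions of claims 3, 4, 5 and 7 are missing from this text, so the logistic pairwise model, the logistic link inside the Beta mean, the mean-precision Beta parameterization, the tie value of 0.5, the small ridge penalties, the parameter bounds, and all function names (`stage1_severity`, `beta_nll`, `fit`) are assumptions chosen only to match the verbal definitions in the claims.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

EPS = 1e-6

def stage1_severity(X):
    """Initial relative severities from pairwise comparisons (no ability terms).
    X: raters x candidates, scores in (0, 1), NaN where a score is missing."""
    R = X.shape[0]
    I, J, Y = [], [], []
    for i in range(R):
        for j in range(i + 1, R):
            both = ~np.isnan(X[i]) & ~np.isnan(X[j])    # candidates scored by both
            d = X[i, both] - X[j, both]
            # y = 1 if rater i scored lower; ties get 0.5 (assumed tie rule)
            Y.extend(np.where(d < 0, 1.0, np.where(d == 0, 0.5, 0.0)))
            I.extend([i] * d.size)
            J.extend([j] * d.size)
    I, J, Y = np.array(I, dtype=int), np.array(J, dtype=int), np.array(Y)

    def nll(b):
        # Assumed form: P(score_i < score_j) = logistic(b_i - b_j)
        p = np.clip(expit(b[I] - b[J]), EPS, 1 - EPS)
        # Small ridge term (assumption) keeps the minimum finite and unique
        return -np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p)) + 1e-3 * b @ b

    b = minimize(nll, np.zeros(R), method="L-BFGS-B").x
    return b - b.mean()                                  # centre to zero mean

def beta_nll(X, a, b, theta, phi):
    """Negative log-likelihood of the observed scores under the assumed Beta model:
    mean mu_ij = logistic(a_i * (theta_j - b_i)), precision phi."""
    obs = ~np.isnan(X)
    mu = expit(a[:, None] * (theta[None, :] - b[:, None]))[obs]
    mu = np.clip(mu, EPS, 1 - EPS)
    x = np.clip(X[obs], EPS, 1 - EPS)
    ll = (gammaln(phi) - gammaln(mu * phi) - gammaln((1 - mu) * phi)
          + (mu * phi - 1) * np.log(x) + ((1 - mu) * phi - 1) * np.log(1 - x))
    return -ll.sum()

def fit(X, lo=0.0, hi=1.0):
    """Three-stage estimate; returns (b, a, theta, corrected scores)."""
    R, C = X.shape
    Xs = (X - lo) / (hi - lo)                            # rescale scores to (0, 1)
    b0 = stage1_severity(Xs)                             # stage 1

    # Stage 2: severities fixed at b0; estimate log a, theta, log phi.
    def obj2(v):
        return (beta_nll(Xs, np.exp(v[:R]), b0, v[R:R + C], np.exp(v[-1]))
                + 0.01 * v[:R + C] @ v[:R + C])          # weak ridge (assumption)
    bnd2 = [(-3, 3)] * R + [(-6, 6)] * C + [(0, 7)]
    v2 = minimize(obj2, np.r_[np.zeros(R + C), np.log(10.0)],
                  method="L-BFGS-B", bounds=bnd2).x

    # Stage 3: joint refinement of b, log a, theta, log phi from the warm start.
    def obj3(w):
        return (beta_nll(Xs, np.exp(w[R:2 * R]), w[:R],
                         w[2 * R:2 * R + C], np.exp(w[-1]))
                + 0.01 * w[:-1] @ w[:-1])
    bnd3 = [(-6, 6)] * R + bnd2
    # ftol is L-BFGS-B's relative-decrease stopping threshold
    w = minimize(obj3, np.r_[b0, v2], method="L-BFGS-B", bounds=bnd3,
                 options={"ftol": 1e-9}).x
    b, a, theta = w[:R], np.exp(w[R:2 * R]), w[2 * R:2 * R + C]
    # Scale conversion via the inverse link: score expected from a neutral rater
    corrected = lo + (hi - lo) * expit(theta)
    return b, a, theta, corrected
```

Calling `fit(X, 0.0, 100.0)` on a raters-by-candidates array of scores on a 0 to 100 scale returns the severity, discrimination, and ability estimates together with one corrected score per candidate. The warm start from stages 1 and 2 is what the claims rely on for stability with few raters; the ridge terms here are only a stand-in for whatever identification constraint the full specification uses.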

Description

Multi-rater scoring deviation correction method and system based on hierarchical estimation

Technical Field

The application relates to the technical field of evaluation, and in particular to a multi-rater scoring deviation correction method and system based on hierarchical estimation.

Background

In subjective scoring scenarios with multiple raters (such as academic competitions, talent recruitment, teacher evaluation, artistic performance scoring, and multidisciplinary medical consultation), scoring results often bear directly on resource allocation, individual development opportunities, and organizational decisions, so their fairness and scientific soundness are highly sensitive in practice. In current practice, these scenarios generally use the arithmetic mean or the truncated mean (the mean after removing the highest and lowest scores) to compute the final score. Such methods essentially assume that differences among raters in scoring scale and judgment criteria are negligible. However, even when external factors such as conflicts of interest and information asymmetry are excluded, a multi-rater scoring system still has two non-negligible sources of systematic deviation: the "leniency-severity difference" (how strictly or leniently different raters score) and the "consistency difference" (how stable the same rater's scoring standard is across time or across the objects being rated). The mean or truncated-mean strategy can only weaken the influence of extreme scores at the statistical level; it cannot identify, quantify, or correct these systematic deviations. The final score therefore still mixes the "true ability of the evaluated object" with the "individual characteristics of the raters", and may deviate from the objective true level.
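
The limitation of the mean and the truncated mean can be illustrated with a small numerical example. The sketch below uses hypothetical numbers that do not come from the application: one candidate of fixed quality is scored by a lenient panel and by a severe panel, and both aggregates carry the panels' severity straight into the final score.

```python
import numpy as np

def truncated_mean(scores):
    """Mean after dropping one highest and one lowest score."""
    s = np.sort(np.asarray(scores, dtype=float))
    return s[1:-1].mean()

# Hypothetical numbers: the same candidate quality seen by two different panels.
true_quality = 80.0
lenient_offsets = np.array([6.0, 5.0, 4.0, 5.0, 6.0])   # lenient raters score high
severe_offsets = -lenient_offsets                        # severe raters score low

panel_a = true_quality + lenient_offsets
panel_b = true_quality + severe_offsets

# Gap between the two panels' final scores for identical quality
gap_mean = panel_a.mean() - panel_b.mean()
gap_trunc = truncated_mean(panel_a) - truncated_mean(panel_b)
print(f"mean gap: {gap_mean:.2f}, truncated-mean gap: {gap_trunc:.2f}")
```

Dropping the extremes does not shrink the gap here, because the offset is shared by the whole panel rather than carried by one outlying score.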
Item response theory (IRT), an important branch of modern measurement theory, separates examinee ability from item characteristics in theory by constructing a probability model linking the examinee's latent trait to item parameters. Although IRT has been applied successfully in large-scale settings such as standardized assessment, educational examinations, and psychological measurement, its model assumptions, parameter estimation procedure, and computational architecture all presuppose a sufficient sample size and a relatively complete and balanced response matrix. The inventors recognized that each rater can be regarded as a "scoring item": the score received by an evaluated object is determined jointly by the latent trait of the evaluated object and the attributes of the rater, and the rater's attributes have a structure consistent with conventional item parameters. On the one hand, raters differ in "difficulty", that is, some raters score higher and some lower, which can be understood as severity; on the other hand, raters differ in "discrimination", that is, in their sensitivity and ability to distinguish between high-level and low-level evaluated objects, which can be understood as scoring consistency. On this basis, an IRT model or an extension of it can be applied to subjective scoring to separate "examinee ability" from "rater characteristics" through parameter estimation, thereby achieving a fairer interpretation of scores. In practical application, the inventors found that existing IRT models and their extensions are mostly designed for large-scale assessment, and that their model assumptions and parameter estimation both rely on full-parameter joint estimation and multi-round iterative optimization.
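
The rater-as-item analogy can be written out with the standard two-parameter logistic (2PL) IRT model. The first line below is the textbook 2PL form; the second line, which reads rater i as the item and applies the same curve to the expected score on a (0, 1) scale, is an assumed illustration rather than the expression used in the application, and the symbols a_i, b_i and theta_j are the discrimination, severity and ability parameters named in the claims.

```latex
% Standard 2PL model for a binary item with discrimination a and difficulty b
P(X = 1 \mid \theta) = \frac{1}{1 + \exp\bigl(-a(\theta - b)\bigr)}

% Assumed rater analogue: expected score of candidate j from rater i
\mathbb{E}[x_{ij}] = \frac{1}{1 + \exp\bigl(-a_i(\theta_j - b_i)\bigr)}

% Worked values with a_i = 1 and \theta_j = 0.5
b_i = 0:\quad \frac{1}{1 + e^{-0.5}} \approx 0.622
\qquad
b_i = 1:\quad \frac{1}{1 + e^{0.5}} \approx 0.378
```

Under this reading, the same candidate is expected to receive a noticeably lower score from the more severe rater, which is the deviation the method sets out to remove.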
However, in small-sample or sparse scoring environments (for example, only a few dozen examinees and a small number of raters, or incomplete scoring records), conventional IRT methods often suffer from unstable parameter estimates and difficulty in model convergence. As a result, computational efficiency is low, the reliability of the ability parameter estimates is hard to guarantee, and the methods are difficult to apply directly in real subjective scoring scenarios. How to provide a scoring deviation correction method for subjective scoring scenarios with multiple raters, one that efficiently determines the ability parameters of the examinees and corrects the scoring data more fairly on that basis, has therefore become a technical problem to be solved.

Disclosure of Invention

In order to overcome the problems in the related art at least to a certain extent, the application provides a multi-rater scoring deviation correction method and s