CN-121998123-A - Training method and system of process fairness machine learning model based on feature attribution interpretation
Abstract
The training method of the process fairness machine learning model based on feature attribution interpretation comprises the steps of (a) dividing the data point set of the machine learning model into a first set X_1 and a second set X_2 with different sensitive attributes, according to the sensitive attribute; (b) extracting k pairs of data points from the first set X_1 and the second set X_2, wherein the similarity measure between the two data points in each pair is minimal; (c) based on the cross-entropy loss L_CE of the machine learning model f_θ, applying a regularization penalty with hyperparameter α, taking the process fairness loss L_GPF as the penalty term, so as to obtain the total objective loss L, and training the machine learning model f_θ iteratively until the total objective loss L converges; and (d) outputting the trained machine learning model f_θ. A corresponding training system is also provided.
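The objective in step (c) combines the cross-entropy loss with the attribution-based fairness penalty. A minimal numeric sketch in Python, assuming the l_1 form of L_GPF; the function names and all numbers below are illustrative, not taken from the patent:

```python
import numpy as np

# Sketch of the total objective of step (c): L = L_CE + alpha * L_GPF.
# All names, data, and attribution values are illustrative.

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy loss L_CE, averaged over the batch."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def gpf_loss(attr_1, attr_2):
    """Process fairness loss L_GPF: mean l_1 distance between the feature
    attributions of the k matched cross-group pairs."""
    return np.mean(np.sum(np.abs(attr_1 - attr_2), axis=1))

alpha = 0.5                               # hyperparameter alpha (claim 4)
y = np.array([1, 0, 1])                   # labels
p = np.array([0.9, 0.2, 0.7])             # model probabilities f_theta(x)
A1 = np.array([[0.3, 0.1], [0.2, 0.4]])   # attributions G(f_theta, x'_1^(i))
A2 = np.array([[0.1, 0.2], [0.2, 0.1]])   # attributions G(f_theta, x'_2^(i))
total = cross_entropy(y, p) + alpha * gpf_loss(A1, A2)
```

The penalty term pushes the model toward giving matched individuals from the two sensitive groups similar feature attributions, which is what the method treats as process fairness.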
Inventors
- WANG ZIMING
- HUANG CHANGWU
- YAO XIN
Assignees
- Lingnan University (岭南大学)
- Southern University of Science and Technology (南方科技大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-10-31
Claims (16)
- 1. A training method of a process fairness machine learning model based on feature attribution interpretation, comprising the following steps: (a) dividing the data point set of the machine learning model into a first set X_1 and a second set X_2 with different sensitive attributes, according to the sensitive attribute; (b) extracting k pairs of data points from the first set X_1 and the second set X_2, wherein the similarity measure between the two data points in each pair is minimal, one data point in each pair belongs to the first set X_1 and the other belongs to the second set X_2, the k selected data points originally belonging to the first set X_1 form a third set X'_1, the k selected data points originally belonging to the second set X_2 form a fourth set X'_2, and the pairs of data points are expressed as (x'_1^(i), x'_2^(i)), where x'_1^(i) ∈ X'_1, x'_2^(i) ∈ X'_2, and i = 1, 2, …, k; (c) based on the cross-entropy loss L_CE of the machine learning model f_θ, applying a regularization penalty with hyperparameter α, taking the process fairness loss L_GPF as the penalty term, so as to obtain the total objective loss L, and training the machine learning model f_θ iteratively until the total objective loss L converges; and (d) outputting the trained machine learning model f_θ.
- 2. The training method according to claim 1, wherein the total objective loss L is denoted L = L_CE + α × L_GPF, and the process fairness loss L_GPF is calculated as follows: L_GPF = (1/k) Σ_{i=1}^{k} l_q(G(f_θ, x'_1^(i)), G(f_θ, x'_2^(i))), where G(·) is a local feature attribution interpretation function and l_q denotes a distance.
- 3. The training method of claim 2, wherein l_q is the l_1 distance, and the process fairness loss L_GPF is calculated according to the following formula: L_GPF = (1/k) Σ_{i=1}^{k} ||G(f_θ, x'_1^(i)) − G(f_θ, x'_2^(i))||_1.
- 4. The training method according to claim 2, wherein the hyperparameter α = 0.5.
- 5. The training method of claim 1, wherein step (a) is represented as follows: a first set X_1 = {x^(i) ∈ X | s^(i) = s_1} and a second set X_2 = {x^(i) ∈ X | s^(i) = s_2}, where s_1 and s_2 are the two different values of the sensitive attribute, s^(i) is the sensitive-attribute value of data point x^(i), X is the set of m data points of the machine learning model f_θ, and i = 1, 2, …, m.
- 6. The training method of claim 1, wherein the k pairs of data points are extracted from the first set X_1 and the second set X_2 in step (b) as follows: the third set X'_1 and the fourth set X'_2 are initialized as empty sets; for X_1, a point x'_2^(i) is found in X_2 such that the data similarity metric d_x(x'_1^(i), x'_2^(i)) between x'_1^(i) and x'_2^(i) is minimal, and part of the k pairs of data points are thereby selected and taken out; for X_2, a point x'_1^(i) is found in X_1 such that the data similarity metric d_x(x'_2^(i), x'_1^(i)) between x'_2^(i) and x'_1^(i) is minimal, and the remaining pairs of data points are thereby selected.
- 7. The training method of claim 6, wherein the data similarity metric d_x(·,·) is the Euclidean distance.
- 8. The training method of claim 1, wherein the model gradient parameters are updated using an Adam optimizer.
- 9. A training system for a process fairness machine learning model based on feature attribution interpretation, comprising: a sensitive-attribute dividing module, which divides the data point set of the machine learning model into a first set X_1 and a second set X_2 with different sensitive attributes, according to the sensitive attribute; a data point pair extraction module, which extracts k pairs of data points from the first set X_1 and the second set X_2, wherein the similarity measure between the two data points in each pair is minimal, one data point in each pair belongs to the first set X_1 and the other belongs to the second set X_2, the k selected data points originally belonging to the first set X_1 form a third set X'_1, the k selected data points originally belonging to the second set X_2 form a fourth set X'_2, and the pairs of data points are expressed as (x'_1^(i), x'_2^(i)), where x'_1^(i) ∈ X'_1, x'_2^(i) ∈ X'_2, and i = 1, 2, …, k; a feature-attribution-interpretation-based training module, which, based on the cross-entropy loss L_CE of the machine learning model f_θ, applies a regularization penalty with hyperparameter α, taking the process fairness loss L_GPF as the penalty term, so as to obtain the total objective loss L, and trains the machine learning model f_θ iteratively until the total objective loss L converges; and an output module, which outputs the trained machine learning model f_θ.
- 10. The training system according to claim 9, wherein the total objective loss L is denoted L = L_CE + α × L_GPF, and the process fairness loss L_GPF is calculated as follows: L_GPF = (1/k) Σ_{i=1}^{k} l_q(G(f_θ, x'_1^(i)), G(f_θ, x'_2^(i))), where G(·) is a local feature attribution interpretation function and l_q denotes a distance.
- 11. The training system of claim 10, wherein l_q is the l_1 distance, and the process fairness loss L_GPF is calculated as follows: L_GPF = (1/k) Σ_{i=1}^{k} ||G(f_θ, x'_1^(i)) − G(f_θ, x'_2^(i))||_1.
- 12. The training system according to claim 10, wherein the hyperparameter α = 0.5.
- 13. The training system of claim 9, wherein, in the sensitive-attribute dividing module, a first set X_1 = {x^(i) ∈ X | s^(i) = s_1} and a second set X_2 = {x^(i) ∈ X | s^(i) = s_2}, where s_1 and s_2 are the two different values of the sensitive attribute, s^(i) is the sensitive-attribute value of data point x^(i), X is the set of m data points of the machine learning model f_θ, and i = 1, 2, …, m.
- 14. The training system of claim 9, wherein the data point pair extraction module extracts the k pairs of data points from the first set X_1 and the second set X_2 as follows: the third set X'_1 and the fourth set X'_2 are initialized as empty sets; for X_1, a point x'_2^(i) is found in X_2 such that the data similarity metric d_x(x'_1^(i), x'_2^(i)) between x'_1^(i) and x'_2^(i) is minimal, and part of the k pairs of data points are thereby selected and taken out; for X_2, a point x'_1^(i) is found in X_1 such that the data similarity metric d_x(x'_2^(i), x'_1^(i)) between x'_2^(i) and x'_1^(i) is minimal, and the remaining pairs of data points are thereby selected.
- 15. The training system of claim 14, wherein the data similarity metric d_x(·,·) is the Euclidean distance.
- 16. The training system of claim 9, wherein the model gradient parameters are updated using an Adam optimizer.
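The pair extraction of step (b) (claims 6 and 14) can be sketched as a greedy cross-group nearest-neighbour matching under the Euclidean metric d_x of claims 7 and 15. The helper name `extract_pairs` and the greedy no-reuse strategy are illustrative assumptions, not the literal claimed procedure:

```python
import numpy as np

# Illustrative sketch of step (b): greedily take the k cross-group pairs
# with the smallest Euclidean distance d_x, each point used at most once.
# `extract_pairs` is a hypothetical helper name, not from the patent.

def extract_pairs(X1, X2, k):
    # pairwise Euclidean distances between every x in X1 and every x in X2
    D = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)
    pairs = []
    for _ in range(k):
        i, j = np.unravel_index(np.argmin(D), D.shape)
        pairs.append((X1[i], X2[j]))   # one pair (x'_1^(i), x'_2^(i))
        D[i, :] = np.inf               # remove both points from
        D[:, j] = np.inf               # further consideration
    return pairs

X1 = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 9.0]])   # sensitive value s_1
X2 = np.array([[0.0, 1.0], [5.0, 4.0]])               # sensitive value s_2
pairs = extract_pairs(X1, X2, k=2)
```

Matching the most similar individuals across the two groups is what lets the later penalty compare attributions between near-identical data points that differ mainly in the sensitive attribute.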
Description
Training method and system of process fairness machine learning model based on feature attribution interpretation

Technical Field

The present disclosure relates generally to the training of machine learning models. In particular, it relates to the training of a process fairness machine learning model based on feature attribution interpretation.

Background

With the widespread use of artificial intelligence, fairness has become a major concern. Ensuring the fairness of artificial intelligence decisions is critical to avoid exacerbating existing social inequality. In the context of decision making, fairness generally refers to "not generating any bias or preference toward an individual or group due to its inherent or acquired characteristics". Studying fairness in artificial intelligence and machine learning is therefore imperative to facilitate the responsible and inclusive deployment of artificial intelligence techniques. Fairness has two dimensions: result fairness (also called allocation fairness), which focuses on decision outcomes, and process fairness (also called procedural fairness), which studies the decision processes that produce those outcomes. Process fairness of a machine learning model means that the internal decision logic/process of the model exhibits no bias or preference toward an individual or group on account of its inherent or acquired features. While research on fairness in artificial intelligence has focused mainly on result fairness, researchers have increasingly recognized the importance of process fairness. Although process fairness is considered critical to achieving overall fairness, and a more reliable criterion than result fairness, existing techniques and methods tend to focus on improving the result fairness of machine learning models, and there is still a lack of methods for improving the process fairness of machine learning models during training.
In the prior art on process fairness, some studies train models based on the inherent fairness of features, measuring process fairness by "relying on humans' moral judgment or intuition about the fairness of using input features in a decision context"; that is, process fairness is defined and measured by human evaluation, which is easily subject to human subjectivity, is strongly subjective and limited, and is difficult to scale. In such approaches, the fairness of each input feature is judged manually, and feature selection is then used to remove the features deemed unfair by manual evaluation, so as to improve the process fairness of the machine learning model. These methods assess the process fairness of a machine learning model based solely on the input features, without considering whether the decision process or logic behind the model's predictions is fair.

Disclosure of Invention

Aspects and advantages of the disclosure will be set forth in part in the description which follows, or may be obvious from the description, or may be learned by practice of the technology.
The application provides a training method of a process fairness machine learning model based on feature attribution interpretation, comprising the following steps: (a) dividing the data point set of the machine learning model into a first set X_1 and a second set X_2 with different sensitive attributes, according to the sensitive attribute; (b) extracting k pairs of data points from the first set X_1 and the second set X_2, wherein the similarity measure between the two data points in each pair is minimal, one data point in each pair belongs to the first set X_1 and the other belongs to the second set X_2, the k selected data points originally belonging to the first set X_1 form a third set X'_1, the k selected data points originally belonging to the second set X_2 form a fourth set X'_2, and the pairs of data points are expressed as (x'_1^(i), x'_2^(i)), where x'_1^(i) ∈ X'_1, x'_2^(i) ∈ X'_2, and i = 1, 2, …, k; (c) based on the cross-entropy loss L_CE of the machine learning model f_θ, applying a regularization penalty with hyperparameter α, taking the process fairness loss L_GPF as the penalty term, so as to obtain the total objective loss L, and training the machine learning model f_θ iteratively until the total objective loss L converges; and (d) outputting the trained machine learning model f_θ. In some embodiments, the total objective loss L is denoted L = L_CE + α × L_GPF, and the process fairness loss L_GPF is calculated as follows: L_GPF = (1/k) Σ_{i=1}^{k} l_q(G(f_θ, x'_1^(i)), G(f_θ, x'_2^(i))), where G(·) is a local feature attribution interpretation function and l_q denotes a distance. In some embodiments, l_q is the l_1 distance, and the process fairness loss L_GPF is calculated as L_GPF = (1/k) Σ_{i=1}^{k} ||G(f_θ, x'_1^(i)) − G(f_θ, x'_2^(i))||_1. In some embodiments, the hyperparameter α = 0.5. In some embodiments, step (a) is represented as follows: a first set X_1 = {x^(i) ∈ X | s^(i) = s_1}
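The full training loop of steps (c) and (d) can be sketched end-to-end for a toy logistic model f_θ(x) = σ(w·x + b), using the input gradient as the local attribution function G and a plain finite-difference gradient descent in place of the Adam optimizer mentioned in claims 8 and 16. All names, data, and hyperparameters below are illustrative assumptions, not the patent's embodiment:

```python
import numpy as np

# Illustrative sketch of steps (c)-(d): minimise L = L_CE + alpha * L_GPF
# for a logistic model, where the attribution G is the input gradient.
# Finite-difference gradient descent stands in for the Adam optimizer.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def total_loss(w, b, X, y, P1, P2, alpha=0.5):
    p = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)
    l_ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # input-gradient attribution of sigmoid(w.x + b) is p*(1-p)*w
    s1 = sigmoid(P1 @ w + b)
    s2 = sigmoid(P2 @ w + b)
    g1 = (s1 * (1 - s1))[:, None] * w     # G(f_theta, x'_1^(i))
    g2 = (s2 * (1 - s2))[:, None] * w     # G(f_theta, x'_2^(i))
    l_gpf = np.mean(np.sum(np.abs(g1 - g2), axis=1))
    return l_ce + alpha * l_gpf

def train(X, y, P1, P2, steps=200, lr=0.5, eps=1e-5):
    rng = np.random.default_rng(0)
    theta = rng.normal(size=X.shape[1] + 1) * 0.01   # [w..., b]
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):      # central finite differences,
            t1, t2 = theta.copy(), theta.copy()      # for clarity only
            t1[i] += eps
            t2[i] -= eps
            grad[i] = (total_loss(t1[:-1], t1[-1], X, y, P1, P2)
                       - total_loss(t2[:-1], t2[-1], X, y, P1, P2)) / (2 * eps)
        theta -= lr * grad               # iterate until L converges
    return theta[:-1], theta[-1]
```

A usage example: with training data X, y and matched cross-group pairs P1, P2 (the third and fourth sets), `train(X, y, P1, P2)` returns the weights and bias of the trained model, i.e. the output of step (d).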