Learning fair rule lists with FairCORELS
Using Python to learn fair rule lists with FairCORELS
FairCORELS is an open-source library for learning fair rule lists. In this short tutorial, we will learn how to use it to train rule lists under group fairness constraints.
Prerequisites
Group fairness
Several notions of fairness have been proposed in the machine learning literature (Narayanan 2018; Chouldechova and Roth 2018). This tutorial focuses on statistical notions of fairness (Verma and Rubin 2018), also known as group fairness definitions. Simply put, statistical notions of fairness study how the performance of a machine learning model, as evaluated by a statistical measure \(m\) (e.g., the false positive or false negative rate), differs depending on the group membership of individuals within a dataset. Group memberships are often determined by the values of particular attributes (e.g., gender, age, ethnicity), hereafter referred to as sensitive attributes. In this context, the word unfairness usually characterizes the gap between the value of \(m\) for a majority group (e.g., historically advantaged individuals) and its value for a minority group (e.g., historically disadvantaged individuals). Table 1 summarizes some of the most frequently used statistical notions of fairness.
name | measure | identifier | method |
---|---|---|---|
statistical parity (SP) | probability of receiving a positive outcome | 1 | statistical_parity() |
predictive parity (PP) | positive predictive value | 2 | predictive_parity() |
predictive equality (PE) | false positive rate | 3 | predictive_equality() |
equal opportunity (EOpp) | false negative rate | 4 | equal_opportunity() |
equalized odds (EOdds) | false negative rate and false positive rate | 5 | equalized_odds() |
conditional use accuracy equality (CUAE) | positive predictive value and negative predictive value | 6 | conditional_use_accuracy_equality() |
Rule lists
A rule list (Rivest 1987; Angelino et al. 2018) \(r = (d_p, \delta_p, q_0, K)\) of length \(K \geq 0\) is a \((K+1)\)-tuple consisting of \(K\) distinct association rules \(p_k \to q_k\), in which \(p_k \in d_p\) is the antecedent of the association rule and \(q_k \in \delta_p\) its corresponding consequent, followed by a default prediction \(q_0\). The rule list below predicts whether a person is likely to earn at least \(\$50,000\) per year.

if [capitalGain:>=5119] then [high]
else if [maritalStatus:single] then [low]
else if [capitalLoss:1882-1978] then [high]
else if [workclass:private && education:bachelors] then [high]
else if [education:masters_doctorate] then [high]
else [low]
To make a prediction with a rule list, the rules are evaluated sequentially; the first rule whose antecedent is satisfied determines the prediction. If no rule applies, the default prediction \(q_0\) is returned.
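To illustrate, here is a minimal sketch of this procedure in Python; the representation of rules as (antecedent, prediction) pairs is hypothetical and chosen purely for illustration.

# hypothetical encoding of the rule list above: an ordered list of
# (antecedent, prediction) pairs plus a default prediction
rules = [
    (lambda x: x["capitalGain"] >= 5119, "high"),
    (lambda x: x["maritalStatus"] == "single", "low"),
    (lambda x: 1882 <= x["capitalLoss"] <= 1978, "high"),
]
default_prediction = "low"

def predict(x):
    # apply the rules sequentially: the first rule that matches decides
    for antecedent, prediction in rules:
        if antecedent(x):
            return prediction
    # no rule applies: fall back to the default prediction
    return default_prediction

print(predict({"capitalGain": 6000, "maritalStatus": "married", "capitalLoss": 0}))  # -> high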
CORELS
CORELS (Angelino et al. 2018) is a supervised machine learning algorithm that takes as input a feature matrix \(X\) and its associated label vector \(y\), both assumed to be binary, and finds the rule list \(r\) that minimizes the regularized empirical risk \(\mathsf{misc}(r, X, y) + \lambda K\), where \(\mathsf{misc}(r, X, y)\) is the misclassification error of the rule list and \(\lambda \geq 0\) is a regularization parameter used to penalize longer rule lists. CORELS represents the search space of rule lists as a trie and prunes it with an efficient branch-and-bound algorithm.
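To make the objective concrete, here is a minimal sketch of its computation (the helper name is ours; labels and predictions are assumed to be binary NumPy arrays):

import numpy as np

def regularized_risk(y_true, y_pred, K, lam=1e-3):
    # misclassification error of the rule list on the data
    misc = np.mean(y_true != y_pred)
    # the penalty lambda * K discourages longer rule lists
    return misc + lam * K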
FairCORELS
FairCORELS (Aïvodji et al. 2019) is a multi-objective formulation of CORELS that aims to learn fair rule lists. More precisely, FairCORELS solves the following optimization problem:
\[\begin{aligned} \underset{r \in \mathcal{R}}{\operatorname{argmin}} \quad & \mathsf{misc}(r, X, y) + \lambda K \\ \text{s.t.} \quad & \mathsf{unf}(r, X, y, A) \leq \epsilon, \end{aligned}\]
where the objective function is the regularized empirical risk of CORELS and the constraint requires that the unfairness \(\mathsf{unf}(r, X, y, A)\) of the solution, computed with respect to a particular sensitive attribute \(A\), be at most \(\epsilon\).
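As an example, for statistical parity (identifier 1 in Table 1) the unfairness is the gap between the positive prediction rates of the two groups. Here is a minimal sketch, assuming binary NumPy arrays and boolean group-membership masks (all names are ours):

import numpy as np

def statistical_parity_unfairness(y_pred, maj_mask, min_mask):
    # probability of receiving a positive outcome in each group
    p_maj = np.mean(y_pred[maj_mask])
    p_min = np.mean(y_pred[min_mask])
    # unfairness is the absolute gap between the two rates
    return np.abs(p_maj - p_min)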
Learning a fair rule list
In this section, we will see how to use FairCORELS to learn a fair rule list.
Installation
FairCORELS can be installed easily using pip:

pip install faircorels

Read the README for more details on the installation process.
Dataset and Preprocessing
For this tutorial, we will use the Adult Income dataset. It contains demographic information about \(48,842\) individuals from the \(1994\) U.S. census. The associated classification task consists of predicting whether a particular individual earns more than \(\$50,000\) per year. Table 2 shows the first six records of the dataset.
age | workclass | fnlwgt | education | educational_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | income |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
25 | Private | 226802 | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States | <=50K |
38 | Private | 89814 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
28 | Local-gov | 336951 | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
44 | Private | 160323 | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States | >50K |
34 | Private | 198693 | 10th | 6 | Never-married | Other-service | Not-in-family | White | Male | 0 | 0 | 30 | United-States | <=50K |
63 | Self-emp-not-inc | 104626 | Prof-school | 15 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 3103 | 0 | 32 | United-States | >50K |
Like CORELS, FairCORELS needs its input data to be categorical. The dataset will therefore be transformed as follows (a rough sketch of the pipeline is given after the list):
First, all the numerical attributes are converted into categorical ones by using the Minimum Description Length Principle (Fayyad and Irani 1993).
Afterwards, one-hot encoding is applied to the resulting dataset.
Finally, the set of rules (composed of single-clause, negative single-clause, and two-clause rules) is constructed by applying FPGrowth (Han, Pei, and Yin 2000) to the one-hot encoded dataset.
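The sketch below only illustrates the three steps with substitute tools: pd.cut stands in for the MDLP discretization, pandas handles the one-hot encoding, and mlxtend's fpgrowth mines the candidate antecedents; the file path and column choices are hypothetical.

import pandas as pd
from mlxtend.frequent_patterns import fpgrowth

# hypothetical path to the raw Adult Income data
df = pd.read_csv("adult.csv")

# 1. discretize a numerical attribute (pd.cut with fixed bins is a simple
#    stand-in here; the tutorial relies on MDLP-based discretization)
df["age"] = pd.cut(df["age"], bins=[0, 25, 45, 65, 100])

# 2. one-hot encode a few attributes of the resulting categorical dataset
cols = ["age", "workclass", "education", "marital_status"]
onehot = pd.get_dummies(df[cols]).astype(bool)

# 3. mine candidate antecedents of at most two clauses with FP-Growth
itemsets = fpgrowth(onehot, min_support=0.1, use_colnames=True, max_len=2)
print(itemsets.head())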
Table 3 gives an overview of the preprocessed dataset.
workclass.fedGov | workclass.otherGov | workclass.private | workclass.selfEmployed | workclass.unEmployed | education.associates | education.bachelors | education.dropout | education.hs_grad | education.masters_doctorate | education.prof_school | marital_status.married | marital_status.single | occupation.blueCollar | occupation.other | occupation.professional | occupation.sales | occupation.whiteCollar | gender.Female | gender.Male | age..20.5 | age.20.5.23.5 | age.23.5.24.5 | age.24.5.27.5 | age.27.5.29.5 | age.29.5.35.5 | age.35.5.41.5 | age.41.5.54.5 | age.54.5.61.5 | age..61.5 | capital_gain..57.0 | capital_gain.57.0.3048.0 | capital_gain.3048.0.3120.0 | capital_gain.3120.0.4243.5 | capital_gain.4243.5.4401.0 | capital_gain.4401.0.4668.5 | capital_gain.4668.5.4826.0 | capital_gain.4826.0.4932.5 | capital_gain.4932.5.4973.5 | capital_gain.4973.5.5119.0 | capital_gain.5119.0.5316.5 | capital_gain.5316.5.5505.5 | capital_gain.5505.5.6618.5 | capital_gain.6618.5.7055.5 | capital_gain..7055.5 | capital_loss..1551.5 | capital_loss.1551.5.1568.5 | capital_loss.1568.5.1820.5 | capital_loss.1820.5.1859.0 | capital_loss.1859.0.1881.5 | capital_loss.1881.5.1894.5 | capital_loss.1894.5.1927.5 | capital_loss.1927.5.1975.5 | capital_loss.1975.5.1978.5 | capital_loss.1978.5.2168.5 | capital_loss.2168.5.2176.5 | capital_loss.2176.5.2218.5 | capital_loss.2218.5.2252.0 | capital_loss.2252.0.2384.5 | capital_loss.2384.5.2450.5 | capital_loss.2450.5.2469.5 | capital_loss.2469.5.3089.5 | capital_loss..3089.5 | hours_per_week..34.5 | hours_per_week.34.5.39.5 | hours_per_week.39.5.41.5 | hours_per_week.41.5.49.5 | hours_per_week.49.5.61.5 | hours_per_week..61.5 | income |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Training
To train the constrained rule list, we will need the following libraries:
pandas: to load the dataset.
sklearn: to split the dataset into train and test sets.
faircorels: to train the rule list with a fairness constraint and to evaluate its unfairness and accuracy.
First, let’s load the libraries!
import pandas as pd
from faircorels import FairCorelsClassifier, ConfusionMatrix, Metric
from sklearn.model_selection import train_test_split
Now, we will perform the following tasks:
- Load the preprocessed dataset.
- Separate the feature vector from the label vector.
- Remove the features that characterize the majority and the minority groups. The sensitive attribute is only used to enforce the fairness constraint at training time; it is not used at inference time, in order to avoid disparate treatment.
- Split the dataset into train and test sets.
# loading the preprocessed dataset
df = pd.read_csv("../../../datasets/adult_income/binary.csv")

# getting the label vector
y = df["income"]
y = y.to_numpy()

# getting the majority and minority groups
maj_group = df["gender:Male"]
min_group = df["gender:Female"]

# getting the rest of the features
X = df.drop(["income", "gender:Male", "gender:Female"], axis=1)
features = X.columns.tolist()
X = X.to_numpy()

# split the dataset into train/test sets
X_train, X_test, y_train, y_test, \
maj_group_train, maj_group_test, \
min_group_train, min_group_test = train_test_split(X, y,
                                                   maj_group,
                                                   min_group,
                                                   test_size=0.33,
                                                   random_state=42)
Finally, we can train the rule list.
# create the model
clf = FairCorelsClassifier(c = 1e-3,
                           mode = 3,
                           fairness = 4,
                           epsilon = 0.95,
                           max_card = 1,
                           maj_vect = maj_group_train,
                           min_vect = min_group_train,
                           n_iter = 300000,
                           policy = "bfs",
                           bfs_mode = 2,
                           verbosity = [])

# train the model
clf.fit(X_train, y_train, features=features, prediction_name="[income:>50K]")
The rule list model is instantiated using the FairCorelsClassifier class with the following parameters:
c: the weight of the rule list's length regularizer, i.e., the \(\lambda\) of the objective above.
mode: determines the optimization method. A value of 3 corresponds to the epsilon-constraint method: it fixes an upper bound on the unfairness and then maximizes the accuracy.
fairness: identifier of the fairness metric used. See Table 1 for the identifier of each fairness metric and its corresponding method. For this example, we use the equal opportunity metric (identifier 4).
epsilon: the value of the fairness constraint. A value of 0.95 means that we allow an unfairness gap of at most 0.05.
max_card: maximum cardinality allowed when mining rules. In this example, we use max_card = 1 since we have already performed the mining with FPGrowth.
maj_vect and min_vect: the membership vectors of the majority and minority groups.
n_iter: the maximum number of nodes (rule lists) to explore before exiting. Although branch-and-bound is an exact method, we stop the search after a fixed number of iterations to avoid a high computational overhead. Whenever the algorithm stops, it is guaranteed to return the best rule list found within this computational budget.
policy and bfs_mode: specify the exploration strategy, that is, how nodes are ordered within the priority queue of the branch-and-bound framework. Here, we use breadth-first search, with nodes selected according to the value of the objective function of the rule lists they represent.
The rule list is trained with the fit method.
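Once trained, the classifier behaves like a scikit-learn style estimator; for instance, test-set predictions (also used in the evaluation below) are obtained with the predict method:

# predict income labels for the held-out test set
y_pred = clf.predict(X_test)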
Evaluation of the rule list
To obtain the trained model’s description, we can use the rl_ attribute of the rule list object.
print(clf.rl_)
We can compute the accuracy of the model by using the score method.
accuracy_train = clf.score(X_train, y_train)
accuracy_test = clf.score(X_test, y_test)
print("Accuracy train: {}".format(accuracy_train))
print("Accuracy test: {}".format(accuracy_test))
To compute the unfairness, we will use the ConfusionMatrix and Metric classes.
def compute_unfairness(min_group, maj_group, X, y):
    cm = ConfusionMatrix(min_group, maj_group, clf.predict(X), y)
    cm_minority, cm_majority = cm.get_matrix()
    fairness_metric = Metric(cm_minority, cm_majority)
    return fairness_metric

fairness_metric_train = compute_unfairness(min_group_train, maj_group_train,
                                           X_train, y_train)
fairness_metric_test = compute_unfairness(min_group_test, maj_group_test,
                                          X_test, y_test)
print("Unfairness train: {}".format(fairness_metric_train.equal_opportunity()))
print("Unfairness test: {}".format(fairness_metric_test.equal_opportunity()))
To evaluate the unfairness, we first create a confusion matrix for both the minority and majority groups using the ConfusionMatrix class. Then, we create a metric object using Metric. Finally, we get the value of the unfairness using the corresponding method equal_opportunity().
In this particular example, we can see that the learned rule list indeed has an unfairness lower than 0.05 and an accuracy of 0.819 on the training set. It also generalizes well: its unfairness is close to 0.00 and its accuracy is 0.816 on the test set.
Conclusion
In this short tutorial, we learned how to train rule lists under group fairness constraints using FairCORELS. More details about FairCORELS can be found in our paper Learning Fair Rule Lists (Aïvodji et al. 2019).