Learning fair rule lists with FairCORELS

Python
Interpretability
Fairness

Using Python to learn fair rule lists with FairCORELS

Published: March 29, 2021

FairCORELS is an open-source library for learning fair rule lists. In this short tutorial, we will learn how to use it to train rule lists under group fairness constraints.

Prerequisites

Group fairness

Several notions of fairness have been proposed in the machine learning literature (Narayanan 2018; Chouldechova and Roth 2018). This tutorial focuses on statistical notions of fairness (Verma and Rubin 2018), also known as group fairness definitions. Simply put, statistical notions of fairness study how the performance of a machine learning model, as evaluated by a statistical measure \(m\) (e.g., the false positive or false negative rate), differs depending on the group membership of individuals within a dataset. Group memberships are often determined by the values of particular attributes (e.g., gender, age, ethnicity), hereafter referred to as sensitive attributes. In this context, the word unfairness usually characterizes the gap between the value of \(m\) for a majority group (e.g., historically advantaged individuals) and its value for a minority group (e.g., historically disadvantaged individuals). Table 1 summarizes some of the most frequently used statistical notions of fairness.

| name | measure | identifier | method |
|------|---------|------------|--------|
| statistical parity (SP) | probability of receiving a positive outcome | 1 | statistical_parity() |
| predictive parity (PP) | positive predictive value | 2 | predictive_parity() |
| predictive equality (PE) | false positive rate | 3 | predictive_equality() |
| equal opportunity (EOpp) | false negative rate | 4 | equal_opportunity() |
| equalized odds (EOdds) | false negative rate and false positive rate | 5 | equalized_odds() |
| conditional use accuracy equality (CUAE) | positive predictive value and negative predictive value | 6 | conditional_use_accuracy_equality() |

Table 1: Summary of statistical notions of fairness with the related statistical measure involved. The identifier and method columns give the id and the method corresponding to each fairness metric within the FairCORELS library.
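
To make these definitions concrete, the snippet below computes the statistical parity gap (metric 1 in Table 1) by hand, on hypothetical prediction and group-membership vectors; the other metrics follow the same pattern with a different per-group measure (e.g., the false negative rate for equal opportunity).

import numpy as np

# hypothetical binary predictions and group memberships (1 = member)
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
maj_group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
min_group = 1 - maj_group

# probability of receiving a positive outcome in each group
p_maj = y_pred[maj_group == 1].mean()  # 0.75
p_min = y_pred[min_group == 1].mean()  # 0.25

# statistical parity unfairness: the gap between the two rates
print(abs(p_maj - p_min))  # 0.5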

Rule lists

A rule list (Rivest 1987; Angelino et al. 2018) \(r = (d_p, \delta_p, q_0, K)\) of length \(K \geq 0\) is a \((K+1)\)-tuple consisting of \(K\) distinct association rules \(p_k \to q_k\), in which \(p_k \in d_p\) is the antecedent of the \(k\)-th rule and \(q_k \in \delta_p\) its corresponding consequent, followed by a default prediction \(q_0\). The rule list below predicts whether a person is likely to earn at least \$50k per year.

 if [capitalGain:>=5119] then [high]
else if [maritalStatus:single] then [low]
else if [capitalLoss:1882-1978] then [high]
else if [workclass:private && education:bachelors] then [high]
else if [education:masters_doctorate] then [high]
else [low]

To make a prediction with a rule list, the rules are applied sequentially until one of them applies, in which case the associated outcome is reported. If no rule applies, the default prediction is reported.
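
As an illustration, here is a minimal Python sketch of this sequential evaluation (it is not part of the FairCORELS API; rules are hypothetically encoded as predicate/outcome pairs):

def predict_rule_list(rules, default, x):
    # apply the rules in order; the first antecedent that
    # holds for x determines the prediction
    for antecedent, outcome in rules:
        if antecedent(x):
            return outcome
    # no rule applied: fall back to the default prediction
    return default

# hypothetical encoding of the first two rules of the list above
rules = [
    (lambda x: x["capitalGain"] >= 5119, "high"),
    (lambda x: x["maritalStatus"] == "single", "low"),
]
x = {"capitalGain": 0, "maritalStatus": "married"}
print(predict_rule_list(rules, "low", x))  # low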

CORELS

CORELS (Angelino et al. 2018) is a supervised machine learning algorithm which takes as input a feature matrix \(X\) and its associated label vector \(y\), both assumed to be binary, and finds the rule list \(r\) that minimizes the regularized empirical risk \(\mathsf{misc}(r, X, y) + \lambda K\), where \(\mathsf{misc}(r, X, y)\) is the misclassification error of the rule list and \(\lambda \geq 0\) is a regularization parameter used to penalize longer rule lists. CORELS represents the search space of rule lists as a trie and uses an efficient branch-and-bound algorithm to prune it.
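
As a quick illustration of this objective, the following sketch evaluates the regularized empirical risk of a hypothetical rule list of length \(K = 5\) from its predictions:

import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0])  # hypothetical rule list predictions
lam, K = 1e-3, 5  # regularization weight and rule list length

misc = np.mean(y_pred != y_true)  # misclassification error: 0.4
print(misc + lam * K)  # regularized empirical risk: 0.405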

FairCORELS

FairCORELS (Aïvodji et al. 2019) is a multi-objective formulation of CORELS that aims to learn fair rule lists. More precisely, FairCORELS solves the following optimization problem:

\[\begin{aligned} \underset{r \in \mathcal{R}}{\operatorname{argmin}} \quad & \mathsf{misc}(r, X, y) + \lambda K \\ \text{s.t.} \quad & \mathsf{unf}(r, X, y, A) \leq \epsilon \end{aligned}\]

where the objective function is the regularized empirical risk of CORELS and the constraint requires the unfairness \(\mathsf{unf}(r, X, y, A)\) of the solution, for a particular sensitive attribute \(A\), to be at most a threshold \(\epsilon\).

Learning a fair rule list

In this section, we will see how to use FairCORELS to learn a fair rule list.

Installation

FairCORELS can be installed easily using pip:

pip install faircorels

Read the README for more details on the installation process.

Dataset and Preprocessing

For this tutorial, we will use the Adult Income dataset, which contains demographic information about \(48,842\) individuals from the \(1994\) U.S. census. The associated classification task consists of predicting whether a particular individual earns more than \(\$50,000\) per year. Table 2 shows the first six records of the dataset.

Table 2: First six rows of the Adult Income dataset.

| age | workclass | fnlwgt | education | educational_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | income |
|-----|-----------|--------|-----------|-----------------|----------------|------------|--------------|------|--------|--------------|--------------|----------------|----------------|--------|
| 25 | Private | 226802 | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 38 | Private | 89814 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 28 | Local-gov | 336951 | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 44 | Private | 160323 | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States | >50K |
| 34 | Private | 198693 | 10th | 6 | Never-married | Other-service | Not-in-family | White | Male | 0 | 0 | 30 | United-States | <=50K |
| 63 | Self-emp-not-inc | 104626 | Prof-school | 15 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 3103 | 0 | 32 | United-States | >50K |

Similar to CORELS, FairCORELS needs its input data to be categorical. The dataset will be transformed as follows:

  • First, all the numerical attributes are converted into categorical ones using the Minimum Description Length Principle (Fayyad and Irani 1993).

  • Afterwards, one-hot encoding is applied to the resulting dataset.

  • Finally, the set of rules (composed of single-clause, negative single-clause, and two-clause rules) is constructed by applying FPGrowth (Han, Pei, and Yin 2000) to the one-hot encoded dataset; a sketch of this pipeline is given after the list.
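
For readers who want to reproduce a similar pipeline on their own data, here is a minimal sketch on toy data. It is not the exact preprocessing used for this tutorial: pd.cut is a simple stand-in for MDLP discretization, negated clauses are omitted, and FPGrowth comes from the mlxtend library.

import pandas as pd
from mlxtend.frequent_patterns import fpgrowth

# toy data; pd.cut is a simple stand-in for MDLP discretization
df = pd.DataFrame({
    "age": [25, 38, 28, 44, 34, 63],
    "workclass": ["private", "private", "gov", "private", "private", "selfEmployed"],
})
df["age"] = pd.cut(df["age"], bins=[0, 30, 50, 100], labels=["young", "middle", "senior"])

# one-hot encode the categorical attributes
onehot = pd.get_dummies(df).astype(bool)

# mine the rules (itemsets of at most two clauses) with FPGrowth
rules = fpgrowth(onehot, min_support=0.3, use_colnames=True, max_len=2)
print(rules)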

Table 3 gives an overview of the preprocessed dataset.

Table 3: First six rows of the binarized dataset.
workclass.fedGov workclass.otherGov workclass.private workclass.selfEmployed workclass.unEmployed education.associates education.bachelors education.dropout education.hs_grad education.masters_doctorate education.prof_school marital_status.married marital_status.single occupation.blueCollar occupation.other occupation.professional occupation.sales occupation.whiteCollar gender.Female gender.Male age..20.5 age.20.5.23.5 age.23.5.24.5 age.24.5.27.5 age.27.5.29.5 age.29.5.35.5 age.35.5.41.5 age.41.5.54.5 age.54.5.61.5 age..61.5 capital_gain..57.0 capital_gain.57.0.3048.0 capital_gain.3048.0.3120.0 capital_gain.3120.0.4243.5 capital_gain.4243.5.4401.0 capital_gain.4401.0.4668.5 capital_gain.4668.5.4826.0 capital_gain.4826.0.4932.5 capital_gain.4932.5.4973.5 capital_gain.4973.5.5119.0 capital_gain.5119.0.5316.5 capital_gain.5316.5.5505.5 capital_gain.5505.5.6618.5 capital_gain.6618.5.7055.5 capital_gain..7055.5 capital_loss..1551.5 capital_loss.1551.5.1568.5 capital_loss.1568.5.1820.5 capital_loss.1820.5.1859.0 capital_loss.1859.0.1881.5 capital_loss.1881.5.1894.5 capital_loss.1894.5.1927.5 capital_loss.1927.5.1975.5 capital_loss.1975.5.1978.5 capital_loss.1978.5.2168.5 capital_loss.2168.5.2176.5 capital_loss.2176.5.2218.5 capital_loss.2218.5.2252.0 capital_loss.2252.0.2384.5 capital_loss.2384.5.2450.5 capital_loss.2450.5.2469.5 capital_loss.2469.5.3089.5 capital_loss..3089.5 hours_per_week..34.5 hours_per_week.34.5.39.5 hours_per_week.39.5.41.5 hours_per_week.41.5.49.5 hours_per_week.49.5.61.5 hours_per_week..61.5 income
0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0

Training

To train the constrained rule list, we will need the following libraries:

  • pandas: to load the dataset.

  • sklearn: to split the dataset into train and test sets.

  • faircorels: to train the rule list under a fairness constraint and to evaluate the unfairness and the accuracy of the model.

First, let’s load the libraries!

import pandas as pd
from faircorels import FairCorelsClassifier, ConfusionMatrix, Metric
from sklearn.model_selection import train_test_split

Now, we will perform the following tasks:

  • Load the preprocessed dataset.
  • Separate the feature vector from the label vector.
  • Remove the columns that characterize the majority and the minority groups. The sensitive attribute is only used to enforce the fairness constraint at training time; it is not used at inference time, in order to avoid disparate treatment.
  • Split the dataset into train and test sets.

# loading the preprocessed dataset
df = pd.read_csv("../../../datasets/adult_income/binary.csv")

# getting the label vector
y = df["income"]
y = y.to_numpy()


# getting the majority and minority groups
maj_group = df["gender:Male"]
min_group = df["gender:Female"]

# getting the rest of the features
X = df.drop(["income", "gender:Male", "gender:Female"], axis=1)
features = X.columns.tolist()
X = X.to_numpy()

# split the dataset into train/test sets
X_train, X_test, y_train, y_test, \
maj_group_train, maj_group_test, \
min_group_train, min_group_test =  train_test_split(X, y, 
                                                    maj_group, 
                                                    min_group,
                                                    test_size=0.33, 
                                                    random_state=42)

Finally, we can train the rule list.

# create the model
clf = FairCorelsClassifier( 
                        c = 1e-3,
                        mode = 3,
                        fairness = 4,
                        epsilon = 0.95,
                        max_card = 1,
                        maj_vect = maj_group_train,
                        min_vect = min_group_train,
                        n_iter = 300000, 
                        policy = "bfs",
                        bfs_mode = 2,
                        verbosity = [])
# train the model
clf.fit(X_train, y_train, features=features, prediction_name="[income:>50K]")

The rule list model is instantiated using the FairCorelsClassifier class with the following parameters:

  • c: the weight of the rule list’s length regularizer (the \(\lambda\) of the objective function).

  • mode: determines the optimization method. A value of 3 corresponds to the epsilon-constraint method: it fixes an upper bound on the unfairness and then maximizes the accuracy.

  • fairness: identifier of the fairness metric used. See Table 1 for the identifier of each fairness metric and its corresponding method. For this example, we use the equal opportunity metric.

  • epsilon: the value of the fairness constraint. A value of 0.95 means that we want an unfairness gap of at most 0.05.

  • max_card: the maximum cardinality allowed when mining rules. In this example, we use max_card = 1 since the mining has already been performed with FPGrowth.

  • maj_vect and min_vect: used to specify the majority and minority groups.

  • n_iter: the maximum number of nodes (rule lists) to explore before exiting. Although branch-and-bound is an exact method, we need to stop the search after a certain number of iterations to avoid a high computational overhead. However, whenever the algorithm stops, it is guaranteed to return the best rule list found within this computational budget.

  • policy and bfs_mode: used to specify the exploration strategy, that is, how nodes are ordered within the priority queue of the branch-and-bound algorithm. Here, we use a breadth-first search in which nodes are selected according to the value of the objective function of the rule lists they represent.

The rule list is trained with the fit method.

Evaluation of the rule list

To obtain the trained model’s description, we can use the rl_ attribute of the classifier.

print(clf.rl_)

We can compute the accuracy of the model by using the score method.

accuracy_train = clf.score(X_train, y_train)
accuracy_test = clf.score(X_test, y_test)
print("Accuracy train: {}".format(accuracy_train))
print("Accuracy test: {}".format(accuracy_test))

To compute the unfairness, we will use the ConfusionMatrix and Metric classes.

def compute_unfairness(min_group, maj_group, X, y):
  # build a confusion matrix for each group from the model's predictions
  cm = ConfusionMatrix(min_group, maj_group, clf.predict(X), y)
  cm_minority, cm_majority = cm.get_matrix()
  # the Metric object exposes the per-group statistical measures
  fairness_metric = Metric(cm_minority, cm_majority)
  return fairness_metric

fairness_metric_train = compute_unfairness(min_group_train, maj_group_train, 
                                            X_train, y_train)
fairness_metric_test  = compute_unfairness(min_group_test, maj_group_test, 
                                            X_test, y_test)
print("Unfairness train: {}".format(fairness_metric_train.equal_opportunity()))
print("Unfairness test: {}".format(fairness_metric_test.equal_opportunity()))                                                         

To evaluate the unfairness, we first create a confusion matrix for each of the minority and majority groups using the ConfusionMatrix class. Then, we create a metric object using Metric. Finally, we obtain the value of the unfairness with the method corresponding to the chosen metric, here equal_opportunity().

In this particular example, we can see that the learned rule list indeed has an unfairness lower than 0.05 and an accuracy of 0.819 on the training set. It also generalizes well: its unfairness is close to 0.00 and its accuracy is 0.816 on the test set.

Conclusion

In this short tutorial, we learned how to train rule lists under group fairness constraints using FairCORELS. More details about FairCORELS can be found in our paper Learning Fair Rule Lists (Aïvodji et al. 2019).

References

Aïvodji, Ulrich, Julien Ferry, Sébastien Gambs, Marie-José Huguet, and Mohamed Siala. 2019. “Learning Fair Rule Lists.” arXiv Preprint arXiv:1909.03977.
Angelino, Elaine, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, and Cynthia Rudin. 2018. “Learning Certifiably Optimal Rule Lists for Categorical Data.” Journal of Machine Learning Research 18 (234): 1–78.
Chouldechova, Alexandra, and Aaron Roth. 2018. “The Frontiers of Fairness in Machine Learning.” arXiv Preprint arXiv:1810.08810.
Fayyad, Usama M., and Keki B. Irani. 1993. “Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning.” In IJCAI.
Han, Jiawei, Jian Pei, and Yiwen Yin. 2000. “Mining Frequent Patterns Without Candidate Generation.” ACM SIGMOD Record 29 (2): 1–12.
Narayanan, Arvind. 2018. “Translation Tutorial: 21 Fairness Definitions and Their Politics.” In Proc. Conf. Fairness Accountability Transp., New York, USA.
Rivest, Ronald L. 1987. “Learning Decision Lists.” Machine Learning 2 (3): 229–46.
Verma, Sahil, and Julia Rubin. 2018. “Fairness Definitions Explained.” In 2018 IEEE/ACM International Workshop on Software Fairness (FairWare), 1–7. IEEE.