Explaining black box ML model for building Knowledge Graph

Tirth Patel
10 min read · Dec 3, 2021

Machine learning has great potential for improving products, processes, and research. But computers usually do not explain their predictions, which is a barrier to the adoption of machine learning. This article is about making machine learning models and their decisions interpretable.

Introduction

Interpreting machine learning models is no longer a luxury but a necessity, given the rapid adoption of AI in the industry. This article is a continuation of my series of articles on ‘Explainable Artificial Intelligence (XAI).’ The idea here is to cut through the hype and provide the tools and techniques needed to start interpreting any black-box machine learning model. As AI-powered technologies proliferate in the enterprise, the term “explainable AI” (XAI) has entered mainstream vernacular. XAI is a set of tools, techniques, and frameworks intended to help users and designers of AI systems understand their predictions, including how and why the systems arrived at them. A June 2020 IDC report found that business decision-makers believe explainability is a “critical requirement” in AI. To this end, explainability has been referenced as a guiding principle for AI development at DARPA, by the European Commission’s High-Level Expert Group on AI, and by the National Institute of Standards and Technology. Startups such as Truera are emerging to deliver “explainability as a service,” and tech giants such as IBM, Google, and Microsoft have open-sourced both XAI toolkits and methods.

What is explainable AI (XAI)?

Generally speaking, there are three types of explanations in XAI: Global, local, and social influence.

  • Global explanations shed light on what a system is doing as a whole instead of the processes that lead to a prediction or decision. They often include summaries of how a system uses a feature to make a prediction and “meta information,” like the type of data used to train the system.
  • Local explanations provide a detailed description of how the model came up with a specific prediction. These might include information about how a model uses features to generate an output or how flaws in input data will influence the result.
  • Social influence explanations relate to how “socially relevant” others — i.e., users — behave in response to a system’s predictions. A system using this sort of explanation may show a report on model adoption statistics or the system’s ranking by users with similar characteristics (e.g., people above a certain age).

What is the significance of interpretability in machine learning?

In traditional statistics, we construct and verify hypotheses by investigating the data at large. We build models to derive rules that we can incorporate into our mental models of the process. For example, a marketing firm can build a model that correlates marketing campaign data with financial data to determine what makes a campaign effective. This is a top-down approach to data science, and interpretability is vital because it is the cornerstone of the rules and processes being defined. As correlation often does not equal causality, a solid understanding of the model is needed to make decisions and explain them.

We delegate aspects of the business process to machine learning models in a bottom-up approach to data science. Furthermore, machine learning enables whole new business concepts. Bottom-up data science is usually associated with the automation of manual and time-consuming operations. For example, a manufacturing company could install sensors on its machinery and undertake predictive maintenance: maintenance engineers can work more efficiently and no longer have to undertake costly periodic checkups. Model interpretability is required to ensure that the model performs as expected, to build confidence with users, and to facilitate the shift from manual to automated procedures.

As a data scientist, fine-tuning models to achieve optimal performance is a common concern. ‘Given data X and labels y, find the model with the least amount of error’ is a common way of framing data science. While being able to train performant models is vital for a data scientist, it is also essential to see the larger picture. The interpretability of data and machine learning models is one of the most critical parts of a data science pipeline’s actual ‘usefulness’, since it guarantees that the model is aligned with the problem you are trying to address. Although it is easy to get caught up in experimenting with cutting-edge techniques when developing models, properly evaluating your results is an essential component of the data science process.

Why is it critical to do a thorough examination of your models?

As a data scientist, there are various reasons to concentrate on model interpretability. Although there is some overlap, the following capture the main motivations:

  • Identifying and mitigating bias.
  • Accounting for the context of the problem.
  • Improving generalisation and performance.
  • Ethical and legal reasons.

In this article, I will give you a hands-on guide that showcases various ways to explain black-box machine learning models in a model-agnostic way. We will be working on a relation classifier model for financial documents (Microsoft’s earnings). We will implement Microsoft’s open-source Interpret-Text library and further use LIME to explain the SVM (RBF) model we created in part 1 of this series.

Microsoft’s Interpret-Text (Opensource library)

Interpret-Text combines community-developed interpretability techniques for NLP models with a dashboard for visualizing the findings. Users can run experiments with various state-of-the-art explainers and readily compare the results, and can explain their machine learning models either globally on each label or locally for each document. We will be working with the Classical Text Explainer to build the relation classifier model.

Preprocessing and the Pipeline: The ClassicalTextExplainer serves as a high-level wrapper for the entire NLP pipeline by natively handling text preprocessing, encoding, training, and hyperparameter optimization. This allows the user to supply the dataset in text form without any external processing, with the entire text pipeline being handled by the explainer under the hood. In its default configuration, the preprocessing pipeline uses a 1-gram bag-of-words encoder implemented by scikit-learn’s CountVectorizer. The utilities file contains the finer details of the preprocessing steps in the default pipeline. Below is the code implemented for building the relation classifier.

import pandas as pd
import re
import itertools
import nltk
data = pd.read_csv("dataset-copy.csv")
nltk.download('punkt')
# sklearn
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from interpret_text.experimental.classical import ClassicalTextExplainer
# for testing
from scrapbook.api import glue
data['Entity'] = data['Head'] + " " + data['Tail']
# Lowercase text
data['Entity'] = data['Entity'].apply(lambda x: x.lower())
# Tokenize words
data['Entity'] = data['Entity'].apply(lambda x: ' '.join(nltk.word_tokenize(x)))
# Lowercase text
data['Relation'] = data['Relation'].apply(lambda x: x.lower())
# Tokenize words
data['Relation'] = data['Relation'].apply(lambda x: ' '.join(nltk.word_tokenize(x)))
X_str = data['Entity'] # the document we want to analyze
ylabels = data['Relation'] # the labels
# Create explainer object that contains default glassbox classifier and explanation methods
explainer = ClassicalTextExplainer()
label_encoder = LabelEncoder()
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X_str, ylabels, train_size=0.8, test_size=0.2)
y_train = label_encoder.fit_transform(y_train)
y_test = label_encoder.transform(y_test)
# Train the model
classifier, best_params = explainer.fit(X_train, y_train)
# obtain best classifier and hyper params
print("best classifier: " + str(best_params))
# PERFORMANCE MATRIX
mean_accuracy = classifier.score(X_test, y_test, sample_weight=None)
print("accuracy = " + str(mean_accuracy * 100) + "%")
y_pred = classifier.predict(X_test)
[precision, recall, fscore, support] = precision_recall_fscore_support(y_test, y_pred, average='macro')
print("precision, recall, fscore = " + str((precision, recall, fscore)))
# Local explanation
# document1 is the single document (entity-pair string) we want to explain;
# here we simply take the first entry of the test set as an example
document1 = X_test.iloc[0]
y = classifier.predict(document1)
predicted_label = label_encoder.inverse_transform(y)
local_explanation = explainer.explain_local(document1, predicted_label)
from interpret_text.experimental.widget import ExplanationDashboard
ExplanationDashboard(local_explanation)
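With the classifier trained and tuned by the explainer, the same pipeline can be reused to predict relations for new head/tail entity pairs, which is what ultimately populates the knowledge graph. Below is a minimal sketch that reuses the classifier and label_encoder from the code above and mirrors its preprocessing (lower-casing plus NLTK tokenization); the example entity pair is hypothetical.

import nltk
# Hypothetical head/tail pair extracted from a new financial document
head, tail = "microsoft", "cloud revenue"
# Apply the same preprocessing used during training: join, lowercase, tokenize
new_document = ' '.join(nltk.word_tokenize((head + " " + tail).lower()))
# Predict the relation and map it back to its string label
encoded_prediction = classifier.predict(new_document)
print(label_encoder.inverse_transform(encoded_prediction))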

A critical look at explaining the relation classifier with LIME

Interpretability of machine learning models is a topic that has received a lot of interest in the last few years. The use cases are plentiful: from confounders in medical predictions and biased algorithms used for the selection of job applicants, to potential future regulation that requires companies to explain why their AI made a particular decision. This section looks at a particular technique called LIME, which aims to make individual predictions interpretable. The algorithm does so by fitting a locally linear model to the predicted class probability in the neighbourhood of the feature space where we desire our explanation. The simplicity of the linear model then makes it possible to explain how the prediction is affected by the features, locally.

The idea behind LIME is simple: build a local surrogate model in the neighbourhood of a prediction that represents an optimum in the trade-off between interpretability (simplicity) and faithfulness (accuracy). The interpretable model can then tell us how the actual, original model works locally. The unfaithfulness (loss) is measured using the sum of squared differences between the predicted probabilities of the original and the surrogate model in the vicinity of the prediction. In other words, LIME aims to find a simple, usually linear, model with as few terms as possible that is still representative of the original model, at least locally. For text, new instances are generated by randomly sampling the words that are present in the observation; in other words, words are randomly left out of the input. The resulting nearby points are then fed into the classifier, and a cloud of predicted probabilities is gathered. A linear model is then fitted, and its coefficients are used for the explanation.
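To make this concrete, here is a rough, self-contained sketch of that idea for text; it is not the actual LIME implementation, and predict_proba stands in for any classifier’s probability function. Perturbed documents are created by dropping random words, each perturbation is weighted by its closeness to the original, and a weighted linear model is fitted whose coefficients serve as word importances.

import numpy as np
from sklearn.linear_model import Ridge

def lime_text_sketch(document, predict_proba, target_class, num_samples=1000, kernel_width=0.25):
    """Simplified LIME for text: fit a locally weighted linear surrogate."""
    words = document.split()
    rng = np.random.default_rng(0)
    # Binary masks: 1 = keep the word, 0 = drop it
    masks = rng.integers(0, 2, size=(num_samples, len(words)))
    masks[0, :] = 1  # keep the original document as the first sample
    perturbed = [' '.join(w for w, keep in zip(words, m) if keep) for m in masks]
    # Predicted probability of the target class for every perturbed document
    probs = predict_proba(perturbed)[:, target_class]
    # Proximity weights: perturbations closer to the original count more
    removed_fraction = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(removed_fraction ** 2) / kernel_width ** 2)
    # Weighted linear surrogate; its coefficients are the word importances
    surrogate = Ridge(alpha=1.0).fit(masks, probs, sample_weight=weights)
    return sorted(zip(words, surrogate.coef_), key=lambda t: -abs(t[1]))

The real LimeTextExplainer differs in the details (cosine distance on the vectorized samples, an exponential kernel over that distance, optional feature selection), but the coefficients of this weighted linear fit play exactly the role described above.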

Although LIME promises to optimize between interpretability/simplicity and faithfulness (in an elegant equation in the paper), the algorithm does not do this for us. The number of coefficients (simplicity) is chosen by the user beforehand, and a linear regression is fitted to the samples. Furthermore, as we will see, the sampling, the choice of the kernel size that defines the locality, and the regularization of the linear model can be problematic.

For images and tabular data, LIME (especially the definition of locality and the sampling) works quite differently, but that is out of the scope of this article.

from __future__ import print_function
import lime
from lime import lime_text
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer
# converting the vectorizer and model into a pipeline
# this is necessary as LIME takes a model pipeline as an input
# (vectorizer and clf are the TF-IDF vectorizer and SVM classifier trained in part 1)
c = make_pipeline(vectorizer, clf)
# saving a list-of-strings version of the X_test object
# (X_test_lime is the held-out test DataFrame with 'Entity' and 'Relation' columns)
ls_X_test = list(X_test_lime["Entity"])
# saving the class names so the explanations are readable
# assumption: the relation labels come from the label encoder fitted earlier
class_names = list(label_encoder.classes_)

In the above code, we first create a pipeline of the TF-IDF vectorizer and the SVM classifier from part 1. Then, to explain the output for a given input, we pass that input to the explainer object.

# create the LIME explainer
# add the class names for interpretability
LIME_explainer = LimeTextExplainer(class_names=class_names)
# choose a random single prediction
idx = 50
# explain the chosen prediction
# use the predicted class probabilities from the TF-IDF + SVM pipeline
# can also add num_features parameter to reduce the number of features explained
LIME_exp = LIME_explainer.explain_instance(ls_X_test[idx], c.predict_proba)
# print results
print('Document id: %d' % idx)
print('Entities: ', ls_X_test[idx])
print('Probabilities =', c.predict_proba([ls_X_test[idx]]).round(3)[0])
print('True class: %s' % y_test_lime[idx])

Further, we can analyze the words/features which are important for the model to predict that particular class.

LIME_exp.show_in_notebook(text=True)

Output

The concept of LIME is nice. It is visually attractive (especially for text), and it indeed accepts any classifier, as long as it can probe its .predict_proba() method.

However, there are two issues:

  • The size of the neighbourhood (kernel width) is set by default to a rather arbitrary value, and it is unclear what a suitable value would be.
  • The R-squared of the surrogate model’s fit can be very low.

The latter can have two causes. One is an insufficient number of features (which is acceptable and easily solved if desired). The other is the inability to fit a good linear model, due to a kernel size that is too large and/or regularization that is too strong. Tweaking is needed in the latter case.

Therefore, always monitor the R-squared, and decide whether its value is acceptable.
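As a rough illustration, the Explanation object returned by LIME exposes the R-squared of the local linear fit as its score attribute and the fitted word weights via as_list(), and both the kernel width and the number of features can be set explicitly; the concrete values below are illustrative, not recommendations.

# inspect the quality of the local surrogate fit
print('Local surrogate R-squared:', LIME_exp.score)
# view the fitted word weights directly, instead of only in the notebook widget
print(LIME_exp.as_list())
# if the fit is poor, re-run with an explicit (smaller) kernel width
# and a limited number of features
LIME_explainer_tuned = LimeTextExplainer(class_names=class_names, kernel_width=10)
LIME_exp_tuned = LIME_explainer_tuned.explain_instance(
    ls_X_test[idx], c.predict_proba, num_features=6)
print('Tuned surrogate R-squared:', LIME_exp_tuned.score)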

Explaining Logistic Regression Model with ELI5

Let us first use a simple scikit-learn pipeline to build our text classifier, which we will then try to interpret. In this pipeline, I will be using a very simple count vectorizer along with logistic regression.

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
# Creating train-test split
X = X_test_lime[['Entity']]
y = X_test_lime[['Relation']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Fitting the classifier
vec = CountVectorizer()
clf = LogisticRegressionCV()
pipe = make_pipeline(vec, clf)
pipe.fit(X_train.Entity, y_train.Relation)

To check out the results:

from sklearn import metrics
def print_report(pipe):
    y_actuals = y_test['Relation']
    y_preds = pipe.predict(X_test['Entity'])
    report = metrics.classification_report(y_actuals, y_preds)
    print(report)
    print("accuracy: {:0.3f}".format(metrics.accuracy_score(y_actuals, y_preds)))
print_report(pipe)

The above is a pretty simple logistic regression model, and it performs reasonably well. We can check out its weights using the function below:

for i, tag in enumerate(clf.classes_):
    coefficients = clf.coef_[i]
    weights = list(zip(vec.get_feature_names(), coefficients))
    print('Tag:', tag)
    print('Most Positive Coefficients:')
    print(sorted(weights, key=lambda x: -x[1])[:10])
    print('Most Negative Coefficients:')
    print(sorted(weights, key=lambda x: x[1])[:10])
    print("--------------------------------------")

Output:

And that is all pretty good. We can see that the coefficients make sense, even though the model’s accuracy leaves room for improvement, and we can use this information to try to improve the model.

But that was a lot of code. ELI5 makes this exercise much simpler for us. We just have to use the command below:

import eli5
eli5.show_weights(clf, vec=vec, top=20)

As you can see, the weights shown for each class match the values we got from the function we wrote manually, and the output is much nicer to explore.

But that is just the tip of the iceberg. ELI5 can also help us debug our models, as we can see below. It works for a variety of models, and the documentation for this library is one of the best I have ever seen. I also love the decorated output ELI5 provides, along with the simple and fast way it lets me interpret and debug my models. To use ELI5 with your own models, you can follow along with the code in this Kaggle Kernel.
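For debugging individual predictions, eli5 also provides show_prediction, which highlights how much each token of a single document contributed to the predicted class. A minimal sketch, reusing the pipeline components defined above (the document index is arbitrary):

# explain a single test document: which tokens pushed the prediction
# towards or away from each relation class
doc = X_test['Entity'].iloc[0]  # arbitrary example document
eli5.show_prediction(clf, doc, vec=vec, top=10)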
