As predictive models are increasingly deployed in critical domains, there has been a growing emphasis on explaining their predictions to decision makers and relevant stakeholders, so that they can understand the rationale behind model predictions and determine if and when to rely on them. To this end, several algorithms have been proposed in the eXplainable Artificial Intelligence (XAI) literature to explain models in a post hoc manner. Despite the plethora of post hoc explanation methods, there is little to no work on systematically benchmarking these methods in an efficient and transparent manner. OpenXAI is a comprehensive and extensible open-source framework for evaluating and benchmarking post hoc explanation methods. OpenXAI is designed to support the development of novel explanation methods and evaluation metrics, and our publicly available leaderboards allow for easy and transparent comparison of explanation methods across diverse evaluation metrics.

OpenXAI comprises implementations of various state-of-the-art evaluation metrics to assess the faithfulness (both with and without ground truth), stability, and fairness of post hoc explanations. In addition, it provides XAI-ready datasets, trained models, and APIs for popular post hoc explanation methods, so that researchers and practitioners can easily benchmark existing or new explanation methods.

We envision OpenXAI as an interface between XAI researchers and practitioners. Researchers can benchmark new explanation methods using OpenXAI and integrate them into the OpenXAI framework and leaderboards, while ML practitioners can readily compare these methods using our APIs and leaderboards.

Installation

To install the OpenXAI package, clone OpenXAI's repo and install the package from the root directory using:

pip install -e . 

The OpenXAI package is straightforward to install and has minimal dependencies on external packages.

Data Loaders

The data loaders in OpenXAI are lightweight. OpenXAI provides a collection of functionalities with easy-to-use high-level APIs to benchmark explanations. For example, the German dataset can be obtained from the data loader as follows:

from openxai.dataloader import return_loaders
loader_train, loader_test = return_loaders(data_name='german', download=True)
# Draw one batch of test inputs and labels (Python 3 iterators have no .next() method)
inputs, labels = next(iter(loader_test))
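
In Python 3, a single batch is drawn from a loader with the built-in `next()` rather than with an iterator's `.next()` method. The sketch below illustrates the pattern without requiring OpenXAI to be installed. `ToyLoader` is a hypothetical stand-in, not part of OpenXAI, and it assumes only that a loader yields `(inputs, labels)` batches when iterated.

```python
class ToyLoader:
    """Hypothetical stand-in for a batch loader; not part of OpenXAI."""

    def __init__(self, rows, labels, batch_size):
        self.rows = rows
        self.labels = labels
        self.batch_size = batch_size

    def __iter__(self):
        # Yield consecutive (inputs, labels) batches.
        for start in range(0, len(self.rows), self.batch_size):
            stop = start + self.batch_size
            yield self.rows[start:stop], self.labels[start:stop]


loader_test = ToyLoader(
    rows=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    labels=[0, 1, 0],
    batch_size=2,
)

# Draw the first batch with the built-in next().
inputs, labels = next(iter(loader_test))

# Python 3 iterators expose __next__, not a .next() method.
has_next_method = hasattr(iter(loader_test), "next")
```

Each call to `iter(loader_test)` starts a fresh pass over the data, so `next(iter(loader_test))` always returns the first batch.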

Trained Models

OpenXAI provides two classes of trained predictive models for transparent and reproducible benchmarking of explanation methods. The code snippet below shows how to load OpenXAI’s pre-trained models using our LoadModel class.

from openxai import LoadModel
model = LoadModel(data_name='german', ml_model='ann', pretrained=True)

Explanation Methods

Explanation methods included in OpenXAI are readily accessible through the Explainer class. Users need to specify the method name in order to invoke the appropriate method and generate explanations.

from openxai import Explainer
exp_method = Explainer(method='lime', model=model, dataset_tensor=inputs)
explanations = exp_method.get_explanation(inputs, labels)
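
To make the idea of selecting an explanation method by name concrete, here is a minimal, self-contained sketch of name-based dispatch. Everything in it is an assumption made for illustration: `ToyExplainer`, the `METHODS` registry, and the `occlusion` and `random_scores` functions are hypothetical and do not reflect how OpenXAI's `Explainer` class is implemented.

```python
import random
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]


def occlusion(model: Model, x: Sequence[float]) -> List[float]:
    """Score feature i by the output change when x[i] is set to 0."""
    reference = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = 0.0
        scores.append(reference - model(perturbed))
    return scores


def random_scores(model: Model, x: Sequence[float]) -> List[float]:
    """Control method: attributions drawn at random, ignoring the model."""
    rng = random.Random(0)
    return [rng.uniform(-1.0, 1.0) for _ in x]


# Hypothetical registry mapping method names to attribution functions.
METHODS: Dict[str, Callable[[Model, Sequence[float]], List[float]]] = {
    "occlusion": occlusion,
    "random": random_scores,
}


class ToyExplainer:
    """Illustrative only; this is not OpenXAI's Explainer class."""

    def __init__(self, method: str, model: Model):
        if method not in METHODS:
            raise ValueError(f"Unknown method: {method!r}")
        self.attribute = METHODS[method]
        self.model = model

    def get_explanation(self, inputs: Sequence[Sequence[float]]) -> List[List[float]]:
        # One attribution vector per input row.
        return [self.attribute(self.model, x) for x in inputs]


def linear_model(x: Sequence[float]) -> float:
    # Toy model with weights (2.0, -1.0, 0.5).
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]


explainer = ToyExplainer(method="occlusion", model=linear_model)
explanations = explainer.get_explanation([[1.0, 2.0, 4.0]])
```

For a linear model and a zero baseline, each occlusion score equals the feature's weight times its value, which makes the toy output easy to check by hand.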

Evaluation Metrics

Benchmarking an explanation method with an evaluation metric is straightforward. The code snippet below shows how to invoke the Relative Input Stability (RIS) metric. The input_dict is described in the Getting Started file.

from openxai import Evaluator
metric_evaluator = Evaluator(input_dict, inputs, labels, model, exp_method)
score = metric_evaluator.evaluate(metric='RIS')
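
For intuition about what a relative input stability score captures, the sketch below computes the largest ratio between the relative change in an explanation and the relative change in the input, taken over a set of perturbed neighbors. It follows the definition as it is commonly stated in the stability literature and is not OpenXAI's implementation. The function names, the `eps_min` guard against division by zero, and the assumption that the neighbors have already been filtered to share the original prediction are all choices made for this illustration.

```python
from typing import Callable, List, Sequence

Explain = Callable[[Sequence[float]], List[float]]


def relative_change_norm(a: Sequence[float], b: Sequence[float],
                         eps: float, p: int = 2) -> float:
    """l_p norm of the element-wise relative change (a - b) / a."""
    total = 0.0
    for a_i, b_i in zip(a, b):
        # Assumption: near-zero denominators are replaced by eps.
        denom = a_i if abs(a_i) > eps else eps
        total += abs((a_i - b_i) / denom) ** p
    return total ** (1.0 / p)


def relative_input_stability(x: Sequence[float],
                             neighbors: Sequence[Sequence[float]],
                             explain: Explain,
                             eps_min: float = 1e-6,
                             p: int = 2) -> float:
    """Worst-case ratio of explanation change to input change over the neighbors."""
    e_x = explain(x)
    worst = 0.0
    for x_prime in neighbors:
        e_prime = explain(x_prime)
        numerator = relative_change_norm(e_x, e_prime, eps_min, p)
        denominator = max(relative_change_norm(x, x_prime, eps_min, p), eps_min)
        worst = max(worst, numerator / denominator)
    return worst


def scaling_explainer(x: Sequence[float]) -> List[float]:
    # Toy explainer whose attributions scale linearly with the input.
    return [2.0 * v for v in x]


def constant_explainer(x: Sequence[float]) -> List[float]:
    # Toy explainer that ignores the input entirely.
    return [1.0 for _ in x]


x = [1.0, 2.0]
neighbors = [[1.1, 2.0], [1.0, 2.2]]

score_scaling = relative_input_stability(x, neighbors, scaling_explainer)
score_constant = relative_input_stability(x, neighbors, constant_explainer)
```

Lower values indicate a more stable explanation. The constant explainer never changes and so scores zero, while the scaling explainer changes exactly in proportion to the input.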