Visualizes SHAP values against feature values to gain an impression of feature effects.

Usage

xgb.plot.shap(
  data,
  shap_contrib = NULL,
  features = NULL,
  top_n = 1,
  model = NULL,
  trees = NULL,
  target_class = NULL,
  approxcontrib = FALSE,
  subsample = NULL,
  n_col = 1,
  col = rgb(0, 0, 1, 0.2),
  pch = ".",
  discrete_n_uniq = 5,
  discrete_jitter = 0.01,
  ylab = "SHAP",
  plot_NA = TRUE,
  col_NA = rgb(0.7, 0, 1, 0.6),
  pch_NA = ".",
  pos_NA = 1.07,
  plot_loess = TRUE,
  col_loess = 2,
  span_loess = 0.5,
  which = c("1d", "2d"),
  plot = TRUE,
  ...
)

Arguments

data

The data to explain as a matrix, dgCMatrix, or data.frame.

shap_contrib

Matrix of SHAP contributions of data. The default (NULL) computes it from model and data.

features

Vector of column indices or feature names to plot. When NULL (default), the top_n most important features are selected by xgb.importance().

top_n

How many of the most important features (<= 100) should be selected? By default 1 for SHAP dependence and 10 for SHAP summary. Only used when features = NULL.

model

An xgb.Booster model. Only required when shap_contrib = NULL or features = NULL.

trees

Passed to xgb.importance() when features = NULL.

target_class

Only relevant for multiclass models. The default (NULL) averages the SHAP values over all classes. Pass a (0-based) class index to show only SHAP values of that class.

approxcontrib

Passed to predict.xgb.Booster() when shap_contrib = NULL.

subsample

Fraction of data points randomly picked for plotting. The default (NULL) will use up to 100k data points.

n_col

Number of columns in a grid of plots.

col

Color of the scatterplot markers.

pch

Scatterplot marker.

discrete_n_uniq

Maximal number of unique feature values to consider the feature as discrete.

discrete_jitter

Jitter amount added to the values of discrete features.

ylab

The y-axis label in 1D plots.

plot_NA

Should contributions of cases with missing values be plotted? Default is TRUE.

col_NA

Color of marker for missing value contributions.

pch_NA

Marker type for NA values.

pos_NA

Relative position of the x-location where NA values are shown: min(x) + (max(x) - min(x)) * pos_NA.

plot_loess

Should loess-smoothed curves be plotted? (Default is TRUE). The smoothing is only done for features with more than 5 distinct values.

col_loess

Color of loess curves.

span_loess

The span parameter of stats::loess().

which

Whether to do univariate or bivariate plotting. Currently, only "1d" is implemented.

plot

Should the plot be drawn? (Default is TRUE). If FALSE, only a list of matrices is returned.

...

Other parameters passed to graphics::plot().
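As a rough illustration of how discrete_n_uniq, discrete_jitter, and pos_NA interact, the following base R sketch prepares x-positions for a made-up feature. It is not the package's implementation: the feature vector is hypothetical, and passing discrete_jitter directly to base::jitter() is an assumption about how the jitter amount is applied.

```r
set.seed(1)

# Hypothetical feature with three levels and some missing values.
x <- sample(c(0, 1, 2, NA), size = 50, replace = TRUE)
x[1:4] <- c(0, 1, 2, NA)  # make sure every value occurs at least once

discrete_n_uniq <- 5     # default from the usage above
discrete_jitter <- 0.01  # default from the usage above
pos_NA <- 1.07           # default from the usage above

x_obs <- x[!is.na(x)]

# A feature counts as discrete when it has at most discrete_n_uniq unique values.
is_discrete <- length(unique(x_obs)) <= discrete_n_uniq

# Assumption: the jitter amount is handed to base::jitter() unchanged.
x_plot <- if (is_discrete) jitter(x_obs, amount = discrete_jitter) else x_obs

# x-location for NA cases: min(x) + (max(x) - min(x)) * pos_NA
loc_na <- min(x_obs) + (max(x_obs) - min(x_obs)) * pos_NA
loc_na
```

With pos_NA greater than 1, the NA cases land to the right of the largest observed feature value, so they stay visually separate from the regular points.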

Value

In addition to producing plots (when plot = TRUE), the function invisibly returns a list of two matrices:

  • data: Feature value matrix.

  • shap_contrib: Corresponding SHAP value matrix.

Details

These scatterplots show how SHAP feature contributions depend on feature values. Like partial dependence plots, they give an idea of how feature values affect predictions. However, partial dependence plots show the marginal dependence of the model prediction on a feature value, while SHAP dependence plots display the estimated contribution of a feature to the prediction for each individual case.

When plot_loess = TRUE, feature values are rounded to three significant digits and weighted LOESS is computed and plotted, where the weights are the numbers of data points at each rounded value.
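The smoothing step described above can be sketched in base R roughly as follows. This is an illustration only, not the package's code: the data are simulated, and averaging the SHAP values within each rounded feature value is an assumption about how the responses are aggregated before fitting.

```r
set.seed(42)

# Simulated feature values and hypothetical SHAP contributions.
x <- runif(500, 0, 10)
shap <- sin(x) + rnorm(500, sd = 0.2)

# Round feature values to three significant digits.
grp <- factor(signif(x, 3))

# One row per rounded value: mean SHAP value (assumption) and point count.
agg <- data.frame(
  x = as.numeric(levels(grp)),
  shap = as.vector(tapply(shap, grp, mean)),
  n = as.vector(tapply(shap, grp, length))
)

# Weighted LOESS, with the number of points per rounded value as weights.
fit <- stats::loess(shap ~ x, data = agg, weights = agg$n, span = 0.5)
agg$smooth <- predict(fit, newdata = agg)

head(agg)
```

Aggregating first keeps the LOESS fit small when many cases share a rounded value, while the weights preserve the influence of densely populated values.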

Note: SHAP contributions are on the scale of the model margin. E.g., for a logistic binomial objective, the margin is on the log-odds scale. Also, since SHAP stands for "SHapley Additive exPlanations" (model prediction = sum of SHAP contributions for all features + bias), depending on the objective used, transforming the SHAP contributions for a feature from the margin to the prediction space is not necessarily meaningful.
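To make the additivity point concrete, here is a small base R sketch with a made-up contribution matrix for a logistic objective. The numbers and the column names (f1, f2, bias) are hypothetical and do not come from a fitted model.

```r
# Hypothetical SHAP contributions for three cases: two features plus a bias term.
contrib <- cbind(
  f1   = c(0.8, -0.3,  0.1),
  f2   = c(-0.2, 0.5, -0.4),
  bias = c(0.4,  0.4,  0.4)
)

# Additivity holds on the margin (log-odds) scale.
margin <- rowSums(contrib)

# The predicted probability is the inverse logit of the whole margin.
prob <- plogis(margin)

# Transforming each contribution separately does NOT decompose the probability.
naive <- rowSums(plogis(contrib))

cbind(margin, prob, naive)
```

The naive column differs from prob in every row, which is why per-feature contributions are best read on the margin scale.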

References

  1. Scott M. Lundberg, Su-In Lee, "A Unified Approach to Interpreting Model Predictions", NIPS Proceedings 2017, https://arxiv.org/abs/1705.07874

  2. Scott M. Lundberg, Su-In Lee, "Consistent feature attribution for tree ensembles", https://arxiv.org/abs/1706.06060

Examples


data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

## Keep the number of threads to 1 for examples
nthread <- 1
data.table::setDTthreads(nthread)
nrounds <- 20

model_binary <- xgboost(
  agaricus.train$data, factor(agaricus.train$label),
  nrounds = nrounds,
  verbosity = 0L,
  learning_rate = 0.1,
  max_depth = 3L,
  subsample = 0.5,
  nthreads = nthread
)

xgb.plot.shap(agaricus.test$data, model = model_binary, features = "odor=none")

contr <- predict(model_binary, agaricus.test$data, type = "contrib")
xgb.plot.shap(agaricus.test$data, contr, model = model_binary, top_n = 12, n_col = 3)

# Summary plot
xgb.ggplot.shap.summary(agaricus.test$data, contr, model = model_binary, top_n = 12)

# Multiclass example - plots for each class separately:
x <- as.matrix(iris[, -5])
set.seed(123)
is.na(x[sample(nrow(x) * 4, 30)]) <- TRUE # introduce some missing values

model_multiclass <- xgboost(
  x, iris$Species,
  nrounds = nrounds,
  verbosity = 0,
  max_depth = 2,
  subsample = 0.5,
  nthreads = nthread
)
nclass <- 3
trees0 <- seq(from = 1, by = nclass, length.out = nrounds)
col <- rgb(0, 0, 1, 0.5)

xgb.plot.shap(
  x,
  model = model_multiclass,
  trees = trees0,
  target_class = 0,
  top_n = 4,
  n_col = 2,
  col = col,
  pch = 16,
  pch_NA = 17
)

xgb.plot.shap(
  x,
  model = model_multiclass,
  trees = trees0 + 1,
  target_class = 1,
  top_n = 4,
  n_col = 2,
  col = col,
  pch = 16,
  pch_NA = 17
)

xgb.plot.shap(
  x,
  model = model_multiclass,
  trees = trees0 + 2,
  target_class = 2,
  top_n = 4,
  n_col = 2,
  col = col,
  pch = 16,
  pch_NA = 17
)

# Summary plot
xgb.ggplot.shap.summary(x, model = model_multiclass, target_class = 0, top_n = 4)
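The tree index vectors in the multiclass example (trees0, trees0 + 1, trees0 + 2) can be checked in isolation. The base R sketch below only reproduces that index arithmetic, on the assumption, taken from the example, that the trees of the three classes are interleaved round by round; it does not call xgboost.

```r
nrounds <- 20
nclass <- 3

# Indices used for class 0 in the example: every nclass-th tree, starting at 1.
trees0 <- seq(from = 1, by = nclass, length.out = nrounds)

# One index vector per class, as in the three xgb.plot.shap() calls.
trees_by_class <- lapply(0:(nclass - 1), function(k) trees0 + k)

head(trees0)
# Together, the three vectors cover each of the nrounds * nclass indices once.
all_trees <- sort(unlist(trees_by_class))
```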