Visualizes SHAP values against feature values to gain an impression of feature effects.
Usage
xgb.plot.shap(
  data,
  shap_contrib = NULL,
  features = NULL,
  top_n = 1,
  model = NULL,
  trees = NULL,
  target_class = NULL,
  approxcontrib = FALSE,
  subsample = NULL,
  n_col = 1,
  col = rgb(0, 0, 1, 0.2),
  pch = ".",
  discrete_n_uniq = 5,
  discrete_jitter = 0.01,
  ylab = "SHAP",
  plot_NA = TRUE,
  col_NA = rgb(0.7, 0, 1, 0.6),
  pch_NA = ".",
  pos_NA = 1.07,
  plot_loess = TRUE,
  col_loess = 2,
  span_loess = 0.5,
  which = c("1d", "2d"),
  plot = TRUE,
  ...
)
Arguments
- data
The data to explain as a matrix, dgCMatrix, or data.frame.
- shap_contrib
Matrix of SHAP contributions of data. The default (NULL) computes it from model and data.
- features
Vector of column indices or feature names to plot. When NULL (default), the top_n most important features are selected by xgb.importance().
- top_n
How many of the most important features (<= 100) should be selected? By default 1 for SHAP dependence and 10 for SHAP summary. Only used when features = NULL.
- model
An xgb.Booster model. Only required when shap_contrib = NULL or features = NULL.
- trees
Passed to xgb.importance() when features = NULL.
- target_class
Only relevant for multiclass models. The default (NULL) averages the SHAP values over all classes. Pass a (0-based) class index to show only SHAP values of that class.
- approxcontrib
Passed to predict.xgb.Booster() when shap_contrib = NULL.
- subsample
Fraction of data points randomly picked for plotting. The default (NULL) will use up to 100k data points.
- n_col
Number of columns in a grid of plots.
- col
Color of the scatterplot markers.
- pch
Scatterplot marker.
- discrete_n_uniq
Maximal number of unique feature values to consider the feature as discrete.
- discrete_jitter
Jitter amount added to the values of discrete features.
- ylab
The y-axis label in 1D plots.
- plot_NA
Should contributions of cases with missing values be plotted? Default is TRUE.
- col_NA
Color of marker for missing value contributions.
- pch_NA
Marker type for NA values.
- pos_NA
Relative position of the x-location where NA values are shown: min(x) + (max(x) - min(x)) * pos_NA.
- plot_loess
Should loess-smoothed curves be plotted? (Default is TRUE). The smoothing is only done for features with more than 5 distinct values.
- col_loess
Color of loess curves.
- span_loess
The span parameter of stats::loess().
- which
Whether to do univariate or bivariate plotting. Currently, only "1d" is implemented.
- plot
Should the plot be drawn? (Default is TRUE). If FALSE, only a list of matrices is returned.
- ...
Other parameters passed to graphics::plot().
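The pos_NA formula from the argument list can be checked by hand. The following is a minimal base R sketch, not the package's internal code; the feature vector x is hypothetical, and it assumes that min(x) and max(x) are taken over the non-missing values only.

```r
# Hypothetical feature values with some missing entries (illustrative only)
x <- c(2, 5, 8, NA, 10, NA)
pos_NA <- 1.07  # default value from the usage section

# Assumption: the range is computed over the non-missing values
x_range <- range(x, na.rm = TRUE)

# x-location at which contributions of NA cases would be drawn,
# following min(x) + (max(x) - min(x)) * pos_NA
x_NA <- x_range[1] + (x_range[2] - x_range[1]) * pos_NA
x_NA
```

With the default pos_NA = 1.07, the NA markers land slightly to the right of the largest observed feature value, so they do not overlap the regular scatter.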
Value
In addition to producing plots (when plot = TRUE), it silently returns a list of two matrices:
- data: Feature value matrix.
- shap_contrib: Corresponding SHAP value matrix.
Details
These scatterplots show how SHAP feature contributions depend on feature values. Like partial dependence plots, they give an idea of how feature values affect predictions. However, partial dependence plots show the marginal dependence of the model prediction on a feature value, while SHAP dependence plots display the estimated contribution of a feature to the prediction for each individual case.
When plot_loess = TRUE, feature values are rounded to three significant digits and a weighted LOESS is computed and plotted, where the weights are the numbers of data points at each rounded value.
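The rounding and weighting step described above can be approximated in base R. This is a minimal sketch under stated assumptions rather than the function's actual implementation: the feature values and SHAP contributions are simulated, and the SHAP values are assumed to be mean-aggregated per rounded feature value before smoothing.

```r
set.seed(1)

# Simulated feature values and SHAP contributions (not real model output)
x <- runif(5000, 0, 10)
shap <- sin(x) + rnorm(5000, sd = 0.2)

# Round feature values to three significant digits
d <- data.frame(x = signif(x, 3), shap = shap)

# Assumption: aggregate to the mean SHAP value per rounded feature value
agg <- aggregate(shap ~ x, data = d, FUN = mean)

# Number of data points at each rounded value, used as LOESS weights
agg$n <- aggregate(shap ~ x, data = d, FUN = length)$shap

# Weighted LOESS; span = 0.5 mirrors the span_loess default
fit <- stats::loess(shap ~ x, data = agg, weights = agg$n, span = 0.5)
agg$smooth <- predict(fit, newdata = agg)

# Raw points plus the smoothed curve (col = 2 mirrors the col_loess default)
plot(x, shap, pch = ".", ylab = "SHAP")
lines(agg$x, agg$smooth, col = 2)
```

Rounding first keeps the LOESS fit cheap on large data, because the smoother sees one weighted point per distinct rounded value instead of every observation.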
Note: SHAP contributions are on the scale of the model margin. For example, for a logistic binomial objective, the margin is on the log-odds scale. Also, since SHAP stands for "SHapley Additive exPlanation" (model prediction = sum of SHAP contributions for all features + bias), transforming the SHAP contributions for a feature from the margin to the prediction space is, depending on the objective used, not necessarily meaningful.
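A small numeric illustration of this note, using base R only. The contribution values, the bias, and the feature names below are hypothetical and chosen purely for illustration; they do not come from a fitted model.

```r
# Hypothetical SHAP contributions for a single case, on the log-odds (margin) scale
contrib <- c(feature_a = 1.2, feature_b = -0.4, feature_c = 0.3)
bias <- -0.5

# Additivity holds on the margin scale:
# margin prediction = sum of SHAP contributions + bias
margin <- sum(contrib) + bias

# The predicted probability is obtained by transforming the summed margin
prob <- plogis(margin)

# Transforming each contribution separately and summing does not
# reproduce the predicted probability
naive <- sum(plogis(contrib)) + plogis(bias)

c(margin = margin, prob = prob, naive = naive)
```

Here the naive sum exceeds 1, so it cannot be read as a probability at all, which is why the plots stay on the margin scale.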
References
Scott M. Lundberg, Su-In Lee, "A Unified Approach to Interpreting Model Predictions", NIPS Proceedings 2017, https://arxiv.org/abs/1705.07874
Scott M. Lundberg, Su-In Lee, "Consistent feature attribution for tree ensembles", https://arxiv.org/abs/1706.06060
Examples
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
## Keep the number of threads to 1 for examples
nthread <- 1
data.table::setDTthreads(nthread)
nrounds <- 20
model_binary <- xgboost(
  agaricus.train$data, factor(agaricus.train$label),
  nrounds = nrounds,
  verbosity = 0L,
  learning_rate = 0.1,
  max_depth = 3L,
  subsample = 0.5,
  nthreads = nthread
)
xgb.plot.shap(agaricus.test$data, model = model_binary, features = "odor=none")
contr <- predict(model_binary, agaricus.test$data, type = "contrib")
xgb.plot.shap(agaricus.test$data, contr, model = model_binary, top_n = 12, n_col = 3)
# Summary plot
xgb.ggplot.shap.summary(agaricus.test$data, contr, model = model_binary, top_n = 12)
# Multiclass example - plots for each class separately:
x <- as.matrix(iris[, -5])
set.seed(123)
is.na(x[sample(nrow(x) * 4, 30)]) <- TRUE # introduce some missing values
model_multiclass <- xgboost(
  x, iris$Species,
  nrounds = nrounds,
  verbosity = 0,
  max_depth = 2,
  subsample = 0.5,
  nthreads = nthread
)
nclass <- 3
trees0 <- seq(from = 1, by = nclass, length.out = nrounds)
col <- rgb(0, 0, 1, 0.5)
xgb.plot.shap(
  x,
  model = model_multiclass,
  trees = trees0,
  target_class = 0,
  top_n = 4,
  n_col = 2,
  col = col,
  pch = 16,
  pch_NA = 17
)
xgb.plot.shap(
  x,
  model = model_multiclass,
  trees = trees0 + 1,
  target_class = 1,
  top_n = 4,
  n_col = 2,
  col = col,
  pch = 16,
  pch_NA = 17
)
xgb.plot.shap(
  x,
  model = model_multiclass,
  trees = trees0 + 2,
  target_class = 2,
  top_n = 4,
  n_col = 2,
  col = col,
  pch = 16,
  pch_NA = 17
)
# Summary plot
xgb.ggplot.shap.summary(x, model = model_multiclass, target_class = 0, top_n = 4)