Visualizes SHAP contributions of different features.
Usage
xgb.ggplot.shap.summary(
  data,
  shap_contrib = NULL,
  features = NULL,
  top_n = 10,
  model = NULL,
  trees = NULL,
  target_class = NULL,
  approxcontrib = FALSE,
  subsample = NULL
)

xgb.plot.shap.summary(
  data,
  shap_contrib = NULL,
  features = NULL,
  top_n = 10,
  model = NULL,
  trees = NULL,
  target_class = NULL,
  approxcontrib = FALSE,
  subsample = NULL
)
Arguments
- data
  The data to explain, as a matrix, dgCMatrix, or data.frame.
- shap_contrib
  Matrix of SHAP contributions of data. The default (NULL) computes it from
  model and data. A sketch of precomputing this matrix is shown after this
  list.
- features
  Vector of column indices or feature names to plot. When NULL (default),
  the top_n most important features are selected by xgb.importance().
- top_n
  How many of the most important features (<= 100) should be selected?
  By default 1 for SHAP dependence and 10 for SHAP summary. Only used when
  features = NULL.
- model
  An xgb.Booster model. Only required when shap_contrib = NULL or
  features = NULL.
- trees
  Passed to xgb.importance() when features = NULL.
- target_class
  Only relevant for multiclass models. The default (NULL) averages the SHAP
  values over all classes. Pass a (0-based) class index to show only SHAP
  values of that class.
- approxcontrib
  Passed to predict.xgb.Booster() when shap_contrib = NULL.
- subsample
  Fraction of data points randomly picked for plotting. The default (NULL)
  will use up to 100k data points.
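When shap_contrib is supplied, the plotting functions skip the internal
computation. A minimal sketch of precomputing the matrix with predict()
(assumes a fitted booster bst and feature matrix x, as in the Examples
section below):

# predict() with predcontrib = TRUE returns one column per feature plus a
# final bias column (named BIAS), which the plotting code drops by name.
contrib <- predict(bst, x, predcontrib = TRUE, approxcontrib = FALSE)

# Reuse the precomputed contributions; model is still needed here because
# features = NULL, so the top features are picked via xgb.importance().
xgb.plot.shap.summary(x, shap_contrib = contrib, model = bst, top_n = 5)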
Details
A point plot (each point representing one observation from data) is
produced for each feature, with the points plotted along the SHAP value
axis. Each point (observation) is coloured by its feature value.
The plot shows which features have a negative or positive contribution to
the model prediction, and whether the contribution differs for larger or
smaller values of the feature. Inspired by the summary plot of
https://github.com/shap/shap.
See also
xgb.plot.shap(), xgb.ggplot.shap.summary(), and the Python library
https://github.com/shap/shap.
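Examples

A runnable sketch, assuming the agaricus.train dataset bundled with
xgboost and the xgb.train()/xgb.DMatrix() interface; hyperparameter values
are illustrative only.

library(xgboost)

# Binary classification on the bundled mushroom data.
data(agaricus.train, package = "xgboost")
x <- agaricus.train$data
dtrain <- xgb.DMatrix(x, label = agaricus.train$label)
bst <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 3, eta = 0.3),
  data = dtrain,
  nrounds = 20
)

# SHAP summary plot of the 6 most important features.
xgb.plot.shap.summary(x, model = bst, top_n = 6)

# Multiclass sketch: restrict the plot to one class via target_class
# (0-based index) instead of averaging SHAP values over all classes.
xm <- as.matrix(iris[, -5])
dm <- xgb.DMatrix(xm, label = as.numeric(iris$Species) - 1)
mbst <- xgb.train(
  params = list(objective = "multi:softprob", num_class = 3, max_depth = 2),
  data = dm,
  nrounds = 10
)
xgb.plot.shap.summary(xm, model = mbst, target_class = 0)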