Creates a data.table of feature importances.
Usage
xgb.importance(
  model = NULL,
  feature_names = getinfo(model, "feature_name"),
  trees = NULL
)
Arguments
- model: Object of class xgb.Booster.
- feature_names: Character vector used to overwrite the feature names of the model. The default is NULL (use the original feature names).
- trees: An integer vector of (base-1) tree indices that should be included in the importance calculation (only for the "gbtree" booster). The default (NULL) parses all trees. This can be useful, e.g., in multiclass classification to get feature importances for each class separately; see the per-class calls under Examples below.
Value
A data.table with the following columns:

For a tree model:
- Features: Names of the features used in the model.
- Gain: Fractional contribution of each feature to the model based on the total gain of this feature's splits. A higher percentage means higher importance.
- Cover: Metric of the number of observations related to this feature.
- Frequency: Percentage of times a feature has been used in trees.
For a linear model:
- Features: Names of the features used in the model.
- Weight: Linear coefficient of this feature.
- Class: Class label (only for multiclass models). For objects of class xgboost (as produced by xgboost()), it will be a factor; for objects of class xgb.Booster (as produced by xgb.train()), it will be a zero-based integer vector.
If feature_names is not provided and the model does not have stored feature names, the index of the features will be used instead. Because the index is extracted from the model dump (based on C++ code), it starts at 0 (as in C/C++ or Python) instead of 1 (as is usual in R).
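A minimal sketch of this fallback, using the lower-level xgb.DMatrix()/xgb.train() interface so the design matrix can be passed without column names (the ToothGrowth setup mirrors the Examples below; the variable names are illustrative):

library(xgboost)
data("ToothGrowth")
x_unnamed <- as.matrix(ToothGrowth[, c("len", "dose")])
colnames(x_unnamed) <- NULL  # strip names to trigger the index fallback
dm <- xgb.DMatrix(x_unnamed, label = as.integer(ToothGrowth$supp == "VC"))
booster <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2),
  data = dm,
  nrounds = 5
)
xgb.importance(booster)  # Features come back as "0" and "1", not names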
Details
This function works for both linear and tree models.
For linear models, the importance is the absolute magnitude of linear coefficients. To obtain a meaningful ranking by importance for linear models, the features need to be on the same scale (which is also recommended when using L1 or L2 regularization).
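As a short sketch of that recommendation (hypothetical preprocessing, reusing the ToothGrowth data from the Examples below; model_scaled and the learning_rate value are illustrative):

data("ToothGrowth")
# standardize features so that "gblinear" coefficients (the Weight
# column) are comparable across features
x_scaled <- scale(as.matrix(ToothGrowth[, c("len", "dose")]))
model_scaled <- xgboost(
  x_scaled, ToothGrowth$supp,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gblinear",
  learning_rate = 0.3
)
xgb.importance(model_scaled)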
Examples
# binary classification using "gbtree":
data("ToothGrowth")
x <- ToothGrowth[, c("len", "dose")]
y <- ToothGrowth$supp
model_tree_binary <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gbtree",
  max_depth = 2L
)
xgb.importance(model_tree_binary)
# binary classification using "gblinear":
model_linear_binary <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gblinear",
  learning_rate = 0.3
)
xgb.importance(model_linear_binary)
# multi-class classification using "gbtree":
data("iris")
x <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
y <- iris$Species
model_tree_multi <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gbtree",
  max_depth = 3L
)
# all classes clumped together:
xgb.importance(model_tree_multi)
# inspect importances separately for each class:
num_classes <- 3L
nrounds <- 5L
# in multiclass "gbtree" models, each boosting round adds one tree per class,
# so the (base-1) trees for class k sit at indices k, k + num_classes,
# k + 2 * num_classes, ...
xgb.importance(
  model_tree_multi, trees = seq(from = 1, by = num_classes, length.out = nrounds)
)
xgb.importance(
  model_tree_multi, trees = seq(from = 2, by = num_classes, length.out = nrounds)
)
xgb.importance(
  model_tree_multi, trees = seq(from = 3, by = num_classes, length.out = nrounds)
)
# multi-class classification using "gblinear":
model_linear_multi <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gblinear",
  learning_rate = 0.2
)
xgb.importance(model_linear_multi)
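# the returned data.table can also be visualized; a sketch using
# xgb.plot.importance() from this package (see its help page for options):
imp <- xgb.importance(model_tree_binary)
xgb.plot.importance(imp, measure = "Gain")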