# Feature importance

Creates a `data.table` of feature importances.

## Usage

``` r
xgb.importance(
  model = NULL,
  feature_names = getinfo(model, "feature_name"),
  trees = NULL
)
```

## Arguments

- model:

  Object of class `xgb.Booster`.

- feature_names:

  Character vector used to overwrite the feature names of the model. By
  default, the feature names stored in the model are used.

- trees:

  An integer vector of one-based tree indices to include in the
  importance calculation (only for the "gbtree" booster). The default
  (`NULL`) uses all trees. Restricting the trees can be useful, e.g., in
  multiclass classification to get feature importances for each class
  separately.
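
The per-class selection can be sketched in base R. The sketch assumes, as the multiclass example on this page does, that one tree is grown per class in every boosting round and that trees are stored round by round. `trees_for_class()` is a hypothetical helper, not part of xgboost.

``` r
# Hypothetical helper (not an xgboost function): one-based indices of the
# trees belonging to class `k`, assuming one tree per class per round.
trees_for_class <- function(k, num_classes, nrounds) {
  stopifnot(k >= 1L, k <= num_classes)
  as.integer(seq(from = k, by = num_classes, length.out = nrounds))
}

trees_for_class(1L, num_classes = 3L, nrounds = 5L)
# 1 4 7 10 13
trees_for_class(3L, num_classes = 3L, nrounds = 5L)
# 3 6 9 12 15
```

Under these assumptions, the returned vector could be passed as `trees` to restrict the calculation to a single class.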

## Value

A `data.table` with the following columns:

For a tree model:

- `Features`: Names of the features used in the model.

- `Gain`: Fractional contribution of each feature to the model based on
  the total gain of this feature's splits. A higher value means higher
  importance.

- `Cover`: Metric of the number of observations related to this feature.

- `Frequency`: Percentage of times a feature has been used in trees.
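
As a rough illustration of how fractional columns such as `Gain` and `Frequency` arise, the sketch below normalizes made-up per-split records in base R. The `splits` data frame and all of its values are hypothetical, and this is not how xgboost builds the table internally. It only shows the normalization idea.

``` r
# Hypothetical per-split records: the feature used by each split and the
# gain of that split (made-up numbers for illustration only).
splits <- data.frame(
  feature = c("len", "dose", "len", "len", "dose"),
  gain = c(4, 1, 3, 1, 1)
)

# Share of the total gain contributed by each feature's splits.
total_gain <- tapply(splits$gain, splits$feature, sum)
gain_frac <- total_gain / sum(total_gain)
gain_frac
# dose: 0.2, len: 0.8

# Share of all splits that use each feature.
freq_frac <- table(splits$feature) / nrow(splits)
freq_frac
# dose: 0.4, len: 0.6
```

Each normalized vector sums to 1, so the values are comparable across features within one model.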

For a linear model:

- `Features`: Names of the features used in the model.

- `Weight`: Linear coefficient of this feature.

- `Class`: Class label (only for multiclass models). For objects of
  class `xgboost` (as produced by
  [`xgboost()`](https://github.com/dmlc/xgboost/reference/xgboost.md)),
  it will be a `factor`, while for objects of class `xgb.Booster` (as
  produced by
  [`xgb.train()`](https://github.com/dmlc/xgboost/reference/xgb.train.md)),
  it will be a zero-based integer vector.

If `feature_names` is not provided and `model` does not store feature
names, the index of each feature is used instead. Because the index is
extracted from the model dump (which is produced by C++ code), it starts
at 0 (as in C/C++ or Python) instead of 1 (as is usual in R).
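
The offset matters when mapping such indices back to columns of the training data. A minimal base R sketch follows, assuming the zero-based indices have already been extracted as integers. The vector `idx0` is hypothetical, and the exact formatting of the index in the returned table is not covered here.

``` r
data("iris")
x <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# Hypothetical zero-based feature indices, as a model dump would number them.
idx0 <- c(2L, 3L, 0L)

# R columns are one-based, so shift by one before indexing.
feature_cols <- colnames(x)[idx0 + 1L]
feature_cols
# "Petal.Length" "Petal.Width" "Sepal.Length"
```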

## Details

This function works for both linear and tree models.

For linear models, the importance is the absolute magnitude of linear
coefficients. To obtain a meaningful ranking by importance for linear
models, the features need to be on the same scale (which is also
recommended when using L1 or L2 regularization).
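
The effect of feature scale on coefficient magnitude can be illustrated with an ordinary least-squares fit. The sketch below uses `stats::lm()` as a stand-in, not a "gblinear" booster, so it only demonstrates the general principle: expressing a feature in different units changes its coefficient, and therefore its rank by absolute magnitude, without changing the fitted model.

``` r
data("iris")
x <- iris[, c("Sepal.Width", "Petal.Length")]
y <- iris$Sepal.Length

# Same data, but one feature expressed in millimetres instead of centimetres.
x_mm <- transform(x, Petal.Length = Petal.Length * 10)

# Drop the intercept and keep only the feature coefficients.
coef_raw <- coef(lm(y ~ ., data = x))[-1]
coef_mm <- coef(lm(y ~ ., data = x_mm))[-1]

# The rescaled feature's coefficient shrinks by the same factor of 10.
abs(coef_raw)
abs(coef_mm)

# Standardizing puts all features on a common scale before ranking.
x_scaled <- as.data.frame(scale(x))
coef_scaled <- coef(lm(y ~ ., data = x_scaled))[-1]
sort(abs(coef_scaled), decreasing = TRUE)
```

With regularization the relationship is no longer an exact rescaling, which is one more reason to standardize features before comparing coefficients.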

## Examples

``` r
# binary classification using "gbtree":
data("ToothGrowth")
x <- ToothGrowth[, c("len", "dose")]
y <- ToothGrowth$supp
model_tree_binary <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gbtree",
  max_depth = 2L
)
xgb.importance(model_tree_binary)

# binary classification using "gblinear":
model_linear_binary <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gblinear",
  learning_rate = 0.3
)
xgb.importance(model_linear_binary)

# multi-class classification using "gbtree":
data("iris")
x <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
y <- iris$Species
model_tree_multi <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gbtree",
  max_depth = 3L
)
# all classes clumped together:
xgb.importance(model_tree_multi)
# inspect importances separately for each class:
num_classes <- 3L
nrounds <- 5L
xgb.importance(
  model_tree_multi, trees = seq(from = 1, by = num_classes, length.out = nrounds)
)
xgb.importance(
  model_tree_multi, trees = seq(from = 2, by = num_classes, length.out = nrounds)
)
xgb.importance(
  model_tree_multi, trees = seq(from = 3, by = num_classes, length.out = nrounds)
)

# multi-class classification using "gblinear":
model_linear_multi <- xgboost(
  x, y,
  nrounds = 5L,
  nthreads = 1L,
  booster = "gblinear",
  learning_rate = 0.2
)
xgb.importance(model_linear_multi)
```