Represents previously calculated feature importance as a bar graph.
xgb.plot.importance() uses base R graphics, while xgb.ggplot.importance() uses "ggplot".
Usage
xgb.ggplot.importance(
  importance_matrix = NULL,
  top_n = NULL,
  measure = NULL,
  rel_to_first = FALSE,
  n_clusters = seq_len(10),
  ...
)
xgb.plot.importance(
  importance_matrix = NULL,
  top_n = NULL,
  measure = NULL,
  rel_to_first = FALSE,
  left_margin = 10,
  cex = NULL,
  plot = TRUE,
  ...
)
Arguments
- importance_matrix
A data.table as returned by xgb.importance().
- top_n
Maximal number of top features to include in the plot.
- measure
The name of the importance measure to plot. When NULL, 'Gain' is used for trees and 'Weight' is used for gblinear.
- rel_to_first
Whether importance values should be represented as relative to the highest ranked feature, see Details.
- n_clusters
A numeric vector containing the min and the max range of the possible number of clusters of bars.
- ...
Other parameters passed to graphics::barplot() (except horiz, border, cex.names, names.arg, and las). Only used in xgb.plot.importance().
- left_margin
Adjust the left margin size to fit feature names. When NULL, the existing par("mar") is used.
- cex
Passed as the cex.names parameter to graphics::barplot().
- plot
Should the barplot be shown? Default is TRUE.
Value
The return value depends on the function:
- xgb.plot.importance(): Invisibly, a "data.table" with top_n features sorted by importance. If plot = TRUE, the values are also plotted as a barplot.
- xgb.ggplot.importance(): A customizable "ggplot" object. E.g., to change the title, add + ggtitle("A GRAPH NAME").
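The two return values can be inspected without drawing anything to the base graphics device. The following is a minimal sketch, assuming the same xgboost() interface and agaricus.train data used in the Examples section, and assuming that "ggplot2" and the other packages required by xgb.ggplot.importance() are installed; the choice of top_n = 3 and the title text are illustrative only.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")

# Small model, mirroring the Examples section
model <- xgboost(
  agaricus.train$data, factor(agaricus.train$label),
  nrounds = 2,
  max_depth = 3,
  nthreads = 2
)
importance_matrix <- xgb.importance(model)

# plot = FALSE skips the barplot; the data.table is still returned (invisibly)
top <- xgb.plot.importance(importance_matrix, top_n = 3, plot = FALSE)
print(top)

# The ggplot variant returns an object that can be extended with "+"
gg <- xgb.ggplot.importance(importance_matrix, top_n = 3)
gg + ggplot2::ggtitle("Top 3 features")
```

Because xgb.plot.importance() returns invisibly, the result only appears when it is assigned and printed, as above.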
Details
The graph represents each feature as a horizontal bar of length proportional to the importance of a feature. Features are sorted by decreasing importance. It works for both "gblinear" and "gbtree" models.
When rel_to_first = FALSE, the values are plotted as they appear in importance_matrix. For a "gbtree" model, this means they are normalized to a total of 1 ("what is the feature's importance contribution relative to the whole model?"). For linear models, rel_to_first = FALSE shows the actual values of the coefficients.
Setting rel_to_first = TRUE shows the picture from the perspective of "what is the feature's importance contribution relative to the most important feature?"
The "ggplot" backend performs 1-D clustering of the importance values, with bar colors corresponding to different clusters having similar importance values.
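The effect of rel_to_first described above can be illustrated with plain arithmetic in base R. This is only a sketch: the feature names and values are made up, and dividing by the largest absolute value is an assumption about how "relative to the most important feature" is computed, not a quote of the package's code.

```r
# Hypothetical importance values for a "gbtree" model (they sum to 1)
gain <- c(odor = 0.60, spore_print = 0.25, gill_size = 0.10, stalk_root = 0.05)

# rel_to_first = FALSE: values are used as given,
# i.e. each one is a share of the whole model
share_of_model <- gain

# rel_to_first = TRUE (assumed behaviour): rescale so that the most
# important feature has the value 1
rel_gain <- gain / max(abs(gain))

print(share_of_model)
print(rel_gain)
```

Under this scaling the ordering of the features is unchanged; only the axis changes, from a share of the model to a fraction of the top feature.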
Examples
data(agaricus.train)
## Keep the number of threads to 2 for examples
nthread <- 2
data.table::setDTthreads(nthread)
model <- xgboost(
  agaricus.train$data, factor(agaricus.train$label),
  nrounds = 2,
  max_depth = 3,
  nthreads = nthread
)
importance_matrix <- xgb.importance(model)
xgb.plot.importance(
  importance_matrix, rel_to_first = TRUE, xlab = "Relative importance"
)
gg <- xgb.ggplot.importance(
  importance_matrix, measure = "Frequency", rel_to_first = TRUE
)
gg
gg + ggplot2::ylab("Frequency")