Represents previously calculated feature importance as a bar graph.
xgb.plot.importance() uses base R graphics, while xgb.ggplot.importance() uses "ggplot".
Usage
xgb.ggplot.importance(
  importance_matrix = NULL,
  top_n = NULL,
  measure = NULL,
  rel_to_first = FALSE,
  n_clusters = seq_len(10),
  ...
)
xgb.plot.importance(
  importance_matrix = NULL,
  top_n = NULL,
  measure = NULL,
  rel_to_first = FALSE,
  left_margin = 10,
  cex = NULL,
  plot = TRUE,
  ...
)
Arguments
- importance_matrix
A data.table as returned by xgb.importance().
- top_n
Maximal number of top features to include in the plot.
- measure
The name of the importance measure to plot. When NULL, 'Gain' is used for tree models and 'Weight' is used for gblinear.
- rel_to_first
Whether importance values should be represented as relative to the highest ranked feature, see Details.
- n_clusters
A numeric vector containing the min and the max range of the possible number of clusters of bars.
- ...
Other parameters passed to graphics::barplot() (except horiz, border, cex.names, names.arg, and las). Only used in xgb.plot.importance().
- left_margin
Adjust the left margin size to fit feature names. When NULL, the existing par("mar") is used.
- cex
Passed as the cex.names parameter to graphics::barplot().
- plot
Should the barplot be shown? Default is TRUE.
Value
The return value depends on the function:
- xgb.plot.importance(): Invisibly, a "data.table" with top_n features sorted by importance. If plot = TRUE, the values are also plotted as a barplot.
- xgb.ggplot.importance(): A customizable "ggplot" object. E.g., to change the title, add + ggtitle("A GRAPH NAME").
Details
The graph represents each feature as a horizontal bar of length proportional to the importance of a feature. Features are sorted by decreasing importance. It works for both "gblinear" and "gbtree" models.
When rel_to_first = FALSE, the values are plotted as they appear in importance_matrix.
For a "gbtree" model, this means they are normalized to a total of 1
("what is the feature's importance contribution relative to the whole model?").
For linear models, rel_to_first = FALSE shows the actual values of the coefficients.
Setting rel_to_first = TRUE shows the picture from the perspective of
"what is the feature's importance contribution relative to the most important feature?"
The "ggplot" backend performs 1-D clustering of the importance values, with bar colors corresponding to different clusters having similar importance values.
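The effect of rel_to_first described above can be illustrated with plain arithmetic. The sketch below uses made-up gain values and assumes that "relative to the highest ranked feature" means dividing every value by the largest absolute value; the exact rescaling used internally is not stated on this page, so treat this as an illustration only.

```r
# Hypothetical gain values for four features (not from a real model).
# For a tree model the values in importance_matrix sum to 1.
gain <- c(f1 = 0.50, f2 = 0.25, f3 = 0.15, f4 = 0.10)

# rel_to_first = FALSE: values are plotted as they are.
as_is <- gain

# rel_to_first = TRUE (assumed rescaling): divide by the largest
# absolute value, so the top feature gets a bar of length 1.
# abs() is used because linear-model coefficients can be negative.
relative <- gain / max(abs(gain))

print(sum(as_is))
print(relative)
```

With these values, the top feature is drawn with length 1 and every other bar is read as a fraction of it, which is often easier to compare than small shares of the total.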
Examples
data(agaricus.train)
## Keep the number of threads to 2 for examples
nthread <- 2
data.table::setDTthreads(nthread)

## Fit a small classification model
model <- xgboost(
  agaricus.train$data, factor(agaricus.train$label),
  nrounds = 2,
  max_depth = 3,
  nthreads = nthread
)

## Compute the importance table once, then plot it with either backend
importance_matrix <- xgb.importance(model)

## Base R graphics
xgb.plot.importance(
  importance_matrix, rel_to_first = TRUE, xlab = "Relative importance"
)

## "ggplot" backend; the returned object can be customized further
gg <- xgb.ggplot.importance(
  importance_matrix, measure = "Frequency", rel_to_first = TRUE
)
gg
gg + ggplot2::ylab("Frequency")
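
The top_n and plot arguments can be combined to obtain the sorted table without drawing anything. The following is a minimal sketch that refits the same small model as above; it assumes the xgboost package is installed in a version that accepts the xgboost() call shown in these examples, and the choice of top_n = 5 is arbitrary.

```r
library(xgboost)
data(agaricus.train, package = "xgboost")

# Same small model as in the examples above
model <- xgboost(
  agaricus.train$data, factor(agaricus.train$label),
  nrounds = 2,
  max_depth = 3,
  nthreads = 2
)
importance_matrix <- xgb.importance(model)

# plot = FALSE skips the barplot; the function still returns
# (invisibly) a data.table limited to at most top_n features.
top_features <- xgb.plot.importance(
  importance_matrix, top_n = 5, plot = FALSE
)
print(top_features)
```

Because the value is returned invisibly, it has to be assigned or printed explicitly to be inspected.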