
Visualizes distributions related to the depth of tree leaves.

  • xgb.plot.deepness() uses base R graphics, while

  • xgb.ggplot.deepness() uses "ggplot2".

Usage

xgb.ggplot.deepness(
  model = NULL,
  which = c("2x1", "max.depth", "med.depth", "med.weight")
)

xgb.plot.deepness(
  model = NULL,
  which = c("2x1", "max.depth", "med.depth", "med.weight"),
  plot = TRUE,
  ...
)

Arguments

model

Either an xgb.Booster model, or the "data.table" returned by xgb.model.dt.tree().

which

Which distribution to plot (see details).

plot

Should the plot be shown? Default is TRUE.

...

Other parameters passed to graphics::barplot() or graphics::plot().
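Because model also accepts the "data.table" from xgb.model.dt.tree(), the trees can be parsed once and the table reused across several calls. This is a minimal sketch that uses the same xgboost() call style as the Examples. The small nrounds and max_depth values and the names trees and leaf_info are arbitrary choices for illustration. The last call passes pch and cex through ... to base graphics.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")

# Small model; these settings are arbitrary and only keep the example fast.
model <- xgboost(
  agaricus.train$data, factor(agaricus.train$label),
  nrounds = 5, max_depth = 3, nthreads = 1
)

# Parse the trees once into a data.table ...
trees <- xgb.model.dt.tree(model = model)

# ... and reuse it: plot = FALSE returns the leaf table without drawing.
leaf_info <- xgb.plot.deepness(trees, plot = FALSE)

# Extra arguments (pch, cex) are forwarded through `...` to base graphics.
xgb.plot.deepness(trees, which = "med.depth", pch = 16, cex = 1.5)
```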

Value

The return value of the two functions is as follows:

  • xgb.plot.deepness(): A "data.table" (invisibly). Each row corresponds to a terminal leaf in the model and contains that leaf's depth, cover, and weight (the leaf value used in calculating predictions). If plot = TRUE, a plot is also shown.

  • xgb.ggplot.deepness(): A list of two "ggplot" objects when which = "2x1", and a single "ggplot" object otherwise.
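The different return types matter when the plots are modified afterwards. This is a sketch that assumes ggplot2 is installed and follows the xgboost() call style from the Examples. The model settings and the added themes are arbitrary.

```r
library(xgboost)
library(ggplot2)

data(agaricus.train, package = "xgboost")
model <- xgboost(
  agaricus.train$data, factor(agaricus.train$label),
  nrounds = 5, max_depth = 3, nthreads = 1
)

# which = "2x1": a list of two ggplot objects, one per panel.
panels <- xgb.ggplot.deepness(model, which = "2x1")
first_panel <- panels[[1]] + theme_bw()

# Any other `which`: a single ggplot object that can be modified directly.
p <- xgb.ggplot.deepness(model, which = "med.weight") + theme_minimal()

# The base-graphics function returns its leaf table invisibly.
leaf_info <- xgb.plot.deepness(model, plot = FALSE)
print(head(leaf_info))
```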

Details

When which = "2x1", two distributions with respect to the leaf depth are plotted on top of each other:

  1. The distribution of the number of leaves in a tree model at a certain depth.

  2. The distribution of the average weighted number of observations ("cover") ending up in leaves at a certain depth.

These can help in choosing sensible ranges for the max_depth and min_child_weight parameters.
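The two "2x1" distributions are a count and a mean, both grouped by depth. The sketch below reproduces them in base R on a small, made-up leaf table. The column names Tree, Depth and Cover and all values are assumptions for illustration. This is not the package's internal code.

```r
# Hypothetical leaf table: one row per terminal leaf (made-up numbers).
leaves <- data.frame(
  Tree  = c(0, 0, 0, 1, 1, 1, 1),
  Depth = c(1, 2, 2, 1, 2, 3, 3),
  Cover = c(40, 25, 35, 30, 20, 28, 22)
)

# Distribution 1: number of leaves found at each depth.
n_by_depth <- table(leaves$Depth)

# Distribution 2: average cover of the leaves at each depth.
cover_by_depth <- tapply(leaves$Cover, leaves$Depth, mean)

print(n_by_depth)
print(cover_by_depth)
```

Here depth 2 holds the most leaves (3), while the two shallowest leaves carry the largest average cover (35).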

When which = "max.depth" or which = "med.depth", plots of either maximum or median depth per tree with respect to the tree number are created.
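The per-tree quantities behind "max.depth" and "med.depth" can be sketched as grouped summaries. The leaf table and its column names are hypothetical, made-up values used only to show the aggregation.

```r
# Hypothetical leaf table: the tree index and depth of every leaf.
leaves <- data.frame(
  Tree  = c(0, 0, 0, 1, 1, 1, 1),
  Depth = c(1, 2, 2, 1, 2, 3, 3)
)

# "max.depth": deepest leaf in each tree.
max_depth_per_tree <- tapply(leaves$Depth, leaves$Tree, max)

# "med.depth": median leaf depth in each tree.
med_depth_per_tree <- tapply(leaves$Depth, leaves$Tree, median)

print(max_depth_per_tree)
print(med_depth_per_tree)
```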

Finally, which = "med.weight" shows how the median absolute leaf weight of each tree changes over the boosting iterations.
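The "med.weight" quantity takes absolute values before the median, so positive and negative leaf weights do not cancel out. The sketch below uses a hypothetical leaf table with made-up weights.

```r
# Hypothetical leaf table: the tree index and (signed) weight of every leaf.
leaves <- data.frame(
  Tree   = c(0, 0, 0, 1, 1, 1, 1),
  Weight = c(0.5, -0.3, 0.2, 0.1, -0.4, 0.05, -0.15)
)

# Median of the absolute leaf weights within each tree.
med_abs_weight <- tapply(abs(leaves$Weight), leaves$Tree, median)

print(med_abs_weight)
```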

These functions have been inspired by the blog post https://github.com/aysent/random-forest-leaf-visualization.

Examples


library(xgboost)
data(agaricus.train, package = "xgboost")
## Keep the number of threads to 2 for examples
nthread <- 2
data.table::setDTthreads(nthread)

## Change max_depth to a higher number to get a more significant result
model <- xgboost(
  agaricus.train$data, factor(agaricus.train$label),
  nrounds = 50,
  max_depth = 6,
  nthreads = nthread,
  subsample = 0.5,
  min_child_weight = 2
)

xgb.plot.deepness(model)
xgb.ggplot.deepness(model)

xgb.plot.deepness(
  model, which = "max.depth", pch = 16, col = rgb(0, 0, 1, 0.3), cex = 2
)

xgb.plot.deepness(
  model, which = "med.weight", pch = 16, col = rgb(0, 0, 1, 0.3), cex = 2
)