
Get the quantile cuts (a.k.a. borders) from an xgb.DMatrix that has been quantized for the histogram method (tree_method = "hist").

These cuts are used to assign observations to bins: they are ordered boundaries, and an observation x is assigned to the bin for which border_low < x < border_high. As such, the first and last borders will lie outside of the range of the data, so that all of the observations are included in some bin.

If a given column has 'n' bins, then there will be 'n+1' cuts / borders for that column, which will be output in sorted order from lowest to highest.
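As a rough illustration of how 'n+1' sorted borders define 'n' bins, the sketch below uses only base R. The border values are made up for illustration and do not come from any real DMatrix. It also assumes the convention of findInterval(), which treats each bin as closed on the left; the exact handling of values that fall on a border inside XGBoost may differ.

```r
# Hypothetical borders for one column: 4 sorted cuts define 3 bins.
borders <- c(0, 10, 20, 30)
n_bins <- length(borders) - 1

# Observations that all lie strictly inside the outermost borders.
x <- c(5, 15, 25)

# findInterval() returns, for each value, the index of the border
# immediately at or below it, which is the 1-based bin number here.
bins <- findInterval(x, borders)

print(n_bins)
print(bins)
```

Because the first and last borders lie outside the data range, every observation receives a bin number between 1 and n_bins.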

Different columns can have different numbers of bins according to their range.

Usage

xgb.get.DMatrix.qcut(dmat, output = c("list", "arrays"))

Arguments

dmat

An xgb.DMatrix object, as returned by xgb.DMatrix().

output

Output format for the quantile cuts. Possible options are:

  • "list" will return the output as a list with one entry per column, where each entry is a numeric vector with the cuts for that column. The list will be named if dmat has column names assigned to it.

  • "arrays" will return a list with entries indptr (base-0 indexing) and data. Here, the cuts for column 'i' are obtained by slicing 'data' from entries indptr[i]+1 to indptr[i+1].
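The slicing rule for the "arrays" format can be sketched in base R as follows. The object cuts below is a hand-written stand-in with made-up values, not real output from xgb.get.DMatrix.qcut(), and get_col_cuts() is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical "arrays"-style result for two columns (values are made up):
# column 1 has 3 cuts, column 2 has 4 cuts.
cuts <- list(
  indptr = c(0, 3, 7),                      # base-0 offsets into 'data'
  data = c(0.5, 2.0, 4.5, 10, 20, 30, 40)   # all cuts, concatenated
)

# Hypothetical helper: cuts for column i are the entries of 'data'
# from position indptr[i] + 1 to position indptr[i + 1].
get_col_cuts <- function(cuts, i) {
  n_cuts <- cuts$indptr[i + 1] - cuts$indptr[i]
  cuts$data[cuts$indptr[i] + seq_len(n_cuts)]
}

# Rebuild a per-column list, similar in shape to the "list" format.
n_cols <- length(cuts$indptr) - 1
as_list <- lapply(seq_len(n_cols), function(i) get_col_cuts(cuts, i))
print(as_list)
```

Using seq_len() instead of start:end avoids a reversed index range in the edge case where a column has no cuts at all.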

Value

The quantile cuts, in the format specified by parameter output.

Examples

data(mtcars)

y <- mtcars$mpg
x <- as.matrix(mtcars[, -1])
dm <- xgb.DMatrix(x, label = y, nthread = 1)

# DMatrix is not quantized right away, but will be once a hist model is generated
model <- xgb.train(
  data = dm,
  params = xgb.params(tree_method = "hist", max_bin = 8, nthread = 1),
  nrounds = 3
)

# Now the quantile cuts can be retrieved
xgb.get.DMatrix.qcut(dm)