Get the quantile cuts (a.k.a. borders) from an xgb.DMatrix
that has been quantized for the histogram method (tree_method = "hist").
These cuts are used to assign observations to bins, i.e. they are ordered
boundaries that determine the assignment condition border_low < x < border_high.
As such, the first and last borders will lie outside of the range of the data, so that
every observation falls into some bin.
If a given column has 'n' bins, then there will be 'n+1' cuts / borders for that column, which will be output in sorted order from lowest to highest.
Different columns can have different numbers of bins according to their range.
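As a rough illustration of how sorted borders map values to bins, the sketch below uses base R only. The `cuts` and `x` values are made up for the example and do not come from a real DMatrix. `findInterval()` treats a value that sits exactly on a border as belonging to the bin on its right; that tie-breaking rule is an assumption of this sketch, not a statement about XGBoost's internals.

```r
# Hypothetical borders for one column: 5 sorted cuts define 4 bins.
# The first and last cut lie outside the range of the data.
cuts <- c(-1, 2.5, 5, 7.5, 11)
x <- c(0.3, 2.5, 4.9, 9.2)

# Number of bins is one less than the number of cuts.
n_bins <- length(cuts) - 1

# findInterval() returns i such that cuts[i] <= x < cuts[i + 1].
bin <- findInterval(x, cuts)

print(n_bins)
print(bin)
```

Here every value of `x` receives a bin index between 1 and `n_bins`, because the outermost borders enclose the data.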
Usage
xgb.get.DMatrix.qcut(dmat, output = c("list", "arrays"))
Arguments
- dmat
An xgb.DMatrix object, as returned by xgb.DMatrix().
- output
Output format for the quantile cuts. Possible options are:
  - "list" will return the output as a list with one entry per column, where each column will have a numeric vector with the cuts. The list will be named if dmat has column names assigned to it.
  - "arrays" will return a list with entries indptr (base-0 indexing) and data. Here, the cuts for column 'i' are obtained by slicing 'data' from entries indptr[i]+1 to indptr[i+1].
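The slicing rule for the "arrays" format can be sketched in base R as follows. The `qcut` object is a hand-built stand-in with made-up values, and `cuts_for_column()` is a hypothetical helper, not part of the package; the element types of the real output may differ.

```r
# Hypothetical "arrays" output for two columns:
# column 1 has 3 cuts, column 2 has 4 cuts.
qcut <- list(
  indptr = c(0, 3, 7),                       # base-0 offsets into data
  data = c(0.5, 1.5, 2.5, 10, 20, 30, 40)    # all cuts, concatenated
)

# Hypothetical helper: cuts for column i (i is a 1-based column index).
cuts_for_column <- function(qcut, i) {
  start <- qcut$indptr[i] + 1
  end <- qcut$indptr[i + 1]
  qcut$data[start:end]
}

# Rebuild a per-column list, similar in shape to output = "list".
n_cols <- length(qcut$indptr) - 1
as_list <- lapply(seq_len(n_cols), function(i) cuts_for_column(qcut, i))
print(as_list)
```

Because `indptr` uses base-0 offsets while R vectors are 1-based, the `+ 1` is applied only to the start of each slice.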
Examples
data(mtcars)
y <- mtcars$mpg
x <- as.matrix(mtcars[, -1])
dm <- xgb.DMatrix(x, label = y, nthread = 1)
# DMatrix is not quantized right away, but will be once a hist model is generated
model <- xgb.train(
  data = dm,
  params = xgb.params(tree_method = "hist", max_bin = 8, nthread = 1),
  nrounds = 3
)
# Now we can get the quantile cuts
xgb.get.DMatrix.qcut(dm)