Helper function to supply data in batches of a data iterator when constructing a DMatrix from external memory through xgb.ExtMemDMatrix() or through xgb.QuantileDMatrix.from_iterator().
This function is only meant to be called inside a callback function (which is passed as an argument to xgb.DataIter() to construct a data iterator) when constructing a DMatrix through external memory. Otherwise, one should call xgb.DMatrix() or xgb.QuantileDMatrix().
The object that results from calling this function directly is not like an xgb.DMatrix, i.e. it cannot be used to train a model, nor to get predictions. Its only possible usage is to supply data to an iterator, from which a DMatrix is then constructed.
For more information and for example usage, see the documentation for xgb.ExtMemDMatrix().
Usage
xgb.DataBatch(
  data,
  label = NULL,
  weight = NULL,
  base_margin = NULL,
  feature_names = colnames(data),
  feature_types = NULL,
  group = NULL,
  qid = NULL,
  label_lower_bound = NULL,
  label_upper_bound = NULL,
  feature_weights = NULL
)
Arguments
- data
Data belonging to this batch.
Note that not all of the input types supported by xgb.DMatrix() are possible to pass here. Supported types are:
  - matrix, with types numeric, integer, and logical. Note that for types integer and logical, missing values might not be automatically recognized as such - see the documentation for parameter missing in xgb.ExtMemDMatrix() for details on this.
  - data.frame, with the same types as supported by xgb.DMatrix() and the same conversions applied to it. See the documentation for parameter data in xgb.DMatrix() for details on it.
  - CSR matrices, as class dgRMatrix from package "Matrix".
- label
Label of the training data. For classification problems, should be passed encoded as integers with numeration starting at zero.
- weight
Weight for each instance.
Note that, for ranking tasks, weights are per-group: one weight is assigned to each group (not to each data point). This is because only the relative ordering of data points within each group matters, so it doesn't make sense to assign weights to individual data points.
- base_margin
Base margin used for boosting from an existing model.
In the case of multi-output models, one can also pass multi-dimensional base_margin.
- feature_names
Set names for features. Overrides column names in data frame and matrix.
Note: columns are not referenced by name when calling predict, so the column order there must be the same as in the DMatrix construction, regardless of the column names.
- feature_types
Set types for features.
If data is a data.frame and feature_types is not supplied, feature types will be deduced automatically from the column types.
Otherwise, one can pass a character vector with the same length as the number of columns in data, with the following possible values:
  - "c", which represents categorical columns.
  - "q", which represents numeric columns.
  - "int", which represents integer columns.
  - "i", which represents logical (boolean) columns.
Note that, while categorical types are treated differently from the rest for model fitting purposes, the other types do not influence the generated model, but have effects in other functionalities such as feature importances.
Important: Categorical features, if specified manually through feature_types, must be encoded as integers with numeration starting at zero, and the same encoding needs to be applied when passing data to predict(). Even if passing factor types, the encoding will not be saved, so make sure that factor columns passed to predict have the same levels.
- group
Group sizes for all ranking groups.
- qid
Query ID for data samples, used for ranking.
- label_lower_bound
Lower bound for survival training.
- label_upper_bound
Upper bound for survival training.
- feature_weights
Set feature weights for column sampling.
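The encoding requirements described for label and feature_types can be illustrated with a small base R sketch. The data and the variable names (age, n_visits, color, species) are hypothetical and only serve to show one way of preparing zero-based labels, zero-based categorical codes, and a matching feature_types vector before a batch is created.

```r
# Hypothetical raw columns for a single batch of 4 rows
age <- c(23, 35, 41, 29)
n_visits <- c(1L, 4L, 2L, 7L)
color <- factor(c("red", "blue", "red", "green"),
                levels = c("blue", "green", "red"))
species <- c("cat", "dog", "cat", "bird")

# Classification labels: integers with numeration starting at zero.
# Factor levels are sorted alphabetically here: bird = 0, cat = 1, dog = 2.
label <- as.integer(factor(species)) - 1L

# Categorical feature encoded manually as zero-based integer codes
# (blue = 0, green = 1, red = 2). The same level order has to be reused
# for every batch and again for the data passed to predict().
color_code <- as.integer(color) - 1L

# Numeric matrix for the batch, with one feature type per column:
# "q" = numeric, "int" = integer, "c" = categorical
x <- cbind(age = age, n_visits = n_visits, color = color_code)
feature_types <- c("q", "int", "c")
```

Fixing the factor levels explicitly, rather than letting each batch derive its own, is one way to keep the integer codes consistent across batches.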
Value
An object of class xgb.DataBatch, which is just a list containing the data and parameters passed here. It does not inherit from xgb.DMatrix.
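As a minimal sketch of the intended usage, the code below calls xgb.DataBatch() from inside an iterator callback. It assumes a version of the xgboost R package that provides these functions, and that xgb.DataIter() accepts the arguments env, f_next and f_reset and xgb.ExtMemDMatrix() accepts cache_prefix and nthread. The names iterator_env, iterator_next and iterator_reset are hypothetical, and the mtcars data is split into two in-memory batches purely for illustration.

```r
library(xgboost)

data(mtcars)

# Hypothetical iterator state: the current batch index plus the full data.
# In a real external-memory setting, each batch would be loaded from disk.
iterator_env <- as.environment(
  list(iter = 0, x = mtcars[, -1], y = mtcars[, 1])
)

# Callback that returns the next batch, or NULL once all batches were supplied
iterator_next <- function(iterator_env) {
  curr_iter <- iterator_env[["iter"]]
  if (curr_iter >= 2) {
    return(NULL)
  }
  rows <- if (curr_iter == 0) 1:16 else 17:32
  iterator_env[["iter"]] <- curr_iter + 1
  # xgb.DataBatch() is called once per batch with that batch's data and label
  xgb.DataBatch(
    data = iterator_env[["x"]][rows, ],
    label = iterator_env[["y"]][rows]
  )
}

# Callback that moves the iterator back to its first batch
iterator_reset <- function(iterator_env) {
  iterator_env[["iter"]] <- 0
}

data_iterator <- xgb.DataIter(
  env = iterator_env,
  f_next = iterator_next,
  f_reset = iterator_reset
)

# The DMatrix is constructed from the batches supplied by the iterator
dm <- xgb.ExtMemDMatrix(data_iterator, cache_prefix = tempdir(), nthread = 1)
```

Here dm is the object to train on. The individual xgb.DataBatch results are only intermediate containers consumed by the iterator.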