Quantile DMatrix and external memory DMatrix can be created from batches of data.
int | XGDMatrixCreateFromDataIter (DataIterHandle data_handle, XGBCallbackDataIterNext *callback, const char *cache_info, DMatrixHandle *out) |
| Create a DMatrix from a data iterator. |
int | XGProxyDMatrixCreate (DMatrixHandle *out) |
| Create a DMatrix proxy for setting data; it can be freed by XGDMatrixFree. |
int | XGDMatrixCreateFromCallback (DataIterHandle iter, DMatrixHandle proxy, DataIterResetCallback *reset, XGDMatrixCallbackNext *next, char const *config, DMatrixHandle *out) |
| Create an external memory DMatrix with data iterator. |
int | XGQuantileDMatrixCreateFromCallback (DataIterHandle iter, DMatrixHandle proxy, DataIterHandle ref, DataIterResetCallback *reset, XGDMatrixCallbackNext *next, char const *config, DMatrixHandle *out) |
| Create a Quantile DMatrix with data iterator. |
int | XGDeviceQuantileDMatrixCreateFromCallback (DataIterHandle iter, DMatrixHandle proxy, DataIterResetCallback *reset, XGDMatrixCallbackNext *next, float missing, int nthread, int max_bin, DMatrixHandle *out) |
| Create a Device Quantile DMatrix with data iterator. |
int | XGProxyDMatrixSetDataCudaArrayInterface (DMatrixHandle handle, const char *c_interface_str) |
| Set data on a DMatrix proxy. |
int | XGProxyDMatrixSetDataCudaColumnar (DMatrixHandle handle, const char *c_interface_str) |
| Set data on a DMatrix proxy. |
int | XGProxyDMatrixSetDataDense (DMatrixHandle handle, char const *c_interface_str) |
| Set data on a DMatrix proxy. |
int | XGProxyDMatrixSetDataCSR (DMatrixHandle handle, char const *indptr, char const *indices, char const *data, bst_ulong ncol) |
| Set data on a DMatrix proxy. |
There are 2 sets of data callbacks for DMatrix. The first one is currently used exclusively by the JVM packages. It uses XGBoostBatchCSR to accept batches of CSR-formatted input and concatenates them into 1 final big CSR. The related functions are:
- XGBCallbackSetData
- XGBCallbackDataIterNext
- XGDMatrixCreateFromDataIter
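The concatenation of CSR batches described above can be pictured with a small, self-contained sketch. It is not XGBoost's implementation: `CsrBuffer`, `csr_append` and the fixed capacities are hypothetical names chosen for illustration. The point it shows is that the row pointers of each incoming batch must be rebased onto the entries already stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ROWS 16
#define MAX_VALS 32

/* Hypothetical fixed-capacity CSR container; not an XGBoost type. */
typedef struct {
  size_t n_rows;                /* rows stored so far */
  size_t n_vals;                /* non-zero entries stored so far */
  int64_t offset[MAX_ROWS + 1]; /* row pointers, n_rows + 1 entries in use */
  int index[MAX_VALS];          /* column index of each entry */
  float value[MAX_VALS];        /* value of each entry */
} CsrBuffer;

/* Append one CSR batch to dst, which must start out zero-initialized.
 * Returns 0 on success, -1 if the batch does not fit. */
int csr_append(CsrBuffer *dst, size_t rows, const int64_t *offset,
               const int *index, const float *value) {
  assert(dst != NULL && offset != NULL && index != NULL && value != NULL);
  size_t nnz = (size_t)(offset[rows] - offset[0]);
  if (dst->n_rows + rows > MAX_ROWS || dst->n_vals + nnz > MAX_VALS) {
    return -1;
  }
  /* Rebase each row pointer onto the entries already stored in dst. */
  for (size_t r = 0; r < rows; ++r) {
    int64_t row_len = offset[r + 1] - offset[r];
    dst->offset[dst->n_rows + r + 1] = dst->offset[dst->n_rows + r] + row_len;
  }
  memcpy(dst->index + dst->n_vals, index + offset[0], nnz * sizeof(int));
  memcpy(dst->value + dst->n_vals, value + offset[0], nnz * sizeof(float));
  dst->n_rows += rows;
  dst->n_vals += nnz;
  return 0;
}
```

Appending a 2-row batch followed by a 1-row batch to a zeroed `CsrBuffer` yields a single 3-row CSR whose last row pointer equals the total number of stored entries.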
The other set is used by external data iterators. It accepts foreign data iterators as callbacks. There are 2 different scenarios where users might want to pass in callbacks instead of raw data. The first is the Quantile DMatrix used by hist and GPU Hist. In this case, the data is first compressed by quantile sketching and then merged. This is particularly useful in distributed settings because it eliminates 2 copies of the data: 1 made by a concat in the external library to turn the data into a single blob for normal DMatrix initialization, and another made by the internal CSR copy of DMatrix. The second use case is external memory support, where users can pass a custom data iterator into XGBoost to load data in batches. There are short notes on each use case in the respective DMatrix factory functions.
Related functions are:
Factory functions
- XGDMatrixCreateFromCallback for external memory
- XGQuantileDMatrixCreateFromCallback for Quantile DMatrix
Proxy that callers can use to pass data to XGBoost
- XGProxyDMatrixCreate
- XGDMatrixCallbackNext
- DataIterResetCallback
- XGProxyDMatrixSetDataCudaArrayInterface
- XGProxyDMatrixSetDataCudaColumnar
- XGProxyDMatrixSetDataDense
- XGProxyDMatrixSetDataCSR
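To give a rough intuition for the quantile compression mentioned above, the following sketch computes exact quantile cut points for one feature and maps values to bin indices. It is a simplified illustration only: XGBoost uses an approximate sketch that is merged across batches, and the helper names `make_cuts` and `bin_index` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison of two floats for qsort. */
static int cmp_float(const void *a, const void *b) {
  float x = *(const float *)a;
  float y = *(const float *)b;
  return (x > y) - (x < y);
}

/* Compute n_bins - 1 interior cut points from the exact quantiles of values.
 * Sorts values in place; cuts must have room for n_bins - 1 floats. */
void make_cuts(float *values, size_t n, size_t n_bins, float *cuts) {
  assert(values != NULL && cuts != NULL && n > 0 && n_bins > 1);
  qsort(values, n, sizeof(float), cmp_float);
  for (size_t b = 1; b < n_bins; ++b) {
    cuts[b - 1] = values[(b * n) / n_bins];
  }
}

/* Return the bin of v, defined here as the number of cut points <= v. */
size_t bin_index(const float *cuts, size_t n_cuts, float v) {
  size_t lo = 0;
  size_t hi = n_cuts;
  while (lo < hi) { /* binary search over the sorted cut points */
    size_t mid = lo + (hi - lo) / 2;
    if (cuts[mid] <= v) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}
```

Once every value is replaced by a small bin index, the original floating point batch no longer needs to be kept, which is the property that lets a batch-wise construction avoid holding a full copy of the raw data.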
◆ DataHolderHandle
Handle to an internal data holder.
◆ DataIterHandle
Handle to an external data iterator.
◆ DataIterResetCallback
Callback function prototype for resetting external iterator.
◆ XGBCallbackDataIterNext
The data reading callback function. The iterator can return a subset of the data as a batch.
If there is data, the function will call set_function to set the data.
- Parameters
-
data_handle | The handle to the callback. |
set_function | The function used to set the batch returned by the iterator. |
set_function_handle | The handle to be passed to the set function. |
- Returns
- 0 if the end has been reached and no batch is returned.
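The calling pattern of this callback can be sketched with local stand-ins. In the block below, `BatchStub`, `SetDataFn`, `DataIterNextFn`, `ArrayIter` and `Holder` are hypothetical simplifications (the real XGBoostBatchCSR has its own layout defined in the XGBoost headers), and returning 1 after a batch has been set is an assumption; the text above only states that 0 marks the end.

```c
#include <assert.h>
#include <stddef.h>

typedef void *DataIterHandle;
typedef void *DataHolderHandle;

/* Hypothetical, simplified stand-in for XGBoostBatchCSR. */
typedef struct {
  size_t size;        /* number of values in this batch */
  const float *value; /* pointer to the values */
} BatchStub;

/* Local stand-ins mirroring the two callback roles documented here. */
typedef int SetDataFn(DataHolderHandle handle, BatchStub batch);
typedef int DataIterNextFn(DataIterHandle data_handle, SetDataFn *set_function,
                           DataHolderHandle set_function_handle);

/* User-side iterator over a flat array, handed out in fixed-size batches. */
typedef struct {
  const float *data;
  size_t n;
  size_t batch;
  size_t pos;
} ArrayIter;

int array_iter_next(DataIterHandle data_handle, SetDataFn *set_function,
                    DataHolderHandle set_function_handle) {
  ArrayIter *it = (ArrayIter *)data_handle;
  assert(it != NULL && set_function != NULL);
  if (it->pos >= it->n) {
    return 0; /* end reached, no batch is returned */
  }
  size_t left = it->n - it->pos;
  BatchStub b;
  b.size = left < it->batch ? left : it->batch;
  b.value = it->data + it->pos;
  it->pos += b.size;
  set_function(set_function_handle, b); /* hand the batch to the consumer */
  return 1; /* assumed: non-zero means a batch was set */
}

/* Consumer-side holder that only accumulates simple statistics. */
typedef struct {
  size_t rows;
  float sum;
} Holder;

int holder_set(DataHolderHandle handle, BatchStub batch) {
  Holder *h = (Holder *)handle;
  for (size_t i = 0; i < batch.size; ++i) {
    h->sum += batch.value[i];
  }
  h->rows += batch.size;
  return 0;
}

/* Drive the iterator until it reports the end; returns the batch count. */
size_t drain(DataIterNextFn *next, DataIterHandle iter, Holder *h) {
  size_t batches = 0;
  while (next(iter, holder_set, h) != 0) {
    ++batches;
  }
  return batches;
}
```

Here `drain` plays the role of the library side: it keeps calling the user's next callback, and each call pushes one batch into the holder through the set function.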
◆ XGBCallbackSetData
Callback to set the data on the handle.
- Parameters
-
handle | The handle to the callback. |
batch | The data content to be set. |
◆ XGDMatrixCallbackNext
Callback function prototype for getting next batch of data.
- Parameters
-
iter | A handle to the user-defined iterator. |
- Returns
- 0 on success, -1 on failure.
◆ XGDeviceQuantileDMatrixCreateFromCallback()
◆ XGDMatrixCreateFromCallback()
Create an external memory DMatrix with data iterator.
A short note on how to use the second set of callbacks for external memory support:
- Step 0: Define a data iterator with 2 methods, reset and next.
- Step 1: Create a DMatrix proxy with XGProxyDMatrixCreate and hold the handle.
- Step 2: Pass the iterator handle, the proxy handle and the 2 methods into XGDMatrixCreateFromCallback, along with other parameters encoded as a JSON object.
- Step 3: Call the appropriate data setters in the next function.
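The steps above can be sketched without linking against XGBoost by replacing the proxy with a local stub. `ProxyStub`, `BatchIter`, `batch_iter_reset`, `batch_iter_next` and `count_rows` are hypothetical names; the two handle typedefs are local stand-ins so the block compiles on its own. The sketch assumes that next returns a non-zero value after setting a batch and 0 once the data is exhausted, which is how the external_memory.c example is understood to behave; the header remains the authoritative contract.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the opaque handle types. */
typedef void *DataIterHandle;
typedef void *DMatrixHandle;

/* Hypothetical proxy stub that records the most recently set batch. */
typedef struct {
  const float *data;
  size_t rows;
  size_t cols;
} ProxyStub;

/* Step 0: a user-defined iterator over a dense row-major matrix. */
typedef struct {
  const float *data;
  size_t n_rows;
  size_t n_cols;
  size_t rows_per_batch;
  size_t cur_row;
  DMatrixHandle proxy; /* Step 1: the proxy handle is held by the iterator */
} BatchIter;

/* reset: rewind to the first batch. */
void batch_iter_reset(DataIterHandle handle) {
  ((BatchIter *)handle)->cur_row = 0;
}

/* next: set one batch on the proxy, or report that no data is left. */
int batch_iter_next(DataIterHandle handle) {
  BatchIter *it = (BatchIter *)handle;
  assert(it != NULL && it->proxy != NULL);
  if (it->cur_row >= it->n_rows) {
    return 0; /* assumed end-of-data signal */
  }
  size_t left = it->n_rows - it->cur_row;
  size_t rows = left < it->rows_per_batch ? left : it->rows_per_batch;
  /* Step 3: real code would call a setter such as
   * XGProxyDMatrixSetDataDense on the proxy here. */
  ProxyStub *p = (ProxyStub *)it->proxy;
  p->data = it->data + it->cur_row * it->n_cols;
  p->rows = rows;
  p->cols = it->n_cols;
  it->cur_row += rows;
  return 1;
}

/* Stand-in for the consumer that Step 2 hands the callbacks to. */
size_t count_rows(DataIterHandle iter, void (*reset)(DataIterHandle),
                  int (*next)(DataIterHandle)) {
  size_t total = 0;
  reset(iter);
  while (next(iter) != 0) {
    total += ((ProxyStub *)((BatchIter *)iter)->proxy)->rows;
  }
  return total;
}
```

Because the consumer may traverse the data more than once, reset has to restore the iterator to its initial state; calling `count_rows` twice on the same iterator should return the same total.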
- Parameters
-
iter | A handle to the external data iterator. |
proxy | A DMatrix proxy handle created by XGProxyDMatrixCreate. |
reset | Callback function resetting the iterator state. |
next | Callback function yielding the next batch of data. |
config | JSON encoded parameters for DMatrix construction. Accepted fields are:
- missing: The value used to represent missing values.
- cache_prefix: The path of the cache file; the caller must create all the directories in this path.
- nthread (optional): Number of threads used for initializing the DMatrix.
|
out | [out] The created external memory DMatrix. |
- Returns
- 0 on success, -1 on failure.
- Examples
- external_memory.c.
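As a minimal sketch of how the config argument could be assembled, the helper below formats the three documented fields into a JSON object. `make_extmem_config` is a hypothetical helper, and the sentinel value and cache path used when calling it are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a JSON config with the documented fields into out.
 * Returns 0 on success, -1 if the buffer is too small or the prefix
 * contains characters this simple formatter does not escape. */
int make_extmem_config(char *out, size_t len, double missing,
                       const char *cache_prefix, int nthread) {
  assert(out != NULL && cache_prefix != NULL);
  if (strchr(cache_prefix, '"') != NULL || strchr(cache_prefix, '\\') != NULL) {
    return -1; /* no JSON string escaping is implemented in this sketch */
  }
  int n = snprintf(out, len,
                   "{\"missing\": %g, \"cache_prefix\": \"%s\", \"nthread\": %d}",
                   missing, cache_prefix, nthread);
  return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would be passed as config to XGDMatrixCreateFromCallback. As noted in the parameter description, the directories named in cache_prefix must already exist.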
◆ XGDMatrixCreateFromDataIter()
Create a DMatrix from a data iterator.
- Parameters
-
data_handle | The handle to the data. |
callback | The callback to get the data. |
cache_info | Additional information about cache file, can be null. |
out | The created DMatrix |
- Returns
- 0 on success, -1 on failure.
◆ XGProxyDMatrixCreate()
Create a DMatrix proxy for setting data; it can be freed by XGDMatrixFree.
This belongs to the second set of callback functions, used for constructing a Quantile DMatrix or an external memory DMatrix with a custom iterator.
- Parameters
-
out | The created DMatrix proxy. |
- Returns
- 0 on success, -1 on failure.
- Examples
- external_memory.c.
◆ XGProxyDMatrixSetDataCSR()
int XGProxyDMatrixSetDataCSR (DMatrixHandle handle, char const *indptr, char const *indices, char const *data, bst_ulong ncol)
Set data on a DMatrix proxy.
- Parameters
-
handle | A DMatrix proxy created by XGProxyDMatrixCreate |
indptr | JSON encoded array_interface to the row pointers in CSR. |
indices | JSON encoded array_interface to the column indices in CSR. |
data | JSON encoded array_interface to the values in CSR. |
ncol | The number of columns of the input CSR matrix. |
- Returns
- 0 on success, -1 on failure.
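A minimal sketch of producing the three JSON strings for a CSR batch follows. It assumes the NumPy-style array interface layout (`data` as a pointer and read-only flag, `shape`, `typestr`, `version`) and a little-endian host for the `<u8`, `<u4` and `<f4` type strings. `make_array_interface_1d`, `describe_csr` and `CsrInterfaces` are hypothetical helpers, and the integer widths chosen for indptr and indices are assumptions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 1-D array interface for n elements starting at ptr.
 * Returns 0 on success, -1 if out is too small. */
int make_array_interface_1d(const void *ptr, size_t n, const char *typestr,
                            char *out, size_t len) {
  assert(ptr != NULL && typestr != NULL && out != NULL);
  int w = snprintf(out, len,
                   "{\"data\": [%" PRIuPTR ", true], \"shape\": [%zu], "
                   "\"typestr\": \"%s\", \"version\": 3}",
                   (uintptr_t)ptr, n, typestr);
  return (w < 0 || (size_t)w >= len) ? -1 : 0;
}

/* The three strings a CSR setter expects, one per underlying array. */
typedef struct {
  char indptr[128];
  char indices[128];
  char data[128];
} CsrInterfaces;

/* Describe a CSR batch with n_rows rows; indptr has n_rows + 1 entries. */
int describe_csr(const uint64_t *indptr, const uint32_t *indices,
                 const float *values, size_t n_rows, CsrInterfaces *out) {
  assert(indptr != NULL && out != NULL);
  memset(out, 0, sizeof *out);
  size_t nnz = (size_t)indptr[n_rows]; /* last row pointer = entry count */
  if (make_array_interface_1d(indptr, n_rows + 1, "<u8", out->indptr,
                              sizeof out->indptr) != 0) {
    return -1;
  }
  if (make_array_interface_1d(indices, nnz, "<u4", out->indices,
                              sizeof out->indices) != 0) {
    return -1;
  }
  return make_array_interface_1d(values, nnz, "<f4", out->data,
                                 sizeof out->data);
}
```

Inside a next callback, the three strings would be passed as indptr, indices and data to XGProxyDMatrixSetDataCSR together with the column count. The underlying arrays must stay alive while the batch is being consumed, since only their addresses are encoded.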
◆ XGProxyDMatrixSetDataCudaArrayInterface()
int XGProxyDMatrixSetDataCudaArrayInterface (DMatrixHandle handle, const char *c_interface_str)
Set data on a DMatrix proxy.
- Parameters
-
handle | A DMatrix proxy created by XGProxyDMatrixCreate |
c_interface_str | Null-terminated JSON document string representing the CUDA array interface. |
- Returns
- 0 on success, -1 on failure.
◆ XGProxyDMatrixSetDataCudaColumnar()
int XGProxyDMatrixSetDataCudaColumnar (DMatrixHandle handle, const char *c_interface_str)
Set data on a DMatrix proxy.
- Parameters
-
handle | A DMatrix proxy created by XGProxyDMatrixCreate |
c_interface_str | Null-terminated JSON document string representing the CUDA array interface, with an array of columns. |
- Returns
- 0 on success, -1 on failure.
◆ XGProxyDMatrixSetDataDense()
int XGProxyDMatrixSetDataDense (DMatrixHandle handle, char const *c_interface_str)
Set data on a DMatrix proxy.
- Parameters
-
handle | A DMatrix proxy created by XGProxyDMatrixCreate |
c_interface_str | Null-terminated JSON document string representing the array interface. |
- Returns
- 0 on success, -1 on failure.
- Examples
- external_memory.c.
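For a dense row-major float batch, the array interface string could be built as in the sketch below. It assumes the NumPy-style array interface layout and a little-endian host (hence `<f4`); `make_dense_interface` is a hypothetical helper, not part of the XGBoost API.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Describe a rows x cols row-major float matrix as an array interface.
 * Returns 0 on success, -1 if out is too small. */
int make_dense_interface(const float *data, size_t rows, size_t cols,
                         char *out, size_t len) {
  assert(data != NULL && out != NULL && len > 0);
  memset(out, 0, len);
  int w = snprintf(out, len,
                   "{\"data\": [%" PRIuPTR ", true], \"shape\": [%zu, %zu], "
                   "\"typestr\": \"<f4\", \"version\": 3}",
                   (uintptr_t)data, rows, cols);
  return (w < 0 || (size_t)w >= len) ? -1 : 0;
}
```

The string would be passed as c_interface_str to XGProxyDMatrixSetDataDense from within the next callback. Only the address of the batch is encoded, so the buffer must remain valid until the batch has been consumed.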
◆ XGQuantileDMatrixCreateFromCallback()
Create a Quantile DMatrix with data iterator.
A short note on how to use the second set of callbacks for the (GPU)Hist tree method:
- Step 0: Define a data iterator with 2 methods, reset and next.
- Step 1: Create a DMatrix proxy with XGProxyDMatrixCreate and hold the handle.
- Step 2: Pass the iterator handle, the proxy handle and the 2 methods into XGQuantileDMatrixCreateFromCallback.
- Step 3: Call the appropriate data setters in the next function.
See test_iterative_dmatrix.cu or Python interface for examples.
- Parameters
-
iter | A handle to external data iterator. |
proxy | A DMatrix proxy handle created by XGProxyDMatrixCreate. |
ref | Reference DMatrix for providing quantile information. |
reset | Callback function resetting the iterator state. |
next | Callback function yielding the next batch of data. |
config | JSON encoded parameters for DMatrix construction. Accepted fields are:
- missing: The value used to represent missing values.
- nthread (optional): Number of threads used for initializing the DMatrix.
- max_bin (optional): Maximum number of bins for building histograms.
|
out | The created Quantile DMatrix. |
- Returns
- 0 on success, -1 on failure.