XGBoost GPU Support

This page contains information about the GPU algorithms supported in XGBoost. For instructions on installing GPU support, check out the Installation Guide.

Note

CUDA 8.0, Compute Capability 3.5 required

The GPU algorithms in XGBoost require a graphics card with compute capability 3.5 or higher and CUDA toolkit 8.0 or later. (See this list to look up the compute capability of your GPU.)
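To illustrate the requirement above, the check below compares a card's compute capability and CUDA toolkit version against the stated minimums. This is a minimal sketch. The function name `meets_gpu_requirements` is hypothetical, and you are assumed to supply the version tuples yourself (for example, from the list linked above). The sketch does not query the hardware or call XGBoost.

```python
MIN_COMPUTE_CAPABILITY = (3, 5)  # minimum compute capability stated above
MIN_CUDA_VERSION = (8, 0)        # minimum CUDA toolkit version stated above


def meets_gpu_requirements(compute_capability, cuda_version):
    """Return True if both (major, minor) tuples meet the minimums.

    Tuples compare element by element, so (6, 1) >= (3, 5) is True
    and (3, 0) >= (3, 5) is False.
    """
    return (tuple(compute_capability) >= MIN_COMPUTE_CAPABILITY
            and tuple(cuda_version) >= MIN_CUDA_VERSION)


# A Pascal card (compute capability 6.1) with CUDA 10.0 qualifies.
print(meets_gpu_requirements((6, 1), (10, 0)))  # True
# A card with compute capability 3.0 is below the minimum.
print(meets_gpu_requirements((3, 0), (10, 0)))  # False
```

Plain tuple comparison avoids the pitfalls of comparing version numbers as floats or strings.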

CUDA Accelerated Tree Construction Algorithms

Tree construction (training) and prediction can be accelerated with CUDA-capable GPUs.

Usage

Set the tree_method parameter to one of the following algorithms.

Algorithms

tree_method	Description
gpu_exact	The standard XGBoost tree construction algorithm. Performs exact search for splits. Slower and uses considerably more memory than `gpu_hist`.
gpu_hist	Equivalent to the XGBoost fast histogram algorithm. Much faster and uses considerably less memory than `gpu_exact`. Note: runs very slowly on GPUs older than the Pascal architecture.

Supported parameters

parameter	`gpu_exact`	`gpu_hist`
`subsample`	✘	✔
`colsample_bytree`	✘	✔
`colsample_bylevel`	✘	✔
`max_bin`	✘	✔
`gpu_id`	✔	✔
`n_gpus`	✘	✔
`predictor`	✔	✔
`grow_policy`	✘	✔
`monotone_constraints`	✘	✔
`single_precision_histogram`	✘	✔
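You can encode the table above as data and use it to check a parameter dictionary before training. This is a minimal sketch. `HIST_ONLY` and `unsupported_params` are hypothetical helpers, not part of the XGBoost API, and the parameter set simply mirrors the table rows.

```python
# Parameters from the table above that only gpu_hist supports.
HIST_ONLY = {
    "subsample",
    "colsample_bytree",
    "colsample_bylevel",
    "max_bin",
    "n_gpus",
    "grow_policy",
    "monotone_constraints",
    "single_precision_histogram",
}


def unsupported_params(params):
    """Return the sorted names of table parameters that the chosen
    tree_method does not support. Parameters absent from the table
    (and tree methods other than gpu_exact/gpu_hist) are not flagged."""
    if params.get("tree_method") == "gpu_exact":
        return sorted(name for name in params if name in HIST_ONLY)
    # gpu_hist supports every parameter listed in the table.
    return []


params = {"tree_method": "gpu_exact", "subsample": 0.8, "gpu_id": 0}
print(unsupported_params(params))  # ['subsample']
```

Failing early on a known-unsupported parameter is cheaper than discovering mid-training that it had no effect.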

GPU-accelerated prediction is enabled by default for the tree_method values listed above, but you can switch to CPU prediction by setting predictor to cpu_predictor. This can be useful if you want to conserve GPU memory. Likewise, when using CPU algorithms, you can enable GPU-accelerated prediction by setting predictor to gpu_predictor.
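You can express the two predictor combinations described above as parameter dictionaries. The sketch below only builds the dictionaries, and the helper name `with_predictor` is hypothetical. In practice, you would pass a dictionary like this as the first argument of `xgboost.train`. That call is left out here so the snippet runs without XGBoost or a GPU.

```python
def with_predictor(params, predictor):
    """Return a copy of params with the predictor overridden.

    predictor is expected to be 'cpu_predictor' or 'gpu_predictor'.
    """
    if predictor not in ("cpu_predictor", "gpu_predictor"):
        raise ValueError("unknown predictor: %r" % predictor)
    updated = dict(params)  # leave the caller's dictionary untouched
    updated["predictor"] = predictor
    return updated


# GPU training with CPU prediction, e.g. to conserve GPU memory.
gpu_train_cpu_predict = with_predictor({"tree_method": "gpu_hist"}, "cpu_predictor")

# CPU training ('hist' is a CPU tree method) with GPU-accelerated prediction.
cpu_train_gpu_predict = with_predictor({"tree_method": "hist"}, "gpu_predictor")

print(gpu_train_cpu_predict)  # {'tree_method': 'gpu_hist', 'predictor': 'cpu_predictor'}
print(cpu_train_gpu_predict)  # {'tree_method': 'hist', 'predictor': 'gpu_predictor'}
```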

The experimental parameter single_precision_histogram can be set to True to enable building histograms using single precision. This may improve speed, in particular on older architectures.

The device ordinal can be selected using the gpu_id parameter, which defaults to 0.

Multiple GPUs can be used with the gpu_hist tree method through the n_gpus parameter, which defaults to 1. If n_gpus is set to -1, all available GPUs will be used. If gpu_id is non-zero, the selected devices run from gpu_id up to, but not including, gpu_id+n_gpus. Note that gpu_id+n_gpus must be less than or equal to the number of GPUs available on your system. As with GPU vs. CPU, multi-GPU training is not always faster than a single GPU, because PCI bus bandwidth can limit performance.
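The device-selection rule above reduces to simple arithmetic. The function below is a hypothetical helper, not part of XGBoost. It lists the device ordinals that a given gpu_id/n_gpus pair would select and enforces gpu_id + n_gpus <= number of available GPUs. The text above does not say how -1 interacts with a non-zero gpu_id. This sketch assumes that -1 selects every device from gpu_id through the last one.

```python
def selected_devices(gpu_id, n_gpus, n_available):
    """Return the list of device ordinals selected by gpu_id and n_gpus."""
    if n_gpus == -1:
        # Assumption: -1 means "all available GPUs starting at gpu_id".
        n_gpus = n_available - gpu_id
    if gpu_id < 0 or n_gpus < 0:
        raise ValueError("gpu_id must be >= 0 and n_gpus >= 0 (or -1)")
    if gpu_id + n_gpus > n_available:
        raise ValueError(
            "gpu_id + n_gpus = %d exceeds the %d available GPUs"
            % (gpu_id + n_gpus, n_available))
    # Devices run from gpu_id up to, but not including, gpu_id + n_gpus;
    # n_gpus == 0 therefore selects no devices.
    return list(range(gpu_id, gpu_id + n_gpus))


print(selected_devices(0, 1, 4))   # [0]  (the defaults)
print(selected_devices(1, 2, 4))   # [1, 2]
print(selected_devices(0, -1, 4))  # [0, 1, 2, 3]
```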

Note

Enabling multi-GPU training

The default installation may not enable multi-GPU training. To use multiple GPUs, make sure to read Building with GPU support.

The GPU algorithms currently work with the CLI, Python and R packages. See the Installation Guide for details.

Python example

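param = {}  # parameter dictionary later passed to xgboost.train
param['gpu_id'] = 0  # train on the first GPU (device ordinal 0)
param['max_bin'] = 16  # number of histogram bins; supported by gpu_hist only
param['tree_method'] = 'gpu_hist'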

Objective functions

Most of the objective functions implemented in XGBoost can run on the GPU. The following table shows the current support status.

Objectives	GPU support
reg:squarederror	✔
reg:logistic	✔
binary:logistic	✔
binary:logitraw	✔
binary:hinge	✔
count:poisson	✔
reg:gamma	✔
reg:tweedie	✔
multi:softmax	✔
multi:softprob	✔
survival:cox	✘
rank:pairwise	✘
rank:ndcg	✘
rank:map	✘

For multi-GPU support, objective functions also honor the n_gpus parameter, which defaults to 1. To disable running objectives on the GPU, set n_gpus to 0.
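You can combine the objective table with the n_gpus rule into a quick check of where an objective would run. This is a sketch under stated assumptions. `GPU_OBJECTIVES` mirrors the table above, and `objective_runs_on_gpu` is a hypothetical helper. The check assumes the GPU path is used only when the objective is marked as supported and n_gpus is non-zero.

```python
# Objectives marked as GPU-supported in the table above.
GPU_OBJECTIVES = {
    "reg:squarederror", "reg:logistic", "binary:logistic",
    "binary:logitraw", "binary:hinge", "count:poisson",
    "reg:gamma", "reg:tweedie", "multi:softmax", "multi:softprob",
}


def objective_runs_on_gpu(params):
    """Return True if the objective is GPU-supported and n_gpus != 0.

    A missing objective key is treated as unknown, i.e. not on the GPU.
    """
    n_gpus = params.get("n_gpus", 1)  # n_gpus defaults to 1
    return params.get("objective") in GPU_OBJECTIVES and n_gpus != 0


print(objective_runs_on_gpu({"objective": "binary:logistic"}))               # True
print(objective_runs_on_gpu({"objective": "binary:logistic", "n_gpus": 0}))  # False
print(objective_runs_on_gpu({"objective": "rank:pairwise"}))                 # False
```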

Metric functions

The following table shows the current support status for evaluation metrics on the GPU.

Metric	GPU Support
rmse	✔
mae	✔
logloss	✔
error	✔
merror	✘
mlogloss	✘
auc	✘
aucpr	✘
ndcg	✘
map	✘
poisson-nloglik	✔
gamma-nloglik	✔
cox-nloglik	✘
gamma-deviance	✔
tweedie-nloglik	✔

As with objective functions, metrics honor the n_gpus parameter, which defaults to 1. To disable running metrics on the GPU, set n_gpus to 0.
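Several evaluation metrics can be requested at once, so it can help to split them by where they would be computed. The sketch below is hypothetical: `split_metrics` is not an XGBoost function, and `GPU_METRICS` copies the table above. It assumes, as the paragraph above states, that setting n_gpus to 0 moves every metric to the CPU. Metric names with parameter suffixes (such as `@` variants) are not handled.

```python
# Metrics marked as GPU-supported in the table above.
GPU_METRICS = {
    "rmse", "mae", "logloss", "error", "poisson-nloglik",
    "gamma-nloglik", "gamma-deviance", "tweedie-nloglik",
}


def split_metrics(eval_metrics, n_gpus=1):
    """Split metric names into (gpu, cpu) lists, preserving their order."""
    if n_gpus == 0:
        # n_gpus = 0 disables GPU evaluation entirely.
        return [], list(eval_metrics)
    gpu = [m for m in eval_metrics if m in GPU_METRICS]
    cpu = [m for m in eval_metrics if m not in GPU_METRICS]
    return gpu, cpu


print(split_metrics(["logloss", "auc", "error"]))  # (['logloss', 'error'], ['auc'])
print(split_metrics(["rmse"], n_gpus=0))           # ([], ['rmse'])
```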

Benchmarks

You can run benchmarks on synthetic data for binary classification:

python tests/benchmark/benchmark.py

Training time on 1,000,000 rows × 50 columns, with 500 boosting iterations and a 0.25/0.75 test/train split, on an i7-6700K CPU @ 4.00GHz and a Pascal Titan X, yields the following results:

tree_method	Time (s)
gpu_hist	13.87
hist	63.55
gpu_exact	161.08
exact	1082.20
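Expressing the table as ratios makes the comparison concrete. The snippet below divides the timings above. The dictionary simply restates the table, and `speedup` is a hypothetical helper. The ratios apply only to the single benchmark configuration described above.

```python
# Training times in seconds, copied from the benchmark table above.
TIMES = {
    "gpu_hist": 13.87,
    "hist": 63.55,
    "gpu_exact": 161.08,
    "exact": 1082.20,
}


def speedup(baseline, candidate):
    """How many times faster candidate trained than baseline."""
    return TIMES[baseline] / TIMES[candidate]


for cpu, gpu in [("hist", "gpu_hist"), ("exact", "gpu_exact")]:
    print("%s vs %s: %.2fx" % (gpu, cpu, speedup(cpu, gpu)))
# gpu_hist vs hist: 4.58x
# gpu_exact vs exact: 6.72x
```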

See GPU Accelerated XGBoost and Updates to the XGBoost GPU algorithms for additional performance benchmarks of the gpu_exact and gpu_hist tree methods.

Developer notes

The application can be profiled with annotations by passing USE_NVTX to CMake and providing the path to the stand-alone NVTX header via NVTX_HEADER_DIR. Regions covered by the ‘Monitor’ class in CUDA code will automatically appear in the Nsight profiler.

Contributors

Many thanks to the following contributors (in alphabetical order):

* Andrey Adinets
* Jiaming Yuan
* Jonathan C. McKinney
* Matthew Jones
* Philip Cho
* Rory Mitchell
* Shankara Rao Thejaswi Nanditale
* Vinay Deshpande

Please report bugs to the user forum https://discuss.xgboost.ai/.