XGBoost GPU Support

This page contains information about GPU algorithms supported in XGBoost. To install GPU support, check out the Installation Guide.

Note

CUDA 8.0, Compute Capability 3.5 required

The GPU algorithms in XGBoost require a graphics card with compute capability 3.5 or higher and CUDA toolkit 8.0 or later. (See this list to look up the compute capability of your GPU card.)

CUDA Accelerated Tree Construction Algorithms

Tree construction (training) and prediction can be accelerated with CUDA-capable GPUs.

Usage

Specify the tree_method parameter as one of the following algorithms.

Algorithms

tree_method Description
gpu_exact The standard XGBoost tree construction algorithm. Performs exact search for splits. Slower and uses considerably more memory than gpu_hist.
gpu_hist Equivalent to the XGBoost fast histogram algorithm. Much faster and uses considerably less memory. NOTE: Will run very slowly on GPUs older than Pascal architecture.

Supported parameters

parameter gpu_exact gpu_hist
subsample
colsample_bytree
colsample_bylevel
max_bin
gpu_id
n_gpus
predictor
grow_policy
monotone_constraints
single_precision_histogram

GPU-accelerated prediction is enabled by default for the above-mentioned tree_method parameters, but it can be switched to CPU prediction by setting predictor to cpu_predictor. This could be useful if you want to conserve GPU memory. Likewise, when using CPU algorithms, GPU-accelerated prediction can be enabled by setting predictor to gpu_predictor.
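For example (a sketch assuming a param dictionary as in the Python example below):

param['tree_method'] = 'gpu_hist'      # train on the GPU
param['predictor'] = 'cpu_predictor'   # predict on the CPU to conserve GPU memory
# Conversely, a CPU algorithm can still predict on the GPU:
# param['tree_method'] = 'hist'
# param['predictor'] = 'gpu_predictor'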

The experimental parameter single_precision_histogram can be set to True to enable building histograms using single precision. This may improve speed, in particular on older architectures.

The device ordinal can be selected using the gpu_id parameter, which defaults to 0.
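A sketch combining these two options (the device index and values are illustrative):

param['gpu_id'] = 1                            # run on the second GPU in the system
param['tree_method'] = 'gpu_hist'
param['single_precision_histogram'] = True     # experimental: build histograms in single precision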

Multiple GPUs can be used with the gpu_hist tree method via the n_gpus parameter, which defaults to 1. If it is set to -1, all available GPUs will be used. If gpu_id is specified as non-zero, the selected GPU devices will be from gpu_id to gpu_id+n_gpus; note that gpu_id+n_gpus must be less than or equal to the number of available GPUs on your system. As with GPU vs. CPU, multi-GPU training will not always be faster than a single GPU, because PCIe bus bandwidth can limit performance.
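For example, a sketch of a multi-GPU configuration (the device indices below are illustrative, and multi-GPU use assumes a build with GPU support, as noted below):

param['tree_method'] = 'gpu_hist'
param['n_gpus'] = -1    # use all available GPUs
# Alternatively, select a contiguous range of devices, e.g. GPUs 1 and 2:
# param['gpu_id'] = 1
# param['n_gpus'] = 2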

Note

Enabling multi-GPU training

The default installation may not enable multi-GPU training. To use multiple GPUs, make sure to read Building with GPU support.

The GPU algorithms currently work with the CLI, Python, and R packages. See the Installation Guide for details.

Python example
param['gpu_id'] = 0
param['max_bin'] = 16
param['tree_method'] = 'gpu_hist'
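
For reference, a more complete, self-contained sketch of GPU training (the synthetic data, variable names, and boosting round count below are illustrative assumptions, not part of the official example):

import numpy as np
import xgboost as xgb

# Illustrative synthetic data; replace with your own feature matrix and labels.
X = np.random.rand(10000, 50)
y = np.random.randint(2, size=10000)
dtrain = xgb.DMatrix(X, label=y)

param = {'objective': 'binary:logistic',
         'tree_method': 'gpu_hist',
         'gpu_id': 0,
         'max_bin': 16}

# Train on the GPU; prediction is also GPU-accelerated by default.
bst = xgb.train(param, dtrain, num_boost_round=100)
preds = bst.predict(dtrain)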

Objective functions

Most of the objective functions implemented in XGBoost can be run on the GPU. The following table shows the current support status.

Objectives GPU support
reg:squarederror
reg:logistic
binary:logistic
binary:logitraw
binary:hinge
count:poisson
reg:gamma
reg:tweedie
multi:softmax
multi:softprob
survival:cox
rank:pairwise
rank:ndcg
rank:map

For multi-GPU support, objective functions also honor the n_gpus parameter, which defaults to 1. To disable running objectives on the GPU, set n_gpus to 0.
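For example (a sketch; the objective shown is one of the supported entries above):

param['objective'] = 'binary:logistic'
param['n_gpus'] = 0    # evaluate this objective on the CPU rather than the GPU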

Metric functions

The following table shows the current support status for evaluation metrics on the GPU.

Metric GPU Support
rmse
mae
logloss
error
merror
mlogloss
auc
aucpr
ndcg
map
poisson-nloglik
gamma-nloglik
cox-nloglik
gamma-deviance
tweedie-nloglik

As with objective functions, metrics honor the n_gpus parameter, which defaults to 1. To disable running metrics on the GPU, set n_gpus to 0.
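The same pattern applies to metrics, for example (a sketch using one of the metrics listed above):

param['eval_metric'] = 'auc'
param['n_gpus'] = 0    # evaluate metrics on the CPU rather than the GPU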

Benchmarks

You can run benchmarks on synthetic data for binary classification:

python tests/benchmark/benchmark.py

Training time on 1,000,000 rows x 50 columns with 500 boosting iterations and a 0.25/0.75 test/train split on an i7-6700K CPU @ 4.00 GHz and a Pascal Titan X yields the following results:

tree_method Time (s)
gpu_hist 13.87
hist 63.55
gpu_exact 161.08
exact 1082.20

See GPU Accelerated XGBoost and Updates to the XGBoost GPU algorithms for additional performance benchmarks of the gpu_exact and gpu_hist tree methods.

Developer notes

The application may be profiled with annotations by specifying USE_NVTX to CMake and providing the path to the stand-alone NVTX header via NVTX_HEADER_DIR. Regions covered by the 'Monitor' class in CUDA code will automatically appear in the NVIDIA Nsight profiler.
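A sketch of the corresponding CMake invocation (the USE_CUDA flag and the header path below are assumptions; adjust them to your build environment):

cmake .. -DUSE_CUDA=ON -DUSE_NVTX=ON -DNVTX_HEADER_DIR=/path/to/nvtx/include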

References

Mitchell R, Frank E. (2017) Accelerating the XGBoost algorithm using GPU computing. PeerJ Computer Science 3:e127 https://doi.org/10.7717/peerj-cs.127

Nvidia Parallel Forall: Gradient Boosting, Decision Trees and XGBoost with CUDA

Contributors

Many thanks to the following contributors (alphabetical order):

* Andrey Adinets
* Jiaming Yuan
* Jonathan C. McKinney
* Matthew Jones
* Philip Cho
* Rory Mitchell
* Shankara Rao Thejaswi Nanditale
* Vinay Deshpande

Please report bugs to the user forum https://discuss.xgboost.ai/.