XGBoost GPU Support

This page describes the GPU algorithms supported in XGBoost. To install GPU support, check out the Installation Guide.

CUDA Accelerated Tree Construction Algorithms

Tree construction (training) and prediction can be accelerated with CUDA-capable GPUs.


Specify the tree_method parameter as one of the following algorithms.


tree_method    Description
gpu_exact      The standard XGBoost tree construction algorithm. Performs exact search for splits. Slower and uses considerably more memory than gpu_hist.
gpu_hist       Equivalent to the XGBoost fast histogram algorithm. Much faster and uses considerably less memory than gpu_exact. Note: runs very slowly on GPUs older than the Pascal architecture.

Supported parameters

parameter gpu_exact gpu_hist

GPU accelerated prediction is enabled by default for the tree_method values listed above, but it can be switched to CPU prediction by setting predictor to cpu_predictor. This can be useful if you want to conserve GPU memory. Likewise, when using CPU algorithms, GPU accelerated prediction can be enabled by setting predictor to gpu_predictor.
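The two combinations described above can be written as parameter dictionaries. This is a minimal sketch: the helper name make_params is hypothetical and not part of XGBoost, and only the parameter names and values (tree_method, predictor, gpu_hist, hist, cpu_predictor, gpu_predictor) come from this page.

```python
def make_params(train_on_gpu, predict_on_gpu):
    """Build an XGBoost parameter dict (hypothetical helper, not part of XGBoost)."""
    return {
        # gpu_hist trains on the GPU; hist is its CPU counterpart.
        "tree_method": "gpu_hist" if train_on_gpu else "hist",
        # The predictor is chosen independently of the training algorithm.
        "predictor": "gpu_predictor" if predict_on_gpu else "cpu_predictor",
    }


# Train on the GPU but predict on the CPU to conserve GPU memory.
save_memory = make_params(train_on_gpu=True, predict_on_gpu=False)

# Train with a CPU algorithm but use GPU accelerated prediction.
cpu_train_gpu_predict = make_params(train_on_gpu=False, predict_on_gpu=True)
```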

The device ordinal can be selected using the gpu_id parameter, which defaults to 0.

Multiple GPUs can be used with the gpu_hist tree method by setting the n_gpus parameter, which defaults to 1. If it is set to -1, all available GPUs are used. If gpu_id is specified as non-zero, the GPU device order is (gpu_id + i) % n_visible_devices for i = 0 to n_gpus - 1. As with GPU versus CPU training, multi-GPU training is not always faster than a single GPU, because PCI bus bandwidth can limit performance.
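The device ordering rule can be checked with a few lines of Python. This is only an illustration of the arithmetic, not XGBoost code: the function name gpu_device_order is hypothetical, and treating n_gpus = -1 as "one entry per visible device" is an assumption based on the description above.

```python
def gpu_device_order(gpu_id, n_gpus, n_visible_devices):
    """Return device ordinals (gpu_id + i) % n_visible_devices for i = 0 .. n_gpus - 1."""
    if n_gpus == -1:
        # Assumption: -1 means "use every visible device".
        n_gpus = n_visible_devices
    return [(gpu_id + i) % n_visible_devices for i in range(n_gpus)]


# Starting at device 2 on a 4-GPU machine and using 3 GPUs wraps around to 0.
order = gpu_device_order(gpu_id=2, n_gpus=3, n_visible_devices=4)
print(order)  # [2, 3, 0]
```

Because of the modulo, the ordering wraps around, so a non-zero gpu_id never selects a device ordinal beyond the visible range.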


Enabling multi-GPU training

The default installation may not enable multi-GPU training. To use multiple GPUs, read Building with GPU support.

The GPU algorithms currently work with the CLI, Python, and R packages. See the Installation Guide for details.

Python example
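param = {}  # parameter dictionary for training
param['gpu_id'] = 0  # device ordinal of the GPU to use
param['max_bin'] = 16  # maximum number of histogram bins per feature
param['tree_method'] = 'gpu_hist'  # GPU histogram algorithm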
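For context, the parameters in the example above would typically be passed to xgb.train together with a DMatrix. The sketch below assumes an XGBoost version contemporary with this page (one that accepts gpu_hist and gpu_id) and a build with GPU support. The helper names gpu_hist_params and train_on_gpu and the binary:logistic objective are illustrative choices, not taken from this page.

```python
def gpu_hist_params(gpu_id=0, max_bin=16):
    """Collect the parameters from the example above into a dict."""
    return {
        "objective": "binary:logistic",  # assumption: binary classification
        "tree_method": "gpu_hist",
        "gpu_id": gpu_id,
        "max_bin": max_bin,
    }


def train_on_gpu(X, y, num_round=10):
    """Train a booster on the GPU; needs an XGBoost build with GPU support."""
    import xgboost as xgb  # imported here so the sketch loads without XGBoost

    dtrain = xgb.DMatrix(X, label=y)
    return xgb.train(gpu_hist_params(), dtrain, num_boost_round=num_round)
```

With NumPy arrays X (features) and y (0/1 labels), calling train_on_gpu(X, y) returns a trained Booster.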


You can run benchmarks on synthetic data for binary classification:

python tests/benchmark/benchmark.py

Training on 1,000,000 rows x 50 columns with 500 boosting iterations and a 0.25/0.75 test/train split, on an i7-6700K CPU @ 4.00GHz and a Pascal Titan X, yields the following training times:

tree_method    Time (s)
gpu_hist       13.87
hist           63.55
gpu_exact      161.08
exact          1082.20
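The relative speedups implied by these timings can be worked out directly. The snippet below is plain arithmetic on the numbers in the table; the ratios describe this one benchmark on this hardware only and should not be read as general guarantees.

```python
# Training times in seconds, copied from the table above.
times = {"gpu_hist": 13.87, "hist": 63.55, "gpu_exact": 161.08, "exact": 1082.20}


def speedup(slow, fast):
    """How many times faster `fast` is than `slow`."""
    return times[slow] / times[fast]


print(f"gpu_hist vs hist:   {speedup('hist', 'gpu_hist'):.1f}x")
print(f"gpu_exact vs exact: {speedup('exact', 'gpu_exact'):.1f}x")
print(f"gpu_hist vs exact:  {speedup('exact', 'gpu_hist'):.1f}x")
```

On these numbers, gpu_hist is roughly 4.6 times faster than hist, gpu_exact roughly 6.7 times faster than exact, and gpu_hist roughly 78 times faster than exact.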

See GPU Accelerated XGBoost and Updates to the XGBoost GPU algorithms for additional performance benchmarks of the gpu_exact and gpu_hist tree methods.


Mitchell R, Frank E. (2017) Accelerating the XGBoost algorithm using GPU computing. PeerJ Computer Science 3:e127 https://doi.org/10.7717/peerj-cs.127

Nvidia Parallel Forall: Gradient Boosting, Decision Trees and XGBoost with CUDA


  • Rory Mitchell
  • Jonathan C. McKinney
  • Shankara Rao Thejaswi Nanditale
  • Vinay Deshpande
  • … and the rest of the H2O.ai and NVIDIA team.

Please report bugs to the user forum at https://discuss.xgboost.ai/.