#################################
3.0.3 Patch Release (Jul 30 2025)
#################################

- Fix NDCG metric with non-exp gain. (:pr:`11534`)
- Avoid using mean intercept for ``rmsle``. (:pr:`11588`)
- [jvm-packages] Add ``setNumEarlyStoppingRounds`` API. (:pr:`11571`)
- Avoid implicit synchronization in GPU evaluation. (:pr:`11542`)
- Remove CUDA check in the array interface handler. (:pr:`11386`)
- Fix check in GPU histogram. (:pr:`11574`)
- Support RAPIDS 25.06. (:pr:`11504`)
- Add ``enable_categorical`` to the sklearn ``.apply`` method. (:pr:`11550`)
- Make ``xgboost.testing`` compatible with scikit-learn 1.7. (:pr:`11502`)
- Add support for building XGBoost wheels on Win-ARM64. (:pr:`11572`, :pr:`11597`, :pr:`11559`)

#################################
3.0.2 Patch Release (May 25 2025)
#################################

- Dask 2025.4.0 scheduler info compatibility fix. (:pr:`11462`)
- Fix CUDA virtual memory fallback logic on WSL2. (:pr:`11471`)

#################################
3.0.1 Patch Release (May 13 2025)
#################################

- Use ``nvidia-smi`` to detect the driver version and handle old drivers that don't support virtual memory. (:pr:`11391`)
- Optimize deep trees for GPU external memory. (:pr:`11387`)
- Small fix for page concatenation with external memory. (:pr:`11338`)
- Build xgboost-cpu for ``manylinux_2_28_x86_64``. (:pr:`11406`)
- Workaround for different Dask versions. (:pr:`11436`)
- Output models now use denormal floating-point values instead of ``nan``. (:pr:`11428`)
- Fix aarch64 CI. (:pr:`11454`)

###################
3.0.0 (2025 Feb 27)
###################

3.0.0 is a milestone for XGBoost. This note summarizes some general changes and then lists package-specific updates. The major version bump reflects a reworked R package along with a significant update to the JVM packages.

.. contents::
  :backlinks: none
  :local:

***********************
External Memory Support
***********************

This release features a major update to the external memory implementation with improved performance, a new :py:class:`~xgboost.ExtMemQuantileDMatrix` for more efficient data initialization, and new feature coverage, including categorical data support and quantile regression support. Additionally, GPU-based external memory is reworked to support using CPU memory as a data cache. Last but not least, we worked on distributed training using external memory, along with initial support in the Spark package.

- A new :py:class:`~xgboost.ExtMemQuantileDMatrix` class for fast data initialization with the ``hist`` tree method. The new class supports both CPU and GPU training. (:pr:`10689`, :pr:`10682`, :pr:`10886`, :pr:`10860`, :pr:`10762`, :pr:`10694`, :pr:`10876`)
- External memory now supports distributed training (:pr:`10492`, :pr:`10861`). In addition, the Spark package can use external memory (the host memory) when the device is GPU. The default package on Maven doesn't support RMM yet. For better performance, one needs to compile XGBoost from source for now. (:pr:`11186`, :pr:`11238`, :pr:`11219`)
- Improved performance with new optimizations for both the ``hist``-specific training and the ``approx`` (:py:class:`~xgboost.DMatrix`) method. (:pr:`10529`, :pr:`10980`, :pr:`10342`)
- New demos and documents for external memory, including distributed training. (:pr:`11234`, :pr:`10929`, :pr:`10916`, :pr:`10426`, :pr:`11113`)
- Reduced binary cache size and memory allocation overhead by not writing the cut matrix. (:pr:`10444`)
- More feature coverage, including categorical data and all objective functions, including quantile regression. In addition, various prediction types like SHAP values are supported. (:pr:`10918`, :pr:`10820`, :pr:`10751`, :pr:`10724`)

Significant updates for the GPU-based external memory training implementation.
(:pr:`10924`, :pr:`10895`, :pr:`10766`, :pr:`10544`, :pr:`10677`, :pr:`10615`, :pr:`10927`, :pr:`10608`, :pr:`10711`)

- GPU-based external memory supports both batch-based and sampling-based training. Before the 3.0 release, XGBoost concatenated the data during training and stored the cache on disk. In 3.0, XGBoost can stage the data on the host and fetch it by batch. (:pr:`10602`, :pr:`10595`, :pr:`10606`, :pr:`10549`, :pr:`10488`, :pr:`10766`, :pr:`10765`, :pr:`10764`, :pr:`10760`, :pr:`10753`, :pr:`10734`, :pr:`10691`, :pr:`10713`, :pr:`10826`, :pr:`10811`, :pr:`10810`, :pr:`10736`, :pr:`10538`, :pr:`11333`)
- XGBoost can now utilize NVLink-C2C for GPU-based external memory training and can handle up to terabytes of data.
- Support prediction cache (:pr:`10707`).
- Automatic page concatenation for improved GPU utilization (:pr:`10887`).
- Improved quantile sketching algorithm for batch-based inputs. See the section for :ref:`new features <3_0_features>` for more info.
- Optimization for nearly-dense input; see the section for :ref:`optimization <3_0_optimization>` for more info.

See our latest document for details: :doc:`/tutorials/external_memory`. The PyPI package (``pip install``) doesn't have ``RMM`` support, which is required by the GPU external memory implementation. To experiment, you can compile XGBoost from source or wait for the RAPIDS conda package to be available.

.. _3_0_networking:

**********
Networking
**********

Continuing the work from the previous release, we updated the network module to improve reliability. (:pr:`10453`, :pr:`10756`, :pr:`11111`, :pr:`10914`, :pr:`10828`, :pr:`10735`, :pr:`10693`, :pr:`10676`, :pr:`10349`, :pr:`10397`, :pr:`10566`, :pr:`10526`)

The timeout option is now supported for NCCL using the NCCL asynchronous mode (:pr:`10850`, :pr:`10934`, :pr:`10945`, :pr:`10930`).
In addition, a new :py:class:`~xgboost.collective.Config` class is added for users to specify various options, including timeout, tracker port, etc., for distributed training. Both the Dask interface and the PySpark interface support the new configuration. (:pr:`11003`, :pr:`10281`, :pr:`10983`, :pr:`10973`)

****
SYCL
****

Continuing the work on the SYCL integration, this release significantly improves feature coverage, from more training parameters and more objectives to distributed training, along with various optimizations (:pr:`10884`, :pr:`10883`).

Starting with 3.0, the SYCL plugin is close to feature-complete; users can start working on SYCL devices for in-core training and inference. Newly introduced features include:

- Dask support for distributed training (:pr:`10812`)
- Various training procedures, including split evaluation (:pr:`10605`, :pr:`10636`), grow policy (:pr:`10690`, :pr:`10681`), and cached prediction (:pr:`10701`).
- Updates for objective functions. (:pr:`11029`, :pr:`10931`, :pr:`11016`, :pr:`10993`, :pr:`11064`, :pr:`10325`)
- Ongoing work for float32-only devices. (:pr:`10702`)

Other related PRs (:pr:`10842`, :pr:`10543`, :pr:`10806`, :pr:`10943`, :pr:`10987`, :pr:`10548`, :pr:`10922`, :pr:`10898`, :pr:`10576`)

.. _3_0_features:

********
Features
********

This section describes new features in the XGBoost core. For language-specific features, please visit the corresponding sections.

- A new initialization method for objectives that are derived from GLM. The new method is based on the mean value of the input labels. The new method changes the result of the estimated ``base_score``. (:pr:`10298`, :pr:`11331`)
- The :py:class:`xgboost.QuantileDMatrix` can be used with all prediction types for both CPU and GPU.
- In prior releases, XGBoost made a copy of the booster to release memory held by internal tree methods. We formalize the procedure into a new booster method :py:meth:`~xgboost.Booster.reset` / :cpp:func:`XGBoosterReset`. (:pr:`11042`)
- OpenMP thread setting is exposed to the XGBoost global configuration. Users can use it to work around hardcoded OpenMP environment variables. (:pr:`11175`)
- We improved learning to rank tasks for better hyper-parameter configuration and for distributed training.

  + In 3.0, all three distributed interfaces, including Dask, Spark, and PySpark, support sorting the data based on query ID. The option for the :py:class:`~xgboost.dask.DaskXGBRanker` is enabled by default, and users can opt out. (:pr:`11146`, :pr:`11007`, :pr:`11047`, :pr:`11012`, :pr:`10823`, :pr:`11023`)
  + Also for learning to rank, a new parameter ``lambdarank_score_normalization`` is introduced to make one of the normalizations optional. (:pr:`11272`)
  + The ``lambdarank_normalization`` now uses the number of pairs when normalizing the ``mean`` pair strategy. Previously, the gradient was used for both ``topk`` and ``mean``. (:pr:`11322`)

- We have improved GPU quantile sketching to reduce memory usage. The improvement helps the construction of the :py:class:`~xgboost.QuantileDMatrix` and the new :py:class:`~xgboost.ExtMemQuantileDMatrix`.

  + A new multi-level sketching algorithm is employed to reduce the overall memory usage with batched inputs.
  + In addition to algorithmic changes, internal memory usage estimation and the quantile container are also updated. (:pr:`10761`, :pr:`10843`)
  + The change introduces two more parameters for the :py:class:`~xgboost.QuantileDMatrix` and :py:class:`~xgboost.DataIter`, namely, ``max_quantile_batches`` and ``min_cache_page_bytes``.

- More work is needed to improve the support of categorical features. This release supports plotting trees with statistics for categorical nodes (:pr:`11053`). In addition, some preparation work is ongoing for automatically re-coding categories. (:pr:`11094`, :pr:`11114`, :pr:`11089`) These are feature enhancements rather than blocking issues.
- Implement weight-based feature importance for vector-leaf. (:pr:`10700`)
- Reduced logging in the DMatrix construction. (:pr:`11080`)

.. _3_0_optimization:

************
Optimization
************

In addition to the external memory and quantile sketching improvements, we have a number of optimizations and performance fixes.

- GPU tree methods now use significantly less memory for both dense inputs and near-dense inputs. (:pr:`10821`, :pr:`10870`)
- For near-dense inputs, GPU training is much faster for both ``hist`` (about 2x) and ``approx``.
- Quantile regression on CPU can now handle imbalanced trees much more efficiently. (:pr:`11275`)
- Small optimization for DMatrix construction to reduce latency. Also, C users can now reuse the :cpp:func:`ProxyDMatrix ` for multiple inference calls. (:pr:`11273`)
- CPU prediction performance for :py:class:`~xgboost.QuantileDMatrix` has been improved (:pr:`11139`) and is now on par with normal ``DMatrix``.
- Fixed a performance issue for running inference using CPU with extremely sparse :py:class:`~xgboost.QuantileDMatrix` (:pr:`11250`).
- Optimize CPU training memory allocation for improved performance. (:pr:`11112`)
- Improved RMM (RAPIDS Memory Manager) integration. Now, with the help of :py:func:`~xgboost.config_context`, all memory allocated by XGBoost should be routed to RMM. As a bonus, all ``thrust`` algorithms now use an async policy. (:pr:`10873`, :pr:`11173`, :pr:`10712`, :pr:`10562`)
- When used without RMM, XGBoost is more careful with its use of the caching allocator to avoid holding too much device memory. (:pr:`10582`)

****************
Breaking Changes
****************

This section lists breaking changes that affect all packages.

- Remove the deprecated ``DeviceQuantileDMatrix``. (:pr:`10974`, :pr:`10491`)
- Support for saving the model in the ``deprecated`` format has been removed. Users can still load old models in 3.0. (:pr:`10490`)
- Support for the legacy (blocking) CUDA stream is removed. (:pr:`10607`)
- XGBoost now requires CUDA 12.0 or later.
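
As a minimal migration sketch for the first two breaking changes above (assuming a CPU build of XGBoost 3.0 with NumPy installed; the synthetic data and the ``model.ubj`` file name are illustrative), code that previously built a ``DeviceQuantileDMatrix`` can construct a :py:class:`~xgboost.QuantileDMatrix` instead, and models can be saved in UBJSON rather than the removed ``deprecated`` format. ``save_model`` selects UBJSON from the ``.ubj`` extension.

.. code-block:: python

  import os
  import tempfile

  import numpy as np
  import xgboost as xgb

  # Synthetic regression data, for illustration only.
  rng = np.random.default_rng(2025)
  X = rng.normal(size=(512, 8))
  y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=512)

  # ``DeviceQuantileDMatrix`` no longer exists in 3.0; ``QuantileDMatrix``
  # accepts both CPU and GPU inputs and is used with the ``hist`` tree method.
  Xy = xgb.QuantileDMatrix(X, label=y)
  booster = xgb.train({"tree_method": "hist"}, Xy, num_boost_round=8)

  # Save in UBJSON (chosen from the ``.ubj`` extension) and load it back.
  with tempfile.TemporaryDirectory() as tmpdir:
      path = os.path.join(tmpdir, "model.ubj")
      booster.save_model(path)
      loaded = xgb.Booster(model_file=path)

  # Predictions from the original and the reloaded model should match.
  pred = booster.predict(xgb.DMatrix(X))
  pred_loaded = loaded.predict(xgb.DMatrix(X))

JSON (a ``.json`` file name) works the same way if a human-readable model file is preferred.
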

*********
Bug Fixes
*********

- Fix the quantile error metric (pinball loss) with multiple quantiles. (:pr:`11279`)
- Fix a potential access error when running prediction in a multi-threaded environment. (:pr:`11167`)
- Check the correct dump format for ``gblinear``. (:pr:`10831`)

*************
Documentation
*************

- A new tutorial for advanced usage with custom objective functions. (:pr:`10283`, :pr:`10725`)
- The new online document site now shows documents for all packages, including Python, R, and JVM-based packages. (:pr:`11240`, :pr:`11216`, :pr:`11166`)
- Lots of enhancements. (:pr:`10822`, :pr:`11137`, :pr:`11138`, :pr:`11246`, :pr:`11266`, :pr:`11253`, :pr:`10731`, :pr:`11222`, :pr:`10551`, :pr:`10533`)
- Consistent use of CMake in documents. (:pr:`10717`)
- Add a brief description for using the ``offset`` from the GLM setting (like ``Poisson``). (:pr:`10996`)
- Clean up the document for building from source. (:pr:`11145`)
- Various fixes. (:pr:`10412`, :pr:`10405`, :pr:`10353`, :pr:`10464`, :pr:`10587`, :pr:`10350`, :pr:`11131`, :pr:`10815`)
- Maintenance. (:pr:`11052`, :pr:`10380`)

**************
Python Package
**************

- The ``feature_weights`` parameter in the sklearn interface is now defined as a scikit-learn parameter. (:pr:`9506`)
- Initial support for polars; categorical features are not yet supported. (:pr:`11126`, :pr:`11172`, :pr:`11116`)
- Reduce pandas dataframe overhead and overhead for various imports. (:pr:`11058`, :pr:`11068`)
- Better xlabel in :py:func:`~xgboost.plot_importance` (:pr:`11009`)
- Validate reference dataset for training. The :py:func:`~xgboost.train` function now throws an error if a :py:class:`~xgboost.QuantileDMatrix` is used as a validation dataset without a reference. (:pr:`11105`)
- Fix misleading errors when feature names are missing during inference (:pr:`10814`)
- Add ``stacklevel`` to the Python warning callback. The change helps improve the error message for the Python package. (:pr:`10977`)
- Remove circular reference in ``DataIter``. It helps reduce memory usage. (:pr:`11177`)
- Add checks for invalid inputs for ``cv``. (:pr:`11255`)
- Update Python project classifiers. (:pr:`10381`, :pr:`11028`)
- Support doc link for the sklearn module. Users can now find links to documents in a Jupyter notebook. (:pr:`10287`)
- Dask

  + Prevent the training from hanging due to aborted workers. (:pr:`10985`) This helps Dask XGBoost be robust against errors. When a worker is killed, the training will fail with an exception instead of hanging.
  + Optional support for client-side logging. (:pr:`10942`)
  + Fix LTR with empty partition and NCCL error. (:pr:`11152`)
  + Update to work with the latest Dask. (:pr:`11291`)
  + See the :ref:`3_0_features` section for changes to ranking models.
  + See the :ref:`3_0_networking` section for changes with the communication module.

- PySpark

  + Expose training and validation metrics. (:pr:`11133`)
  + Add barrier before initializing the communicator. (:pr:`10938`)
  + Extend support for columnar input to CPU (GPU-only previously). (:pr:`11299`)
  + See the :ref:`3_0_features` section for changes to ranking models.
  + See the :ref:`3_0_networking` section for changes with the communication module.

- Document updates (:pr:`11265`).
- Maintenance. (:pr:`11071`, :pr:`11211`, :pr:`10837`, :pr:`10754`, :pr:`10347`, :pr:`10678`, :pr:`11002`, :pr:`10692`, :pr:`11006`, :pr:`10972`, :pr:`10907`, :pr:`10659`, :pr:`10358`, :pr:`11149`, :pr:`11178`, :pr:`11248`)
- Breaking changes

  + Remove deprecated ``feval``. (:pr:`11051`)
  + Remove dask from the default import. (:pr:`10935`) Users are now required to import the XGBoost Dask interface through:

    .. code-block:: python

      from xgboost import dask as dxgb

    instead of:

    .. code-block:: python

      import xgboost as xgb
      xgb.dask

    The change helps avoid introducing dask into the default import set.

  + Bump Python requirement to 3.10. (:pr:`10434`)
  + Drop support for datatable. (:pr:`11070`)

*********
R Package
*********

We have been reworking the R package for a few releases now. In 3.0, we will start publishing the new R package on R-universe before moving toward a CRAN update. The new package features a much more ergonomic interface, which is also more idiomatic for R users. In addition, a range of new features is introduced to the package. To name a few, the new package includes categorical feature support, ``QuantileDMatrix``, and an initial implementation of external memory training. To test the new package:

.. code-block:: R

  install.packages('xgboost', repos = c('https://dmlc.r-universe.dev', 'https://cloud.r-project.org'))

Also, we finally have an online documentation site for the R package featuring both vignettes and API references (:pr:`11166`, :pr:`11257`). A good starting point for the new interface is the new ``xgboost()`` function. We won't list all the feature gains here, as there are too many! Please visit the :doc:`/R-package/index` for more info. There's a migration guide (:pr:`11197`) there if you used a previous version of the XGBoost R package.

- Support for the MSVC build was dropped due to incompatibility with R headers. (:pr:`10355`, :pr:`11150`)
- Maintenance (:pr:`11259`)
- Related PRs. (:pr:`11171`, :pr:`11231`, :pr:`11223`, :pr:`11073`, :pr:`11224`, :pr:`11076`, :pr:`11084`, :pr:`11081`, :pr:`11072`, :pr:`11170`, :pr:`11123`, :pr:`11168`, :pr:`11264`, :pr:`11140`, :pr:`11117`, :pr:`11104`, :pr:`11095`, :pr:`11125`, :pr:`11124`, :pr:`11122`, :pr:`11108`, :pr:`11102`, :pr:`11101`, :pr:`11100`, :pr:`11077`, :pr:`11099`, :pr:`11074`, :pr:`11065`, :pr:`11092`, :pr:`11090`, :pr:`11096`, :pr:`11148`, :pr:`11151`, :pr:`11159`, :pr:`11204`, :pr:`11254`, :pr:`11109`, :pr:`11141`, :pr:`10798`, :pr:`10743`, :pr:`10849`, :pr:`10747`, :pr:`11022`, :pr:`10989`, :pr:`11026`, :pr:`11060`, :pr:`11059`, :pr:`11041`, :pr:`11043`, :pr:`11025`, :pr:`10674`, :pr:`10727`, :pr:`10745`, :pr:`10733`, :pr:`10750`, :pr:`10749`, :pr:`10744`, :pr:`10794`, :pr:`10330`, :pr:`10698`, :pr:`10687`, :pr:`10688`, :pr:`10654`, :pr:`10456`, :pr:`10556`, :pr:`10465`, :pr:`10337`)

************
JVM Packages
************

The XGBoost 3.0 release features a significant update to the JVM packages, and in particular, the Spark package. There are breaking changes in packaging and some parameters. Please visit the :doc:`migration guide ` for related changes. The work brings new features and a more unified feature set between the CPU and GPU implementations. (:pr:`10639`, :pr:`10833`, :pr:`10845`, :pr:`10847`, :pr:`10635`, :pr:`10630`, :pr:`11179`, :pr:`11184`)

- Automatic partitioning for distributed learning to rank. See the :ref:`features <3_0_features>` section above (:pr:`11023`).
- Resolve Spark compatibility issue (:pr:`10917`)
- Support missing value when constructing DMatrix with iterator (:pr:`10628`)
- Fix transform performance issue (:pr:`10925`)
- Honor ``skip.native.build`` option in xgboost4j-gpu (:pr:`10496`)
- Support array features type for CPU (:pr:`10937`)
- Change default missing value to ``NaN`` for better alignment (:pr:`11225`)
- Don't cast to float if it's already float (:pr:`10386`)
- Maintenance. (:pr:`10982`, :pr:`10979`, :pr:`10978`, :pr:`10673`, :pr:`10660`, :pr:`10835`, :pr:`10836`, :pr:`10857`, :pr:`10618`, :pr:`10627`)

***********
Maintenance
***********

Code maintenance includes refactoring (:pr:`10531`, :pr:`10573`, :pr:`11069`), cleanups (:pr:`11129`, :pr:`10878`, :pr:`11244`, :pr:`10401`, :pr:`10502`, :pr:`11107`, :pr:`11097`, :pr:`11130`, :pr:`10758`, :pr:`10923`, :pr:`10541`, :pr:`10990`), and improvements for tests (:pr:`10611`, :pr:`10658`, :pr:`10583`, :pr:`11245`, :pr:`10708`), along with fixes for various warnings in compilers and test dependencies (:pr:`10757`, :pr:`10641`, :pr:`11062`, :pr:`11226`). There are also miscellaneous updates, including some dev scripts and profiling annotations (:pr:`10485`, :pr:`10657`, :pr:`10854`, :pr:`10718`, :pr:`11158`, :pr:`10697`, :pr:`11276`). Lastly, there are dependency updates (:pr:`10362`, :pr:`10363`, :pr:`10360`, :pr:`10373`, :pr:`10377`, :pr:`10368`, :pr:`10369`, :pr:`10366`, :pr:`11032`, :pr:`11037`, :pr:`11036`, :pr:`11035`, :pr:`11034`, :pr:`10518`, :pr:`10536`, :pr:`10586`, :pr:`10585`, :pr:`10458`, :pr:`10547`, :pr:`10429`, :pr:`10517`, :pr:`10497`, :pr:`10588`, :pr:`10975`, :pr:`10971`, :pr:`10970`, :pr:`10949`, :pr:`10947`, :pr:`10863`, :pr:`10953`, :pr:`10954`, :pr:`10951`, :pr:`10590`, :pr:`10600`, :pr:`10599`, :pr:`10535`, :pr:`10516`, :pr:`10786`, :pr:`10859`, :pr:`10785`, :pr:`10779`, :pr:`10790`, :pr:`10777`, :pr:`10855`, :pr:`10848`, :pr:`10778`, :pr:`10772`, :pr:`10771`, :pr:`10862`, :pr:`10952`, :pr:`10768`, :pr:`10770`, :pr:`10769`, :pr:`10664`, :pr:`10663`, :pr:`10892`, :pr:`10979`, :pr:`10978`).

***
CI
***

- The CI is reworked to use RunsOn to integrate custom CI pipelines with GitHub Actions. The migration helps us reduce the maintenance burden and make the CI configuration more accessible to others. (:pr:`11001`, :pr:`11079`, :pr:`10649`, :pr:`11196`, :pr:`11055`, :pr:`10483`, :pr:`11078`, :pr:`11157`)
- Other maintenance work includes various small fixes, enhancements, and tooling updates. (:pr:`10877`, :pr:`10494`, :pr:`10351`, :pr:`10609`, :pr:`11192`, :pr:`11188`, :pr:`11142`, :pr:`10730`, :pr:`11066`, :pr:`11063`, :pr:`10800`, :pr:`10995`, :pr:`10858`, :pr:`10685`, :pr:`10593`, :pr:`11061`)