Multiple Outputs


Added in version 1.6.

Starting from version 1.6, XGBoost has experimental support for multi-output regression and multi-label classification with the Python package. Multi-label classification usually refers to targets that have multiple non-exclusive class labels. For instance, a movie can be classified as both sci-fi and comedy at the same time. For a detailed explanation of the terminology related to different multi-output models, please refer to the scikit-learn user guide.

Note

As of XGBoost 3.0, the feature is experimental and has limited functionality. Only the Python package is tested. In addition, the gblinear booster is not supported.

Training with One-Model-Per-Target

By default, XGBoost builds one model for each target, similar to scikit-learn meta-estimators, with the added benefit of reusing data and other integrated features like SHAP. For a worked example of regression, see A demo for multi-output regression. For multi-label classification, the binary relevance strategy is used. Input y should have shape (n_samples, n_classes), with each column holding 0 or 1 to specify whether the sample is labeled as positive for the respective class. For a sample with 3 output classes and 2 labels, the corresponding row of y should be encoded as [1, 0, 1], with the second class labeled as negative and the rest labeled as positive. At the moment, XGBoost supports only dense matrices for labels.

from sklearn.datasets import make_multilabel_classification
import numpy as np
import xgboost as xgb

# y has shape (32, 5): one 0/1 column per class.
X, y = make_multilabel_classification(
    n_samples=32, n_classes=5, n_labels=3, random_state=0
)
clf = xgb.XGBClassifier(tree_method="hist")
clf.fit(X, y)
# Predictions use the same (n_samples, n_classes) layout as y.
np.testing.assert_allclose(clf.predict(X), y)
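When labels come as per-sample sets of class indices rather than as a ready-made matrix, they have to be converted into the dense 0/1 layout described above. The helper below, encode_label_sets, is hypothetical and is not part of XGBoost. It is a minimal NumPy sketch of that conversion, and it reproduces the [1, 0, 1] example from the text.

```python
import numpy as np


def encode_label_sets(label_sets, n_classes):
    """Hypothetical helper: per-sample label sets -> dense 0/1 label matrix.

    Row i has a 1 in column c when class c is one of sample i's labels,
    which is the binary relevance layout (n_samples, n_classes).
    """
    y = np.zeros((len(label_sets), n_classes), dtype=np.int64)
    for i, labels in enumerate(label_sets):
        y[i, list(labels)] = 1  # mark every positive class for this sample
    return y


# The sample from the text: 3 classes, positive for the first and third.
y = encode_label_sets([{0, 2}], n_classes=3)
print(y)  # [[1 0 1]]
```

scikit-learn's MultiLabelBinarizer performs the same conversion, and by default it returns a dense array, which matches the dense-only label requirement.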

The feature is still under development, with limited support for objectives and metrics.

Training with Vector Leaf

Added in version 2.0.0.

Note

This is still a work in progress, and most features are missing.

XGBoost can optionally build multi-output trees, in which each leaf holds one value per target, when the hist tree method is used. This behavior is controlled by the multi_strategy training parameter. It takes the value one_output_per_tree (the default) to build one model per target, or multi_output_tree to build multi-output trees.

import xgboost as xgb
clf = xgb.XGBClassifier(tree_method="hist", multi_strategy="multi_output_tree")

See A demo for multi-output regression for a worked example with regression.
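For a concrete picture of a vector leaf, the NumPy sketch below computes a leaf weight with one entry per target and a split gain summed over targets. It assumes the usual second-order objective, with an L2 penalty reg_lambda applied independently to each target. It is an illustration of the idea and does not reproduce XGBoost's internal code. The constant factor and the gamma penalty are kept minimal.

```python
import numpy as np


def leaf_weight(grad, hess, reg_lambda=1.0):
    """Newton leaf weight, one entry per target.

    grad, hess: arrays of shape (n_samples_in_leaf, n_targets).
    """
    return -grad.sum(axis=0) / (hess.sum(axis=0) + reg_lambda)


def leaf_score(grad, hess, reg_lambda=1.0):
    """Structure score of a leaf, summed over targets: sum_j G_j^2 / (H_j + lambda)."""
    G = grad.sum(axis=0)
    H = hess.sum(axis=0)
    return float(np.sum(G**2 / (H + reg_lambda)))


def split_gain(grad, hess, go_left, reg_lambda=1.0):
    """Gain of splitting a node into two children (gamma penalty omitted)."""
    left = leaf_score(grad[go_left], hess[go_left], reg_lambda)
    right = leaf_score(grad[~go_left], hess[~go_left], reg_lambda)
    parent = leaf_score(grad, hess, reg_lambda)
    return 0.5 * (left + right - parent)


rng = np.random.default_rng(0)
n_samples, n_targets = 8, 3
grad = rng.normal(size=(n_samples, n_targets))
hess = np.ones((n_samples, n_targets))  # squared error: the hessian is 1
go_left = np.arange(n_samples) < 4      # first four samples go left

print(leaf_weight(grad[go_left], hess[go_left]).shape)  # (3,)
print(split_gain(grad, hess, go_left))
```

Under this assumption, the gain of one vector-leaf split equals the sum of the scalar gains that the same split would have for each target separately. Because every target must share that one split, the tree structure is shared across targets, while the leaf values remain separate.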

Using Reduced Gradient (Sketch Boost)

Added in version 3.2.0.

Note

This is still a work in progress, and most features are missing. It is documented here so that early testers can provide feedback. The related interface might change without notice.

When the number of targets is large, training a gradient boosting tree model with the full gradient matrix becomes challenging. The training procedure may run out of memory when storing the histograms, or it may run extremely slowly because of the amount of computation needed. As an optimization, XGBoost implements an interface for using two types of gradients, based on concepts from Sketch Boost [1].
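A quick size estimate shows why the histogram becomes the bottleneck. The sketch below assumes one (gradient, hessian) pair per feature bin per target, stored as two 8-byte floats, and counts only a single node. The numbers are illustrative assumptions. The actual memory used depends on the data types XGBoost uses and on how many node histograms are kept at once.

```python
def hist_bytes_per_node(n_features, max_bin, n_targets, bytes_per_pair=16):
    """Rough size of one node's gradient histogram, in bytes.

    Assumption: one (grad, hess) pair per bin per target, stored as two
    8-byte floats. Caching several nodes multiplies this figure.
    """
    return n_features * max_bin * n_targets * bytes_per_pair


full = hist_bytes_per_node(n_features=100, max_bin=256, n_targets=1000)
reduced = hist_bytes_per_node(n_features=100, max_bin=256, n_targets=2)
print(full / 1e6, "MB")     # 409.6 MB with the full 1000-column gradient
print(reduced / 1e6, "MB")  # 0.8192 MB with a 2-column split gradient
```

The histogram grows linearly with the number of gradient columns. Building it from a 2-column split gradient instead of 1000 columns therefore shrinks it by a factor of 500, and the split-finding work that scans it shrinks by the same factor.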

The key insight is that we can use different gradients for two distinct purposes:

Split gradient: A reduced-dimension gradient used to determine the tree structure.
Value gradient: The full gradient used to calculate the final leaf values for accurate predictions.

This separation allows the expensive histogram building and split finding to operate on a smaller gradient matrix, while still producing valid predictions using the full loss function for leaf values. The Sketch Boost paper proposes using dimensionality reduction on the gradient matrix. In practice, one can also define a different but related loss with a small gradient matrix for finding the tree structure.
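The NumPy sketch below illustrates the split/value separation for one node and one feature. A truncated SVD reduces the squared-error gradient to 2 columns, which mirrors the example objective in this section, and only those reduced columns are used to pick the threshold. The leaf values are then computed from the full gradient. Two simplifications are assumptions made for illustration: the hessian is set to one for each reduced column, and only a handful of quantile thresholds are tried. This is not how XGBoost performs the search internally.

```python
import numpy as np


def reduce_grad_svd(grad, n_components):
    """Project the gradient onto its top right singular vectors (truncated SVD)."""
    # Rows of vt are directions in target space, ordered by singular value.
    _, _, vt = np.linalg.svd(grad, full_matrices=False)
    return grad @ vt[:n_components].T  # shape: (n_samples, n_components)


def child_score(g, h, reg_lambda=1.0):
    """sum_j G_j^2 / (H_j + lambda) over the columns of g."""
    return float(np.sum(g.sum(axis=0) ** 2 / (h.sum(axis=0) + reg_lambda)))


rng = np.random.default_rng(0)
n_samples, n_targets, n_components = 200, 50, 2
x = rng.uniform(size=n_samples)  # a single feature to split on
y = rng.normal(size=(n_samples, n_targets)) + (x[:, None] > 0.5)
y_pred = np.zeros_like(y)

# Value gradient: full squared-error gradient and hessian, one column per target.
grad = y_pred - y
hess = np.ones_like(grad)

# Split gradient: only a few columns; hessian of 1 per column (assumption).
split_g = reduce_grad_svd(grad, n_components)
split_h = np.ones_like(split_g)

# Choose the threshold with the reduced gradient only. The parent score is
# the same for every threshold, so it is left out of the comparison.
thresholds = np.quantile(x, np.linspace(0.1, 0.9, 9))
gains = []
for t in thresholds:
    mask = x < t
    gains.append(child_score(split_g[mask], split_h[mask])
                 + child_score(split_g[~mask], split_h[~mask]))
best_t = thresholds[int(np.argmax(gains))]

# Leaf values use the full gradient, so every target gets its own value.
left = x < best_t
w_left = -grad[left].sum(axis=0) / (hess[left].sum(axis=0) + 1.0)
w_right = -grad[~left].sum(axis=0) / (hess[~left].sum(axis=0) + 1.0)
print(split_g.shape, w_left.shape, w_right.shape)  # (200, 2) (50,) (50,)
```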

To access this feature, create a custom objective that inherits from TreeObjective and implement the split_grad method.

from xgboost.objective import TreeObjective
from cuml.decomposition import TruncatedSVD

import cupy as cp

class LsObj(TreeObjective):
    def __call__(self, iteration: int, y_pred, dtrain):
        """Squared error: the full (value) gradient and hessian."""
        y_pred = cp.asarray(y_pred)
        # Reshape in case the labels are returned as a flat array.
        y_true = cp.asarray(dtrain.get_label()).reshape(y_pred.shape)
        grad = y_pred - y_true
        hess = cp.ones(grad.shape)
        return grad, hess

    def split_grad(self, iteration: int, grad, hess):
        """Split gradient: project onto the top-2 SVD components of the gradient."""
        svd_params = {"algorithm": "jacobi", "n_components": 2, "n_iter": 8}
        svd = TruncatedSVD(output_type="cupy", **svd_params)
        svd.fit(grad)
        grad = svd.transform(grad)
        hess = svd.transform(hess)
        # The projected hessian can be zero or negative; keep it positive.
        hess = cp.clip(hess, 0.01, None)

        return grad, hess

See A demo for multi-output regression using reduced gradient for a complete worked example. The feature supports only multi_strategy="multi_output_tree".
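The clip in split_grad is needed because of what the projection does to a constant hessian. The CPU sketch below substitutes NumPy's SVD for cuML's TruncatedSVD. It assumes that transform(X) is the projection X @ components.T, as it is in scikit-learn's TruncatedSVD. Under that assumption, an all-ones squared-error hessian becomes the same row in every sample, equal to the row sums of the components. Because the sign of an SVD component is arbitrary, that row can be negative or close to zero, and clipping it to at least 0.01 keeps the hessian positive for split finding.

```python
import numpy as np

rng = np.random.default_rng(0)
grad = rng.normal(size=(100, 20))
hess = np.ones_like(grad)  # squared error: the hessian is all ones

# Stand-in for TruncatedSVD(n_components=2).fit(grad): keep the top two right
# singular vectors as the components (assumption: transform(X) = X @ components.T).
_, _, vt = np.linalg.svd(grad, full_matrices=False)
components = vt[:2]
grad_reduced = grad @ components.T
hess_reduced = hess @ components.T

# Every row of the projected hessian equals components.sum(axis=1), and the
# sign of each entry depends on the arbitrary sign of the SVD component.
print(np.allclose(hess_reduced, components.sum(axis=1)))  # True
hess_clipped = np.clip(hess_reduced, 0.01, None)
print(hess_clipped.min() >= 0.01)  # True
```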

Partitioning for Categorical Splits

Added in version 3.4.0.

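For scalar leaves, XGBoost uses optimal partitioning, which sorts the categories by their gradient statistics and considers only the splits along that order. This avoids enumerating all \(2^{k-1} - 1\) binary partitions of \(k\) categories.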
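The sketch below illustrates why sorting is enough for scalar leaves. It compares a brute-force search over all \(2^{k-1} - 1\) partitions with a scan of the \(k - 1\) contiguous splits after sorting the categories by \(G/H\). Two assumptions apply. First, reg_lambda is set to 0, which is the setting where the classical sorting result for weighted squared error is exact. Second, the categories are sorted by G/H, and XGBoost's own ordering statistic may differ in detail. The function names are illustrative.

```python
import numpy as np


def split_score(GL, HL, G, H, reg_lambda):
    """Children's score G_L^2 / (H_L + lambda) + G_R^2 / (H_R + lambda)."""
    GR, HR = G - GL, H - HL
    return GL**2 / (HL + reg_lambda) + GR**2 / (HR + reg_lambda)


def best_by_sorting(G, H, reg_lambda=0.0):
    """Sort categories by G/H, then scan only the k-1 contiguous splits."""
    order = np.argsort(G / H)
    best = -np.inf
    for i in range(1, len(order)):
        left = order[:i]
        score = split_score(G[left].sum(), H[left].sum(), G.sum(), H.sum(), reg_lambda)
        best = max(best, score)
    return best


def best_by_enumeration(G, H, reg_lambda=0.0):
    """Try every binary partition; category 0 is pinned to the left side."""
    k = len(G)
    best, count = -np.inf, 0
    # The all-bits-set mask would put every category left, so it is excluded.
    for mask in range(2 ** (k - 1) - 1):
        left = [0] + [c for c in range(1, k) if (mask >> (c - 1)) & 1]
        count += 1
        score = split_score(G[left].sum(), H[left].sum(), G.sum(), H.sum(), reg_lambda)
        best = max(best, score)
    return best, count


rng = np.random.default_rng(0)
k = 6
G = rng.normal(size=k)             # per-category gradient sums
H = rng.uniform(1.0, 5.0, size=k)  # per-category hessian sums (positive)
brute, n_partitions = best_by_enumeration(G, H)
print(n_partitions)                              # 31
print(np.isclose(brute, best_by_sorting(G, H)))  # True
```

The sorted scan costs \(O(k \log k)\) instead of \(O(2^k)\). This ordering trick is what has no direct equivalent when each category carries a weight vector.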

For vector leaves, each category has a weight vector and there is no canonical ordering of vectors. XGBoost induces an ordering by projecting each category weight onto the parent’s Newton update direction:

\[\begin{split}u_p &= \frac{w_p}{\lVert w_p \rVert_2} \\ s_c &= u_p^T w_c\end{split}\]

where \(w_p\) is the parent leaf weight and \(w_c\) is the category-level leaf weight. The score measures how strongly category \(c\) follows the parent update direction.
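The NumPy sketch below applies the projection to order categories and then scans the contiguous splits along that order, summing the gain over targets. It assumes that both \(w_p\) and \(w_c\) are the plain Newton weights \(-G/(H + \lambda)\), computed per target from gradient and hessian sums. XGBoost's actual weight calculation may include further terms, such as L1 regularization or a maximum step size. The function names are illustrative and are not XGBoost APIs.

```python
import numpy as np


def projection_order(G_cat, H_cat, reg_lambda=1.0):
    """Order k categories by s_c = u_p^T w_c.

    G_cat, H_cat: per-category gradient and hessian sums, shape (k, n_targets).
    """
    w_c = -G_cat / (H_cat + reg_lambda)  # category-level weights, (k, n_targets)
    w_p = -G_cat.sum(axis=0) / (H_cat.sum(axis=0) + reg_lambda)  # parent weight
    u_p = w_p / np.linalg.norm(w_p)  # unit parent update direction
    s_c = w_c @ u_p  # one projected score per category
    return np.argsort(s_c), s_c


def best_contiguous_split(G_cat, H_cat, order, reg_lambda=1.0):
    """Scan the k-1 splits along `order`; the gain is summed over targets."""
    def score(G, H):
        return float(np.sum(G**2 / (H + reg_lambda)))

    G_tot, H_tot = G_cat.sum(axis=0), H_cat.sum(axis=0)
    best_gain, best_left = -np.inf, None
    for i in range(1, len(order)):
        left = order[:i]
        GL, HL = G_cat[left].sum(axis=0), H_cat[left].sum(axis=0)
        gain = score(GL, HL) + score(G_tot - GL, H_tot - HL) - score(G_tot, H_tot)
        if gain > best_gain:
            best_gain, best_left = gain, sorted(left.tolist())
    return best_gain, best_left


rng = np.random.default_rng(1)
k, n_targets = 5, 3
G_cat = rng.normal(size=(k, n_targets))             # per-category gradient sums
H_cat = rng.uniform(1.0, 4.0, size=(k, n_targets))  # per-category hessian sums
order, s_c = projection_order(G_cat, H_cat)
gain, left_categories = best_contiguous_split(G_cat, H_cat, order)
print(order, left_categories)
```

With a single scalar score for each category, the vector case can reuse the same linear scan as the scalar case. The cost of that shortcut is that the ordering only reflects variation along the parent direction.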

The projection has useful consistency properties. For a single output, it agrees with the standard scalar ordering up to reversal. For parent-aligned vector effects,

\[w_c = \alpha_c w_p\]

we have \(s_c = \alpha_c \lVert w_p \rVert_2\), so ordering the projected scores is equivalent to ordering the scalar coefficients \(\alpha_c\). In general, however, it is an approximation: category contrasts orthogonal to the parent update can be missed, and a weak parent update provides little ordering signal.
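The three claims in this paragraph can be checked numerically with the small sketch below, which uses hand-picked weight vectors. It checks that parent-aligned weights are ordered by \(\alpha_c\). It checks that a single output gives the scalar order, reversed when \(w_p < 0\). It also shows that a contrast orthogonal to \(w_p\) produces tied scores.

```python
import numpy as np


def projection_scores(w_c, w_p):
    """s_c = u_p^T w_c with u_p = w_p / ||w_p||_2 (one row of w_c per category)."""
    return w_c @ (w_p / np.linalg.norm(w_p))


# Parent-aligned effects: w_c = alpha_c * w_p gives s_c = alpha_c * ||w_p||_2.
w_p = np.array([2.0, -1.0, 0.5])
alpha = np.array([0.3, -1.2, 2.0, 0.7])
s = projection_scores(alpha[:, None] * w_p, w_p)
print(np.allclose(s, alpha * np.linalg.norm(w_p)))       # True
print(np.array_equal(np.argsort(s), np.argsort(alpha)))  # True

# One output with a negative parent weight: the scalar order, reversed.
w_scalar = np.array([[0.4], [-0.1], [1.5]])
s1 = projection_scores(w_scalar, np.array([-3.0]))
print(np.array_equal(np.argsort(s1), np.argsort(w_scalar[:, 0])[::-1]))  # True

# A contrast orthogonal to w_p is invisible: both categories score the same.
print(projection_scores(np.array([[1.0, 5.0], [1.0, -5.0]]), np.array([1.0, 0.0])))  # [1. 1.]
```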

References

[1] Leonid Iosipoi, Anton Vakhrushev. “Fast Gradient Boosted Decision Tree for Multioutput Problems”. NeurIPS 2022, pp. 25422-25435.