Multiple Outputs
Added in version 1.6.
Starting from version 1.6, XGBoost has experimental support for multi-output regression and multi-label classification in the Python package. Multi-label classification usually refers to targets that have multiple non-exclusive class labels. For instance, a movie can be classified as both sci-fi and comedy at the same time. For a detailed explanation of the terminology related to different multi-output models, please refer to the scikit-learn user guide.
Note
As of XGBoost 2.0, this feature is experimental and limited in functionality. Only the Python package is tested.
Training with One-Model-Per-Target
By default, XGBoost builds one model for each target, similar to sklearn meta estimators,
with the added benefit of reusing data and other integrated features like SHAP. For a
worked example of regression, see A demo for multi-output regression. For multi-label
classification, the binary relevance strategy is used. Input y should be of shape
(n_samples, n_classes), with each column having a value of 0 or 1 to specify whether the
sample is labeled as positive for the respective class. Given a sample with 3 output
classes and 2 labels, the corresponding y should be encoded as [1, 0, 1], with the second
class labeled as negative and the rest labeled as positive. At the moment, XGBoost
supports only dense matrices for labels.
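import numpy as np
import xgboost as xgb
from sklearn.datasets import make_multilabel_classification

# y is a dense (n_samples, n_classes) indicator matrix of 0s and 1s.
X, y = make_multilabel_classification(
    n_samples=32, n_classes=5, n_labels=3, random_state=0
)
clf = xgb.XGBClassifier(tree_method="hist")
clf.fit(X, y)
# The assertion checks that the model reproduces the training labels.
np.testing.assert_allclose(clf.predict(X), y)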
The feature is still under development, and only a limited set of objectives and metrics is supported.
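To make the label encoding described in this section concrete, the following is a minimal sketch of turning per-sample label sets into the dense (n_samples, n_classes) indicator matrix that y is expected to be. The helper name to_indicator_matrix and the sample data are hypothetical and used only for illustration; they are not part of XGBoost.

```python
import numpy as np


def to_indicator_matrix(label_sets, n_classes):
    """Hypothetical helper: encode label sets as a dense 0/1 matrix."""
    y = np.zeros((len(label_sets), n_classes), dtype=np.int32)
    for row, labels in enumerate(label_sets):
        for cls in labels:
            # Mark class `cls` as positive for this sample.
            y[row, cls] = 1
    return y


# The first sample has 3 possible classes and 2 labels (classes 0 and 2).
label_sets = [{0, 2}, {1}, {0, 1, 2}]
y = to_indicator_matrix(label_sets, n_classes=3)
print(y[0])  # first row is encoded as [1 0 1]
```

The resulting array is dense, which matches the current restriction on label input. Labels stored in a sparse format would need to be converted to a dense array before being passed to fit.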
Training with Vector Leaf
Added in version 2.0.
Note
This is still a work in progress, and most features are missing.
XGBoost can optionally build multi-output trees, in which the size of each leaf equals the
number of targets, when the tree method hist is used. The behavior is controlled by the
multi_strategy training parameter, which can take the value one_output_per_tree (the
default) to build one model per target or multi_output_tree to build multi-output trees.
clf = xgb.XGBClassifier(tree_method="hist", multi_strategy="multi_output_tree")
See A demo for multi-output regression for a worked example with regression.