Intercept
Added in version 2.0.0.
Since 2.0.0, XGBoost automatically estimates the model intercept (named base_score)
from the targets during training. You can control this behavior by setting
base_score to a constant value. The following snippet disables the automatic
estimation:
```python
import xgboost as xgb

clf = xgb.XGBClassifier(n_estimators=10)
clf.set_params(base_score=0.5)
```

```r
library(xgboost)
# Load built-in dataset
data(agaricus.train, package = "xgboost")
# Set base_score parameter directly
model <- xgboost(
  x = agaricus.train$data,
  y = factor(agaricus.train$label),
  base_score = 0.5,
  nrounds = 10
)
```
Here, 0.5 is the value after applying the inverse link function. For a binary classifier, that makes it a probability. The discussion of the link function below explains this in more detail.
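The sketch below shows how a response-scale base_score maps to the raw margin it contributes, g(base_score), under a few common links. It uses only plain Python. The helper name `base_score_to_margin` and the `LINKS` table are hypothetical illustrations, not part of the XGBoost API. The sketch assumes the identity link for reg:squarederror, the log link for reg:gamma and count:poisson, and the logit link for binary:logistic.

```python
import math

# Hypothetical link table (not an XGBoost API): maps a response-scale value mu
# to the margin (link) scale, i.e. computes g(mu).
LINKS = {
    "identity": lambda mu: mu,                      # e.g. reg:squarederror
    "log": math.log,                                # e.g. reg:gamma, count:poisson
    "logit": lambda mu: math.log(mu / (1.0 - mu)),  # e.g. binary:logistic
}


def base_score_to_margin(base_score: float, link: str) -> float:
    """Hypothetical helper: apply the link g to a response-scale base_score."""
    return LINKS[link](base_score)


for name in ("identity", "log", "logit"):
    print(name, base_score_to_margin(0.5, name))
```

With the logit link, a base_score of 0.5 contributes a margin of logit(0.5) = 0. This is why 0.5 acts as a neutral starting point for binary:logistic.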
Other than base_score, users can also provide a bias through the data field
base_margin, which is a vector or a matrix depending on the task. For multi-output
and multi-class tasks, base_margin is a matrix of shape (n_samples, n_targets) or
(n_samples, n_classes), respectively.
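A minimal NumPy sketch of the two base_margin layouts follows. The sizes are made up for illustration. Arrays of these shapes are what one would pass as `base_margin` to `xgb.DMatrix` or to `fit()`. The softmax step assumes the multi:softprob objective and only shows what an all-zero margin means on the probability scale.

```python
import numpy as np

n_samples, n_classes = 4, 3  # illustrative sizes only

# Binary classification or single-target regression: one margin per sample.
margin_1d = np.zeros(n_samples, dtype=np.float32)

# Multi-class (e.g. multi:softprob): one margin per sample and per class.
margin_2d = np.zeros((n_samples, n_classes), dtype=np.float32)

# Under a softmax transformation, an all-zero margin row corresponds to
# uniform class probabilities before any trees contribute.
probs = np.exp(margin_2d) / np.exp(margin_2d).sum(axis=1, keepdims=True)

print(margin_1d.shape, margin_2d.shape)
print(probs[0])
```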
```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification()

clf = xgb.XGBClassifier()
clf.fit(X, y)
# Request for raw prediction
m = clf.predict(X, output_margin=True)

clf_1 = xgb.XGBClassifier()
# Feed the prediction into the next model
# Using base margin overrides the base score, see below sections.
clf_1.fit(X, y, base_margin=m)
clf_1.predict(X, base_margin=m)
```

```r
library(xgboost)
# Load built-in dataset
data(agaricus.train, package = "xgboost")
# Train first model
model_1 <- xgboost(
  x = agaricus.train$data,
  y = factor(agaricus.train$label),
  nrounds = 10
)
# Request for raw prediction
m <- predict(model_1, agaricus.train$data, type = "raw")
# Feed the prediction into the next model using base_margin
# Using base margin overrides the base score, see below sections.
model_2 <- xgboost(
  x = agaricus.train$data,
  y = factor(agaricus.train$label),
  base_margin = m,
  nrounds = 10
)
# Make predictions with base_margin
pred <- predict(model_2, agaricus.train$data, base_margin = m)
```
It specifies the bias for each sample and can be used for stacking an XGBoost model on top
of other models, see Demo for boosting from prediction for a worked
example. When base_margin is specified, it automatically overrides the base_score
parameter. If you are stacking XGBoost models, then the usage should be relatively
straightforward, with the previous model providing raw prediction and a new model using
the prediction as bias. For more customized inputs, users need to take extra care of the
link function. Let \(F\) be the model and \(g\) be the link function. Since
base_score is overridden when a sample-specific base_margin is available, we
omit it here:
\[
g(E[y_i]) = F(x_i)
\]
When the base margin \(b\) is provided, it is added to the raw model output \(F\):
\[
g(E[y_i]) = F(x_i) + b_i
\]
and the output of the final model is:
\[
g^{-1}(F(x_i) + b_i)
\]
Take the gamma deviance objective reg:gamma as an example. It has a log link
function, so:
\[
\ln{(E[y_i])} = F(x_i) + b_i
\]
\[
E[y_i] = \exp{(F(x_i) + b_i)}
\]
As a result, if you are feeding outputs from models like GLM with a corresponding objective function, make sure the outputs are not yet transformed by the inverse link (activation).
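A small numeric sketch of the log-link case (as in reg:gamma) shows why the input must stay on the margin scale. The values `prev_mean` and `F` are made up for illustration, and no XGBoost call is involved. It only evaluates \(E[y_i] = \exp(F(x_i) + b_i)\) for a correct and an incorrect choice of \(b_i\).

```python
import math

# Assumed example: a previous log-link model (e.g. a GLM) predicts a mean of
# 2.0 for one sample, and the new trees contribute a raw output F(x_i) = 0.3.
prev_mean = 2.0
F = 0.3

# Correct: pass the previous prediction on the link (log) scale.
b_correct = math.log(prev_mean)
# Mistake: pass the already-transformed mean (response scale) as the margin.
b_wrong = prev_mean

final_correct = math.exp(F + b_correct)  # equals exp(F) * prev_mean
final_wrong = math.exp(F + b_wrong)      # equals exp(F) * exp(prev_mean)

print(final_correct, final_wrong)
```

The correct margin scales the previous mean by \(\exp(F)\). The transformed one effectively applies the inverse link twice.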
In the case of base_score (intercept), the estimated value can be accessed through
save_config() after training. Unlike base_margin, the
returned value is on the response scale, that is, after applying the inverse link. Take logistic regression
and the logit link function as an example. Given a base_score of 0.5,
\(g(intercept) = logit(0.5) = 0\) is added to the raw model output:
\[
E[y_i] = g^{-1}(F(x_i) + g(intercept))
\]
and 0.5 is the same as \(base\_score = g^{-1}(0) = 0.5\). This is more intuitive if you remove the model and consider only the intercept, which is estimated before the model is fitted:
\[
E[y] = g^{-1}(g(intercept)) = intercept
\]
For some objectives, such as MAE, the intercept has a closed-form solution. For others, it is estimated with a one-step Newton method.
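To make the two estimation strategies concrete, here is a simplified, unweighted sketch in plain Python. It is not XGBoost's implementation. Starting the Newton step from a margin of 0 and the helper name `newton_step` are assumptions for illustration. The sketch takes one Newton step, \(m_1 = m_0 - \sum g / \sum h\), for squared error and logistic loss, and uses the median as the closed-form minimiser of absolute error.

```python
import math
from statistics import median

y = [1.0, 1.0, 1.0, 0.0]  # toy labels


def newton_step(grad, hess, start_margin, labels):
    """Hypothetical helper: one Newton step, start - sum(grad) / sum(hess)."""
    g = sum(grad(start_margin, t) for t in labels)
    h = sum(hess(start_margin, t) for t in labels)
    return start_margin - g / h


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Squared error: grad = m - y, hess = 1. One step from 0 lands exactly on the mean.
se_margin = newton_step(lambda m, t: m - t, lambda m, t: 1.0, 0.0, y)

# Logistic loss on the margin: grad = p - y, hess = p * (1 - p), with p = sigmoid(m).
lr_margin = newton_step(
    lambda m, t: sigmoid(m) - t,
    lambda m, t: sigmoid(m) * (1.0 - sigmoid(m)),
    0.0,
    y,
)
lr_intercept = sigmoid(lr_margin)  # response scale, comparable to base_score

# Absolute error (MAE): the minimiser is the median of the labels.
mae_intercept = median(y)

print(se_margin, lr_margin, lr_intercept, mae_intercept)
```

For squared error, one step recovers the mean (0.75) exactly. For logistic loss, it gives a margin of 1.0 rather than the exact optimum logit(0.75) = ln 3 ≈ 1.0986, so a single step is an approximation.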
Offset
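The base_margin is a form of the offset used in GLMs. Using the Poisson objective as an
example, we might want to model the rate instead of the count:
\[
rate = \frac{count}{exposure}
\]
The offset is defined as the log link applied to the exposure variable: \(\ln{exposure}\). Let \(c\) be the count and \(\gamma\) be the exposure. Substituting them for the response \(y\) in our previous formulation of base margin gives:
\[
g\left(\frac{E[c_i]}{\gamma_i}\right) = F(x_i)
\]
Substituting \(g\) with \(\ln\) for Poisson regression:
\[
\ln{\frac{E[c_i]}{\gamma_i}} = F(x_i)
\]
we have:
\[
E[c_i] = \exp{(F(x_i) + \ln{\gamma_i})} = \exp{(F(x_i))}\,\gamma_i
\]
As you can see, we can use base_margin to model an offset, similar to GLMs, by setting \(b_i = \ln{\gamma_i}\).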
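The offset relationship can be checked numerically in plain Python. The learned log-rate and the exposure below are hypothetical numbers. In XGBoost, the vector of \(\ln{\gamma_i}\) values would be supplied as base_margin together with the count:poisson objective.

```python
import math

# Hypothetical values: the model's raw output F(x_i) is a log-rate of
# ln(0.5) events per unit of exposure, and this sample has exposure gamma_i = 3.
F = math.log(0.5)
gamma = 3.0

# The offset enters as the base margin b_i = ln(gamma_i).
b = math.log(gamma)

expected_count = math.exp(F + b)  # E[c_i] = exp(F(x_i) + ln(gamma_i))
rate = expected_count / gamma     # rate = count / exposure recovers exp(F(x_i))

print(expected_count, rate)
```

The expected count scales linearly with exposure, while the model itself only has to learn the rate.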
Example
The following example shows the relationship between base_score and base_margin
using binary logistic regression with the logit link function:
```python
import numpy as np
from scipy.special import logit
from sklearn.datasets import make_classification

import xgboost as xgb

X, y = make_classification(random_state=2025)
```

```r
library(xgboost)
# Load built-in dataset
data(agaricus.train, package = "xgboost")
X <- agaricus.train$data
y <- agaricus.train$label
```
The intercept is a valid probability (0.5). It is used as the initial estimate of the probability of observing a positive sample.
```python
intercept = 0.5
```

```r
intercept <- 0.5
```
First we use the intercept to train a model:
```python
booster = xgb.train(
    {"base_score": intercept, "objective": "binary:logistic"},
    dtrain=xgb.DMatrix(X, y),
    num_boost_round=1,
)
predt_0 = booster.predict(xgb.DMatrix(X, y))
```

```r
# First model with base_score
model_0 <- xgboost(
  x = X, y = factor(y),
  base_score = intercept,
  objective = "binary:logistic",
  nrounds = 1
)
predt_0 <- predict(model_0, X)
```
Apply logit() to obtain the “margin”:
```python
# Apply logit function to obtain the "margin"
margin = np.full(y.shape, fill_value=logit(intercept), dtype=np.float32)
Xy = xgb.DMatrix(X, y, base_margin=margin)
# Second model with base_margin
# 0.2 is a dummy value to show that `base_margin` overrides `base_score`.
booster = xgb.train(
    {"base_score": 0.2, "objective": "binary:logistic"},
    dtrain=Xy,
    num_boost_round=1,
)
predt_1 = booster.predict(Xy)
```

```r
# Apply logit function to obtain the "margin"
logit_intercept <- log(intercept / (1 - intercept))
margin <- rep(logit_intercept, length(y))
# Second model with base_margin
# 0.2 is a dummy value to show that `base_margin` overrides `base_score`
model_1 <- xgboost(
  x = X, y = factor(y),
  base_margin = margin,
  base_score = 0.2,
  objective = "binary:logistic",
  nrounds = 1
)
predt_1 <- predict(model_1, X, base_margin = margin)
```
Compare the results:
```python
np.testing.assert_allclose(predt_0, predt_1)
```

```r
all.equal(predt_0, predt_1, tolerance = 1e-6)
```