#########
Intercept
#########

.. versionadded:: 2.0.0

Since 2.0.0, XGBoost can automatically estimate the model intercept (named ``base_score``)
from the training targets. This behavior can be controlled by setting ``base_score`` to a
constant value. The following snippet disables the automatic estimation:

.. tabs::
    .. code-tab:: py

        import xgboost as xgb

        clf = xgb.XGBClassifier(n_estimators=10)
        clf.set_params(base_score=0.5)

    .. code-tab:: r R

        library(xgboost)

        # Load built-in dataset
        data(agaricus.train, package = "xgboost")

        # Set base_score parameter directly
        model <- xgboost(
          x = agaricus.train$data,
          y = factor(agaricus.train$label),
          base_score = 0.5,
          nrounds = 10
        )

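Here, 0.5 is the value after applying the inverse link function, that is, it is on the
response scale (a probability for classification) rather than the raw margin scale. See the
discussion of ``base_score`` later in this document for details.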
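
To make the distinction between the response scale and the margin scale concrete, the
following plain-Python sketch maps a response-scale ``base_score`` to its margin-scale
value for a few common links. The ``LINKS`` table and both helper functions are
hypothetical illustrations, not part of the XGBoost API.

.. code-block:: python

    import math

    # Hypothetical table of (link g, inverse link g^{-1}) pairs.
    LINKS = {
        "identity": (lambda mu: mu, lambda eta: eta),  # e.g. reg:squarederror
        "logit": (
            lambda mu: math.log(mu / (1.0 - mu)),
            lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        ),  # e.g. binary:logistic
        "log": (math.log, math.exp),  # e.g. reg:gamma, count:poisson
    }


    def base_score_to_margin(base_score, link):
        """Apply the link function g to a response-scale base_score."""
        g, _ = LINKS[link]
        return g(base_score)


    def margin_to_base_score(margin, link):
        """Apply the inverse link g^{-1} to a margin-scale value."""
        _, g_inv = LINKS[link]
        return g_inv(margin)


    print(base_score_to_margin(0.5, "logit"))  # 0.0
    print(base_score_to_margin(2.0, "log"))  # ln(2), about 0.693

A ``base_score`` of 0.5 with the logit link therefore contributes a margin of 0.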

Besides ``base_score``, users can also provide a per-sample bias via the data field
``base_margin``, which is a vector or a matrix depending on the task. For multi-output and
multi-class models, ``base_margin`` is a matrix of shape ``(n_samples, n_targets)`` or
``(n_samples, n_classes)``, respectively.
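
As an illustration of the multi-class shape, the sketch below builds an
``(n_samples, n_classes)`` margin whose rows are the log of some class priors. It relies on
the softmax, the inverse link for multi-class probabilities, mapping :math:`\ln{p}` back to
:math:`p`. The priors and variable names are made up for illustration; in practice the
nested list would be converted to a NumPy array before being passed as ``base_margin``.

.. code-block:: python

    import math

    # Assumed class priors (must sum to 1) and a small number of samples.
    class_priors = [0.2, 0.3, 0.5]
    n_samples = 4

    # One row of log-priors per sample: shape (n_samples, n_classes).
    base_margin = [[math.log(p) for p in class_priors] for _ in range(n_samples)]


    def softmax(row):
        # Subtract the max for numerical stability before exponentiating.
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        return [e / total for e in exps]


    print(len(base_margin), len(base_margin[0]))  # 4 3
    print(softmax(base_margin[0]))  # approximately [0.2, 0.3, 0.5]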

.. tabs::
    .. code-tab:: py

        import xgboost as xgb
        from sklearn.datasets import make_classification

        X, y = make_classification()

        clf = xgb.XGBClassifier()
        clf.fit(X, y)
        # Request raw (untransformed) predictions
        m = clf.predict(X, output_margin=True)

        clf_1 = xgb.XGBClassifier()
        # Feed the raw predictions into the next model.
        # Using base_margin overrides base_score; see the sections below.
        clf_1.fit(X, y, base_margin=m)
        clf_1.predict(X, base_margin=m)

    .. code-tab:: r R

        library(xgboost)

        # Load built-in dataset
        data(agaricus.train, package = "xgboost")

        # Train first model
        model_1 <- xgboost(
          x = agaricus.train$data,
          y = factor(agaricus.train$label),
          nrounds = 10
        )

        # Request raw (untransformed) predictions
        m <- predict(model_1, agaricus.train$data, type = "raw")

        # Feed the raw predictions into the next model via base_margin.
        # Using base_margin overrides base_score; see the sections below.
        model_2 <- xgboost(
          x = agaricus.train$data,
          y = factor(agaricus.train$label),
          base_margin = m,
          nrounds = 10
        )

        # Make predictions with base_margin
        pred <- predict(model_2, agaricus.train$data, base_margin = m)


``base_margin`` specifies the bias for each sample and can be used for stacking an XGBoost
model on top of other models; see :ref:`sphx_glr_python_examples_boost_from_prediction.py`
for a worked example. When ``base_margin`` is specified, it overrides the ``base_score``
parameter. Stacking XGBoost models is relatively straightforward: the previous model
provides raw predictions, and the new model uses them as bias. For custom inputs, users
need to take extra care with the link function. Let :math:`F` be the model and :math:`g`
be the link function. Since ``base_score`` is overridden when a sample-specific
``base_margin`` is available, we omit it here:

.. math::

   g(E[y_i]) = F(x_i)


When the base margin :math:`b` is provided, it is added to the raw model output :math:`F`:

.. math::

   g(E[y_i]) = F(x_i) + b_i

and the output of the final model is:


.. math::

   g^{-1}(F(x_i) + b_i)
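
The following plain-Python sketch evaluates :math:`g^{-1}(F(x_i) + b_i)` for a logistic
objective and shows why the link matters. The margin values are made up, and
``final_prediction`` is a hypothetical helper, not an XGBoost function. Passing a previous
model's raw margin as :math:`b_i` is correct, while passing probabilities applies the
inverse link twice.

.. code-block:: python

    import math


    def sigmoid(eta):
        """Inverse of the logit link, g^{-1}."""
        return 1.0 / (1.0 + math.exp(-eta))


    def final_prediction(raw_output, base_margin):
        """Compute g^{-1}(F(x_i) + b_i) for a logistic objective."""
        return [sigmoid(f + b) for f, b in zip(raw_output, base_margin)]


    # Assumed raw (margin-scale) outputs of a previous model and of the new model.
    prev_margin = [-1.0, 0.0, 2.0]
    new_raw = [0.5, -0.5, 0.0]

    # Correct: pass the previous model's margin directly as b_i.
    good = final_prediction(new_raw, prev_margin)

    # Pitfall: probabilities are already transformed by g^{-1}; using them as
    # b_i shifts every prediction.
    prev_prob = [sigmoid(m) for m in prev_margin]
    bad = final_prediction(new_raw, prev_prob)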

Take the gamma deviance objective ``reg:gamma`` as an example. It uses a log link function,
hence:

.. math::

   \ln{(E[y_i])} &= F(x_i) + b_i \\
   E[y_i] &= \exp{(F(x_i) + b_i)}

As a result, if you are feeding outputs from models such as GLMs fitted with a
corresponding objective function, make sure the outputs are not yet transformed by the
inverse link (activation).
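
For a log-link model such as a gamma GLM, this means taking the logarithm of the GLM's mean
predictions before using them as ``base_margin``. The sketch below uses made-up GLM means
and plain Python only; it checks that, with no contribution from the boosted model
(:math:`F = 0`), the GLM means are recovered up to floating-point rounding.

.. code-block:: python

    import math

    # Assumed response-scale predictions (means) from an external log-link GLM.
    glm_mean = [0.5, 1.0, 4.0]

    # Undo the inverse link before using them as base_margin: b_i = ln(mu_i).
    base_margin = [math.log(mu) for mu in glm_mean]

    # With a log link, the final output is exp(F(x_i) + b_i). With F = 0 the
    # GLM means are recovered.
    F = [0.0, 0.0, 0.0]
    recovered = [math.exp(f + b) for f, b in zip(F, base_margin)]

    # Feeding the means untransformed would instead yield exp(mu_i).
    wrong = [math.exp(f + mu) for f, mu in zip(F, glm_mean)]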

The estimated ``base_score`` (intercept) can be accessed through
:py:meth:`~xgboost.Booster.save_config` after training. Unlike ``base_margin``, the
returned value is on the response scale, that is, after applying the inverse link. Taking
logistic regression with the logit link function as an example, given a ``base_score`` of
0.5, :math:`g(intercept) = logit(0.5) = 0` is added to the raw model output:

.. math::

   E[y_i] = g^{-1}{(F(x_i) + g(intercept))}

and the value 0.5 is recovered as :math:`base\_score = g^{-1}(0) = 0.5`. This is more
intuitive if you remove the model and consider only the intercept, which is estimated
before the model is fitted:

.. math::

   E[y] &= g^{-1}{(g(intercept))} \\
   E[y] &= intercept
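
The estimated intercept can be read back from the JSON returned by
:py:meth:`~xgboost.Booster.save_config`. The sketch below assumes the configuration layout
``learner`` → ``learner_model_param`` → ``base_score``, with the value stored as a string
such as ``"5E-1"`` or, in some versions, a bracketed list such as ``"[5E-1]"``. This layout
is an assumption that may differ across XGBoost versions, and ``read_base_score`` is a
hypothetical helper.

.. code-block:: python

    import json


    def read_base_score(booster):
        """Return the intercept stored in a trained Booster's JSON config.

        Assumes the layout learner -> learner_model_param -> base_score, with
        the value stored as a (possibly bracketed) string.
        """
        config = json.loads(booster.save_config())
        raw = config["learner"]["learner_model_param"]["base_score"]
        values = [float(v) for v in raw.strip("[]").split(",")]
        # A single-output model has one intercept; return a scalar in that case.
        return values[0] if len(values) == 1 else values

For a booster trained with ``binary:logistic``, ``read_base_score(booster)`` returns a
probability, consistent with the response-scale interpretation above.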

For some objectives, such as MAE, the intercept has a closed-form solution (the median of
the targets for MAE), while for others it is estimated with a single Newton step.
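
The difference between a closed form and a single Newton step can be illustrated with a
generic sketch: starting from a zero margin, one Newton step gives
:math:`-\sum_i g_i / \sum_i h_i`, where :math:`g_i` and :math:`h_i` are the gradient and
hessian of the loss with respect to the margin. This is a plain-Python illustration with
made-up labels, not XGBoost's internal implementation; which strategy a given objective
uses depends on the objective and the XGBoost version.

.. code-block:: python

    import math
    import statistics


    def newton_step_intercept(y, grad_hess):
        """One Newton step from margin 0: -sum(g) / sum(h)."""
        g_sum = h_sum = 0.0
        for label in y:
            g, h = grad_hess(0.0, label)
            g_sum += g
            h_sum += h
        return -g_sum / h_sum


    def squared_error(margin, label):
        # L = (margin - label)^2 / 2  ->  g = margin - label, h = 1
        return margin - label, 1.0


    def logistic(margin, label):
        # Log loss with p = sigmoid(margin)  ->  g = p - label, h = p * (1 - p)
        p = 1.0 / (1.0 + math.exp(-margin))
        return p - label, p * (1.0 - p)


    y_reg = [1.0, 2.0, 6.0]
    y_bin = [0.0, 1.0, 1.0, 1.0]

    # Squared error is quadratic, so one step lands on the exact optimum (the mean).
    print(newton_step_intercept(y_reg, squared_error))  # 3.0
    # For log loss, one step gives 1.0, an approximation of logit(0.75) = ln(3).
    print(newton_step_intercept(y_bin, logistic))  # 1.0
    # MAE has a closed-form optimum: the median of the targets.
    print(statistics.median(y_reg))  # 2.0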

******
Offset
******

The ``base_margin`` is a form of the ``offset`` used in GLMs. Using the Poisson objective
as an example, we might want to model the rate instead of the count:

.. math::

   rate = \frac{count}{exposure}

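The offset is defined as the log link applied to the exposure variable,
:math:`\ln{exposure}`. Let :math:`c` be the count and :math:`\gamma` be the exposure.
Substituting the rate :math:`c_i / \gamma_i` for the response :math:`y_i` in our previous
formulation without base margin (the exposure is treated as fixed, so
:math:`E[c_i / \gamma_i] = E[c_i] / \gamma_i`):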

.. math::

   g(\frac{E[c_i]}{\gamma_i}) = F(x_i)

Substitute :math:`g` with :math:`\ln` for Poisson regression:

.. math::

   \ln{\frac{E[c_i]}{\gamma_i}} = F(x_i)

Rearranging gives :math:`\ln{E[c_i]} = F(x_i) + \ln{\gamma_i}`, so we have:

.. math::

   E[c_i] &= \exp{(F(x_i) + \ln{\gamma_i})} \\
   E[c_i] &= g^{-1}(F(x_i) + g(\gamma_i))

Therefore, passing :math:`\ln{\gamma_i}` as ``base_margin`` lets us model counts with an
exposure offset, similar to GLMs.
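
A minimal sketch of the offset idea, using made-up counts and exposures and an
intercept-only Poisson model fitted in closed form rather than with XGBoost: minimizing
:math:`\sum_i \exp(\beta + \ln{\gamma_i}) - c_i(\beta + \ln{\gamma_i})` gives
:math:`\beta = \ln(\sum_i c_i / \sum_i \gamma_i)`, so :math:`\exp(\beta)` is a rate rather
than a count. In XGBoost the same offset would be supplied through ``base_margin``, for
example ``xgb.DMatrix(X, label=counts, base_margin=np.log(exposure))`` with the
``count:poisson`` objective.

.. code-block:: python

    import math

    # Made-up data: event counts c_i observed over exposures gamma_i.
    counts = [2, 0, 5, 3]
    exposure = [1.0, 0.5, 2.0, 1.5]

    # Offset that would be passed as base_margin: b_i = ln(gamma_i).
    offset = [math.log(g) for g in exposure]

    # Setting the derivative of the Poisson loss to zero gives
    # sum_i gamma_i * exp(beta) = sum_i c_i.
    beta = math.log(sum(counts) / sum(exposure))

    # Predicted counts scale with exposure; predicted rates are constant.
    pred_counts = [math.exp(beta + b) for b in offset]
    pred_rates = [c / g for c, g in zip(pred_counts, exposure)]

    # exp(beta) equals the pooled rate sum(counts) / sum(exposure).
    print(math.exp(beta))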

*******
Example
*******

The following example shows the relationship between ``base_score`` and ``base_margin``
using the ``binary:logistic`` objective, which has a logit link function:

.. tabs::
    .. code-tab:: py

        import numpy as np
        from scipy.special import logit
        from sklearn.datasets import make_classification

        import xgboost as xgb

        X, y = make_classification(random_state=2025)

    .. code-tab:: r R

        library(xgboost)

        # Load built-in dataset
        data(agaricus.train, package = "xgboost")
        X <- agaricus.train$data
        y <- agaricus.train$label

The intercept is a valid probability (0.5) and is used as the initial estimate of the
probability of a positive sample.

.. tabs::
    .. code-tab:: py

        intercept = 0.5

    .. code-tab:: r R

        intercept <- 0.5

First, train a model using the intercept as ``base_score``:

.. tabs::
    .. code-tab:: py

        booster = xgb.train(
            {"base_score": intercept, "objective": "binary:logistic"},
            dtrain=xgb.DMatrix(X, y),
            num_boost_round=1,
        )
        predt_0 = booster.predict(xgb.DMatrix(X, y))

    .. code-tab:: r R

        # First model with base_score
        model_0 <- xgboost(
          x = X, y = factor(y),
          base_score = intercept,
          objective = "binary:logistic",
          nrounds = 1
        )
        predt_0 <- predict(model_0, X)

Apply :py:func:`~scipy.special.logit` to the intercept to obtain the "margin", then train a
second model with it as ``base_margin``:

.. tabs::
    .. code-tab:: py

        # Apply logit function to obtain the "margin"
        margin = np.full(y.shape, fill_value=logit(intercept), dtype=np.float32)
        Xy = xgb.DMatrix(X, y, base_margin=margin)
        # Second model with base_margin
        # 0.2 is a dummy value to show that `base_margin` overrides `base_score`.
        booster = xgb.train(
            {"base_score": 0.2, "objective": "binary:logistic"},
            dtrain=Xy,
            num_boost_round=1,
        )
        predt_1 = booster.predict(Xy)

    .. code-tab:: r R

        # Apply logit function to obtain the "margin"
        logit_intercept <- log(intercept / (1 - intercept))
        margin <- rep(logit_intercept, length(y))
        # Second model with base_margin
        # 0.2 is a dummy value to show that `base_margin` overrides `base_score`
        model_1 <- xgboost(
          x = X, y = factor(y),
          base_margin = margin,
          base_score = 0.2,
          objective = "binary:logistic",
          nrounds = 1
        )
        predt_1 <- predict(model_1, X, base_margin = margin)

Compare the results:

.. tabs::
    .. code-tab:: py

        np.testing.assert_allclose(predt_0, predt_1)

    .. code-tab:: r R

        all.equal(predt_0, predt_1, tolerance = 1e-6)