XGBoost model objects can be serialized with R serializers such as save() or saveRDS(), but XGBoost also provides its own serializers with better compatibility guarantees, which allow the saved models to be loaded in other language bindings of XGBoost.
Note that an xgb.Booster object (as produced by xgb.train(); see the rest of this page for objects produced by xgboost()), outside of its core components, might also keep:
- Additional model configuration (accessible through xgb.config()), which includes model fitting parameters like max_depth and runtime parameters like nthread. These are not necessarily useful for prediction/importance/plotting.
- Additional R-specific attributes, e.g. results of callbacks, such as evaluation logs, which are kept as a data.table object, accessible through attributes(model)$evaluation_log if present.
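Both kinds of extra information can be inspected directly on a fitted booster. The following is a minimal sketch rather than a definitive recipe: it assumes a recent xgboost release in which xgb.train() accepts the evals argument (the same generation that provides xgb.params()), and the variable names are illustrative only.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label, nthread = 1)

# Passing an evaluation set makes the training loop record an evaluation log
# (assumption: the installed version uses the `evals` argument name).
bst <- xgb.train(
  data = dtrain,
  nrounds = 2,
  params = xgb.params(max_depth = 2, nthread = 1, objective = "binary:logistic"),
  evals = list(train = dtrain),
  verbose = 0
)

# Model configuration: a nested list of fitting and runtime parameters
config <- xgb.config(bst)
str(config, max.level = 2)

# R-specific attribute: the evaluation log, with one row per boosting round
eval_log <- attributes(bst)$evaluation_log
print(eval_log)
```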
The first kind (configuration) does not have the same compatibility guarantees as the model itself (which includes attributes that are set and accessed through xgb.attributes()). That is, such configuration might be lost after loading the booster in a different XGBoost version, regardless of the serializer that was used. It is saved when using saveRDS(), but will be discarded if loaded into an incompatible XGBoost version. It is not saved when using the serializers from XGBoost's public interface, including xgb.save() and xgb.save.raw().
The second kind (R attributes) is not part of the standard XGBoost model structure, and thus is not saved when using XGBoost's own serializers. These attributes are only used for informational purposes, such as keeping track of evaluation metrics as the model was fit, or saving the R call that produced the model, but are otherwise not used for prediction / importance / plotting / etc. These R attributes are only preserved when using R's serializers.
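The difference between the two families of serializers can be observed with a small round trip. This is a hedged sketch, assuming a recent xgboost release where xgb.train() accepts the evals argument; has_log() is a hypothetical helper defined here only for illustration.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label, nthread = 1)

bst <- xgb.train(
  data = dtrain,
  nrounds = 2,
  params = xgb.params(max_depth = 2, nthread = 1, objective = "binary:logistic"),
  evals = list(train = dtrain),
  verbose = 0
)

# Hypothetical helper: does the object carry an evaluation log as an R attribute?
has_log <- function(model) !is.null(attributes(model)$evaluation_log)

# Round trip through XGBoost's own serializer: R attributes are not kept
bst_raw <- xgb.load.raw(xgb.save.raw(bst))

# Round trip through R's serializer: R attributes are kept
fname <- file.path(tempdir(), "booster_with_attributes.Rds")
saveRDS(bst, fname)
bst_rds <- readRDS(fname)

has_log(bst)
has_log(bst_raw)
has_log(bst_rds)
```

If the evaluation log matters but long-term compatibility is also a concern, one option is to store the log separately (for example as a plain data.frame) next to the bytes returned by xgb.save.raw().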
In addition to the regular xgb.Booster objects produced by xgb.train(), the function xgboost() produces objects with a different subclass, xgboost (which inherits from xgb.Booster). This subclass keeps additional metadata as R attributes, such as class names in classification problems, and has a dedicated predict method that uses different defaults and takes different argument names. XGBoost's own serializers can work with this xgboost class, but as they do not keep R attributes, the resulting object, when deserialized, is downcast to the regular xgb.Booster class (i.e. it loses the metadata, and the resulting object will use predict.xgb.Booster() instead of predict.xgboost()). For these xgboost objects, saveRDS() might thus be a better option if the extra functionality is needed.
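The downcast described above can be checked by comparing classes before and after a round trip. This is an illustrative sketch, not part of the original examples: it assumes a recent xgboost release with the x/y interface of xgboost(), and uses the built-in mtcars data as a stand-in regression problem.

```r
library(xgboost)

data(mtcars)

# Fit through the xgboost() interface, which returns the 'xgboost' subclass
model <- xgboost(x = mtcars[, -1], y = mtcars$mpg, nrounds = 2, nthreads = 1)

# XGBoost's own serializer: the restored object is a plain xgb.Booster
restored_raw <- xgb.load.raw(xgb.save.raw(model))

# R's serializer: the subclass and its R attributes are retained
fname <- file.path(tempdir(), "xgboost_model.Rds")
saveRDS(model, fname)
restored_rds <- readRDS(fname)

inherits(model, "xgboost")
inherits(restored_raw, "xgboost")
inherits(restored_rds, "xgboost")
```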
Note that XGBoost models in R from version 2.1.0 onwards and XGBoost models from before version 2.1.0 have a very different R object structure and are incompatible with each other. Hence, models that were saved with R serializers like saveRDS() or save() before version 2.1.0 will not work with later xgboost versions, and vice versa. Be aware that the structure of R model objects could in theory change again in the future, so XGBoost's serializers should be preferred for long-term storage.
Furthermore, note that model objects from XGBoost might not be serializable with third-party R packages like qs or qs2.
Details
Use xgb.save() to save the XGBoost model as a stand-alone file. You may opt into the JSON format by specifying the JSON extension. To read the model back, use xgb.load().
Use xgb.save.raw() to save the XGBoost model as a sequence (vector) of raw bytes in a future-proof manner. Future releases of XGBoost will be able to read the raw bytes and re-construct the corresponding model. To read the model back, use xgb.load.raw().
The xgb.save.raw() function is useful if you would like to persist the XGBoost model as part of another R object.
Use saveRDS() if you require the R-specific attributes that a booster might have, such as evaluation logs or the model class xgboost instead of xgb.Booster, but note that future compatibility of such objects is outside XGBoost's control, as it relies on R's serialization format (see e.g. the details section in serialize() and save() from base R).
For more details and explanation about model persistence and archival, consult the page https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html.
Examples
data(agaricus.train, package = "xgboost")
bst <- xgb.train(
  data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label, nthread = 1),
  nrounds = 2,
  params = xgb.params(
    max_depth = 2,
    nthread = 2,
    objective = "binary:logistic"
  )
)
# Save as a stand-alone file; load it with xgb.load()
fname <- file.path(tempdir(), "xgb_model.ubj")
xgb.save(bst, fname)
bst2 <- xgb.load(fname)
# Save as a stand-alone file (JSON); load it with xgb.load()
fname <- file.path(tempdir(), "xgb_model.json")
xgb.save(bst, fname)
bst2 <- xgb.load(fname)
# Save as a raw byte vector; load it with xgb.load.raw()
xgb_bytes <- xgb.save.raw(bst)
bst2 <- xgb.load.raw(xgb_bytes)
# Persist XGBoost model as part of another R object
obj <- list(xgb_model_bytes = xgb.save.raw(bst), description = "My first XGBoost model")
# Persist the R object. Here, saveRDS() is okay, since it doesn't persist
# xgb.Booster directly. What's being persisted is the future-proof byte representation
# as given by xgb.save.raw().
fname <- file.path(tempdir(), "my_object.Rds")
saveRDS(obj, fname)
# Read back the R object
obj2 <- readRDS(fname)
# Re-construct xgb.Booster object from the bytes
bst2 <- xgb.load.raw(obj2$xgb_model_bytes)
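As a sanity check on any of the round trips shown in these examples, the restored booster can be compared with the original by looking at their predictions. The snippet below is a self-contained sketch under the same setup as the examples; pred_before and pred_after are illustrative names.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label, nthread = 1)

bst <- xgb.train(
  data = dtrain,
  nrounds = 2,
  params = xgb.params(max_depth = 2, nthread = 1, objective = "binary:logistic")
)

# Round trip through the raw byte representation
bst2 <- xgb.load.raw(xgb.save.raw(bst))

# The restored model is expected to reproduce the original predictions
pred_before <- predict(bst, dtrain)
pred_after <- predict(bst2, dtrain)
isTRUE(all.equal(pred_before, pred_after))
```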