This page gives the Python API reference of xgboost. Please also refer to the Python Package Introduction for more information about the Python package.
Core XGBoost Library.
xgboost.DMatrix(data, label=None, missing=None, weight=None, silent=False, feature_names=None, feature_types=None, nthread=None)
Bases: object
Data Matrix used in XGBoost.
DMatrix is an internal data structure used by XGBoost, optimized for both memory efficiency and training speed. You can construct a DMatrix from numpy arrays.
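Example (a minimal sketch of constructing a DMatrix from numpy arrays; the shapes, values and feature names here are illustrative):

import numpy as np
import xgboost as xgb

X = np.random.rand(5, 3)            # 5 samples, 3 features
y = np.array([0, 1, 0, 1, 1])       # binary labels
dtrain = xgb.DMatrix(X, label=y, feature_names=['f0', 'f1', 'f2'])
print(dtrain.num_col())             # 3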
get_float_info(field)
Get float property from the DMatrix.
Parameters: field (str) – The field name of the information
Returns: info – a numpy array of float information of the data
Return type: array
get_label()
Get the label of the DMatrix.
Returns: label
Return type: array
get_uint_info(field)
Get unsigned integer property from the DMatrix.
Parameters: field (str) – The field name of the information
Returns: info – a numpy array of unsigned integer information of the data
Return type: array
get_weight()
Get the weight of the DMatrix.
Returns: weight
Return type: array
num_col()
Get the number of columns (features) in the DMatrix.
Returns: number of columns
Return type: int
save_binary(fname, silent=True)
Save DMatrix to an XGBoost buffer.
set_base_margin(margin)
Set base margin of booster to start from.
This can be used to specify the prediction value of an existing model as the base margin. Note that the raw margin is required, not the transformed prediction; e.g. for logistic regression, supply the value before the logistic transformation. See also example/demo.py.
Parameters: margin (array like) – Prediction margin of each datapoint
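Example (a hedged sketch of continuing from an existing model's raw margins; bst_prev and dtrain are assumed to exist):

# output_margin=True returns untransformed scores, which is what
# set_base_margin expects (not probabilities).
margin = bst_prev.predict(dtrain, output_margin=True)
dtrain.set_base_margin(margin)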
set_float_info(field, data)
Set float type property into the DMatrix.
set_float_info_npy2d(field, data)
Set float type property into the DMatrix from a numpy 2D array.
set_group(group)
Set group size of DMatrix (used for ranking).
Parameters: group (array like) – Group size of each group
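Example (a minimal sketch for learning-to-rank; assumes dtrain has 10 rows belonging to three queries):

# Group sizes must sum to the number of rows in the DMatrix.
dtrain.set_group([3, 4, 3])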
set_label(label)
Set label of DMatrix.
Parameters: label (array like) – The label information to be set into DMatrix
set_label_npy2d(label)
Set label of DMatrix from a numpy 2D array.
Parameters: label (array like) – The label information to be set into DMatrix from numpy 2D array
set_uint_info(field, data)
Set uint type property into the DMatrix.
set_weight(weight)
Set weight of each instance.
Parameters: weight (array like) – Weight for each data point
set_weight_npy2d(weight)
Set weight of each instance from a numpy 2D array.
Parameters: weight (array like) – Weight for each data point in numpy 2D array
xgboost.Booster(params=None, cache=(), model_file=None)
Bases: object
A Booster of XGBoost.
Booster is the model of XGBoost, containing low-level routines for training, prediction and evaluation.
attr(key)
Get attribute string from the Booster.
Parameters: key (str) – The key to get attribute from.
Returns: value – The attribute value of the key; returns None if the attribute does not exist.
Return type: str
attributes()
Get attributes stored in the Booster as a dictionary.
Returns: result – Returns an empty dict if there are no attributes.
Return type: dictionary of attribute_name: attribute_value pairs of strings
boost(dtrain, grad, hess)
Boost the booster for one iteration, with customized gradient statistics.
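Example (a hedged sketch of one boosting round with hand-computed squared-error statistics; bst and dtrain are assumed to exist):

import numpy as np

preds = bst.predict(dtrain, output_margin=True)
labels = dtrain.get_label()
grad = preds - labels            # first derivative of 0.5*(pred - y)^2
hess = np.ones_like(grad)        # second derivative is constant 1
bst.boost(dtrain, grad, hess)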
copy()
Copy the booster object.
Returns: booster – a copied booster model
Return type: Booster
dump_model(fout, fmap='', with_stats=False)
Dump model into a text file.
eval(data, name='eval', iteration=0)
Evaluate the model on the given data.
Returns: result – Evaluation result string.
Return type: string
eval_set(evals, iteration=0, feval=None)
Evaluate a set of data.
Returns: result – Evaluation result string.
Return type: string
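Example (a minimal sketch; evals is a list of (DMatrix, name) pairs, and bst, dtrain, dtest are assumed to exist):

res = bst.eval_set([(dtrain, 'train'), (dtest, 'test')], iteration=0)
print(res)  # e.g. '[0]\ttrain-error:0.01\ttest-error:0.05'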
get_dump(fmap='', with_stats=False, dump_format='text')
Returns the model dump as a list of strings.
get_fscore(fmap='')
Get feature importance of each feature.
Parameters: fmap (str (optional)) – The name of feature map file
get_score(fmap='', importance_type='weight')
Get feature importance of each feature. Importance type can be defined as:
'weight': the number of times a feature is used to split the data across all trees.
'gain': the average gain of the feature when it is used in trees.
'cover': the average coverage of the feature when it is used in trees.
Parameters: fmap (str (optional)) – The name of feature map file
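Example (a minimal sketch of inspecting gain-based importance on a trained booster bst):

scores = bst.get_score(importance_type='gain')
print(scores)  # e.g. {'f0': 12.5, 'f2': 3.1}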
get_split_value_histogram(feature, fmap='', bins=None, as_pandas=True)
Get split value histogram of a feature.
Returns: a histogram of used splitting values for the specified feature, either as a numpy array or a pandas DataFrame.
load_model(fname)
Load the model from a file.
The model is loaded from an XGBoost internal binary format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature_names) will not be loaded. To preserve all attributes, pickle the Booster object.
Parameters: fname (string or a memory buffer) – Input file name or memory buffer (see also save_raw)
load_rabit_checkpoint()
Initialize the model by loading from a rabit checkpoint.
Returns: version – The version number of the model.
Return type: integer
predict(data, output_margin=False, ntree_limit=0, pred_leaf=False, pred_contribs=False, approx_contribs=False, pred_interactions=False, validate_features=True)
Predict with data.

Note
This function is not thread safe. For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call bst.copy() to make copies of the model object and then call predict().

Note
Using predict() with DART booster: if the booster object is DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g.
preds = bst.predict(dtest, ntree_limit=num_round)

Returns: prediction
Return type: numpy array
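Example (a hedged sketch of standard predictions plus per-feature contributions; bst and dtest are assumed to exist):

preds = bst.predict(dtest)
contribs = bst.predict(dtest, pred_contribs=True)
# contribs has shape (n_samples, n_features + 1); the last column
# is the bias term.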
save_model(fname)
Save the model to a file.
The model is saved in an XGBoost internal binary format which is universal among the various XGBoost interfaces. Auxiliary attributes of the Python Booster object (such as feature_names) will not be saved. To preserve all attributes, pickle the Booster object.
Parameters: fname (string) – Output file name
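Example (a minimal save/load round trip; 'model.bin' is an illustrative path):

bst.save_model('model.bin')
bst2 = xgb.Booster(model_file='model.bin')  # or: bst2.load_model('model.bin')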
save_rabit_checkpoint()
Save the current booster to rabit checkpoint.
save_raw()
Save the model to an in-memory buffer representation.
Return type: an in-memory buffer representation of the model
set_attr(**kwargs)
Set the attribute of the Booster.
Parameters: **kwargs – The attributes to set. Setting a value to None deletes an attribute.
set_param(params, value=None)
Set parameters into the Booster.
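Example (a minimal sketch; set_param accepts a dict of parameters or a single key with a value):

bst.set_param('eta', 0.05)                     # single key, value
bst.set_param({'max_depth': 4, 'gamma': 0.1})  # dict of parameters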
Training Library containing training routines.
xgboost.train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None, maximize=False, early_stopping_rounds=None, evals_result=None, verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None)
Train a booster with given parameters.
Returns: booster
Return type: a trained booster model
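Example (a minimal training sketch with a watchlist and early stopping; dtrain and dtest are DMatrix objects built elsewhere):

params = {'max_depth': 3, 'eta': 0.1, 'objective': 'binary:logistic'}
evals_result = {}
bst = xgb.train(params, dtrain,
                num_boost_round=100,
                evals=[(dtrain, 'train'), (dtest, 'test')],
                early_stopping_rounds=10,
                evals_result=evals_result)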
xgboost.cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False, folds=None, metrics=(), obj=None, feval=None, maximize=False, early_stopping_rounds=None, fpreproc=None, as_pandas=True, verbose_eval=None, show_stdv=True, seed=0, callbacks=None, shuffle=True)
Cross-validation with given parameters.
Returns: evaluation history
Return type: list(string)
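Example (a minimal cross-validation sketch, reusing the params dict from the train example above; the history is a pandas DataFrame when as_pandas=True):

history = xgb.cv(params, dtrain,
                 num_boost_round=50,
                 nfold=5,
                 metrics=('logloss',),
                 early_stopping_rounds=10,
                 seed=0)
print(history)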
Scikit-Learn Wrapper interface for XGBoost.
xgboost.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='reg:linear', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
Bases: xgboost.sklearn.XGBModel, object
Implementation of the scikit-learn API for XGBoost regression.
Note
A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess.
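Example (a hedged sketch of a custom squared-error objective matching this signature; grad and hess are the per-sample first and second derivatives of the loss):

import numpy as np

def squared_error(y_true, y_pred):
    grad = y_pred - y_true          # d/dpred of 0.5*(pred - y)^2
    hess = np.ones_like(y_pred)     # second derivative is constant 1
    return grad, hess

reg = xgb.XGBRegressor(objective=squared_error, n_estimators=50)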
apply(X, ntree_limit=0)
Return the predicted leaf index of every tree for each sample.
Returns: X_leaves – For each datapoint x in X and for each tree, return the index of the leaf x ends up in. Leaves are numbered within [0; 2**(self.max_depth+1)), possibly with gaps in the numbering.
Return type: array_like, shape=[n_samples, n_trees]
evals_result()
Return the evaluation results.
If eval_set is passed to the fit function, you can call evals_result() to get evaluation results for all passed eval_sets. When eval_metric is also passed to the fit function, the evals_result will contain the eval_metrics passed to the fit function.
Returns: evals_result
Return type: dictionary
Example
param_dist = {'objective':'binary:logistic', 'n_estimators':2}
clf = xgb.XGBModel(**param_dist)
clf.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        eval_metric='logloss',
        verbose=True)
evals_result = clf.evals_result()
The variable evals_result will contain:
{'validation_0': {'logloss': ['0.604835', '0.531479']},
 'validation_1': {'logloss': ['0.41965', '0.17686']}}
feature_importances_
Feature importances property.
Returns: feature_importances_
Return type: array of shape [n_features]
fit(X, y, sample_weight=None, eval_set=None, eval_metric=None, early_stopping_rounds=None, verbose=True, xgb_model=None, sample_weight_eval_set=None)
Fit the gradient boosting model.
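Example (a minimal sketch of fitting with a validation set and early stopping; X_train, y_train, X_val, y_val are assumed to exist):

reg = xgb.XGBRegressor(n_estimators=200, learning_rate=0.1)
reg.fit(X_train, y_train,
        eval_set=[(X_val, y_val)],
        eval_metric='rmse',
        early_stopping_rounds=10,
        verbose=False)
preds = reg.predict(X_val)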
get_booster()
Get the underlying xgboost Booster of this model.
This will raise an exception when fit has not been called.
Returns: booster
Return type: an xgboost Booster of the underlying model
get_params(deep=False)
Get parameters.
get_xgb_params()
Get xgboost type parameters.
load_model(fname)
Load the model from a file.
Parameters: fname (string or a memory buffer) – Input file name or memory buffer (see also save_raw)
predict(data, output_margin=False, ntree_limit=None)
Predict with data.

Note
This function is not thread safe. For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict().

Note
Using predict() with DART booster: if the booster object is DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g.
preds = bst.predict(dtest, ntree_limit=num_round)

Returns: prediction
Return type: numpy array
save_model(fname)
Save the model to a file.
Parameters: fname (string) – Output file name
xgboost.XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
Bases: xgboost.sklearn.XGBModel, object
Implementation of the scikit-learn API for XGBoost classification.
Note
A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess (see the sketch under XGBRegressor above).
apply(X, ntree_limit=0)
Return the predicted leaf index of every tree for each sample.
Returns: X_leaves – For each datapoint x in X and for each tree, return the index of the leaf x ends up in. Leaves are numbered within [0; 2**(self.max_depth+1)), possibly with gaps in the numbering.
Return type: array_like, shape=[n_samples, n_trees]
evals_result()
Return the evaluation results.
If eval_set is passed to the fit function, you can call evals_result() to get evaluation results for all passed eval_sets. When eval_metric is also passed to the fit function, the evals_result will contain the eval_metrics passed to the fit function.
Returns: evals_result
Return type: dictionary
Example
param_dist = {'objective':'binary:logistic', 'n_estimators':2}
clf = xgb.XGBClassifier(**param_dist)
clf.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        eval_metric='logloss',
        verbose=True)
evals_result = clf.evals_result()
The variable evals_result will contain:
{'validation_0': {'logloss': ['0.604835', '0.531479']},
 'validation_1': {'logloss': ['0.41965', '0.17686']}}
feature_importances_
Feature importances property.
Returns: feature_importances_
Return type: array of shape [n_features]
fit(X, y, sample_weight=None, eval_set=None, eval_metric=None, early_stopping_rounds=None, verbose=True, xgb_model=None, sample_weight_eval_set=None)
Fit gradient boosting classifier.
get_booster()
Get the underlying xgboost Booster of this model.
This will raise an exception when fit has not been called.
Returns: booster
Return type: an xgboost Booster of the underlying model
get_params(deep=False)
Get parameters.
get_xgb_params()
Get xgboost type parameters.
load_model(fname)
Load the model from a file.
Parameters: fname (string or a memory buffer) – Input file name or memory buffer (see also save_raw)
predict(data, output_margin=False, ntree_limit=None)
Predict with data.

Note
This function is not thread safe. For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict().

Note
Using predict() with DART booster: if the booster object is DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g.
preds = bst.predict(dtest, ntree_limit=num_round)

Returns: prediction
Return type: numpy array
predict_proba(data, ntree_limit=None)
Predict the probability of each data example being of a given class.

Note
This function is not thread safe. For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict.

Returns: prediction – a numpy array with the probability of each data example being of a given class.
Return type: numpy array
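Example (a minimal sketch; clf is an XGBClassifier that has already been fit, and X_test is assumed to exist):

proba = clf.predict_proba(X_test)   # shape (n_samples, n_classes)
labels = clf.predict(X_test)        # hard class predictions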
save_model(fname)
Save the model to a file.
Parameters: fname (string) – Output file name
Plotting Library.
xgboost.plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='F score', ylabel='Features', importance_type='weight', max_num_features=None, grid=True, show_values=True, **kwargs)
Plot importance based on fitted trees.
Returns: ax
Return type: matplotlib Axes
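Example (a minimal plotting sketch; bst is a trained Booster, and matplotlib must be installed):

import matplotlib.pyplot as plt

ax = xgb.plot_importance(bst, importance_type='weight', max_num_features=10)
plt.show()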
xgboost.plot_tree(booster, fmap='', num_trees=0, rankdir='UT', ax=None, **kwargs)
Plot specified tree.
Returns: ax
Return type: matplotlib Axes
xgboost.to_graphviz(booster, fmap='', num_trees=0, rankdir='UT', yes_color='#0000FF', no_color='#FF0000', **kwargs)
Convert specified tree to graphviz instance. IPython can automatically plot the returned graphviz instance. Otherwise, you should call the .render() method of the returned graphviz instance.
Returns: graph
Return type: graphviz instance
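Example (a minimal sketch of rendering the first tree to a file; requires the graphviz Python package and the Graphviz binaries, and bst is a trained Booster):

graph = xgb.to_graphviz(bst, num_trees=0)
graph.render('tree0')  # writes tree0.pdf by default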