This page gives the Python API reference of xgboost. Please also refer to the Python Package Introduction for more information about the Python package.
Core XGBoost Library.
xgboost.DMatrix(data, label=None, missing=None, weight=None, silent=False, feature_names=None, feature_types=None, nthread=None)
Bases: object
Data Matrix used in XGBoost.
DMatrix is an internal data structure used by XGBoost that is optimized for both memory efficiency and training speed. You can construct a DMatrix from numpy.arrays.
Parameters: 


get_float_info(field)
Get float property from the DMatrix.
Parameters:  field (str) – The field name of the information 

Returns:  info – a numpy array of float information of the data 
Return type:  array 
get_label()
Get the label of the DMatrix.
Returns:  label 

Return type:  array 
get_uint_info(field)
Get unsigned integer property from the DMatrix.
Parameters:  field (str) – The field name of the information

Returns:  info – a numpy array of unsigned integer information of the data
Return type:  array 
get_weight()
Get the weight of the DMatrix.
Returns:  weight 

Return type:  array 
num_col()
Get the number of columns (features) in the DMatrix.
Returns:  number of columns 

Return type:  int 
save_binary(fname, silent=True)
Save DMatrix to an XGBoost buffer.
Parameters: 


set_base_margin(margin)
Set base margin of booster to start from.
This can be used to specify the prediction value of an existing model as the base_margin. Note, however, that the margin is needed rather than the transformed prediction. For example, for logistic regression, supply the value before the logistic transformation. See also example/demo.py.
Parameters:  margin (array like) – Prediction margin of each datapoint
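The difference between a margin and a transformed prediction can be sketched without calling the library at all. The snippet below is only an illustration for the logistic case: the helper names sigmoid and logit and the sample probabilities are assumptions, not part of xgboost.

```python
import math


def sigmoid(margin):
    """Logistic transformation: raw margin -> probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def logit(prob):
    """Inverse logistic transformation: probability -> raw margin."""
    return math.log(prob / (1.0 - prob))


# Hypothetical probabilities predicted by an existing logistic model.
probabilities = [0.1, 0.5, 0.9]

# The untransformed values are what a base margin is expected to hold.
margins = [logit(p) for p in probabilities]

# Round trip: applying the logistic transformation recovers the probabilities.
recovered = [sigmoid(m) for m in margins]
```

In this sketch, margins (not probabilities) would be the values to hand to set_base_margin for a logistic objective.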

set_float_info(field, data)
Set float type property into the DMatrix.
Parameters: 


set_float_info_npy2d(field, data)
Parameters: 


set_group(group)
Set group size of DMatrix (used for ranking).
Parameters:  group (array like) – Group size of each group 
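As a rough illustration of what "group size of each group" means, the sketch below turns a list of group sizes into row ranges. It assumes that each group covers consecutive rows in the order the sizes are given; the helper group_boundaries and the sample sizes are hypothetical and not part of xgboost.

```python
from itertools import accumulate


def group_boundaries(group_sizes):
    """Turn per-group sizes into (start, end) row ranges, end exclusive."""
    ends = list(accumulate(group_sizes))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Three hypothetical query groups with 3, 2 and 4 rows respectively.
group_sizes = [3, 2, 4]
boundaries = group_boundaries(group_sizes)

# The sizes should add up to the number of rows in the data.
total_rows = sum(group_sizes)
```

Here rows 0 to 2 form the first group, rows 3 to 4 the second, and rows 5 to 8 the third.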

set_label(label)
Set label of DMatrix.
Parameters:  label (array like) – The label information to be set into DMatrix 

set_label_npy2d(label)
Set label of DMatrix.
Parameters:  label (array like) – The label information to be set into DMatrix from numpy 2D array 

set_uint_info(field, data)
Set uint type property into the DMatrix.
Parameters: 


set_weight(weight)
Set weight of each instance.
Parameters:  weight (array like) – Weight for each data point 

set_weight_npy2d(weight)
Parameters:  weight (array like) – Weight for each data point in numpy 2D array

xgboost.Booster(params=None, cache=(), model_file=None)
Bases: object
A Booster of XGBoost.
Booster is the model of xgboost that contains low-level routines for training, prediction, and evaluation.
Parameters: 

attr(key)
Get attribute string from the Booster.
Parameters:  key (str) – The key to get attribute from.

Returns:  value – The attribute value of the key, returns None if the attribute does not exist.
Return type:  str 
attributes()
Get attributes stored in the Booster as a dictionary.
Returns:  result – Returns an empty dict if there are no attributes.

Return type:  dictionary of attribute_name: attribute_value pairs of strings. 
boost(dtrain, grad, hess)
Boost the booster for one iteration, with customized gradient statistics.
Parameters: 
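Gradient statistics are usually the first and second derivatives of the loss with respect to the raw prediction. The sketch below computes them for a logistic loss in plain Python and checks the gradient numerically. It is an illustration only: the function names and sample values are assumptions, and with the real library these statistics would be arrays aligned with the rows of dtrain.

```python
import math


def logistic_loss(margin, label):
    """Negative log-likelihood of a 0/1 label given a raw margin."""
    prob = 1.0 / (1.0 + math.exp(-margin))
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def logistic_grad_hess(margins, labels):
    """Per-row first and second derivatives of the logistic loss."""
    grad, hess = [], []
    for m, y in zip(margins, labels):
        p = 1.0 / (1.0 + math.exp(-m))
        grad.append(p - y)          # d loss / d margin
        hess.append(p * (1.0 - p))  # d^2 loss / d margin^2
    return grad, hess


# Hypothetical raw margins and labels for three rows.
margins = [-1.0, 0.0, 2.0]
labels = [0.0, 1.0, 1.0]
grad, hess = logistic_grad_hess(margins, labels)

# Central finite differences as a sanity check on the analytic gradient.
eps = 1e-6
numeric_grad = [
    (logistic_loss(m + eps, y) - logistic_loss(m - eps, y)) / (2 * eps)
    for m, y in zip(margins, labels)
]
```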

copy()
Copy the booster object.
Returns:  booster – a copied booster model 

Return type:  Booster 
dump_model(fout, fmap='', with_stats=False)
Dump model into a text file.
Parameters: 


eval(data, name='eval', iteration=0)
Evaluate the model on data.
Parameters:  

Returns:  result – Evaluation result string. 
Return type: 
eval_set(evals, iteration=0, feval=None)
Evaluate a set of data.
Parameters:  

Returns:  result – Evaluation result string. 
Return type: 
get_dump(fmap='', with_stats=False, dump_format='text')
Returns the model dump as a list of strings.
get_fscore(fmap='')
Get feature importance of each feature.
Parameters:  fmap (str (optional)) – The name of feature map file 

get_score(fmap='', importance_type='weight')
Get feature importance of each feature. Importance type can be defined as:
Parameters:  fmap (str (optional)) – The name of feature map file 

get_split_value_histogram(feature, fmap='', bins=None, as_pandas=True)
Get split value histogram of a feature.
Parameters: 


Returns: 

load_model(fname)
Load the model from a file.
Parameters:  fname (string or a memory buffer) – Input file name or memory buffer (see also save_raw)

load_rabit_checkpoint()
Initialize the model by loading from a rabit checkpoint.
Returns:  version – The version number of the model. 

Return type:  integer 
predict(data, output_margin=False, ntree_limit=0, pred_leaf=False, pred_contribs=False, approx_contribs=False, pred_interactions=False, validate_features=True)
Predict with data.
Note
This function is not thread safe. For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call bst.copy() to make copies of the model object and then call predict().
Note
Using predict() with DART booster
If the booster object is DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g.
preds = bst.predict(dtest, ntree_limit=num_round)
Parameters: 


Returns:  prediction 
Return type:  numpy array 
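The "one copy per thread" pattern from the thread-safety note can be sketched as follows. To keep the snippet self-contained it uses StandInBooster, a hypothetical stand-in class that only mimics the copy() and predict() method names; it is not the xgboost Booster, and the weights and data chunks are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class StandInBooster:
    """Hypothetical stand-in for a trained model; NOT the xgboost Booster."""

    def __init__(self, weights):
        self.weights = list(weights)

    def copy(self):
        # Each copy owns its own state, so threads never share one object.
        return StandInBooster(self.weights)

    def predict(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) for row in rows]


def predict_chunk(model_copy, rows):
    """Run prediction on one chunk using a copy private to this task."""
    return model_copy.predict(rows)


bst = StandInBooster([0.5, -1.0])
chunks = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0]]]

# Make one copy per chunk up front, then give each worker its own copy.
copies = [bst.copy() for _ in chunks]

with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    results = list(pool.map(predict_chunk, copies, chunks))
```

With a real booster the same shape would apply: copy first, then call predict on each copy from exactly one thread.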
save_model(fname)
Save the model to a file.
Parameters:  fname (string) – Output file name 

save_rabit_checkpoint()
Save the current booster to a rabit checkpoint.
save_raw()
Save the model to an in-memory buffer representation.
Returns:  

Return type:  an in-memory buffer representation of the model
set_attr(**kwargs)
Set the attribute of the Booster.
Parameters:  **kwargs – The attributes to set. Setting a value to None deletes an attribute. 

set_param(params, value=None)
Set parameters into the Booster.
Parameters: 


Training Library containing training routines.
xgboost.train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None, maximize=False, early_stopping_rounds=None, evals_result=None, verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None)
Train a booster with given parameters.
Parameters: 


Returns:  booster 
Return type:  a trained booster model 
xgboost.cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False, folds=None, metrics=(), obj=None, feval=None, maximize=False, early_stopping_rounds=None, fpreproc=None, as_pandas=True, verbose_eval=None, show_stdv=True, seed=0, callbacks=None, shuffle=True)
Cross-validation with given parameters.
Parameters: 


Returns:  evaluation history 
Return type:  list(string) 
Scikit-Learn Wrapper interface for XGBoost.
xgboost.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='reg:linear', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
Bases: xgboost.sklearn.XGBModel, object
Implementation of the scikit-learn API for XGBoost regression.
Parameters: 


Note
A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess.
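A function with the signature described in the note could look like the sketch below, which uses squared error as the loss. This is an illustration only: the name squared_error_objective and the sample values are assumptions, and plain lists are used to keep the snippet self-contained, whereas the library would normally pass array-like inputs.

```python
def squared_error_objective(y_true, y_pred):
    """Gradient and Hessian of 0.5 * (y_pred - y_true) ** 2 for each sample."""
    # First derivative with respect to the prediction.
    grad = [pred - true for true, pred in zip(y_true, y_pred)]
    # Second derivative is constant for squared error.
    hess = [1.0 for _ in y_true]
    return grad, hess


# Hypothetical targets and current predictions.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]
grad, hess = squared_error_objective(y_true, y_pred)
```

The two returned sequences have one entry per sample, matching the grad, hess pair named in the signature.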
apply(X, ntree_limit=0)
Return the predicted leaf of every tree for each sample.
Parameters: 


Returns:  X_leaves – For each datapoint x in X and for each tree, return the index of the leaf x ends up in. Leaves are numbered within

Return type:  array_like, shape=[n_samples, n_trees] 
evals_result()
Return the evaluation results.
If eval_set is passed to the fit function, you can call evals_result() to get evaluation results for all passed eval_sets. When eval_metric is also passed to the fit function, the evals_result will contain the eval_metrics passed to the fit function.
Returns:  evals_result 

Return type:  dictionary 
Example
import xgboost as xgb

# X_train, y_train, X_test and y_test are assumed to be defined already
param_dist = {'objective': 'binary:logistic', 'n_estimators': 2}
clf = xgb.XGBModel(**param_dist)
clf.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        eval_metric='logloss',
        verbose=True)
evals_result = clf.evals_result()
The variable evals_result will contain:
{'validation_0': {'logloss': ['0.604835', '0.531479']},
 'validation_1': {'logloss': ['0.41965', '0.17686']}}
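Once such a dictionary is available, it can be processed with ordinary Python. The sketch below reuses the sample values shown above and finds, for each validation set, the round with the lowest logloss. The helper best_round is hypothetical, and the float() conversion is there because the sample shows the values as strings.

```python
# Sample structure copied from the example output above.
evals_result = {
    'validation_0': {'logloss': ['0.604835', '0.531479']},
    'validation_1': {'logloss': ['0.41965', '0.17686']},
}


def best_round(history):
    """Return (round_index, value) of the lowest metric value in one history."""
    values = [float(v) for v in history]
    index = min(range(len(values)), key=values.__getitem__)
    return index, values[index]


# Lower is better for logloss, so take the minimum per validation set.
summary = {
    name: {metric: best_round(history) for metric, history in metrics.items()}
    for name, metrics in evals_result.items()
}
```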
feature_importances_
Feature importances property
Returns:  feature_importances_ 

Return type:  array of shape [n_features] 
fit(X, y, sample_weight=None, eval_set=None, eval_metric=None, early_stopping_rounds=None, verbose=True, xgb_model=None, sample_weight_eval_set=None)
Fit the gradient boosting model.
Parameters: 


get_booster()
Get the underlying xgboost Booster of this model.
This will raise an exception if fit has not been called.
Returns:  booster

Return type:  an xgboost Booster of the underlying model
get_params(deep=False)
Get parameters.
get_xgb_params()
Get xgboost type parameters.
predict(data, output_margin=False, ntree_limit=0)
Predict with data.
Note
This function is not thread safe. For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict().
Note
Using predict() with DART booster
If the booster object is DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g.
preds = bst.predict(dtest, ntree_limit=num_round)
Parameters:  

Returns:  prediction 
Return type:  numpy array 
xgboost.XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
Bases: xgboost.sklearn.XGBModel, object
Implementation of the scikit-learn API for XGBoost classification.
Parameters: 


Note
A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess.
apply(X, ntree_limit=0)
Return the predicted leaf of every tree for each sample.
Parameters: 


Returns:  X_leaves – For each datapoint x in X and for each tree, return the index of the leaf x ends up in. Leaves are numbered within

Return type:  array_like, shape=[n_samples, n_trees] 
evals_result()
Return the evaluation results.
If eval_set is passed to the fit function, you can call evals_result() to get evaluation results for all passed eval_sets. When eval_metric is also passed to the fit function, the evals_result will contain the eval_metrics passed to the fit function.
Returns:  evals_result 

Return type:  dictionary 
Example
import xgboost as xgb

# X_train, y_train, X_test and y_test are assumed to be defined already
param_dist = {'objective': 'binary:logistic', 'n_estimators': 2}
clf = xgb.XGBClassifier(**param_dist)
clf.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        eval_metric='logloss',
        verbose=True)
evals_result = clf.evals_result()
The variable evals_result will contain:
{'validation_0': {'logloss': ['0.604835', '0.531479']},
 'validation_1': {'logloss': ['0.41965', '0.17686']}}
feature_importances_
Feature importances property
Returns:  feature_importances_ 

Return type:  array of shape [n_features] 
fit(X, y, sample_weight=None, eval_set=None, eval_metric=None, early_stopping_rounds=None, verbose=True, xgb_model=None, sample_weight_eval_set=None)
Fit gradient boosting classifier.
Parameters: 


get_booster()
Get the underlying xgboost Booster of this model.
This will raise an exception if fit has not been called.
Returns:  booster

Return type:  an xgboost Booster of the underlying model
get_params(deep=False)
Get parameters.
get_xgb_params()
Get xgboost type parameters.
predict(data, output_margin=False, ntree_limit=0)
Predict with data.
Note
This function is not thread safe. For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict().
Note
Using predict() with DART booster
If the booster object is DART type, predict() will perform dropouts, i.e. only some of the trees will be evaluated. This will produce incorrect results if data is not the training data. To obtain correct results on test sets, set ntree_limit to a nonzero value, e.g.
preds = bst.predict(dtest, ntree_limit=num_round)
Parameters:  

Returns:  prediction 
Return type:  numpy array 
predict_proba(data, ntree_limit=0)
Predict the probability of each data example being of a given class.
Note
This function is not thread safe. For each booster object, predict can only be called from one thread. If you want to run prediction using multiple threads, call xgb.copy() to make copies of the model object and then call predict.
Parameters:  

Returns:  prediction – a numpy array with the probability of each data example being of a given class. 
Return type:  numpy array 
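For a binary problem, "the probability of each data example being of a given class" can be pictured as one row per example and one column per class. The sketch below builds such rows from positive-class probabilities. The layout, the helper binary_class_probabilities, and the sample values are assumptions made for illustration; they are not taken from this reference.

```python
def binary_class_probabilities(positive_probs):
    """Expand P(class 1) values into [P(class 0), P(class 1)] rows."""
    return [[1.0 - p, p] for p in positive_probs]


# Hypothetical positive-class probabilities for three examples.
positive_probs = [0.25, 0.75, 0.875]
proba = binary_class_probabilities(positive_probs)

# The most probable class per example is the column with the largest value.
predicted_class = [row.index(max(row)) for row in proba]
```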
Plotting Library.
xgboost.plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='F score', ylabel='Features', importance_type='weight', max_num_features=None, grid=True, show_values=True, **kwargs)
Plot importance based on fitted trees.
Parameters: 


Returns:  ax 
Return type:  matplotlib Axes 
xgboost.plot_tree(booster, fmap='', num_trees=0, rankdir='UT', ax=None, **kwargs)
Plot specified tree.
Parameters: 


Returns:  ax 
Return type:  matplotlib Axes 
xgboost.to_graphviz(booster, fmap='', num_trees=0, rankdir='UT', yes_color='#0000FF', no_color='#FF0000', **kwargs)
Convert specified tree to graphviz instance. IPython can automatically plot the returned graphviz instance. Otherwise, you should call the .render() method of the returned graphviz instance.
Parameters: 


Returns:  graph
Return type:  graphviz instance