Python API Reference
This page gives the Python API reference of xgboost. Please also refer to the Python Package Introduction for more information about the Python package.
Core Data Structure
Core XGBoost Library.

class xgboost.DMatrix(data, label=None, missing=None, weight=None, silent=False, feature_names=None, feature_types=None, nthread=None)
Bases: object
Data Matrix used in XGBoost.
DMatrix is an internal data structure used by XGBoost,
optimized for both memory efficiency and training speed.
You can construct a DMatrix from numpy arrays.
Parameters: 
 data (string/numpy array/scipy.sparse/pd.DataFrame/DataTable) – Data source of DMatrix.
When data is a string, it is the path to a libsvm-format text file
or a binary file that xgboost can read from.
 label (list or numpy 1D array, optional) – Label of the training data.
 missing (float, optional) – Value in the data to be treated as a missing value. If
None, defaults to np.nan.
 weight (list or numpy 1D array , optional) – Weight for each instance.
 silent (boolean, optional) – If True, suppress messages during construction.
 feature_names (list, optional) – Set names for features.
 feature_types (list, optional) – Set types for features.
 nthread (integer, optional) – Number of threads to use for loading data from a numpy array. If -1,
uses the maximum number of threads available on the system.


feature_names
Get feature names (column labels).
Returns:  feature_names 
Return type:  list or None 

feature_types
Get feature types (column types).
Returns:  feature_types 
Return type:  list or None 

get_base_margin
()
Get the base margin of the DMatrix.
Returns:  base_margin 
Return type:  float 

get_float_info
(field)
Get float property from the DMatrix.
Parameters:  field (str) – The field name of the information 
Returns:  info – a numpy array of float information of the data 
Return type:  array 

get_label
()
Get the label of the DMatrix.
Returns:  label 
Return type:  array 

get_uint_info
(field)
Get unsigned integer property from the DMatrix.
Parameters:  field (str) – The field name of the information 
Returns:  info – a numpy array of unsigned integer information of the data 
Return type:  array 

get_weight
()
Get the weight of the DMatrix.
Returns:  weight 
Return type:  array 

num_col
()
Get the number of columns (features) in the DMatrix.
Returns:  number of columns 
Return type:  int 

num_row
()
Get the number of rows in the DMatrix.
Returns:  number of rows 
Return type:  int 

save_binary
(fname, silent=True)
Save DMatrix to an XGBoost buffer.
Parameters: 
 fname (string) – Name of the output buffer file.
 silent (bool (optional; default: True)) – If set, the output is suppressed.


set_base_margin
(margin)
Set the base margin for the booster to start from.
This can be used to supply the predictions of an existing model as the base margin.
Note that the margin is required, not the transformed prediction: for logistic
regression, for example, supply the value before the logistic transformation.
See also example/demo.py.
Parameters:  margin (array like) – Prediction margin of each datapoint 
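To illustrate the difference between a margin and a transformed prediction, here is a minimal sketch for the logistic case. The helper names sigmoid and logit and the sample probabilities are illustrative assumptions, not part of xgboost. The point is only that a probability has to be mapped back through the inverse logistic transformation before it is used as a margin.

```python
import math


def sigmoid(margin):
    """Logistic transformation: margin -> probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def logit(prob):
    """Inverse logistic transformation: probability -> margin."""
    return math.log(prob / (1.0 - prob))


# Probabilities predicted by some existing model (made-up values).
probs = [0.5, 0.9, 0.2]

# These are the values to supply as the margin, not the probabilities.
margins = [logit(p) for p in probs]
```

Applying sigmoid to each entry of margins recovers the original probabilities, which is a quick way to check that the right quantity is being passed.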

set_float_info
(field, data)
Set float type property into the DMatrix.
Parameters: 
 field (str) – The field name of the information
 data (numpy array) – The array of data to be set


set_float_info_npy2d
(field, data)
Set float type property into the DMatrix for numpy 2D array input.
Parameters: 
 field (str) – The field name of the information
 data (numpy array) – The array of data to be set


set_group
(group)
Set group size of DMatrix (used for ranking).
Parameters:  group (array like) – Group size of each group 
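As a sketch of what "group size of each group" means, the snippet below derives group sizes from a per-row query id column. It assumes that rows belonging to the same query are stored next to each other; the function name group_sizes and the query ids are hypothetical.

```python
from itertools import groupby


def group_sizes(query_ids):
    """Count consecutive rows that share the same query id."""
    return [len(list(rows)) for _, rows in groupby(query_ids)]


# One query id per row; rows of the same query are adjacent (assumption).
query_ids = [7, 7, 7, 3, 3, 9]

# Three groups with 3, 2 and 1 rows respectively.
sizes = group_sizes(query_ids)
```

The sizes add up to the number of rows, so the resulting list describes how the rows are partitioned into groups.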

set_label
(label)
Set the label of the DMatrix.
Parameters:  label (array like) – The label information to be set into DMatrix 

set_label_npy2d
(label)
Set the label of the DMatrix from a numpy 2D array.
Parameters:  label (array like) – The label information to be set into DMatrix 

set_uint_info
(field, data)
Set uint type property into the DMatrix.
Parameters: 
 field (str) – The field name of the information
 data (numpy array) – The array of data to be set


set_weight
(weight)
Set weight of each instance.
Parameters:  weight (array like) – Weight for each data point 

set_weight_npy2d
(weight)
Set the weight of each instance from a numpy 2D array.
Parameters:  weight (array like) – Weight for each data point 

slice
(rindex)
Slice the DMatrix and return a new DMatrix that only contains rindex.
Parameters:  rindex (list) – List of indices to be selected. 
Returns:  res – A new DMatrix containing only selected indices. 
Return type:  DMatrix 

class xgboost.Booster(params=None, cache=(), model_file=None)
Bases: object
A Booster of XGBoost.
Booster is the model of xgboost. It contains low-level routines for
training, prediction and evaluation.
Parameters: 
 params (dict) – Parameters for boosters.
 cache (list) – List of cache items.
 model_file (string) – Path to the model file.


attr
(key)
Get attribute string from the Booster.
Parameters:  key (str) – The key to get attribute from. 
Returns:  value – The attribute value of the key, or None if the attribute does not exist. 
Return type:  str 

attributes
()
Get attributes stored in the Booster as a dictionary.
Returns:  result – Returns an empty dict if there are no attributes. 
Return type:  dictionary of attribute_name: attribute_value pairs of strings. 

boost
(dtrain, grad, hess)
Boost the booster for one iteration, with customized gradient statistics.
Parameters: 
 dtrain (DMatrix) – The training DMatrix.
 grad (list) – The first order of gradient.
 hess (list) – The second order of gradient.


copy
()
Copy the booster object.
Returns:  booster – a copied booster model 
Return type:  Booster 

dump_model
(fout, fmap='', with_stats=False)
Dump model into a text file.
Parameters: 
 fout (string) – Output file name.
 fmap (string, optional) – Name of the file containing feature map names.
 with_stats (bool (optional)) – Controls whether the split statistics are output.


eval
(data, name='eval', iteration=0)
Evaluate the model on the given data.
Parameters: 
 data (DMatrix) – The dmatrix storing the input.
 name (str, optional) – The name of the dataset.
 iteration (int, optional) – The current iteration number.

Returns:  result – Evaluation result string.

Return type:  str


eval_set
(evals, iteration=0, feval=None)
Evaluate a set of data.
Parameters: 
 evals (list of tuples (DMatrix, string)) – List of items to be evaluated.
 iteration (int) – Current iteration.
 feval (function) – Custom evaluation function.

Returns:  result – Evaluation result string.

Return type:  str


get_dump
(fmap='', with_stats=False, dump_format='text')
Return the model dump as a list of strings.

get_fscore
(fmap='')
Get feature importance of each feature.
Parameters:  fmap (str (optional)) – The name of feature map file 

get_score
(fmap='', importance_type='weight')
Get feature importance of each feature.
Importance type can be defined as:
'weight' - the number of times a feature is used to split the data across all trees.
'gain' - the average gain of the feature when it is used in trees.
'cover' - the average coverage of the feature when it is used in trees.
Parameters:  fmap (str (optional)) – The name of feature map file 
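The three importance types can be sketched on a toy list of splits. This is not how xgboost computes the scores internally; the function importance, the (feature, gain, cover) record layout and the numbers are assumptions made for illustration, following the definitions given above.

```python
from collections import defaultdict


def importance(splits, importance_type="weight"):
    """Aggregate per-split records into a per-feature importance score."""
    if importance_type not in ("weight", "gain", "cover"):
        raise ValueError("unknown importance_type: %r" % importance_type)
    counts = defaultdict(int)
    gain_sum = defaultdict(float)
    cover_sum = defaultdict(float)
    for feature, gain, cover in splits:
        counts[feature] += 1
        gain_sum[feature] += gain
        cover_sum[feature] += cover
    if importance_type == "weight":
        # Number of times the feature is used to split.
        return dict(counts)
    # Average gain or average cover over the splits that use the feature.
    totals = gain_sum if importance_type == "gain" else cover_sum
    return {f: totals[f] / counts[f] for f in counts}


# Hypothetical splits collected across all trees: (feature, gain, cover).
splits = [("f0", 4.0, 10.0), ("f1", 1.0, 6.0), ("f0", 2.0, 4.0)]

by_weight = importance(splits, "weight")
by_gain = importance(splits, "gain")
by_cover = importance(splits, "cover")
```

A feature that is used often but contributes little per split ranks high by weight and low by gain, which is why the three types can order features differently.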

get_split_value_histogram
(feature, fmap='', bins=None, as_pandas=True)
Get split value histogram of a feature.
Parameters: 
 feature (str) – The name of the feature.
 fmap (str (optional)) – The name of feature map file.
 bins (int, default None) – The maximum number of bins.
The number of bins equals the number of unique split values n_unique
if bins == None or bins > n_unique.
 as_pandas (bool, default True) – Return pd.DataFrame when pandas is installed.
If False or pandas is not installed, return numpy ndarray.

Returns:  A histogram of used splitting values for the specified feature,
either as numpy array or pandas DataFrame.


load_model
(fname)
Load the model from a file.
The model is loaded from an XGBoost internal binary format which is
universal among the various XGBoost interfaces. Auxiliary attributes of
the Python Booster object (such as feature_names) will not be loaded.
To preserve all attributes, pickle the Booster object.
Parameters:  fname (string or a memory buffer) – Input file name or memory buffer (see also save_raw) 

load_rabit_checkpoint
()
Initialize the model by loading from a rabit checkpoint.
Returns:  version – The version number of the model. 
Return type:  integer 

predict
(data, output_margin=False, ntree_limit=0, pred_leaf=False, pred_contribs=False, approx_contribs=False, pred_interactions=False, validate_features=True)
Predict with data.
NOTE: This function is not thread safe.
For each booster object, predict can only be called from one thread.
If you want to run prediction using multiple threads, call bst.copy() to make copies
of the model object and then call predict on each copy.
Parameters: 
 data (DMatrix) – The dmatrix storing the input.
 output_margin (bool) – Whether to output the raw untransformed margin value.
 ntree_limit (int) – Limit number of trees in the prediction; defaults to 0 (use all trees).
 pred_leaf (bool) – When this option is on, the output will be a matrix of (nsample, ntrees)
with each record indicating the predicted leaf index of each sample in each tree.
Note that the leaf index of a tree is unique per tree, so you may find leaf 1
in both tree 1 and tree 0.
 pred_contribs (bool) – When this is True the output will be a matrix of size (nsample, nfeats + 1)
with each record indicating the feature contributions (SHAP values) for that
prediction. The sum of all feature contributions is equal to the raw untransformed
margin value of the prediction. Note the final column is the bias term.
 approx_contribs (bool) – Approximate the contributions of each feature
 pred_interactions (bool) – When this is True the output will be a matrix of size (nsample, nfeats + 1, nfeats + 1)
indicating the SHAP interaction values for each pair of features. The sum of each
row (or column) of the interaction values equals the corresponding SHAP value (from
pred_contribs), and the sum of the entire matrix equals the raw untransformed margin
value of the prediction. Note the last row and column correspond to the bias term.
 validate_features (bool) – When this is True, validate that the Booster’s and data’s feature_names are identical.
Otherwise, it is assumed that the feature_names are the same.

Returns:  prediction

Return type:  numpy array
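The additivity properties stated for pred_contribs and pred_interactions can be checked with a few lines of code. The sketch below uses made-up numbers for a single sample with two features plus the bias term; the helper names are hypothetical and nothing here calls xgboost.

```python
import math


def contribs_match_margin(contribs_row, margin):
    """The contributions (bias last) should sum to the raw margin."""
    return math.isclose(sum(contribs_row), margin, abs_tol=1e-9)


def interactions_match_contribs(interactions, contribs_row):
    """Each row of the interaction matrix should sum to one contribution."""
    return all(
        math.isclose(sum(row), contrib, abs_tol=1e-9)
        for row, contrib in zip(interactions, contribs_row)
    )


# Made-up values for one sample: 2 features + bias (last row/column).
interactions = [
    [0.2, 0.1, 0.0],
    [0.1, -0.4, 0.0],
    [0.0, 0.0, 0.5],
]
contribs = [0.3, -0.3, 0.5]
margin = 0.5

rows_ok = interactions_match_contribs(interactions, contribs)
margin_ok = contribs_match_margin(contribs, margin)
```

With real output, the same checks would be applied per sample to the arrays returned with pred_contribs=True and pred_interactions=True, and compared against the prediction obtained with output_margin=True.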


save_model
(fname)
Save the model to a file.
The model is saved in an XGBoost internal binary format which is
universal among the various XGBoost interfaces. Auxiliary attributes of
the Python Booster object (such as feature_names) will not be saved.
To preserve all attributes, pickle the Booster object.
Parameters:  fname (string) – Output file name 

save_rabit_checkpoint
()
Save the current booster to rabit checkpoint.

save_raw
()
Save the model to an in-memory buffer representation.
Returns:  an in-memory buffer representation of the model 

set_attr
(**kwargs)
Set the attribute of the Booster.
Parameters:  **kwargs – The attributes to set. Setting a value to None deletes an attribute. 

set_param
(params, value=None)
Set parameters into the Booster.
Parameters: 
 params (dict/list/str) – list of key,value pairs, dict of key to value or simply str key
 value (optional) – value of the specified parameter, when params is str key
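The three accepted forms of params can be thought of as different spellings of the same list of key, value pairs. The sketch below shows one way such input could be normalized. It is an illustration under that assumption, not xgboost's actual implementation, and normalize_params is a hypothetical name.

```python
def normalize_params(params, value=None):
    """Turn a dict, a list of pairs, or a single str key into a list of pairs."""
    if isinstance(params, dict):
        return list(params.items())
    if isinstance(params, str):
        # A single key: the value comes from the second argument.
        return [(params, value)]
    # Already a sequence of (key, value) pairs.
    return list(params)


from_dict = normalize_params({"eta": 0.1})
from_str = normalize_params("eta", 0.1)
from_list = normalize_params([("eval_metric", "auc"), ("eval_metric", "logloss")])
```

One practical difference is that a list of pairs can carry the same key more than once, which a dict cannot.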


update
(dtrain, iteration, fobj=None)
Update for one iteration, with objective function calculated internally.
Parameters: 
 dtrain (DMatrix) – Training data.
 iteration (int) – Current iteration number.
 fobj (function) – Customized objective function.

Learning API
Training Library containing training routines.

xgboost.train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None, maximize=False, early_stopping_rounds=None, evals_result=None, verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None)
Train a booster with given parameters.
Parameters: 
 params (dict) – Booster params.
 dtrain (DMatrix) – Data to be trained.
 num_boost_round (int) – Number of boosting iterations.
 evals (list of pairs (DMatrix, string)) – List of items to be evaluated during training. This allows the user to watch
performance on the validation set.
 obj (function) – Customized objective function.
 feval (function) – Customized evaluation function.
 maximize (bool) – Whether to maximize feval.
 early_stopping_rounds (int) – Activates early stopping. Validation error needs to decrease at least
every <early_stopping_rounds> round(s) to continue training.
Requires at least one item in evals.
If there’s more than one, will use the last.
Returns the model from the last iteration (not the best one).
If early stopping occurs, the model will have three additional fields:
bst.best_score, bst.best_iteration and bst.best_ntree_limit.
(Use bst.best_ntree_limit to get the correct value if num_parallel_tree
and/or num_class appears in the parameters)
 evals_result (dict) –
This dictionary stores the evaluation results of all the items in the watchlist.
Example: with a watchlist containing [(dtest, 'eval'), (dtrain, 'train')] and
a parameter containing ('eval_metric': 'logloss'), evals_result is filled with:
{'train': {'logloss': ['0.48253', '0.35953']},
'eval': {'logloss': ['0.480385', '0.357756']}}
 verbose_eval (bool or int) – Requires at least one item in evals.
If verbose_eval is True then the evaluation metric on the validation set is
printed at each boosting stage.
If verbose_eval is an integer then the evaluation metric on the validation set
is printed at every given verbose_eval boosting stage. The last boosting stage
/ the boosting stage found by using early_stopping_rounds is also printed.
Example: with verbose_eval=4 and at least one item in evals, an evaluation metric
is printed every 4 boosting stages, instead of every boosting stage.
 learning_rates (list or function (deprecated - use callback API instead)) – List of learning rates for each boosting round,
or a customized function that calculates eta from the
current round number and the total number of boosting rounds (e.g. to yield
learning rate decay).
 xgb_model (file name of stored xgb model or 'Booster' instance) – Xgb model to be loaded before training (allows training continuation).
 callbacks (list of callback functions) – List of callback functions that are applied at end of each iteration.
It is possible to use predefined callbacks by using xgb.callback module.
Example: [xgb.callback.reset_learning_rate(custom_rates)]
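As a sketch of the function form of learning_rates described above, the snippet below defines a schedule that takes the current round and the total number of rounds and returns eta. The name linear_decay, the constant BASE_ETA and the linear shape of the decay are assumptions chosen for illustration.

```python
BASE_ETA = 0.3  # starting learning rate (illustrative value)


def linear_decay(current_round, total_rounds):
    """Return eta for the given round, shrinking linearly towards zero."""
    return BASE_ETA * (1.0 - current_round / total_rounds)


num_boost_round = 5

# The same schedule expressed in the list form: one learning rate per round.
schedule = [linear_decay(i, num_boost_round) for i in range(num_boost_round)]
```

Either linear_decay itself or the precomputed schedule list matches the two shapes the parameter description allows.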

Returns:  booster

Return type:  a trained booster model
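The early stopping rule described for early_stopping_rounds can be sketched on a plain list of validation errors. This is a simplified illustration that assumes a metric to be minimized; it is not xgboost's implementation, and early_stop is a hypothetical name.

```python
def early_stop(errors, early_stopping_rounds):
    """Return (last_iteration, best_iteration, best_score) for a run."""
    best_score = float("inf")
    best_iteration = 0
    for i, err in enumerate(errors):
        if err < best_score:
            # The validation error improved: remember this round.
            best_score, best_iteration = err, i
        elif i - best_iteration >= early_stopping_rounds:
            # No improvement for early_stopping_rounds rounds: stop here.
            return i, best_iteration, best_score
    return len(errors) - 1, best_iteration, best_score


# Made-up validation errors, one per boosting round.
errors = [0.50, 0.40, 0.41, 0.42, 0.43]
last_iteration, best_iteration, best_score = early_stop(errors, 2)
```

In this run training stops at round 3 while the best round is 1, which mirrors the remark that the returned model is the one from the last iteration rather than the best one.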


xgboost.cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False, folds=None, metrics=(), obj=None, feval=None, maximize=False, early_stopping_rounds=None, fpreproc=None, as_pandas=True, verbose_eval=None, show_stdv=True, seed=0, callbacks=None, shuffle=True)
Cross-validation with given parameters.
Parameters: 
 params (dict) – Booster params.
 dtrain (DMatrix) – Data to be trained.
 num_boost_round (int) – Number of boosting iterations.
 nfold (int) – Number of folds in CV.
 stratified (bool) – Perform stratified sampling.
 folds (a KFold or StratifiedKFold instance or list of fold indices) – Sklearn KFolds or StratifiedKFolds object.
Alternatively, sample indices may be passed explicitly for each fold.
For n folds, folds should be a length n list of tuples.
Each tuple is (in, out), where in is a list of indices to be used
as the training samples for the n-th fold and out is a list of
indices to be used as the testing samples for the n-th fold.
 metrics (string or list of strings) – Evaluation metrics to be watched in CV.
 obj (function) – Custom objective function.
 feval (function) – Custom evaluation function.
 maximize (bool) – Whether to maximize feval.
 early_stopping_rounds (int) – Activates early stopping. CV error needs to decrease at least
every <early_stopping_rounds> round(s) to continue.
Last entry in evaluation history is the one from best iteration.
 fpreproc (function) – Preprocessing function that takes (dtrain, dtest, param) and returns
transformed versions of those.
 as_pandas (bool, default True) – Return pd.DataFrame when pandas is installed.
If False or pandas is not installed, return np.ndarray
 verbose_eval (bool, int, or None, default None) – Whether to display the progress. If None, progress will be displayed
when np.ndarray is returned. If True, progress will be displayed at
every boosting stage. If an integer is given, progress will be displayed
at every given verbose_eval boosting stage.
 show_stdv (bool, default True) – Whether to display the standard deviation in progress.
Results are not affected, and always contain std.
 seed (int) – Seed used to generate the folds (passed to numpy.random.seed).
 callbacks (list of callback functions) – List of callback functions that are applied at end of each iteration.
It is possible to use predefined callbacks by using xgb.callback module.
Example: [xgb.callback.reset_learning_rate(custom_rates)]
 shuffle (bool) – Shuffle data before creating folds.

Returns:  evaluation history

Return type:  list(string)
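The explicit form of the folds argument of cv, a list of (in, out) index tuples, can be built by hand. The sketch below is one simple way to do it; make_folds is a hypothetical helper and the round-robin assignment of rows to folds is an arbitrary choice for illustration.

```python
def make_folds(n_samples, nfold):
    """Build a list of (in, out) index tuples, one tuple per fold."""
    indices = list(range(n_samples))
    folds = []
    for k in range(nfold):
        # Every nfold-th row, starting at k, is held out for testing.
        out_idx = indices[k::nfold]
        held_out = set(out_idx)
        in_idx = [i for i in indices if i not in held_out]
        folds.append((in_idx, out_idx))
    return folds


folds = make_folds(6, 3)
```

Each row index appears in the out list of exactly one fold, and the in and out lists of a fold never overlap.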

Scikit-Learn API
Scikit-Learn Wrapper interface for XGBoost.

class xgboost.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='reg:linear', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
Bases: xgboost.sklearn.XGBModel, object
Implementation of the scikit-learn API for XGBoost regression.
 Parameters
 max_depth : int
 Maximum tree depth for base learners.
 learning_rate : float
 Boosting learning rate (xgb’s “eta”)
 n_estimators : int
 Number of boosted trees to fit.
 silent : boolean
 Whether to print messages while running boosting.
 objective : string or callable
 Specify the learning task and the corresponding learning objective or
a custom objective function to be used (see note below).
 booster: string
 Specify which booster to use: gbtree, gblinear or dart.
 nthread : int
 Number of parallel threads used to run xgboost. (Deprecated, please use n_jobs)
 n_jobs : int
 Number of parallel threads used to run xgboost. (replaces nthread)
 gamma : float
 Minimum loss reduction required to make a further partition on a leaf node of the tree.
 min_child_weight : int
 Minimum sum of instance weight(hessian) needed in a child.
 max_delta_step : int
 Maximum delta step we allow each tree’s weight estimation to be.
 subsample : float
 Subsample ratio of the training instance.
 colsample_bytree : float
 Subsample ratio of columns when constructing each tree.
 colsample_bylevel : float
 Subsample ratio of columns for each split, in each level.
 reg_alpha : float (xgb’s alpha)
 L1 regularization term on weights
 reg_lambda : float (xgb’s lambda)
 L2 regularization term on weights
 scale_pos_weight : float
 Balancing of positive and negative weights.
 base_score:
 The initial prediction score of all instances, global bias.
 seed : int
 Random number seed. (Deprecated, please use random_state)
 random_state : int
 Random number seed. (replaces seed)
 missing : float, optional
 Value in the data to be treated as a missing value. If
None, defaults to np.nan.
 **kwargs : dict, optional
Keyword arguments for XGBoost Booster object. Full documentation of parameters can
be found here: https://github.com/dmlc/xgboost/blob/master/doc/parameter.md.
Attempting to set a parameter via the constructor args and **kwargs dict simultaneously
will result in a TypeError.
Note:
**kwargs is unsupported by Sklearn. We do not guarantee that parameters passed via
this argument will interact properly with Sklearn.
Note
A custom objective function can be provided for the objective
parameter. In this case, it should have the signature
objective(y_true, y_pred) -> grad, hess:
 y_true: array_like of shape [n_samples]
 The target values
 y_pred: array_like of shape [n_samples]
 The predicted values
 grad: array_like of shape [n_samples]
 The value of the gradient for each sample point.
 hess: array_like of shape [n_samples]
 The value of the second derivative for each sample point
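A custom objective with the signature above can be sketched for the squared error loss 0.5 * (y_pred - y_true) ** 2. The function name is hypothetical, and plain Python lists are used here only to keep the example dependency-free; in practice the inputs would typically be NumPy arrays.

```python
def squared_error_objective(y_true, y_pred):
    """Return per-sample gradient and second derivative of the squared error."""
    # d/dy_pred of 0.5 * (y_pred - y_true) ** 2
    grad = [p - t for t, p in zip(y_true, y_pred)]
    # The second derivative is constant for this loss.
    hess = [1.0 for _ in y_true]
    return grad, hess


grad, hess = squared_error_objective([1.0, 2.0], [1.5, 1.0])
```

Both returned sequences have one entry per sample, matching the [n_samples] shape required for grad and hess.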

class xgboost.XGBClassifier(max_depth=3, learning_rate=0.1, n_estimators=100, silent=True, objective='binary:logistic', booster='gbtree', n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
Bases: xgboost.sklearn.XGBModel, object
Implementation of the scikit-learn API for XGBoost classification.
Parameters
 max_depth : int
 Maximum tree depth for base learners.
 learning_rate : float
 Boosting learning rate (xgb’s “eta”)
 n_estimators : int
 Number of boosted trees to fit.
 silent : boolean
 Whether to print messages while running boosting.
 objective : string or callable
 Specify the learning task and the corresponding learning objective or
a custom objective function to be used (see note below).
 booster: string
 Specify which booster to use: gbtree, gblinear or dart.
 nthread : int
 Number of parallel threads used to run xgboost. (Deprecated, please use n_jobs)
 n_jobs : int
 Number of parallel threads used to run xgboost. (replaces nthread)
 gamma : float
 Minimum loss reduction required to make a further partition on a leaf node of the tree.
 min_child_weight : int
 Minimum sum of instance weight(hessian) needed in a child.
 max_delta_step : int
 Maximum delta step we allow each tree’s weight estimation to be.
 subsample : float
 Subsample ratio of the training instance.
 colsample_bytree : float
 Subsample ratio of columns when constructing each tree.
 colsample_bylevel : float
 Subsample ratio of columns for each split, in each level.
 reg_alpha : float (xgb’s alpha)
 L1 regularization term on weights
 reg_lambda : float (xgb’s lambda)
 L2 regularization term on weights
 scale_pos_weight : float
 Balancing of positive and negative weights.
 base_score:
 The initial prediction score of all instances, global bias.
 seed : int
 Random number seed. (Deprecated, please use random_state)
 random_state : int
 Random number seed. (replaces seed)
 missing : float, optional
 Value in the data to be treated as a missing value. If
None, defaults to np.nan.
 **kwargs : dict, optional
Keyword arguments for XGBoost Booster object. Full documentation of parameters can
be found here: https://github.com/dmlc/xgboost/blob/master/doc/parameter.md.
Attempting to set a parameter via the constructor args and **kwargs dict simultaneously
will result in a TypeError.
Note:
**kwargs is unsupported by Sklearn. We do not guarantee that parameters passed via
this argument will interact properly with Sklearn.
Note
A custom objective function can be provided for the objective
parameter. In this case, it should have the signature
objective(y_true, y_pred) -> grad, hess:
 y_true: array_like of shape [n_samples]
 The target values
 y_pred: array_like of shape [n_samples]
 The predicted values
 grad: array_like of shape [n_samples]
 The value of the gradient for each sample point.
 hess: array_like of shape [n_samples]
 The value of the second derivative for each sample point
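For classification, a custom objective with this signature could implement the logistic loss. The sketch below assumes that y_pred holds raw margins (values before the logistic transformation) and that labels are 0 or 1; the function name is hypothetical and plain lists stand in for arrays.

```python
import math


def logistic_objective(y_true, y_pred):
    """Return per-sample gradient and second derivative of the log loss."""
    # Convert raw margins to probabilities (assumes y_pred are margins).
    probs = [1.0 / (1.0 + math.exp(-m)) for m in y_pred]
    grad = [p - t for t, p in zip(y_true, probs)]
    hess = [p * (1.0 - p) for p in probs]
    return grad, hess


grad, hess = logistic_objective([1.0, 0.0], [0.0, 0.0])
```

At a margin of 0 the predicted probability is 0.5, so the gradient is -0.5 for a positive label and 0.5 for a negative one, and the second derivative is 0.25 in both cases.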

evals_result
()
Return the evaluation results.
If eval_set is passed to the fit function, you can call evals_result() to
get evaluation results for all passed eval_sets. When eval_metric is also
passed to the fit function, the evals_result will contain the eval_metrics
passed to the fit function
Returns:  evals_result 
Return type:  dictionary 
Example
import xgboost as xgb

# X_train, y_train, X_test and y_test are assumed to be existing arrays.
param_dist = {'objective': 'binary:logistic', 'n_estimators': 2}
clf = xgb.XGBClassifier(**param_dist)
clf.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        eval_metric='logloss',
        verbose=True)
evals_result = clf.evals_result()
The variable evals_result will contain:
{'validation_0': {'logloss': ['0.604835', '0.531479']},
'validation_1': {'logloss': ['0.41965', '0.17686']}}
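A dictionary of this shape can be inspected with ordinary Python. The sketch below reuses the values from the example and finds the round with the lowest logloss for one evaluation set; best_round is a hypothetical helper, and it assumes the metric is one that should be minimized.

```python
evals_result = {
    "validation_0": {"logloss": ["0.604835", "0.531479"]},
    "validation_1": {"logloss": ["0.41965", "0.17686"]},
}


def best_round(evals_result, eval_name, metric):
    """Return (round_index, value) of the lowest metric value."""
    # float() accepts both strings, as shown above, and numbers.
    history = [float(v) for v in evals_result[eval_name][metric]]
    best = min(range(len(history)), key=history.__getitem__)
    return best, history[best]


round_index, value = best_round(evals_result, "validation_1", "logloss")
```

The keys validation_0 and validation_1 follow the order of the pairs passed in eval_set.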

fit
(X, y, sample_weight=None, eval_set=None, eval_metric=None, early_stopping_rounds=None, verbose=True, xgb_model=None, sample_weight_eval_set=None)
Fit gradient boosting classifier
Parameters: 
 X (array_like) – Feature matrix
 y (array_like) – Labels
 sample_weight (array_like) – Weight for each instance
 eval_set (list, optional) – A list of (X, y) pairs to use as a validation set for
early-stopping
 sample_weight_eval_set (list, optional) – A list of the form [L_1, L_2, …, L_n], where each L_i is a list of
instance weights on the i-th validation set.
 eval_metric (str, callable, optional) – If a str, should be a built-in evaluation metric to use. See
doc/parameter.md. If callable, a custom evaluation metric. The call
signature is func(y_predicted, y_true) where y_true will be a
DMatrix object such that you may need to call the get_label
method. It must return a str, value pair where the str is a name
for the evaluation and value is the value of the evaluation
function. This objective is always minimized.
 early_stopping_rounds (int, optional) – Activates early stopping. Validation error needs to decrease at
least every <early_stopping_rounds> round(s) to continue training.
Requires at least one item in evals. If there’s more than one,
will use the last. Returns the model from the last iteration
(not the best one). If early stopping occurs, the model will
have three additional fields: bst.best_score, bst.best_iteration
and bst.best_ntree_limit.
(Use bst.best_ntree_limit to get the correct value if num_parallel_tree
and/or num_class appears in the parameters)
 verbose (bool) – If verbose and an evaluation set is used, writes the evaluation
metric measured on the validation set to stderr.
 xgb_model (str) – File name of a stored xgb model or a 'Booster' instance. The xgb model is
loaded before training (allows training continuation).
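The callable form of eval_metric described above, func(y_predicted, y_true) returning a (name, value) pair, can be sketched as follows. LabelHolder is a hypothetical stand-in that mimics only the get_label method of a DMatrix so the example runs without xgboost; the metric itself (mean absolute error) is an arbitrary choice.

```python
class LabelHolder:
    """Hypothetical stand-in for a DMatrix; only get_label() is mimicked."""

    def __init__(self, labels):
        self._labels = list(labels)

    def get_label(self):
        return self._labels


def mean_abs_error(y_predicted, y_true):
    """Custom metric: returns a (name, value) pair."""
    labels = y_true.get_label()
    total = sum(abs(p - t) for p, t in zip(y_predicted, labels))
    return "mae", total / len(labels)


name, value = mean_abs_error([0.5, 1.0], LabelHolder([1.0, 1.0]))
```

Because the returned value is always minimized, a score where larger is better would need to be negated or otherwise inverted before being returned.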


predict
(data, output_margin=False, ntree_limit=None)
Predict with data.
NOTE: This function is not thread safe.
For each booster object, predict can only be called from one thread.
If you want to run prediction using multiple threads, call xgb.copy() to make copies
of the model object and then call predict on each copy.
Parameters: 
 data (DMatrix) – The dmatrix storing the input.
 output_margin (bool) – Whether to output the raw untransformed margin value.
 ntree_limit (int) – Limit number of trees in the prediction; defaults to best_ntree_limit if defined
(i.e. it has been trained with early stopping), otherwise 0 (use all trees).

Returns:  prediction

Return type:  numpy array


predict_proba
(data, ntree_limit=None)
Predict the probability of each data example being of a given class.
NOTE: This function is not thread safe.
For each booster object, predict can only be called from one thread.
If you want to run prediction using multiple threads, call xgb.copy() to make copies
of the model object and then call predict on each copy.
Parameters: 
 data (DMatrix) – The dmatrix storing the input.
 ntree_limit (int) – Limit number of trees in the prediction; defaults to best_ntree_limit if defined
(i.e. it has been trained with early stopping), otherwise 0 (use all trees).

Returns:  prediction – a numpy array with the probability of each data example being of a given class.

Return type:  numpy array

Plotting API
Plotting Library.

xgboost.plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='F score', ylabel='Features', importance_type='weight', max_num_features=None, grid=True, show_values=True, **kwargs)
Plot importance based on fitted trees.
Parameters: 
 booster (Booster, XGBModel or dict) – Booster or XGBModel instance, or dict taken by Booster.get_fscore()
 ax (matplotlib Axes, default None) – Target axes instance. If None, new figure and axes will be created.
 grid (bool, default True) – Turn the axes grids on or off.
 importance_type (str, default "weight") –
How the importance is calculated: either “weight”, “gain”, or “cover”
“weight” is the number of times a feature appears in a tree
“gain” is the average gain of splits which use the feature
“cover” is the average coverage of splits which use the feature
where coverage is defined as the number of samples affected by the split
 max_num_features (int, default None) – Maximum number of top features displayed on plot. If None, all features will be displayed.
 height (float, default 0.2) – Bar height, passed to ax.barh()
 xlim (tuple, default None) – Tuple passed to axes.xlim()
 ylim (tuple, default None) – Tuple passed to axes.ylim()
 title (str, default "Feature importance") – Axes title. To disable, pass None.
 xlabel (str, default "F score") – X axis title label. To disable, pass None.
 ylabel (str, default "Features") – Y axis title label. To disable, pass None.
 show_values (bool, default True) – Show values on plot. To disable, pass False.
 kwargs – Other keywords passed to ax.barh()

Returns:  ax

Return type:  matplotlib Axes


xgboost.plot_tree(booster, fmap='', num_trees=0, rankdir='UT', ax=None, **kwargs)
Plot specified tree.
Parameters: 
 booster (Booster, XGBModel) – Booster or XGBModel instance
 fmap (str (optional)) – The name of feature map file
 num_trees (int, default 0) – Specify the ordinal number of target tree
 rankdir (str, default "UT") – Passed to graphviz via graph_attr
 ax (matplotlib Axes, default None) – Target axes instance. If None, new figure and axes will be created.
 kwargs – Other keywords passed to to_graphviz

Returns:  ax

Return type:  matplotlib Axes


xgboost.to_graphviz(booster, fmap='', num_trees=0, rankdir='UT', yes_color='#0000FF', no_color='#FF0000', **kwargs)
Convert specified tree to graphviz instance. IPython can automatically plot the
returned graphviz instance. Otherwise, you should call the .render() method
of the returned graphviz instance.
Parameters: 
 booster (Booster, XGBModel) – Booster or XGBModel instance
 fmap (str (optional)) – The name of feature map file
 num_trees (int, default 0) – Specify the ordinal number of target tree
 rankdir (str, default "UT") – Passed to graphviz via graph_attr
 yes_color (str, default '#0000FF') – Edge color when meets the node condition.
 no_color (str, default '#FF0000') – Edge color when doesn’t meet the node condition.
 kwargs – Other keywords passed to graphviz graph_attr

Returns:  graph

Return type:  graphviz instance
