C API Tutorial

In this tutorial, we will install the XGBoost library and configure the CMakeLists.txt file of our C/C++ application to link against it. Afterwards, we will cover some useful tips for working with the C API, along with code snippets demonstrating the various C API functions for basic tasks such as loading a dataset, training a model, and predicting on a test dataset.

Requirements

Install CMake - Follow the cmake installation documentation for instructions.
Install Conda - Follow the conda installation documentation for instructions.

Install XGBoost on conda environment

Run the following commands in your terminal. They will build XGBoost inside the cloned repository and install it into your Conda environment.

# clone the XGBoost repository & its submodules
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
mkdir build
cd build
# Activate the Conda environment, into which we'll install XGBoost
conda activate [env_name]
# Build the compiled version of XGBoost inside the build folder
cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
# install XGBoost in your conda environment (usually under [your home directory]/miniconda3)
make install

Useful Tips to Remember

Below are some useful tips while using C API:

  1. Error handling: Always check the return value of the C API functions.

  2. In a C application: use the following macro to guard all calls to XGBoost's C API functions. The macro prints the error/exception message whenever a call fails:

#define safe_xgboost(call) {  \
  int err = (call); \
  if (err != 0) { \
    fprintf(stderr, "%s:%d: error in %s: %s\n", __FILE__, __LINE__, #call, XGBGetLastError());  \
    exit(1); \
  } \
}

In your application, wrap all C API function calls with the macro as follows:

DMatrixHandle train;
safe_xgboost(XGDMatrixCreateFromFile("/path/to/training/dataset/", silent, &train));
  3. In a C++ application: modify the safe_xgboost macro to throw an exception upon an error:

// requires <stdexcept> and <string>
#define safe_xgboost(call) {  \
  int err = (call); \
  if (err != 0) { \
    throw std::runtime_error(std::string(__FILE__) + ":" + std::to_string(__LINE__) + \
                             ": error in " + #call + ": " + XGBGetLastError()); \
  } \
}
  4. Assertion technique: works in both C and C++. If the expression evaluates to 0 (false), the expression, source file name, and line number are printed to standard error, and abort() is called. Use it to test assumptions made in your code.

DMatrixHandle dmat;
assert( XGDMatrixCreateFromFile("training_data.libsvm", 0, &dmat) == 0);
  5. Always remember to free the space allocated for BoosterHandle & DMatrixHandle objects appropriately:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <xgboost/c_api.h>

int main(int argc, char** argv) {
  int silent = 0;

  BoosterHandle booster;

  // do something with booster

  // free the memory
  XGBoosterFree(booster);

  DMatrixHandle DMatrixHandle_param;

  // do something with DMatrixHandle_param

  // free the memory
  XGDMatrixFree(DMatrixHandle_param);

  return 0;
}
  6. For tree models, it is important to use consistent data formats during training and scoring/predicting; otherwise the results will be wrong. For example, if the training data is in dense matrix format, the prediction dataset should also be a dense matrix; if training used the libsvm format, the prediction dataset should be in libsvm format as well.

  7. Always use strings when setting parameter values on a booster handle object. The parameter value can be of any data type (e.g. int, char, float, double, etc.), but it should always be encoded as a string.

BoosterHandle booster;
XGBoosterSetParam(booster, "parameter_name", "0.1");

Sample Examples With Code Snippets Using C API Functions

  1. If the dataset is available in a file, it can be loaded into a DMatrix object using the XGDMatrixCreateFromFile function:

DMatrixHandle data; // handle to DMatrix
// Load the data from the file & store it in the data variable of DMatrixHandle datatype
safe_xgboost(XGDMatrixCreateFromFile("/path/to/file/filename", silent, &data));
  2. You can also create a DMatrix object from a 2D matrix using the XGDMatrixCreateFromMat function:

// 1D matrix (50 values laid out in row-major order)
const float data1[] = { 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 };

// 2D matrix
const int ROWS = 6, COLS = 3;
const float data2[6][3] = { {1, 2, 3}, {2, 4, 6}, {3, -1, 9}, {4, 8, -1}, {2, 5, 1}, {0, 1, 5} };
DMatrixHandle dmatrix1, dmatrix2;
// Pass the matrix along with the number of rows & columns it contains
// here '0' represents the missing value in the matrix dataset
// dmatrix1 will hold the DMatrix created from data1
safe_xgboost(XGDMatrixCreateFromMat(data1, 1, 50, 0, &dmatrix1));
// here -1 represents the missing value in the matrix dataset
safe_xgboost(XGDMatrixCreateFromMat(&data2[0][0], ROWS, COLS, -1, &dmatrix2));
  3. Create a Booster object for training & testing on the dataset using XGBoosterCreate:

BoosterHandle booster;
// We assume that training and test data have been loaded into 'train' and 'test'
DMatrixHandle eval_dmats[] = {train, test};
const int eval_dmats_size = 2;
safe_xgboost(XGBoosterCreate(eval_dmats, eval_dmats_size, &booster));
  4. For each DMatrix object, set the labels using XGDMatrixSetFloatInfo. Later, you can access the labels using XGDMatrixGetFloatInfo.

const int ROWS = 6, COLS = 3;
const float data[6][3] = { {1, 2, 3}, {2, 4, 6}, {3, -1, 9}, {4, 8, -1}, {2, 5, 1}, {0, 1, 5} };
DMatrixHandle dmatrix;

safe_xgboost(XGDMatrixCreateFromMat(&data[0][0], ROWS, COLS, -1, &dmatrix));

// variable to store labels for the dataset created from the matrix above
float labels[6];

for (int i = 0; i < ROWS; i++) {
  labels[i] = i;
}

// Load the labels
safe_xgboost(XGDMatrixSetFloatInfo(dmatrix, "label", labels, ROWS));

// read the labels back, storing the length of the result
bst_ulong result_len;

// labels result
const float *result;

safe_xgboost(XGDMatrixGetFloatInfo(dmatrix, "label", &result_len, &result));

for (unsigned int i = 0; i < result_len; i++) {
  printf("label[%i] = %f\n", i, result[i]);
}
  5. Set the parameters for the Booster object as required using XGBoosterSetParam. Check out the full list of available parameters here.

BoosterHandle booster;
safe_xgboost(XGBoosterSetParam(booster, "booster", "gblinear"));
// default max_depth = 6
safe_xgboost(XGBoosterSetParam(booster, "max_depth", "3"));
// default eta = 0.3
safe_xgboost(XGBoosterSetParam(booster, "eta", "0.1"));
  6. Train & evaluate the model using XGBoosterUpdateOneIter and XGBoosterEvalOneIter, respectively.

int num_of_iterations = 20;
const char* eval_names[] = {"train", "test"};
const char* eval_result = NULL;

for (int i = 0; i < num_of_iterations; ++i) {
  // Update the model for one boosting iteration
  safe_xgboost(XGBoosterUpdateOneIter(booster, i, train));

  // Report the learner's error statistics on the training & test datasets after each iteration
  safe_xgboost(XGBoosterEvalOneIter(booster, i, eval_dmats, eval_names, eval_dmats_size, &eval_result));
  printf("%s\n", eval_result);
}

Note

For a customized loss function, use the XGBoosterBoostOneIter function instead, and manually supply the gradient and second-order gradient (hessian).

  7. Predict the results on a test set using XGBoosterPredict:

bst_ulong output_length;

const float *output_result;
safe_xgboost(XGBoosterPredict(booster, test, 0, 0, &output_length, &output_result));

for (unsigned int i = 0; i < output_length; i++) {
  printf("prediction[%i] = %f\n", i, output_result[i]);
}
  8. Free all the internal structures used in your code using XGDMatrixFree and XGBoosterFree. This step is important to prevent memory leaks.

safe_xgboost(XGDMatrixFree(dmatrix));
safe_xgboost(XGBoosterFree(booster));
  9. Get the number of features in your dataset using XGBoosterGetNumFeature.

bst_ulong num_of_features = 0;

// Assuming a variable 'booster' of type BoosterHandle has already been
// declared, and a dataset has been loaded and trained on it,
// store the result in num_of_features
safe_xgboost(XGBoosterGetNumFeature(booster, &num_of_features));

// Print the number of features, converting num_of_features from bst_ulong to unsigned long
printf("num_feature: %lu\n", (unsigned long)(num_of_features));
  10. Load a saved model using the XGBoosterLoadModel function:

BoosterHandle booster;
const char *model_path = "/path/of/model";

// create booster handle first
safe_xgboost(XGBoosterCreate(NULL, 0, &booster));

// set the model parameters here

// load model
safe_xgboost(XGBoosterLoadModel(booster, model_path));

// predict with the model here