Getting Started with XGBoost4J
This tutorial introduces the Java API for XGBoost.
Data Interface
Like the XGBoost python module, XGBoost4J uses DMatrix to handle data. LIBSVM text format files, sparse matrices in CSR/CSC format, and dense matrices are supported.
The first step is to import DMatrix:
import ml.dmlc.xgboost4j.java.DMatrix;
Use DMatrix constructor to load data from a libsvm text format file:
DMatrix dmat = new DMatrix("train.svm.txt");
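If you want to confirm how much data was loaded, DMatrix exposes a row count. A minimal sketch (note that most XGBoost4J calls throw the checked exception XGBoostError, which is propagated here):

import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoostError;

// Load the LIBSVM file and print how many rows were read.
DMatrix dmat = new DMatrix("train.svm.txt");
System.out.println("rows loaded: " + dmat.rowNum());  // rowNum() throws XGBoostError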
Pass arrays to the DMatrix constructor to load from a sparse matrix.
Suppose we have a sparse matrix
1 0 2 0
4 0 0 3
3 1 2 0
We can express the sparse matrix in Compressed Sparse Row (CSR) format:
long[] rowHeaders = new long[] {0,2,4,7};
float[] data = new float[] {1f,2f,4f,3f,3f,1f,2f};
int[] colIndex = new int[] {0,2,0,3,0,1,2};
int numColumn = 4;
DMatrix dmat = new DMatrix(rowHeaders, colIndex, data, DMatrix.SparseType.CSR, numColumn);
… or in Compressed Sparse Column (CSC) format:
long[] colHeaders = new long[] {0,3,4,6,7};
float[] data = new float[] {1f,4f,3f,1f,2f,2f,3f};
int[] rowIndex = new int[] {0,1,2,2,0,2,1};
int numRow = 3;
DMatrix dmat = new DMatrix(colHeaders, rowIndex, data, DMatrix.SparseType.CSC, numRow);
You may also load your data from a dense matrix. Let’s assume we have a matrix of the form
1 2
3 4
5 6
Using row-major layout, we specify the dense matrix as follows:
float[] data = new float[] {1f,2f,3f,4f,5f,6f};
int nrow = 3;
int ncol = 2;
float missing = 0.0f;
DMatrix dmat = new DMatrix(data, nrow, ncol, missing);
To set weight:
float[] weights = new float[] {1f,2f,1f};
dmat.setWeight(weights);
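Labels can be attached in the same way. This is only needed when the data does not already carry labels (for example, the dense matrix built above); a minimal sketch, with the label values chosen purely for illustration:

// One label per row of the 3-row matrix above.
float[] labels = new float[] {0f, 1f, 0f};
dmat.setLabel(labels);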
Setting Parameters
Parameters are specified as a Map:
Map<String, Object> params = new HashMap<String, Object>() {
{
put("eta", 1.0);
put("max_depth", 2);
put("objective", "binary:logistic");
put("eval_metric", "logloss");
}
};
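The double-brace form above creates an anonymous HashMap subclass; a plain HashMap with explicit put calls is equivalent and avoids the extra class:

import java.util.HashMap;
import java.util.Map;

// Same parameters, built with ordinary put calls.
Map<String, Object> params = new HashMap<>();
params.put("eta", 1.0);
params.put("max_depth", 2);
params.put("objective", "binary:logistic");
params.put("eval_metric", "logloss");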
Training Model
With parameters and data, you are able to train a booster model.
Import Booster and XGBoost:
import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.XGBoost;
Training
DMatrix trainMat = new DMatrix("train.svm.txt");
DMatrix validMat = new DMatrix("valid.svm.txt");
// Specify a watch list to see model accuracy on data sets
Map<String, DMatrix> watches = new HashMap<String, DMatrix>() {
  {
    put("train", trainMat);
    put("valid", validMat);
  }
};
int nround = 2;
Booster booster = XGBoost.train(trainMat, params, nround, watches, null, null);
Saving model
After training, you can save the model and dump it out.
booster.saveModel("model.bin");
Generating model dump with feature map
// dump without feature map
String[] model_dump = booster.getModelDump(null, false);
// dump with feature map
String[] model_dump_with_feature_map = booster.getModelDump("featureMap.txt", false);
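To inspect the dump outside the JVM, the returned strings can be written to a plain text file. A minimal sketch using standard java.nio; the file name dump.raw.txt is just an example:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Each array element is the text dump of one tree; Files.write throws IOException.
Files.write(Paths.get("dump.raw.txt"), Arrays.asList(model_dump));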
Load a model
Booster booster = XGBoost.loadModel("model.bin");
Prediction
After training and loading a model, you can use it to make predictions for other data. The result will be a two-dimensional float array of shape (nsample, nclass); for predictLeaf(), the result will be of shape (nsample, nclass*ntrees).
DMatrix dtest = new DMatrix("test.svm.txt");
// predict
float[][] predicts = booster.predict(dtest);
// predict leaf
float[][] leafPredicts = booster.predictLeaf(dtest, 0);
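For the binary:logistic objective used above, predict() returns one probability per row. A minimal sketch of turning those probabilities into 0/1 class labels; the 0.5 threshold is an assumption for illustration, not part of the API:

// predicts has shape (nsample, 1) for binary:logistic
int[] predictedLabels = new int[predicts.length];
for (int i = 0; i < predicts.length; i++) {
  // Assumed cutoff of 0.5; adjust to trade precision against recall.
  predictedLabels[i] = predicts[i][0] > 0.5f ? 1 : 0;
}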