XGBoost JVM Package

You have found the XGBoost JVM Package!

Installation

Installation from Maven repository

Access release version

You can use XGBoost4J in your Java/Scala application by adding XGBoost4J as a dependency:

Maven
<properties>
  ...
  <!-- Specify Scala version in package name -->
  <scala.binary.version>2.12</scala.binary.version>
</properties>

<dependencies>
  ...
  <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j_${scala.binary.version}</artifactId>
      <version>latest_version_num</version>
  </dependency>
  <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
      <version>latest_version_num</version>
  </dependency>
</dependencies>
sbt
libraryDependencies ++= Seq(
  "ml.dmlc" %% "xgboost4j" % "latest_version_num",
  "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num"
)

This will pull the latest stable version from Maven Central.

For the latest release version number, please check here.

To enable the GPU algorithm (tree_method='gpu_hist'), use artifacts xgboost4j-gpu_2.12 and xgboost4j-spark-gpu_2.12 instead (note the gpu suffix).
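
For example, with sbt the GPU variants are declared just like the CPU artifacts; a minimal sketch using the version placeholder from above (note the explicit _2.12 suffix, hence % instead of %%) is:

sbt
libraryDependencies ++= Seq(
  "ml.dmlc" % "xgboost4j-gpu_2.12" % "latest_version_num",
  "ml.dmlc" % "xgboost4j-spark-gpu_2.12" % "latest_version_num"
)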

Note

Using Maven repository hosted by the XGBoost project

There may be some delay until a new release becomes available to Maven Central. If you would like to access the latest release immediately, add the Maven repository hosted by the XGBoost project:

Maven
<repository>
  <id>XGBoost4J Release Repo</id>
  <name>XGBoost4J Release Repo</name>
  <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/release/</url>
</repository>
sbt
resolvers += "XGBoost4J Release Repo" at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/release/"

Note

Windows not supported in the JVM package

Currently, XGBoost4J-Spark does not support the Windows platform, as the distributed training algorithm does not work on Windows. Please use Linux or macOS instead.

Access SNAPSHOT version

First add the following Maven repository hosted by the XGBoost project:

Maven
<repository>
  <id>XGBoost4J Snapshot Repo</id>
  <name>XGBoost4J Snapshot Repo</name>
  <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
</repository>
sbt
resolvers += "XGBoost4J Snapshot Repo" at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/"

Then add XGBoost4J as a dependency:

Maven
<properties>
  ...
  <!-- Specify Scala version in package name -->
  <scala.binary.version>2.12</scala.binary.version>
</properties>

<dependencies>
  ...
  <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j_${scala.binary.version}</artifactId>
      <version>latest_version_num-SNAPSHOT</version>
  </dependency>
  <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
      <version>latest_version_num-SNAPSHOT</version>
  </dependency>
</dependencies>
sbt
libraryDependencies ++= Seq(
  "ml.dmlc" %% "xgboost4j" % "latest_version_num-SNAPSHOT",
  "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num-SNAPSHOT"
)

Look up the version field in pom.xml to get the correct version number.

The SNAPSHOT JARs are hosted by the XGBoost project. Every commit in the master branch will automatically trigger generation of a new SNAPSHOT JAR. You can control how often Maven should upgrade your SNAPSHOT installation by specifying updatePolicy. See here for details.
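
For instance, one possible repository configuration with an explicit updatePolicy (here daily, meaning Maven checks for a new SNAPSHOT at most once a day; always, never, and interval:X are the other standard values) is:

Maven
<repository>
  <id>XGBoost4J Snapshot Repo</id>
  <name>XGBoost4J Snapshot Repo</name>
  <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
  <snapshots>
    <enabled>true</enabled>
    <updatePolicy>daily</updatePolicy>
  </snapshots>
</repository>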

You can browse the file listing of the Maven repository at https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html.

To enable the GPU algorithm (tree_method='gpu_hist'), use artifacts xgboost4j-gpu_2.12 and xgboost4j-spark-gpu_2.12 instead (note the gpu suffix).

Installation from source

Building XGBoost4J using Maven requires Maven 3 or newer, Java 7+ and CMake 3.13+ for compiling the JNI bindings.

Before you install XGBoost4J, you need to define the environment variable JAVA_HOME to point to your JDK directory so that the compiler can find jni.h, since XGBoost4J relies on JNI for the interaction between the JVM and native libraries.

Once JAVA_HOME is defined correctly, installing XGBoost4J is as simple as running mvn package under the jvm-packages directory. You can also skip the tests by running mvn -DskipTests=true package if you are sure about the correctness of your local setup.
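
For example, on a Linux machine with OpenJDK installed under /usr/lib/jvm (the exact path below is an assumption; adjust it to your own JDK location), the build might look like:

# point JAVA_HOME at the JDK so jni.h can be found
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
cd xgboost/jvm-packages
mvn -DskipTests=true package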

To publish the artifacts to your local Maven repository, run

mvn install

Or, if you would like to skip tests, run

mvn -DskipTests install

This command will publish the xgboost binaries, the compiled Java classes, as well as the Java sources to your local repository. Then you can use XGBoost4J in your Java projects by including the following dependency in pom.xml:

<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j</artifactId>
  <version>latest_source_version_num</version>
</dependency>

For sbt, please add the repository and dependency in build.sbt as follows:

resolvers += "Local Maven Repository" at "file://"+Path.userHome.absolutePath+"/.m2/repository"

"ml.dmlc" % "xgboost4j" % "latest_source_version_num"

If you want to use XGBoost4J-Spark, replace xgboost4j with xgboost4j-spark.

Note

XGBoost4J-Spark requires Apache Spark 2.3+

XGBoost4J-Spark now requires Apache Spark 2.3+. The latest versions of XGBoost4J-Spark use facilities of org.apache.spark.ml.param.shared extensively to provide tight integration with the Spark MLlib framework, and these facilities are not fully available in earlier versions of Spark.

Also, make sure to install Spark directly from the Apache website. Upstream XGBoost is not guaranteed to work with third-party distributions of Spark, such as Cloudera Spark. Consult the appropriate third party to obtain their distribution of XGBoost.

Enabling OpenMP for Mac OS

If you are on Mac OS and using a compiler that supports OpenMP, you need to go to the file xgboost/jvm-packages/create_jni.py and comment out the line

CONFIG["USE_OPENMP"] = "OFF"

in order to get the benefit of multi-threading.
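
After the edit, the line simply carries a Python comment marker, i.e.:

# CONFIG["USE_OPENMP"] = "OFF"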

Building with GPU support

If you want to build XGBoost4J with support for distributed GPU training, run

mvn -Duse.cuda=ON install
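
As with the CPU build, you can presumably skip the tests here as well by combining this with the flag shown earlier, e.g.:

mvn -Duse.cuda=ON -DskipTests install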

Contents