Distributed XGBoost with Kubernetes

The Kubeflow community provides the XGBoost Operator to support distributed XGBoost training and batch prediction in a Kubernetes cluster. It offers an easy and efficient way to train XGBoost models and run batch prediction in a distributed fashion.

How to use

To run an XGBoost job in a Kubernetes cluster, carry out the following steps:

  1. Install XGBoost Operator in Kubernetes.

    1. The XGBoost Operator is designed to manage XGBoost jobs, including job scheduling, monitoring, and recovery of pods and services. Follow the installation guide to install the XGBoost Operator.
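
As a rough sketch, installation typically amounts to applying the operator's manifests with kubectl. The repository layout and manifest path below are assumptions; consult the installation guide for the exact, up-to-date commands.

```shell
# Clone the operator repository (path to manifests is illustrative).
git clone https://github.com/kubeflow/xgboost-operator.git
cd xgboost-operator

# Install the XGBoostJob CRD and the operator deployment.
kubectl create -f manifests/

# Verify that the operator pod is up.
kubectl get pods --all-namespaces | grep xgboost-operator
```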

  2. Write application code to interface with the XGBoost operator.

    1. You’ll need to provide a few scripts that interface with the XGBoost operator. Refer to the Iris classification example.
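
For illustration, a worker script typically discovers the Rabit tracker from environment variables injected into each pod and passes them to XGBoost's Rabit layer. The sketch below is a hedged example: the variable names (`MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, `RANK`) follow the Iris classification example, and `xgb.rabit.init` is the pre-2.0 collective API.

```python
import os

def rabit_args():
    """Assemble the Rabit tracker arguments (as bytes, the form
    xgboost.rabit.init expects) from environment variables that the
    XGBoost Operator injects into each pod. Variable names follow
    the Iris classification example."""
    return [
        f"DMLC_NUM_WORKER={os.environ.get('WORLD_SIZE', '1')}".encode(),
        f"DMLC_TRACKER_URI={os.environ.get('MASTER_ADDR', '127.0.0.1')}".encode(),
        f"DMLC_TRACKER_PORT={os.environ.get('MASTER_PORT', '9091')}".encode(),
        f"DMLC_TASK_ID={os.environ.get('RANK', '0')}".encode(),
    ]

# A training script would then bracket its work with, e.g.:
#   import xgboost as xgb
#   xgb.rabit.init(rabit_args())
#   ... load data, xgb.train(...) ...
#   xgb.rabit.finalize()
```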

    2. Data reader/writer: implement a reader and writer for your data source as required. For example, if your data is stored in a Hive table, you have to write your own code to read from and write to the Hive table, based on the ID of the worker.
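
To illustrate worker-ID-based reading, a common pattern is to shard rows deterministically by the worker's rank. The helper below is a hypothetical sketch, not part of the operator:

```python
def shard_for_worker(rows, rank, world_size):
    """Return the subset of rows this worker should process: row i
    goes to worker (i % world_size). Any deterministic partitioning
    works, as long as the shards are disjoint and cover all rows."""
    return [row for i, row in enumerate(rows) if i % world_size == rank]
```

A Hive-backed reader would apply the same idea on the query side, e.g. filtering on a row number modulo the worker count.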

    3. Model persistence: in this example, the model is stored in OSS storage. If you want to store your model in Amazon S3, Google NFS, or another storage system, you’ll need to implement the model reader and writer required by that storage system.
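
The reader/writer split can be sketched as a small storage interface. The class name below is hypothetical, and a real OSS, S3, or NFS backend would replace the local-filesystem one shown here:

```python
import os

class LocalModelStore:
    """Minimal model reader/writer against the local filesystem.
    An OSS/S3/NFS store would implement the same two methods using
    that system's SDK (e.g. boto3 for Amazon S3)."""

    def __init__(self, root):
        self.root = root

    def write(self, name, model_bytes):
        """Persist serialized model bytes and return the path."""
        os.makedirs(self.root, exist_ok=True)
        path = os.path.join(self.root, name)
        with open(path, "wb") as f:
            f.write(model_bytes)
        return path

    def read(self, name):
        """Load serialized model bytes back."""
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()
```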

  3. Configure the XGBoost job using a YAML file.

    1. A YAML file is used to configure the computation resources and environment for your XGBoost job, e.g. the number of workers and masters. A template YAML file is provided for reference.
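
For reference, a trimmed XGBoostJob manifest in the spirit of the Iris example might look like the following. The image name and args are placeholders, and the exact `apiVersion` depends on the operator version installed:

```yaml
apiVersion: "xgboostjob.kubeflow.org/v1"
kind: XGBoostJob
metadata:
  name: xgboost-dist-iris-train
spec:
  xgbReplicaSpecs:
    Master:
      replicas: 1                 # exactly one master (hosts the Rabit tracker)
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: xgboostjob
              image: my-registry/xgboost-dist-iris:latest   # placeholder image
              args: ["--job_type=Train"]
    Worker:
      replicas: 2                 # scale out training by adding workers
      restartPolicy: ExitCode
      template:
        spec:
          containers:
            - name: xgboostjob
              image: my-registry/xgboost-dist-iris:latest   # placeholder image
              args: ["--job_type=Train"]
```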

  4. Submit XGBoost job to Kubernetes cluster.

    1. Use the kubectl command to submit the XGBoost job, and then monitor the job status.
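
Submission and monitoring reduce to standard kubectl commands; the manifest filename and job name below are placeholders:

```shell
# Submit the job described by the YAML file.
kubectl apply -f xgboostjob.yaml

# List XGBoost jobs and their state.
kubectl get xgboostjobs

# Inspect detailed status and events for one job.
kubectl describe xgboostjob xgboost-dist-iris-train
```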

Work in progress

  • XGBoost Model serving

  • Distributed data reader/writer from/to HDFS, HBase, Hive etc.

  • Model persistence on Amazon S3, Google NFS etc.