xgboost
Collective

Experimental support for exposing internal communicator in XGBoost.

Functions

int XGCommunicatorInit (char const *config)
 Initialize the collective communicator.

int XGCommunicatorFinalize (void)
 Finalize the collective communicator.

int XGCommunicatorGetRank (void)
 Get rank of current process.

int XGCommunicatorGetWorldSize (void)
 Get total number of processes.
 
int XGCommunicatorIsDistributed (void)
 Check whether the communicator is distributed.
 
int XGCommunicatorPrint (char const *message)
 Print the message to the communicator.

int XGCommunicatorGetProcessorName (const char **name_str)
 Get the name of the processor.

int XGCommunicatorBroadcast (void *send_receive_buffer, size_t size, int root)
 Broadcast a memory region to all others from root. This function is NOT thread-safe.

int XGCommunicatorAllreduce (void *send_receive_buffer, size_t count, int data_type, int op)
 Perform in-place allreduce. This function is NOT thread-safe.
 

Detailed Description

Experimental support for exposing internal communicator in XGBoost.

Function Documentation

◆ XGCommunicatorAllreduce()

int XGCommunicatorAllreduce ( void *  send_receive_buffer,
size_t  count,
int  data_type,
int  op 
)

Perform in-place allreduce. This function is NOT thread-safe.

Example usage: the following code computes the element-wise sum of the data across all processes.

vector<int> data(10);
...
Allreduce(&data[0], data.size(), DataType::kInt32, Op::kSum);
...
Parameters
send_receive_buffer   Buffer for both sending and receiving data.
count                 Number of elements to be reduced.
data_type             Enumeration of data type, see xgboost::collective::DataType in communicator.h.
op                    Enumeration of operation type, see xgboost::collective::Operation in communicator.h.
Returns
0 for success, -1 for failure.
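
To make the "in-place" wording concrete, the following is a minimal single-process model of what a sum allreduce computes. It is a sketch, not XGBoost's implementation: `model_allreduce_sum` and the flat `buffers` layout are hypothetical names introduced here for illustration, and no communicator is involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only (hypothetical helper, not part of XGBoost).
 * `buffers` holds `world_size` consecutive per-worker buffers of `count`
 * int32 elements each; the buffer of worker r starts at buffers[r * count]. */
void model_allreduce_sum(int32_t *buffers, size_t world_size, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    /* Reduce: add element i over every worker's buffer. */
    int32_t total = 0;
    for (size_t r = 0; r < world_size; ++r) {
      total += buffers[r * count + i];
    }
    /* In place: each worker's input element is overwritten by the result,
     * so every worker ends up holding the same reduced values. */
    for (size_t r = 0; r < world_size; ++r) {
      buffers[r * count + i] = total;
    }
  }
}
```

With two workers holding {1, 2, 3} and {10, 20, 30}, both buffers hold {11, 22, 33} afterwards. The original inputs are gone, so a caller that still needs them has to copy them before the call.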

◆ XGCommunicatorBroadcast()

int XGCommunicatorBroadcast ( void *  send_receive_buffer,
size_t  size,
int  root 
)

Broadcast a memory region to all others from root. This function is NOT thread-safe.

Example:

int a = 1;
Broadcast(&a, sizeof(a), root);
Parameters
send_receive_buffer   Pointer to the send or receive buffer.
size                  Size of the data in bytes.
root                  The process rank to broadcast from.
Returns
0 for success, -1 for failure.
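
The same buffer acts as the source on the root and as the destination everywhere else. A minimal single-process model of that behaviour is sketched below; `model_broadcast` and the flat `buffers` layout are hypothetical and only stand in for separate processes, and `size` is treated as a byte count, as in the `sizeof(a)` example above.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative model only (hypothetical helper, not part of XGBoost).
 * `buffers` holds `world_size` consecutive regions of `size` bytes,
 * one region per worker; region r starts at byte offset r * size. */
void model_broadcast(void *buffers, size_t world_size, size_t size, size_t root) {
  unsigned char *base = (unsigned char *)buffers;
  const unsigned char *src = base + root * size;
  for (size_t r = 0; r < world_size; ++r) {
    if (r != root) {
      /* Non-root workers receive: their region is overwritten. */
      memcpy(base + r * size, src, size);
    }
    /* The root only sends: its region is left unchanged. */
  }
}
```

In this model every worker passes the same `size` and `root`; in a real collective call, mismatched arguments across processes would typically be an error on the caller's side.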

◆ XGCommunicatorFinalize()

int XGCommunicatorFinalize ( void  )

Finalize the collective communicator.

Call this function after all jobs have finished.

Returns
0 for success, -1 for failure.

◆ XGCommunicatorGetProcessorName()

int XGCommunicatorGetProcessorName ( const char **  name_str)

Get the name of the processor.

Parameters
name_str   Pointer that receives the returned processor name.
Returns
0 for success, -1 for failure.

◆ XGCommunicatorGetRank()

int XGCommunicatorGetRank ( void  )

Get rank of current process.

Returns
Rank of the worker.

◆ XGCommunicatorGetWorldSize()

int XGCommunicatorGetWorldSize ( void  )

Get total number of processes.

Returns
Total world size.
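
A common use of the rank and world size is to decide which slice of the work a process owns. The sketch below shows one such partitioning scheme; `shard_range` is a hypothetical helper, not an XGBoost function, and it assumes ranks are zero-based and contiguous (0 to world size minus 1), which this page does not state explicitly.

```c
#include <stddef.h>

/* Hypothetical helper (not part of XGBoost). Computes the half-open range
 * [*begin, *end) of items owned by `rank` when `n` items are split as evenly
 * as possible over `world_size` workers.
 * Assumes world_size >= 1 and 0 <= rank < world_size. */
void shard_range(int rank, int world_size, size_t n, size_t *begin, size_t *end) {
  size_t w = (size_t)world_size;
  size_t r = (size_t)rank;
  size_t base = n / w;   /* items every worker gets */
  size_t extra = n % w;  /* the first `extra` ranks get one more item */
  *begin = r * base + (r < extra ? r : extra);
  *end = *begin + base + (r < extra ? 1 : 0);
}
```

For 10 items and a world size of 3, ranks 0, 1 and 2 own the ranges [0, 4), [4, 7) and [7, 10). In a real program, the first two arguments would come from XGCommunicatorGetRank() and XGCommunicatorGetWorldSize().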

◆ XGCommunicatorInit()

int XGCommunicatorInit ( char const *  config)

Initialize the collective communicator.

The communicator API is currently experimental; function signatures may change in the future without notice.

Call this once before using any other communicator function.

Additional configuration is not required. The communicator usually detects its settings from environment variables.

Parameters
config   JSON encoded configuration. Accepted JSON keys are:
  • xgboost_communicator: The type of the communicator. Can be set as an environment variable.
    • rabit: Use Rabit. This is the default if the type is unspecified.
    • mpi: Use MPI.
    • federated: Use the gRPC interface for Federated Learning.

Only applicable to the Rabit communicator (these are case-sensitive):
  • rabit_tracker_uri: Hostname of the tracker.
  • rabit_tracker_port: Port number of the tracker.
  • rabit_task_id: ID of the current task, can be used to obtain deterministic rank assignment.
  • rabit_world_size: Total number of workers.
  • rabit_hadoop_mode: Enable Hadoop support.
  • rabit_tree_reduce_minsize: Minimal size for tree reduce.
  • rabit_reduce_ring_mincount: Minimal count to perform ring reduce.
  • rabit_reduce_buffer: Size of the reduce buffer.
  • rabit_bootstrap_cache: Size of the bootstrap cache.
  • rabit_debug: Enable debugging.
  • rabit_timeout: Enable timeout.
  • rabit_timeout_sec: Timeout in seconds.
  • rabit_enable_tcp_no_delay: Enable TCP no delay on Unix platforms.

Only applicable to the Rabit communicator (these are case-sensitive, and can be set as environment variables):
  • DMLC_TRACKER_URI: Hostname of the tracker.
  • DMLC_TRACKER_PORT: Port number of the tracker.
  • DMLC_TASK_ID: ID of the current task, can be used to obtain deterministic rank assignment.
  • DMLC_ROLE: Role of the current task, "worker" or "server".
  • DMLC_NUM_ATTEMPT: Number of attempts after task failure.
  • DMLC_WORKER_CONNECT_RETRY: Number of retries to connect to the tracker.

Only applicable to the Federated communicator (use upper case for environment variables, use lower case for runtime configuration):
  • federated_server_address: Address of the federated server.
  • federated_world_size: Number of federated workers.
  • federated_rank: Rank of the current worker.
  • federated_server_cert: Server certificate file path. Only needed for the SSL mode.
  • federated_client_key: Client key file path. Only needed for the SSL mode.
  • federated_client_cert: Client certificate file path. Only needed for the SSL mode.
Returns
0 for success, -1 for failure.
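
As an illustration of the config parameter, the sketch below assembles a JSON string for the Rabit communicator from a few of the keys listed above. `build_rabit_config` is a hypothetical helper, not part of XGBoost. The value types are assumptions (strings for the URI and task ID, JSON numbers for the port and world size), since this page does not specify them, so verify them against the XGBoost version in use.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of XGBoost). Writes a JSON configuration for
 * the Rabit communicator into `out`. Returns 0 on success, or -1 if the
 * buffer is too small or formatting fails.
 * Assumption: port and world size are emitted as JSON numbers.
 * Note: no JSON escaping is done, so the string arguments must not contain
 * quotes, backslashes, or control characters. */
int build_rabit_config(char *out, size_t out_size, const char *tracker_uri,
                       int tracker_port, int world_size, const char *task_id) {
  int n = snprintf(out, out_size,
                   "{\"xgboost_communicator\": \"rabit\", "
                   "\"rabit_tracker_uri\": \"%s\", "
                   "\"rabit_tracker_port\": %d, "
                   "\"rabit_world_size\": %d, "
                   "\"rabit_task_id\": \"%s\"}",
                   tracker_uri, tracker_port, world_size, task_id);
  /* snprintf returns the length it wanted to write; a value that does not
   * fit in the buffer means the output was truncated. */
  return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The resulting string would be passed as the config argument of XGCommunicatorInit(), and the return value of that call should be checked in the same way (0 for success, -1 for failure).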

◆ XGCommunicatorIsDistributed()

int XGCommunicatorIsDistributed ( void  )

Check whether the communicator is distributed.

Returns
Nonzero (true) if the communicator is distributed.

◆ XGCommunicatorPrint()

int XGCommunicatorPrint ( char const *  message)

Print the message to the communicator.

This function can be used to report progress to the user who monitors the communicator.

Parameters
message   The message to be printed.
Returns
0 for success, -1 for failure.