xgboost
Collective

Experimental support for exposing the internal communicator in XGBoost.

Typedefs

typedef void * TrackerHandle
 Handle to tracker.
 

Functions

int XGTrackerCreate (char const *config, TrackerHandle *handle)
 Create a new tracker.

int XGTrackerWorkerArgs (TrackerHandle handle, char const **args)
 Get the arguments needed for running workers. This should be called after XGTrackerRun().

int XGTrackerRun (TrackerHandle handle, char const *config)
 Start the tracker. The tracker runs in the background and this function returns once the tracker is started.

int XGTrackerWaitFor (TrackerHandle handle, char const *config)
 Wait for the tracker to finish. This should be called after XGTrackerRun(). This function blocks until the tracker task is finished or the timeout is reached.

int XGTrackerFree (TrackerHandle handle)
 Free a tracker instance. This should be called after XGTrackerWaitFor(). If the tracker has not been properly waited for, this function will shut down all connections with the tracker, potentially leading to undefined behavior.

int XGCommunicatorInit (char const *config)
 Initialize the collective communicator.

int XGCommunicatorFinalize (void)
 Finalize the collective communicator.

int XGCommunicatorGetRank (void)
 Get rank of current process.

int XGCommunicatorGetWorldSize (void)
 Get total number of processes.

int XGCommunicatorIsDistributed (void)
 Get if the communicator is distributed.

int XGCommunicatorPrint (char const *message)
 Print the message to the communicator.

int XGCommunicatorGetProcessorName (const char **name_str)
 Get the name of the processor.

int XGCommunicatorBroadcast (void *send_receive_buffer, size_t size, int root)
 Broadcast a memory region to all others from root. This function is NOT thread-safe.

int XGCommunicatorAllreduce (void *send_receive_buffer, size_t count, int data_type, int op)
 Perform in-place allreduce. This function is NOT thread-safe.

Detailed Description

Experimental support for exposing the internal communicator in XGBoost.

Typedef Documentation

◆ TrackerHandle

typedef void* TrackerHandle

Handle to tracker.

There are currently two types of tracker in XGBoost: rabit and federated.

This is still under development.

Function Documentation

◆ XGCommunicatorAllreduce()

int XGCommunicatorAllreduce ( void *  send_receive_buffer,
size_t  count,
int  data_type,
int  op 
)

Perform in-place allreduce. This function is NOT thread-safe.

Example usage: the following code computes the element-wise sum across all workers.

vector<int> data(10);
...
Allreduce(&data[0], data.size(), DataType::kInt32, Op::kSum);
...
Parameters
send_receive_buffer: Buffer for both sending and receiving data.
count: Number of elements to be reduced.
data_type: Enumeration of data type, see xgboost::collective::DataType in communicator.h.
op: Enumeration of operation type, see xgboost::collective::Operation in communicator.h.
Returns
0 for success, -1 for failure.
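
The in-place semantics described above can be illustrated without a running communicator. The following is a minimal single-process sketch, not XGBoost code: `simulate_allreduce_sum` and the `bufs` array of per-rank buffers are hypothetical names introduced here, and only the sum operation on 32-bit integers is modelled. It shows that each rank's buffer serves as both input and output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single-process model of an in-place sum allreduce over int32 buffers.
 * bufs[r] stands in for the send_receive_buffer of rank r (a hypothetical
 * layout used for illustration only; no XGBoost function is called). */
void simulate_allreduce_sum(int32_t **bufs, size_t world_size, size_t count) {
  assert(bufs != NULL);
  for (size_t i = 0; i < count; ++i) {
    int32_t total = 0;
    /* Reduce: combine element i from every rank. */
    for (size_t r = 0; r < world_size; ++r) {
      total += bufs[r][i];
    }
    /* Distribute: every rank's buffer is overwritten with the result. */
    for (size_t r = 0; r < world_size; ++r) {
      bufs[r][i] = total;
    }
  }
}
```

With two simulated ranks holding {1, 2, 3} and {10, 20, 30}, both buffers hold {11, 22, 33} afterwards. In a real job each process would pass only its own buffer to XGCommunicatorAllreduce().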

◆ XGCommunicatorBroadcast()

int XGCommunicatorBroadcast ( void *  send_receive_buffer,
size_t  size,
int  root 
)

Broadcast a memory region to all others from root. This function is NOT thread-safe.

Example:

int a = 1;
Broadcast(&a, sizeof(a), root);
Parameters
send_receive_buffer: Pointer to the send or receive buffer.
size: Size of the data in bytes.
root: The process rank to broadcast from.
Returns
0 for success, -1 for failure.
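
As a rough illustration of what a broadcast does to each process's buffer, the sketch below models it in a single process. `simulate_broadcast` and the `bufs` array are hypothetical names introduced for this example; it is not how XGBoost implements the operation, and it assumes `size` is a byte count, as the `sizeof(a)` example above suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Single-process model of a broadcast. bufs[r] stands in for the
 * send_receive_buffer of rank r (hypothetical layout, illustration only). */
void simulate_broadcast(void **bufs, size_t world_size, size_t size, int root) {
  assert(bufs != NULL);
  assert(root >= 0 && (size_t)root < world_size);
  for (size_t r = 0; r < world_size; ++r) {
    if (r != (size_t)root) {
      /* Non-root ranks receive: their buffer is overwritten with root's bytes. */
      memcpy(bufs[r], bufs[root], size);
    }
  }
}
```

The root's buffer is left unchanged, while every other buffer ends up as a byte-for-byte copy of it.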

◆ XGCommunicatorFinalize()

int XGCommunicatorFinalize ( void  )

Finalize the collective communicator.

Call this function after all jobs have finished.

Returns
0 for success, -1 for failure.

◆ XGCommunicatorGetProcessorName()

int XGCommunicatorGetProcessorName ( const char **  name_str)

Get the name of the processor.

Parameters
name_str: Pointer that receives the returned processor name.
Returns
0 for success, -1 for failure.

◆ XGCommunicatorGetRank()

int XGCommunicatorGetRank ( void  )

Get rank of current process.

Returns
Rank of the worker.

◆ XGCommunicatorGetWorldSize()

int XGCommunicatorGetWorldSize ( void  )

Get total number of processes.

Returns
Total world size.

◆ XGCommunicatorInit()

int XGCommunicatorInit ( char const *  config)

Initialize the collective communicator.

The communicator API is currently experimental; function signatures may change in the future without notice.

Call this once before using any other communicator function.

Additional configuration is not required; usually the communicator detects its settings from environment variables.

Parameters
config: JSON encoded configuration. Accepted JSON keys are:
  • xgboost_communicator: The type of the communicator. Can be set as an environment variable.
    • rabit: Use Rabit. This is the default if the type is unspecified.
    • federated: Use the gRPC interface for Federated Learning.

Only applicable to the Rabit communicator (these are case-sensitive):

  • rabit_tracker_uri: Hostname of the tracker.
  • rabit_tracker_port: Port number of the tracker.
  • rabit_task_id: ID of the current task, can be used to obtain deterministic rank assignment.
  • rabit_world_size: Total number of workers.
  • rabit_timeout: Enable timeout.
  • rabit_timeout_sec: Timeout in seconds.

Only applicable to the Rabit communicator (these are case-sensitive, and can be set as environment variables):

  • DMLC_TRACKER_URI: Hostname of the tracker.
  • DMLC_TRACKER_PORT: Port number of the tracker.
  • DMLC_TASK_ID: ID of the current task, can be used to obtain deterministic rank assignment.
  • DMLC_WORKER_CONNECT_RETRY: Number of retries to connect to the tracker.
  • dmlc_nccl_path: The path to NCCL shared object. Only used if XGBoost is compiled with USE_DLOPEN_NCCL.

Only applicable to the Federated communicator (use upper case for environment variables, use lower case for runtime configuration):

  • federated_server_address: Address of the federated server.
  • federated_world_size: Number of federated workers.
  • federated_rank: Rank of the current worker.
  • federated_server_cert: Server certificate file path. Only needed for the SSL mode.
  • federated_client_key: Client key file path. Only needed for the SSL mode.
  • federated_client_cert: Client certificate file path. Only needed for the SSL mode.
Returns
0 for success, -1 for failure.
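
A configuration string for the Rabit communicator might be assembled as in the sketch below. The helper `build_rabit_config` is hypothetical and not part of XGBoost. The key names come from the list above, but the value types (numbers for the port and world size, strings for the URI and task ID) are assumptions, since this page does not state them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if s contains a quote or backslash, which this sketch does not
 * escape. Control characters are not checked. */
static int rabit_cfg_needs_escaping(const char *s) {
  return strchr(s, '"') != NULL || strchr(s, '\\') != NULL;
}

/* Hypothetical helper: writes a JSON config for the Rabit communicator into
 * out. Returns 0 on success, -1 if an input cannot be embedded safely or the
 * buffer is too small. Value types are assumptions, not documented facts. */
int build_rabit_config(char *out, size_t out_len, const char *tracker_uri,
                       int tracker_port, const char *task_id, int world_size) {
  assert(out != NULL && tracker_uri != NULL && task_id != NULL);
  if (rabit_cfg_needs_escaping(tracker_uri) || rabit_cfg_needs_escaping(task_id)) {
    return -1;
  }
  int n = snprintf(out, out_len,
                   "{\"xgboost_communicator\": \"rabit\", "
                   "\"rabit_tracker_uri\": \"%s\", "
                   "\"rabit_tracker_port\": %d, "
                   "\"rabit_task_id\": \"%s\", "
                   "\"rabit_world_size\": %d}",
                   tracker_uri, tracker_port, task_id, world_size);
  /* snprintf reports the length it wanted to write; detect truncation. */
  return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

The resulting string would be passed as the `config` argument of XGCommunicatorInit(). The address 127.0.0.1 and port 9091 used when trying this out are arbitrary placeholders.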

◆ XGCommunicatorIsDistributed()

int XGCommunicatorIsDistributed ( void  )

Get if the communicator is distributed.

Returns
Non-zero (true) if the communicator is distributed.

◆ XGCommunicatorPrint()

int XGCommunicatorPrint ( char const *  message)

Print the message to the communicator.

This function can be used to report progress to the user who monitors the communicator.

Parameters
message: The message to be printed.
Returns
0 for success, -1 for failure.

◆ XGTrackerCreate()

int XGTrackerCreate ( char const *  config,
TrackerHandle *  handle 
)

Create a new tracker.

Parameters
config: JSON encoded parameters.
  • dmlc_communicator: String, the type of tracker to create. Available options are rabit and federated.
  • n_workers: Integer, the number of workers.
  • port: (Optional) Integer, the port this tracker should listen to.
  • timeout: (Optional) Integer, timeout in seconds for various networking operations.

Some configurations are rabit specific:

  • host: (Optional) String, used by the rabit tracker to specify the address of the host.

Some configurations are federated specific:

  • federated_secure: Boolean, whether this is a secure server.
  • server_key_path: Path to the server key. Used only if this is a secure server.
  • server_cert_path: Path to the server certificate. Used only if this is a secure server.
  • client_cert_path: Path to the client certificate. Used only if this is a secure server.

handle: The handle to the created tracker.
Returns
0 for success, -1 for failure.
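
The JSON parameters for a rabit tracker could be put together as in the following sketch. `build_tracker_config` is a hypothetical helper, not an XGBoost function; the key names and their types (string, integer) are taken from the parameter list above, and the concrete values used to exercise it are placeholders.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if s contains a quote or backslash, which this sketch does not
 * escape. Control characters are not checked. */
static int tracker_cfg_needs_escaping(const char *s) {
  return strchr(s, '"') != NULL || strchr(s, '\\') != NULL;
}

/* Hypothetical helper: writes the JSON parameters for a rabit tracker into
 * out. Returns 0 on success, -1 if host cannot be embedded safely or the
 * buffer is too small. */
int build_tracker_config(char *out, size_t out_len, int n_workers,
                         const char *host, int port, int timeout_sec) {
  assert(out != NULL && host != NULL);
  if (tracker_cfg_needs_escaping(host)) {
    return -1;
  }
  int n = snprintf(out, out_len,
                   "{\"dmlc_communicator\": \"rabit\", "
                   "\"n_workers\": %d, "
                   "\"host\": \"%s\", "
                   "\"port\": %d, "
                   "\"timeout\": %d}",
                   n_workers, host, port, timeout_sec);
  /* snprintf reports the length it wanted to write; detect truncation. */
  return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

The string produced this way would be passed as the `config` argument of XGTrackerCreate(), together with a pointer to a TrackerHandle that receives the new tracker.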

◆ XGTrackerFree()

int XGTrackerFree ( TrackerHandle  handle)

Free a tracker instance. This should be called after XGTrackerWaitFor(). If the tracker has not been properly waited for, this function will shut down all connections with the tracker, potentially leading to undefined behavior.

Parameters
handle: The handle to the tracker.
Returns
0 for success, -1 for failure.

◆ XGTrackerRun()

int XGTrackerRun ( TrackerHandle  handle,
char const *  config 
)

Start the tracker. The tracker runs in the background and this function returns once the tracker is started.

Parameters
handle: The handle to the tracker.
config: Unused at the moment, preserved for the future.
Returns
0 for success, -1 for failure.

◆ XGTrackerWaitFor()

int XGTrackerWaitFor ( TrackerHandle  handle,
char const *  config 
)

Wait for the tracker to finish. This should be called after XGTrackerRun(). This function blocks until the tracker task is finished or the timeout is reached.

Parameters
handle: The handle to the tracker.
config: JSON encoded configuration. No argument is required yet, preserved for the future.
Returns
0 for success, -1 for failure.

◆ XGTrackerWorkerArgs()

int XGTrackerWorkerArgs ( TrackerHandle  handle,
char const **  args 
)

Get the arguments needed for running workers. This should be called after XGTrackerRun().

Parameters
handle: The handle to the tracker.
args: The arguments returned as a JSON document.
Returns
0 for success, -1 for failure.