xgboost
xgboost::MetaInfo Class Reference

Meta information about the dataset; always resides in memory.

#include <data.h>

Public Member Functions

 MetaInfo ()=default
 default constructor
 
 MetaInfo (MetaInfo &&that)=default
 
MetaInfo & operator= (MetaInfo &&that)=default
 
MetaInfo & operator= (MetaInfo const &that)=delete
 
void Validate (DeviceOrd device) const
 Validate all metainfo.
 
MetaInfo Slice (common::Span< int32_t const > ridxs) const
 
MetaInfo Copy () const
 
bst_float GetWeight (size_t i) const
 Get the weight of an instance.
 
const std::vector< size_t > & LabelAbsSort (Context const *ctx) const
 Get sorted indices (argsort) of labels by absolute value (used by the Cox loss).
 
void Clear ()
 clear all the information
 
void LoadBinary (dmlc::Stream *fi)
 Load the Meta info from binary stream.
 
void SaveBinary (dmlc::Stream *fo) const
 Save the Meta info to binary stream.
 
void SetInfo (Context const &ctx, StringView key, StringView interface_str)
 Set information in the meta info with array interface.
 
void GetInfo (char const *key, bst_ulong *out_len, DataType dtype, const void **out_dptr) const
 
void SetFeatureInfo (const char *key, const char **info, const bst_ulong size)
 
void GetFeatureInfo (const char *field, std::vector< std::string > *out_str_vecs) const
 
void Extend (MetaInfo const &that, bool accumulate_rows, bool check_column)
 Extend with other MetaInfo.
 
void SynchronizeNumberOfColumns (Context const *ctx)
 Synchronize the number of columns across all workers.
 
bool IsRowSplit () const
 Whether the data is split row-wise.
 
bool IsColumnSplit () const
 Whether the data is split column-wise.
 
bool IsRanking () const
 Whether this is learning-to-rank data.
 
bool IsVerticalFederated () const
 A convenient method to check if we are doing vertical federated learning, which requires some special processing.
 
bool ShouldHaveLabels () const
 A convenient method to check if the MetaInfo should contain labels.
 
bool HasCategorical () const
 Flag for whether the DMatrix has categorical features.
 

Public Attributes

uint64_t num_row_ {0}
 number of rows in the data
 
uint64_t num_col_ {0}
 number of columns in the data
 
uint64_t num_nonzero_ {0}
 number of nonzero entries in the data
 
linalg::Tensor< float, 2 > labels
 label of each instance
 
DataSplitMode data_split_mode {DataSplitMode::kRow}
 data split mode
 
std::vector< bst_group_t > group_ptr_
 the begin and end indices of each group, needed when the learning task is ranking.
 
HostDeviceVector< bst_float > weights_
 weights of each instance, optional
 
linalg::Tensor< float, 2 > base_margin_
 initial margins. If specified, XGBoost starts from these margins, so they can be used to specify the initial prediction to boost from.
 
HostDeviceVector< bst_float > labels_lower_bound_
 lower bound of the label, to be used for survival analysis (censored regression)
 
HostDeviceVector< bst_float > labels_upper_bound_
 upper bound of the label, to be used for survival analysis (censored regression)
 
std::vector< std::string > feature_type_names
 Name of type for each feature provided by users. E.g. "int"/"float"/"i"/"q".
 
std::vector< std::string > feature_names
 Name for each feature.
 
HostDeviceVector< FeatureType > feature_types
 
HostDeviceVector< float > feature_weights
 

Static Public Attributes

static constexpr uint64_t kNumField = 12
 number of data fields in MetaInfo
 

Detailed Description

Meta information about the dataset; always resides in memory.

Constructor & Destructor Documentation

◆ MetaInfo() [1/2]

xgboost::MetaInfo::MetaInfo ( )
default

default constructor

◆ MetaInfo() [2/2]

xgboost::MetaInfo::MetaInfo ( MetaInfo &&  that)
default

Member Function Documentation

◆ Clear()

void xgboost::MetaInfo::Clear ( )

clear all the information

◆ Copy()

MetaInfo xgboost::MetaInfo::Copy ( ) const

◆ Extend()

void xgboost::MetaInfo::Extend ( MetaInfo const &  that,
bool  accumulate_rows,
bool  check_column 
)

Extend with other MetaInfo.

Parameters
that             The other MetaInfo object.
accumulate_rows  Whether rows need to be accumulated in this function. If client code knows the number of rows in advance, set this parameter to false.
check_column     Whether the extend method should check the consistency of columns.
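
The effect of the two flags can be illustrated with a toy stand-in. ToyInfo and ExtendToy below are hypothetical names invented for this sketch; they only mimic the documented parameters and are not the library's implementation.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for MetaInfo with only the fields this sketch needs.
struct ToyInfo {
  std::uint64_t num_row{0};
  std::uint64_t num_col{0};
  std::vector<float> labels;
};

// Append `that` to `self`, mirroring the documented flags.
inline void ExtendToy(ToyInfo* self, ToyInfo const& that,
                      bool accumulate_rows, bool check_column) {
  // Assumption: an empty target (num_col == 0) accepts any column count.
  if (check_column && self->num_col != 0 && self->num_col != that.num_col) {
    throw std::invalid_argument("inconsistent number of columns");
  }
  if (self->num_col == 0) {
    self->num_col = that.num_col;
  }
  if (accumulate_rows) {
    // The caller does not know the total row count up front, so add it here.
    self->num_row += that.num_row;
  }
  self->labels.insert(self->labels.end(), that.labels.begin(), that.labels.end());
}
```

With accumulate_rows set to false, the sketch leaves num_row untouched, which matches the case where client code has already set the final row count.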

◆ GetFeatureInfo()

void xgboost::MetaInfo::GetFeatureInfo ( const char *  field,
std::vector< std::string > *  out_str_vecs 
) const

◆ GetInfo()

void xgboost::MetaInfo::GetInfo ( char const *  key,
bst_ulong *  out_len,
DataType  dtype,
const void **  out_dptr 
) const

◆ GetWeight()

bst_float xgboost::MetaInfo::GetWeight ( size_t  i) const
inline

Get the weight of an instance.

Parameters
i  Instance index.
Returns
The weight.
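
Because weights_ is documented as optional, a lookup needs a fallback for unweighted data. The sketch below assumes the fallback is a weight of 1.0 for every instance; WeightOrDefault is a hypothetical helper that uses a plain std::vector instead of HostDeviceVector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: weight of instance i, or 1.0f when no weights were set.
inline float WeightOrDefault(std::vector<float> const& weights, std::size_t i) {
  if (weights.empty()) {
    return 1.0f;  // assumption: unweighted data behaves as weight 1 everywhere
  }
  assert(i < weights.size());  // a non-empty weight vector must cover index i
  return weights[i];
}
```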

◆ HasCategorical()

bool xgboost::MetaInfo::HasCategorical ( ) const
inline

Flag for whether the DMatrix has categorical features.

◆ IsColumnSplit()

bool xgboost::MetaInfo::IsColumnSplit ( ) const
inline

Whether the data is split column-wise.

◆ IsRanking()

bool xgboost::MetaInfo::IsRanking ( ) const
inline

Whether this is learning-to-rank data.

◆ IsRowSplit()

bool xgboost::MetaInfo::IsRowSplit ( ) const
inline

Whether the data is split row-wise.

◆ IsVerticalFederated()

bool xgboost::MetaInfo::IsVerticalFederated ( ) const

A convenient method to check if we are doing vertical federated learning, which requires some special processing.

◆ LabelAbsSort()

const std::vector<size_t>& xgboost::MetaInfo::LabelAbsSort ( Context const *  ctx) const

Get sorted indices (argsort) of labels by absolute value (used by the Cox loss).
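
An argsort by absolute value returns the positions of the labels, ordered by magnitude rather than by signed value. The sketch below is a stand-alone illustration with a hypothetical name, ArgsortByAbs; ascending order and the use of a stable sort are assumptions, since this page does not state either.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of `labels`, ordered by ascending |label|.
inline std::vector<std::size_t> ArgsortByAbs(std::vector<float> const& labels) {
  std::vector<std::size_t> idx(labels.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
  // Stable sort keeps the original order of labels with equal magnitude.
  std::stable_sort(idx.begin(), idx.end(), [&labels](std::size_t a, std::size_t b) {
    return std::abs(labels[a]) < std::abs(labels[b]);
  });
  return idx;
}
```

For labels {-3, 1, -2} the magnitudes are {3, 1, 2}, so the returned indices are {1, 2, 0}.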

◆ LoadBinary()

void xgboost::MetaInfo::LoadBinary ( dmlc::Stream *  fi)

Load the Meta info from binary stream.

Parameters
fi  The input stream.

◆ operator=() [1/2]

MetaInfo& xgboost::MetaInfo::operator= ( MetaInfo &&  that)
default

◆ operator=() [2/2]

MetaInfo& xgboost::MetaInfo::operator= ( MetaInfo const &  that)
delete

◆ SaveBinary()

void xgboost::MetaInfo::SaveBinary ( dmlc::Stream *  fo) const

Save the Meta info to binary stream.

Parameters
fo  The output stream.

◆ SetFeatureInfo()

void xgboost::MetaInfo::SetFeatureInfo ( const char *  key,
const char **  info,
const bst_ulong  size 
)

◆ SetInfo()

void xgboost::MetaInfo::SetInfo ( Context const &  ctx,
StringView  key,
StringView  interface_str 
)

Set information in the meta info with array interface.

Parameters
key            The key of the information.
interface_str  String representation of the array interface in JSON format.
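
As an illustration of what interface_str may look like, the sketch below builds a JSON string for a one-dimensional float buffer. The field names (data, shape, typestr, version) follow the NumPy array interface protocol; MakeFloatArrayInterface is a hypothetical helper, and the "<f4" type string assumes a little-endian host with 32-bit floats.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: describe `n` contiguous floats at `data` as a JSON
// array-interface string. The buffer must outlive any consumer of the string.
inline std::string MakeFloatArrayInterface(float const* data, std::size_t n) {
  std::ostringstream os;
  os << "{\"data\": [" << reinterpret_cast<std::uintptr_t>(data)
     << ", true], "                     // address, read-only flag
     << "\"shape\": [" << n << "], "    // one dimension of length n
     << "\"typestr\": \"<f4\", "        // assumption: little-endian float32
     << "\"version\": 3}";
  return os.str();
}
```

The resulting string would be passed as interface_str together with a key. Which keys are accepted is not listed on this page, so any specific key is an assumption.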

◆ ShouldHaveLabels()

bool xgboost::MetaInfo::ShouldHaveLabels ( ) const

A convenient method to check if the MetaInfo should contain labels.

Normally we assume labels are available everywhere. The only exception is in vertical federated learning where labels are only available on worker 0.
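
The rule described above reduces to a small predicate. ShouldHaveLabelsToy is a hypothetical free function for illustration; it takes the federated flag and the worker rank as explicit arguments, whereas the member function presumably obtains them from the runtime.

```cpp
// Hypothetical sketch: labels are expected everywhere, except that in
// vertical federated learning only worker 0 holds them.
inline bool ShouldHaveLabelsToy(bool is_vertical_federated, int rank) {
  return !is_vertical_federated || rank == 0;
}
```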

◆ Slice()

MetaInfo xgboost::MetaInfo::Slice ( common::Span< int32_t const >  ridxs) const

◆ SynchronizeNumberOfColumns()

void xgboost::MetaInfo::SynchronizeNumberOfColumns ( Context const *  ctx)

Synchronize the number of columns across all workers.

Normally we just need to find the maximum number of columns across all workers, but in vertical federated learning, since each worker loads its own list of columns, we need to sum them.
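
The two reductions can be sketched without a communicator by treating the per-worker column counts as a plain vector. GlobalNumCol is a hypothetical name; in a real distributed run the values would be combined through a collective operation rather than gathered into one container.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical sketch: combine per-worker column counts into a global count.
inline std::uint64_t GlobalNumCol(std::vector<std::uint64_t> const& per_worker,
                                  bool vertical_federated) {
  if (per_worker.empty()) {
    return 0;
  }
  if (vertical_federated) {
    // Each worker holds its own list of columns, so the counts add up.
    return std::accumulate(per_worker.begin(), per_worker.end(), std::uint64_t{0});
  }
  // Otherwise workers see the same feature space; take the widest view.
  return *std::max_element(per_worker.begin(), per_worker.end());
}
```

For per-worker counts {3, 5, 4}, the sketch yields 5 in the normal case and 12 in the vertical federated case.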

◆ Validate()

void xgboost::MetaInfo::Validate ( DeviceOrd  device) const

Validate all metainfo.

Member Data Documentation

◆ base_margin_

linalg::Tensor<float, 2> xgboost::MetaInfo::base_margin_

initial margins. If specified, XGBoost starts from these margins, so they can be used to specify the initial prediction to boost from.

◆ data_split_mode

DataSplitMode xgboost::MetaInfo::data_split_mode {DataSplitMode::kRow}

data split mode

◆ feature_names

std::vector<std::string> xgboost::MetaInfo::feature_names

Name for each feature.

◆ feature_type_names

std::vector<std::string> xgboost::MetaInfo::feature_type_names

Name of type for each feature provided by users. E.g. "int"/"float"/"i"/"q".

◆ feature_types

HostDeviceVector<FeatureType> xgboost::MetaInfo::feature_types

◆ feature_weights

HostDeviceVector<float> xgboost::MetaInfo::feature_weights

◆ group_ptr_

std::vector<bst_group_t> xgboost::MetaInfo::group_ptr_

the begin and end indices of each group, needed when the learning task is ranking.
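
One common way to store begin and end indices for consecutive groups is an offset array, where group g spans rows [ptr[g], ptr[g + 1]). The sketch below assumes that layout; GroupSizesToPtr is a hypothetical helper, and std::uint32_t is used as a stand-in for bst_group_t.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: turn per-group sizes into an offset array with one
// more entry than there are groups.
inline std::vector<std::uint32_t> GroupSizesToPtr(std::vector<std::uint32_t> const& sizes) {
  std::vector<std::uint32_t> ptr(sizes.size() + 1, 0);
  // ptr[g + 1] = sizes[0] + ... + sizes[g]; ptr[0] stays 0.
  std::partial_sum(sizes.begin(), sizes.end(), ptr.begin() + 1);
  return ptr;
}
```

Group sizes {3, 2, 4} give offsets {0, 3, 5, 9}, so the second group covers rows 3 and 4.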

◆ kNumField

constexpr uint64_t xgboost::MetaInfo::kNumField = 12
static constexpr

number of data fields in MetaInfo

◆ labels

linalg::Tensor<float, 2> xgboost::MetaInfo::labels

label of each instance

◆ labels_lower_bound_

HostDeviceVector<bst_float> xgboost::MetaInfo::labels_lower_bound_

lower bound of the label, to be used for survival analysis (censored regression)

◆ labels_upper_bound_

HostDeviceVector<bst_float> xgboost::MetaInfo::labels_upper_bound_

upper bound of the label, to be used for survival analysis (censored regression)
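
In censored regression, a pair of lower and upper bounds describes how much is known about each label. The sketch below classifies a pair under a commonly used convention (equal bounds for an observed value, an infinite upper bound for right censoring); the enum and ClassifyBounds are hypothetical, and the convention itself is an assumption not stated on this page.

```cpp
#include <cmath>

// Hypothetical labels for how a [lower, upper] pair constrains the target.
enum class Censoring { kUncensored, kRightCensored, kIntervalCensored };

// Hypothetical helper: classify one instance from its label bounds.
inline Censoring ClassifyBounds(float lower, float upper) {
  if (std::isinf(upper) && upper > 0.0f) {
    return Censoring::kRightCensored;  // only a lower bound is known
  }
  if (lower == upper) {
    return Censoring::kUncensored;  // the exact value was observed
  }
  return Censoring::kIntervalCensored;  // the value lies within [lower, upper]
}
```

Left-censored labels are not distinguished here and fall under the interval case.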

◆ num_col_

uint64_t xgboost::MetaInfo::num_col_ {0}

number of columns in the data

◆ num_nonzero_

uint64_t xgboost::MetaInfo::num_nonzero_ {0}

number of nonzero entries in the data

◆ num_row_

uint64_t xgboost::MetaInfo::num_row_ {0}

number of rows in the data

◆ weights_

HostDeviceVector<bst_float> xgboost::MetaInfo::weights_

weights of each instance, optional


The documentation for this class was generated from the following file: