xgboost
xgboost::DMatrix Class Reference (abstract)

Internal data structure used by XGBoost during training. There are two ways to create a customized DMatrix that reads in a user-defined format.

#include <data.h>


Public Member Functions

 DMatrix ()=default
 default constructor

virtual MetaInfo & Info ()=0
 meta information of the dataset

virtual const MetaInfo & Info () const =0
 meta information of the dataset

template<typename T >
BatchSet< T > GetBatches (const BatchParam &param={})
 Gets batches. Use a range-based for loop over the BatchSet to access individual batches.

virtual bool SingleColBlock () const =0

virtual float GetColDensity (size_t cidx)=0
 get column density

virtual ~DMatrix ()=default
 virtual destructor

virtual void SaveToLocalFile (const std::string &fname)
 Save DMatrix to local file. The saved file only works for a non-sharded dataset (single-machine training). This API is deprecated and its use is discouraged.

bool IsDense () const
 Whether the matrix is dense.

template<>
BatchSet< SparsePage > GetBatches (const BatchParam &)

template<>
BatchSet< CSCPage > GetBatches (const BatchParam &)

template<>
BatchSet< SortedCSCPage > GetBatches (const BatchParam &)

template<>
BatchSet< EllpackPage > GetBatches (const BatchParam &param)
 

Static Public Member Functions

static DMatrix * Load (const std::string &uri, bool silent, bool load_row_split, const std::string &file_format="auto", size_t page_size=kPageSize)
 Load DMatrix from URI.

static DMatrix * Create (std::unique_ptr< DataSource< SparsePage >> &&source, const std::string &cache_prefix="")
 Creates a new DMatrix by wrapping a row iterator and meta information.

template<typename AdapterT >
static DMatrix * Create (AdapterT *adapter, float missing, int nthread, const std::string &cache_prefix="", size_t page_size=kPageSize)
 Creates a new DMatrix from an external data adapter.
 

Static Public Attributes

static const size_t kPageSize = 32UL << 20UL
 page size 32 MB
 

Protected Member Functions

virtual BatchSet< SparsePage > GetRowBatches ()=0

virtual BatchSet< CSCPage > GetColumnBatches ()=0

virtual BatchSet< SortedCSCPage > GetSortedColumnBatches ()=0

virtual BatchSet< EllpackPage > GetEllpackBatches (const BatchParam &param)=0
 

Detailed Description

Internal data structure used by XGBoost during training. There are two ways to create a customized DMatrix that reads in a user-defined format.

Constructor & Destructor Documentation

◆ DMatrix()

xgboost::DMatrix::DMatrix ( )
default

default constructor

◆ ~DMatrix()

virtual xgboost::DMatrix::~DMatrix ( )
virtual, default

virtual destructor

Member Function Documentation

◆ Create() [1/2]

static DMatrix* xgboost::DMatrix::Create ( std::unique_ptr< DataSource< SparsePage >> &&  source,
const std::string &  cache_prefix = "" 
)
static

Creates a new DMatrix by wrapping a row iterator and meta information.

Parameters
source: The source iterator of the data. The Create function takes ownership of the source.
cache_prefix: The path prefix of the temporary cache file used in external memory mode. In the common case it can be left empty (the default), and in-memory mode is used.
Returns
A newly created DMatrix.
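The ownership rule for source can be illustrated with a toy model. ToySource, ToyMatrix and CallerLosesOwnership are hypothetical stand-ins, not XGBoost types; the sketch only shows how a factory that accepts std::unique_ptr&& and moves from it leaves the caller's pointer empty, and how an empty cache_prefix is taken to mean in-memory mode.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for DataSource<SparsePage>; not the real XGBoost type.
struct ToySource {
  int num_rows = 0;
};

// Hypothetical stand-in for DMatrix that owns its source after creation.
class ToyMatrix {
 public:
  // Mirrors the shape of Create(std::unique_ptr<...>&&, cache_prefix).
  static ToyMatrix* Create(std::unique_ptr<ToySource>&& source,
                           const std::string& cache_prefix = "") {
    auto* m = new ToyMatrix();
    m->source_ = std::move(source);               // ownership moves into the matrix
    m->external_memory_ = !cache_prefix.empty();  // empty prefix: in-memory mode
    return m;
  }
  int NumRows() const { return source_ ? source_->num_rows : 0; }
  bool ExternalMemory() const { return external_memory_; }

 private:
  std::unique_ptr<ToySource> source_;
  bool external_memory_ = false;
};

// Hands a source to Create and checks that the caller's pointer was emptied.
bool CallerLosesOwnership() {
  auto src = std::make_unique<ToySource>();
  src->num_rows = 3;
  // The raw pointer returned by Create is wrapped so the sketch does not leak.
  std::unique_ptr<ToyMatrix> m(ToyMatrix::Create(std::move(src)));
  return src == nullptr && m->NumRows() == 3 && !m->ExternalMemory();
}
```

Taking the parameter as an rvalue reference forces call sites to write std::move, which makes the transfer of ownership visible where it happens.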

◆ Create() [2/2]

template<typename AdapterT >
static DMatrix* xgboost::DMatrix::Create ( AdapterT *  adapter,
float  missing,
int  nthread,
const std::string &  cache_prefix = "",
size_t  page_size = kPageSize 
)
static

Creates a new DMatrix from an external data adapter.

Template Parameters
AdapterT: Type of the adapter.
Parameters
[in,out] adapter: View onto the external data.
missing: The value to treat as missing.
nthread: Number of threads used for construction.
cache_prefix: (Optional) The cache prefix for external memory.
page_size: (Optional) Size of a page, in bytes, for external memory.
Returns
A newly created DMatrix.
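The role of the missing parameter can be sketched by converting a dense, row-major view into compressed sparse rows and skipping missing cells. ToyDenseAdapter, ToyCSR and FromAdapter are hypothetical names; the sketch assumes that NaN is always treated as missing in addition to the user-supplied value, and it ignores nthread by running single-threaded.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense adapter: a view onto a row-major float buffer it does not own.
struct ToyDenseAdapter {
  const float* values;
  std::size_t num_rows;
  std::size_t num_cols;
};

// Hypothetical CSR result, loosely modeled on a sparse row page.
struct ToyCSR {
  std::vector<std::size_t> row_ptr{0};  // row i spans [row_ptr[i], row_ptr[i + 1])
  std::vector<std::size_t> col_idx;     // column index of each stored entry
  std::vector<float> data;              // value of each stored entry
};

// Copies every entry that is not missing into CSR form.
// Assumption for this sketch: NaN counts as missing, as does any value equal to `missing`.
ToyCSR FromAdapter(const ToyDenseAdapter& a, float missing) {
  ToyCSR out;
  for (std::size_t r = 0; r < a.num_rows; ++r) {
    for (std::size_t c = 0; c < a.num_cols; ++c) {
      float v = a.values[r * a.num_cols + c];
      if (std::isnan(v) || v == missing) continue;  // skip missing entries
      out.col_idx.push_back(c);
      out.data.push_back(v);
    }
    out.row_ptr.push_back(out.data.size());  // close the current row
  }
  return out;
}
```

The explicit NaN check matters because NaN never compares equal to anything, so passing NaN as missing would otherwise keep every NaN cell.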

◆ GetBatches() [1/5]

template<typename T >
BatchSet<T> xgboost::DMatrix::GetBatches ( const BatchParam & param = {})

Gets batches. Use a range-based for loop over the BatchSet to access individual batches.
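The loop pattern can be sketched without XGBoost itself. ToyPage, ToyBatchSet and TotalRows are hypothetical; the sketch assumes only that a BatchSet exposes begin() and end(), which is all a range-based for loop needs. Against the real class the loop would presumably read `for (auto const& page : dmat->GetBatches<SparsePage>())`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical page type: each batch records how many rows it holds.
struct ToyPage {
  std::size_t num_rows;
};

// Hypothetical BatchSet: exposing begin()/end() is enough for range-based for.
template <typename T>
class ToyBatchSet {
 public:
  explicit ToyBatchSet(std::vector<T> pages) : pages_(std::move(pages)) {}
  typename std::vector<T>::const_iterator begin() const { return pages_.begin(); }
  typename std::vector<T>::const_iterator end() const { return pages_.end(); }

 private:
  std::vector<T> pages_;
};

// Sums rows across all batches, visiting one batch per loop iteration.
std::size_t TotalRows(const ToyBatchSet<ToyPage>& batches) {
  std::size_t total = 0;
  for (const ToyPage& page : batches) {
    total += page.num_rows;
  }
  return total;
}
```

Iterating batch by batch keeps the calling code the same whether the data fits in a single page or is split across several pages, as in external memory mode.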

◆ GetBatches() [2/5]

template<>
BatchSet<SparsePage> xgboost::DMatrix::GetBatches ( const BatchParam & )
inline

◆ GetBatches() [3/5]

template<>
BatchSet<CSCPage> xgboost::DMatrix::GetBatches ( const BatchParam & )
inline

◆ GetBatches() [4/5]

template<>
BatchSet<SortedCSCPage> xgboost::DMatrix::GetBatches ( const BatchParam & )
inline

◆ GetBatches() [5/5]

template<>
BatchSet<EllpackPage> xgboost::DMatrix::GetBatches ( const BatchParam & param)
inline

◆ GetColDensity()

virtual float xgboost::DMatrix::GetColDensity ( size_t  cidx)
pure virtual

get column density
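The reference does not define "column density". A common reading, assumed here, is the fraction of rows that store a (non-missing) entry in column cidx. The sketch below computes that over a hypothetical CSR layout; ToyCSR and ToyColDensity are illustrative names, not XGBoost code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR storage, used only to illustrate a density calculation.
struct ToyCSR {
  std::vector<std::size_t> row_ptr;  // size num_rows + 1
  std::vector<std::size_t> col_idx;  // column index of each stored entry
};

// Assumed definition: stored entries in column cidx divided by the number of rows.
float ToyColDensity(const ToyCSR& m, std::size_t cidx) {
  if (m.row_ptr.size() < 2) return 0.0f;  // no rows: report zero density
  std::size_t num_rows = m.row_ptr.size() - 1;
  std::size_t present = 0;
  for (std::size_t c : m.col_idx) {
    if (c == cidx) ++present;  // assumes at most one entry per (row, column)
  }
  return static_cast<float>(present) / static_cast<float>(num_rows);
}
```

For example, with four rows of which three store a value in column 0, the density of column 0 under this definition is 0.75.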

◆ GetColumnBatches()

virtual BatchSet<CSCPage> xgboost::DMatrix::GetColumnBatches ( )
protected, pure virtual

◆ GetEllpackBatches()

virtual BatchSet<EllpackPage> xgboost::DMatrix::GetEllpackBatches ( const BatchParam & param)
protected, pure virtual

◆ GetRowBatches()

virtual BatchSet<SparsePage> xgboost::DMatrix::GetRowBatches ( )
protected, pure virtual

◆ GetSortedColumnBatches()

virtual BatchSet<SortedCSCPage> xgboost::DMatrix::GetSortedColumnBatches ( )
protected, pure virtual
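The pairing of the inline GetBatches specializations with these protected pure virtual functions suggests a dispatch pattern: the public template picks a page type, and each specialization forwards to the matching hook that a concrete backend overrides. The sketch below models that assumed pattern with hypothetical names (RowPage, ColPage, ToyMatrixBase, ToyInMemoryMatrix) and strings in place of BatchSet.

```cpp
#include <cassert>
#include <string>

// Hypothetical page tags standing in for SparsePage and CSCPage.
struct RowPage {};
struct ColPage {};

// Toy interface: a public member template whose explicit specializations
// forward to protected pure virtual hooks.
class ToyMatrixBase {
 public:
  virtual ~ToyMatrixBase() = default;
  template <typename T>
  std::string GetBatches();  // only the specializations below are defined

 protected:
  virtual std::string GetRowBatches() = 0;
  virtual std::string GetColumnBatches() = 0;
};

// Explicit specializations are marked inline so they can live in a header.
template <>
inline std::string ToyMatrixBase::GetBatches<RowPage>() { return GetRowBatches(); }
template <>
inline std::string ToyMatrixBase::GetBatches<ColPage>() { return GetColumnBatches(); }

// A concrete backend overrides only the protected hooks.
class ToyInMemoryMatrix : public ToyMatrixBase {
 protected:
  std::string GetRowBatches() override { return "rows"; }
  std::string GetColumnBatches() override { return "columns"; }
};
```

Because a member template cannot be virtual, forwarding each specialization to an ordinary virtual function is what lets subclasses supply their own batches.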

◆ Info() [1/2]

virtual MetaInfo& xgboost::DMatrix::Info ( )
pure virtual

meta information of the dataset

◆ Info() [2/2]

virtual const MetaInfo& xgboost::DMatrix::Info ( ) const
pure virtual

meta information of the dataset

◆ IsDense()

bool xgboost::DMatrix::IsDense ( ) const
inline

Whether the matrix is dense.
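The reference does not say how density is decided. One natural rule, assumed here, is that a matrix is dense when the number of stored entries equals rows times columns. ToyMetaInfo and ToyIsDense are hypothetical names; the real MetaInfo fields may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the dataset's meta information.
struct ToyMetaInfo {
  std::uint64_t num_row;
  std::uint64_t num_col;
  std::uint64_t num_nonzero;  // number of stored (non-missing) entries
};

// Assumed rule: dense means every cell of the num_row x num_col grid is stored.
bool ToyIsDense(const ToyMetaInfo& info) {
  return info.num_nonzero == info.num_row * info.num_col;
}
```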

◆ Load()

static DMatrix* xgboost::DMatrix::Load ( const std::string &  uri,
bool  silent,
bool  load_row_split,
const std::string &  file_format = "auto",
size_t  page_size = kPageSize 
)
static

Load DMatrix from URI.

Parameters
uri: The URI of the input.
silent: Whether to suppress informational output during loading.
load_row_split: Flag to read in only part of the rows, divided among the workers in distributed mode.
file_format: The format type of the file, used for dmlc::Parser::Create. The default, "auto", can load a local binary file.
page_size: Page size for external memory.
Returns
The created DMatrix.

◆ SaveToLocalFile()

virtual void xgboost::DMatrix::SaveToLocalFile ( const std::string &  fname)
virtual

Saves the DMatrix to a local file. The saved file only works for a non-sharded dataset (single-machine training). This API is deprecated and its use is discouraged.

Parameters
fname: The name of the file to save to.

◆ SingleColBlock()

virtual bool xgboost::DMatrix::SingleColBlock ( ) const
pure virtual
Returns
Whether the data columns are stored in a single column block.

Member Data Documentation

◆ kPageSize

const size_t xgboost::DMatrix::kPageSize = 32UL << 20UL
static

page size 32 MB
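The shift expression can be checked at compile time: shifting left by 20 multiplies by 2^20 = 1,048,576, so 32UL << 20UL is 32 MiB. kToyPageSize is a local copy of the constant, and ToyPagesFor is a hypothetical helper showing how many pages of that size a byte count would occupy, rounding up; it is not part of XGBoost.

```cpp
#include <cassert>
#include <cstddef>

// Local copy of the documented constant; not included from XGBoost.
constexpr std::size_t kToyPageSize = 32UL << 20UL;

// 32 << 20 == 32 * 2^20 bytes, i.e. 32 MiB.
static_assert(kToyPageSize == 32UL * 1024UL * 1024UL, "32 << 20 is 32 MiB");
static_assert(kToyPageSize == 33554432UL, "32 MiB expressed in bytes");

// Hypothetical helper: pages needed to hold `bytes`, rounded up.
constexpr std::size_t ToyPagesFor(std::size_t bytes) {
  return (bytes + kToyPageSize - 1) / kToyPageSize;
}
```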


The documentation for this class was generated from the following file: