xgboost
xgboost::DMatrix Class Reference (abstract)

Internal data structure used by XGBoost during training. There are two ways to create a customized DMatrix that reads a user-defined format.

#include <data.h>

Public Member Functions

 DMatrix ()=default
 Default constructor.
 
virtual MetaInfo & Info ()=0
 Meta information of the dataset.
 
virtual const MetaInfo & Info () const =0
 Meta information of the dataset.
 
virtual BatchSet GetRowBatches ()=0
 Gets row batches. Use a range-based for loop over the BatchSet to access individual batches.
 
virtual BatchSet GetSortedColumnBatches ()=0
 
virtual BatchSet GetColumnBatches ()=0
 
virtual bool SingleColBlock () const =0
 
virtual float GetColDensity (size_t cidx)=0
 Get column density.
 
virtual ~DMatrix ()=default
 Virtual destructor.
 
virtual void SaveToLocalFile (const std::string &fname)
 Save the DMatrix to a local file. The saved file only works for a non-sharded dataset (single-machine training). This API is deprecated and its use is discouraged.
 

Static Public Member Functions

static DMatrix * Load (const std::string &uri, bool silent, bool load_row_split, const std::string &file_format="auto", const size_t page_size=kPageSize)
 Load DMatrix from URI.
 
static DMatrix * Create (std::unique_ptr< DataSource > &&source, const std::string &cache_prefix="")
 Create a new DMatrix by wrapping a row iterator and meta info.
 
static DMatrix * Create (dmlc::Parser< uint32_t > *parser, const std::string &cache_prefix="", const size_t page_size=kPageSize)
 Create a DMatrix by loading data from a parser. The parser can be deleted after the DMatrix is created.
 

Static Public Attributes

static const size_t kPageSize = 32UL << 20UL
 Page size: 32 MB.
 

Detailed Description

Internal data structure used by XGBoost during training. There are two ways to create a customized DMatrix that reads a user-defined format.

Constructor & Destructor Documentation

◆ DMatrix()

xgboost::DMatrix::DMatrix ( )
default

Default constructor.

◆ ~DMatrix()

virtual xgboost::DMatrix::~DMatrix ( )
virtual default

Virtual destructor.

Member Function Documentation

◆ Create() [1/2]

static DMatrix* xgboost::DMatrix::Create ( std::unique_ptr< DataSource > &&  source,
const std::string &  cache_prefix = "" 
)
static

Create a new DMatrix by wrapping a row iterator and meta info.

Parameters
source: The source iterator of the data. Create takes ownership of the source.
cache_prefix: The path prefix of the temporary cache file of the DMatrix when used in external memory mode. This can be left as the empty string (the default) for common cases, in which case in-memory mode is used.
Returns
The created DMatrix.
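
The ownership transfer described for `source` can be illustrated with a minimal, self-contained sketch. `SketchSource` and `SketchMatrix` are hypothetical stand-ins, not the real `DataSource` and `DMatrix` from data.h; only the shape of the signature (a `std::unique_ptr` taken by rvalue reference, a raw pointer returned, an empty `cache_prefix` by default) follows the documentation above. Treating an empty prefix as in-memory mode is taken from the parameter description and is otherwise an assumption.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a data source; the real DataSource lives in data.h.
struct SketchSource {
  int num_rows = 0;
};

// Hypothetical stand-in for DMatrix, reduced to what the sketch needs.
class SketchMatrix {
 public:
  // Same shape as the documented overload: the source is moved into the new object.
  static SketchMatrix* Create(std::unique_ptr<SketchSource>&& source,
                              const std::string& cache_prefix = "") {
    return new SketchMatrix(std::move(source), cache_prefix);
  }
  int NumRows() const { return source_->num_rows; }
  // Assumption: an empty cache prefix selects in-memory mode.
  bool InMemory() const { return cache_prefix_.empty(); }

 private:
  SketchMatrix(std::unique_ptr<SketchSource> source, const std::string& cache_prefix)
      : source_(std::move(source)), cache_prefix_(cache_prefix) {}
  std::unique_ptr<SketchSource> source_;
  std::string cache_prefix_;
};

// Usage: hand the source over, then manage the returned raw pointer.
inline void ExampleOwnershipTransfer() {
  std::unique_ptr<SketchSource> source(new SketchSource());
  source->num_rows = 3;
  // Create returns a raw pointer, so the caller wraps it to avoid a leak.
  std::unique_ptr<SketchMatrix> matrix(SketchMatrix::Create(std::move(source)));
  assert(source == nullptr);   // the caller no longer owns the source
  assert(matrix->NumRows() == 3);
  assert(matrix->InMemory());  // default cache_prefix is empty
}
```

Because the factory returns a raw pointer, wrapping the result in a smart pointer at the call site keeps the lifetime explicit.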

◆ Create() [2/2]

static DMatrix* xgboost::DMatrix::Create ( dmlc::Parser< uint32_t > *  parser,
const std::string &  cache_prefix = "",
const size_t  page_size = kPageSize 
)
static

Create a DMatrix by loading data from a parser. The parser can be deleted after the DMatrix is created.

Parameters
parser: The input data parser.
cache_prefix: The path prefix of the temporary cache file of the DMatrix when used in external memory mode. This can be left as the empty string (the default) for common cases, in which case in-memory mode is used.
page_size: Page size for external memory.
See also
dmlc::Parser
Note
dmlc-core provides an efficient distributed data parser for the libsvm format. Users can create and register a customized parser to load their own format using DMLC_REGISTER_DATA_PARSER. See "dmlc-core/include/dmlc/data.h" for details.
Returns
The created DMatrix.

◆ GetColDensity()

virtual float xgboost::DMatrix::GetColDensity ( size_t  cidx)
pure virtual

Get column density.
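
The reference does not define "density". A common reading, assumed here and not confirmed by this page, is the fraction of rows that hold a value in the given column. The sketch below computes that quantity on a toy sparse layout; `SketchRow` and `SketchColDensity` are hypothetical names, not part of xgboost.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sparse row: the column indices that hold a value in that row.
using SketchRow = std::vector<std::size_t>;

// Assumed definition: density of column cidx = rows with a value in cidx / total rows.
inline float SketchColDensity(const std::vector<SketchRow>& rows, std::size_t cidx) {
  assert(!rows.empty());  // density is undefined for an empty dataset
  std::size_t present = 0;
  for (const auto& row : rows) {
    for (std::size_t c : row) {
      if (c == cidx) {
        ++present;  // count each row at most once
        break;
      }
    }
  }
  return static_cast<float>(present) / static_cast<float>(rows.size());
}
```

Under this assumption a fully populated column has density 1 and a column that never appears has density 0.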

◆ GetColumnBatches()

virtual BatchSet xgboost::DMatrix::GetColumnBatches ( )
pure virtual

◆ GetRowBatches()

virtual BatchSet xgboost::DMatrix::GetRowBatches ( )
pure virtual

Gets row batches. Use a range-based for loop over the BatchSet to access individual batches.
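
The iteration pattern can be sketched without the real types. `SketchBatch` and `SketchBatchSet` below are hypothetical stand-ins for the batch and BatchSet types declared in data.h; the only property relied on is that the set exposes `begin()` and `end()`, which is what a range-based for loop requires.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical batch: a chunk of entries. The real batch type is declared in data.h.
struct SketchBatch {
  std::vector<float> values;
  std::size_t Size() const { return values.size(); }
};

// Hypothetical batch set: any type with begin()/end() works in a range-based for loop.
class SketchBatchSet {
 public:
  explicit SketchBatchSet(std::vector<SketchBatch> batches)
      : batches_(std::move(batches)) {}
  std::vector<SketchBatch>::const_iterator begin() const { return batches_.begin(); }
  std::vector<SketchBatch>::const_iterator end() const { return batches_.end(); }

 private:
  std::vector<SketchBatch> batches_;
};

// Visit each batch in turn and total the number of entries seen.
inline std::size_t CountEntries(const SketchBatchSet& batch_set) {
  std::size_t total = 0;
  for (const auto& batch : batch_set) {
    total += batch.Size();
  }
  return total;
}

// Usage: two batches holding 2 and 3 entries give 5 entries in total.
inline void ExampleBatchLoop() {
  std::vector<SketchBatch> batches;
  batches.push_back(SketchBatch{{1.0f, 2.0f}});
  batches.push_back(SketchBatch{{3.0f, 4.0f, 5.0f}});
  assert(CountEntries(SketchBatchSet(batches)) == 5);
}
```

With the real class, the loop body would presumably read `for (const auto& batch : dmat->GetRowBatches())`, where `dmat` is a hypothetical `DMatrix*`.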

◆ GetSortedColumnBatches()

virtual BatchSet xgboost::DMatrix::GetSortedColumnBatches ( )
pure virtual

◆ Info() [1/2]

virtual MetaInfo& xgboost::DMatrix::Info ( )
pure virtual

Meta information of the dataset.

◆ Info() [2/2]

virtual const MetaInfo& xgboost::DMatrix::Info ( ) const
pure virtual

Meta information of the dataset (read-only access).

◆ Load()

static DMatrix* xgboost::DMatrix::Load ( const std::string &  uri,
bool  silent,
bool  load_row_split,
const std::string &  file_format = "auto",
const size_t  page_size = kPageSize 
)
static

Load DMatrix from URI.

Parameters
uri: The URI of the input.
silent: Whether to suppress informational output during loading.
load_row_split: Flag to read in only part of the rows, divided among the workers in distributed mode.
file_format: The format type of the file, used for dmlc::Parser::Create. The default, "auto", is also able to load a local binary file.
page_size: Page size for external memory.
Returns
The created DMatrix.
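
To make the idea behind `load_row_split` concrete, the sketch below divides a row count among workers. The scheme shown (contiguous, near-equal ranges indexed by worker rank) is an assumption chosen for illustration; this page does not say how the loader actually partitions the input, and `SketchRowRange` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Assumed scheme, for illustration only: contiguous, near-equal row ranges per worker.
// Returns the half-open range [begin, end) of rows assigned to `rank`.
inline std::pair<std::size_t, std::size_t> SketchRowRange(std::size_t num_rows,
                                                          std::size_t rank,
                                                          std::size_t world_size) {
  assert(world_size > 0 && rank < world_size);
  const std::size_t base = num_rows / world_size;
  const std::size_t extra = num_rows % world_size;  // first `extra` workers get one more row
  const std::size_t begin = rank * base + (rank < extra ? rank : extra);
  const std::size_t end = begin + base + (rank < extra ? 1 : 0);
  return std::make_pair(begin, end);
}
```

For 10 rows and 3 workers this yields the ranges [0, 4), [4, 7) and [7, 10), so every row is read by exactly one worker.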

◆ SaveToLocalFile()

virtual void xgboost::DMatrix::SaveToLocalFile ( const std::string &  fname)
virtual

Save the DMatrix to a local file. The saved file only works for a non-sharded dataset (single-machine training). This API is deprecated and its use is discouraged.

Parameters
fname: The name of the file to save to.

◆ SingleColBlock()

virtual bool xgboost::DMatrix::SingleColBlock ( ) const
pure virtual
Returns
Whether the column data is stored in a single column block.

Member Data Documentation

◆ kPageSize

const size_t xgboost::DMatrix::kPageSize = 32UL << 20UL
static

Page size: 32 MB (32UL << 20UL = 33,554,432 bytes).
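
The initializer can be checked by hand: shifting left by 20 multiplies by 2^20 (1,048,576), so `32UL << 20UL` is 32 MiB expressed in bytes. The standalone sketch below verifies that arithmetic at compile time; `kPageSizeSketch` and `BytesToMiB` are illustrative names, not xgboost symbols.

```cpp
#include <cstddef>

// Illustrative copy of the documented initializer; not the xgboost symbol itself.
constexpr std::size_t kPageSizeSketch = 32UL << 20UL;

// Convert a byte count to whole mebibytes (1 MiB = 2^20 bytes).
constexpr std::size_t BytesToMiB(std::size_t bytes) { return bytes >> 20; }

static_assert(kPageSizeSketch == 33554432UL, "32 << 20 is 33,554,432 bytes");
static_assert(BytesToMiB(kPageSizeSketch) == 32, "which is 32 MiB");
```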


The documentation for this class was generated from the following file: