
ml.dmlc.xgboost4j.scala.spark

XGBoostEstimator


class XGBoostEstimator extends Predictor[Vector, XGBoostEstimator, XGBoostModel] with LearningTaskParams with GeneralParams with BoosterParams with MLWritable

XGBoost Estimator that produces an XGBoost model.

Linear Supertypes
MLWritable, BoosterParams, GeneralParams, LearningTaskParams, Predictor[Vector, XGBoostEstimator, XGBoostModel], PredictorParams, HasPredictionCol, HasFeaturesCol, HasLabelCol, Estimator[XGBoostModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new XGBoostEstimator(uid: String)

  2. new XGBoostEstimator(xgboostParams: Map[String, Any])

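The second constructor takes a plain Map[String, Any]. A minimal sketch of building such a map follows. The key names used here (eta, max_depth, objective) are assumed to follow native XGBoost parameter naming and are not confirmed by this page, so check them against the library version in use. The constructor call is left as a comment because it needs xgboost4j-spark on the classpath.

```scala
object ParamMapSketch {
  // Hypothetical values; the key names are assumptions, not taken from this page.
  val xgboostParams: Map[String, Any] = Map(
    "eta" -> 0.1,
    "max_depth" -> 6,
    "objective" -> "binary:logistic"
  )

  def main(args: Array[String]): Unit = {
    // With xgboost4j-spark available, the map would be passed as:
    //   val estimator = new XGBoostEstimator(xgboostParams)
    xgboostParams.foreach { case (key, value) => println(s"$key = $value") }
  }
}
```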

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. val alpha: DoubleParam

    L1 regularization term on weights; increasing this value makes the model more conservative. [default=0]

    Definition Classes
    BoosterParams
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. val baseMarginCol: Param[String]

    Initial prediction (aka base margin) column name.

    Definition Classes
    LearningTaskParams
  8. val baseScore: DoubleParam

    The initial prediction score of all instances (global bias). [default=0.5]

    Definition Classes
    LearningTaskParams
  9. val boosterType: Param[String]

    Booster to use, options: {'gbtree', 'gblinear', 'dart'}

    Definition Classes
    BoosterParams
  10. val checkpointInterval: IntParam

    Sets the checkpoint interval (>= 1) or disables checkpointing (-1). E.g. 10 means that the trained model is checkpointed every 10 iterations. Note: checkpoint_path must also be set if the checkpoint interval is greater than 0.

    Definition Classes
    GeneralParams
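The constraint between the checkpoint interval and the checkpoint path can be checked before training. The helper below is a hypothetical sketch of such a check and is not part of the library, which may validate these settings differently.

```scala
object CheckpointSettings {
  // Returns an error message if the two settings are inconsistent, None otherwise.
  // Hypothetical helper: it only encodes the rules stated in the parameter description.
  def validate(checkpointInterval: Int, checkpointPath: String): Option[String] =
    if (checkpointInterval != -1 && checkpointInterval < 1)
      Some("checkpointInterval must be -1 (disabled) or >= 1")
    else if (checkpointInterval > 0 && checkpointPath.isEmpty)
      Some("checkpointPath must be set when checkpointInterval > 0")
    else
      None
}
```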
  11. val checkpointPath: Param[String]

    The HDFS folder used to load and save checkpoint boosters. default: empty string

    Definition Classes
    GeneralParams
  12. final def clear(param: Param[_]): XGBoostEstimator.this.type

    Definition Classes
    Params
  13. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  14. val colSampleByLevel: DoubleParam

    subsample ratio of columns for each split, in each level. [default=1] range: (0,1]

    Definition Classes
    BoosterParams
  15. val colSampleByTree: DoubleParam

    subsample ratio of columns when constructing each tree. [default=1] range: (0,1]

    Definition Classes
    BoosterParams
  16. def copy(extra: ParamMap): XGBoostEstimator

    Definition Classes
    XGBoostEstimator → Predictor → Estimator → PipelineStage → Params
  17. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  18. val customEval: CustomEvalParam

    customized evaluation function provided by user. default: null

    Definition Classes
    GeneralParams
  19. val customObj: CustomObjParam

    customized objective function provided by user. default: null

    Definition Classes
    GeneralParams
  20. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  21. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  22. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  23. val eta: DoubleParam

    Step size shrinkage used in updates to prevent overfitting. After each boosting step, we can directly get the weights of new features, and eta shrinks the feature weights to make the boosting process more conservative. [default=0.3] range: [0,1]

    Definition Classes
    BoosterParams
  24. val evalMetric: Param[String]

    Evaluation metrics for validation data. A default metric is assigned according to the objective (rmse for regression, error for classification, mean average precision for ranking). options: rmse, mae, logloss, error, merror, mlogloss, auc, aucpr, ndcg, map, gamma-deviance

    Definition Classes
    LearningTaskParams
  25. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  26. def explainParams(): String

    Explains all params of this instance. See explainParam().

    Definition Classes
    BoosterParams → Params
  27. def extractLabeledPoints(dataset: Dataset[_]): RDD[org.apache.spark.ml.feature.LabeledPoint]

    Attributes
    protected
    Definition Classes
    Predictor
  28. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  29. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  30. final val featuresCol: Param[String]

    Definition Classes
    HasFeaturesCol
  31. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  32. def fit(dataset: Dataset[_]): XGBoostModel

    Definition Classes
    Predictor → Estimator
  33. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[XGBoostModel]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  34. def fit(dataset: Dataset[_], paramMap: ParamMap): XGBoostModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  35. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): XGBoostModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  36. val gamma: DoubleParam

    Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the value, the more conservative the algorithm will be. [default=0] range: [0, Double.MaxValue]

    Definition Classes
    BoosterParams
  37. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  38. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  39. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  40. final def getFeaturesCol: String

    Definition Classes
    HasFeaturesCol
  41. final def getLabelCol: String

    Definition Classes
    HasLabelCol
  42. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  43. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  44. final def getPredictionCol: String

    Definition Classes
    HasPredictionCol
  45. val groupData: GroupDataParam

    Group data specifying the size of each group for ranking tasks. It is nested so that it corresponds to the partitions of the training data.

    Definition Classes
    LearningTaskParams
  46. val growthPolicty: Param[String]

    growth policy for fast histogram algorithm

    Definition Classes
    BoosterParams
  47. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  48. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  49. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  50. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  51. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  52. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  53. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  54. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  55. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  56. final val labelCol: Param[String]

    Definition Classes
    HasLabelCol
  57. val lambda: DoubleParam

    L2 regularization term on weights; increasing this value makes the model more conservative. [default=1]

    Definition Classes
    BoosterParams
  58. val lambdaBias: DoubleParam

    Parameter of linear booster. L2 regularization term on bias. [default=0] (There is no L1 regularization on bias because it is not important.)

    Definition Classes
    BoosterParams
  59. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  60. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  61. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  62. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  63. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  64. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  65. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  66. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  67. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  68. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  69. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  70. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  71. val maxBins: IntParam

    maximum number of bins in histogram

    Definition Classes
    BoosterParams
  72. val maxDeltaStep: DoubleParam

    Maximum delta step we allow each tree's weight estimation to be. If the value is set to 0, there is no constraint. If it is set to a positive value, it can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced. Setting it to a value of 1-10 might help control the update. [default=0] range: [0, Double.MaxValue]

    Definition Classes
    BoosterParams
  73. val maxDepth: IntParam

    Maximum depth of a tree; increasing this value makes the model more complex and more likely to overfit. [default=6] range: [1, Int.MaxValue]

    Definition Classes
    BoosterParams
  74. val minChildWeight: DoubleParam

    Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with a sum of instance weight less than min_child_weight, the building process gives up further partitioning. In linear regression mode, this simply corresponds to the minimum number of instances needed in each node. The larger the value, the more conservative the algorithm will be. [default=1] range: [0, Double.MaxValue]

    Definition Classes
    BoosterParams
  75. val missing: FloatParam

    the value treated as missing. default: Float.NaN

    Definition Classes
    GeneralParams
  76. val nWorkers: IntParam

    number of workers used to train xgboost model. default: 1

    Definition Classes
    GeneralParams
  77. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  78. val normalizeType: Param[String]

    Parameter of Dart booster. type of normalization algorithm, options: {'tree', 'forest'}. [default="tree"]

    Definition Classes
    BoosterParams
  79. final def notify(): Unit

    Definition Classes
    AnyRef
  80. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  81. val numClasses: IntParam

    number of tasks to learn

    Definition Classes
    LearningTaskParams
  82. val numEarlyStoppingRounds: IntParam

    If non-zero, the training will be stopped after a specified number of consecutive increases in any evaluation metric.

    Definition Classes
    LearningTaskParams
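The stopping rule described for numEarlyStoppingRounds can be illustrated with a small standalone function. This is only a sketch of the rule as worded above (a run of consecutive increases in one metric). The function name and the strict "greater than the previous round" comparison are assumptions, and the library's own bookkeeping may differ.

```scala
object EarlyStoppingSketch {
  // metric(i) is the evaluation metric after round i.
  // Returns the round at which training would stop, or None if it runs to the end.
  def stopRound(metric: IndexedSeq[Double], numEarlyStoppingRounds: Int): Option[Int] = {
    @annotation.tailrec
    def loop(i: Int, run: Int): Option[Int] =
      if (i >= metric.length) None
      else {
        // Extend the run on an increase, reset it otherwise.
        val next = if (metric(i) > metric(i - 1)) run + 1 else 0
        if (next >= numEarlyStoppingRounds) Some(i) else loop(i + 1, next)
      }

    // A value of zero (or less) disables early stopping.
    if (numEarlyStoppingRounds <= 0) None else loop(1, 0)
  }
}
```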
  83. val numThreadPerTask: IntParam

    Number of threads used per worker. default: 1

    Definition Classes
    GeneralParams
  84. val objective: Param[String]

    Specify the learning task and the corresponding learning objective. options: reg:linear, reg:logistic, binary:logistic, binary:logitraw, count:poisson, multi:softmax, multi:softprob, rank:pairwise, reg:gamma. default: reg:linear

    Definition Classes
    LearningTaskParams
  85. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  86. final val predictionCol: Param[String]

    Definition Classes
    HasPredictionCol
  87. val rateDrop: DoubleParam

    Parameter of Dart booster. dropout rate. [default=0.0] range: [0.0, 1.0]

    Definition Classes
    BoosterParams
  88. val round: IntParam

    The number of rounds for boosting

    Definition Classes
    GeneralParams
  89. val sampleType: Param[String]

    Parameter for Dart booster. Type of sampling algorithm. "uniform": dropped trees are selected uniformly. "weighted": dropped trees are selected in proportion to weight. [default="uniform"]

    Definition Classes
    BoosterParams
  90. def save(path: String): Unit

    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  91. val scalePosWeight: DoubleParam

    Control the balance of positive and negative weights, useful for unbalanced classes. A typical value to consider: sum(negative cases) / sum(positive cases). [default=1]

    Definition Classes
    BoosterParams
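The typical value suggested for scalePosWeight, sum(negative cases) / sum(positive cases), can be computed from the label column before training. A minimal sketch, assuming binary labels encoded as 1.0 (positive) and 0.0 (negative); the helper name is hypothetical.

```scala
object ScalePosWeightSketch {
  // Ratio of negative to positive cases; labels are assumed to be exactly 0.0 or 1.0.
  def scalePosWeight(labels: Seq[Double]): Double = {
    val positives = labels.count(_ == 1.0)
    val negatives = labels.count(_ == 0.0)
    require(positives > 0, "need at least one positive case")
    negatives.toDouble / positives
  }
}
```

For one positive and three negative cases the ratio is 3.0, so positive instances would be weighted three times as heavily.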
  92. val seed: LongParam

    Random seed for the C++ part of XGBoost and train/test splitting.

    Definition Classes
    GeneralParams
  93. final def set(paramPair: ParamPair[_]): XGBoostEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  94. final def set(param: String, value: Any): XGBoostEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  95. final def set[T](param: Param[T], value: T): XGBoostEstimator.this.type

    Definition Classes
    Params
  96. final def setDefault(paramPairs: ParamPair[_]*): XGBoostEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  97. final def setDefault[T](param: Param[T], value: T): XGBoostEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  98. def setFeaturesCol(value: String): XGBoostEstimator

    Definition Classes
    Predictor
  99. def setLabelCol(value: String): XGBoostEstimator

    Definition Classes
    Predictor
  100. def setPredictionCol(value: String): XGBoostEstimator

    Definition Classes
    Predictor
  101. val silent: IntParam

    0 means printing running messages, 1 means silent mode. default: 0

    Definition Classes
    GeneralParams
  102. val sketchEps: DoubleParam

    This is only used for the approximate greedy algorithm. It roughly translates into O(1 / sketch_eps) bins. Compared to directly selecting the number of bins, this comes with a theoretical guarantee on sketch accuracy. [default=0.03] range: (0, 1)

    Definition Classes
    BoosterParams
  103. val skipDrop: DoubleParam

    Parameter of Dart booster. Probability of skipping dropout. If a dropout is skipped, new trees are added in the same manner as gbtree. [default=0.0] range: [0.0, 1.0]

    Definition Classes
    BoosterParams
  104. val subSample: DoubleParam

    Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly collects half of the data instances to grow trees, which helps prevent overfitting. [default=1] range: (0,1]

    Definition Classes
    BoosterParams
  105. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  106. val timeoutRequestWorkers: LongParam

    the maximum time to wait for the job requesting new workers. default: 30 minutes

    Definition Classes
    GeneralParams
  107. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  108. val trackerConf: TrackerConfParam

    Rabit tracker configurations. The parameter must be provided as an instance of the TrackerConf class, which has the following definition:

    case class TrackerConf(workerConnectionTimeout: Duration, trainingTimeout: Duration, trackerImpl: String)

    See below for detailed explanations.

    • trackerImpl: Select the implementation of Rabit tracker. default: "python"

    Choice between "python" or "scala". The former utilizes the Java wrapper of the Python Rabit tracker (in dmlc_core), and does not support timeout settings. The "scala" version removes Python components, and fully supports timeout settings.

    • workerConnectionTimeout: the maximum wait time for all workers to connect to the tracker. default: 0 millisecond (no timeout)

    The timeout value should take the time of data loading and pre-processing into account, due to the lazy execution of Spark's operations. Alternatively, you may force Spark to perform data transformation before calling XGBoost.train(), so that this timeout truly reflects the connection delay. Set a reasonable timeout value to prevent model training/testing from hanging indefinitely, possibly due to network issues. Note that a zero timeout value means waiting indefinitely (equivalent to Duration.Inf). Ignored if the tracker implementation is "python".

    Definition Classes
    GeneralParams
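A tracker configuration with timeouts might be built as sketched below. The case class is a local stand-in that copies the definition quoted above so the snippet compiles on its own; real code would use the TrackerConf class shipped with the library. The timeout values are hypothetical.

```scala
import scala.concurrent.duration._

object TrackerConfSketch {
  // Local stand-in mirroring the definition quoted in the trackerConf description.
  case class TrackerConf(workerConnectionTimeout: Duration, trainingTimeout: Duration, trackerImpl: String)

  // The "scala" tracker is chosen because, per the description, the "python" tracker
  // does not support timeout settings. Both durations are illustrative values only.
  val conf: TrackerConf = TrackerConf(
    workerConnectionTimeout = 2.minutes,
    trainingTimeout = 30.minutes,
    trackerImpl = "scala"
  )
}
```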
  109. def train(trainingSet: Dataset[_]): XGBoostModel

    Produces an XGBoostModel by fitting the given dataset.

    Definition Classes
    XGBoostEstimator → Predictor
  110. val trainTestRatio: DoubleParam

    Fraction of training points to use for testing.

    Definition Classes
    LearningTaskParams
  111. def transformSchema(schema: StructType): StructType

    Definition Classes
    Predictor → PipelineStage
  112. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  113. val treeMethod: Param[String]

    The tree construction algorithm used in XGBoost. options: {'auto', 'exact', 'approx'} [default='auto']

    Definition Classes
    BoosterParams
  114. val uid: String

    Definition Classes
    XGBoostEstimator → Identifiable
  115. val useExternalMemory: BooleanParam

    whether to use external memory as cache. default: false

    Definition Classes
    GeneralParams
  116. def validateAndTransformSchema(schema: StructType, fitting: Boolean, featuresDataType: DataType): StructType

    Attributes
    protected
    Definition Classes
    PredictorParams
  117. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  118. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  119. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  120. val weightCol: Param[String]

    Instance weights column name.

    Definition Classes
    LearningTaskParams
  121. def write: MLWriter

    Definition Classes
    XGBoostEstimator → MLWritable
