L1 regularization term on weights. Increasing this value makes the model more conservative. [default=0]
Booster to use, options: {'gbtree', 'gblinear', 'dart'}
subsample ratio of columns for each split, in each level. [default=1] range: (0,1]
subsample ratio of columns when constructing each tree. [default=1] range: (0,1]
Step size shrinkage used in each update to prevent overfitting. After each boosting step, we can directly get the weights of new features, and eta shrinks these feature weights to make the boosting process more conservative. [default=0.3] range: [0,1]
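As a rough illustration of the shrinkage described above, the sketch below scales each new tree's output by eta before adding it to the running prediction. The names ShrinkageSketch and boost are hypothetical and the per-tree outputs are made-up numbers; this is a simplified picture, not XGBoost's actual update code.

```scala
object ShrinkageSketch {
  // Adds each tree's raw output to the running prediction, scaled by eta.
  // treeOutputs is a hypothetical list of per-tree outputs for a single row.
  def boost(base: Double, treeOutputs: Seq[Double], eta: Double): Double = {
    require(eta >= 0.0 && eta <= 1.0, "eta must lie in [0, 1]")
    treeOutputs.foldLeft(base)((pred, out) => pred + eta * out)
  }

  def main(args: Array[String]): Unit = {
    val outputs = Seq(1.0, 0.5, 0.25)
    println(boost(0.0, outputs, eta = 1.0)) // no shrinkage
    println(boost(0.0, outputs, eta = 0.3)) // the documented default
  }
}
```

With a smaller eta each tree moves the prediction less, so more boosting rounds are usually needed to reach the same fit.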
Evaluates the XGBoostModel with an RDD-wrapped dataset.
NOTE: you must specify either eval or iter; when you specify both, this method uses the model's default eval metric.
the dataset used for evaluation
the name of evaluation
the customized evaluation function, null by default to use the default metric of model
the current iteration; pass -1 to leave it unset and use the customized evaluation function
group data specifying the size of each group for ranking tasks. The top level corresponds to the partition id; the second level holds the group sizes.
the average metric over all partitions
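The nested group data and the per-partition averaging can be pictured with the following sketch. It assumes the group data is shaped like a Seq[Seq[Int]] indexed by partition id, that the group sizes of a partition add up to that partition's row count, and that the average is an unweighted mean; EvalSketch and its methods are hypothetical helpers, not part of the library.

```scala
object EvalSketch {
  // Checks that the group sizes listed for each partition (outer index =
  // partition id) add up to the number of rows in that partition.
  def groupsMatchRows(groupData: Seq[Seq[Int]], rowsPerPartition: Seq[Int]): Boolean =
    groupData.length == rowsPerPartition.length &&
      groupData.zip(rowsPerPartition).forall { case (sizes, rows) => sizes.sum == rows }

  // Unweighted mean of per-partition metric values. Whether the real
  // method weights partitions by size is not stated; this is an assumption.
  def averageMetric(perPartition: Seq[Double]): Double = {
    require(perPartition.nonEmpty, "need at least one partition")
    perPartition.sum / perPartition.length
  }

  def main(args: Array[String]): Unit = {
    // Partition 0 holds two query groups (3 and 2 rows); partition 1 holds one.
    val groupData = Seq(Seq(3, 2), Seq(4))
    println(groupsMatchRows(groupData, rowsPerPartition = Seq(5, 4)))
    println(averageMetric(Seq(0.2, 0.4)))
  }
}
```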
Explains all params of this instance. See explainParam().
Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the value, the more conservative the algorithm will be. [default=0] range: [0, Double.MaxValue]
growth policy for fast histogram algorithm
L2 regularization term on weights. Increasing this value makes the model more conservative. [default=1]
Parameter of the linear booster. L2 regularization term on bias. [default=0] (There is no L1 regularization on bias because it is not important.)
maximum number of bins in histogram
Maximum delta step we allow each tree's weight estimation to be. If the value is set to 0, there is no constraint. If it is set to a positive value, it can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced. Setting it to a value of 1-10 might help control the update. [default=0] range: [0, Double.MaxValue]
Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit. [default=6] range: [1, Int.MaxValue]
Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node whose sum of instance weight is less than min_child_weight, the building process gives up further partitioning. In linear regression mode, this simply corresponds to the minimum number of instances needed in each node. The larger the value, the more conservative the algorithm will be. [default=1] range: [0, Double.MaxValue]
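The sketch below shows one way the minimum loss reduction, the L2 regularization term on weights, and the minimum child weight could interact when a candidate split is checked. It uses the commonly cited second-order gain formula, with G and H the sums of gradients and hessians in a node; SplitSketch, its parameter names, and the strict comparison against gamma are illustrative assumptions, not the library's code.

```scala
object SplitSketch {
  // Structure score of a node: G^2 / (H + lambda).
  private def score(g: Double, h: Double, lambda: Double): Double =
    g * g / (h + lambda)

  // Loss reduction of a candidate split, before it is compared with gamma.
  def lossReduction(gL: Double, hL: Double, gR: Double, hR: Double,
                    lambda: Double): Double =
    0.5 * (score(gL, hL, lambda) + score(gR, hR, lambda) -
      score(gL + gR, hL + hR, lambda))

  // Accept the split only if both children carry enough hessian weight
  // and the loss reduction exceeds gamma.
  def acceptSplit(gL: Double, hL: Double, gR: Double, hR: Double,
                  lambda: Double, gamma: Double, minChildWeight: Double): Boolean =
    hL >= minChildWeight && hR >= minChildWeight &&
      lossReduction(gL, hL, gR, hR, lambda) > gamma

  def main(args: Array[String]): Unit = {
    // Made-up gradient and hessian sums for the left and right child.
    println(acceptSplit(-4.0, 2.0, 4.0, 2.0, lambda = 1.0, gamma = 0.0, minChildWeight = 1.0))
    println(acceptSplit(-4.0, 2.0, 4.0, 2.0, lambda = 1.0, gamma = 6.0, minChildWeight = 1.0))
  }
}
```

Raising gamma, lambda, or the minimum child weight each rejects more candidate splits, which is the sense in which larger values are more conservative.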
Parameter of Dart booster. Type of normalization algorithm, options: {'tree', 'forest'}. [default="tree"]
Predict result with the given test set (represented as RDD)
test set represented as RDD
whether to use external cache for the test set
whether to output raw untransformed margin value
Predict result with the given test set (represented as RDD)
test set represented as RDD
the specified value to represent the missing value
Predict leaf instances with the given test set (represented as RDD)
test set represented as RDD
Parameter of Dart booster. Dropout rate. [default=0.0] range: [0.0, 1.0]
Parameter for Dart booster. Type of sampling algorithm. "uniform": dropped trees are selected uniformly. "weighted": dropped trees are selected in proportion to weight. [default="uniform"]
Saves the model to an HDFS-compatible file system.
The model path, given as a Hadoop path.
Control the balance of positive and negative weights, useful for unbalanced classes. A typical value to consider: sum(negative cases) / sum(positive cases). [default=1]
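The typical value mentioned above can be computed from the training labels. The sketch assumes binary labels encoded as 1.0 (positive) and 0.0 (negative); ScalePosWeightSketch and suggested are hypothetical names.

```scala
object ScalePosWeightSketch {
  // Ratio of negative to positive cases, assuming labels are 0.0 or 1.0.
  def suggested(labels: Seq[Double]): Double = {
    val positives = labels.count(_ == 1.0)
    val negatives = labels.count(_ == 0.0)
    require(positives > 0, "need at least one positive case")
    negatives.toDouble / positives
  }

  def main(args: Array[String]): Unit = {
    // Nine negatives and one positive: a heavily imbalanced toy label set.
    val labels = Seq.fill(9)(0.0) :+ 1.0
    println(suggested(labels))
  }
}
```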
This is only used for the approximate greedy algorithm. It roughly translates into O(1 / sketch_eps) bins. Compared to directly selecting the number of bins, this comes with a theoretical guarantee on sketch accuracy. [default=0.03] range: (0, 1)
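To get a feel for the relationship between sketch_eps and the number of bins, the sketch below evaluates 1 / sketch_eps and rounds up. It ignores the constant factor hidden in the O(...) notation, so the result is an order-of-magnitude guide only; SketchEpsSketch and approxBins are hypothetical names.

```scala
object SketchEpsSketch {
  // Rough bin count implied by sketch_eps, ignoring constant factors.
  def approxBins(sketchEps: Double): Int = {
    require(sketchEps > 0.0 && sketchEps < 1.0, "sketch_eps must lie in (0, 1)")
    math.ceil(1.0 / sketchEps).toInt
  }

  def main(args: Array[String]): Unit = {
    println(approxBins(0.03)) // the documented default
    println(approxBins(0.25))
  }
}
```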
Parameter of Dart booster. Probability of skipping dropout. If a dropout is skipped, new trees are added in the same manner as gbtree. [default=0.0] range: [0.0, 1.0]
Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly samples half of the data instances to grow trees, which helps prevent overfitting. [default=1] range: (0,1]
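Row subsampling can be pictured as keeping each training row with probability equal to the subsample ratio. The sketch below assumes independent per-row draws with a fixed seed, which is an illustrative simplification and may differ from how XGBoost samples internally; SubsampleSketch and sampleRows are hypothetical names.

```scala
import scala.util.Random

object SubsampleSketch {
  // Returns the indices of the rows kept for growing one tree.
  // Each row is kept independently with probability `subsample`.
  def sampleRows(numRows: Int, subsample: Double, seed: Long): Seq[Int] = {
    require(subsample > 0.0 && subsample <= 1.0, "subsample must lie in (0, 1]")
    val rng = new Random(seed)
    (0 until numRows).filter(_ => rng.nextDouble() < subsample)
  }

  def main(args: Array[String]): Unit = {
    println(sampleRows(1000, 0.5, seed = 7L).length) // roughly half the rows
    println(sampleRows(1000, 1.0, seed = 7L).length) // every row is kept
  }
}
```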
Returns a summary (e.g. train/test objective history) of the model on the training set. An exception is thrown if no summary is available.
Produces the prediction results and appends them as an additional column to the original dataset. NOTE: the prediction results are kept in the original format of xgboost.
the original dataframe with an additional column containing prediction results
Appends the leaf index of each row as an additional column to the original dataset.
the original dataframe with an additional column containing prediction results
The tree construction algorithm used in XGBoost. options: {'auto', 'exact', 'approx'} [default='auto']
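A handful of the parameters documented here can be collected in a plain map and checked against their stated ranges before training. The sketch assumes parameters are supplied as a Map[String, Any] keyed by XGBoost's native parameter names (eta, max_depth, subsample, tree_method, and so on); ParamMapSketch and violations are hypothetical helpers, and only a few of the documented ranges are checked.

```scala
object ParamMapSketch {
  // Example settings using the documented defaults.
  val params: Map[String, Any] = Map(
    "booster" -> "gbtree",
    "eta" -> 0.3,
    "max_depth" -> 6,
    "subsample" -> 1.0,
    "tree_method" -> "auto"
  )

  // Returns one message per parameter that falls outside its documented range.
  def violations(p: Map[String, Any]): Seq[String] = {
    // Reads a numeric parameter as a Double, if present.
    def num(key: String): Option[Double] = p.get(key).collect {
      case d: Double => d
      case i: Int    => i.toDouble
    }
    val treeMethods = Set("auto", "exact", "approx")
    Seq(
      num("eta").filter(v => v < 0.0 || v > 1.0).map(v => s"eta=$v outside [0, 1]"),
      num("max_depth").filter(_ < 1.0).map(v => s"max_depth=$v below 1"),
      num("subsample").filter(v => v <= 0.0 || v > 1.0).map(v => s"subsample=$v outside (0, 1]"),
      p.get("tree_method").collect {
        case s: String if !treeMethods(s) => s"tree_method=$s is not a listed option"
      }
    ).flatten
  }

  def main(args: Array[String]): Unit = {
    println(violations(params))
    println(violations(params + ("eta" -> 1.5)))
  }
}
```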
the base class of XGBoostClassificationModel and XGBoostRegressionModel