
How To Ask Questions The Smart Way

Eric Steven Raymond

Thyrsus Enterprises

esr@thyrsus.com

Rick Moen



Hangzhou




NLP Visualization: Generating Word Clouds with Python



Regularization on GBDT

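An earlier post briefly covered how XGBoost's implementation differs from an ordinary GBDT implementation. This post tries to summarize the regularization techniques used in GBDT.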

Early Stopping

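Early stopping is a very common technique for preventing overfitting when a machine learning model is trained iteratively. Wikipedia describes it as follows: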

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent.

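Concretely, a portion of the samples is held out as a validation set. While the model is fit to the training set round by round, training stops once the error on the validation set no longer decreases. In other words, early stopping controls the number of iterations (the number of trees).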

The XGBoost Python documentation for the early stopping parameters is very clear. The API looks like this:

# Code snippet from training.py in the xgboost python-package
def train(..., evals=(), early_stopping_rounds=None):
    """Train a booster with given parameters.

    Parameters
    ----------
    early_stopping_rounds: int
        Activates early stopping. Validation error needs to decrease at least
        every <early_stopping_rounds> round(s) to continue training.
    """

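The rule in that docstring can be sketched in plain Python to make the stopping logic concrete. This is only a minimal illustration, not XGBoost's actual implementation: the function `train_with_early_stopping` and the callbacks `fit_one_round` and `validation_error` are hypothetical names introduced here, and the validation-error curve is made-up toy data.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    fit_one_round: Callable[[int], None],
    validation_error: Callable[[], float],
    max_rounds: int,
    early_stopping_rounds: int,
) -> Tuple[int, float]:
    """Run up to max_rounds rounds; stop once the validation error has not
    decreased for early_stopping_rounds consecutive rounds.

    Returns (best_round, best_error), with rounds counted from 0.
    Hypothetical helper, not part of any library.
    """
    best_round, best_error = -1, float("inf")
    for round_index in range(max_rounds):
        fit_one_round(round_index)   # e.g. add one more tree to the ensemble
        error = validation_error()   # evaluate on the held-out validation set
        if error < best_error:
            best_round, best_error = round_index, error
        elif round_index - best_round >= early_stopping_rounds:
            break                    # no improvement for too many rounds
    return best_round, best_error


# Toy usage: replay a fixed validation-error curve instead of a real model.
curve = [0.40, 0.31, 0.27, 0.28, 0.29, 0.30, 0.20]
state = {"round": 0}


def fit_one_round(round_index: int) -> None:
    state["round"] = round_index


def validation_error() -> float:
    return curve[state["round"]]


best_round, best_error = train_with_early_stopping(
    fit_one_round, validation_error,
    max_rounds=len(curve), early_stopping_rounds=2,
)
print(best_round, best_error)
```

With a patience of 2, the loop stops after round 4 and reports round 2 as the best, so it never reaches the later improvement at round 6. A small `early_stopping_rounds` can therefore stop training too early when the validation curve is noisy.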