Tuning XGBoost with XGBoost? I Tune Myself?
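The previous post, "Those Dreaded Hyperparameters" (《深恶痛绝的超参》), covered many practical tuning methods. Today we look at a more entertaining one: tuning an ML model with ML, that is, using a model we know well to tune a model we know well. Dizzy yet? Let's see how XGBoost can tune XGBoost.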
Model-based HP Tuning
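The idea behind model-based tuning is simple: we need something to guide the hyperparameter search toward the best result. Training sets are large, training a model is slow, and the number of possible configurations is usually huge, so why not learn an estimator that scores a configuration directly? Every real training run then adds a data point that hints at where to explore next.
Model-based hyperparameter optimization can be summarized as the following loop:
- Randomly sample n configurations
- Score these configurations with the estimator
- Pick the configuration with the highest predicted score
- Train the model with that configuration
- Add the configuration and the model's actual score to the estimator's training data
- Retrain the estimator
- If the stopping condition has not been met, go back to the first step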
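The loop above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: `true_score` is a made-up toy objective standing in for an expensive training run, the search space (`depth`, `lr`) is invented, and the estimator is a deliberately naive nearest-neighbour lookup, not XGBoost.

```python
import random


def true_score(cfg):
    # Hypothetical stand-in for "train the model and measure its score".
    # Higher is better; the optimum is at depth=6, lr=0.1.
    return -((cfg["depth"] - 6) ** 2) - 100 * (cfg["lr"] - 0.1) ** 2


def sample_config(rng):
    # Random draw from a small, made-up search space.
    return {"depth": rng.randint(1, 12), "lr": rng.uniform(0.01, 0.5)}


class NearestNeighbourEstimator:
    """Toy estimator: predicts the score of the closest configuration seen so far."""

    def __init__(self):
        self.history = []  # list of (config, score) pairs

    def fit(self, history):
        self.history = list(history)

    def predict(self, cfg):
        def dist(other):
            # Squared distance with each dimension scaled to roughly [0, 1].
            return ((cfg["depth"] - other["depth"]) / 11) ** 2 + (
                (cfg["lr"] - other["lr"]) / 0.49
            ) ** 2

        _, nearest_score = min(self.history, key=lambda pair: dist(pair[0]))
        return nearest_score


def tune(n_iter=30, n_candidates=50, seed=0):
    rng = random.Random(seed)
    estimator = NearestNeighbourEstimator()
    first = sample_config(rng)
    history = [(first, true_score(first))]  # one real run to bootstrap
    for _ in range(n_iter):  # stopping condition: fixed budget
        estimator.fit(history)  # (re)train the estimator
        candidates = [sample_config(rng) for _ in range(n_candidates)]
        best = max(candidates, key=estimator.predict)  # highest predicted score
        history.append((best, true_score(best)))  # real run, stored for retraining
    return max(history, key=lambda pair: pair[1])


best_cfg, best_score = tune()
print(best_cfg, best_score)
```

Only one real evaluation happens per iteration, while the estimator screens `n_candidates` configurations cheaply. A nearest-neighbour lookup exploits known good regions and never explores on purpose, which is one reason a stronger estimator is worth using in practice.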
Sampling the parameter space
How do we sample from the parameter space? There is already a library for that, ConfigSpace:
    >>> import ConfigSpace as CS
    >>> import ConfigSpace.hyperparameters as CSH
    >>> cs = CS.ConfigurationSpace(seed=1234)
    >>> a = CSH.UniformIntegerHyperparameter('a', lower=10, upper=100, log=False)
    >>> b = CSH.CategoricalHyperparameter('b', choices=['red', 'green', 'blue'])
    >>> cs.add_hyperparameters([a, b])
    [a, Type: UniformInteger, Range: [10, 100], Default: 55,...]
    >>> cs.sample_configuration()
    Configuration:
      a, Value: 27
      b, Value: 'blue'

"I" tune "myself"
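Early work mostly used Gaussian processes as the estimator. More recent research suggests that tree models are also well suited to the role, and standard Gaussian processes do not handle categorical features natively, so XGBoost is a natural choice of estimator.
Next, we build the hyperparameter optimizer: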
    import numpy as np
    import pandas as pd


    class Optimizer:
        """
        This class optimises an algorithm/model configuration with respect to a given score.
        """

        def __init__(self, algo_score, max_iter, max_intensification, model, cs):
            """
            :param algo_score: the function called to evaluate the algorithm / model score
            :param max_iter: the maximal number of trainings to perform
            :param max_intensification: the maximal number of candidate configurations to sample randomly
            :param model: the class of the internal model used as score estimator
            :param cs: the configuration space to explore
            """
            self.algo_score = algo_score  # scores the model being tuned (higher is better)
            self.max_iter = max_iter  # iteration budget; change the stopping condition as needed
            self.max_intensification = max_intensification  # random candidates per iteration
            self.internal_model = model()  # estimator that scores configurations
            self.trajectory = []  # configurations that improved on the best score
            self.cfgs = []
            self.scores = []  # a list, so it stays aligned with self.cfgs even if a config repeats
            self.best_cfg = None
            self.best_score = None
            self.cs = cs

        def cfg_to_dtf(self, cfgs):
            """
            Convert a list of configs into a pandas DataFrame to ease learning.
            Note: string-valued (categorical) hyperparameters still need to be
            encoded before most estimators can consume this DataFrame.
            """
            return pd.DataFrame([dict(cfg) for cfg in cfgs])

        def optimize(self):
            """
            Optimize the algo/model using the internal score estimator.
            """
            # initial run
            cfg = self.cs.sample_configuration()
            score = self.algo_score(cfg)
            self.cfgs.append(cfg)
            self.scores.append(score)
            self.trajectory.append(cfg)
            self.best_cfg = cfg
            self.best_score = score

            for _ in range(self.max_iter):
                # We need at least two data points to train the estimator
                if len(self.cfgs) > 1:
                    self.internal_model.fit(self.cfg_to_dtf(self.cfgs), np.array(self.scores))
                    # intensification: sample candidates, keep the one the estimator rates highest
                    candidates = [self.cs.sample_configuration()
                                  for _ in range(self.max_intensification)]
                    candidate_scores = self.internal_model.predict(self.cfg_to_dtf(candidates))
                    cfg = candidates[int(np.argmax(candidate_scores))]
                else:
                    cfg = self.cs.sample_configuration()

                self.cfgs.append(cfg)
                score = self.algo_score(cfg)
                self.scores.append(score)
                if score > self.best_score:
                    self.best_cfg = cfg
                    self.best_score = score
                    self.trajectory.append(cfg)

Set algo_score to a function that trains and scores the XGBoost model you want to tune, pass an XGBoost model class as the internal model, and the search runs automatically. What are you waiting for? Go try it!
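One detail worth making explicit: the optimizer keeps the configuration with the highest score (it uses `np.argmax` and `score > self.best_score`), so `algo_score` must return a value where larger is better. If your metric is an error such as RMSE, negate it first. A minimal sketch, where `as_maximised_score` and `train_and_rmse` are hypothetical names introduced here, and `train_and_rmse` is a toy stand-in for actually training XGBoost and measuring validation error:

```python
def as_maximised_score(loss_fn):
    """Wrap a loss function (lower is better) into a score (higher is better)."""

    def algo_score(cfg):
        return -loss_fn(cfg)

    return algo_score


def train_and_rmse(cfg):
    # Hypothetical stand-in for: train the model with `cfg`, return validation RMSE.
    # Here the "error" is just the distance from an arbitrary sweet spot.
    return abs(cfg["max_depth"] - 5) + abs(cfg["learning_rate"] - 0.1)


algo_score = as_maximised_score(train_and_rmse)

good = {"max_depth": 5, "learning_rate": 0.1}
bad = {"max_depth": 12, "learning_rate": 0.5}
print(algo_score(good) > algo_score(bad))
```

Metrics that are already "higher is better", such as accuracy or AUC, can be returned as they are.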