Machine Learning 101: The Bias-Variance Conundrum

Published: 2023/12/15

Determining the performance of our model is one of the most crucial steps in the machine learning process. Understanding the bias-variance trade-off is a significant step towards interpreting the results of our model. Although the trade-off can look intimidating at first, the concepts behind it are simple to grasp and will allow us to create better and more useful models.

The generalization error of any machine learning model can be defined as the sum of three different errors:

  • Irreducible Error: As the name suggests, it can’t be reduced regardless of the algorithm we choose. It is introduced into our model because of the way we frame our problem and may be caused by unknown variables that affect the prediction of our target variable.

  • Bias Error: It occurs when our model makes wrong assumptions about the data, for example assuming a linear relationship where the true one is non-linear.

  • Variance Error: It is caused by the model’s sensitivity to small variations in the training set.

    “When we discuss prediction models, prediction errors can be decomposed into two main subcomponents we care about: error due to ‘bias’ and error due to ‘variance’. There is a tradeoff between a model’s ability to minimize bias and variance. Understanding these two types of error can help us diagnose model results and avoid the mistake of over- or under-fitting.” ~ Scott Fortmann-Roe

    In this blog post, we’re going to focus on the bias error, variance error and the bias-variance trade-off.

    Bias Error

    Bias is the amount by which the expected prediction of our model differs from the actual target value, i.e. how far our predictions are from the real values. Essentially, the bias of our model is determined by the assumptions it makes to predict our target value. Simply stated, a high bias means that the underlying patterns are not captured by our learning algorithm. Such models subsequently produce a large error on both the training and test sets.

    • Decision Trees, k-Nearest Neighbors and Support Vector Machines are typically low-bias machine learning algorithms

    • Linear Regression and Logistic Regression are typically high-bias machine learning algorithms
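To make the definition of bias concrete, here is a minimal sketch (not from the original article) that estimates the squared bias of two toy models at a single test point by retraining them on many simulated training sets. The target function `true_f`, the noise level, the test point `x0`, and all helper names are assumptions chosen for illustration. A predictor that always returns the training mean stands in for a rigid, high-bias model, and 1-nearest-neighbour stands in for a flexible, low-bias one.

```python
import math
import random

def true_f(x):
    # Assumed ground-truth relationship between x and the target.
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=30, noise_sd=0.3):
    # Draw n noisy observations of true_f on the interval [0, 1).
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def fit_mean(xs, ys):
    """High-bias model: ignores x and always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_1nn(xs, ys):
    """Low-bias model: predicts the target of the nearest training point."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[nearest]
    return predict

def squared_bias(fit, x0, rounds=500, seed=0):
    # Bias = (expected prediction over training sets) - (true value).
    rng = random.Random(seed)
    preds = [fit(*sample_training_set(rng))(x0) for _ in range(rounds)]
    expected_pred = sum(preds) / len(preds)
    return (expected_pred - true_f(x0)) ** 2

x0 = 0.25  # test point at the peak of the sine wave
bias_mean = squared_bias(fit_mean, x0)
bias_1nn = squared_bias(fit_1nn, x0)
print(f"squared bias, mean model: {bias_mean:.4f}")
print(f"squared bias, 1-NN model: {bias_1nn:.4f}")
```

Under these assumptions the mean model's expected prediction sits far from the true value at `x0`, while the 1-NN model's expected prediction lands close to it.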

    Variance Error

    Variance is defined as the amount by which the prediction of our model would change if we used a different training set. Models with a high variance tend to pay too much attention to the data present in the training set and don’t generalize well, i.e. they don’t perform well on the test set. In other words, such machine learning algorithms try to fit themselves to the training data as closely as possible. By doing so they make complex assumptions which may only be true for the training data, and hence they perform much worse on the test set.

    • Linear Regression and Logistic Regression are typically low-variance machine learning algorithms

    • Decision Trees, k-Nearest Neighbors and Support Vector Machines are typically high-variance machine learning algorithms
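The definition of variance can be illustrated the same way. The sketch below (an illustration under assumed settings, not the article's own code) retrains two toy models on many simulated training sets and measures how much their prediction at one fixed point `x0` moves around. The data generator, the noise level, and the helper names are hypothetical.

```python
import math
import random
import statistics

def true_f(x):
    # Assumed ground-truth relationship between x and the target.
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=30, noise_sd=0.3):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def fit_mean(xs, ys):
    """Rigid model: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_1nn(xs, ys):
    """Flexible model: copies the target of the nearest training point."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[nearest]
    return predict

def prediction_variance(fit, x0, rounds=500, seed=0):
    # Variance of the prediction at x0 across independently drawn training sets.
    rng = random.Random(seed)
    preds = [fit(*sample_training_set(rng))(x0) for _ in range(rounds)]
    return statistics.pvariance(preds)

x0 = 0.25
var_mean = prediction_variance(fit_mean, x0)
var_1nn = prediction_variance(fit_1nn, x0)
print(f"prediction variance, mean model: {var_mean:.4f}")
print(f"prediction variance, 1-NN model: {var_1nn:.4f}")
```

Because the 1-NN prediction copies a single noisy training target, it changes much more from one training set to the next than the averaged prediction of the mean model.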

    Bias-Variance Trade-off

    Now, let’s try to understand the trade-off between bias and variance with the help of a bullseye diagram. One thing we already know is that bias and variance tend to move in opposite directions: as we change the complexity of our model, a decrease in bias usually comes with an increase in variance, and vice versa.

    We assume that the center of the diagram is a model that perfectly predicts the target values, and the further we are from the center the worse our predictions get. If we repeat our model-building process several times, with a few changes each time (such as a different training set), we get multiple hits on our target, each of which represents the performance of an individual model.

    [Figure: Bullseye diagram depicting the bias-variance trade-off]

    To learn how to interpret our results, let’s go through the different cases we may observe:

    1. Low Bias & Low Variance

    • Ideal situation for our machine learning model
    • The error of prediction is as low as possible
    • The predictions don’t change much when we choose a different training set

    2. High Bias & High Variance

    • Worst possible situation for our machine learning model
    • The error of prediction is extremely high
    • The predictions fluctuate massively when we use a different training set

    3. High Bias & Low Variance

    • Often referred to as underfitting, which means that our model is unable to capture the underlying patterns present in our data
    • Usually occurs when the model is too simple for the problem, or when there is too little data to learn the pattern from

    [Figure: Underfitting vs. overfitting]

    4. Low Bias & High Variance

    • Also known as overfitting, which means that our model finds underlying patterns present in our data but also interprets the noise as useful information
    • It often occurs when we train an overly flexible model on noisy data which hasn’t been cleaned properly
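The four cases above can be turned into a rough rule of thumb: a large training error points to high bias, and a test error far above the training error points to high variance. The helper below is a hypothetical sketch of that rule. The function name `diagnose` and both thresholds (`target_error`, `max_gap`) are assumptions that would have to be chosen per problem; it is not a standard diagnostic.

```python
def diagnose(train_error, test_error, target_error, max_gap):
    """Label a model from its train/test errors using two assumed thresholds.

    target_error: largest training error we are willing to accept.
    max_gap: largest acceptable difference (test_error - train_error).
    """
    high_bias = train_error > target_error
    high_variance = (test_error - train_error) > max_gap
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "low bias and low variance"

# Made-up error values, purely to exercise each branch.
examples = {
    "model A": (0.05, 0.06),
    "model B": (0.40, 0.42),
    "model C": (0.02, 0.35),
    "model D": (0.40, 0.80),
}
for name, (train_error, test_error) in examples.items():
    label = diagnose(train_error, test_error, target_error=0.10, max_gap=0.10)
    print(f"{name}: train={train_error:.2f} test={test_error:.2f} -> {label}")
```

The thresholds are deliberately explicit arguments, since what counts as an acceptable error or gap depends on the task and the metric.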

    Summary

    At its heart, managing the bias-variance trade-off is about avoiding both underfitting and overfitting. As the complexity of our model increases, the bias decreases while the variance increases. In other words, if we keep adding more features to our model, our primary concern shifts from reducing the bias to reducing the variance of our model.

    [Figure: Error complexity curve]

    As mentioned earlier, the generalization error of our model comprises three different errors and can be written mathematically as follows:

    $$\text{Generalization Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$$
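The decomposition into bias, variance, and irreducible error can be checked numerically. The sketch below is an illustration under assumed settings: a sine-shaped target, Gaussian noise with standard deviation `NOISE_SD`, squared-error loss (under which the bias term enters squared), and a least-squares line as the model. It estimates each term at one test point and compares their sum with the directly measured expected squared error. All names and parameter values are hypothetical.

```python
import math
import random
import statistics

NOISE_SD = 0.3  # assumed standard deviation of the irreducible noise

def true_f(x):
    # Assumed ground-truth relationship between x and the target.
    return math.sin(2 * math.pi * x)

def fit_line(xs, ys):
    """Ordinary least-squares line through the training points."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def decompose(x0, n=30, rounds=2000, seed=0):
    rng = random.Random(seed)
    preds, squared_errors = [], []
    for _ in range(rounds):
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0, NOISE_SD) for x in xs]
        pred = fit_line(xs, ys)(x0)
        y0 = true_f(x0) + rng.gauss(0, NOISE_SD)  # fresh noisy observation at x0
        preds.append(pred)
        squared_errors.append((y0 - pred) ** 2)
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    irreducible = NOISE_SD ** 2
    total = statistics.fmean(squared_errors)
    return bias_sq, variance, irreducible, total

bias_sq, variance, irreducible, total = decompose(x0=0.25)
print(f"bias^2 + variance + irreducible = {bias_sq + variance + irreducible:.3f}")
print(f"measured expected squared error = {total:.3f}")
```

The two printed numbers should agree up to Monte Carlo error; increasing `rounds` tightens the match.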

    The dotted line in the error complexity curve above marks the optimum model complexity and is considered the sweet spot for our machine learning model. We can say that the sweet spot has been found when the increase in bias is equal to the reduction in variance of our model. Mathematically we get:

    $$\frac{d\,\text{Bias}^2}{d\,\text{Complexity}} = -\frac{d\,\text{Variance}}{d\,\text{Complexity}}$$

    If the complexity of our model goes past the sweet spot then we are overfitting our model, and if we do not reach the sweet spot then we are underfitting our model.
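One common way to look for the sweet spot in practice is to sweep a complexity setting and keep the value with the lowest error on held-out data. The sketch below does this for a hand-written k-nearest-neighbours regressor on simulated data. The data generator, the candidate values in `ks`, and all helper names are assumptions for illustration. For k-NN, a smaller `k` means a more flexible (more complex) model.

```python
import math
import random

def true_f(x):
    # Assumed ground-truth relationship between x and the target.
    return math.sin(2 * math.pi * x)

def make_data(rng, n, noise_sd=0.3):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train_x, train_y, xs, ys, k):
    errors = [(y - knn_predict(train_x, train_y, x, k)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

rng = random.Random(0)
train_x, train_y = make_data(rng, 100)
test_x, test_y = make_data(rng, 500)  # held-out data, never used for fitting

# Ordered from least complex (large k) to most complex (k = 1).
ks = [75, 45, 25, 15, 9, 5, 3, 1]
train_mse = {k: mse(train_x, train_y, train_x, train_y, k) for k in ks}
test_mse = {k: mse(train_x, train_y, test_x, test_y, k) for k in ks}
best_k = min(ks, key=lambda k: test_mse[k])

for k in ks:
    print(f"k={k:>2}  train MSE={train_mse[k]:.3f}  held-out MSE={test_mse[k]:.3f}")
print(f"lowest held-out error at k={best_k}")
```

The training error keeps falling as the model gets more flexible, reaching zero at `k=1`, while the held-out error falls and then rises again. The minimum of the held-out curve is the empirical counterpart of the sweet spot.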

    Wrapping Up

    In essence, we can define the relationship between bias and variance as follows:

    • Increasing the bias tends to decrease the variance; and
    • Increasing the variance tends to decrease the bias

    Although there is no definitive method to obtain the so-called sweet spot, we can do our best to find it either by using appropriate metrics to analyse the performance of our model or by choosing the correct algorithms (and their proper configuration) for our purposes. Thus, we can conclude that the bias-variance trade-off is an important consideration that we can use as a starting point to determine the predictive performance of our machine learning models.

  • Gentle Introduction to the Bias-Variance Trade-off in Machine Learning

  • Understanding the Bias-Variance Tradeoff

  • Bias-Variance Tradeoff, Bhavesh Bhatt

    Translated from: https://medium.com/datadriveninvestor/machine-learning-101-the-bias-variance-conundrum-f4143ba9f179
