History of Pruning Algorithm Development and Python Implementation (finished)
All the Python implementations for the 7 post-pruning algorithms
are collected here.
Table of Decision Trees:
| Algorithm | Author | Paper | Year |
| --- | --- | --- | --- |
| ID3 | Ross Quinlan | 《Discovering rules by induction from large collections of examples》 | 1979 |
| ID3 | Ross Quinlan | Another origin: 《Learning efficient classification procedures and their application to chess end games》 | 1983a |
| CART | Leo Breiman | 《Classification and Regression Trees》 | 1984 |
| C4.5 | Ross Quinlan | 《C4.5: Programs for Machine Learning》 | 1993 |
| C5.0 | Ross Quinlan | Commercial edition of C4.5; no relevant papers | - |
Table of Pruning Algorithms:
| Algorithm | Source | Year | Author | Tree family | Notes |
| --- | --- | --- | --- | --- | --- |
| Pessimistic Pruning | 《Simplifying Decision Trees》 Section 2.3 | 1986b (some sources say 1987b; the date printed on the paper is used here) | Quinlan | C4.5 | Ross Quinlan invented "Pessimistic Pruning"; John Mingers renamed it "Pessimistic Error Pruning" in his article 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》 |
| Reduced Error Pruning | 《Simplifying Decision Trees》 Section 2.2 | 1986b | Quinlan | C4.5 | Requires an extra validation set for pruning |
| Cost-Complexity Pruning | 《Classification and Regression Trees》 Section 3.3 | 1984 | L. Breiman | CART | For classification trees only |
| Error-Complexity Pruning | 《Classification and Regression Trees》 Section 8.5.1 | 1984 | L. Breiman | CART | For regression trees only |
| Critical Value Pruning | 《Expert System-Rule Induction with Statistical Data》; another claimed source is 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》, but that paper's author says it is cited from the 1987 paper | 1987a | John Mingers | Not stated explicitly in the paper | Accounts of CVP's origin vary; the source given here follows 《An Empirical Comparison of Pruning Methods for Ensemble Classifiers》 p. 212 |
| Minimum-Error Pruning | 《Learning decision rules in noisy domains》 | 1986 | Niblett and Bratko | - | Cannot be downloaded from the Internet |
| Minimum-Error Pruning | 《On estimating probabilities in tree pruning》 | 1991 | Bojan Cestnik, Ivan Bratko | - | Modified MEP algorithm |
| Error-Based Pruning | 《C4.5: Programs for Machine Learning》 Section 4.2 | 1993 | Quinlan | C4.5 | EBP is an evolution of PEP |
Pruning targets for classification trees (ID3, C4.5, CART-classification):
1. simplify the tree without losing too much accuracy
2. improve generalization, as measured on validation sets
3. alleviate overfitting
Pruning targets for regression trees (CART-regression):
1. simplify the tree without increasing the MSE too much
2. improve generalization, as measured on validation sets
3. alleviate overfitting
----------------For C4.5-----------------
U25%(1,16) and U25%(1,168) in 《C4.5: Programs for Machine Learning》
Differences between the C4.5 in Weka 3.6.10 and Prof. Quinlan's C4.5
Reproducing the experimental results of 《C4.5: Programs for Machine Learning》 Chapter 4
How Ross Quinlan handles missing values in C4.5 Release 8
A detailed reading of the relationship between C4.5 Release 8 (the latest version) and MDL
Some understanding of 《Inferring Decision Trees Using the Minimum Description Length Principle》
Some understanding of 《Improved Use of Continuous Attributes in C4.5》
Code architecture diagram of C4.5 Release 8
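The U25%(E, N) values above are C4.5's upper confidence limits on a leaf's true error rate, given E observed errors in N training cases. A minimal pure-Python sketch of how such a limit can be computed (my own bisection over the binomial CDF; C4.5's printed tables use an approximation, so its values may differ in the last digit):

```python
from math import comb

def u_cf(E, N, CF=0.25):
    """Upper CF-confidence limit on the binomial error rate, i.e. the p
    solving P(X <= E | X ~ Bin(N, p)) = CF, found by bisection."""
    def cdf(p):  # P(X <= E) for Bin(N, p); decreasing in p
        return sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range(E + 1))
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(mid) > CF:   # error bound still too small, move up
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# E = 0 has the closed form 1 - CF**(1/N); for N = 6 this gives about 0.206.
print(round(u_cf(0, 6), 3))
```

For E = 0 the result can be checked against the closed form 1 - CF^(1/N), which the bisection reproduces.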
------------REP-start--------
Detailed explanation of REP (Reduced Error Pruning) code for ID3 + drawing Figures 4.5, 4.6 and 4.7 of Zhou Zhihua's 《Machine Learning》
Drawing Figures 4.4 and 4.9 of Zhou Zhihua's 《Machine Learning》 (reprint + added entropy display)
Handling continuous values in ID3 decision trees + drawing Figures 4.8 and 4.10 of Zhou Zhihua's 《Machine Learning》
sklearn does not implement the ID3 algorithm.
------------REP-end--------
----------------PEP-start-----------------
Earliest PEP Algorithm Principles
Pessimistic Error Pruning example of C4.5
Pessimistic Error Pruning illustration with C4.5 - Python implementation
----------------PEP-end----------------
-------------EBP-start--------------------
Error Based Pruning: algorithm, code implementation and examples
Here one might ask: why not simply use a Python interface to Weka's J48?
Note that Weka follows Quinlan's C-language code (put bluntly, J48 is a port of C4.5 Release 8). On some datasets, for example the hypo dataset, the decision tree Weka generates is huge, which is very bad:
the original purpose of a decision tree is to aid classification and produce knowledge,
and an enormous tree is clearly hard to use.
J48: the Java edition of C4.5 Release 8
-------------EBP-end--------------------
-------------MEP-start--------------------
Two Examples of Minimum Error Pruning (reprint)
MEP (Minimum Error Pruning) principle with Python implementation
-----------------MEP-end--------------------
-----------------CVP-start--------------------
Proof of the chi-square statistic used in CVP
CVP (Critical Value Pruning) illustration, with the principle laid out clearly and in detail
CVP (Critical Value Pruning) examples with Python implementation
Error in a paper about CVP
Computing contingency tables in Python, with analysis of experimental results
Computing the chi-square distribution in Python
-----------------CVP-end--------------------
------------------------For CART-------------------------------------------------------
Notes from 《Classification and Regression Trees》
--------CCP-start---------------------
Drawing the decision tree on p. 59 of 《Statistical Learning Methods》 - sklearn version
CCP (Cost Complexity Pruning) on sklearn with Python implementation
Theoretical defect in selecting the best pruned tree from CCP with cross-validation
1SE rule details in CCP pruning of CART
--------CCP-end---------------------
--------ECP-start---------------------
Explaining the difference between model trees and regression trees with examples
Error Complexity Pruning for sklearn’s Regression Tree with Python Implementation
--------ECP-end---------------------
Note:
Critical Value Pruning can be used both in pre-pruning and in post-pruning.
In pre-pruning, the IM (Information Measure) is replaced with the chi-square statistic.
When post-pruning with a χ² test, you need an independent dataset.
Of course, if you grow a tree with CVP, you need not post-prune it with CVP using the same data that was used to grow the tree.
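As a small illustration of the χ² machinery CVP relies on, here is a hand-rolled Pearson chi-square over a contingency table (a sketch; SciPy's chi2_contingency would do the same job, and the 3.841 critical value for df = 1 at the 5% level is the standard table value):

```python
# Pearson chi-square statistic for a contingency table of observed counts:
# chi2 = sum over cells of (O - E)^2 / E, with E = row_total * col_total / N.
def chi_square(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n   # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Cells are (branch, class) counts for a candidate split.
stat = chi_square([[10, 20], [30, 40]])
# For a 2x2 table, df = 1; since stat < 3.841 (5% critical value), the class
# distribution does not differ significantly across the two branches.
print(round(stat, 3))  # ≈ 0.794
```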
Table about whether you need extra validation sets when pruning
In the "Reference and Quotation" column of the following table,
the word "test" means a set with
class labels, so it actually means a validation dataset.
| Algorithm | Extra validation set needed? | Reference and Quotation |
| --- | --- | --- |
| REP(Reduced Error Pruning) | yes | According to 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》 Section 2.2.4: "The pruned node will often make fewer errors on the test data than the sub-tree makes." |
| CCP(Cost Complexity Pruning) | Pruning stage: no. Selecting the best pruned tree: ① (small datasets) cross-validation: yes; ② (large datasets) 1-SE rule: no | According to 《Simplifying Decision Trees》 part 2.1: "Finally, the procedure requires a test set distinct from the original training set" |
| ECP(Error Complexity Pruning) | Pruning stage: no. Selecting the best pruned tree: ① (small datasets) cross-validation: yes; ② (large datasets) 1-SE rule: no | According to 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》 part 2.2.1, page 230: "Instead, each of the pruned trees is used to classify an independent test data set" |
| CVP(Critical Value Pruning) | pre-pruning: no; post-pruning: yes | 《Expert System-Rule Induction with Statistical Data》 (pre-prune): "Initial runs were performed using chi-square purely as a means of choosing attributes - not of judging their significance - on the two years separately and on the data as a whole." |
| MEP(Minimum Error Pruning) | It all depends how you set m | According to 《On Estimating Probabilities in Tree Pruning》 Section 1: "m can be adjusted to some essential properties of the learning domain, such as the level of noise in the learning data." Section 2: "m can be set so as to maximise the classification accuracy on an independent data set" |
| PEP(Pessimistic Error Pruning) | no | According to 《Simplifying Decision Trees》 Section 2.3: "Unlike these methods, it does not require a test set separate from the cases in the training set from which the tree was constructed." |
| EBP(Error Based Pruning) | no | According to 《C4.5: Programs for Machine Learning》 p. 40: "The approach taken in C4.5 belongs to the second family of techniques that use only the training set from which the tree was built." |
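The m in the MEP row is the m-estimate parameter of Cestnik and Bratko. A minimal sketch of the node-level expected error it produces, assuming the standard m-estimate of class probabilities (function and variable names are mine):

```python
# m-estimate of class probabilities at a node, as used in (modified) MEP:
# p(c) = (n_c + p_a(c) * m) / (N + m), where n_c is the count of class c at
# the node, N the total count, p_a(c) the prior of c, m the smoothing weight.
def mep_expected_error(counts, priors, m):
    """Static expected error of labelling the node with its best class."""
    n = sum(counts.values())
    probs = {c: (counts.get(c, 0) + priors[c] * m) / (n + m) for c in priors}
    return 1.0 - max(probs.values())

# Node with 8 cases of class A and 2 of class B, uniform priors, m = 2:
# p(A) = (8 + 0.5 * 2) / (10 + 2) = 0.75, so the expected error is 0.25.
err = mep_expected_error({"A": 8, "B": 2}, {"A": 0.5, "B": 0.5}, m=2)
print(err)
```

With m = 0 this reduces to the relative frequency; as m grows, the estimate is pulled toward the priors, which is why the need for a validation set depends on how m is chosen.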
Attention:
When your dataset is small, you need cross-validation in CCP and ECP to produce K candidate models; each candidate model is then pruned repeatedly to get its pruned-tree sequence (K such sequences in total), and the best pruned tree is finally chosen from among the K sequences.
When your dataset is large, you can use the 1-SE rule in CCP and ECP to select the best pruned tree.
In the GitHub link above, we use method ②.
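The selection procedure described above can be sketched with sklearn's built-in minimal cost-complexity pruning path, combining cross-validated scores with the 1-SE pick (illustrative only: the iris data and `random_state=0` are my own choices, not from the original posts):

```python
# Select a CCP-pruned tree by cross-validation plus the 1-SE rule,
# using sklearn's minimal cost-complexity pruning path.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Effective alphas of the pruned-tree sequence T1, T2, ... (ascending order);
# drop the last alpha, which collapses the tree to the root alone.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = [max(a, 0.0) for a in path.ccp_alphas[:-1]]  # clip float noise

# Cross-validate every candidate pruned tree.
scores = [cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                          X, y, cv=5) for a in alphas]
means = np.array([s.mean() for s in scores])
ses = np.array([s.std(ddof=1) / np.sqrt(len(s)) for s in scores])

# 1-SE rule: among trees within one standard error of the best mean accuracy,
# pick the simplest one (the largest alpha, i.e. the most heavily pruned).
best = means.argmax()
ok = [i for i in range(len(alphas)) if means[i] >= means[best] - ses[best]]
chosen_alpha = alphas[max(ok)]

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=chosen_alpha).fit(X, y)
full = DecisionTreeClassifier(random_state=0).fit(X, y)
print(pruned.tree_.node_count, full.tree_.node_count)
```

The pruned tree chosen this way never has more nodes than the fully grown tree, and usually has far fewer.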
Pruning Style Table of Algorithms
| Algorithm | Pruning style | Reference |
| --- | --- | --- |
| REP(Reduced Error Pruning) | bottom-up | Not mentioned in the earliest article about it, 《Simplifying Decision Trees》 |
| CCP(Cost Complexity Pruning) | bottom-up | 《Classification and Regression Trees》 - Leo Breiman, Chapter 3.3 MINIMAL COST-COMPLEXITY PRUNING: "Thus, the pruning process produces a finite sequence of subtrees T1, T2, T3, … with progressively fewer terminal nodes" |
| ECP(Error Complexity Pruning) | bottom-up | 《Classification and Regression Trees》 - Leo Breiman, Chapter 8.5.1: "Now minimal error-complexity pruning is done exactly as minimal cost-complexity pruning in classification." |
| CVP(Critical Value Pruning) | top-down in pre-pruning; bottom-up in post-pruning | Pre-pruning, 《Expert Systems-Rule Induction with Statistical Data》: "The ID3 algorithm, with the enhancements mentioned previously, was modified to calculate χ² instead of IM". Post-pruning: https://www.cs.rit.edu/~rlaz/prec2010/slides/DecisionTrees.pdf |
| MEP(Minimum Error Pruning) | bottom-up | 《The effects of pruning methods on the predictive accuracy of induced decision rules》: "Niblett and Bratko [26] proposed a bottom-up approach for searching a single tree that minimizes the expected error rate." |
| PEP(Pessimistic Error Pruning) | top-down | 《Top-Down Induction of Decision Trees Classifiers – A Survey》 part E: "The pessimistic pruning procedure performs top-down traversing over the internal nodes." |
| EBP(Error Based Pruning) | bottom-up | 《C4.5: Programs for Machine Learning》 p. 39: "Start from the bottom of the tree and examine each nonleaf sub-tree." |
Do the above pruning methods tend to over-prune or under-prune?
According to the following two articles:
《Simplifying Decision Trees by Pruning and Grafting: New Results》
《Top-Down Induction of Decision Trees Classifiers – A Survey》
the results are as follows:
| Algorithm | Tendency |
| --- | --- |
| REP | over-pruning, or not significant |
| PEP | under-pruning |
| EBP | under-pruning |
| MEP | under-pruning |
| CVP | under-pruning |
| CCP | under-pruning (from my own experiments) |
| ECP | under-pruning (from my own experiments) |