Amount of information
The amount of information there is in an event can be quantified using the probability of the event. This is called “Shannon information,” “self-information,” or simply the “information,” and can be calculated for a discrete event x as follows:
 information(x) = -log2( p(x) )
where log2() is the base-2 logarithm and p(x) is the probability of the event x.
The choice of the base-2 logarithm means that the unit of the information measure is the bit (binary digit).
This can be directly interpreted in the information-processing sense as the number of bits required to represent the event.
if the probability of an event is 1/2,
then the number of bits needed is -log2(1/2) = log2(2) = 1
if the probability of an event is 1/4,
then the number of bits needed is -log2(1/4) = log2(4) = 2
Entropy is the probability-weighted average of the information. For a binary event with probability p: entropy = -p*log2(p) - (1-p)*log2(1-p)
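These formulas can be checked with a few lines of Python (the function names are illustrative, not from any particular library):

```python
import math

def information(p):
    """Shannon information (self-information) of an event with probability p, in bits."""
    return -math.log2(p)

def binary_entropy(p):
    """Entropy of a two-outcome event: the probability-weighted average information."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(information(1 / 2))   # 1.0 bit
print(information(1 / 4))   # 2.0 bits
print(binary_entropy(0.5))  # 1.0 bit: a fair coin is maximally uncertain
```

Note that entropy shrinks as the outcome becomes more predictable: binary_entropy(0.9) is well below binary_entropy(0.5).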
There are two decisions to be made to form a decision tree:
1) Splitting criterion (information gain)
a) based on entropy
b) based on Gini impurity
2) Stopping criteria (for stopping before reaching a pure leaf node)
a) minimum reduction in impurity
b) minimum number of samples in a node
c) maximum depth
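As a sketch of how the splitting criterion is computed, the snippet below scores one hypothetical candidate split with both entropy-based information gain and Gini impurity (the labels and function names are invented for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: chance of mislabeling a random sample drawn from the node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent node minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# hypothetical node contents before and after a candidate split
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(gini(parent))                           # 0.5: a 50/50 node is maximally impure
print(information_gain(parent, left, right))  # positive: the split reduces entropy
```

A split is kept when its gain (or impurity reduction) is the largest among all candidate splits; a stopping criterion then refuses further splits when the improvement becomes too small.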
https://en.wikipedia.org/wiki/Machine_learning
Decision Tree Calculations
Random Forest has two sources of randomness:
1) random selection of the predictor columns used to predict the response.
2) random selection of a subset of the training data with replacement (bootstrapping/bagging), with evaluation on the Out-Of-Bag subset.
For each bootstrap sample taken from the training data, there will be samples left behind that were not included. These samples are called Out-Of-Bag samples, or OOB. The performance of each model on its left-out samples, when averaged, can provide an estimated accuracy of the bagged models. This estimated performance is often called the OOB estimate of performance. These performance measures are a reliable test-error estimate and correlate well with cross-validation estimates.
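The size of the OOB set can be demonstrated with a small standard-library simulation (the sample size here is arbitrary): each of the n rows is missed by a single draw with probability (1 - 1/n), so it is out of bag for the whole sample with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.368 as n grows.

```python
import random

random.seed(0)

n = 10_000  # size of a hypothetical training set
rows = range(n)

# a bootstrap sample: n draws with replacement
bootstrap = [random.randrange(n) for _ in rows]

# Out-Of-Bag rows: those never drawn into this bootstrap sample
oob = set(rows) - set(bootstrap)

# the OOB fraction is close to 1/e, roughly 36.8%
print(len(oob) / n)
```

So each tree in a bagged ensemble has roughly a third of the training data available as its own private validation set.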
I like disabling the bootstrapping, which allows the best tree of features to be found using the whole training data; that tree can then be tested on the test data, and the goodness of its predictions summarized in a classical confusion matrix.
Without bootstrapping, all of the data is used to fit each model, so there is no random variation between trees with respect to the examples selected at each stage. However, random forest has a second source of variation: the random subset of features to try at each split. “The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default),” which implies that bootstrap=False draws a sample of size equal to the number of training examples without replacement, i.e. the same training set is always used.
Even setting bootstrap=False and using all the features does not produce identical random forests (perhaps because of the random order of the top-to-bottom branches).
https://stats.stackexchange.com/questions/534356/are-random-forests-trained-with-the-whole-dataset
https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
Data for An Introduction to Statistical Learning with Applications in R: https://cran.r-project.org/web/packages/ISLR/index.html
 Applied Predictive Modeling, Chapter 8 and Chapter 14.
 https://www.amazon.ca/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1492032646
 https://www.amazon.ca/Data-Science-Scratch-Principles-Python/dp/1492041130
 https://www.amazon.ca/Practical-Statistics-Data-Scientists-Essential/dp/149207294X
 https://www.amazon.ca/dp/1801819319
Videos
StatQuest: Decision Trees (17 min): constructs trees with Gini impurity; no flaws detected
https://www.youtube.com/watch?v=7VeUPuFGJHk
StatQuest: Decision Trees, Part 2 – Feature Selection and Missing Data (5 min)
https://www.youtube.com/watch?v=wpNlJwwplA
Decision Tree 1: how it works (9 min): assumes that the tree ends with pure sets! Entropy is discussed in a shallow way.
Look at entropy from a thermodynamics point of view:
https://www.grc.nasa.gov/www/k-12/airplane/entropy.html
https://en.wikipedia.org/wiki/Entropy_in_thermodynamics_and_information_theory
https://en.wikipedia.org/wiki/Entropy_(information_theory)
https://towardsdatascience.com/entropy-is-a-measure-of-uncertainty-e2c000301c2c

Playlist: https://www.youtube.com/watch?v=eKD5gxPPeY0&list=PLBv09BD7ez_4temBw7vLA19p3tdQH6FYO&index=1

A Gentle Introduction to Information Entropy  https://machinelearningmastery.com/what-is-information-entropy/
Decision Tree (CART) – Machine Learning Fun and Easy (9 min)  https://www.youtube.com/watch?v=DCZ3tsQIoGU
Decision Tree in R (46 min) by Simplilearn; the last 15 minutes are not useful for students  https://www.youtube.com/watch?v=HmEPCEXnZM
Pruning Classification Trees using cv.tree (15 min)  https://www.youtube.com/watch?v=GOJN9SKl_OE
Machine Learning and Decision Trees (first 15 minutes)  https://www.youtube.com/watch?v=RmajweUFKvM
Random Forest (27 min)  https://www.youtube.com/watch?v=HeTT73WxKIc
Articles
Medium  Topic  Access
Tutorial  Classification Trees  https://daviddalpiaz.github.io/r4sl/trees.html
Tutorial  A Complete Guide on Decision Tree Algorithm  https://www.edureka.co/blog/decision-tree-algorithm/
Tutorial  Decision Tree: How To Create A Perfect Decision Tree?  https://www.edureka.co/blog/decision-trees/
Tutorial  What is Overfitting In Machine Learning And How To Avoid It?  https://www.edureka.co/blog/overfitting-in-machine-learning/
Tutorial  Decision Tree in R with Example  https://www.guru99.com/r-decision-trees.html
Tutorial  Decision Tree and Random Forests  https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/
Tutorial  Classification & Regression Trees  http://www.di.fc.ul.pt/~jpn/r/tree/tree.html
Using “rpart” library for CART
“Classification and regression trees” (CART) relies on recursive partitioning to identify patterns in the variance of response variables with respect to explanatory variables of interest. If the response is categorical, it creates classification trees. If the response is numeric, it creates regression trees.
We partition the response into the two most homogeneous groups possible based on our explanatory variables (predictors). If a predictor is categorical, this means splitting its categories into two groups; in the case of a binomial predictor, we end up with two groups each containing only a single category.
The initial split is found by maximizing the homogeneity within the groups of the partition, evaluated over all possible splits for each of the explanatory variables.
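A minimal sketch of that initial split search, using Gini impurity as the homogeneity measure on a single numeric predictor (the data is invented, and rpart's internal implementation is more elaborate):

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure group)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(x, y):
    """Scan every threshold on one numeric predictor and return the (threshold,
    score) pair that minimizes the size-weighted impurity of the two groups."""
    best = (None, float("inf"))
    for threshold in sorted(set(x)):
        left = [label for value, label in zip(x, y) if value <= threshold]
        right = [label for value, label in zip(x, y) if value > threshold]
        if not left or not right:
            continue  # a one-sided "split" is not a partition
        n = len(y)
        score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best

# hypothetical data: the response separates cleanly at x <= 2
x = [1, 2, 3, 4]
y = ["no", "no", "yes", "yes"]
print(best_split(x, y))  # (2, 0.0): a perfectly homogeneous partition
```

Recursive partitioning then repeats this search inside each resulting group until a stopping criterion is met.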
================
rattle::fancyRpartPlot() is good
https://www.gormanalysis.com/blog/magicbehindconstructingadecisiontree/
library(rattle)
library(rpart.plot)
library(RColorBrewer)
library(rpart)
# fit a classification tree predicting Fraud from RearEnd
mytree <- rpart(
  Fraud ~ RearEnd,
  data = train,
  method = "class"
)
# plot mytree
rattle::fancyRpartPlot(mytree, caption = NULL)
================
A package for all seasons: caret (Classification And REgression Training), with 51 models
http://topepo.github.io/caret/index.html