Random Forest in Python
The Random Forest algorithm builds many decision trees, each from:
1) a random selection of data rows (bootstrap sampling)
2) a random selection of features at each split
It then predicts by majority vote across the trees, which gives it lower variance than a single tree.
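A minimal sketch (using synthetic data, not the course dataset) of the majority-vote idea: each tree in a fitted forest can be asked for its own prediction, and the class most trees pick is the vote winner. Note that scikit-learn's implementation actually combines trees by averaging their predicted probabilities rather than hard voting, though the two usually agree.

```python
# Sketch: a forest's prediction is (approximately) the majority vote of its trees.
# Synthetic data only; scikit-learn really averages per-tree probabilities.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

sample = X[:1]                                            # one row to classify
votes = [tree.predict(sample)[0] for tree in forest.estimators_]
majority = max(set(votes), key=votes.count)               # most common class among trees
print(majority)
```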
###################################################
# Aliases assumed from earlier in these notes:
# rf = RandomForestClassifier, sktts = sklearn.model_selection, skmeter = sklearn.metrics
from sklearn.ensemble import RandomForestClassifier as rf
import sklearn.model_selection as sktts
import sklearn.metrics as skmeter
import matplotlib.pyplot as plt

# amirdf is assumed to be loaded earlier in the notes
amirpredictor = amirdf[["sex", "FamilySize", "FamilyIncome", "EdYears"]]
amiroutcome = amirdf["be"]  # single column, so a Series avoids a shape warning in fit()
X_Train, X_Test, Y_Train, Y_Test = sktts.train_test_split(amirpredictor, amiroutcome, test_size=0.3)
# n_estimators – the number of trees in the forest.
# Each tree is trained on only a bootstrap sample of the rows; the rows it never saw are its
# out-of-bag (OOB) samples. These can be used during training to estimate test accuracy
# without a separate validation split.
# With oob_score=True, the attributes oob_score_ and oob_decision_function_ are computed.
RFmodel= rf( n_jobs=-1, max_depth=5, n_estimators=1000, oob_score=True)
RFmodel.fit(X_Train, Y_Train)
print(RFmodel.oob_score_)
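A hedged illustration of the OOB idea on synthetic data (not the amirdf dataset): the OOB score acts as a built-in validation estimate computed from the training data alone, and it is typically close to accuracy on a held-out test set.

```python
# Synthetic-data sketch: oob_score_ approximates held-out test accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(n_estimators=500, oob_score=True,
                               random_state=42, n_jobs=-1).fit(X_tr, y_tr)
print(round(model.oob_score_, 2))          # OOB accuracy, from training data alone
print(round(model.score(X_te, y_te), 2))   # held-out test accuracy, for comparison
```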
ModelPredictions=RFmodel.predict(X_Test)
CM=skmeter.confusion_matrix(Y_Test,ModelPredictions)
disp=skmeter.ConfusionMatrixDisplay(confusion_matrix=CM, display_labels=RFmodel.classes_)
disp.plot()
plt.show()
print(CM)
Amiraccuracy=skmeter.accuracy_score(Y_Test,ModelPredictions)
print("Amiraccuracy:"+str(Amiraccuracy))
#########################################################################
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
bagging_classifier = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base model to bag ("base_estimator" in sklearn < 1.2)
    n_estimators=500,                    # number of models to train
    max_samples=100,                     # rows drawn to train each model; the full dataset length is the classic bagging choice
)
https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/
