Random Forest in Python
The Random Forest algorithm builds many decision trees, each trained on:
1) a random selection of data rows (a bootstrap sample, drawn with replacement)
2) a random selection of features
It then makes the prediction by majority vote of these trees.
Because each tree sees different rows and features, the trees' errors are partly independent, so the forest has lower variance than a single tree.
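To see the two sources of randomness concretely, here is a minimal toy sketch (my illustration, not the course code; the synthetic dataset and every name in it are made up). For brevity each tree here gets one random feature subset; a real Random Forest re-draws the feature subset at every split.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rng = np.random.default_rng(0)
n_rows, n_feats = X.shape

trees, feature_sets = [], []
for _ in range(25):
    rows = rng.integers(0, n_rows, size=n_rows)         # 1) random rows (bootstrap: with replacement)
    feats = rng.choice(n_feats, size=3, replace=False)  # 2) random feature subset
    tree = DecisionTreeClassifier().fit(X[rows][:, feats], y[rows])
    trees.append(tree)
    feature_sets.append(feats)

# Prediction = majority vote over the trees (labels here are 0/1)
votes = np.array([t.predict(X[:, f]) for t, f in zip(trees, feature_sets)])
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("toy forest accuracy on its own training data:", (majority == y).mean())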
###################################################
# Imports implied by the aliases used below
import matplotlib.pyplot as plt
import sklearn.model_selection as sktts
import sklearn.metrics as skmeter
from sklearn.ensemble import RandomForestClassifier as rf

# Predictors and outcome, then a 70/30 train/test split
amirpredictor = amirdf[["sex", "FamilySize", "FamilyIncome", "EdYears"]]
amiroutcome = amirdf[["be"]]
X_Train, X_Test, Y_Train, Y_Test = sktts.train_test_split(
    amirpredictor, amiroutcome, test_size=0.3)

# n_estimators -- the number of trees in the forest.
# Each tree is trained on only a share of the data (its bootstrap sample); the
# remaining rows are that tree's out-of-bag (OOB) samples. They can be used
# during training to compute a test-like accuracy. With oob_score=True, the
# oob_score_ and (for classifiers) oob_decision_function_ attributes are computed.
RFmodel = rf(n_jobs=-1, max_depth=5, n_estimators=1000, oob_score=True)
RFmodel.fit(X_Train, Y_Train.values.ravel())  # ravel: fit expects a 1-D label array
print(RFmodel.oob_score_)  # OOB accuracy estimate

ModelPredictions = RFmodel.predict(X_Test)
CM = skmeter.confusion_matrix(Y_Test, ModelPredictions)
disp = skmeter.ConfusionMatrixDisplay(confusion_matrix=CM,
                                      display_labels=RFmodel.classes_)
disp.plot()
plt.show()
print(CM)

Amiraccuracy = skmeter.accuracy_score(Y_Test, ModelPredictions)
print("Amiraccuracy: " + str(Amiraccuracy))
#########################################################################
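Besides the single oob_score_ number, the per-row OOB predictions are available through oob_decision_function_ (class probabilities averaged over the trees that did not train on that row). A short sketch reusing the names above:

import numpy as np

oob_proba = RFmodel.oob_decision_function_             # shape: (len(X_Train), n_classes)
oob_pred = RFmodel.classes_[oob_proba.argmax(axis=1)]  # per-row OOB class prediction
# Should agree with RFmodel.oob_score_ (with 1000 trees, every row is OOB for some tree)
print((oob_pred == Y_Train["be"].values).mean())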
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bagging_classifier = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # model to bag (named base_estimator before scikit-learn 1.2)
    n_estimators=500,                    # number of models to train
    max_samples=100,                     # rows used to train each model (the instructor suggests the full dataset size)
)
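A minimal usage sketch, assuming the X_Train/Y_Train split from the Random Forest code above:

bagging_classifier.fit(X_Train, Y_Train.values.ravel())  # ravel: fit expects 1-D labels
print("Bagging test accuracy:", bagging_classifier.score(X_Test, Y_Test))

Note the connection: BaggingClassifier also accepts max_features to subsample features once per estimator; combined with tree base estimators, that approximates (though is not identical to) Random Forest, which re-samples features at every split.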
https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/