Machine Learning: Decision Tree Classifier

 

Hyperparameter tuning

Hyperparameter tuning is the process of searching the hyperparameter space for the set of values that gives your model the best performance, typically measured on data the model was not trained on.
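As a concrete illustration, here is a minimal sketch of an exhaustive grid search over a few decision tree hyperparameters using scikit-learn's GridSearchCV. The Iris dataset, the split, and the candidate values in param_grid are assumptions chosen only to keep the example small; they are not recommended ranges.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small bundled dataset, used purely for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical search space: illustrative values, not recommendations.
param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_leaf": [1, 2, 5, 10],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,  # 5-fold cross-validation on the training split only
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# refit=True by default, so score() uses the best estimator refit on all training data.
test_accuracy = search.score(X_test, y_test)
print(best_params)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Grid search is exhaustive, so its cost grows with the product of the grid sizes (here 5 × 4 × 2 = 40 candidates, each fit 5 times). RandomizedSearchCV is the usual alternative when the grid gets large.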


Decision Tree

The decision tree is one of the most popular and widely used machine learning algorithms because of its robustness to noise, tolerance of missing information, ability to handle irrelevant and redundant predictive attributes, low computational cost, fast run time and interpretability. I know, that’s a lot 😂.

But a common question students ask me is how to tune a decision tree. What range of values should I try for the maximum depth? What should the minimum number of samples required at a leaf node be? These are very good questions that don’t have a straightforward answer, but what we can do is understand how changing each one affects the model. What does increasing the maximum depth really mean? What does changing the minimum number of samples per leaf do to your model? So, in this article, I introduce these parameters, explain how they change the tree that gets built, and discuss what that can mean for your model in general.
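To see what those two questions mean in practice, here is a minimal sketch that fits scikit-learn trees with a few values of max_depth and min_samples_leaf and reports each tree's depth, leaf count and accuracy. The breast-cancer dataset, the split, the helper name describe and the candidate values are all assumptions for illustration; the exact numbers will differ on your data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

def describe(**params):
    """Hypothetical helper: fit one tree with the given hyperparameters
    and report its size and its train/test accuracy."""
    tree = DecisionTreeClassifier(random_state=42, **params).fit(X_train, y_train)
    return {
        "depth": tree.get_depth(),
        "leaves": tree.get_n_leaves(),
        "train_acc": tree.score(X_train, y_train),
        "test_acc": tree.score(X_test, y_test),
    }

# A deeper limit lets the tree carve the training data more finely (risking overfitting).
depth_results = {d: describe(max_depth=d) for d in [1, 2, 4, 8, None]}
# A larger minimum leaf size forces every prediction to rest on more samples.
leaf_results = {m: describe(min_samples_leaf=m) for m in [1, 5, 20, 50]}

for d, r in depth_results.items():
    print(f"max_depth={d}: {r}")
for m, r in leaf_results.items():
    print(f"min_samples_leaf={m}: {r}")
```

Typically, as max_depth grows the training accuracy climbs toward 1.0 while the test accuracy levels off or drops, and as min_samples_leaf grows the tree gets smaller, since every leaf must hold at least that many training samples.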



DecisionTreeClassifier

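With the min_impurity_decrease parameter, a node is split only if the split decreases the weighted impurity by at least that value, where the weighted impurity decrease is:

$$\frac{N_t}{N}\left(\text{impurity} - \frac{N_{t_R}}{N_t}\,\text{right\_impurity} - \frac{N_{t_L}}{N_t}\,\text{left\_impurity}\right)$$

Here $N$ is the total number of samples, $N_t$ is the number of samples at the current node, $N_{t_L}$ is the number of samples in the left child, and $N_{t_R}$ is the number of samples in the right child.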
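To make the weighted impurity decrease formula above concrete, here is a minimal worked sketch in plain Python. It assumes Gini impurity (scikit-learn's default criterion) and unweighted samples; the helper names, the toy node and the threshold of 0.01 are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity_decrease(parent, left, right, n_total):
    """N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)."""
    n_t, n_l, n_r = len(parent), len(left), len(right)
    if n_l + n_r != n_t:
        raise ValueError("left and right children must partition the parent node")
    return (n_t / n_total) * (
        gini(parent)
        - (n_r / n_t) * gini(right)
        - (n_l / n_t) * gini(left)
    )

# Hypothetical node: 10 of the N = 100 training samples (6 "a", 4 "b").
parent = ["a"] * 6 + ["b"] * 4
left = ["a"] * 5               # pure child, Gini = 0
right = ["a"] + ["b"] * 4      # Gini = 1 - (0.2^2 + 0.8^2) = 0.32

decrease = weighted_impurity_decrease(parent, left, right, n_total=100)
min_impurity_decrease = 0.01   # hypothetical setting
print(f"{decrease:.3f}", decrease >= min_impurity_decrease)  # 0.032 True
```

Working it by hand: the parent's Gini is 1 - (0.6² + 0.4²) = 0.48, so the decrease is 0.1 × (0.48 - 0.5 × 0.32 - 0.5 × 0) = 0.032. The leading factor N_t / N means a split deep in the tree, where few samples remain, has to purify its node much more to pass the same threshold.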
$$\text{weight} \times \frac{\text{number of samples from the class in the node}}{\text{size of the class}}$$
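The class-normalized expression weight * (the number of samples from a class in the node) / (size of class) can be evaluated directly. The minimal sketch below does that for a hypothetical imbalanced dataset; the helper name class_scores, the labels, the counts and the default weight of 1.0 are all assumptions for illustration.

```python
from collections import Counter

def class_scores(node_labels, all_labels, weights=None):
    """Hypothetical helper: weight * (samples of class k in node) / (size of class k)
    for every class k seen in the full dataset. Missing weights default to 1.0."""
    class_sizes = Counter(all_labels)
    node_counts = Counter(node_labels)
    weights = weights or {}
    return {
        k: weights.get(k, 1.0) * node_counts.get(k, 0) / size
        for k, size in class_sizes.items()
    }

# Hypothetical imbalanced data: 90 samples of class 0, 10 of class 1.
all_labels = [0] * 90 + [1] * 10
# A node holding 9 samples of class 0 and 5 of class 1.
node_labels = [0] * 9 + [1] * 5

scores = class_scores(node_labels, all_labels)
print(scores)  # {0: 0.1, 1: 0.5}
```

By raw counts class 0 dominates this node (9 vs 5), but after dividing by class size class 1 does: half of all class-1 samples landed here versus 10% of class 0. Up to a constant factor, this is the same rescaling scikit-learn applies with class_weight="balanced", whose per-class weights are n_samples / (n_classes * np.bincount(y)).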

Summary

I hope you now have a better idea of these parameters and how they might interact with each other when you tune a decision tree. If something is not clear, please let me know in the comments and I would be more than happy to explain further.
