Decision Tree Hyperparameters Explained

A Decision Tree is a popular supervised learning algorithm that is often used for classification tasks. A Decision Tree is structured like a flowchart in which each question helps to separate the data further.

Some advantages and disadvantages of a Decision Tree compared to other supervised learning algorithms are:

Advantages

  • A Decision Tree does not require scaling of data
  • Compared to other models, a Decision Tree requires less preparation of data (missing values in the data do not prevent the Decision Tree from making decisions)
  • Exploratory Data Analysis → Decision Trees can identify feature importance
  • Easy to interpret and explain
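To illustrate the point about feature importance, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the iris dataset is used purely as an example:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# feature_importances_ sums to 1.0; larger values mean the feature
# contributed more to the tree's splits.
for name, importance in zip(data.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```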

Disadvantages

  • Prone to overfitting
  • A lot of feature engineering may be required in order to optimize a Decision Tree model
  • A single Decision Tree is usually a weak learner. Therefore, a Random Forest (made up of many Decision Trees) is often a better predictor

In order to ensure that a Decision Tree is as accurate as possible, one must carefully tune its hyperparameters. Below, I will go through some of the various hyperparameters for a Decision Tree and explain how each one affects the model:

Criterion:

  • How to measure the quality of a split in a decision tree. You can input “gini” for Gini Impurity or “entropy” for information gain.
  • Default = “gini”
  • Input Options → {“gini”, “entropy”}
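A quick sketch of how the two criteria are passed in (assuming scikit-learn's DecisionTreeClassifier; the iris dataset and variable names are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fit one tree per criterion and record its training accuracy.
scores = {}
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    tree.fit(X, y)
    scores[criterion] = tree.score(X, y)

print(scores)
```

In practice the two criteria often produce very similar trees; "gini" is slightly cheaper to compute, while "entropy" can behave differently on some datasets.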

Max_Depth:

  • The maximum depth of the tree. If this is not specified, nodes are expanded until all leaf nodes are pure or until all leaf nodes contain fewer than min_samples_split samples.
  • Default = None
  • Input options → integer
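As a small sketch (again assuming scikit-learn), capping max_depth directly limits how deep the fitted tree can grow, which is the usual first defense against overfitting:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth=2 caps the tree at two levels of splits;
# max_depth=None (the default) grows the tree until leaves are pure.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
deep = DecisionTreeClassifier(random_state=0).fit(X, y)

print(shallow.get_depth(), deep.get_depth())
```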

Min_Samples_Split:

  • The minimum number of samples required to split an internal node. If the number of samples at an internal node is less than min_samples_split, that node becomes a leaf node.
  • Default = 2
  • Input options → integer or float (if float, then min_samples_split is a fraction of the training samples)
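A sketch of the integer-versus-float distinction (scikit-learn interprets a float as a fraction of the training set, so on 150 samples the two settings below should be equivalent; the dataset is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # 150 samples

# An integer is an absolute sample count; a float is a fraction of the
# training set, so 0.2 here corresponds to ceil(0.2 * 150) = 30 samples.
by_count = DecisionTreeClassifier(min_samples_split=30, random_state=0).fit(X, y)
by_frac = DecisionTreeClassifier(min_samples_split=0.2, random_state=0).fit(X, y)

print(by_count.get_n_leaves(), by_frac.get_n_leaves())
```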

Min_Samples_Leaf:

  • The minimum number of samples required to be at a leaf node. Therefore, a split can only happen if it leaves at least min_samples_leaf samples in each of the two resulting nodes.
  • Default = 1
  • Input options → integer or float (if float, then min_samples_leaf is a fraction of the training samples)
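To see the guarantee in action, the sketch below (assuming scikit-learn; the attribute access on the fitted tree_ object is the standard way to inspect node sample counts) checks that no leaf ends up with fewer samples than requested:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(min_samples_leaf=10, random_state=0).fit(X, y)

# Leaves are the nodes with no children (children_left == -1);
# each should hold at least 10 training samples.
t = tree.tree_
leaf_sizes = t.n_node_samples[t.children_left == -1]
print(leaf_sizes.min())
```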

Max_Features:

  • The number of features to consider when looking for the best split. For example, if there are 35 features in a dataframe and max_features is 9, only 9 randomly chosen features are examined at each individual split; across the whole tree, more than 9 distinct features may still end up being used.
  • Default = None
  • Input options → integer, float (if float, then max_features is fraction) or {“auto”, “sqrt”, “log2”}
  • “auto”: max_features=sqrt(n_features)
  • “sqrt”: max_features = sqrt(n_features)
  • “log2”: max_features=log2(n_features)
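A sketch of the per-split behavior (assuming scikit-learn; the synthetic dataset and its parameters are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# A toy dataset with 35 features; only 9 are examined at each split.
X, y = make_classification(n_samples=300, n_features=35, n_informative=10,
                           random_state=0)

tree = DecisionTreeClassifier(max_features=9, random_state=0).fit(X, y)

# tree_.feature holds the feature index used at each internal node
# (leaves are marked with a negative sentinel value).
used = set(tree.tree_.feature[tree.tree_.feature >= 0])
print(len(used))
```

Note that `tree.max_features_` records the resolved per-split limit (9 here), while the number of distinct features actually used across the tree can differ.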

When building a Decision Tree, tuning hyperparameters is a crucial step in building the most accurate model. It is not usually necessary to tune every hyperparameter, but adjusting certain ones can meaningfully improve your overall model. The hyperparameters highlighted above are some of the more common ones tuned when building a Decision Tree. I hope you have a better understanding of how some of the Decision Tree hyperparameters work. Please feel free to reach out with any questions.
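As a closing sketch, the tuning described above is often automated with scikit-learn's GridSearchCV; the grid below is illustrative, not a recommendation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Try every combination of these values with 5-fold cross-validation.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 4, None],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```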