1. What is a random forest? Explain its working.
Answer» Classification is one of the most important tasks in machine learning: we want to know which class an observation belongs to. Machine learning therefore offers several classification algorithms, such as logistic regression, support vector machines, decision trees, and the Naive Bayes classifier. One classification technique that sits near the top of this hierarchy is the random forest classifier. To understand the random forest classifier and how it works, we first need to understand a decision tree.

Suppose we have a string with 5 ones and 4 zeroes, and we want to classify the characters of this string using their features. These features are colour (red or green in this case) and whether the observation (i.e. the character) is underlined or not. Now suppose we are only interested in the red, underlined observations. The decision tree starts with colour, since we only care about the red observations, and separates the red characters from the green ones. The "No" branch, which contains all the green characters, is not expanded further because we want only red, underlined characters. The "Yes" branch is then expanded into another "Yes" and "No" branch based on whether the characters are underlined or not. This is how a typical decision tree is drawn. Data in real life is rarely this clean, but the example gives an idea of how decision trees work (a small code sketch of this toy tree is given at the end of the answer).

Random Forest
A random forest consists of a large number of decision trees that operate as an ensemble. Each tree in the forest gives a class prediction, and the class with the most votes becomes the prediction of the model. For instance, if 4 decision trees predict 1 and 2 predict 0, the forest's prediction is 1. The underlying principle of a random forest is that several weak learners combine to form a strong learner.

The steps to build a random forest are as follows:
1. Draw a bootstrap sample (a random sample taken with replacement) from the training data for each tree.
2. Grow a decision tree on each bootstrap sample, considering only a random subset of the features at every split.
3. Let every tree in the forest predict the class of a new observation.
4. Take the majority vote of the trees' predictions as the final prediction of the forest.
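As a rough illustration of the toy tree described above, here is a minimal sketch assuming scikit-learn is available. The feature encoding and the class labels are illustrative stand-ins for the figure in the original question, not values taken from it.

```python
# Minimal sketch of the toy "red and underlined" decision tree (illustrative data).
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row encodes one character of the toy string: [is_red, is_underlined].
# The values are made up to mirror the description (5 red, 4 green characters).
X = [
    [1, 1], [1, 1], [1, 1], [1, 0], [1, 0],   # red characters
    [0, 1], [0, 0], [0, 0], [0, 0],           # green characters
]
# Label 1 = "red and underlined", the class we are interested in
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Print the learned splits as text; the tree isolates the red, underlined characters
print(export_text(clf, feature_names=["is_red", "is_underlined"]))
print(clf.predict([[1, 1], [0, 1]]))  # -> [1 0]: red+underlined vs. green+underlined
```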
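And here is a from-scratch sketch of the steps listed above, assuming NumPy and scikit-learn. The iris dataset, the number of trees, and the parameter choices are illustrative, not part of the original answer.

```python
# Rough from-scratch sketch of the random forest steps listed above,
# using scikit-learn's DecisionTreeClassifier as the base learner.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X, y = load_iris(return_X_y=True)

n_trees = 25
forest = []
for _ in range(n_trees):
    # Step 1: draw a bootstrap sample (rows sampled with replacement)
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: grow a tree on the sample; max_features="sqrt" makes each split
    # consider only a random subset of the features
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    tree.fit(X[idx], y[idx])
    forest.append(tree)

# Step 3: every tree predicts a class for each observation
all_preds = np.stack([tree.predict(X) for tree in forest])  # shape: (n_trees, n_samples)

# Step 4: the class with the most votes is the forest's prediction
majority = np.apply_along_axis(lambda col: np.bincount(col.astype(int)).argmax(),
                               0, all_preds)
print("ensemble accuracy on the training data:", (majority == y).mean())
```

In practice one would normally reach for sklearn.ensemble.RandomForestClassifier, which packages bootstrap sampling and per-split feature subsampling behind a single estimator.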