InterviewSolution

1. What is overfitting?
Answer» Overfitting is a modeling error in which a model is fitted too tightly to the data. It occurs when the modeling function matches a limited data set so closely that the model becomes overly complex, capturing the peculiarities of the data under consideration rather than the underlying pattern. Overfitting reduces a model's predictive power and harms its ability to generalize: such models generally fail when applied to outside data, i.e. data that was not part of the sample. Several methodologies exist to avoid overfitting; the most widely used, cross-validation, is described below.
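As a minimal sketch of the idea above (the data, degrees, and noise level are illustrative assumptions, not from the original), a high-degree polynomial fitted to a small noisy sample can drive its training error to nearly zero while its error on fresh data stays high:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy linear signal (slope 2). A model that chases the
# noise will look excellent on the training points and poor on new ones.
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.3, size=x_train.shape)
x_test = np.linspace(0, 1, 50)
y_test = 2 * x_test + rng.normal(0, 0.3, size=x_test.shape)

def mse(model, x, y):
    """Mean squared error of a polynomial fit on the given data."""
    return float(np.mean((np.polyval(model, x) - y) ** 2))

# Degree-1 fit captures the signal; degree-9 fit (10 coefficients for
# 10 points) also memorizes the noise.
simple = np.polyfit(x_train, y_train, deg=1)
complex_ = np.polyfit(x_train, y_train, deg=9)

# The overfitted model wins on the training set but loses its edge on
# unseen data -- the gap between the two errors is the telltale sign.
print("train MSE:", mse(simple, x_train, y_train), mse(complex_, x_train, y_train))
print("test MSE: ", mse(simple, x_test, y_test), mse(complex_, x_test, y_test))
```

The exact numbers depend on the random seed, but the pattern is stable: the complex model's training error is (near) zero while its test error is not.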
Overfitting is a common problem in data science and machine learning. An overfitted model learns the noise along with the signal, so it proves a poor fit when applied to new data sets. A model should be considered overfitted when it performs well on the training set but poorly on the test set. The most widely used prevention technique is cross-validation, one of the most powerful safeguards against overfitting. Here, the training data is used to generate multiple small train/test splits, and these splits are used to tune the model. In the k-fold cross-validation method, the data is partitioned into k subsets, called folds. The model is then trained on k-1 folds while the remaining fold, called the holdout fold, serves as the test set; this is repeated so that each fold acts as the holdout exactly once. This method lets us keep the final test set as an unseen dataset and select the final model with more confidence.
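The k-fold procedure described above can be sketched in a few lines. This is a hand-rolled illustration under stated assumptions (the data, the candidate polynomial degrees, and the helper names `k_fold_cv` and `mse` are all hypothetical, chosen for the example); libraries such as scikit-learn provide production-ready equivalents:

```python
import numpy as np

def k_fold_cv(x, y, k, fit, score):
    """Partition the data into k folds; each fold serves once as the
    holdout test set while the model trains on the other k-1 folds.
    Returns the average holdout score across the k rounds."""
    indices = np.arange(len(x))
    np.random.default_rng(42).shuffle(indices)
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]                                   # holdout fold
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(x[train_idx], y[train_idx])
        scores.append(score(model, x[test_idx], y[test_idx]))
    return float(np.mean(scores))

def mse(model, x, y):
    return float(np.mean((np.polyval(model, x) - y) ** 2))

# Example use: compare two model complexities by cross-validated error.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = 2 * x + rng.normal(0, 0.3, size=x.shape)

cv_errors = {}
for degree in (1, 9):
    cv_errors[degree] = k_fold_cv(
        x, y, k=5,
        fit=lambda xs, ys, d=degree: np.polyfit(xs, ys, d),
        score=mse,
    )
    print(f"degree {degree}: CV MSE = {cv_errors[degree]:.3f}")
```

Because every point is held out exactly once, the averaged score estimates how each candidate model would behave on unseen data, which is what makes cross-validation useful for model selection.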