1.

Explain overfitting in big data. How can it be avoided?

Answer»

Overfitting is a modeling error in which a model is fitted too tightly to a limited data set, so that it captures the noise in that sample along with the underlying pattern. Such a model scores well on the data it was trained on but loses predictive power: it fails to generalize when applied outside the sample data.
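To make the gap between sample and out-of-sample performance concrete, here is a minimal sketch in plain Python. The synthetic data (y = 2x plus noise), the `memorizer` model, and all function names are assumptions made for illustration; they are not part of the original answer.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Synthetic data (an assumption for illustration): y = 2x + Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(30)
test_x, test_y = make_data(200)

# Tightly fitted model: memorize the training set and answer with the
# target of the nearest training point (a 1-nearest-neighbour lookup).
def memorizer(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Simpler model: an ordinary least-squares line.
mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x

def line(x):
    return intercept + slope * x

memo_train, memo_test = mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y)
line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
print(f"memorizer  train MSE={memo_train:.3f}  test MSE={memo_test:.3f}")
print(f"line       train MSE={line_train:.3f}  test MSE={line_test:.3f}")
```

The memorizer reproduces every training point, so its training error is zero by construction, yet its error on fresh points is not. A large difference between training and test error is the usual symptom of overfitting.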

There are several methods to avoid overfitting; some of them are:

  • Cross-validation: the data is divided into several folds; the model is trained on all but one fold and evaluated on the held-out fold, rotating until every fold has served as the test set. The averaged results can be used to tune the model.
  • Early stopping: after a certain number of training iterations, the model's ability to generalize starts to weaken. Early stopping halts training before the model crosses that point.
  • Regularization: a penalty is applied to all the parameters except the intercept, so that the model generalizes from the data instead of overfitting it.
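The three methods above can be sketched together in plain Python. This is an illustrative sketch under stated assumptions, not a reference implementation: the synthetic data, the one-feature ridge model, and the names `fit_ridge`, `k_folds`, `cv_error`, and `fit_with_early_stopping` are all hypothetical. In practice a library such as scikit-learn would normally handle this.

```python
import random

rng = random.Random(1)
# Synthetic data (an assumption for illustration): y = 3x + 1 + noise.
xs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]

def fit_ridge(xs, ys, lam):
    """Regularization: lam penalizes the slope but not the intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx

def mse(slope, intercept, xs, ys):
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_folds(n, k):
    """Split indices 0..n-1 into k disjoint validation folds."""
    return [list(range(i, n, k)) for i in range(k)]

def cv_error(xs, ys, lam, k=5):
    """Cross-validation: average validation MSE, each fold held out once."""
    errors = []
    for fold in k_folds(len(xs), k):
        held_out = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_ridge(tr_x, tr_y, lam)
        errors.append(mse(slope, intercept,
                          [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(errors) / k

# Tuning: pick the penalty strength with the lowest cross-validated error.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = min(candidates, key=lambda lam: cv_error(xs, ys, lam))
print("chosen penalty:", best_lam)

def fit_with_early_stopping(tr_x, tr_y, va_x, va_y,
                            lr=0.1, max_iters=1000, patience=10):
    """Early stopping: gradient descent that quits once validation MSE
    has not improved for `patience` consecutive iterations."""
    slope = intercept = 0.0
    best = (float("inf"), slope, intercept, 0)  # (val MSE, slope, intercept, iteration)
    n = len(tr_x)
    for it in range(1, max_iters + 1):
        residuals = [intercept + slope * x - y for x, y in zip(tr_x, tr_y)]
        slope -= lr * 2.0 / n * sum(r * x for r, x in zip(residuals, tr_x))
        intercept -= lr * 2.0 / n * sum(residuals)
        val = mse(slope, intercept, va_x, va_y)
        if val < best[0]:
            best = (val, slope, intercept, it)
        elif it - best[3] >= patience:
            break
    return best

# Hold out the last 10 points as a validation set for early stopping.
tr_x, tr_y, va_x, va_y = xs[:30], ys[:30], xs[30:], ys[30:]
val_best, es_slope, es_intercept, stop_it = fit_with_early_stopping(tr_x, tr_y, va_x, va_y)
print(f"best validation MSE={val_best:.3f} at iteration {stop_it}")
```

On a model this simple the validation error mostly just flattens out, so early stopping mainly saves iterations here; the pattern matters more for flexible models whose validation error starts to rise again as training continues.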

