InterviewSolution
| 1. |
You are asked to evaluate a regression model using R square, Adjusted R square, and Tolerance. Explain what the criteria would be. |
|
Answer» In a regression problem, we expect the fitted formula to explain as much of the observed data as possible; for linear regression, this means most data points should lie close to the fitted line. R square is also known as the “goodness of fit”. It measures the proportion of the variation in the target (predicted) variable that is explained by the input variables. If R square is 0.75, then 75% of the variation in the target variable is explained by the input variables. So, other things being equal, the higher the R square value, the better the model explains the variation in the target.
The problem arises when we add more input variables. For a least-squares fit, R square never decreases when an input variable is added, even if that variable has no real influence on the target, so a higher R square value can be misleading in this case. This is where the Adjusted R square is used. The Adjusted R square is a modified version of R square that penalizes the number of input variables: it rises only when a new variable improves the model by more than would be expected by chance, and falls otherwise. So if we add input variables that do not influence the target variable, the gap between R square and Adjusted R square will widen. With only one input variable the two values are close but not identical, because the adjustment still applies (Adjusted R square is slightly lower unless the fit is perfect). With multiple input variables, it is advisable to use the Adjusted R square value as the measure of goodness of fit.
Tolerance is defined as 1/VIF, where VIF stands for Variance Inflation Factor. As the name suggests, VIF indicates how much the variance of an estimated coefficient is inflated because that input variable is correlated with the other input variables. It is used to detect multicollinearity among the input variables. Based on the VIF values, we can decide which variables to keep or remove without compromising the Adjusted R square value.
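Hence, 1/VIF or Tolerance can be used to decide which input variables to keep in the model for better performance. A low Tolerance (high VIF) means the variable is largely explained by the other inputs; a common rule of thumb treats a Tolerance below 0.1 (VIF above 10) as a sign of serious multicollinearity. |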
|
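As a rough illustration of the three measures discussed in the answer, the sketch below computes R square, Adjusted R square, Tolerance, and VIF on synthetic data using NumPy. Everything in it is an assumption made for the example: the data, the variable names (`x1`, `x2`, `x3`), and the helper functions are hypothetical, and the fit is an ordinary least-squares regression with an intercept. Adjusted R square is taken as 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p input variables, and Tolerance for an input is computed as 1 - R² from regressing that input on the remaining inputs, which is the same as 1/VIF.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the fitted values."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """R square penalized for the number of input variables p."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def tolerance(X):
    """Tolerance of each column: 1 - R square of that column on the others."""
    X = np.asarray(X, dtype=float)
    tol = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        tol.append(1.0 - r_squared(X[:, j], fit_ols(others, X[:, j])))
    return np.array(tol)

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1, so collinear with it
x3 = rng.normal(size=n)             # unrelated to both x1 and y
y = 3.0 * x1 + rng.normal(size=n)

X = np.column_stack([x1, x2, x3])
r2 = r_squared(y, fit_ols(X, y))
adj_r2 = adjusted_r_squared(r2, n, X.shape[1])
tol = tolerance(X)
vif = 1.0 / tol

print(f"R square:          {r2:.4f}")
print(f"Adjusted R square: {adj_r2:.4f}")
for name, t, v in zip(["x1", "x2", "x3"], tol, vif):
    print(f"{name}: Tolerance = {t:.4f}, VIF = {v:.2f}")
```

In this setup `x1` and `x2` should show a low Tolerance (high VIF) because each is almost fully explained by the other, while `x3` should stay near a Tolerance of 1. Dropping one of the collinear pair and re-checking Adjusted R square is one way to apply the criteria described in the answer.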