Interview Solution
1. We have trained our model on the given dataset and noticed that the regression model suffers from multicollinearity. Is it possible to improve the model without losing any information?
Answer: To check for multicollinearity, we can create a correlation matrix and identify and remove variables whose pairwise correlation exceeds a threshold such as 75% (choosing the threshold is subjective). In addition, we can calculate the VIF (variance inflation factor) for each predictor: a VIF of 4 or less suggests no serious multicollinearity, whereas a value of 10 or more implies serious multicollinearity. Tolerance, the reciprocal of VIF, can also be used as an indicator; a small tolerance points to multicollinearity.

However, removing correlated variables may lead to a loss of information. To retain those variables, we can use penalized regression models such as ridge or lasso regression. Alternatively, we can add some random noise to one of the correlated variables so that the variables become less similar to each other. Since adding noise can reduce prediction accuracy, this approach should be used carefully and with some balancing effect.
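The sketch below illustrates these steps on synthetic data, assuming pandas, statsmodels, and scikit-learn are available; the variable names, the 0.75 correlation threshold, and the ridge penalty `alpha=1.0` are illustrative choices, not part of the original answer.

```python
# Minimal sketch: detect multicollinearity (correlation matrix, VIF, tolerance)
# and retain correlated variables with a penalized (ridge) regression.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.linear_model import Ridge

# Synthetic data with two highly correlated predictors (for illustration only)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
y = 3 * x1 + 2 * x3 + rng.normal(size=200)

# 1. Correlation matrix: flag pairs above the (subjective) 0.75 threshold
corr = X.corr().abs()
high_corr_pairs = [
    (a, b) for a in corr.columns for b in corr.columns
    if a < b and corr.loc[a, b] > 0.75
]
print("Highly correlated pairs:", high_corr_pairs)

# 2. VIF and tolerance: VIF >= 10 (tolerance <= 0.1) indicates serious multicollinearity
X_const = sm.add_constant(X)  # VIF is computed on a design matrix with an intercept
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
    index=X.columns,
)
print("VIF:\n", vif)
print("Tolerance:\n", 1.0 / vif)

# 3. Ridge regression: keeps all variables but shrinks correlated coefficients
ridge = Ridge(alpha=1.0).fit(X, y)
print("Ridge coefficients:", dict(zip(X.columns, ridge.coef_)))
```

In this sketch, `x1` and `x2` show high correlation and inflated VIF values, while the ridge penalty spreads the shared signal across both coefficients instead of requiring one variable to be dropped.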