Interview Solutions

This section includes interview questions with curated answers on predictive modeling to sharpen your knowledge and support interview preparation.

1. Explain Collinearity Between Continuous And Categorical Variables. Is VIF A Correct Method To Compute Collinearity In This Case?

Answer» Collinearity between categorical and continuous variables is very common. The choice of reference category for dummy variables affects multicollinearity, which means changing the reference category of the dummy variables can avoid collinearity; pick the reference category with the highest proportion of cases. VIF is not a correct method in this case: VIFs should only be run for continuous variables. A t-test can be used instead to check collinearity between a continuous variable and a dummy variable, as sketched below.

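A minimal sketch of the t-test approach on synthetic data (the names income and is_urban are hypothetical, not from the original answer): if the mean of the continuous predictor differs significantly between the two levels of the dummy, the two predictors are associated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
is_urban = rng.integers(0, 2, size=200)                # 0/1 dummy variable
income = 50 + 10 * is_urban + rng.normal(0, 5, 200)    # continuous, built to correlate with the dummy

# Split the continuous variable by dummy level and run Welch's t-test.
t_stat, p_value = stats.ttest_ind(income[is_urban == 0],
                                  income[is_urban == 1],
                                  equal_var=False)

# A small p-value suggests the continuous variable's mean differs by dummy
# level, i.e. the two predictors are collinear.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```
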
2. Explain Important Model Performance Statistics?

Answer» For classification models, commonly reported statistics include the area under the ROC curve (AUC), the KS statistic, the Gini coefficient, and confusion-matrix metrics such as accuracy, precision, and recall. For regression models, common statistics are R-squared, adjusted R-squared, RMSE (root mean squared error), and MAPE (mean absolute percentage error).

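A minimal sketch computing two of these statistics with scikit-learn (the arrays are synthetic placeholders):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, mean_squared_error

# Classification: AUC from true labels and predicted probabilities.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_score = np.array([0.2, 0.4, 0.8, 0.7, 0.9, 0.3])
print("AUC:", roc_auc_score(y_true, y_score))

# Regression: RMSE from observed and fitted values.
y_obs = np.array([10.0, 12.5, 9.0, 14.0])
y_fit = np.array([9.5, 12.0, 10.0, 13.5])
print("RMSE:", np.sqrt(mean_squared_error(y_obs, y_fit)))
```
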
3. What Is P-value And How It Is Used For Variable Selection?

Answer» The p-value is the lowest level of significance at which you can reject the null hypothesis. In the case of independent variables, it indicates whether the coefficient of a variable is significantly different from zero. For variable selection, variables whose coefficients have p-values above the chosen significance level (commonly 0.05) are candidates for removal from the model.

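A minimal sketch with statsmodels on synthetic data, where the noise predictor x2 is a hypothetical example: its coefficient's p-value should be large, flagging it for removal.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)            # pure noise, unrelated to y
y = 3 + 2 * x1 + rng.normal(size=n)

# Fit OLS with an intercept and inspect coefficient p-values.
X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.pvalues)               # drop terms with p-value > 0.05
```
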
4. Do We Remove Intercepts While Calculating Vif?

Answer» No. VIF depends on the intercept because there is an intercept in the auxiliary regression used to determine VIF. If the intercept is removed, R-squared is not meaningful: it may be negative, in which case one can get VIF < 1, implying that the standard error of a variable would go up if that independent variable were uncorrelated with the other predictors, which makes no sense.

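A minimal sketch of the standard practice in statsmodels, whose variance_inflation_factor helper does not add a constant itself, so the intercept column should be present in the design matrix passed to it (the data here are synthetic):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.normal(size=100)})
df["x2"] = 0.8 * df["x1"] + rng.normal(scale=0.5, size=100)

# Keep the intercept in the auxiliary regressions by adding a constant column.
X = sm.add_constant(df)
for i, col in enumerate(X.columns):
    if col != "const":
        print(col, variance_inflation_factor(X.values, i))
```
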
5. How Vif Is Calculated And Interpretation Of It?

Answer» VIF measures how much the variance (the square of the estimate's standard deviation) of an estimated regression coefficient is increased because of collinearity. If the VIF of a predictor variable were 9 (√9 = 3), this means the standard error for the coefficient of that predictor variable is 3 times as large as it would be if that predictor variable were uncorrelated with the other predictor variables.

Steps of calculating VIF:
1. Regress the k-th predictor on all the other predictors in the model.
2. Take the R-squared of that auxiliary regression, R²_k.
3. Compute VIF_k = 1 / (1 - R²_k).

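A minimal sketch of these steps computed by hand with scikit-learn on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
X[:, 2] = 0.9 * X[:, 0] + rng.normal(scale=0.3, size=200)  # induce collinearity

for k in range(X.shape[1]):
    others = np.delete(X, k, axis=1)                        # step 1: all other predictors
    r2 = LinearRegression().fit(others, X[:, k]).score(others, X[:, k])  # step 2: R²
    print(f"VIF for x{k}: {1.0 / (1.0 - r2):.2f}")          # step 3: 1 / (1 - R²)
```
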
6. What Is Multicollinearity And How To Deal It?

Answer» Multicollinearity implies high correlation between independent variables, and its absence is one of the assumptions in linear and logistic regression. It can be identified by looking at the VIF scores of the variables: VIF > 2.5 implies a moderate collinearity issue, and VIF > 5 is considered high collinearity. It can be handled by an iterative process (see the sketch below): first remove the variable with the highest VIF, then check the VIFs of the remaining variables; if any remaining VIF is still above 2.5, repeat the first step until all VIFs are <= 2.5.

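A minimal sketch of that iterative elimination, assuming a pandas DataFrame of continuous predictors and the 2.5 threshold from the answer:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def drop_high_vif(df: pd.DataFrame, threshold: float = 2.5) -> pd.DataFrame:
    """Repeatedly drop the predictor with the highest VIF until all VIFs <= threshold."""
    cols = list(df.columns)
    while True:
        X = sm.add_constant(df[cols])       # keep the intercept (see question 4)
        vifs = {c: variance_inflation_factor(X.values, i)
                for i, c in enumerate(X.columns) if c != "const"}
        worst, worst_vif = max(vifs.items(), key=lambda kv: kv[1])
        if worst_vif <= threshold:
            return df[cols]                 # all remaining VIFs are acceptable
        cols.remove(worst)                  # drop the worst offender and re-check
```
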
7. Explain Dimensionality / Variable Reduction Techniques?

Answer» The techniques fall into two families (a PCA sketch follows below):

Unsupervised methods (no dependent variable), for example principal component analysis (PCA) and factor analysis.

Supervised methods (with respect to the dependent variable):
- For a binary / categorical dependent variable, for example information value / weight of evidence.
- For a continuous dependent variable, for example correlation with the target and stepwise selection.

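A minimal sketch of one unsupervised technique, PCA, with scikit-learn (synthetic data; the 95% explained-variance target is an illustrative choice):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10))

# PCA is scale-sensitive, so standardize first.
X_std = StandardScaler().fit_transform(X)

# Keep as many components as needed to explain 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_std)
print(X_reduced.shape, pca.explained_variance_ratio_)
```
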
8. How To Treat Outliers?

Answer» There are several methods to treat outliers:
- Cap or floor them (winsorizing), for example at the 1st and 99th percentiles.
- Cap or remove observations outside the interquartile-range (IQR) fences.
- Apply a variance-reducing transformation such as a log transform.
- Treat them as missing values and impute, or model them as a separate segment.

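A minimal sketch of the first two treatments with pandas (the series name "amount" and the fences are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
s = pd.Series(np.append(rng.normal(100, 10, 200), [400, -150]), name="amount")

# Percentile capping (winsorizing) at the 1st and 99th percentiles.
capped = s.clip(lower=s.quantile(0.01), upper=s.quantile(0.99))

# IQR fences: cap anything beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR.
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
fenced = s.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

print(capped.max(), fenced.max())   # the 400 outlier is pulled in by both
```
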
9. How To Handle Missing Values?

Answer» We fill/impute missing values using methods such as mean, median, or mode imputation, or model-based imputation (for example k-nearest-neighbours or regression). Alternatively, make the missing values a separate category (for categorical variables).

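A minimal sketch of two of these options with pandas (the columns age and city are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 40, 31, np.nan],
    "city": ["NY", None, "LA", "NY", None],
})

# Numeric variable: impute with the median.
df["age"] = df["age"].fillna(df["age"].median())

# Categorical variable: make missing a separate category.
df["city"] = df["city"].fillna("Missing")

print(df)
```
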
10. Difference Between Linear And Logistic Regression?

Answer» Two main differences are as follows:
1. Linear regression requires the dependent variable to be continuous, i.e. numeric values (no categories or groups), while binary logistic regression requires the dependent variable to be binary: two categories only (0/1). Multinomial or ordinal logistic regression can have a dependent variable with more than two categories.
2. Linear regression is based on least squares estimation, which says the regression coefficients should be chosen in such a way that they minimize the sum of the squared distances of each observed response to its fitted value, while logistic regression is based on maximum likelihood estimation, which says the coefficients should be chosen in such a way that they maximize the probability of Y given X (the likelihood).

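A minimal sketch fitting one of each on synthetic data with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))

# Continuous target -> linear regression (least squares estimation).
y_cont = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)
lin = LinearRegression().fit(X, y_cont)

# Binary 0/1 target -> logistic regression (maximum likelihood estimation).
y_bin = (X[:, 0] + rng.normal(size=200) > 0).astype(int)
logit = LogisticRegression().fit(X, y_bin)

print(lin.coef_, logit.coef_)
```
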
11. Explain The Problem Statement Of Your Project. What Are The Financial Impacts Of It?

Answer» Cover the objective or main goal of your predictive model. Compare the monetary benefits of the predictive model vs. no model, and also highlight the non-monetary benefits (if any).

12. What Are The Applications Of Predictive Modeling?

Answer» Predictive modeling is mostly used in areas such as credit risk scoring, fraud detection, customer churn prediction, marketing campaign response and cross-sell/up-sell targeting, and demand forecasting.

13. What Are The Essential Steps In A Predictive Modeling Project?

Answer» It consists of the following steps:
1. Define the business objective.
2. Collect and prepare the data (cleaning, missing-value and outlier treatment).
3. Explore the data and engineer/select variables.
4. Build and estimate candidate models.
5. Validate performance on hold-out data.
6. Deploy the model and monitor its performance over time.