72+ Interview Questions in Statistical Inference and Regression Models in Data Science (InterviewSolution, Page 1)

1. Collections of exchangeable binary outcomes for the same covariate data are called _______ outcomes.
(a) random
(b) direct
(c) binomial
(d) none of the mentioned
Topic: Binary and Count Outcomes
Answer» The correct option is (c) binomial. Explanation: Logistic regression, the usual multivariable regression model for binary outcomes, gives odds ratios, not risk ratios.

2. Point out the correct statement.
(a) A standard error is needed to create a prediction interval
(b) The prediction interval must incorporate the variability in the data around the line
(c) Investors use the residual variance to measure the accuracy of their predictions on the value of an asset
(d) All of the mentioned
Topic: Residual Variation and Multivariate
Answer» The correct option is (d) All of the mentioned. Explanation: In statistics, explained variation measures the proportion to which a mathematical model accounts for the variation of a given data set.

3. Point out the correct statement.
(a) The mean is a measure of central tendency of the data
(b) Empirical mean is related to “centering” the random variables
(c) The empirical standard deviation is a measure of spread
(d) All of the mentioned
Topic: Introduction to Regression Models
Answer» The correct option is (d) All of the mentioned. Explanation: The process of centering and scaling the data is called “normalizing” the data.

4. Which of the following values is the most common measure of “statistical significance”?
(a) P
(b) A
(c) L
(d) All of the mentioned
Topic: Statistical Inference Concepts
Answer» The correct option is (a) P. Explanation: The P-value is the probability, under the null hypothesis, of obtaining evidence as extreme as or more extreme than the evidence actually observed.

5. __________ random variables are used to model rates.
(a) Empirical
(b) Binomial
(c) Poisson
(d) All of the mentioned
Topic: Common Distributions
Answer» The correct option is (c) Poisson. Explanation: The Poisson distribution is used to model counts, and therefore rates (counts per unit of time or exposure).

6. Which of the following is an example use of the Poisson distribution?
(a) Analyzing contingency table data
(b) Modeling web traffic hits
(c) Incidence rates
(d) All of the mentioned
Topic: Binary and Count Outcomes
Answer» The correct option is (d) All of the mentioned. Explanation: The Poisson distribution is a useful model for counts and rates.
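
As a rough illustration of modeling a rate with a Poisson count, the sketch below evaluates the Poisson probability mass function for a count whose mean is rate × exposure. The rate of 2.5 hits per minute and the 4-minute window are made-up numbers, and `poisson_pmf` is a hypothetical helper name, not something from the quiz.

```python
import math

def poisson_pmf(k, rate, exposure=1.0):
    """P(X = k) for a Poisson count with mean rate * exposure."""
    mean = rate * exposure
    return math.exp(-mean) * mean**k / math.factorial(k)

# Hypothetical numbers: 2.5 web hits per minute, observed for 4 minutes.
rate, minutes = 2.5, 4.0

# Truncate the (unbounded) support at 100; the tail beyond is negligible here.
pmf = [poisson_pmf(k, rate, minutes) for k in range(101)]
expected_hits = sum(k * p for k, p in enumerate(pmf))

print(sum(pmf), expected_hits)
```

The expected count comes out at rate × exposure, which is why the same distribution serves for both counts and rates.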

7. Which of the following functions is associated with a continuous random variable?
(a) pdf
(b) pmv
(c) pmf
(d) all of the mentioned
Topic: Introduction to Statistical Inference
Answer» The correct option is (a) pdf. Explanation: pdf stands for probability density function; a pmf (probability mass function) applies to discrete random variables.

8. Point out the correct statement.
(a) Bayesian inference is the use of Bayesian probability representation of beliefs to perform inference
(b) NULL is the standard missing data marker used in S
(c) Frequency inference is the use of Bayesian probability representation of beliefs to perform inference
(d) None of the mentioned
Topic: Introduction to Statistical Inference
Answer» The correct option is (a) Bayesian inference is the use of Bayesian probability representation of beliefs to perform inference. Explanation: Frequency probability is the long-run proportion of times an event occurs in independent, identically distributed repetitions.

9. Which of the following refers to the circumstance in which the variability of a variable is unequal across the range of values of a second variable that predicts it?
(a) Heterogeneity
(b) Heteroskedasticity
(c) Heteroelasticity
(d) None of the mentioned
Topic: Introduction to Regression Models
Answer» The correct option is (b) Heteroskedasticity. Explanation: Heteroskedasticity has serious consequences for the OLS estimator: the coefficient estimates remain unbiased, but the usual standard errors are no longer valid.

10. Point out the wrong statement.
(a) Asymptotics generally give assurances about finite sample performance
(b) The sample variance and the sample standard deviation are consistent as well
(c) The sample mean and the sample variance are unbiased as well
(d) None of the mentioned
Topic: Likelihood
Answer» The correct option is (a) Asymptotics generally give assurances about finite sample performance. Explanation: Asymptotics generally give no assurances about finite sample performance; the kinds of asymptotics that do are orders of magnitude more difficult to work with.

11. Which of the following random variables are the default model for random samples?
(a) iid
(b) id
(c) pmd
(d) all of the mentioned
Topic: Probability and Statistics
Answer» The correct option is (a) iid. Explanation: Random variables are said to be iid if they are independent and identically distributed.

12. Normalized data are centered at ___ and have units equal to standard deviations of the original data.
(a) 0
(b) 5
(c) 1
(d) 10
Topic: Introduction to Regression Models
Answer» The correct option is (a) 0. Explanation: In statistics and its applications, normalization can have a range of meanings; here it means subtracting the empirical mean and dividing by the empirical standard deviation, so the result is centered at 0.
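
A minimal sketch of this kind of normalization, assuming the sample standard deviation (n − 1 denominator) is the one intended. The data values are made up for illustration and `normalize` is a hypothetical helper name.

```python
import statistics

def normalize(values):
    """Center at the empirical mean and scale by the empirical standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in values]

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # made-up illustrative values
z = normalize(data)

# After normalizing, the mean is (numerically) 0 and the standard deviation is 1.
print(statistics.mean(z), statistics.stdev(z))
```

A normalized value of 2 then reads as "two standard deviations above the mean" of the original data.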

13. Point out the wrong statement with respect to FDR.
(a) FDR is difficult to calculate
(b) FDR is relatively less conservative
(c) FDR allows for more false positives
(d) None of the mentioned
Topic: Statistical Inference Concepts
Answer» The correct option is (a) FDR is difficult to calculate. Explanation: FDR stands for false discovery rate, and controlling it is straightforward to calculate.
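
To make "less conservative" concrete, the sketch below compares a Bonferroni correction with the Benjamini-Hochberg step-up procedure, which is one common way to control the FDR. The quiz does not name a specific FDR method, so the choice of Benjamini-Hochberg is an assumption, and the p-values are invented for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices rejected when each p-value is compared with alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]

def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]  # invented p-values
print(bonferroni(p), benjamini_hochberg(p))
```

On these numbers the FDR procedure rejects at least as many hypotheses as Bonferroni, which is the trade-off the question describes: more discoveries at the cost of allowing more false positives.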

14. What is the purpose of multiple testing in statistical inference?
(a) Minimize errors
(b) Minimize false positives
(c) Minimize false negatives
(d) All of the mentioned
Topic: Statistical Inference Concepts
Answer» The correct option is (d) All of the mentioned. Explanation: A false positive is an error in an evaluation process in which a condition tested for is mistakenly found to have been detected.

15. Which of the following types of testing is concerned with making decisions using data?
(a) Probability
(b) Hypothesis
(c) Causal
(d) None of the mentioned
Topic: Statistical Inference Concepts
Answer» The correct option is (b) Hypothesis. Explanation: The null hypothesis is assumed true, and statistical evidence is required to reject it in favor of a research or alternative hypothesis.

16. Which of the following goals is incorrectly represented in the figure below?
(a) Relationship between variables
(b) Distribution of variables
(c) Inference about relationships
(d) Causal
Topic: Common Distributions
Answer» The correct option is (d) Causal. Explanation: Causal is not directly related to the goal of statistical modelling.

17. Which of the following can be useful for diagnosing data entry errors?
(a) hat values
(b) dffit
(c) resid
(d) all of the mentioned
Topic: Residual Variation and Multivariate
Answer» The correct option is (a) hat values. Explanation: Hat values measure leverage, so they can flag unusual predictor values such as mistyped entries; resid only returns the ordinary residuals.

18. Which of the following functions can replace the question mark in the figure below?
(a) boxplot
(b) lplot
(c) levelplot
(d) all of the mentioned
Topic: Introduction to Regression Models
Answer» The correct option is (c) levelplot. Explanation: levelplot is used for plotting “image”-style data.

19. Point out the correct statement.
(a) The exponential of a normally distributed random variable follows what is called the log-normal distribution
(b) Sums of normally distributed random variables are again normally distributed even if the variables are dependent
(c) The square of a standard normal random variable follows what is called the chi-squared distribution
(d) All of the mentioned
Topic: Common Distributions
Answer» The correct option is (d) All of the mentioned. Explanation: Many random variables, properly normalized, limit to a normal distribution. Note that statement (b) holds when the variables are jointly normal.

20. Point out the correct statement.
(a) Some cumulative distribution function F is non-decreasing and right-continuous
(b) Every cumulative distribution function F is decreasing and right-continuous
(c) Every cumulative distribution function F is increasing and left-continuous
(d) None of the mentioned
Topic: Probability and Statistics
Answer» The correct option is (d) None of the mentioned. Explanation: Every cumulative distribution function F is non-decreasing and right-continuous.

21. Principal components or factor analytic models on covariates are often useful for reducing complex covariate spaces.
(a) True
(b) False
Topic: Binary and Count Outcomes
Answer» The correct option is (a) True. Explanation: The space of models explodes quickly as you add interactions and polynomial terms.

22. How many components are present in generalized linear models?
(a) 2
(b) 4
(c) 6
(d) None of the mentioned
Topic: Binary and Count Outcomes
Answer» The correct option is (d) None of the mentioned. Explanation: Generalized linear models involve three components.

23. Which of the following is the correct formula for total variation?
(a) Total Variation = Residual Variation – Regression Variation
(b) Total Variation = Residual Variation + Regression Variation
(c) Total Variation = Residual Variation * Regression Variation
(d) All of the mentioned
Topic: Residual Variation and Multivariate
Answer» The correct option is (b) Total Variation = Residual Variation + Regression Variation. Explanation: The part of the total variation not explained by the regression is called the unexplained or residual variation.
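
The decomposition can be checked numerically. The sketch below fits a least-squares line with an intercept to a small made-up data set and compares the total sum of squares with the sum of the regression and residual sums of squares; `fit_line` and the data are illustrative assumptions.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up outcomes

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
y_bar = sum(y) / len(y)

total = sum((yi - y_bar) ** 2 for yi in y)                  # total variation
regression = sum((fi - y_bar) ** 2 for fi in fitted)        # explained by the line
residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # left unexplained

print(total, regression + residual, regression / total)
```

The last printed value is the share of total variation explained by the regression, usually reported as R squared.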

24. The _________ of a collection of data is the joint density evaluated as a function of the parameters with the data fixed.
(a) probability
(b) likelihood
(c) poisson distribution
(d) all of the mentioned
Topic: Likelihood
Answer» The correct option is (b) likelihood. Explanation: Likelihood analysis of data uses the likelihood to perform inference regarding the unknown parameter.

25. Point out the correct statement.
(a) Power of a one sided test is lower than the power of the associated two sided test
(b) Power of a two sided test is greater than the power of the associated one sided test
(c) Hypothesis testing is less commonly used
(d) None of the mentioned
Topic: Statistical Inference Concepts
Answer» The correct option is (d) None of the mentioned. Explanation: The power of a one-sided test is greater than the power of the associated two-sided test when the true effect lies in the tested direction.

26. Which of the following forms the basis for the frequency interpretation of probabilities?
(a) Asymptotics
(b) Symptotics
(c) Asymmetry
(d) All of the mentioned
Topic: Common Distributions
Answer» The correct option is (a) Asymptotics. Explanation: Asymptotics is the term for the behavior of statistics as the sample size tends to infinity.

27. For continuous random variables, the CDF is the derivative of the PDF.
(a) True
(b) False
Topic: Probability and Statistics
Answer» The correct option is (b) False. Explanation: For continuous random variables, the PDF is the derivative of the CDF.
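
A quick numerical check of this relationship, using the standard normal distribution as an assumed example: the central-difference derivative of the CDF should match the PDF at any point. The helper names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for point in (-1.0, 0.0, 0.5, 2.0):
    print(point, numerical_derivative(normal_cdf, point), normal_pdf(point))
```

The two columns agree to several decimal places, consistent with the PDF being the derivative of the CDF rather than the other way round.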

28. Which of the following conditions must a function satisfy to be a pmf?
(a) The sum of all of the possible values is 1
(b) The sum of all of the possible values is 0
(c) The sum of all of the possible values is infinite
(d) All of the mentioned
Topic: Introduction to Statistical Inference
Answer» The correct option is (a) The sum of all of the possible values is 1. Explanation: A probability mass function evaluated at a value corresponds to the probability that a random variable takes that value, so these probabilities must be non-negative and sum to 1 over all possible values.
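
As an illustration, the sketch below evaluates a binomial pmf at every possible value and checks both conditions. The choice of a binomial with n = 10 and p = 0.3 is an arbitrary assumption for the example.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial count of successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # arbitrary illustrative parameters
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

# A valid pmf is non-negative everywhere and sums to 1 over its support.
print(min(probabilities) >= 0, sum(probabilities))
```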

29. Which of the following components is involved in generalized linear models?
(a) An exponential family model for the response
(b) A systematic component via a linear predictor
(c) A link function that connects the means of the response to the linear predictor
(d) All of the mentioned
Topic: Binary and Count Outcomes
Answer» The correct option is (d) All of the mentioned. Explanation: The GLM is a flexible generalization of ordinary linear regression that allows for response variables whose error distributions are not normal.
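
The three components can be sketched for logistic regression, taken here as an assumed example of a GLM: a Bernoulli model for the response, a linear predictor, and a logit link whose inverse maps the linear predictor to the mean. The coefficients are hypothetical, and no fitting is performed.

```python
import math

def linear_predictor(x, intercept, slope):
    """Systematic component: eta = intercept + slope * x."""
    return intercept + slope * x

def inverse_logit(eta):
    """Inverse of the logit link: maps eta to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_log_likelihood(y, mu):
    """Exponential-family component: log-density of a 0/1 outcome with mean mu."""
    return y * math.log(mu) + (1 - y) * math.log(1 - mu)

def odds(mu):
    return mu / (1.0 - mu)

intercept, slope = -1.0, 0.8  # hypothetical coefficients, not fitted values
mu_at_1 = inverse_logit(linear_predictor(1.0, intercept, slope))
mu_at_2 = inverse_logit(linear_predictor(2.0, intercept, slope))

# A one-unit increase in x multiplies the odds by exp(slope).
odds_ratio = odds(mu_at_2) / odds(mu_at_1)
print(odds_ratio, math.exp(slope))
```

The printed comparison shows why coefficients in this model are interpreted as log odds ratios rather than risk ratios.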

30. Which of the following statements is incorrect with respect to outliers?
(a) Outliers can have varying degrees of influence
(b) Outliers can be the result of spurious or real processes
(c) Outliers cannot conform to the regression relationship
(d) None of the mentioned
Topic: Residual Variation and Multivariate
Answer» The correct option is (c) Outliers cannot conform to the regression relationship. Explanation: Outliers can conform to the regression relationship.

31. Which of the following things can be accomplished with linear models?
(a) Flexibly fit complicated functions
(b) Uncover complex multivariate relationships
(c) Build accurate prediction models
(d) All of the mentioned
Topic: Residual Variation and Multivariate
Answer» The correct option is (d) All of the mentioned. Explanation: Linear models are the single most important applied statistical and machine learning technique.

32. Point out the wrong statement.
(a) Asymptotics are used for inference usually
(b) Adding squared terms makes it continuously differentiable at the knot points
(c) Adding squared terms makes it twice continuously differentiable at the knot points
(d) None of the mentioned
Topic: Binary and Count Outcomes
Answer» The correct option is (c) Adding squared terms makes it twice continuously differentiable at the knot points. Explanation: Adding cubic terms makes it twice continuously differentiable at the knot points.

33. Residual ______ plots investigate normality of the errors.
(a) RR
(b) PP
(c) QQ
(d) None of the mentioned
Topic: Residual Variation and Multivariate
Answer» The correct option is (c) QQ. Explanation: Patterns in your residual plots generally indicate some poor aspect of model fit.

34. Which of the following is correct with respect to residuals?
(a) Positive residuals are above the line, negative residuals are below
(b) Positive residuals are below the line, negative residuals are above
(c) Positive residuals and negative residuals are below the line
(d) All of the mentioned
Topic: Introduction to Regression Models
Answer» The correct option is (a) Positive residuals are above the line, negative residuals are below. Explanation: Residuals can be thought of as the outcome with the linear association of the predictor removed.

35. Which of the following is the oldest multiple testing correction?
(a) Bonferroni correction
(b) Bernoulli correction
(c) Likelihood correction
(d) All of the mentioned
Topic: Statistical Inference Concepts
Answer» The correct option is (a) Bonferroni correction. Explanation: The Bonferroni correction is easy to calculate.

36. Chebyshev’s inequality states that the probability of a “Six Sigma” event is less than ___________
(a) 10%
(b) 20%
(c) 30%
(d) 3%
Topic: Probability and Statistics
Answer» The correct option is (d) 3%. Explanation: Chebyshev’s bound for a “six sigma” event is 1/6² ≈ 2.8%. If a bell curve is assumed, the probability of a “six sigma” event is on the order of one ten-millionth of a percent.
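
Both figures can be reproduced in a few lines. The sketch below computes the Chebyshev bound 1/k² and, assuming a normal distribution, the two-sided tail probability P(|Z| ≥ k) via the complementary error function; the function names are hypothetical.

```python
import math

def chebyshev_bound(k):
    """Upper bound on P(|X - mu| >= k * sigma) for any distribution with finite variance."""
    return 1.0 / k**2

def normal_two_sided_tail(k):
    """P(|Z| >= k) for a standard normal Z, equal to erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2.0))

k = 6
print(chebyshev_bound(k), normal_two_sided_tail(k))
```

The gap between the two numbers shows how loose Chebyshev’s bound is when the distribution is known to be normal; its value lies in holding without any distributional assumption.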

37. Point out the wrong statement.
(a) A percentile is simply a quantile expressed as a percent
(b) There are two types of random variable
(c) R cannot approximate quantiles for you for common distributions
(d) None of the mentioned
Topic: Probability and Statistics
Answer» The correct option is (c) R cannot approximate quantiles for you for common distributions. Explanation: R can approximate quantiles for you for common distributions, for example with qnorm for the normal distribution.

38. Multivariate regression estimates are exactly those having removed the linear relationship of the other variables from both the regressor and response.
(a) True
(b) False
Topic: Residual Variation and Multivariate
Answer» The correct option is (a) True. Explanation: Multivariate data analysis refers to any statistical technique used to analyze data that arise from more than one variable.

39. Point out the wrong statement.
(a) The fraction of variance unexplained is an established concept in the context of linear regression
(b) “Explained variance” is routinely used in principal component analysis
(c) The general linear model extends simple linear regression (SLR) by adding terms linearly into the model
(d) None of the mentioned
Topic: Residual Variation and Multivariate
Answer» The correct option is (d) None of the mentioned. Explanation: Linearity refers to a mathematical relationship or function that can be graphically represented as a straight line.

40. Minimizing the likelihood is the same as maximizing -2 log likelihood.
(a) True
(b) False
Topic: Introduction to Regression Models
Answer» The correct option is (a) True. Explanation: Because the logarithm is increasing, maximizing the likelihood is the same as minimizing -2 log likelihood, and minimizing the likelihood is the same as maximizing -2 log likelihood.
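
A small grid search makes the equivalence visible. The sketch below assumes a Bernoulli model with 7 successes in 10 trials (made-up data) and finds the success probability that maximizes the likelihood and the one that minimizes -2 log likelihood.

```python
import math

def bernoulli_likelihood(p, successes, n):
    """Likelihood of success probability p given the observed successes in n trials."""
    return p**successes * (1 - p) ** (n - successes)

def minus_two_log_likelihood(p, successes, n):
    return -2.0 * math.log(bernoulli_likelihood(p, successes, n))

successes, n = 7, 10                     # made-up data
grid = [i / 100 for i in range(1, 100)]  # candidate values 0.01, ..., 0.99

p_max_lik = max(grid, key=lambda p: bernoulli_likelihood(p, successes, n))
p_min_m2ll = min(grid, key=lambda p: minus_two_log_likelihood(p, successes, n))

print(p_max_lik, p_min_m2ll)
```

Both searches land on the same grid point, the sample proportion, because -2 log is a strictly decreasing transformation of the likelihood.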

41. Point out the wrong statement.
(a) Regression through the origin yields an equivalent slope if you center the data first
(b) Normalizing variables results in the slope being the correlation
(c) Least squares is not an estimation tool
(d) None of the mentioned
Topic: Introduction to Regression Models
Answer» The correct option is (c) Least squares is not an estimation tool. Explanation: Least squares is an estimation tool.
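
Statements (a) and (b) can both be checked numerically. The sketch below uses made-up data and hypothetical helper names: it compares the slope from regression through the origin on centered data with the ordinary least-squares slope, and the slope on normalized data with the sample correlation.

```python
import math
import statistics

def center(values):
    mean = statistics.mean(values)
    return [v - mean for v in values]

def normalize(values):
    sd = statistics.stdev(values)
    return [v / sd for v in center(values)]

def ols_slope(x, y):
    """Least-squares slope of y on x when the model includes an intercept."""
    xc, yc = center(x), center(y)
    return sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)

def origin_slope(x, y):
    """Least-squares slope of y on x with the line forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def correlation(x, y):
    xc, yc = center(x), center(y)
    sxy = sum(a * b for a, b in zip(xc, yc))
    return sxy / math.sqrt(sum(a * a for a in xc) * sum(b * b for b in yc))

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up data
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8]

print(origin_slope(center(x), center(y)), ols_slope(x, y))    # statement (a)
print(ols_slope(normalize(x), normalize(y)), correlation(x, y))  # statement (b)
```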

42. The pooled estimator is a mixture of the group variances, placing greater weight on whichever has a larger sample size.
(a) True
(b) False
Topic: Statistical Inference Concepts
Answer» The correct option is (a) True. Explanation: If the sample sizes are the same, the pooled variance estimate is the average of the group variances.
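
A minimal sketch of the pooled variance for two groups, weighting each sample variance by its degrees of freedom. The groups are made-up numbers and `pooled_variance` is a hypothetical helper name.

```python
import statistics

def pooled_variance(group_a, group_b):
    """Degrees-of-freedom-weighted average of the two sample variances."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    return ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)

a = [1.0, 2.0, 3.0, 4.0]  # made-up groups of equal size
b = [2.0, 4.0, 6.0, 8.0]

print(pooled_variance(a, b), (statistics.variance(a) + statistics.variance(b)) / 2)
```

With unequal group sizes the weights differ, and the estimate moves toward the variance of the larger group.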

43. Which of the following can be considered a random variable?
(a) The outcome from the roll of a die
(b) The outcome of flip of a coin
(c) The outcome of exam
(d) All of the mentioned
Topic: Introduction to Statistical Inference
Answer» The correct option is (d) All of the mentioned. Explanation: The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values.

44. Which of the following shows residuals divided by their standard deviations?
(a) rstudent
(b) cooks.distance
(c) rstandard
(d) all of the mentioned
Topic: Residual Variation and Multivariate
Answer» The correct option is (c) rstandard. Explanation: rstandard stands for standardized residuals.

45. Which of the following tools is used for estimating standard errors and the bias of estimators?
(a) knitr
(b) jackknife
(c) ggplot2
(d) all of the mentioned
Topic: Statistical Inference Concepts
Answer» The correct option is (b) jackknife. Explanation: The jackknife involves resampling the data, leaving out one observation at a time.
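
The leave-one-out idea can be sketched as follows, using the standard jackknife formulas for bias and standard error. The sample is made up and the function name is hypothetical; for the sample mean, the jackknife standard error should coincide with the familiar s / √n.

```python
import math
import statistics

def jackknife(values, estimator):
    """Jackknife estimates of the bias and standard error of an estimator."""
    n = len(values)
    theta_hat = estimator(values)
    # Recompute the estimator n times, each time leaving one observation out.
    leave_one_out = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    theta_bar = sum(leave_one_out) / n
    bias = (n - 1) * (theta_bar - theta_hat)
    std_error = math.sqrt((n - 1) / n * sum((t - theta_bar) ** 2 for t in leave_one_out))
    return bias, std_error

sample = [3.1, 4.7, 5.0, 5.6, 6.2, 7.9]  # made-up observations
bias, se = jackknife(sample, statistics.mean)

print(bias, se, statistics.stdev(sample) / math.sqrt(len(sample)))
```

The same function accepts other estimators, such as `statistics.median`, where no simple closed-form standard error is available.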

46. Gosset’s distribution was invented by which of the following scientists?
(a) William Gosset
(b) William Gosling
(c) Gosling Gosset
(d) All of the mentioned
Topic: Likelihood
Answer» The correct option is (a) William Gosset. Explanation: Gosset’s distribution, better known as Student’s t distribution, is indexed by its degrees of freedom.

47. Bernoulli random variables take (only) the values 1 and 0.
(a) True
(b) False
Topic: Common Distributions
Answer» The correct option is (a) True. Explanation: The Bernoulli distribution arises as the result of a binary outcome.

48. Which of the following is incorrect with respect to the use of the Poisson distribution?
(a) Modeling event/time data
(b) Modeling bounded count data
(c) Modeling contingency tables
(d) All of the mentioned
Topic: Common Distributions
Answer» The correct option is (b) Modeling bounded count data. Explanation: The Poisson distribution is used for modeling unbounded count data.

49. Which of the following inequalities is useful for interpreting variances?
(a) Chebyshev
(b) Stautaory
(c) Testory
(d) All of the mentioned
Topic: Probability and Statistics
Answer» The correct option is (a) Chebyshev. Explanation: Chebyshev’s inequality is also spelled Tchebysheff’s inequality.

50. Bayesian inference uses frequency interpretations of probabilities to control error rates.
(a) True
(b) False
Topic: Introduction to Statistical Inference
Answer» The correct option is (b) False. Explanation: Frequentist inference, not Bayesian inference, uses frequency interpretations of probabilities to control error rates.
