51.

Which of the following is the correct order of working?
(a) questions->input data->algorithms
(b) questions->evaluation->algorithms
(c) evaluation->input data->algorithms
(d) all of the mentioned
Topic: Prediction Motivation (Machine Learning, Data Science)

Answer» Correct choice is (a) questions->input data->algorithms

Explanation: Evaluation is done last.

52.

Which of the following is a valid component of the predictor?
(a) data
(b) question
(c) algorithm
(d) all of the mentioned
Topic: Prediction Motivation (Machine Learning, Data Science)

Answer» Right choice is (d) all of the mentioned

Explanation: A prediction is a statement about the future.

53.

The function preProcess estimates the required parameters for each operation.
(a) True
(b) False
Topic: caret (Machine Learning, Data Science)

Answer» The correct option is (a) True

Explanation: predict.preProcess is then used to apply them to specific data sets.

54.

Which of the following libraries is used for boosting generalized additive models?
(a) gamBoost
(b) gbm
(c) ada
(d) all of the mentioned
Topic: Predicting with Regression (Machine Learning, Data Science)

Answer» Right answer is (a) gamBoost

Explanation: Boosting can be used with any subset of classifiers.

55.

Which of the following expressions is true?
(a) In-sample error < out-of-sample error
(b) In-sample error > out-of-sample error
(c) In-sample error = out-of-sample error
(d) All of the mentioned
Topic: Prediction Motivation (Machine Learning, Data Science)

Answer» Right choice is (a) In-sample error < out-of-sample error

Explanation: Out-of-sample error is given more importance, because a model tuned to its training data usually looks better in sample than it performs on new data.
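The gap between the two error rates can be illustrated with a small sketch. The quiz is about R, so the Python below is only an analogue; the 1-nearest-neighbour predictor and the toy data are assumptions chosen because a memorizing model makes the effect easy to see.

```python
def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-NN)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def error_rate(train, data):
    """Fraction of points in `data` that the 1-NN model built on `train` gets wrong."""
    wrong = sum(1 for x, y in data if nearest_neighbor_predict(train, x) != y)
    return wrong / len(data)


# Hypothetical toy data: (feature, label) pairs.
train = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "a"), (5.0, "b"), (6.0, "b")]
test = [(1.5, "a"), (2.6, "a"), (3.4, "a"), (4.5, "b"), (5.5, "b")]

# In-sample (resubstitution) error: evaluated on the data used to build the model.
in_sample_error = error_rate(train, train)
# Out-of-sample (generalization) error: evaluated on data the model has not seen.
out_of_sample_error = error_rate(train, test)

print(in_sample_error, out_of_sample_error)
```

Every training point is its own nearest neighbour, so the in-sample error is zero, while the unseen points expose the mistakes.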

56.

Which of the following is correct with respect to random forests?
(a) Random forests are difficult to interpret but often very accurate
(b) Random forests are easy to interpret but often very accurate
(c) Random forests are difficult to interpret but much less accurate
(d) None of the mentioned
Topic: Predicting with Regression (Machine Learning, Data Science)

Answer» Right answer is (a) Random forests are difficult to interpret but often very accurate

Explanation: Random forest is a top performing algorithm in prediction.

57.

Point out the correct statement.
(a) In Sample Error is the error rate you get on the same dataset used to model a predictor
(b) Data have two parts: signal and noise
(c) The goal of a predictor is to find signal
(d) None of the mentioned
Topic: Prediction Motivation (Machine Learning, Data Science)

Answer» The correct choice is (d) None of the mentioned

Explanation: A perfect in-sample predictor can always be built.

58.

Which of the following functions can be used to identify near zero-variance variables?
(a) zeroVar
(b) nearVar
(c) nearZeroVar
(d) all of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» The correct option is (c) nearZeroVar

Explanation: The saveMetrics argument can be used to show the details and usually defaults to FALSE.
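To make the idea of a near zero-variance variable concrete, here is a rough Python sketch of the kind of rule such a filter applies. It is not caret's implementation: the function name is hypothetical, and the two thresholds (a frequency ratio of 95/5 and 10 percent unique values) are assumptions modelled on what I understand caret's defaults to be.

```python
from collections import Counter


def near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a variable that is dominated by a single value.

    freq_cut:   cutoff for (count of most common value) / (count of second most common)
    unique_cut: cutoff for the percentage of distinct values among all samples
    Both defaults are assumptions, not values read from caret's source.
    """
    counts = Counter(values).most_common()
    if len(counts) == 1:
        return True  # a single distinct value means zero variance
    freq_ratio = counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique < unique_cut


mostly_constant = [0] * 99 + [1]   # 99 zeros and a single one
balanced = [0, 1] * 50             # evenly split between two values

flag_mostly_constant = near_zero_variance(mostly_constant)
flag_balanced = near_zero_variance(balanced)
print(flag_mostly_constant, flag_balanced)
```

A variable is flagged only when both conditions hold, so a balanced binary column is kept even though it has few distinct values.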

59.

Which of the following method options is provided by the train function for bagging?
(a) bagEarth
(b) treebag
(c) bagFDA
(d) all of the mentioned
Topic: Predicting with Regression (Machine Learning, Data Science)

Answer» The correct answer is (d) all of the mentioned

Explanation: Bagging can be done using the bag function as well.

60.

For k-fold cross-validation, a larger k value implies more bias.
(a) True
(b) False
Topic: Cross Validation (Machine Learning, Data Science)

Answer» Right answer is (b) False

Explanation: For k-fold cross-validation, a larger k value implies less bias (and more variance).
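
One way to see why a larger k lowers bias is to look at how much data each model is trained on. The sketch below is a plain Python illustration with a hypothetical helper name; it splits indices into contiguous folds without shuffling, which is a simplification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds over range(n_samples)."""
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


folds_2 = list(k_fold_indices(10, 2))
folds_5 = list(k_fold_indices(10, 5))

# With k = 2 each model sees half of the data; with k = 5 it sees 80 percent.
print(len(folds_2[0][0]), len(folds_5[0][0]))
```

As k grows, each training set approaches the full data set, so the error estimate is less pessimistic, at the cost of more highly overlapping training sets.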
61.

True positive means correctly rejected.
(a) True
(b) False
Topic: Prediction Motivation (Machine Learning, Data Science)

Answer» Correct answer is (b) False

Explanation: True positive means correctly identified; correctly rejected describes a true negative.
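
The four outcome types can be counted directly from paired labels. This is a small Python sketch with made-up data; the function name and the choice of True as the positive class are assumptions for illustration.

```python
def confusion_counts(actual, predicted, positive=True):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correctly identified (TP) or incorrectly identified (FP).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: incorrectly rejected (FN) or correctly rejected (TN).
            counts["FN" if a == positive else "TN"] += 1
    return counts


actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
counts = confusion_counts(actual, predicted)
print(counts)
```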

62.

Which of the following is a characteristic of the best machine learning method?
(a) Fast
(b) Accurate
(c) Scalable
(d) All of the mentioned
Topic: Prediction Motivation (Machine Learning, Data Science)

Answer» Correct answer is (d) All of the mentioned

Explanation: There is always a trade-off in prediction accuracy.

63.

Which of the following models includes a backwards elimination feature selection routine?
(a) MCV
(b) MARS
(c) MCRS
(d) All of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» Correct answer is (b) MARS

Explanation: MARS stands for Multivariate Adaptive Regression Splines.

64.

Which of the following functions tracks the changes in model statistics?
(a) varImp
(b) varImpTrack
(c) findTrack
(d) none of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» The correct answer is (a) varImp

Explanation: The change in the GCV value can also be tracked.

65.

Which of the following functions is used to generate the class distances?
(a) preprocess.classDist
(b) predict.classDist
(c) predict.classDistance
(d) all of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» The correct answer is (b) predict.classDist

Explanation: By default, the distances are logged.

66.

Point out the correct statement.
(a) Combining classifiers improves interpretability
(b) Combining classifiers reduces accuracy
(c) Combining classifiers improves accuracy
(d) All of the mentioned
Topic: Model Based Prediction (Machine Learning, Data Science)

Answer» Correct choice is (c) Combining classifiers improves accuracy

Explanation: You can combine classifiers by averaging.

67.

Which of the following methods can be used to combine different classifiers?
(a) Model stacking
(b) Model combining
(c) Model structuring
(d) None of the mentioned
Topic: Model Based Prediction (Machine Learning, Data Science)

Answer» Right choice is (a) Model stacking

Explanation: Model ensembling is also used for combining different classifiers.
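
A very simple form of ensembling is a majority vote, which is a lighter relative of stacking (stacking would train a second model on the base predictions). The Python sketch below uses three hypothetical classifiers whose predictions are hard-coded, so it only illustrates the combining step.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier prediction lists by taking the most common label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


truth = [1, 0, 1, 1, 0]
# Each hypothetical classifier makes one mistake, on a different sample.
clf_a = [1, 0, 1, 0, 0]
clf_b = [0, 0, 1, 1, 0]
clf_c = [1, 1, 1, 1, 0]

combined = majority_vote([clf_a, clf_b, clf_c])
print(combined)
```

Because the individual errors do not overlap, the vote recovers every label, which is the intuition behind combining classifiers to improve accuracy.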

68.

Which of the following is statistical boosting based on additive logistic regression?
(a) gamBoost
(b) gbm
(c) ada
(d) mboost
Topic: Predicting with Regression (Machine Learning, Data Science)

Answer» Correct answer is (a) gamBoost

Explanation: mboost is used for model-based boosting.

69.

Point out the wrong statement.
(a) Training and testing data must be processed in different ways
(b) Test transformation would mostly be imperfect
(c) The first goal is statistical and the second is data compression in PCA
(d) All of the mentioned
Topic: Predicting with Regression (Machine Learning, Data Science)

Answer» Right option is (a) Training and testing data must be processed in different ways

Explanation: Training and testing data must be processed in the same way.

70.

Which of the following is a correct use of cross-validation?
(a) Selecting variables to include in a model
(b) Comparing predictors
(c) Selecting parameters in the prediction function
(d) All of the mentioned
Topic: Cross Validation (Machine Learning, Data Science)

Answer» Correct option is (d) All of the mentioned

Explanation: Cross-validation is also used to pick the type of prediction function to be used.

71.

Point out the wrong statement.
(a) In Sample Error is also called generalization error
(b) Out of Sample Error is the error rate you get on a new dataset
(c) In Sample Error is also called resubstitution error
(d) All of the mentioned
Topic: Prediction Motivation (Machine Learning, Data Science)

Answer» The correct answer is (a) In Sample Error is also called generalization error

Explanation: It is Out of Sample Error that is also called generalization error.

72.

Point out the correct statement.
(a) The difference between the class centroids and the overall centroid is used to measure the variable influence
(b) The Bagged Trees output contains variable usage statistics
(c) Boosted Trees uses a different approach from a single tree
(d) None of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» Correct choice is (a) The difference between the class centroids and the overall centroid is used to measure the variable influence

Explanation: The larger the difference between the class centroid and the overall center of the data, the larger the separation between the classes.

73.

Which of the following can be used to impute data sets based only on information in the training set?
(a) postProcess
(b) preProcess
(c) process
(d) all of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» The correct option is (b) preProcess

Explanation: This can be done with K-nearest neighbors.

74.

Point out the wrong combination.
(a) True negative = correctly rejected
(b) False negative = correctly rejected
(c) False positive = correctly identified
(d) All of the mentioned
Topic: Cross Validation (Machine Learning, Data Science)

Answer» Correct option is (c) False positive = correctly identified

Explanation: False positive means incorrectly identified.

75.

Point out the wrong statement.
(a) The trapezoidal rule is used to compute the area under the ROC curve
(b) For regression, the relationship between each predictor and the outcome is evaluated
(c) An argument, para, is used to pick the model fitting technique
(d) All of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» Right option is (c) An argument, para, is used to pick the model fitting technique

Explanation: The argument that picks the model fitting technique is nonpara, not para.
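
Option (a) refers to the trapezoidal rule, which can be written out in a few lines. The Python sketch below is illustrative only: the function name and the ROC points are made up, and the points are assumed to be sorted by false positive rate.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a curve given as points sorted by increasing false positive rate."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2.0
        area += width * mean_height  # area of one trapezoid
    return area


# Hypothetical ROC points from (0, 0) to (1, 1).
fpr = [0.0, 0.0, 0.5, 1.0]
tpr = [0.0, 0.5, 1.0, 1.0]

auc = trapezoid_auc(fpr, tpr)
print(auc)
```

Each pair of neighbouring points contributes one trapezoid; a vertical step (zero width) adds no area.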

76.

Which of the following functions can be used to maximize the minimum dissimilarities?
(a) sumDiss
(b) minDiss
(c) avgDiss
(d) all of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» Correct choice is (b) minDiss

Explanation: minDiss maximizes the minimum dissimilarities, whereas sumDiss can be used to maximize the total dissimilarities.

77.

Which of the following functions can be used to create balanced splits of the data?
(a) newDataPartition
(b) createDataPartition
(c) renameDataPartition
(d) none of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» Correct choice is (b) createDataPartition

Explanation: If the y argument to this function is a factor, the random sampling occurs within each class and should preserve the overall class distribution of the data.
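
Sampling within each class, as the explanation describes, can be sketched in Python as follows. This is an analogue of the idea rather than caret's code: the function name, the seed handling, and the rounding of per-class counts are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, p=0.75, seed=0):
    """Return sorted training indices, sampling a fraction p within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train = []
    for indices in by_class.values():
        n_train = round(p * len(indices))  # per-class training count
        train.extend(rng.sample(indices, n_train))
    return sorted(train)


labels = ["yes"] * 80 + ["no"] * 20   # 80/20 class balance
train_idx = stratified_split(labels, p=0.75)

n_no_in_train = sum(1 for i in train_idx if labels[i] == "no")
print(len(train_idx), n_no_in_train)
```

The training set keeps the original 80/20 balance because each class is sampled separately before the indices are merged.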

78.

Which of the following shows the correct relative order of importance?
(a) question->features->data->algorithms
(b) question->data->features->algorithms
(c) algorithms->data->features->question
(d) none of the mentioned
Topic: Prediction Motivation (Machine Learning, Data Science)

Answer» Correct option is (b) question->data->features->algorithms

Explanation: Garbage in means garbage out.

79.

Which of the following functions is a wrapper for different lattice plots to visualize the data?
(a) levelplot
(b) featurePlot
(c) plotsample
(d) none of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» The correct answer is (b) featurePlot

Explanation: featurePlot is used for data visualization in caret.

80.

Which of the following can also be used to find new variables that are linear combinations of the original set with independent components?
(a) ICA
(b) SCA
(c) PCA
(d) None of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» Correct option is (a) ICA

Explanation: ICA stands for independent component analysis.

81.

Which of the following tools are present in the caret package?
(a) pre-processing
(b) feature selection
(c) model tuning
(d) all of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» Correct option is (d) all of the mentioned

Explanation: There are many different modeling functions in R.

82.

Model based prediction considers relatively easy version for covariance matrix.
(a) True
(b) False
Topic: Model Based Prediction (Machine Learning, Data Science)

Answer» The correct answer is (b) False

Explanation: Model based prediction considers more complicated versions for the covariance matrix.

83.

Predicting with trees evaluates _____________ within each group of data.
(a) equality
(b) homogeneity
(c) heterogeneity
(d) all of the mentioned
Topic: Predicting with Regression (Machine Learning, Data Science)

Answer» Correct answer is (b) homogeneity

Explanation: Predicting with trees is easy to interpret.

84.

Which of the following methods is used for trainControl resampling?
(a) repeatedcv
(b) svm
(c) bag32
(d) none of the mentioned
Topic: Cross Validation (Machine Learning, Data Science)

Answer» Correct answer is (a) repeatedcv

Explanation: repeatedcv stands for repeated cross-validation.

85.

Which of the following models sums the importance over each boosting iteration?
(a) Boosted trees
(b) Bagged trees
(c) Partial least squares
(d) None of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» The correct option is (a) Boosted trees

Explanation: The gbm package can be used here.

86.

The advantage of using a model-based approach is that it is more closely tied to the model performance.
(a) True
(b) False
Topic: caret (Machine Learning, Data Science)

Answer» Right choice is (a) True

Explanation: A model-based approach is able to incorporate the correlation structure between the predictors into the importance calculation.

87.

varImp is a wrapper around the evimp function in the _______ package.
(a) numpy
(b) earth
(c) plot
(d) none of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» The correct choice is (b) earth

Explanation: The earth package is an implementation of Jerome Friedman's Multivariate Adaptive Regression Splines.

88.

Which of the following curve analyses is conducted on each predictor for classification?
(a) NOC
(b) ROC
(c) COC
(d) All of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» The correct answer is (b) ROC

Explanation: For two-class problems, a series of cutoffs is applied to the predictor data to predict the class.

89.

Which of the following trade-offs occurs during prediction?
(a) Speed vs Accuracy
(b) Simplicity vs Accuracy
(c) Scalability vs Accuracy
(d) None of the mentioned
Topic: Prediction Motivation (Machine Learning, Data Science)

Answer» Correct choice is (d) None of the mentioned

Explanation: Interpretability also matters during prediction.

90.

Which of the following arguments is used to set importance values?
(a) scale
(b) set
(c) value
(d) all of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» Correct option is (a) scale

Explanation: All measures of importance are scaled to have a maximum value of 100.
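
Scaling so that the largest importance equals 100 can be shown with a short Python sketch. It follows the wording of the explanation only; the exact rescaling caret applies may differ, and the function name and scores are hypothetical.

```python
def scale_to_100(importance):
    """Rescale importance scores so that the largest one becomes 100."""
    top = max(importance.values())
    return {name: 100.0 * value / top for name, value in importance.items()}


# Hypothetical raw importance scores for three predictors.
raw = {"age": 0.5, "income": 2.0, "height": 1.0}
scaled = scale_to_100(raw)
print(scaled)
```

Scaling keeps the ranking of the predictors unchanged and only makes scores from different models easier to compare.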

91.

Point out the correct statement.
(a) Asymptotics are usually used for inference
(b) caret includes several functions to pre-process the predictor data
(c) The function dummyVars can be used to generate a complete set of dummy variables from one or more factors
(d) All of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» The correct answer is (d) All of the mentioned

Explanation: The function dummyVars takes a formula and a data set and outputs an object that can be used to create the dummy variables using the predict method.

92.

Which of the following functions can be used for forecasting?
(a) predict
(b) forecast
(c) ets
(d) all of the mentioned
Topic: Model Based Prediction (Machine Learning, Data Science)

Answer» Correct option is (b) forecast

Explanation: Forecasting is the process of making predictions of the future based on past and present data and analysis of trends.

93.

Which of the following functions provides unsupervised prediction?
(a) cl_forecast
(b) cl_nowcast
(c) cl_precast
(d) none of the mentioned
Topic: Model Based Prediction (Machine Learning, Data Science)

Answer» Right choice is (d) none of the mentioned

Explanation: The cl_predict function in the clue package provides unsupervised prediction.

94.

Which of the following is correct about regularized regression?
(a) Can help with bias trade-off
(b) Cannot help with model selection
(c) Cannot help with variance trade-off
(d) All of the mentioned
Topic: Model Based Prediction (Machine Learning, Data Science)

Answer» Correct option is (a) Can help with bias trade-off

Explanation: Regularized regression does not perform as well as random forest.

95.

Which of the following is one of the largest subclasses of boosting?
(a) variance boosting
(b) gradient boosting
(c) mean boosting
(d) all of the mentioned
Topic: Predicting with Regression (Machine Learning, Data Science)

Answer» Right choice is (b) gradient boosting

Explanation: R has multiple boosting libraries.

96.

Which of the following functions can create the indices for time series type of splitting?
(a) newTimeSlices
(b) createTimeSlices
(c) binTimeSlices
(d) none of the mentioned
Topic: caret (Machine Learning, Data Science)

Answer» Right option is (b) createTimeSlices

Explanation: Rolling forecasting origin techniques are associated with time series type of splitting.
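
A rolling forecasting origin can be sketched in a few lines of Python. This is an illustration of the splitting idea, not caret's function: the helper name and parameter names are assumptions, and the indices are 0-based, unlike R.

```python
def time_slices(n, initial_window, horizon=1, fixed_window=True):
    """Return (train, test) index lists that roll forward through a series of length n."""
    slices = []
    for end in range(initial_window, n - horizon + 1):
        # A fixed window slides along; otherwise the training set keeps growing from 0.
        start = end - initial_window if fixed_window else 0
        train = list(range(start, end))
        test = list(range(end, end + horizon))
        slices.append((train, test))
    return slices


slices = time_slices(6, initial_window=3, horizon=2)
for train, test in slices:
    print(train, test)
```

The test indices always come after the training indices, so the model is never evaluated on observations that precede its training data.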
97.

The principal components are equal to left singular values if you first scale the variables.
(a) True
(b) False
Topic: Predicting with Regression (Machine Learning, Data Science)

Answer» The correct answer is (b) False

Explanation: The principal components are equal to the right singular vectors if you first scale the variables.

98.

Point out the correct statement.
(a) Prediction with regression is easy to implement
(b) Prediction with regression is easy to interpret
(c) Prediction with regression performs well when the linear model is correct
(d) All of the mentioned
Topic: Predicting with Regression (Machine Learning, Data Science)

Answer» The correct option is (d) All of the mentioned

Explanation: Prediction with regression gives poor performance in nonlinear settings.

99.

Which of the following is a metric for a categorical outcome?
(a) RMSE
(b) RSquared
(c) Accuracy
(d) All of the mentioned
Topic: Cross Validation (Machine Learning, Data Science)

Answer» Right answer is (c) Accuracy

Explanation: RMSE stands for Root Mean Squared Error, which applies to continuous outcomes.

100.

Point out the wrong statement.
(a) ROC curve stands for receiver operating characteristic
(b) For time series, data must be in chunks
(c) Random sampling must be done with replacement
(d) None of the mentioned
Topic: Cross Validation (Machine Learning, Data Science)

Answer» Right answer is (d) None of the mentioned

Explanation: Random sampling with replacement is the bootstrap.
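
Sampling with replacement is easy to show in Python. The sketch below uses the standard library's random.choices; the function name, the seed, and the toy data are assumptions for illustration.

```python
import random


def bootstrap_sample(data, seed=0):
    """Draw len(data) items from data with replacement (one bootstrap resample)."""
    rng = random.Random(seed)
    return rng.choices(data, k=len(data))


data = [10, 20, 30, 40, 50]
sample = bootstrap_sample(data)

# Items that were never drawn form the "out-of-bag" set for this resample.
out_of_bag = [x for x in data if x not in sample]
print(sample, out_of_bag)
```

Because draws are made with replacement, some items usually appear more than once and others are left out, which is what distinguishes the bootstrap from a plain random subsample.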