InterviewSolution: 39+ Data Science Interview Questions for Experienced Candidates


1.	How is grid search different from the random search tuning strategy?
Answer»

Tuning strategies are used to find the right set of hyperparameters. Hyperparameters are model-specific settings that are fixed before the model is trained on the dataset; they are not learned from the data. Grid search and random search are both optimization techniques for finding good hyperparameters.

Grid Search:
- Every combination of values from a preset list for each hyperparameter is tried out and evaluated.
- The search resembles walking over a grid: the candidate values form a matrix, and every cell is visited. Each parameter set is evaluated and its accuracy (or other validation score) is tracked. After every combination has been tried, the model with the highest accuracy is chosen as the best one.
- The main drawback is that the technique scales poorly as the number of hyperparameters grows. The number of evaluations is the product of the number of candidate values for each hyperparameter, so it grows exponentially with every hyperparameter added. This is known as the curse of dimensionality in grid search.

Random Search:
- Random combinations of hyperparameter values are sampled and evaluated to find the best solution. Instead of visiting every point of a grid, the objective is evaluated at random configurations in the parameter space.
- Because every trial draws a fresh value for each hyperparameter, random search tries many more distinct values of each individual hyperparameter than a grid with the same budget, which increases the chance of landing near good values. It also does not sample on a fixed lattice, so it is less likely to miss a good region that falls between grid points (aliasing).
- This search works best when only a few of the hyperparameters really matter (a low effective number of dimensions), because it then finds a good setting in far fewer trials than an exhaustive grid.
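To make the contrast concrete, here is a minimal sketch of both strategies with the same budget of 16 evaluations. The scoring function `validation_score`, the hyperparameter names, and their ranges are hypothetical stand-ins for training a real model and scoring it on validation data.

```python
import itertools
import random


def validation_score(learning_rate, depth):
    """Hypothetical stand-in for training a model and scoring it on validation data.

    It peaks at learning_rate=0.1, depth=6; a real search would train a model here.
    """
    return -100 * (learning_rate - 0.1) ** 2 - 0.01 * (depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the preset values in `grid`."""
    names = list(grid)
    best_score, best_params, evaluations = None, None, 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        evaluations += 1
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_params, evaluations


def random_search(space, n_trials, seed=0):
    """Evaluate `n_trials` random configurations drawn from `space`."""
    rng = random.Random(seed)
    best_score, best_params = None, None
    for _ in range(n_trials):
        # Each trial draws a fresh value for every hyperparameter.
        params = {name: sample(rng) for name, sample in space.items()}
        score = validation_score(**params)
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_params, n_trials


grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 6, 8]}
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
    "depth": lambda rng: rng.randint(2, 8),
}

print(grid_search(grid))          # 4 x 4 = 16 evaluations
print(random_search(space, 16))   # same budget of 16 evaluations
```

Adding a third hyperparameter with four candidate values would raise the grid to 4 x 4 x 4 = 64 evaluations, while random search keeps whatever budget `n_trials` is chosen.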

Conclusion:

Data Science is a vast field that comprises many topics such as Data Mining, Data Analysis, Data Visualization, Machine Learning, and Deep Learning, and, most importantly, it rests on a foundation of mathematical concepts such as Linear Algebra and Statistical Analysis. Because becoming a good professional Data Scientist requires many prerequisites, the rewards are correspondingly large, and Data Scientist has become one of the most sought-after job roles today.


4.	How is feature selection performed using the regularization method?
Answer» Regularization adds penalties on the parameters of a machine learning model to reduce the model's freedom and so avoid overfitting. There are various regularization methods, such as regularization of linear models and Lasso/L1 regularization. Linear model regularization applies the penalty to the coefficients that multiply the predictors. Lasso/L1 regularization can shrink some coefficients exactly to zero, and the features with zero coefficients can then be removed from the model.
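The sketch below shows how L1 regularization performs feature selection: a plain-Python Lasso fitted by cyclic coordinate descent, where the soft-thresholding step sets weak coefficients to exactly zero. It assumes centered features (so no intercept is fitted), and the toy data and the penalty `alpha=0.2` are illustrative choices, not part of the original answer.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values inside [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n)||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    Assumes the columns of X are centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Toy data: y depends strongly on features 0 and 2 and only weakly on feature 1.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [3 * a + 0.1 * b - 2 * c for a, b, c in X]

coef = lasso_coordinate_descent(X, y, alpha=0.2)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print(coef, selected)  # feature 1 is dropped: selected == [0, 2]
```

Ordinary least squares would keep a small nonzero weight of 0.1 on feature 1; the L1 penalty removes it entirely while only shrinking the strong coefficients.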

6.	Is it good to do dimensionality reduction before fitting a Support Vector Machine?
Answer» If the number of features is greater than the number of observations, performing dimensionality reduction can improve the SVM (Support Vector Machine).

7.	Give one example where false positives and false negatives are equally important.
Answer» Banking: lending is the main source of income for banks, but if the repayment rate is poor, loans produce huge losses instead of profits. Giving out loans is therefore a gamble: banks cannot risk turning away good customers, and at the same time they cannot afford to acquire bad customers. This is a classic example of false positives and false negatives being equally important.

11.	What is better - random forest or multiple decision trees?
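Answer» A random forest is better than multiple independent decision trees because it is much more robust, more accurate, and less prone to overfitting. It is an ensemble method: it trains many decision trees on bootstrap samples of the data, using random subsets of features, and combines their votes, so that many weak, decorrelated trees together form a strong learner.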
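Why can many weak trees make a strong learner? The sketch below shows majority voting, plus an idealized calculation of ensemble accuracy that assumes the trees' errors are independent. That assumption never holds exactly for real trees, which is why random forests use bootstrap samples and random feature subsets to decorrelate them; the 0.7 tree accuracy and the class labels are illustrative.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Combine the class labels predicted by individual trees."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_accuracy(tree_accuracy, n_trees):
    """Accuracy of a majority vote over n_trees (odd) trees whose errors are independent.

    Idealized: real trees are correlated, so real gains are smaller.
    """
    needed = n_trees // 2 + 1  # votes required for a majority
    return sum(
        comb(n_trees, k) * tree_accuracy ** k * (1 - tree_accuracy) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


print(majority_vote(["default", "repay", "repay"]))  # repay
print(round(ensemble_accuracy(0.7, 1), 5))  # 0.7
print(round(ensemble_accuracy(0.7, 5), 5))  # 0.83692
```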

13.	Differentiate between box plot and histogram.
Answer» Box plots and histograms are both visualizations that show data distributions and communicate information efficiently. A histogram is a bar chart of the frequency of a numerical variable's values in consecutive bins; it is useful for estimating the probability distribution, its variation, and outliers. A box plot summarizes the distribution with its median, quartiles, and whiskers; the detailed shape of the distribution is not visible, but useful insights can still be gathered. Because box plots take less space than histograms, they are useful for comparing several distributions side by side in one chart.
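The difference is easiest to see in the numbers each chart is built from: a box plot draws a five-number summary, while a histogram draws one count per bin. This is a minimal sketch assuming non-constant numeric data; quartile definitions differ between tools, and the "inclusive" method used here is one common choice.

```python
import statistics


def five_number_summary(data):
    """The numbers a box plot draws: min, Q1, median, Q3, max."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def histogram_counts(data, n_bins):
    """Counts per equal-width bin: the bar heights of a histogram.

    Assumes the data are not all equal (otherwise the bin width would be zero).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The maximum value falls on the right edge; keep it in the last bin.
        index = min(int((x - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(data))  # (1, 3.0, 5.0, 7.0, 9)
print(histogram_counts(data, 4))  # [2, 2, 2, 3]
```

Five numbers per distribution is why box plots fit comfortably side by side, whereas a histogram needs a full set of bars for each one.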

14.	What do you understand by a kernel trick?
Answer» Kernel functions are generalized dot product functions that compute the dot product of vectors x and y in a high-dimensional feature space without explicitly mapping the vectors into that space. The kernel trick solves a non-linear problem with a linear classifier: linearly inseparable data are implicitly transformed into a higher-dimensional space where they become linearly separable.
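A small numeric check of the kernel trick, using the degree-2 polynomial kernel k(x, y) = (x · y)^2 on 2-D inputs as an illustrative choice of kernel. The kernel value computed in the original space matches the dot product of the explicitly mapped 3-D feature vectors, so the mapping never has to be computed.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


x, y = (1, 2), (3, 4)
print(poly_kernel(x, y))    # 121, computed in the original 2-D space
print(dot(phi(x), phi(y)))  # about 121.0, computed in the 3-D feature space
```

For higher-degree or RBF kernels the implicit feature space grows very large (infinite for RBF), which is exactly when skipping the explicit mapping pays off.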

15.	What is the difference between the Test set and validation set?
Answer» The test set is used to evaluate the performance of the trained model; it measures the model's predictive power on data it has never seen. The validation set is a portion held out from the training data that is used to select hyperparameters and avoid overfitting the model.
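A minimal sketch of producing the three disjoint sets. The 70/15/15 proportions, the seed, and the function name are illustrative assumptions rather than a fixed rule.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then carve off disjoint test and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
# Fit candidate models on `train`, pick hyperparameters by their score on `val`,
# and report the score on `test` only once, for the final chosen model.
```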

19.	Will treating categorical variables as continuous variables result in a better predictive model?
Answer» Yes, when the variable is ordinal. A categorical variable is one that can be assigned to two or more categories with no definite ordering. Ordinal variables are similar to categorical variables but have a clear, well-defined ordering. So, if the variable is ordinal, treating its categorical values as a continuous variable can result in a better predictive model.
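The sketch below contrasts the two cases. An ordinal variable can be mapped to ranks that preserve its order; a nominal one is safer as one indicator column per category, because ranks would imply an order that does not exist. The category names are illustrative.

```python
def ordinal_encode(values, order):
    """Map ordered categories to ranks 0, 1, 2, ... so the ordering is preserved."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values, categories):
    """One indicator column per category, which avoids implying a fake ordering."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# Ordinal: "low" < "medium" < "high", so a numeric rank is meaningful.
print(ordinal_encode(["low", "high", "medium"], ["low", "medium", "high"]))  # [0, 2, 1]

# Nominal: colors have no order, so one indicator per color is used instead.
print(one_hot_encode(["red", "blue"], ["red", "green", "blue"]))  # [[1, 0, 0], [0, 0, 1]]
```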

29.	Since you have experience in the deep learning field, can you tell us why TensorFlow is the most preferred library in deep learning?
Answer» TensorFlow is a very popular deep learning library, and the reasons are fairly simple. It provides both C++ and Python APIs, which makes it easy to work with. TensorFlow is also often credited with faster compilation than Keras and Torch (other well-known deep learning libraries). In addition, TensorFlow supports both GPU and CPU computing devices. Together, these factors have made it a major success and a very popular library for deep learning.

31.	What are Exploding Gradients and Vanishing Gradients?
Answer» Exploding Gradients: Suppose you are training an RNN and observe error gradients that grow exponentially as they accumulate, so that very large updates are made to the network's weights. Such exponentially growing error gradients, which change the weights drastically, are called exploding gradients. Vanishing Gradients: Suppose again that you are training an RNN, but now the slope (gradient) becomes too small. This problem is called the vanishing gradient. It greatly increases training time and leads to poor performance and very low accuracy.
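Both problems come from the chain rule: the gradient that reaches an early time step (or layer) is a product of one local derivative per step. The sketch below makes a simplifying assumption that every step contributes the same scalar factor, which is enough to show why factors slightly above 1 explode and factors below 1 vanish.

```python
def gradient_after(steps, factor):
    """Scale of a gradient backpropagated through `steps` time steps (or layers),
    each of which multiplies it by the same local derivative `factor`."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor  # chain rule: one multiplication per step
    return grad


for factor in (1.5, 1.0, 0.5):
    print(factor, gradient_after(50, factor))
# 1.5 -> about 6.4e8 (exploding), 1.0 -> 1.0 (stable), 0.5 -> about 8.9e-16 (vanishing)
```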

33.	What is a computational graph?
Answer» A computational graph is also known as a “Dataflow Graph”. Everything in the well-known deep learning library TensorFlow is based on the computational graph. A computational graph in TensorFlow is a network of nodes in which each node performs an operation: the nodes represent operations and the edges represent the tensors that flow between them.
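A toy dataflow graph in plain Python, to illustrate the idea. This is not TensorFlow's API: the `Node` class and its operation names are hypothetical, and the edges here carry plain numbers instead of tensors.

```python
import operator


class Node:
    """A node in a toy dataflow graph: a constant or an operation on input nodes."""

    OPS = {"add": operator.add, "mul": operator.mul}

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add" or "mul"
        self.inputs = inputs  # incoming edges: the nodes whose outputs feed this one
        self.value = value    # only used by "const" nodes

    def evaluate(self):
        if self.op == "const":
            return self.value
        # Evaluate the inputs first, then apply this node's operation to their outputs.
        args = [node.evaluate() for node in self.inputs]
        return Node.OPS[self.op](*args)


# Graph for y = (a * b) + c
a, b, c = Node("const", value=2), Node("const", value=3), Node("const", value=4)
product = Node("mul", (a, b))
y = Node("add", (product, c))
print(y.evaluate())  # 10
```

Describing the computation as a graph first and evaluating it afterwards is what lets a framework analyze, optimize, or differentiate the whole computation before running it.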

39.	How are the time series problems different from other regression problems?
Answer» Time series analysis can be thought of as an extension of linear regression that uses concepts such as autocorrelation and moving averages to summarize the historical values of the target (y-axis) variable and predict its future values. Forecasting is the main goal of time series problems; accurate predictions can be made even when the underlying causes are not known. Having time in a problem does not necessarily make it a time series problem: there must be a relationship between the target and time. Observations close to one another in time are expected to be more similar than observations far apart, and time series models also account for seasonality. For instance, today's weather is likely to be similar to tomorrow's weather but not to the weather four months from now. Hence, weather prediction based on past data is a time series problem.
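Autocorrelation, mentioned above, measures how similar a series is to a lagged copy of itself. This minimal sketch computes the sample autocorrelation; the two toy series are illustrative. A smooth trend gives a high lag-1 value (neighbours are similar), while an alternating series gives a strongly negative one.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


trend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
zigzag = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
print(round(autocorrelation(trend, 1), 3))   # 0.7: neighbours are similar
print(round(autocorrelation(zigzag, 1), 3))  # -0.9: neighbours alternate
```

A peak at a longer lag (for example lag 12 in monthly data) is the usual sign of seasonality.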

37.	What are Support Vectors in SVM (Support Vector Machine)?
Answer» In a linear SVM, the margin lines mark the distance from the classifier to the closest data points. These closest points are called support vectors. So, support vectors can be defined as the data points (vectors) that lie nearest to the separating hyperplane. They determine the position of the hyperplane, and because they "support" it, they are known as support vectors.
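Given a separating hyperplane w · x + b = 0, the support vectors are the points with the smallest functional margin y(w · x + b). In this sketch the hyperplane is hard-coded as if it came from a trained linear SVM (for these six illustrative points, x1 + x2 - 3 = 0 is in fact the maximum-margin hyperplane); the function names are hypothetical.

```python
import math


def decision_value(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def support_vectors(w, b, points, labels, tol=1e-9):
    """Points with the smallest functional margin y * (w . x + b), i.e. closest to the hyperplane."""
    margins = [label * decision_value(w, b, x) for x, label in zip(points, labels)]
    closest = min(margins)
    return [x for x, m in zip(points, margins) if abs(m - closest) <= tol]


# Hyperplane x1 + x2 - 3 = 0, assumed to come from a trained linear SVM.
w, b = (1, 1), -3
points = [(0, 0), (1, 0), (1, 1), (2, 2), (3, 3), (3, 2)]
labels = [-1, -1, -1, 1, 1, 1]

svs = support_vectors(w, b, points, labels)
print(svs)  # [(1, 1), (2, 2)]
# Distance from each support vector to the hyperplane: |w . x + b| / ||w||.
print([abs(decision_value(w, b, x)) / math.hypot(*w) for x in svs])  # about 0.707 each
```

Moving any other point slightly (while keeping it outside the margin) leaves the hyperplane unchanged; moving a support vector would shift it.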


How is grid search different from the random search tuning strategy?

What is the importance of dimensionality reduction?

How do you identify if a coin is biased?

How is feature selection performed using the regularization method?

What are various assumptions used in linear regression? What would happen if they are violated?

Is it good to do dimensionality reduction before fitting a Support Vector Machine?

Give one example where false positives and false negatives are equally important.

What are some examples where a false positive has proven more important than a false negative?

A coin is selected from a jar of 1000 coins and tossed 10 times. Of the 1000 coins, 999 are fair and 1 is double-headed. Assuming you see 10 heads, estimate the probability of getting a head on the next toss.

Consider a case where you know the probability of finding at least one shooting star in a 15-minute interval is 30%. Evaluate the probability of finding at least one shooting star in a one-hour duration?

What is better - random forest or multiple decision trees?

How will you balance/correct imbalanced data?

Differentiate between box plot and histogram.

What do you understand by a kernel trick?

What is the difference between the Test set and validation set?

What are the differences between univariate, bivariate and multivariate analysis?

What does the ROC Curve represent and how do you create it?

How will you treat missing values during data analysis?

Will treating categorical variables as continuous variables result in a better predictive model?

During analysis, how do you treat the missing values?

What are the available feature selection methods for selecting the right variables for building efficient predictive models?

Why is data cleaning crucial? How do you clean the data?

Why do we need selection bias?

How regularly must we update an algorithm in the field of machine learning?

How do you approach solving any data analytics based project?

What are the differences between correlation and covariance?

What is Cross-Validation?

Suppose there is a dataset having variables with missing values of more than 30%, how will you deal with such a dataset?

Since you have experience in the deep learning field, can you tell us why TensorFlow is the most preferred library in deep learning?

What is the p-value and what does it indicate in the Null Hypothesis?

What are Exploding Gradients and Vanishing Gradients?

What are auto-encoders?

What is a computational graph?

What is a Generative Adversarial Network?

Explain Neural Network Fundamentals.

So, you have done some projects in machine learning and data science, and we see you are a bit experienced in the field. Let's say your laptop's RAM is only 4GB and you want to train your model on a 10GB dataset. How would you do it?

What are Support Vectors in SVM (Support Vector Machine)?

What are RMSE and MSE in a linear regression model?

How are the time series problems different from other regression problems?