Interview Solutions

This section contains curated data science interview questions with detailed answers to sharpen your knowledge and support interview preparation.

1. How is the grid search parameter different from the random search tuning strategy?

Answer» Tuning strategies are used to find the right set of hyperparameters. Hyperparameters are properties that are fixed and model-specific, set before the model is trained on the dataset. Both grid search and random search are optimization techniques for finding efficient hyperparameters. Grid search exhaustively evaluates every combination in a predefined grid of hyperparameter values, whereas random search samples a fixed number of combinations at random from the specified ranges, which is often much cheaper and works well when only a few hyperparameters really matter.

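For illustration (not part of the original answer), here is a minimal scikit-learn sketch contrasting the two strategies; the dataset, model, and parameter grid are arbitrary choices made for the example:

```python
# Grid search tries every combination in param_grid;
# random search samples n_iter combinations from the same ranges.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0)

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

grid = GridSearchCV(model, param_grid, cv=5)  # exhaustive: all 9 combinations
grid.fit(X, y)

rand = RandomizedSearchCV(model, param_grid, n_iter=4, cv=5, random_state=0)  # samples 4 of the 9
rand.fit(X, y)

print("grid best:", grid.best_params_)
print("random best:", rand.best_params_)
```
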
2. What is the importance of dimensionality reduction?

Answer» Dimensionality reduction is the process of reducing the number of features in a dataset to avoid overfitting and reduce variance. Its main advantages are: it reduces the storage space and computation time required, it removes redundant and highly correlated features, it makes the data easier to visualize in two or three dimensions, and it mitigates the curse of dimensionality.

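As a small illustrative sketch (dataset chosen arbitrarily), PCA reduces the 64 pixel features of the digits dataset to 2 components:

```python
# PCA projects the data onto the directions of maximum variance.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # shape (1797, 64)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)      # shape (1797, 2)
print(X_reduced.shape, pca.explained_variance_ratio_)
```
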
3. How do you identify if a coin is biased?

Answer» To identify this, we perform a hypothesis test. State the null hypothesis H0 that the coin is fair (probability of heads p = 0.5) and the alternative hypothesis H1 that it is biased (p ≠ 0.5). Toss the coin n times, count the number of heads, and compute the probability (p-value) of observing a result at least as extreme under H0 using the binomial distribution. If the p-value is below the chosen significance level (commonly 0.05), we reject H0 and conclude the coin is biased; otherwise, we do not have enough evidence to call it biased.

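A minimal sketch of this test using only the Python standard library; the counts (61 heads in 100 tosses) are a hypothetical example:

```python
# Two-sided binomial test for H0: p = 0.5 (fair coin).
from math import comb

def binomial_p_value(heads, n, p=0.5):
    """Probability under H0 of a result at least as extreme as `heads`."""
    observed = comb(n, heads) * p**heads * (1 - p)**(n - heads)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    return sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(n + 1)
        if comb(n, k) * p**k * (1 - p)**(n - k) <= observed + 1e-12
    )

print(binomial_p_value(heads=61, n=100))  # ~0.035 < 0.05 -> evidence of bias
```
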
4. How is feature selection performed using the regularization method?

Answer» The method of regularization entails the addition of penalties on the parameters of the machine learning model, reducing the model's freedom in order to avoid overfitting. In particular, L1 (Lasso) regularization penalizes the absolute values of the coefficients, which shrinks some of them exactly to zero; the features whose coefficients remain non-zero are the ones selected.

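A minimal sketch of L1-based selection with scikit-learn; the synthetic dataset and the alpha value are arbitrary choices for the example:

```python
# L1 regularization zeroes out coefficients of weak features,
# which amounts to feature selection.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)

selected = np.flatnonzero(lasso.coef_)  # indices of features with non-zero weight
print("selected features:", selected)
```
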
5. What are various assumptions used in linear regression? What would happen if they are violated?

Answer» Linear regression is done under the following assumptions: the relationship between the predictors and the target is linear, the errors are independent of each other, the errors have constant variance (homoscedasticity), the errors are normally distributed, and the predictors are not highly correlated with each other (no severe multicollinearity).
Extreme violations of these assumptions lead to unreliable results. Smaller violations result in greater variance or bias of the estimates.

6. Is it good to do dimensionality reduction before fitting a Support Vector Model?

Answer» If the number of features is greater than the number of observations, then performing dimensionality reduction generally improves the SVM (Support Vector Machine).

7. Give one example where false positives and false negatives are equally important.

Answer» In the banking field: lending loans is the main source of income for banks, but if the repayment rate isn't good, there is a risk of huge losses instead of profits. Giving out loans is therefore a gamble: banks cannot risk losing good customers (rejecting creditworthy applicants), but at the same time they cannot afford to acquire bad customers (approving applicants who will default). This is a classic example where false positives and false negatives are equally important.

8. What are some examples where a false positive has proven more important than a false negative?

Answer» Before citing instances, let us understand what false positives and false negatives are. A false positive occurs when the model incorrectly predicts the positive class (flagging a condition that is not actually present), while a false negative occurs when the model incorrectly predicts the negative class (missing a condition that is actually present).
Some examples where false positives are more important than false negatives are: spam filtering, where marking a legitimate and possibly critical email as spam (a false positive) is costlier than letting an occasional spam message through; and criminal justice, where wrongly convicting an innocent person (a false positive) is considered worse than failing to convict a guilty one.

9. A jar contains 1000 coins, of which 999 are fair and 1 is double-headed. You pick a coin at random and toss it 10 times, observing 10 heads. Estimate the probability of getting a head on the next toss.

Answer» There are two types of coins, fair and double-headed, hence two possible ways of choosing a coin: P(fair coin) = 999/1000 = 0.999 and P(double-headed coin) = 1/1000 = 0.001.
Using Bayes' rule, let A = P(fair coin and 10 heads) = 0.999 × (1/2)^10 = 0.999 / 1024 ≈ 0.000976, and B = P(double-headed coin and 10 heads) = 0.001 × 1 = 0.001.
Then P(fair | 10 heads) = A / (A + B) = 0.000976 / 0.001976 ≈ 0.4939 and P(double-headed | 10 heads) = B / (A + B) = 0.001 / 0.001976 ≈ 0.5061.
P(head on next toss) = 0.4939 × 0.5 + 0.5061 × 1 ≈ 0.7531. So the answer is 0.7531, or about 75.3%.

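A short numerical check of the same computation (purely illustrative):

```python
# Verifying the Bayes computation above.
p_fair, p_double = 999 / 1000, 1 / 1000
a = p_fair * (0.5 ** 10)      # P(fair coin AND 10 heads)
b = p_double * 1.0            # P(double-headed coin AND 10 heads)

p_fair_given_heads = a / (a + b)
p_double_given_heads = b / (a + b)
p_next_head = p_fair_given_heads * 0.5 + p_double_given_heads * 1.0
print(round(p_next_head, 4))  # ~0.7531
```
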
10. Consider a case where you know the probability of finding at least one shooting star in a 15-minute interval is 30%. Evaluate the probability of finding at least one shooting star in a one-hour duration.

Answer» We know that the probability of sighting at least one shooting star in 15 minutes is P(sighting in 15 min) = 30% = 0.3.
Hence, the probability of not sighting any shooting star in 15 minutes = 1 − 0.3 = 0.7.
An hour consists of four independent 15-minute intervals, so the probability of not finding a shooting star in 1 hour = 0.7^4 = 0.2401.
Therefore, the probability of finding at least one shooting star in 1 hour = 1 − 0.2401 = 0.7599, i.e. about 75.99%.

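A tiny sketch of the same arithmetic, making the independence assumption explicit:

```python
# The hour is treated as four independent 15-minute intervals.
p_15 = 0.3
p_none_hour = (1 - p_15) ** 4          # 0.7 ** 4 = 0.2401
p_at_least_one_hour = 1 - p_none_hour  # 0.7599
print(p_at_least_one_hour)
```
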
11. What is better - random forest or multiple decision trees?

Answer» A random forest is better than multiple independent decision trees. Random forests are more robust, more accurate, and less prone to overfitting because they are an ensemble method: many decision trees are trained on bootstrapped samples with random feature subsets, and their predictions are aggregated, so the combination of many weak learners yields a strong learner.

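An illustrative comparison on a synthetic dataset (dataset and settings are arbitrary choices for the sketch):

```python
# Comparing a single decision tree with a random forest via cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

tree_score = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest_score = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5
).mean()
print(f"single tree: {tree_score:.3f}, random forest: {forest_score:.3f}")
```
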
12. How will you balance/correct imbalanced data?

Answer» There are different techniques to correct/balance imbalanced data. It can be done by increasing the number of samples for the minority classes (oversampling) or decreasing the number of samples for classes with extremely many data points (undersampling). Some common approaches are: random or synthetic oversampling of the minority class (e.g., SMOTE), random undersampling of the majority class, assigning class weights in the learning algorithm, and choosing evaluation metrics that are robust to imbalance (precision, recall, F1-score, ROC-AUC) instead of plain accuracy.
For example, consider a training set in which 99.9% of the labels are "0". If we measure the model's accuracy in terms of predicting "0"s, the accuracy would be very high (99.9%), yet the model conveys no valuable information about the rare class. In such cases, we apply the alternative evaluation metrics stated above.

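A minimal sketch of two common remedies, class weighting and random oversampling; the synthetic data and the choice of logistic regression are assumptions made for the example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=5000, weights=[0.99, 0.01], random_state=0)

# Option 1: let the loss function re-weight the rare class.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# Option 2: randomly oversample the minority class before fitting.
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, n_samples=int((y == 0).sum()), random_state=0)
X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
clf_bal = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
```
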
13. Differentiate between box plot and histogram.

Answer» Box plots and histograms are both visualizations used for showing data distributions and communicating information efficiently. A histogram shows the frequency of values across bins and therefore reveals the shape of the distribution, its skewness, and any multiple peaks. A box plot summarizes the distribution with the five-number summary (minimum, first quartile, median, third quartile, maximum) and explicitly flags outliers; it is more compact, which makes it convenient for comparing the same variable across many groups, but it conveys less detail about the exact shape of the distribution than a histogram does.

14. What do you understand by a kernel trick?

Answer» Kernel functions are generalized dot-product functions used for computing the dot product of vectors x and y in a high-dimensional feature space without explicitly mapping the data into that space. The kernel trick is used to solve a non-linear problem with a linear classifier by implicitly transforming linearly inseparable data into a higher-dimensional space where it becomes separable.

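A small sketch of the idea: an RBF kernel separates concentric circles that a linear SVM cannot (the dataset and parameters are arbitrary choices for the example):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

linear_acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
rbf_acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
print(f"linear kernel: {linear_acc:.2f}, RBF kernel: {rbf_acc:.2f}")
```
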
15. What is the difference between the test set and validation set?

Answer» The test set is used to evaluate the performance of the trained model; it measures the predictive power of the model on unseen data and is used only once, at the very end. The validation set, by contrast, is a part of the training data held out for tuning hyperparameters and selecting between models; it helps avoid overfitting during model development.

16. What are the differences between univariate, bivariate and multivariate analysis?

Answer» Statistical analyses are classified based on the number of variables processed at a given time. Univariate analysis involves a single variable and describes its distribution (e.g., mean, median, variance, histograms). Bivariate analysis involves exactly two variables and studies the relationship between them (e.g., correlation, scatter plots, simple regression). Multivariate analysis involves more than two variables and examines their joint behaviour and interactions (e.g., multiple regression, principal component analysis).

17. What does the ROC Curve represent and how to create it?

Answer» The ROC (Receiver Operating Characteristic) curve is a graphical representation of the contrast between false-positive rates and true-positive rates at different thresholds. The curve is used as a proxy for the trade-off between sensitivity and specificity. The ROC curve is created by plotting values of the true-positive rate (TPR, or sensitivity) against the false-positive rate (FPR, or 1 − specificity) at every classification threshold. The TPR represents the proportion of observations correctly predicted as positive out of all positive observations, while the FPR represents the proportion of negative observations incorrectly predicted as positive. Consider the example of medical testing: the TPR represents the rate at which people are correctly tested positive for a particular disease.

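An illustrative sketch of building an ROC curve from predicted probabilities (dataset and model are arbitrary choices for the example):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, probs)  # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_te, probs))
```
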
18. How will you treat missing values during data analysis?

Answer» The impact of missing values can be assessed after identifying what kind of variables have the missing values and how extensive the missingness is. If the missing values follow a pattern, that pattern itself may carry useful information. Otherwise, numeric variables are typically imputed with the mean or median, categorical variables with the mode or a dedicated "missing" category, and variables or rows with a very high proportion of missing values may simply be dropped.

19. Will treating categorical variables as continuous variables result in a better predictive model?

Answer» Yes, provided the variable is ordinal. A categorical variable is a variable that can be assigned to two or more categories with no definite ordering among the categories. Ordinal variables are similar to categorical variables but have a proper and clear ordering defined. So, if the variable is ordinal, treating it as a continuous variable will result in better predictive models; for purely nominal variables it generally will not.

20. During analysis, how do you treat the missing values?

Answer» To identify the extent of missing values, we first have to identify the variables with missing values. If a pattern is identified, the analyst should concentrate on it, as it could lead to interesting and meaningful insights. If no pattern is identified, we can substitute the missing values with the mean or median (for numerical variables) or simply ignore them. If the variable is categorical, a default value such as the most frequent category (mode) or a dedicated "missing" level is assigned. If the data follows a normal distribution, the mean value is a reasonable substitute for missing numeric values. If 80% of the values are missing for a particular variable, then we would drop the variable instead of treating the missing values.

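A minimal pandas sketch of the typical treatments; the DataFrame and the 80% threshold are made-up values for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 40, 31, np.nan],
    "city": ["Pune", None, "Delhi", "Delhi", None],
})

df["age"] = df["age"].fillna(df["age"].median())      # numeric -> median
df["city"] = df["city"].fillna(df["city"].mode()[0])  # categorical -> mode
# Columns with too many missing values (e.g. > 80%) are usually dropped instead:
df = df.dropna(axis=1, thresh=int(0.2 * len(df)))
print(df)
```
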
21. What are the available feature selection methods for selecting the right variables for building efficient predictive models?

Answer» While using a dataset in data science or machine learning algorithms, not all variables are necessary and useful for building a model. Smarter feature selection methods are required to avoid redundant features and to increase the efficiency of the model. The three main families of feature selection methods are: filter methods, which rank features with statistical measures such as correlation, chi-square, or mutual information independently of any model; wrapper methods, which search over feature subsets by repeatedly training a model (forward selection, backward elimination, recursive feature elimination); and embedded methods, where the selection happens as part of model training itself (for example L1/Lasso regularization or tree-based feature importances). A small example of each family is sketched after this answer.

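An illustrative sketch showing one method from each family; the dataset, estimators, and the choice of keeping 4 features are assumptions made for the example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=15, n_informative=4, random_state=0)

filter_sel = SelectKBest(score_func=f_classif, k=4).fit(X, y)                            # filter
wrapper_sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4).fit(X, y)   # wrapper
embedded = RandomForestClassifier(random_state=0).fit(X, y)                              # embedded

print(filter_sel.get_support(indices=True))
print(wrapper_sel.get_support(indices=True))
print(embedded.feature_importances_.argsort()[-4:])
```
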
22. Why is data cleaning crucial? How do you clean the data?

Answer» While running an algorithm on any data, it is necessary to have correct and clean data that contains only relevant information in order to gather proper insights. Dirty data most often results in poor or incorrect insights and predictions, which can have damaging effects. For example, while launching a big campaign to market a product, if our data analysis tells us to target a product that in reality has no demand, the campaign is bound to fail and the company loses revenue. This is where the importance of proper and clean data comes into the picture.
Cleaning typically involves removing duplicate records, handling missing values, fixing inconsistent formats and data types, correcting obvious entry errors, and treating or removing outliers. The main advantages of data cleaning are more reliable analysis, better model accuracy, and less time wasted on debugging downstream results.

23. Why do we need to be aware of selection bias?

Answer» Selection bias happens in cases where no proper randomization is achieved while picking a part of the dataset for analysis. This bias means that the sample analyzed does not represent the whole population that was meant to be analyzed, so any conclusions drawn from it may not generalize.

24. How regularly must we update an algorithm in the field of machine learning?

Answer» We do not want to update and make changes to an algorithm on a regular basis, as an algorithm is a well-defined step-by-step procedure to solve a problem; if the steps keep changing, it can no longer be called well defined. Frequent changes also create problems for the systems already implementing the algorithm, as it becomes difficult to absorb continuous updates. So, we should update an algorithm only in cases such as: the underlying data distribution changes and the model no longer generalizes (data drift or non-stationarity), the results are no longer accurate enough for the business need, or the underlying data source or infrastructure changes.

25. How do you approach solving any data analytics based project?

Answer» Generally, we follow the below steps: understand the business problem and define the objective, collect and explore the data, clean and prepare the data, perform exploratory data analysis to find patterns and relationships, build and validate models or analyses, communicate the insights to stakeholders, and finally deploy the solution and track its performance over time.

26. What are the differences between correlation and covariance?

Answer» Although both terms are used for establishing a relationship and dependency between any two random variables, the following are the differences between them: covariance measures the direction of the linear relationship between the variables and is expressed in the product of their units, whereas correlation measures both the strength and the direction of that relationship and is a standardized, dimensionless quantity lying between −1 and 1.
Mathematically, consider two random variables X and Y whose means are μX and μY respectively, whose standard deviations are σX and σY respectively, and where E represents the expected-value operator; the formulas are given after this answer.
Based on these formulas, we can deduce that correlation is dimensionless, whereas covariance is represented in units obtained from the multiplication of the units of the two variables.

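The standard formulas referred to above, written out for completeness:

```latex
\operatorname{cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big]
\qquad
\operatorname{corr}(X, Y) = \rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
```
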
27. What is Cross-Validation?

Answer» Cross-validation is a statistical technique used for improving a model's performance. Here, the model is trained and tested in rotation using different samples of the training dataset, to ensure that the model performs well on unknown data. The training data is split into several groups (folds) and the model is run and validated against these groups in rotation. The most commonly used techniques are k-fold cross-validation, stratified k-fold cross-validation, leave-one-out cross-validation, and the simple hold-out (train/validation split) method.

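A minimal sketch of 5-fold cross-validation with scikit-learn; the dataset and model are arbitrary choices for the example:

```python
# Each of the 5 folds is held out once for validation while the rest train the model.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores, scores.mean())
```
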
28. Suppose there is a dataset having variables with missing values of more than 30%, how will you deal with such a dataset?

Answer» Depending on the size of the dataset, we follow one of the below ways: if the dataset is large, we can simply remove the rows (or even the affected variables) with missing values, since the remaining data is still sufficient to build a model; if the dataset is small, we substitute the missing values with the mean, median, or mode of the remaining data, or use model-based imputation, so that we do not lose scarce information.

29. Since you have experience in the deep learning field, can you tell us why TensorFlow is the most preferred library in deep learning?

Answer» TensorFlow is a very famous library in deep learning, and the reason is pretty simple: it provides both C++ and Python APIs, which makes it much easier to work with. TensorFlow also has faster compilation speed compared to other popular deep learning libraries such as Keras and Torch. Apart from that, TensorFlow supports both GPU and CPU computing devices. Hence, it is a major success and a very popular library for deep learning.

30. What is the p-value and what does it indicate in the Null Hypothesis?

Answer» The p-value is a number that ranges from 0 to 1. In a statistical hypothesis test, the p-value tells us how strong the evidence is against the null hypothesis, which is the claim that is kept for the experiment or trial. A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so we reject it; a large p-value (> 0.05) indicates weak evidence against the null hypothesis, so we fail to reject it; a p-value close to the cutoff is considered marginal and could go either way.

31. What are Exploding Gradients and Vanishing Gradients?

Answer» Exploding gradients occur when, during backpropagation, the gradients grow larger and larger as they are propagated backwards through the layers; the weight updates become huge, training becomes unstable, and the loss may diverge. This is commonly handled with gradient clipping and careful weight initialization. Vanishing gradients occur when the gradients shrink towards zero as they are propagated backwards, so the weights of the earlier layers receive almost no update and the network learns very slowly or not at all. This is commonly mitigated by using ReLU-like activations, better initialization, residual connections, or architectures such as LSTMs.

32. What are auto-encoders?

Answer» Auto-encoders are learning networks that transform inputs into outputs with the minimum possible error, so the output should be almost equal to, or as close as possible to, the input. Multiple layers are added between the input and the output layer, and the layers in between are smaller than the input layer, forming a bottleneck. An auto-encoder receives unlabelled input, encodes it into a compressed representation, and then reconstructs the input from that representation.

33. What is a computational graph?

Answer» A computational graph is also known as a "dataflow graph". Everything in the famous deep learning library TensorFlow is based on the computational graph. The computational graph in TensorFlow is a network of nodes, where each node performs an operation: the nodes of the graph represent operations and the edges represent the tensors flowing between them.

34. What is a Generative Adversarial Network?

Answer» This approach can be understood with the famous example of the wine seller. Let us say that there is a wine seller who has his own shop. He purchases wine from dealers at a low cost so that he can sell it to customers at a higher price. Now, suppose the dealers he purchases from are selling him fake wine: fake wine costs far less than real wine, and the fake and the real wine are indistinguishable to a normal consumer (the customer in this case). The shop owner has some friends who are wine experts, and he sends his wine to them before stocking it for sale. His friends, the wine experts, give him feedback when the wine is probably fake. Since the wine seller has been purchasing from the same dealers for a long time, he wants to make sure their feedback is right before he complains to the dealers. Now, suppose the dealers have also got a tip that the wine seller is suspicious of them. In this situation, the dealers will try their best to sell fake wine, whereas the wine seller will try his best to identify the fake wine. In the corresponding network, a noise vector enters the generator (the dealer), which produces the fake wine, and the discriminator (the wine expert) has to distinguish between the fake wine and the real wine. This is a Generative Adversarial Network (GAN). A GAN has two main components, the generator and the discriminator: the generator is a neural network (often a CNN) that keeps producing samples such as images, and the discriminator tries to identify the real samples from the fake ones.

35. Explain Neural Network Fundamentals.

Answer» In the human brain, different neurons are present; these neurons combine and perform various tasks. The neural network in deep learning tries to imitate the neurons of the human brain. A neural network learns patterns from the data and uses the knowledge it gains from these patterns to predict the output for new data, without any human assistance. A perceptron is the simplest neural network: it contains a single neuron that performs two functions, the first being the weighted sum of all the inputs and the second an activation function. More complicated networks consist of the following three kinds of layers: an input layer that receives the raw features, one or more hidden layers that transform them, and an output layer that produces the prediction. A minimal perceptron sketch is given after this answer.

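A tiny NumPy sketch of a single perceptron (the inputs, weights, and bias are made-up values for illustration):

```python
# A perceptron = weighted sum of inputs + activation function.
import numpy as np

def perceptron(x, w, b):
    weighted_sum = np.dot(w, x) + b        # function 1: weighted sum
    return 1 if weighted_sum > 0 else 0    # function 2: step activation

x = np.array([0.5, -1.2, 3.0])             # example inputs
w = np.array([0.4, 0.3, -0.2])             # example weights
print(perceptron(x, w, b=0.1))
```
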
36. Suppose your laptop's RAM is only 4 GB and you want to train your model on a 10 GB dataset. What will you do? Have you experienced such an issue before?

Answer» In such questions, we first need to ask which ML model we have to train, since the approach differs, for example, between neural networks and SVMs.
For neural networks, the usual approach is to avoid loading the full dataset into memory: stream the data from disk in small batches (for instance with a data generator or input pipeline) and train with mini-batch gradient descent, so only one small batch needs to fit in RAM at a time.
For an SVM, one option is to train on a representative sub-sample of the data, or to use an incremental, out-of-core learning variant (such as a linear model trained with stochastic gradient descent on chunks of the data), since a standard kernel SVM does not scale well to data that does not fit in memory.
Finally, you may describe the situation if you have faced such an issue in your own projects or work in machine learning / data science.

37. What are Support Vectors in SVM (Support Vector Machine)?

Answer» In a plot of an SVM decision boundary, thin margin lines mark the distance from the classifier to the closest data points. These closest data points are called support vectors. So, we can define the support vectors as the data points or vectors that are nearest to the hyperplane; they determine the position and orientation of the hyperplane, and since they "support" the hyperplane in this way, they are known as support vectors.

38. What are RMSE and MSE in a linear regression model?

Answer» RMSE stands for Root Mean Square Error. In a linear regression model, RMSE is used to test the performance of the machine learning model: it evaluates how the data is spread around the line of best fit, i.e. it measures the typical size of the residuals.
MSE, the Mean Squared Error, is used to find how close the line is to the actual data: we take the difference between each data point and the line, square it, sum these squared differences over all data points, and divide by the total number of data points. In other words, MSE is the average of the squared differences between the actual and the predicted values, and RMSE is simply the square root of MSE. The formulas are given below.

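The standard formulas for the two metrics, with \(y_i\) the actual values, \(\hat{y}_i\) the predicted values, and \(N\) the number of data points:

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}
```
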
39. How are the time series problems different from other regression problems?

Answer» In a time series problem, the observations are ordered in time and are usually not independent: the value at one time step depends on (is autocorrelated with) the values at previous time steps, and the series may contain trend and seasonality. Because of this temporal dependence, we cannot randomly shuffle the data when creating train and test sets; the split must respect chronological order, and forecasting is essentially extrapolation into the future. In an ordinary regression problem, the observations are assumed to be independent of each other, their order carries no information, and prediction is usually interpolation within the range of the observed predictors.
