1.

What is the difference between the Sigmoid and Softmax functions?

Answer»

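The sigmoid function is used for binary classification. It maps a single score to a probability between 0 and 1, so the outputs of several independent sigmoids do not need to sum to 1. The softmax function, in contrast, is used for multi-class classification. It maps a vector of scores to a probability distribution over the classes, so the probabilities always sum to 1.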
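A minimal sketch of both functions may make the contrast concrete. The scores below are made-up values chosen for illustration, and the function names are not taken from any particular library.

```python
import math

def sigmoid(z):
    """Map a single score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Map a list of scores to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical raw model outputs

# Sigmoid treats every score on its own: no constraint on the sum.
independent = [sigmoid(s) for s in scores]

# Softmax makes the scores compete: the result is one distribution.
mutually_exclusive = softmax(scores)

print(sum(independent))         # greater than 1 for these scores
print(sum(mutually_exclusive))  # 1 up to floating-point rounding
```

Applying sigmoid to each score separately suits multi-label problems, while softmax suits problems where exactly one class is correct.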

Conclusion

The questions listed above cover the basics of machine learning. The field is advancing quickly, so new concepts will keep emerging. To stay up to date, join communities, attend conferences, and read research papers. Doing so will help you crack any ML interview.

2.

What is Reinforcement Learning?

Answer»

Reinforcement learning is different from other types of learning such as supervised and unsupervised learning. In reinforcement learning, we are given neither a fixed dataset nor labels. Instead, an agent learns from the rewards that the environment gives it for its actions.

3.

What are Parametric and Non-Parametric Models?

Answer»

Parametric models have a fixed, limited number of parameters. To predict new data, you only need to know the parameters of the model.

Non-parametric models place no limit on the number of parameters, which allows for more flexibility. To predict new data, you need to know the state of the data as well as the model parameters.
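As an illustration, the sketch below contrasts a straight-line fit (parametric: two numbers summarize the model) with k-nearest neighbours (non-parametric: the training data itself is needed at prediction time). Both models and the toy data are assumptions chosen for this example.

```python
def fit_line(xs, ys):
    """Least-squares line: the whole model is (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def knn_predict(xs, ys, query, k=2):
    """Average the targets of the k training points closest to query."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / k

xs = [1.0, 2.0, 3.0, 4.0]  # toy inputs
ys = [2.0, 4.0, 6.0, 8.0]  # toy targets

# Parametric: after fitting, the training data can be thrown away.
slope, intercept = fit_line(xs, ys)
line_pred = slope * 2.5 + intercept

# Non-parametric: every prediction consults the stored data.
knn_pred = knn_predict(xs, ys, 2.5)
```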

4.

What is P-value?

Answer»

P-values are used to make a decision in a hypothesis test. The p-value is the smallest significance level at which you can reject the null hypothesis. The lower the p-value, the stronger the evidence against the null hypothesis, and the more likely you are to reject it.
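A small worked example, assuming a two-sided one-sample z-test with a known standard deviation. The test choice, the sample numbers, and the 0.05 significance level are all illustrative assumptions.

```python
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """P-value of a two-sided z-test with known standard deviation."""
    z = (sample_mean - null_mean) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical study: 100 observations with mean 103, tested against
# a null-hypothesis mean of 100 with known sigma 15 (so z = 2.0).
p = two_sided_p_value(sample_mean=103.0, null_mean=100.0, sigma=15.0, n=100)

alpha = 0.05             # chosen significance level
reject_null = p < alpha  # reject when the p-value falls below alpha
print(round(p, 4), reject_null)
```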

5.

Explain Correlation and Covariance?

Answer»

Correlation measures and estimates the quantitative relationship between two variables, that is, how strongly the two variables are related. Examples include income and expenditure, or demand and supply.

Covariance measures how two variables vary together. The problem with covariance is that its value depends on the units of the variables, so covariances are hard to compare without normalization. Correlation is the normalized form of covariance.
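The sketch below shows the normalization issue on made-up income and spending figures: changing the unit of one variable rescales the covariance but leaves the correlation untouched. The helper functions are written out by hand for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / (
        covariance(xs, xs) ** 0.5 * covariance(ys, ys) ** 0.5
    )

income = [30.0, 40.0, 50.0, 60.0]              # hypothetical values
spend_dollars = [20.0, 26.0, 34.0, 40.0]       # hypothetical values
spend_cents = [s * 100 for s in spend_dollars]  # same data, other unit

cov_dollars = covariance(income, spend_dollars)
cov_cents = covariance(income, spend_cents)     # 100 times larger
corr_dollars = correlation(income, spend_dollars)
corr_cents = correlation(income, spend_cents)   # unchanged
```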

6.

Can logistic regression be used for more than 2 classes?

Answer»

By default, logistic regression is a binary classifier, so in its basic form it cannot be applied to more than 2 classes. However, it can be extended to solve multi-class classification problems (multinomial logistic regression).

7.

How do you check the Normality of a dataset?

Answer»

Normality can be checked visually with plots such as histograms and Q-Q plots, or with statistical tests. A few of the normality tests are as follows:

  • Shapiro-Wilk Test
  • Anderson-Darling Test
  • Martinez-Iglewicz Test
  • Kolmogorov-Smirnov Test
  • D’Agostino Skewness Test

8.

What are Recommender Systems?

Answer»

A recommender system (or recommendation engine) is a system used to predict users’ interests and recommend products that are likely to interest them.

The data required for recommender systems comes from explicit user ratings (for example, after watching a film or listening to a song), from implicit signals such as search engine queries and purchase histories, or from other knowledge about the users and items themselves.

9.

How can you select K for K-means Clustering?

Answer»

There are two kinds of methods: direct methods and statistical testing methods.

  • Direct methods: the elbow method and the silhouette method.
  • Statistical testing methods: the gap statistic.

The silhouette method is the one most frequently used to determine the optimal value of K.
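To illustrate one of the direct methods, here is a rough sketch of the elbow method on one-dimensional toy data. The tiny k-means routine, its deterministic initialisation, and the data are all assumptions made for this example; a real project would more likely use a library implementation.

```python
def kmeans_1d(points, k, iterations=20):
    """Very small 1-D k-means; returns (centers, inertia)."""
    pts = sorted(points)
    # Deterministic start: pick k points spread across the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    # Inertia: sum of squared distances to the closest center.
    inertia = sum(min((p - c) ** 2 for c in centers) for p in pts)
    return centers, inertia

# Toy data with three visible groups around 1, 5 and 9.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

inertias = {k: kmeans_1d(data, k)[1] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

The inertia drops sharply up to K = 3 and barely improves afterwards; that bend in the curve is the "elbow" that suggests the value of K.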

10.

What is Clustering?

Answer»

Clustering is the process of grouping a set of objects into a number of groups. Objects should be similar to one another within the same cluster and dissimilar to those in other clusters.

A few types of clustering are:

  • Hierarchical clustering
  • K-means clustering
  • Density-based clustering
  • Fuzzy clustering, etc.

11.

What is Collaborative Filtering? And Content-Based Filtering?

Answer»

Collaborative filtering is a proven technique for personalized content recommendations. It is a type of recommendation system that predicts new content by matching the interests of the individual user with the preferences of many other users.

Content-based recommender systems focus only on the preferences of the user. New recommendations are made to the user from content similar to the user’s previous choices.
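A rough sketch of user-based collaborative filtering follows. The users, films, ratings, and the choice of cosine similarity as the weighting are all hypothetical and only meant to show the idea of borrowing ratings from similar users.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "ana":  {"film_a": 5, "film_b": 4, "film_c": 1},
    "ben":  {"film_a": 5, "film_b": 5, "film_d": 4},
    "cara": {"film_a": 1, "film_c": 5, "film_d": 2},
}

def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine(ratings[user], other_ratings)
        num += weight * other_ratings[item]
        den += weight
    return num / den if den else None

# ana has not rated film_d; her tastes are closer to ben's than cara's,
# so the prediction lands nearer to ben's rating of 4 than cara's 2.
predicted = predict("ana", "film_d")
```

A content-based system would instead compare item attributes (genre, cast, and so on) against the items this one user already liked, without looking at other users.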

12.

What is a Random Forest? How does it work?

Answer»

Random forest is a versatile machine learning method capable of performing both regression and classification tasks.

Like bagging and boosting, random forest works by combining a set of other tree models. Random forest builds each tree from a random sample of the columns in the training data.

Here are the steps a random forest follows to create each tree:

  • Take a sample from the training data.
  • Begin with a single node.
  • Run the following algorithm, from the start node:
    • If the number of observations is less than the node size, then stop.
    • Select a random subset of variables.
    • Find the variable that does the “best” job of splitting the observations.
    • Split the observations into two nodes.
    • Repeat these sub-steps, starting from the first one, on each of these nodes.
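The steps above can be sketched roughly as follows for a classification task. This is an illustrative toy, not a production implementation: the Gini impurity criterion, the majority vote, the parameter names, and the data are all assumptions added for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, n_features, min_node_size, rng):
    # Stop when the node is too small (or already pure).
    if len(rows) < min_node_size or len(set(labels)) == 1:
        return majority(labels)
    # Select a random subset of variables.
    features = rng.sample(range(len(rows[0])), n_features)
    best = None
    # Find the variable and threshold that split the observations best.
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return majority(labels)
    _, f, threshold, left, right = best
    # Split into two nodes and repeat the procedure on each of them.
    children = [
        build_tree([rows[i] for i in part], [labels[i] for i in part],
                   n_features, min_node_size, rng)
        for part in (left, right)
    ]
    return (f, threshold, children[0], children[1])

def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node  # leaves are class labels

def build_forest(rows, labels, n_trees=15, n_features=1,
                 min_node_size=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Take a bootstrap sample (with replacement) of the training data.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx],
                                 n_features, min_node_size, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])

# Toy training data: two well-separated groups.
rows = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
        (8.0, 9.0), (9.0, 8.5), (8.5, 8.0)]
labels = ["low", "low", "low", "high", "high", "high"]

forest = build_forest(rows, labels)
```

With this forest, `predict_forest(forest, (0.5, 0.5))` and `predict_forest(forest, (9.5, 9.5))` vote for the class of the nearby group. For regression, the leaves would hold averages and the forest would average the trees instead of voting.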
13.

How to Handle Outlier Values?

Answer»

An outlier is an observation in the dataset that is far away from the other observations in the dataset. Tools used to discover outliers include:

  • Box plot
  • Z-score
  • Scatter plot, etc.

Typically, we can follow one of three simple strategies to handle outliers:

  • We can drop them.
  • We can mark them as outliers and include that mark as a feature.
  • We can transform the feature to reduce the effect of the outlier.
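The sketch below detects outliers with z-scores and then applies each of the three strategies. The data, the threshold of 2 standard deviations, and the log transform are illustrative assumptions; suitable choices depend on the dataset.

```python
import math
from statistics import mean, pstdev

def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

values = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 30.0]  # 30.0 looks suspicious
threshold = 2.0  # assumed cut-off; 3.0 is another common choice

# Detection: flag values whose z-score magnitude exceeds the threshold.
flags = [abs(z) > threshold for z in z_scores(values)]

# Strategy 1: drop the flagged observations.
kept = [v for v, is_outlier in zip(values, flags) if not is_outlier]

# Strategy 2: keep everything and add the flag as an extra feature.
with_flag = [(v, int(is_outlier)) for v, is_outlier in zip(values, flags)]

# Strategy 3: transform the feature to compress large values.
log_values = [math.log1p(v) for v in values]
```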
14.

How do you decide which Machine Learning algorithm to use?

Answer»

It depends on the dataset we have. For example, if the target is discrete, we have a classification problem and can use a classifier such as an SVM. If the target is continuous, we have a regression problem and can use a model such as linear regression.

There is no fixed rule that tells us which ML algorithm to use. The choice depends on the exploratory data analysis (EDA).

EDA is like “interviewing” the dataset. As part of our interview, we do the following:

  • Classify our variables as continuous, categorical, and so forth.
  • Summarize our variables using descriptive statistics.
  • Visualize our variables using charts.

Based on these observations, select the best-fit algorithm for the particular dataset.

15.

What is Ensemble learning?

Answer»

Ensemble learning is a method that combines multiple machine learning models to create a more powerful model.

The models in an ensemble can differ for many reasons. A few of them are:

  • Different population
  • Different hypothesis
  • Different modeling techniques

When working with the model’s training and testing data, we will experience an error. This error can be broken down into bias, variance, and irreducible error.

A model should always strike a balance between bias and variance, which we call the bias-variance trade-off.

Ensemble learning is one way to manage this trade-off.

There are many ensemble techniques available, but when aggregating multiple models there are two general methods:

  • Bagging, a naive method: take the training set and generate new training sets from it.
  • Boosting, a more elegant method: like bagging, it combines multiple models, but it is used to optimize the weighting scheme for the training set.

16.

What are Loss Functions and Cost Functions? Explain the key difference between them.

Answer»

When we calculate the error for only a single data point, we use the term loss function.

When we calculate the sum of the errors over multiple data points, we use the term cost function. Beyond this, there is no major difference.

In other words, the loss function captures the difference between the actual and predicted values for a single record, whereas the cost function aggregates that difference over the entire training dataset.

Two commonly used loss functions are mean-squared error and hinge loss.

Mean-Squared Error (MSE): In simple words, it measures how far the model's predicted values are from the actual values, averaged over the n records.

MSE = (1/n) Σ (predicted value - actual value)^2

Hinge loss: It is used to train machine learning classifiers, and is defined as

L(y) = max(0, 1 - t·y)

Where t = -1 or 1 indicates the two classes and y represents the raw output of the classifier. Note that the economics notion of a cost function, which represents the total cost as the sum of the fixed costs and the variable costs in the equation y = mx + b, is a different concept.
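A short sketch may help separate the two terms: a loss is computed per record and a cost aggregates the losses. Here `t` denotes the true class label (-1 or 1) and `y` the raw classifier output. These symbol names and the sample numbers are assumptions made for the example.

```python
def squared_error(predicted, actual):
    """Loss for a single record."""
    return (predicted - actual) ** 2

def mse(predicted_values, actual_values):
    """Cost: mean of the per-record squared errors."""
    losses = [squared_error(p, a)
              for p, a in zip(predicted_values, actual_values)]
    return sum(losses) / len(losses)

def hinge(t, y):
    """Hinge loss for true label t in {-1, 1} and raw output y."""
    return max(0.0, 1.0 - t * y)

cost = mse([2.0, 4.0], [1.0, 2.0])  # (1 + 4) / 2

confident_correct = hinge(1, 2.0)   # correct side, outside the margin
inside_margin = hinge(1, 0.5)       # correct side, inside the margin
wrong_side = hinge(-1, 0.5)         # wrong side of the boundary
```

Hinge loss is zero only when the prediction is on the correct side with a margin of at least 1, which is why it is associated with maximum-margin classifiers such as SVMs.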

17.

What is a Neural Network?

Answer»

It is a simplified model of the human brain. Much like the brain, it has neurons that activate when encountering something similar.

The different neurons are linked by connections that help information flow from one neuron to another.

18.

How to Tackle Overfitting and Underfitting?

Answer»

Overfitting means the model has fitted the training data too well and does not generalize to new data. In this case, we need to resample the data and estimate the model accuracy using techniques like k-fold cross-validation.

Underfitting means the model is not able to capture the patterns in the data. In this case, we need to change the algorithm to a more flexible one, or feed the model more informative data.
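As a sketch of how k-fold cross-validation partitions the data, the helper below (a hypothetical function written for this example, not a library API) yields train and test indices so that every sample is used for testing exactly once.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(10, 3)
for train, test in folds:
    # Fit the model on `train` and score it on `test` here, then
    # average the k scores to estimate out-of-sample accuracy.
    print(len(train), len(test))
```

A large gap between training accuracy and the averaged fold accuracy points to overfitting, while low accuracy on both points to underfitting. In practice the indices are usually shuffled before splitting.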

19.

Define Precision and Recall?

Answer»

Precision and recall are ways of monitoring the power of a machine learning implementation. They are often used together.

Precision answers the question, “Out of the items that the classifier predicted to be relevant, how many are truly relevant?”

Recall, on the other hand, answers the question, “Out of all the items that are truly relevant, how many are found by the classifier?”

In general usage, precision means being exact and accurate, and the same holds for a machine learning model: given the set of items that the model predicted to be relevant, precision tells us how many of them are truly relevant.

Figure: Venn diagram illustrating precision and recall.

Mathematically, precision and recall can be defined as the following:

precision = (number of correct answers returned) / (total number of items returned by the ranker)

recall = (number of correct answers returned) / (total number of relevant answers)
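The two ratios can be worked through on a small retrieval example. The document identifiers below are made up for illustration.

```python
# Hypothetical retrieval result and ground truth.
returned = {"doc1", "doc2", "doc3", "doc4"}  # items the ranker returned
relevant = {"doc2", "doc4", "doc5"}          # items that are truly relevant

correct = returned & relevant                # relevant items that were found

precision = len(correct) / len(returned)     # 2 of 4 returned are relevant
recall = len(correct) / len(relevant)        # 2 of 3 relevant were found

print(precision, recall)
```

Returning more items can only keep or raise recall but tends to lower precision, which is why the two are reported together.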

20.

What is F1 score? How would you use it?

Answer»

Let’s have a look at this table before jumping directly into the F1 score.

Prediction  | Predicted Yes       | Predicted No
Actual Yes  | True Positive (TP)  | False Negative (FN)
Actual No   | False Positive (FP) | True Negative (TN)

In binary classification, we consider the F1 score to be a measure of the model’s accuracy. The F1 score is the harmonic mean of the precision and recall scores.

F1 = 2TP / (2TP + FP + FN)

F1 scores range between 0 and 1, where 0 is the worst score and 1 is the best score. The F1 score is typically used in information retrieval to see how well a model retrieves relevant results, and therefore how well the model is performing.
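A brief check, on made-up confusion-matrix counts, that the count-based formula agrees with the harmonic mean of precision and recall:

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

tp, fp, fn = 8, 2, 4  # hypothetical counts

precision = tp / (tp + fp)  # 8 / 10
recall = tp / (tp + fn)     # 8 / 12
harmonic_mean = 2 * precision * recall / (precision + recall)

f1 = f1_from_counts(tp, fp, fn)  # 16 / 22
print(round(f1, 4), round(harmonic_mean, 4))
```

True negatives do not appear in the formula, so F1 stays informative when the negative class is very large.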