Interview Solutions
This section offers curated interview questions with detailed answers to sharpen your knowledge and support exam preparation.
1. Explain the different types of activation functions.
Answer» Following are the different types of activation functions:

Sigmoid function: The sigmoid function is a non-linear activation function in an ANN that is mostly utilised in feedforward neural networks. It is a differentiable real function, defined for all real input values, with positive derivatives everywhere and a certain degree of smoothness. The sigmoid function appears in the output layer of deep learning models and is used to predict probability-based outputs. It is written as:

f(x) = 1 / (1 + exp(-x))

Hyperbolic Tangent Function (Tanh): The tanh function is a smoother, zero-centered function with a range of -1 to 1. It is written as:

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

Because it provides higher training performance for multilayer neural networks, the tanh function is considerably more widely utilised than the sigmoid function. Its primary advantage is that it gives a zero-centered output, which helps with backpropagation.

Softmax function: The softmax function is another type of activation function used in neural networks to generate a probability distribution from a vector of real numbers. It returns numbers between 0 and 1, with the sum of the probabilities equal to 1. It is written as:

f(x_i) = exp(x_i) / Σ_j exp(x_j)

Softsign function: This function is most commonly used in regression computation problems and in deep-learning-based text-to-speech applications. It smoothly re-scales its input into the range -1 to 1 and is represented as:

f(x) = x / (1 + |x|)

Rectified Linear Unit Function: The rectified linear unit (ReLU) function is a fast-learning activation function that promises to deliver cutting-edge performance and outstanding results. In deep learning, the ReLU function outperforms other activation functions such as the sigmoid and tanh functions in terms of performance and generalisation. The function is piecewise linear and preserves the properties of linear models, making it easier to optimise with gradient-descent approaches. On each input element, the ReLU function performs a threshold operation, setting all values less than zero to zero. The ReLU is written as:

f(x) = max(0, x), that is, f(x) = x if x ≥ 0 and f(x) = 0 if x < 0

Exponential Linear Unit Function: The exponential linear unit (ELU) function is an activation function that, like the ReLU function, can be used to speed up neural network training. The ELU function's major advantage is that it can mitigate the vanishing gradient problem by using the identity for positive values, which improves the model's learning properties. It is represented as:

f(x) = x if x > 0, and f(x) = α(exp(x) - 1) if x ≤ 0
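To make these formulas concrete, here is a minimal NumPy sketch of all six functions (a hedged illustration: the function names, the α default, and the test vector are arbitrary choices, not any particular library's API):

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x)); squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)); zero-centered, range (-1, 1)
    return np.tanh(x)

def softmax(x):
    # f(x_i) = exp(x_i) / sum_j exp(x_j); shifting by max(x) improves stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softsign(x):
    # f(x) = x / (1 + |x|); smooth re-scaling into (-1, 1)
    return x / (1.0 + np.abs(x))

def relu(x):
    # f(x) = max(0, x); thresholds all negative values to zero
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # f(x) = x for x > 0, alpha * (exp(x) - 1) for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (sigmoid, tanh, softmax, softsign, relu, elu):
    print(f"{fn.__name__:>8}: {fn(x)}")
```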
2. Differentiate between Deep Learning and Machine Learning.
Answer» Deep Learning: Deep Learning is a subclass of Machine Learning built on artificial neural networks, including recurrent neural networks. The algorithms are constructed in much the same way as machine learning algorithms, but with many more layers of processing; all of the algorithm's networks taken together form the artificial neural network. In much simpler terms, deep learning mimics the human brain by connecting artificial neurons into networks, the way neurons are connected in the brain, and it uses these algorithms and techniques to tackle all types of complex problems.

Machine Learning: Machine learning is a subset of Artificial Intelligence (AI) that allows a system to learn and improve from its experiences without having to be explicitly programmed to that level. Machine Learning uses data to learn and produce accurate outcomes, and machine learning algorithms improve their performance as they gain more data. Machine learning is currently employed in self-driving cars, cyber fraud detection, face recognition, and Facebook friend suggestions, among other applications.
3. What do you know about Dropout?
Answer» Dropout is a regularization approach that helps to avoid overfitting and hence improves generalizability (that is, the model predicts correct output for most inputs in general, rather than only being limited to the training data set). In general, we should use a dropout value of 20 to 50 percent of neurons, with 20 percent being a decent starting point. A probability that is too low has no effect, whereas one that is too high causes the network to under-learn. When you employ dropout on a larger network, you are more likely to achieve better results, because the model has more opportunities to learn independent representations.
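As an illustration, here is a minimal PyTorch sketch (assuming PyTorch as the framework; the layer sizes are arbitrary) that applies a 20 percent dropout rate after the hidden layer. Note that dropout is only active in training mode and is disabled in evaluation mode:

```python
import torch
import torch.nn as nn

# A small feedforward network with dropout after the hidden layer;
# p=0.2 matches the ~20 percent starting point suggested above.
model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),   # randomly zeroes 20% of activations during training
    nn.Linear(64, 10),
)

x = torch.randn(8, 100)

model.train()            # dropout active: units are dropped at random
train_out = model(x)

model.eval()             # dropout disabled: the full network is used
with torch.no_grad():
    eval_out = model(x)
```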
4. Mention the applications of autoencoders.
Answer» Following are some of the most common applications of autoencoders:

- Dimensionality reduction and feature extraction
- Image denoising
- Image compression
- Anomaly and outlier detection
- Image colourisation and image generation
5. What are autoencoders? Explain the different layers of autoencoders.
Answer» An autoencoder is a type of neural network in which the output layer has the same dimension as the input layer. In other words, the number of output units in the output layer is equal to the number of input units in the input layer. An autoencoder is also known as a replicator neural network, since it duplicates data from the input to the output in an unsupervised way. By passing the input through the network, the autoencoder rebuilds each dimension of the input. It may appear simple to use a neural network to replicate an input; however, the size of the input is reduced during the replication process, resulting in a smaller representation. In comparison to the input and output layers, the middle layers of the neural network have fewer units. As a result, the reduced representation of the input is stored in the middle layers, and this reduced representation is used to recreate the output. Following are the different layers in the architecture of autoencoders:

- Encoder: compresses the input into a reduced representation.
- Code: the middle (bottleneck) layer that stores the compressed representation of the input.
- Decoder: decompresses the code back to the dimension of the original input.
As described above, the input is compressed in the encoder, stored in the Code, and the original input is then decompressed from the code by the decoder. The autoencoder's principal goal is to produce an output that is identical to the input.
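A minimal PyTorch sketch of this encoder-code-decoder structure (assuming PyTorch as the framework; the layer sizes and the 784-dimensional input are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Encoder -> Code (bottleneck) -> Decoder; output size equals input size."""

    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder: compresses the input into a smaller representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        # Decoder: reconstructs the input from the code
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)        # the reduced middle-layer representation
        return self.decoder(code)     # reconstruction with the input's dimension

model = Autoencoder()
x = torch.rand(16, 784)                    # e.g. a batch of flattened 28x28 images
reconstruction = model(x)
loss = nn.MSELoss()(reconstruction, x)     # reconstruction error: output vs. input
```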
6. What exactly do you mean by exploding and vanishing gradients?
Answer» By taking incremental steps towards the minimum value, the gradient descent algorithm aims to minimise the error. These steps are used to update the weights and biases in a neural network. However, at times the steps grow excessively large, resulting in ever larger updates to the weights and bias terms, to the point where the weights overflow (or become NaN, that is, Not a Number). This is called an exploding gradient, and it makes training unstable. On the other hand, if the steps are excessively small, they result in minor, even negligible, changes to the weights and bias terms. As a result, we may end up training a deep learning model with nearly identical weights and biases in every iteration, never reaching the minimum of the error function. This is called the vanishing gradient.
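Numerically, the effect comes from backpropagation multiplying local derivatives layer by layer: factors consistently below 1 shrink the gradient towards zero, while factors above 1 make it blow up. A small illustrative sketch (the factor values are arbitrary):

```python
# Backpropagated gradients are products of per-layer factors. The sigmoid's
# derivative is at most 0.25, so long chains of sigmoid layers shrink the
# gradient towards zero, while factors above 1 make it grow without bound.
small = 0.25   # e.g. the maximum derivative of the sigmoid function
large = 1.80   # an illustrative per-layer factor greater than 1

for depth in (5, 20, 50):
    print(f"depth={depth:2d}  "
          f"vanishing: {small ** depth:.3e}  "
          f"exploding: {large ** depth:.3e}")
```

Common remedies include careful weight initialisation and ReLU-family activations for the vanishing case, and gradient clipping for the exploding case.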
7. How does Recurrent Neural Network backpropagation vary from Artificial Neural Network backpropagation?
Answer» Backpropagation in Recurrent Neural Networks differs from that in Artificial Neural Networks in that each node in a Recurrent Neural Network has an additional loop: the node's hidden state is fed back into it at the next time step. This loop, in essence, incorporates a temporal component into the network, which allows sequential information to be captured from data, something that is impossible with a generic artificial neural network. Because of the loop, gradients must be propagated backwards through every unrolled time step, a procedure known as backpropagation through time (BPTT).
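A minimal PyTorch sketch of this loop (assuming PyTorch; the sizes are arbitrary): the hidden state produced at one time step is fed back in at the next, so calling backward() propagates gradients through every unrolled step.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)       # batch of 2 sequences, 5 time steps, 4 features

# The hidden state h is the "loop": it is fed back in at every time step,
# so the output at step t depends on all earlier steps.
h0 = torch.zeros(1, 2, 8)      # (num_layers, batch, hidden_size)
outputs, h_final = rnn(x, h0)

loss = outputs.sum()
loss.backward()                # backpropagation through time: gradients flow
                               # backwards through all five unrolled time steps
```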
8. Differentiate between bias and variance in the context of deep learning models. How can you achieve a balance between the two?
Answer» Comprehending prediction errors is crucial when it comes to understanding predictions. There are two primary kinds of errors: reducible errors (those that arise from squared bias or squared variance) and irreducible errors (those that arise from the randomness or natural variability in a system and cannot be reduced by varying the model). Reducible errors, in turn, come in two types: bias and variance. Gaining a thorough grasp of these errors aids in the construction of an accurate model by preventing overfitting and underfitting.

Bias: Bias is defined as the difference between the ML model's predicted values and the actual values. High bias produces substantial inaccuracy on both training and testing data. To avoid the problem of underfitting, it is advised that an algorithm always have low bias. With significant bias, the predictions follow an overly simple shape, such as a straight line, and hence do not fit the data set accurately. This type of fitting is called underfitting of data, and it occurs when the hypothesis is too straightforward or linear.

Variance: The variance of the model is the variability of the model's prediction for a given data point, which tells us about the dispersion of our predictions; it shows up as the gap between the validation error and the training error. A model with high variance fits the training data with a very complex curve and so is unable to fit accurately on new data. As a result, while such models perform well on training data, they have high error rates on testing data. When a model's variance is excessive, it is referred to as overfitting of data. Overfitting, which involves accurately fitting the training set using a complicated curve and a high-order hypothesis, is not a viable option because the error on unseen data is considerable. Variance should therefore be kept to a minimum when training a data model.

To achieve the best balance between the two errors, the model must always aim for both low bias and low variance.
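This trade-off is easy to demonstrate with a small NumPy sketch (a hedged illustration: the sine target, noise level, and polynomial degrees are arbitrary choices). The degree-1 fit underfits (high bias: large error on both sets), the degree-15 fit overfits (high variance: low training error, much larger test error), and a moderate degree balances the two:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.25, n)   # noisy nonlinear target
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

for degree in (1, 5, 15):
    poly = Polynomial.fit(x_train, y_train, degree)      # least-squares polynomial
    train_mse = np.mean((poly(x_train) - y_train) ** 2)
    test_mse = np.mean((poly(x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```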
9. According to you, which one is more powerful: a two-layer neural network without any activation function, or a two-layer decision tree?
Answer» A two-layer neural network is made up of three layers: one input layer, one hidden layer, and one output layer. When dealing with neural networks, an activation function is essential, since it is what allows the network to represent complex, nonlinear functional mappings between inputs and response variables. When there is no activation function, a two-layer neural network is simply a linear network: a neural network without an activation function is just a linear regression model, which has limited capability and frequently fails to perform well.

A two-layer decision tree is a decision tree with a depth of two. Decision trees are a type of supervised machine learning (that is, the machine is fed both the inputs and the corresponding outputs in the training data) in which the data is continually split according to a parameter. The tree can be explained by two entities, decision nodes and leaves: the leaves represent the decisions or final outcomes, and the data is separated at the decision nodes.

Comparing the two models, the two-layer neural network (without an activation function) is more powerful than the two-layer decision tree, because the two-layer neural network considers all of the input attributes while building the model, whereas a two-layer decision tree can only consider two or three attributes.
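The claim that a two-layer network without an activation function collapses into a single linear (regression-style) model can be checked directly. A small NumPy sketch (the layer sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(64, 100)), rng.normal(size=64)   # hidden layer
W2, b2 = rng.normal(size=(10, 64)), rng.normal(size=10)    # output layer
x = rng.normal(size=100)

# Two-layer network with no activation function between the layers:
two_layer = W2 @ (W1 @ x + b1) + b2

# Exactly the same mapping expressed as a single linear model:
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))   # True: the network is purely linear
```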
10. Can a deep learning model be built solely on linear regression?
Answer» Yes, if the problem is represented by a linear equation, deep networks can be built using a linear function as the activation function for each layer. However, a composition of linear functions is itself a linear function, so nothing spectacular can be accomplished by implementing a deep network in this way: adding more nodes and layers to the network will not boost the machine learning model's predictive capacity.
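The point that a composition of linear functions stays linear holds at any depth; a small NumPy sketch (sizes arbitrary) collapses five stacked linear layers into one matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
layers = [rng.normal(size=(16, 16)) for _ in range(5)]   # five linear layers
x = rng.normal(size=16)

# Forward pass through the "deep" linear network
deep = x
for W in layers:
    deep = W @ deep

# The equivalent single layer: the product of all the weight matrices
W_single = np.linalg.multi_dot(layers[::-1])             # W5 @ W4 @ ... @ W1
print(np.allclose(deep, W_single @ x))                   # True: depth added nothing
```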
11. While building a neural network architecture, how will you decide how many neurons and hidden layers the neural network should have?
Answer» There is no hard and fast rule for determining the exact number of neurons and hidden layers required to design a neural network architecture for a given business problem. The size of a hidden layer should generally be somewhere between the size of the input layer and that of the output layer. Beyond that, a few widely used rules of thumb can give you a head start on constructing a neural network architecture (a small helper that computes them follows below):

- The number of hidden neurons should be between the size of the input layer and the size of the output layer.
- A common starting point is roughly two-thirds of the size of the input layer, plus the size of the output layer.
- The number of hidden neurons should usually be kept below twice the size of the input layer.
- Treat any such value only as an initial guess, and settle the final architecture through experimentation, cross-validation, or a systematic hyperparameter search.
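A tiny sketch wrapping these heuristics (the function name and the exact formulas are illustrative assumptions; the numbers it returns are starting points, not guarantees):

```python
def candidate_hidden_sizes(n_inputs: int, n_outputs: int) -> list:
    """Common rules of thumb for an initial hidden-layer size."""
    return sorted({
        (n_inputs + n_outputs) // 2,       # midway between input and output size
        (2 * n_inputs) // 3 + n_outputs,   # two-thirds of input size plus output size
        2 * n_inputs - 1,                  # stay below twice the input size
    })

# e.g. 100 input features, 10 output classes -> [55, 76, 199]
print(candidate_hidden_sizes(100, 10))
```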