Interview Solutions
This section offers curated interview questions with detailed answers to sharpen your knowledge and support exam preparation.
1. Explain the different types of activation functions.
Answer» Following are the different types of activation functions:

Sigmoid function: The sigmoid function is a non-linear activation function in an ANN that is mostly utilised in feedforward neural networks. It is a differentiable real function, defined for all real input values, with positive derivatives everywhere and a certain degree of smoothness. The sigmoid function appears in the output layer of deep learning models and is used to predict probability-based outputs. It is written as:

f(x) = 1 / (1 + exp(-x))

Hyperbolic Tangent Function (Tanh): The tanh function is a smoother, zero-centered function with a range of -1 to 1. It is written as:

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

Because it provides higher training performance for multilayer neural networks, the tanh function is considerably more widely utilised than the sigmoid function. Its primary advantage is that it gives a zero-centered output, which helps with backpropagation.

Softmax function: The softmax function is another type of activation function used in neural networks to generate a probability distribution from a vector of real numbers. It returns numbers between 0 and 1, with the sum of the probabilities equal to 1. It is written as:

f(x_i) = exp(x_i) / Σ_j exp(x_j)

Softsign function: This function is most commonly used in regression computation problems and in deep-learning-based text-to-speech applications. It smoothly re-scales its input into the range -1 to 1 and is represented as:

f(x) = x / (1 + |x|)

Rectified Linear Unit Function: The rectified linear unit (ReLU) function is a fast-learning activation function that promises to deliver cutting-edge performance and outstanding results. In deep learning, the ReLU function outperforms other activation functions such as the sigmoid and tanh functions in terms of performance and generalisation. The function is piecewise linear and preserves the properties of linear models, making it easier to optimise with gradient-descent approaches. On each input element, the ReLU function performs a threshold operation, setting all values less than zero to zero. The ReLU is written as:

f(x) = max(0, x), that is, f(x) = x if x ≥ 0 and f(x) = 0 if x < 0

Exponential Linear Unit Function: The exponential linear unit (ELU) function is an activation function that, like the ReLU function, can be used to speed up neural network training. The ELU function's major advantage is that it can mitigate the vanishing gradient problem by using the identity for positive values, which improves the model's learning properties. It is represented as:

f(x) = x if x > 0, and f(x) = α(exp(x) - 1) if x ≤ 0
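To make these formulas concrete, here is a minimal NumPy sketch of all six functions (a hedged illustration: the function names, the α default, and the test vector are arbitrary choices, not any particular library's API):

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x)); squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)); zero-centered, range (-1, 1)
    return np.tanh(x)

def softmax(x):
    # f(x_i) = exp(x_i) / sum_j exp(x_j); shifting by max(x) improves stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softsign(x):
    # f(x) = x / (1 + |x|); smooth re-scaling into (-1, 1)
    return x / (1.0 + np.abs(x))

def relu(x):
    # f(x) = max(0, x); thresholds all negative values to zero
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # f(x) = x for x > 0, alpha * (exp(x) - 1) for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (sigmoid, tanh, softmax, softsign, relu, elu):
    print(f"{fn.__name__:>8}: {fn(x)}")
```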
2. Differentiate between Deep Learning and Machine Learning.
Answer» Deep Learning: Deep Learning is a subclass of Machine Learning built on artificial neural networks, including recurrent neural networks. The algorithms are constructed in much the same way as machine learning algorithms, but with many more layers of processing; all of the algorithm's networks taken together form the artificial neural network. In much simpler terms, deep learning mimics the human brain by connecting artificial neurons into networks, the way neurons are connected in the brain, and it uses these algorithms and techniques to tackle all types of complex problems.

Machine Learning: Machine learning is a subset of Artificial Intelligence (AI) that allows a system to learn and improve from its experiences without having to be explicitly programmed to that level. Machine Learning uses data to learn and produce accurate outcomes, and machine learning algorithms improve their performance as they gain more data. Machine learning is currently employed in self-driving cars, cyber fraud detection, face recognition, and Facebook friend suggestions, among other applications.
3. What do you know about Dropout?
Answer» Dropout is a regularization approach that helps to avoid overfitting and hence improves generalizability (that is, the model predicts correct output for most inputs in general, rather than only being limited to the training data set). In general, we should use a dropout value of 20 to 50 percent of neurons, with 20 percent being a decent starting point. A probability that is too low has no effect, whereas one that is too high causes the network to under-learn. When you employ dropout on a larger network, you are more likely to achieve better results, because the model has more opportunities to learn independent representations.
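As an illustration, here is a minimal PyTorch sketch (assuming PyTorch as the framework; the layer sizes are arbitrary) that applies a 20 percent dropout rate after the hidden layer. Note that dropout is only active in training mode and is disabled in evaluation mode:

```python
import torch
import torch.nn as nn

# A small feedforward network with dropout after the hidden layer;
# p=0.2 matches the ~20 percent starting point suggested above.
model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),   # randomly zeroes 20% of activations during training
    nn.Linear(64, 10),
)

x = torch.randn(8, 100)

model.train()            # dropout active: units are dropped at random
train_out = model(x)

model.eval()             # dropout disabled: the full network is used
with torch.no_grad():
    eval_out = model(x)
```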
4. Mention the applications of autoencoders.
Answer» Following are some of the most common applications of autoencoders:

- Dimensionality reduction and feature extraction
- Image denoising
- Image compression
- Anomaly and outlier detection
- Image colourisation and image generation
5. What are autoencoders? Explain the different layers of autoencoders.
Answer» An autoencoder is a type of neural network in which the output layer has the same dimension as the input layer. In other words, the number of output units in the output layer is equal to the number of input units in the input layer. An autoencoder is also known as a replicator neural network, since it duplicates data from the input to the output in an unsupervised way. By passing the input through the network, the autoencoder rebuilds each dimension of the input. It may appear simple to use a neural network to replicate an input; however, the size of the input is reduced during the replication process, resulting in a smaller representation. In comparison to the input and output layers, the middle layers of the neural network have fewer units. As a result, the reduced representation of the input is stored in the middle layers, and this reduced representation is used to recreate the output. Following are the different layers in the architecture of autoencoders:

- Encoder: compresses the input into a reduced representation.
- Code: the middle (bottleneck) layer that stores the compressed representation of the input.
- Decoder: decompresses the code back to the dimension of the original input.
As described above, the input is compressed in the encoder, stored in the Code, and the original input is then decompressed from the code by the decoder. The autoencoder's principal goal is to produce an output that is identical to the input.
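A minimal PyTorch sketch of this encoder-code-decoder structure (assuming PyTorch as the framework; the layer sizes and the 784-dimensional input are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Encoder -> Code (bottleneck) -> Decoder; output size equals input size."""

    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder: compresses the input into a smaller representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        # Decoder: reconstructs the input from the code
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)        # the reduced middle-layer representation
        return self.decoder(code)     # reconstruction with the input's dimension

model = Autoencoder()
x = torch.rand(16, 784)                    # e.g. a batch of flattened 28x28 images
reconstruction = model(x)
loss = nn.MSELoss()(reconstruction, x)     # reconstruction error: output vs. input
```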
6. What exactly do you mean by exploding and vanishing gradients?
Answer» By taking incremental steps towards the minimum value, the gradient descent algorithm aims to minimise the error. These steps are used to update the weights and biases in a neural network. However, at times the steps grow excessively large, resulting in ever larger updates to the weights and bias terms, to the point where the weights overflow (or become NaN, that is, Not a Number). This is called an exploding gradient, and it makes training unstable. On the other hand, if the steps are excessively small, they result in minor, even negligible, changes to the weights and bias terms. As a result, we may end up training a deep learning model with nearly identical weights and biases in every iteration, never reaching the minimum of the error function. This is called the vanishing gradient.
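Numerically, the effect comes from backpropagation multiplying local derivatives layer by layer: factors consistently below 1 shrink the gradient towards zero, while factors above 1 make it blow up. A small illustrative sketch (the factor values are arbitrary):

```python
# Backpropagated gradients are products of per-layer factors. The sigmoid's
# derivative is at most 0.25, so long chains of sigmoid layers shrink the
# gradient towards zero, while factors above 1 make it grow without bound.
small = 0.25   # e.g. the maximum derivative of the sigmoid function
large = 1.80   # an illustrative per-layer factor greater than 1

for depth in (5, 20, 50):
    print(f"depth={depth:2d}  "
          f"vanishing: {small ** depth:.3e}  "
          f"exploding: {large ** depth:.3e}")
```

Common remedies include careful weight initialisation and ReLU-family activations for the vanishing case, and gradient clipping for the exploding case.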
7. How does Recurrent Neural Network backpropagation vary from Artificial Neural Network backpropagation?
Answer» Backpropagation in Recurrent Neural Networks differs from that in Artificial Neural Networks in that each node in a Recurrent Neural Network has an additional loop: the node's hidden state is fed back into it at the next time step. This loop, in essence, incorporates a temporal component into the network, which allows sequential information to be captured from data, something that is impossible with a generic artificial neural network. Because of the loop, gradients must be propagated backwards through every unrolled time step, a procedure known as backpropagation through time (BPTT).
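A minimal PyTorch sketch of this loop (assuming PyTorch; the sizes are arbitrary): the hidden state produced at one time step is fed back in at the next, so calling backward() propagates gradients through every unrolled step.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)       # batch of 2 sequences, 5 time steps, 4 features

# The hidden state h is the "loop": it is fed back in at every time step,
# so the output at step t depends on all earlier steps.
h0 = torch.zeros(1, 2, 8)      # (num_layers, batch, hidden_size)
outputs, h_final = rnn(x, h0)

loss = outputs.sum()
loss.backward()                # backpropagation through time: gradients flow
                               # backwards through all five unrolled time steps
```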
8. Differentiate between bias and variance in the context of deep learning models. How can you achieve a balance between the two?
Answer» Comprehending prediction errors is crucial when it comes to understanding predictions. There are two primary kinds of errors: reducible errors (those that arise from squared bias or squared variance) and irreducible errors (those that arise from the randomness or natural variability in a system and cannot be reduced by varying the model). Reducible errors, in turn, come in two types: bias and variance. Gaining a thorough grasp of these errors aids in the construction of an accurate model by preventing overfitting and underfitting.

Bias: Bias is defined as the difference between the ML model's predicted values and the actual values. High bias produces substantial inaccuracy on both training and testing data. To avoid the problem of underfitting, it is advised that an algorithm always have low bias. With significant bias, the predictions follow an overly simple shape, such as a straight line, and hence do not fit the data set accurately. This type of fitting is called underfitting of data, and it occurs when the hypothesis is too straightforward or linear.

Variance: The variance of the model is the variability of the model's prediction for a given data point, which tells us about the dispersion of our predictions; it shows up as the gap between the validation error and the training error. A model with high variance fits the training data with a very complex curve and so is unable to fit accurately on new data. As a result, while such models perform well on training data, they have high error rates on testing data. When a model's variance is excessive, it is referred to as overfitting of data. Overfitting, which involves accurately fitting the training set using a complicated curve and a high-order hypothesis, is not a viable option because the error on unseen data is considerable. Variance should therefore be kept to a minimum when training a data model.

To achieve the best balance between the two errors, the model must always aim for both low bias and low variance.
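This trade-off is easy to demonstrate with a small NumPy sketch (a hedged illustration: the sine target, noise level, and polynomial degrees are arbitrary choices). The degree-1 fit underfits (high bias: large error on both sets), the degree-15 fit overfits (high variance: low training error, much larger test error), and a moderate degree balances the two:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.25, n)   # noisy nonlinear target
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

for degree in (1, 5, 15):
    poly = Polynomial.fit(x_train, y_train, degree)      # least-squares polynomial
    train_mse = np.mean((poly(x_train) - y_train) ** 2)
    test_mse = np.mean((poly(x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```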
9. According to you, which one is more powerful: a two-layer neural network without any activation function, or a two-layer decision tree?
Answer» A two-layer neural network is made up of three layers: one input layer, one hidden layer, and one output layer. When dealing with neural networks, an activation function is essential, since it is what allows the network to represent complex, nonlinear functional mappings between inputs and response variables. When there is no activation function, a two-layer neural network is simply a linear network: a neural network without an activation function is just a linear regression model, which has limited capability and frequently fails to perform well.

A two-layer decision tree is a decision tree with a depth of two. Decision trees are a type of supervised machine learning (that is, the machine is fed both the inputs and the corresponding outputs in the training data) in which the data is continually split according to a parameter. The tree can be explained by two entities, decision nodes and leaves: the leaves represent the decisions or final outcomes, and the data is separated at the decision nodes.

Comparing the two models, the two-layer neural network (without an activation function) is more powerful than the two-layer decision tree, because the two-layer neural network considers all of the input attributes while building the model, whereas a two-layer decision tree can only consider two or three attributes.
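The claim that a two-layer network without an activation function collapses into a single linear (regression-style) model can be checked directly. A small NumPy sketch (the layer sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(64, 100)), rng.normal(size=64)   # hidden layer
W2, b2 = rng.normal(size=(10, 64)), rng.normal(size=10)    # output layer
x = rng.normal(size=100)

# Two-layer network with no activation function between the layers:
two_layer = W2 @ (W1 @ x + b1) + b2

# Exactly the same mapping expressed as a single linear model:
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))   # True: the network is purely linear
```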
10. Can a deep learning model be built solely on linear regression?
Answer» Yes, if the problem is represented by a linear equation, deep networks can be built using a linear function as the activation function for each layer. However, a composition of linear functions is itself a linear function, so nothing spectacular can be accomplished by implementing a deep network in this way: adding more nodes and layers to the network will not boost the machine learning model's predictive capacity.
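The point that a composition of linear functions stays linear holds at any depth; a small NumPy sketch (sizes arbitrary) collapses five stacked linear layers into one matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
layers = [rng.normal(size=(16, 16)) for _ in range(5)]   # five linear layers
x = rng.normal(size=16)

# Forward pass through the "deep" linear network
deep = x
for W in layers:
    deep = W @ deep

# The equivalent single layer: the product of all the weight matrices
W_single = np.linalg.multi_dot(layers[::-1])             # W5 @ W4 @ ... @ W1
print(np.allclose(deep, W_single @ x))                   # True: depth added nothing
```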
11. While building a neural network architecture, how will you decide how many neurons and hidden layers the neural network should have?
Answer» There is no hard and fast rule for determining the exact number of neurons and hidden layers required to design a neural network architecture for a given business problem. The size of a hidden layer should generally be somewhere between the size of the input layer and that of the output layer. Beyond that, a few widely used rules of thumb can give you a head start on constructing a neural network architecture (a small helper that computes them follows below):

- The number of hidden neurons should be between the size of the input layer and the size of the output layer.
- A common starting point is roughly two-thirds of the size of the input layer, plus the size of the output layer.
- The number of hidden neurons should usually be kept below twice the size of the input layer.
- Treat any such value only as an initial guess, and settle the final architecture through experimentation, cross-validation, or a systematic hyperparameter search.
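A tiny sketch wrapping these heuristics (the function name and the exact formulas are illustrative assumptions; the numbers it returns are starting points, not guarantees):

```python
def candidate_hidden_sizes(n_inputs: int, n_outputs: int) -> list:
    """Common rules of thumb for an initial hidden-layer size."""
    return sorted({
        (n_inputs + n_outputs) // 2,       # midway between input and output size
        (2 * n_inputs) // 3 + n_outputs,   # two-thirds of input size plus output size
        2 * n_inputs - 1,                  # stay below twice the input size
    })

# e.g. 100 input features, 10 output classes -> [55, 76, 199]
print(candidate_hidden_sizes(100, 10))
```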