1.

What exactly do you mean by exploding and vanishing gradients?

Answer»

The gradient descent algorithm aims to minimize the error by taking incremental steps towards the minimum of the loss function. The weights and biases of a neural network are updated using the gradients computed at each of these steps.
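
As an illustrative sketch (not part of the original answer), the snippet below runs gradient descent on a single weight for a hypothetical model y_pred = w * x with a squared-error loss; the learning rate lr and the data values are assumptions chosen just for the example.

```python
import numpy as np

def gradient_descent_step(w, x, y_true, lr=0.1):
    """One incremental update of a scalar weight w for the model y_pred = w * x."""
    y_pred = w * x
    grad = 2 * (y_pred - y_true) * x   # d(loss)/dw for loss = (y_pred - y_true)**2
    return w - lr * grad               # step against the gradient to reduce the error

w = 0.0
for _ in range(20):
    w = gradient_descent_step(w, x=2.0, y_true=4.0)
print(round(w, 4))                     # approaches 2.0, the weight that minimizes the error
```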

However, at times the steps grow excessively large, producing ever larger updates to the weights and bias terms, to the point where the weights overflow or become NaN (Not a Number). This is known as an exploding gradient, and it makes training unstable.
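
The toy example below is an illustration of this effect (the per-layer weight of 1.5 and the depth of 300 layers are assumptions): backpropagation multiplies gradients layer by layer via the chain rule, so repeated factors greater than 1 make the gradient grow until it overflows.

```python
import numpy as np

grad = np.float32(1.0)
weight = np.float32(1.5)      # assumed layer weight, deliberately greater than 1
for layer in range(300):
    grad = grad * weight      # chain rule: each layer multiplies the gradient by its weight
print(grad)                   # overflows to inf in float32: an exploding gradient
```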

On the other hand, if the steps are excessively small, they produce minor, even negligible, changes in the weights and bias terms. As a result, the model may train with nearly identical weights and biases at every step and never reach the minimum of the error function. This is known as a vanishing gradient.
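
The following sketch shows the opposite effect (again illustrative, assuming sigmoid activations, whose derivative is at most 0.25): multiplying many small per-layer derivatives drives the backpropagated gradient towards zero, so the weight updates become negligible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

grad = 1.0
z = 0.0                            # pre-activation; sigmoid'(0) = 0.25 is the largest possible derivative
for layer in range(30):
    s = sigmoid(z)
    grad = grad * s * (1.0 - s)    # chain rule: multiply by the sigmoid derivative at each layer
print(grad)                        # about 0.25**30 ≈ 8.7e-19: a vanishing gradient
```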


