1.

What is KL divergence and what is its relevance in Machine Learning algorithms?

Answer:

If we have two different probability distributions P(x) and Q(x) over the same random variable x, we can measure how different these two distributions are using the Kullback-Leibler (KL) divergence:

D_KL(P ‖ Q) = Σ_x P(x) log( P(x) / Q(x) ) = E_{x~P}[ log P(x) − log Q(x) ]

In the case of discrete variables, it is the extra amount of information (measured in bits if we use the base-2 logarithm, but in machine learning we usually use nats and the natural logarithm) needed to send a message containing symbols drawn from probability distribution P when we use a code that was designed to minimize the length of messages drawn from probability distribution Q. The KL divergence has many useful properties, most notably being non-negative. It is 0 if and only if P and Q are the same distribution in the case of discrete variables, or equal "almost everywhere" in the case of continuous variables. Because the KL divergence is non-negative and measures the difference between two distributions, it is often conceptualized as measuring some sort of distance between them; note, however, that it is not a true distance metric, since it is not symmetric: D_KL(P ‖ Q) ≠ D_KL(Q ‖ P) in general.
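As a minimal sketch (assuming NumPy is available; the distributions p and q below are made-up examples), the discrete KL divergence, its non-negativity, and its asymmetry can be checked directly:

import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for discrete distributions given as probability arrays."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                       # terms with P(x) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.1, 0.4, 0.5]                    # hypothetical example distributions
q = [0.3, 0.3, 0.4]

print(kl_divergence(p, p))             # 0.0 -- identical distributions
print(kl_divergence(p, q))             # ~0.117, strictly positive
print(kl_divergence(q, p))             # ~0.154 -- a different value, so KL is not symmetric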

One use for KL divergence in the context of discovering correlations is to calculate the Mutual Information (MI) of two variables: MI is the KL divergence between the joint distribution P(X, Y) and the product of the marginals P(X)P(Y), so it reveals dependence between the two variables and gives an idea of their correlation structure.
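As a sketch (again assuming NumPy; the joint tables below are made-up examples), mutual information computed as the KL divergence between the joint and the product of marginals is positive for dependent variables and zero for independent ones:

import numpy as np

def mutual_information(joint):
    """I(X; Y) in nats from a joint probability table P(X, Y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)    # marginal P(X)
    py = joint.sum(axis=0, keepdims=True)    # marginal P(Y)
    indep = px * py                          # what the joint would be under independence
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

dependent   = [[0.4, 0.1],                   # hypothetical binary joint with dependence
               [0.1, 0.4]]
independent = [[0.25, 0.25],                 # binary joint with independent variables
               [0.25, 0.25]]

print(mutual_information(dependent))         # ~0.193 > 0 -- the variables are correlated
print(mutual_information(independent))       # 0.0 -- independent, so MI vanishes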

Another use for Kullback-Leibler divergence is in the domain of variational inference, where an optimization problem is constructed to minimize the KL divergence between the intractable target distribution P and a sought element Q from a class of tractable distributions.
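The standard identity behind this (written here in LaTeX, with p(x) denoting the intractable normalizer of the posterior P(z | x)) shows why the minimization is feasible: minimizing the KL divergence over the tractable family is equivalent to maximizing the evidence lower bound (ELBO), which never requires evaluating p(x):

\[
D_{\mathrm{KL}}\bigl(Q(z)\,\|\,P(z \mid x)\bigr)
  = \log p(x) \;-\; \underbrace{\mathbb{E}_{Q(z)}\bigl[\log p(x, z) - \log Q(z)\bigr]}_{\mathrm{ELBO}(Q)}
\]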

Many approximate inference algorithms (which can also be used to fit probabilistic models to data) can be interpreted in terms of KL divergence. Among these are Mean Field, (Loopy) Belief Propagation (generalizing the forward-backward and Viterbi algorithms for HMMs), Expectation Propagation, Junction graph/tree algorithms, and tree-reweighted Belief Propagation.

(Please refer to: Wainwright, M. J. and Jordan, M. I., "Graphical models, exponential families, and variational inference", Foundations and Trends® in Machine Learning, Now Publishers Inc., 2008, Vol. 1(1-2), pp. 1-305.)


