Conjugate Priors


1. Analytical inference.

See the Bayes formula (reproduced below). The problem with analytical inference is that the evidence is difficult to compute.
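
For reference, the posterior is given by Bayes' theorem,

p(θ|x) = p(x|θ) p(θ) / p(x),    where the evidence is p(x) = ∫ p(x|θ) p(θ) dθ,

and it is the integral defining the evidence p(x) that is generally intractable.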



One way to get around this is the maximum a posteriori (MAP) estimate, which is very easy to compute (see the formula after this list), but it comes with several problems:
1. Lack of invariance to re-parameterization.
2. The MAP estimate cannot be used as a prior for further inference.
3. The MAP is an untypical point of the distribution (the mode).
4. Credible regions cannot be computed from a point estimate.
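
For reference, the MAP estimate maximizes the unnormalized posterior, so the evidence p(x) is not needed:

θ_MAP = argmax_θ p(θ|x) = argmax_θ p(x|θ) p(θ).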

2. Conjugate distributions.

Another approach that avoids computing the evidence is to use conjugate distributions. Since the likelihood is fixed by our model, we can only vary the prior so that the posterior becomes easier to compute. If the posterior distribution p(θ|x) is in the same family as the prior distribution p(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood. For example, the Gaussian family is conjugate to itself (or self-conjugate) with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean ensures that the posterior distribution is also Gaussian. This means that the Gaussian distribution is a conjugate prior for a Gaussian likelihood. See the last slide for the numerical example.
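
As a numerical illustration (a minimal sketch, not the example from the slides), here is the Gaussian-Gaussian update in Python, assuming a likelihood N(x | μ, σ²) with known variance σ², a prior N(μ | μ0, σ0²) over the mean, and made-up data:

import numpy as np

sigma2 = 1.0                 # known likelihood variance
mu0, sigma0_2 = 0.0, 10.0    # prior N(mu | mu0, sigma0_2) over the mean

x = np.array([2.1, 1.7, 2.5, 1.9])   # made-up observations
n = x.size

# Conjugate update: the posterior over mu is again Gaussian.
post_precision = 1.0 / sigma0_2 + n / sigma2
post_var = 1.0 / post_precision
post_mean = post_var * (mu0 / sigma0_2 + x.sum() / sigma2)

print(post_mean, post_var)   # posterior N(mu | post_mean, post_var)

No evidence integral is needed: the posterior parameters follow in closed form from the prior parameters and the data.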


 
3. Example: Normal and precision.

The Gamma distribution is defined as shown on slide 1. Its support is the positive reals. Using the formulae for its expectation and variance, the Gamma distribution can be applied to real-world problems, as shown on slide 3.
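
For reference (the slides may use a different parameterization), in the shape-rate parameterization the density is

Gamma(x | a, b) = b^a x^(a-1) exp(-b x) / Γ(a),   x > 0,

with E[x] = a/b and Var[x] = a/b².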

Define the precision τ as the inverse of the variance. Viewed as a function of the precision, the normal density has the same functional form as the Gamma distribution (a power of τ times an exponential in τ). Therefore, for a normal likelihood with known mean, a Gamma prior over the precision gives a Gamma posterior, as shown on the last slide. In Bayesian inference, the Gamma distribution is the conjugate prior to many likelihood distributions: the Poisson, exponential, normal (with known mean), Pareto, Gamma with known shape parameter, inverse Gamma with known shape parameter, and Gompertz with known scale parameter.
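
Concretely (a sketch, assuming the shape-rate parameterization above and a known mean μ), with observations x_1, ..., x_n:

p(τ | x) ∝ [ τ^(n/2) exp(-τ Σ_i (x_i - μ)² / 2) ] · [ τ^(a-1) exp(-b τ) ]
         ∝ Gamma(τ | a + n/2, b + Σ_i (x_i - μ)² / 2).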



4. Example: Bernoulli distribution.

The Beta distribution is defined as shown on slide 1. It can be used when x lies in the range from 0 to 1. Using the formulae for its expectation and variance, the Beta distribution can be applied to real-world problems, as shown on slide 3. The rest of the slides prove that the Beta distribution, defined on slide 1, is conjugate to the Bernoulli distribution.
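
A minimal sketch of the resulting Beta-Bernoulli update in Python, with a made-up prior Beta(a, b) over the Bernoulli parameter θ and made-up coin flips; the posterior simply accumulates the counts of ones and zeros:

import numpy as np

a, b = 2.0, 2.0                    # prior Beta(a, b) over theta
x = np.array([1, 0, 1, 1, 0, 1])   # made-up Bernoulli observations

# Conjugate update: posterior is Beta(a + #ones, b + #zeros).
a_post = a + x.sum()
b_post = b + (x.size - x.sum())

print(a_post, b_post)              # Beta(6.0, 4.0)
print(a_post / (a_post + b_post))  # posterior mean of theta = 0.6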



The main disadvantage of conjugate priors is that the conjugate family may be too restrictive to adequately represent our prior beliefs about the model.
