Introduction to Non-Informative Priors


By Unknown Author

The prior density is denoted by $g(\cdot)$ in this article.

Introduction

Noninformative priors are the priors we assume when we have no prior belief about the parameter, say $\theta$. A noninformative prior therefore does not favor any value of $\theta$; it gives equal weight to every value in $\Theta$. For example, if we have three hypotheses, the prior that attaches weight $\frac{1}{3}$ to each of them is a noninformative prior.

<!--more-->

Note: most noninformative priors are improper.

An Example

Let us start with a simple example. Suppose our parameter space $\Theta$ is a finite set containing $n$ elements:

$$\theta_1,\theta_2,\theta_3,\dots,\theta_n \in \Theta$$

When we have no prior beliefs, the obvious weight to give each $\theta_i$ is $\frac{1}{n}$. Since $\frac{1}{n}$ is a constant, say $\frac{1}{n}=c$, the prior is proportional to a constant:

$$g(\theta) = c$$
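A quick numerical illustration of why this uniform weighting is "noninformative": with a uniform prior over a finite parameter space, Bayes' rule returns the normalized likelihood, so the prior contributes nothing. The likelihood values below are made-up numbers for illustration only.

```python
# Sketch: with a uniform prior over n hypotheses, the posterior equals
# the normalized likelihood (the hypothetical likelihoods are assumptions).

def posterior(prior, likelihood):
    """Bayes' rule: posterior proportional to prior * likelihood."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

n = 3
uniform_prior = [1 / n] * n      # g(theta_i) = 1/n for each hypothesis
likelihood = [0.2, 0.5, 0.3]     # hypothetical f(x | theta_i)

post = posterior(uniform_prior, likelihood)
norm_lik = [l / sum(likelihood) for l in likelihood]

# The two agree (up to floating point): the uniform prior is "flat".
print(post)
```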

Now consider the transformation $\eta=e^{\theta}$, that is, $\theta = \log(\eta)$. If $g(\theta)$ is the density of $\theta$, then the density of $\eta$ is

$$g^*(\eta)=g(\theta)\frac{d\theta}{d\eta} = g(\log \eta)\,\frac{d \log \eta}{d\eta} = \frac{g(\log \eta)}{\eta} \propto \frac{1}{\eta}$$
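The change of variables can be checked numerically. As an assumption for the sake of a proper example, take $\theta$ uniform on $[0,1]$ (a truncated version of the flat prior), so that $\eta = e^{\theta}$ lives on $[1, e]$ with density $1/\eta$:

```python
import math
import random

# Monte Carlo check of the transformation: sample theta ~ Uniform[0, 1]
# (a proper stand-in for the flat prior), map to eta = exp(theta), and
# compare interval probabilities with the claimed density 1/eta on [1, e].

random.seed(0)
samples = [math.exp(random.random()) for _ in range(200_000)]

a, b = 1.5, 2.0
mc_prob = sum(a <= s <= b for s in samples) / len(samples)

# Analytic probability under density 1/eta: integral from a to b = log(b/a).
exact = math.log(b) - math.log(a)

print(mc_prob, exact)  # the two should agree to about two decimal places
```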

Thus if we choose a constant prior for $\theta$, we must take the prior for $\eta$ proportional to $\eta^{-1}$ to arrive at the same answer whether we work with $\theta$ or with $\eta$. So we cannot maintain consistency and assume both priors are proportional to a constant. This motivates the search for noninformative priors that are invariant under transformations.

Noninformative Priors for Location Parameter

A parameter $\theta$ is said to be a location parameter if the density $f(x;\theta)$ can be written as a function of $(x - \theta)$.

Let $X$ be a random variable with location parameter $\theta$, so its density can be written as $h(x-\theta)$. Suppose that instead of observing $X$ we observe $Y = X+c$, and let $\eta=\theta+c$; then the density of $Y$ is $h(y-\eta)$. Now $(X,\theta)$ and $(Y,\eta)$ have the same sample space and parameter space, which suggests that they must have the same noninformative prior.

Let $g$ and $g^*$ be the noninformative priors for $(X,\theta)$ and $(Y,\eta)$ respectively. According to our argument both must be the same, so for any subset $A$ of the real line

$$P^g(\theta \in A) = P^{g^*}(\eta \in A)$$

Since we assumed $\eta=\theta+c$,

$$P^{g^*}(\eta \in A)=P^{g}(\theta+c \in A)=P^{g}(\theta \in A-c)$$

which leads us to

$$P^{g}(\theta \in A)=P^{g}(\theta \in A-c) \tag{*}$$

$$\int_A g(\theta)\,d\theta=\int_{A-c} g(\theta)\,d\theta=\int_A g(\theta-c)\,d\theta$$

This holds for every subset $A$ of the real line and every real $c$, which leads to

$$g(\theta)=g(\theta-c)$$

Now taking $\theta=c$ gives $g(c)=g(0)$, and since this holds for every $c$, we conclude that the noninformative prior for a location parameter is a constant function. For simplicity most statisticians take it equal to one: $g(\cdot) = 1$.
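The practical consequence of the constant prior can be sketched numerically: since the posterior is then proportional to $h(x-\theta)$, shifting the data by $c$ simply shifts the posterior by $c$. Here a standard normal kernel $h$ is assumed purely for illustration.

```python
import math

# Sketch: with the flat prior g(theta) = 1 for a location parameter,
# the posterior density depends only on x - theta, so shifted data
# produce an identically shifted posterior.

def h(u):
    # standard normal kernel, assumed here as an example of h(x - theta)
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def posterior(theta, x):
    # flat prior: posterior density proportional to 1 * h(x - theta)
    return 1.0 * h(x - theta)

x, c = 1.3, 4.0
for theta in [-1.0, 0.0, 0.7, 2.5]:
    assert abs(posterior(theta, x) - posterior(theta + c, x + c)) < 1e-12

print("shifted data gives a shifted posterior")
```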

Noninformative Priors for Scale Parameter

A parameter $\theta$ is said to be a scale parameter if the density $f(x;\theta)$ can be written as $\frac{1}{\theta}h\left(\frac{x}{\theta}\right)$ where $\theta>0$.

For example, in the normal distribution $N(\mu,\sigma^2)$, $\sigma$ is a scale parameter.

To get the noninformative prior for a scale parameter $\theta$ of a random variable $X$, suppose that instead of observing $X$ we observe $Y = cX$ for some $c > 0$, and define $\eta = c\theta$. Then the density of $Y$ is $\frac{1}{\eta}h\left(\frac{y}{\eta}\right)$.

As in the previous part, $(X,\theta)$ and $(Y,\eta)$ have the same sample space and parameter space, so they must have the same noninformative prior. Let $g$ and $g^*$ be the noninformative priors for $(X,\theta)$ and $(Y,\eta)$ respectively; then

$$P^g(\theta \in A)= P^{g^*}(\eta \in A)$$

Here $A$ is a subset of the positive real line, i.e. $A \subset \mathbb{R}^+$. Substituting $\eta = c\theta$,

$$P^{g^*}(\eta \in A) = P^g\left(\theta \in \frac{A}{c}\right), \qquad P^g(\theta \in A) = P^g\left(\theta \in \frac{A}{c}\right)$$

$$\int_A g(\theta)\,d\theta=\int_{A/c} g(\theta)\,d\theta=\int_A \frac{1}{c}\,g\left(\frac{\theta}{c}\right)d\theta$$

so

$$g(\theta)=\frac{1}{c}\,g\left(\frac{\theta}{c}\right)$$

Now taking $\theta=c$, we get

$$g(c)=\frac{1}{c}\,g(1)$$

This equation holds for any $c>0$, so taking $g(1)=1$ for convenience gives the noninformative prior $g(\theta)= \frac{1}{\theta}$.

Note: it is an improper prior, since $\int_0^{\infty}\frac{1}{\theta}\,d\theta = \infty$.
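The scale invariance of this prior is easy to verify directly: the measure $\frac{d\theta}{\theta}$ assigns mass $\log\frac{b}{a}$ to an interval $[a,b]$, which is unchanged when the interval is rescaled to $[ca, cb]$.

```python
import math

# Sketch: the measure g(theta) d(theta) = d(theta)/theta gives interval
# [a, b] the mass log(b/a), which is invariant under rescaling by any c > 0
# -- exactly the invariance property derived for the scale parameter.

def mass(a, b):
    # integral of 1/theta from a to b
    return math.log(b) - math.log(a)

a, b = 0.5, 3.0
for c in [0.1, 2.0, 100.0]:
    assert abs(mass(a, b) - mass(c * a, c * b)) < 1e-12

print("d(theta)/theta is scale invariant")
```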

Flaw and Introduction of the Relatively Location Invariant Prior

We now have noninformative priors for both location and scale parameters, but there is a flaw: the priors obtained above are improper. If two random variables have densities of the same form, they should have the same noninformative prior, but because of the impropriety, noninformative priors are not unique. If $g$ is an improper prior and we multiply it by any constant $k$, the resulting prior $kg$ gives the same Bayesian decisions as $g$.

So in the previous parts we did not really need two priors $g$ and $g^*$: we can obtain $g^*$ simply by multiplying $g$ by a constant, and vice versa.

Equation $(*)$ can therefore be weakened to

$$P^g(\theta \in A)=l(c)\,P^{g}(\theta \in A-c)$$

where $l(c)$ is some positive function of $c$, so that

$$\int_A g(\theta)\,d\theta=l(c)\int_{A-c} g(\theta)\,d\theta=l(c)\int_A g(\theta-c)\,d\theta$$

This holds for all $A$, so $g(\theta)=l(c)\,g(\theta-c)$; taking $\theta=c$ gives $l(c)=\frac{g(c)}{g(0)}$, and substituting this back yields

$$g(\theta-c)=\frac{g(0)\,g(\theta)}{g(c)} \tag{**}$$

There are many priors other than the constant prior $g(\theta)=\text{const}$ that satisfy equation $(**)$, and any prior satisfying it is known as relatively location invariant.
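One family of such priors (an illustrative example, not from the text above) is $g(\theta) = e^{k\theta}$ for an arbitrary constant $k$: then $g(\theta-c) = e^{k\theta}e^{-kc} = g(0)g(\theta)/g(c)$, so every member satisfies $(**)$, with the constant prior recovered as the special case $k=0$.

```python
import math

# Sketch: priors of the form g(theta) = exp(k * theta) satisfy
# g(theta - c) = g(0) * g(theta) / g(c) for every theta and c, so each
# of them is relatively location invariant; k = 0 gives the constant prior.

def g(theta, k):
    return math.exp(k * theta)

for k in [0.0, -0.5, 1.7]:
    for theta in [-2.0, 0.3, 5.0]:
        for c in [-1.0, 0.0, 2.5]:
            lhs = g(theta - c, k)
            rhs = g(0.0, k) * g(theta, k) / g(c, k)
            assert abs(lhs - rhs) < 1e-9 * max(1.0, abs(rhs))

print("exp(k*theta) satisfies (**) for every k")
```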