Wednesday, 16 June 2010

Bayesian Estimation

Let x be an observation drawn from a distribution P(x | θ) that depends on an unknown parameter θ. Let us model θ as a random variable drawn from a known distribution p(θ). In the absence of any observation, the a priori distribution of x is the average of the distributions P(x | θ), each weighted by the probability of the value of θ generating it.

P(x) = ∫ P(x | θ) p(θ) dθ
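As a minimal sketch of this marginalization (the Beta(2, 2) prior and Bernoulli likelihood are assumed here for illustration, not taken from the notes), the integral can be approximated with a Riemann sum over a grid of θ values:

```python
import numpy as np

# Assumed example: prior predictive P(x) for a Bernoulli likelihood with a
# Beta(2, 2) prior on theta, approximating the integral on a theta grid.
theta = np.linspace(1e-6, 1 - 1e-6, 100_000)
dtheta = theta[1] - theta[0]
prior = theta * (1 - theta)              # unnormalized Beta(2, 2) density
prior /= prior.sum() * dtheta            # normalize numerically

def prior_predictive(x):
    """P(x) = integral of P(x | theta) p(theta) dtheta."""
    likelihood = theta**x * (1 - theta)**(1 - x)   # Bernoulli P(x | theta)
    return np.sum(likelihood * prior) * dtheta

print(prior_predictive(1))  # symmetric prior, so approximately 0.5
```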

We wish to estimate θ using both the knowledge of the prior distribution and an observation x'.

P(θ | x') P(x') = P(x', θ) = P(x' | θ) p(θ)

Hence it is possible to compute the conditional distribution of x given an observation x'.

P(x | x') = ∫ P(x, θ | x') dθ = ∫ P(x | θ, x') P(θ | x') dθ = ∫ P(x | θ) P(θ | x') dθ

Notice that P(x | θ, x') = P(x | θ): θ, the parameter of the distribution, is a sufficient statistic, so once θ is given, the knowledge of x' does not change the conditional distribution of x.
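Continuing the assumed Beta(2, 2)–Bernoulli example, the posterior predictive P(x | x') after a single assumed observation x' = 1 can be sketched the same way:

```python
import numpy as np

# Assumed example: posterior predictive P(x | x') for a Bernoulli likelihood
# with a Beta(2, 2) prior, after a single assumed observation x' = 1.
theta = np.linspace(1e-6, 1 - 1e-6, 100_000)
dtheta = theta[1] - theta[0]
prior = theta * (1 - theta)                               # Beta(2, 2), unnormalized
x_obs = 1
likelihood_obs = theta**x_obs * (1 - theta)**(1 - x_obs)  # P(x' | theta)
posterior = likelihood_obs * prior                        # Bayes' rule, unnormalized
posterior /= posterior.sum() * dtheta                     # P(theta | x')

def posterior_predictive(x):
    """P(x | x') = integral of P(x | theta) P(theta | x') dtheta."""
    likelihood = theta**x * (1 - theta)**(1 - x)
    return np.sum(likelihood * posterior) * dtheta

print(posterior_predictive(1))  # conjugacy gives (2 + 1) / (4 + 1) = 0.6
```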

Maximum A Posteriori estimate (MAP)

θ[MAP] = arg max_θ P(θ | x)

Bayesian Estimator: the a posteriori expected value of the parameter.

θ[Bayes] = E[θ | x] = ∫ θ P(θ | x) dθ
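The two estimators can differ. A minimal sketch, assuming the posterior is Beta(3, 2) (e.g. a Beta(2, 2) prior after observing x' = 1 from a Bernoulli):

```python
import numpy as np

# Assumed example: MAP vs. Bayes estimates of theta on a grid, with an
# assumed Beta(3, 2) posterior (unnormalized density theta^2 (1 - theta)).
theta = np.linspace(1e-6, 1 - 1e-6, 100_001)
dtheta = theta[1] - theta[0]
posterior = theta**2 * (1 - theta)
posterior /= posterior.sum() * dtheta

theta_map = theta[np.argmax(posterior)]            # arg max of P(theta | x)
theta_bayes = np.sum(theta * posterior) * dtheta   # E[theta | x]

print(round(theta_map, 3), round(theta_bayes, 3))  # mode 2/3, mean 3/5
```

The MAP estimate is the posterior mode (2/3 here), while the Bayes estimate is the posterior mean (3/5); they coincide only when the posterior is symmetric around its mode.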

Bayesian classification

Let x be an observation and C(x) a Bernoulli random variable that is a function of x. We are interested in computing the probability distribution of C conditioned on the value of x.

P(C | x) * P(x) = P(C and x) = P(x | C) * P(C)

  • P(C): prior probability
  • P(x | C): likelihood of x assuming C
  • P(x): evidence
  • P(C | x): posterior probability of C given x

Simple Bayes classifier

Select the class C for which P(C | x) is maximum.
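A minimal sketch of this rule; the priors and likelihood values below are made-up numbers for illustration only:

```python
# Simple Bayes classifier: pick the class C maximizing P(C | x).
priors = {"C1": 0.7, "C2": 0.3}                           # P(C), assumed
likelihoods = {"C1": lambda x: 0.2, "C2": lambda x: 0.9}  # P(x | C), assumed

def classify(x):
    # P(C | x) is proportional to P(x | C) P(C); the evidence P(x) is the
    # same for every class, so it can be dropped from the maximization.
    return max(priors, key=lambda c: likelihoods[c](x) * priors[c])

print(classify(0))  # 0.9 * 0.3 = 0.27 beats 0.2 * 0.7 = 0.14 -> "C2"
```

Note that the larger prior of C1 is not enough to overcome the much larger likelihood of C2 at this x.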

Risk

When we make a misclassification we incur a cost. Risk is a measure of the uncertainty of this loss; common risk measures include:
  • Expected loss
  • Value at Risk (VaR)
  • Worst Conditional Expectation
Loss:
l(i, k) = Loss incurred in misclassifying an instance of class i as an instance of class k

Expected risk:
R(C_i | x) = Σ_{k=1}^{K} l(i, k) P(C_k | x)
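The sum is a matrix-vector product over the loss matrix and the posterior. A minimal sketch with made-up numbers for K = 3 classes:

```python
import numpy as np

# Assumed example: expected risk R(C_i | x) for K = 3 classes, where
# loss[i, k] = l(i, k) and posterior[k] = P(C_k | x).
loss = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 1.0],
                 [4.0, 1.0, 0.0]])
posterior = np.array([0.5, 0.3, 0.2])

risk = loss @ posterior       # R(C_i | x) = sum_k l(i, k) P(C_k | x)
print(risk)                   # expected risk of deciding each class i
print(risk.argmin())          # minimum-risk decision: class index 1
```

With a 0/1 loss this minimum-risk rule reduces to the simple Bayes classifier above; with asymmetric losses, as here, the chosen class can differ from the class with the largest posterior.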