
Thu 2 Sep 2010 - 5:24 pm UTC


5 stars ANSWERED on Mon 16 Nov 2009 - 2:20 am UTC by mathtalk

Question: Bayesian inference


Priced at $25.00


Asked by gidton on Sat 14 Nov 2009 - 9:44 pm UTC:

Suppose that a test for a disease generates the following results:
If a tested patient has the disease, the test returns a positive result
with probability 'p'.
If a tested patient does not have the disease, the test returns a positive
result with probability 'q'.
We can also assume that the a priori probability of a patient being
infected is 0.5.

The test is performed 'n' times on the same patient (with the above
probabilities). We use Bayesian inference to calculate the chances of the
person having the disease, and pick the answer with the higher
probability.

With a given 'n', how do you calculate the rate of correct answers?

Uclue Researcher Request for clarification by Researcher mathtalk on Sun 15 Nov 2009 - 4:54 pm UTC:

Just to be sure, a priori the chance is 0.5 that the tested patient has the
disease, and equally an a priori 0.5 chance of not having the disease?

regards, mathtalk

Question clarification by gidton on Sun 15 Nov 2009 - 5:05 pm UTC:

Exactly

Uclue Researcher 5 stars Answer by Researcher mathtalk on Mon 16 Nov 2009 - 2:20 am UTC:

Half the population is infected with a Disease:

Pr(Disease) = Pr(~Disease) = 1/2

A test is positive with probability p for those
infected, and with probability q for those not
infected:

Pr(TestPos|Disease) = p

Pr(TestPos|~Disease) = q

A sequence of n tests, considered as independent
binomial trials (Bernoulli process), is performed.
We call the event of k positive and n-k negative
test results Test(k,n-k).  Then:

Pr(Test(k,n-k)|Disease) = C(n,k) p^k (1-p)^(n-k)

Pr(Test(k,n-k)|~Disease) = C(n,k) q^k (1-q)^(n-k)
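As a quick numerical check of these two binomial
expressions, here is a minimal Python sketch.  The
helper name `pr_test` and the example values of p, q, n
are illustrative assumptions, not part of the answer;
r stands for p (Disease) or q (~Disease).

```python
from math import comb


def pr_test(k: int, n: int, r: float) -> float:
    """Pr(Test(k,n-k) | state): k positives in n independent tests,
    each positive with probability r (r = p for Disease, r = q for ~Disease)."""
    return comb(n, k) * r**k * (1 - r) ** (n - k)


# Example values, chosen only for illustration.
p, q, n = 0.9, 0.2, 5
for k in range(n + 1):
    print(k, round(pr_test(k, n, p), 4), round(pr_test(k, n, q), 4))

# For either state the n+1 outcome probabilities add up to 1.
print(sum(pr_test(k, n, p) for k in range(n + 1)))
```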

Bayes formula gives us an expression for the
"converse" probability:

                          Pr(Disease & Test(k,n-k))
Pr(Disease|Test(k,n-k)) = -------------------------
                                Pr(Test(k,n-k))

          Pr(Test(k,n-k)|Disease)*Pr(Disease)
= ----------------------------------------------------
  Pr(Disease & Test(k,n-k))+Pr(~Disease & Test(k,n-k))

           p^k (1-p)^(n-k)
= ---------------------------------
  p^k (1-p)^(n-k) + q^k (1-q)^(n-k)

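In the last step the factor C(n,k) * 1/2, common to
the numerator and to both terms of the denominator,
has been cancelled.  The same technique shows us that
Pr(~Disease|Test(k,n-k)) is complementary to the above: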

Pr(Disease|Test(k,n-k)) + Pr(~Disease|Test(k,n-k)) = 1
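The cancelled form of the Bayes formula translates
directly into code.  A minimal Python sketch follows;
`posterior_disease` is a hypothetical helper name and
p = 0.9, q = 0.2, n = 3 are arbitrary example values.
It assumes 0 < p, q < 1 so the denominator is never
zero.

```python
def posterior_disease(k: int, n: int, p: float, q: float) -> float:
    """Pr(Disease | Test(k,n-k)) when Pr(Disease) = Pr(~Disease) = 1/2.

    C(n,k) and the prior 1/2 multiply every term, so they cancel
    and are left out; assumes 0 < p < 1 and 0 < q < 1."""
    a = p**k * (1 - p) ** (n - k)  # proportional to Pr(Test(k,n-k) | Disease)
    b = q**k * (1 - q) ** (n - k)  # proportional to Pr(Test(k,n-k) | ~Disease)
    return a / (a + b)


# Example values, chosen only for illustration.
p, q, n = 0.9, 0.2, 3
for k in range(n + 1):
    post = posterior_disease(k, n, p, q)
    # Pr(~Disease | Test(k,n-k)) is the complement 1 - post.
    print(k, round(post, 4), round(1 - post, 4))
```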

The problem is stated thus:

> We use Bayesian inference to calculate the
> chances of the person having the disease, and
> pick the answer with the higher probability.

> With a given 'n', how do you calculate the
> rate of correct answers?

Specific values of p,q are needed to determine which
chance is larger.  It is possible for some particular
values of p,q,n,k that Disease and ~Disease are equally
likely, given test results Test(k,n-k).  We will
exclude these from "correct answers" since, strictly
speaking, such test results don't give a diagnosis.

A reasonable assumption, given the context, would be
that p > q, i.e. that the test is more likely to be
positive for a person infected with the disease than
for one not infected.  We will continue discussion
of the problem under this assumption, and try to flag
its importance as we go:

ASSUMPTION:  1 > p > q > 0

Using this assumption it is clear that Test(n,0) will
suffice to diagnose the Disease, since p^n > q^n.

In the same manner it follows that Test(0,n) suffices
to diagnose ~Disease, since (1-p)^n < (1-q)^n.

We can prove that the comparative chances of the two
diagnoses are monotonic with respect to the number of
positive test results:

Proposition:  Let 1 > p > q > 0, and suppose that
              for some 0 <= k1,k2 <= n:

Pr(Disease|Test(k1,n-k1)) > Pr(~Disease|Test(k1,n-k1))

Pr(Disease|Test(k2,n-k2)) < Pr(~Disease|Test(k2,n-k2))

Then k1 > k2.

Proof:  By the Bayes formula above, the two posteriors
share the same positive denominator, so the two
inequalities imply that:

p^k1 (1-p)^(n-k1) > q^k1 (1-q)^(n-k1)

p^k2 (1-p)^(n-k2) < q^k2 (1-q)^(n-k2)

Dividing the first by the second (as all terms
are positive):

(p/(1-p))^(k1-k2) > (q/(1-q))^(k1-k2)

Since x/(1-x) is increasing on (0,1), the assumption
1 > p > q > 0 implies p/(1-p) > q/(1-q).  Raising both
sides to the power (k1-k2) keeps the strict inequality
only if (k1-k2) > 0.  QED
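An equivalent way to see the Proposition (a restatement
added here, not part of the original argument) is to
take logarithms of the likelihood ratio appearing in the
Bayes formula above.  Under the ASSUMPTION 1 > p > q > 0
it is a strictly increasing linear function of k.  The
symbols a, b and t are introduced only for this sketch.

```latex
\begin{align*}
\log\frac{p^k(1-p)^{n-k}}{q^k(1-q)^{n-k}}
  &= k\log\frac{p}{q} + (n-k)\log\frac{1-p}{1-q}
   = (a+b)\,k - n\,b, \\
a &= \log\frac{p}{q} > 0, \qquad b = \log\frac{1-q}{1-p} > 0, \\
t &= \frac{n\,b}{a+b}, \qquad 0 < t < n.
\end{align*}
```

So Disease is the more likely state exactly when k > t,
~Disease exactly when k < t, and equal chances can only
occur when t is an integer (at k = t).  The smallest k
favouring Disease is floor(t)+1 and the largest k
favouring ~Disease is ceil(t)-1.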

A diagnosis is correct if the patient has the Disease
and the results Test(k,n-k) are such that:

Pr(Disease|Test(k,n-k)) > Pr(~Disease|Test(k,n-k))

or if the patient does not have it (~Disease) and
the results Test(k,n-k) are such that:

Pr(Disease|Test(k,n-k)) < Pr(~Disease|Test(k,n-k))

The rate of correct diagnoses is obtained by adding
the probabilities of these outcomes.  What the above
Proposition tells us is that the cases where
Test(k,n-k) gives a Disease diagnosis are
k1 <= k <= n for some k1, and the cases where
Test(k,n-k) gives a ~Disease diagnosis are
0 <= k <= k2 for some k2, with k1 > k2.  As
noted before, we cannot rule out that Test(k,n-k)
gives equal chances for Disease and ~Disease, i.e.
that some k lies strictly between k2 and k1.  There
is at most one such k, since the ratio of the two
likelihoods, ((1-p)/(1-q))^n * (p(1-q)/(q(1-p)))^k,
is strictly increasing in k.

Letting k1,k2 be the bounds on k as described in
the previous paragraph, then:

Pr(Correct Diagnosis) =
                   
  Pr(Disease) *   SUM  Pr(Test(k,n-k)|Disease)
                for k=k1,..,n

+ Pr(~Disease) *  SUM  Pr(Test(k,n-k)|~Disease)
                for k=0,..,k2

 = 0.5*SUM C(n,k) p^k (1-p)^(n-k) for k=k1,..,n
 
 + 0.5*SUM C(n,k) q^k (1-q)^(n-k) for k=0,..,k2
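For given p, q, n the summation can be evaluated
mechanically.  Below is a minimal Python sketch (the
function `correct_rate` and its inner helper are
hypothetical names) that finds k1 and k2 by scanning k
and then adds the two sums as written above.  It uses
`fractions.Fraction` so that a tie (equal chances) is
detected exactly rather than lost to floating-point
rounding.  It assumes 1 > p > q > 0 and n >= 1, which
guarantee that k = n favours Disease and k = 0 favours
~Disease, so both scans find a value.

```python
from fractions import Fraction
from math import comb


def correct_rate(n: int, p: Fraction, q: Fraction) -> Fraction:
    """Pr(Correct Diagnosis) for n >= 1 tests, prior 1/2, assuming 1 > p > q > 0."""

    def like(k: int, r: Fraction) -> Fraction:
        # Pr(Test(k,n-k) | state) when each test is positive with probability r
        return comb(n, k) * r**k * (1 - r) ** (n - k)

    # k1: smallest k whose results favour Disease (exists, since k = n does);
    # k2: largest k whose results favour ~Disease (exists, since k = 0 does).
    # Any k strictly between them gives equal chances and is not counted.
    k1 = min(k for k in range(n + 1) if like(k, p) > like(k, q))
    k2 = max(k for k in range(n + 1) if like(k, p) < like(k, q))

    half = Fraction(1, 2)  # Pr(Disease) = Pr(~Disease) = 1/2
    return (half * sum(like(k, p) for k in range(k1, n + 1))
            + half * sum(like(k, q) for k in range(0, k2 + 1)))


print(correct_rate(3, Fraction(3, 4), Fraction(1, 4)))  # 27/32
print(correct_rate(2, Fraction(3, 4), Fraction(1, 4)))  # 9/16
```

The second call illustrates the excluded tie: with
p = 3/4, q = 1/4, n = 2 the outcome Test(1,1) gives
equal chances, so k1 = 2 and k2 = 0.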

Without given values of p,q (and n), little if
anything more can be said about the rate of
correct diagnosis.

However, if p,q,n are known, then the bounds k1,k2
can be found and the summations above evaluated.
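Once p, q, n are fixed, the exact value can also be
cross-checked by simulating the procedure described in
the question.  The sketch below uses Monte Carlo
simulation, a technique not used in the original
answer; the function name, trial count and seed are
illustrative assumptions.  Each trial draws a patient
with prior 1/2, runs n independent tests, and picks
the more probable state, counting ties as not correct.

```python
import random


def simulate_correct_rate(n: int, p: float, q: float,
                          trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the rate of correct diagnoses (a sketch)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    correct = 0
    for _ in range(trials):
        diseased = rng.random() < 0.5           # prior Pr(Disease) = 1/2
        r = p if diseased else q                # chance each test is positive
        k = sum(rng.random() < r for _ in range(n))
        # Compare the two posteriors; C(n,k) and the prior 1/2 cancel.
        like_d = p**k * (1 - p) ** (n - k)
        like_nd = q**k * (1 - q) ** (n - k)
        if like_d > like_nd:
            diagnosis = True                    # diagnose Disease
        elif like_d < like_nd:
            diagnosis = False                   # diagnose ~Disease
        else:
            continue                            # equal chances: not counted as correct
        correct += (diagnosis == diseased)
    return correct / trials


print(simulate_correct_rate(3, 0.75, 0.25))
```

With p = 3/4, q = 1/4, n = 3 the estimate should land
close to the exact value 27/32 = 0.84375; the sampling
error shrinks like 1/sqrt(trials).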


regards, mathtalk

5 stars Accepted and rated by gidton on Mon 16 Nov 2009 - 9:20 am UTC:

Excellent. Thank you very much.




© 2010 Uclue Ltd