ANSWERED on Mon 16 Nov 2009 - 2:20 am UTC by mathtalk
Asked by gidton on Sat 14 Nov 2009 - 9:44 pm UTC:
Suppose that a test for a disease generates the following results: If a tested patient has the disease, the test returns a positive result with probability 'p'. If a tested patient does not have the disease, the test returns a positive result with probability 'q'. We can also assume that the a priori probability of a patient being infected is 0.5. The test is performed 'n' times on the same patient (with the above probabilities). We use Bayesian inference to calculate the chances of the person having the disease, and pick the answer with the higher probability. With a given 'n', how do you calculate the rate of correct answers?
Request for clarification by Researcher mathtalk on Sun 15 Nov 2009 - 4:54 pm UTC:
Just to be sure, a priori the chance is 0.5 that the tested patient has the disease, and equally an a priori 0.5 chance of not having the disease? regards, mathtalk
Question clarification by gidton on Sun 15 Nov 2009 - 5:05 pm UTC:
Exactly
Answer by Researcher mathtalk on Mon 16 Nov 2009 - 2:20 am UTC:
Half the population is infected with a Disease:
Pr(Disease) = Pr(~Disease) = 1/2
A test is positive with probability p for those
infected, and with probability q for those not
infected:
Pr(TestPos|Disease) = p
Pr(TestPos|~Disease) = q
A sequence of n tests, considered as independent
Bernoulli trials (a Bernoulli process), is performed.
We call the event of k positive and n-k negative
test results Test(k,n-k). Then:
Pr(Test(k,n-k)|Disease) = C(n,k) p^k (1-p)^(n-k)
Pr(Test(k,n-k)|~Disease) = C(n,k) q^k (1-q)^(n-k)
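The two binomial likelihoods can be checked numerically. The sketch below is a minimal Python illustration, assuming the tests really are independent; the function name pr_test_given and the values p = 0.9, q = 0.2, n = 3 are hypothetical and chosen only for illustration.

```python
from math import comb

def pr_test_given(k: int, n: int, r: float) -> float:
    """Pr(Test(k,n-k) | status) for a test that is positive with probability r.

    Use r = p for an infected patient and r = q for an uninfected one.
    """
    return comb(n, k) * r**k * (1 - r)**(n - k)

# Hypothetical example values.
p, q, n = 0.9, 0.2, 3
for k in range(n + 1):
    # k, Pr(Test(k,n-k)|Disease), Pr(Test(k,n-k)|~Disease)
    print(k, pr_test_given(k, n, p), pr_test_given(k, n, q))
```

Each column sums to 1 over k = 0..n, as a binomial distribution should.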
Bayes' formula gives us an expression for the
"converse" probability:
Pr(Disease|Test(k,n-k))
  = Pr(Disease & Test(k,n-k)) / Pr(Test(k,n-k))
  = Pr(Test(k,n-k)|Disease)*Pr(Disease)
    / [Pr(Disease & Test(k,n-k)) + Pr(~Disease & Test(k,n-k))]
  = p^k (1-p)^(n-k) / [p^k (1-p)^(n-k) + q^k (1-q)^(n-k)]
where in the last step the common factors
Pr(Disease) = Pr(~Disease) = 1/2 and C(n,k) cancel
from numerator and denominator.
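As a sanity check on the cancellation, a small Python sketch can compute the posterior both ways: from the simplified ratio, and directly from Bayes' formula with the prior 1/2 and C(n,k) left in. The names posterior_disease and posterior_disease_full and the numeric values are hypothetical.

```python
from math import comb

def posterior_disease(k: int, n: int, p: float, q: float) -> float:
    """Pr(Disease | Test(k,n-k)), assuming Pr(Disease) = Pr(~Disease) = 1/2."""
    # After cancelling the prior 1/2 and C(n,k), only the
    # sequence likelihoods p^k (1-p)^(n-k) and q^k (1-q)^(n-k) remain.
    like_d = p**k * (1 - p)**(n - k)
    like_nd = q**k * (1 - q)**(n - k)
    return like_d / (like_d + like_nd)

def posterior_disease_full(k: int, n: int, p: float, q: float) -> float:
    """Same quantity straight from Bayes' formula, nothing cancelled."""
    joint_d = comb(n, k) * p**k * (1 - p)**(n - k) * 0.5   # Pr(Disease & Test)
    joint_nd = comb(n, k) * q**k * (1 - q)**(n - k) * 0.5  # Pr(~Disease & Test)
    return joint_d / (joint_d + joint_nd)

# Hypothetical values; Pr(~Disease | Test) is simply the complement.
p, q, n = 0.9, 0.2, 3
for k in range(n + 1):
    post = posterior_disease(k, n, p, q)
    print(k, round(post, 4), round(1 - post, 4))
```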
The same technique shows us that Pr(~Disease|Test(k,n-k))
is complementary to the above:
Pr(Disease|Test(k,n-k)) + Pr(~Disease|Test(k,n-k)) = 1
The problem is stated thus:
> We use Bayesian inference to calculate the
> chances of the person having the disease, and
> pick the answer with the higher probability.
> With a given 'n', how do you calculate the
> rate of correct answers?
Specific values of p,q are needed to determine which
chance is larger. It is possible for some particular values
of p,q,n,k that Disease and ~Disease are equally
likely, given test results Test(k,n-k). We will
exclude these from "correct answers", as strictly
speaking such test results don't give a diagnosis.
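A tie really can occur. Here is a minimal sketch using exact rational arithmetic (Python's fractions.Fraction, so that equality is not blurred by rounding), with the hypothetical values p = 4/5, q = 1/5 and n = 2; the helper name likelihoods is also hypothetical.

```python
from fractions import Fraction

def likelihoods(k, n, p, q):
    """Return (p^k (1-p)^(n-k), q^k (1-q)^(n-k)); the prior and C(n,k) cancel."""
    return p**k * (1 - p)**(n - k), q**k * (1 - q)**(n - k)

# Hypothetical symmetric test: p = 4/5, q = 1/5 = 1 - p, with n = 2 tests.
p, q, n = Fraction(4, 5), Fraction(1, 5), 2
verdicts = []
for k in range(n + 1):
    ld, lnd = likelihoods(k, n, p, q)
    verdict = "Disease" if ld > lnd else "~Disease" if ld < lnd else "tie"
    verdicts.append(verdict)
    print(k, ld, lnd, verdict)
# prints:
# 0 1/25 16/25 ~Disease
# 1 4/25 4/25 tie
# 2 16/25 1/25 Disease
```

More generally, whenever q = 1 - p the likelihood ratio is (p/q)^(2k-n), so for even n the result k = n/2 is always a tie.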
A reasonable assumption, given the context, would be
that p > q, i.e. that the test is more likely to be
positive for a person infected with the disease than
for one not infected. We will continue discussion
of the problem under this assumption, and try to flag
its importance as we go:
ASSUMPTION: 1 > p > q > 0
Using this assumption it is clear that Test(n,0) will
suffice to diagnose the Disease, since p^n > q^n.
In the same manner it follows that Test(0,n) suffices
to diagnose ~Disease, since (1-p)^n < (1-q)^n.
We can prove that the comparative chances of the two
diagnoses are monotonic with respect to the number of
positive test results:
Proposition: Let 1 > p > q > 0, and let 0 <= k1,k2 <= n
be such that:
Pr(Disease|Test(k1,n-k1)) > Pr(~Disease|Test(k1,n-k1))
Pr(Disease|Test(k2,n-k2)) < Pr(~Disease|Test(k2,n-k2))
Then k1 > k2.
Proof: Since the two posteriors share the same
denominator, the two inequalities imply that:
p^k1 (1-p)^(n-k1) > q^k1 (1-q)^(n-k1)
p^k2 (1-p)^(n-k2) < q^k2 (1-q)^(n-k2)
Dividing the first by the second (as all terms
are positive):
(p/(1-p))^(k1-k2) > (q/(1-q))^(k1-k2)
Since 1 > p > q > 0 implies p/(1-p) > q/(1-q),
the above can only be true if (k1-k2) > 0. QED
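The heart of the proof is that each additional positive result multiplies the likelihood ratio by the same factor (p/(1-p)) / (q/(1-q)) > 1, so its logarithm is a straight line in k with positive slope. A hedged numerical illustration, with hypothetical helper names and hypothetical values p = 0.9, q = 0.2, n = 5:

```python
import math

def log_likelihood_ratio(k, n, p, q):
    """log[ p^k (1-p)^(n-k) / (q^k (1-q)^(n-k)) ]."""
    return k * math.log(p / q) + (n - k) * math.log((1 - p) / (1 - q))

def llr_slope(p, q):
    # Change in the log-likelihood ratio per extra positive result:
    # log(p/(1-p)) - log(q/(1-q)), which is positive whenever 1 > p > q > 0.
    return math.log(p / (1 - p)) - math.log(q / (1 - q))

p, q, n = 0.9, 0.2, 5
values = [log_likelihood_ratio(k, n, p, q) for k in range(n + 1)]
# Strictly increasing in k, so Disease can only become more favoured as k grows.
assert all(a < b for a, b in zip(values, values[1:]))
```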
A diagnosis is correct if the patient has the Disease
and the results Test(k,n-k) are such that:
Pr(Disease|Test(k,n-k)) > Pr(~Disease|Test(k,n-k))
or if the patient does not have it (~Disease) and the
results Test(k,n-k) are such that:
Pr(Disease|Test(k,n-k)) < Pr(~Disease|Test(k,n-k))
The rate of correct diagnoses can be obtained by
combining the probabilities of these outcomes.
What the above Proposition tells us is that the
cases where Test(k,n-k) gives a disease diagnosis
are k1 <= k <= n for some k1, and the cases where
Test(k,n-k) gives a non-disease diagnosis are
0 <= k <= k2 for some k2, with k1 > k2. (That each
set is an unbroken range follows because the log of
the likelihood ratio, k*log(p/q) + (n-k)*log((1-p)/(1-q)),
increases strictly with k.) As noted before, we cannot
rule out that Test(k,n-k) might give equal chances for
Disease and ~Disease. That is, there may be a k strictly
between k2 and k1, though at most one such k.
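Given concrete p, q and n, the bounds k1 and k2 can be found by simply classifying every k from 0 to n. This is a minimal sketch, not the author's procedure; the helper decision_bounds is hypothetical, and it expects exact Fraction inputs so that a tie is detected exactly rather than lost to floating-point rounding.

```python
from fractions import Fraction

def decision_bounds(n, p, q):
    """Return (k1, k2, ties) for the rule "pick the more probable diagnosis".

    k1: smallest k whose results favour Disease (n + 1 if none),
    k2: largest k whose results favour ~Disease (-1 if none),
    ties: list of k where both diagnoses are equally likely.
    """
    favours_d, favours_nd, ties = [], [], []
    for k in range(n + 1):
        # Prior 1/2 and C(n,k) cancel, so compare sequence likelihoods only.
        ld = p**k * (1 - p)**(n - k)
        lnd = q**k * (1 - q)**(n - k)
        if ld > lnd:
            favours_d.append(k)
        elif ld < lnd:
            favours_nd.append(k)
        else:
            ties.append(k)
    k1 = min(favours_d, default=n + 1)
    k2 = max(favours_nd, default=-1)
    return k1, k2, ties

print(decision_bounds(3, Fraction(9, 10), Fraction(1, 5)))  # (2, 1, [])
print(decision_bounds(2, Fraction(4, 5), Fraction(1, 5)))   # (2, 0, [1])
```

Under the assumption 1 > p > q > 0, k = n always favours Disease and k = 0 always favours ~Disease, so the defaults are never used.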
Letting k1,k2 be the bounds on k as described in
the previous paragraph, then:
Pr(Correct Diagnosis)
  = Pr(Disease) * SUM Pr(Test(k,n-k)|Disease)     for k=k1,..,n
  + Pr(~Disease) * SUM Pr(Test(k,n-k)|~Disease)   for k=0,..,k2
  = 0.5*SUM C(n,k) p^k (1-p)^(n-k)   for k=k1,..,n
  + 0.5*SUM C(n,k) q^k (1-q)^(n-k)   for k=0,..,k2
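The final formula translates directly into code. In this sketch (hypothetical name pr_correct), k1 and k2 are passed in, for instance as found by classifying each k; the hypothetical example values p = 9/10, q = 1/5, n = 3 give k1 = 2 and k2 = 1 by direct comparison of the likelihoods.

```python
from fractions import Fraction
from math import comb

def pr_correct(n, p, q, k1, k2):
    """0.5*SUM_{k=k1..n} C(n,k) p^k (1-p)^(n-k) + 0.5*SUM_{k=0..k2} C(n,k) q^k (1-q)^(n-k)."""
    # Infected patients diagnosed correctly: k in k1..n.
    hit_d = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k1, n + 1))
    # Uninfected patients diagnosed correctly: k in 0..k2.
    hit_nd = sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(0, k2 + 1))
    return Fraction(1, 2) * hit_d + Fraction(1, 2) * hit_nd

print(pr_correct(3, Fraction(9, 10), Fraction(1, 5), 2, 1))  # 467/500
```

With the hypothetical p = 4/5, q = 1/5, the rate is 4/5 for n = 1 (k1 = 1, k2 = 0) but 16/25 for n = 2 (k1 = 2, k2 = 0), because the tied outcome k = 1 is counted as giving no diagnosis.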
Without given values of p,q (and n), little if
anything more can be said about the rate of
correct diagnosis.
However if p,q,n are known, then bounds k1,k2
can be found and the summations above evaluated.
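Once the summations are evaluated, a Monte Carlo simulation of the whole procedure (draw the patient's status with prior 1/2, run n independent tests, pick the more probable diagnosis) gives an independent check. The function below is a hypothetical sketch; it compares the likelihoods in floating point, so for parameter values that produce exact ties, exact Fraction arithmetic would be safer. For the hypothetical p = 0.9, q = 0.2, n = 3 the formula gives 467/500 = 0.934, and the simulated rate should land close to it.

```python
import random

def simulate_correct_rate(n, p, q, trials=100_000, seed=0):
    """Estimate Pr(Correct Diagnosis) by simulating patients and tests."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        diseased = rng.random() < 0.5                  # prior Pr(Disease) = 1/2
        r = p if diseased else q                       # per-test positive rate
        k = sum(rng.random() < r for _ in range(n))    # number of positive tests
        ld = p**k * (1 - p)**(n - k)
        lnd = q**k * (1 - q)**(n - k)
        if ld > lnd:
            correct += diseased        # diagnosed Disease
        elif ld < lnd:
            correct += not diseased    # diagnosed ~Disease
        # exact ties give no diagnosis and are not counted as correct
    return correct / trials

print(simulate_correct_rate(3, 0.9, 0.2))
```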
regards, mathtalk
© 2010 Uclue Ltd