Of all possible populations, those with a perfectly uniform
distribution require the collection of the largest sample in order to
be certain that no significant fractions have been missed. We will
consider the case where there are M=20 such fractions. This case can
easily be generalized to any M. For a perfectly uniform distribution,
each of the 20 fractions equals exactly f=0.05.
If we are interested in only one of these fractions, e.g. #1 (in
subsequent figures, the shaded box(es) indicate(s) the fraction(s) of
interest), then the probability of missing this fraction in a single
experiment is p = 1-f. The probability that this occurs in each one of
k experiments is

$$p = (1-f)^k$$
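As a quick numerical illustration of the single-fraction case, the sketch below evaluates p = (1-f)^k, assuming the k experiments are independent draws. The function name `miss_one_fraction` and the sample sizes in the loop are hypothetical choices made for illustration only.

```python
def miss_one_fraction(f: float, k: int) -> float:
    """Probability that one fraction of size f is absent from k independent draws."""
    return (1.0 - f) ** k


f = 0.05  # each of the 20 fractions of the uniform population
for k in (20, 60, 117):  # illustrative sample sizes (assumed)
    print(f"k = {k:3d}: p = {miss_one_fraction(f, k):.4f}")
```

The probability shrinks geometrically with k, which is why a single fraction of interest needs far fewer experiments than all 20 fractions together.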
However, if we are interested not in one particular fraction but in all 20 fractions, the probability of missing at least one of them is much larger. To a first approximation, it is the probability of missing fraction 1, or fraction 2, ..., or fraction 20. In combinatorial terms:

$$p \approx \binom{20}{1}(1-f)^k \qquad (5)$$
While better than (1), this is still not the equation that we want: the probability that any two fractions are missed simultaneously is counted twice, so the estimate of p is too high. Therefore, the probabilities of missing each pair of fractions (1 and 2, or 1 and 3, ..., or 19 and 20) have to be subtracted from Equation 5. This gives rise to the following expression:

$$p \approx \binom{20}{1}(1-f)^k - \binom{20}{2}(1-2f)^k \qquad (6)$$
Equation 6 is a better approximation than Equation 5, but the probability that three fractions are missed at the same time is now subtracted once too often, resulting in too low an estimate for p.
Therefore, a correction is added to (6), which becomes a third-order approximation:

$$p \approx \binom{20}{1}(1-f)^k - \binom{20}{2}(1-2f)^k + \binom{20}{3}(1-3f)^k$$
This equation again overestimates p, because the probability of simultaneously missing four fractions is counted twice. This process of iterative corrections to Equation 5 can be repeated until we have corrected for the probability that all twenty fractions are missed. The resulting probability equals

$$p = \sum_{n=1}^{20} (-1)^{n-1} \binom{20}{n} (1-nf)^k$$

or, generalizing by replacing 20 with M:

$$p = \sum_{n=1}^{M} (-1)^{n-1} \binom{M}{n} (1-nf)^k \qquad (10)$$
Equation 10 is a special instance of Equation 2 for A = 0 and B = 0. This form gives the correct value for p when the relevant fractions exactly add up to 100% of the population (i.e. M = 1/f). There are two situations where the relevant fractions do not exactly add up to one:
A = 1, B = 0; or A = 1, B = 1.
The derivation of p for these cases is completely analogous to the
derivation of Equation 10. Equation 2
is a generalization that takes care of all possibilities.
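The iterative corrections described above amount to an alternating (inclusion-exclusion) sum, which can be sketched in a few lines. This is a minimal illustration of the A = 0, B = 0 case only, assuming the M fractions add up to exactly 100% (M = 1/f); it does not implement the general Equation 2. The function name `miss_any_fraction` and its `order` argument are hypothetical, introduced here to show how the truncated approximations alternate between over- and underestimating p.

```python
from math import comb


def miss_any_fraction(f, k, M, order=None):
    """Inclusion-exclusion estimate of the probability that at least one of
    M fractions of size f is missed in k independent draws.

    order: number of correction terms to keep (1 = first-order estimate,
    2 = second-order, ...). None keeps all M terms.
    """
    n_max = M if order is None else min(order, M)
    total = 0.0
    for n in range(1, n_max + 1):
        # Probability that a draw avoids n specified fractions; clamp at zero
        # to absorb floating-point round-off when n * f is close to 1.
        remaining = max(0.0, 1.0 - n * f)
        total += (-1) ** (n - 1) * comb(M, n) * remaining ** k
    return total


f, M, k = 0.05, 20, 117  # k is an illustrative sample size (assumed)
for order in (1, 2, 3):
    print(f"order {order}: p ~ {miss_any_fraction(f, k, M, order):.6f}")
print(f"all terms: p = {miss_any_fraction(f, k, M):.6f}")
```

Odd-order truncations sit above the full sum and even-order truncations below it, mirroring the over- and underestimates discussed in the derivation. For small k the individual terms become large and nearly cancel, so the truncated estimates are only useful once k is large enough for the higher-order terms to be small.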
In addition to the worst-case scenario, a best-case scenario can also be considered for a given number of relevant fractions (m). If the number of relevant fractions is not known, the lowest possible p is always associated with a delta function (one single age component). For the latter population p equals zero, which is an information-free, trivial result. For example, if m = 3, the best-case scenario is a population of three equal fractions.
The derivation of p for this case is completely analogous to the derivation of Equation 10, with M = m = 3 and f = 1/3:

$$p = 3\left(\frac{2}{3}\right)^k - 3\left(\frac{1}{3}\right)^k$$
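For the m = 3 best case, the alternating sum with M = 3 and f = 1/3 reduces to 3(2/3)^k - 3(1/3)^k, because the n = 3 term vanishes. The sketch below compares that closed form with a simple random simulation, assuming independent draws from three equally likely fractions. The function names, the number of trials, and the seed are hypothetical choices for illustration.

```python
import random


def best_case_exact(k: int) -> float:
    """Closed form for missing at least one of three equal fractions in k draws."""
    return 3 * (2 / 3) ** k - 3 * (1 / 3) ** k


def best_case_simulated(k: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (seeded for repeatability)."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        # Draw k grains and record which of the three fractions were seen.
        seen = {rng.randrange(3) for _ in range(k)}
        if len(seen) < 3:
            missed += 1
    return missed / trials


for k in (3, 5, 10):  # illustrative sample sizes (assumed)
    print(f"k = {k:2d}: exact = {best_case_exact(k):.4f}, "
          f"simulated = {best_case_simulated(k):.4f}")
```

The simulated values should scatter around the closed form within the sampling noise of the chosen number of trials.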