Synthetic populations

In this section, we seek the minimum number of grains that must be dated to adequately represent an "average" population, as opposed to the best- and worst-case populations of the previous section. We assume that all possible detrital populations of the geologic record are equally likely to occur. Such populations can be generated synthetically by randomly selecting multinomial proportions from a uniform distribution, a procedure that is illustrated in Appendix B. For any specific number of fractions M, we generate a large number of random populations (e.g. 1000). For each population, we construct a large number (e.g. 200) of random samples (again, see Appendix B for details). For each sample, every relevant population fraction (one that makes up at least a fraction f of the population) is tested to see whether at least one "grain age" falls within it. If at least one of the relevant fractions is empty, the sample fails the test. The ratio of the number of failed samples to the total number of samples is an estimate of p. This process is repeated for a range of values of M.
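The procedure described above can be sketched in a few lines of Python. This is an illustration only, not the code of Appendix B: it assumes that "multinomial proportions from a uniform distribution" means a flat Dirichlet distribution over the M fractions, and that a fraction is relevant when its proportion is at least f. The function names `random_population` and `estimate_p` are hypothetical.

```python
import numpy as np

def random_population(M, rng):
    """Draw M population fractions that sum to one.

    Assumption: a flat Dirichlet distribution, i.e. every set of
    proportions on the simplex is equally likely.
    """
    return rng.dirichlet(np.ones(M))

def estimate_p(proportions, k, f, n_samples, rng):
    """Estimate p: the share of k-grain samples that miss at least one
    fraction whose proportion is >= f."""
    relevant = proportions >= f
    if not relevant.any():
        return 0.0  # no relevant fraction exists, so none can be missed
    # One row per sample, one column per fraction: grain counts per fraction.
    counts = rng.multinomial(k, proportions, size=n_samples)
    # A sample fails when any relevant fraction received zero grains.
    failed = (counts[:, relevant] == 0).any(axis=1)
    return failed.mean()

rng = np.random.default_rng(0)
M, k, f = 35, 60, 0.05

# 1000 random populations of M fractions, 200 samples of k grains each.
p_values = np.array([
    estimate_p(random_population(M, rng), k, f, 200, rng)
    for _ in range(1000)
])
print("median p for M = 35:", np.median(p_values))
```

Each entry of `p_values` is the estimate of p for one synthetic population. Repeating the last step for other values of M gives the spread of p as a function of M.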

Figure 4 shows the result of this procedure for k=60 and f=0.05. For each value of M from 1 to 100, 1000 populations of that size were created, and for each of these populations, 200 samples of k random numbers were generated. For each value of M, the 5th, 50th, 95th, 99th and 100th percentiles were computed from the p-values of its 1000 random populations. The higher its percentile, the closer a synthetic population is to a uniform distribution. For example, a "99th percentile" population is likely to be strongly multimodal, while a "5th percentile" population would be more unimodal. All subsequent plots in this paper that are derived from plots like Figure 4 consider only the 95th percentile populations.

Figure 4 is the numerical "intermediate-case" analogue to Figure 1. p reaches its maximum at $M \approx 35$, and not at M=20 (=1/f), which would be the expected result if p depended only on the number of relevant fractions (m), since m reaches its maximum at M=20. The peak is located at a higher M because p is the result of a tradeoff between the number of relevant fractions (m) and the total portion of the population that is covered by these fractions, and the latter steadily decreases with increasing M.

According to Figure 4 (which, as discussed before, is only valid for k=60 and f=0.05), the chance of missing at least one fraction $f \geq 0.05$ is 10% in the median population; $p \leq 18.5\%$ in 95% of the randomly generated populations; $p \leq 25\%$ in 99% of the populations; and $p \leq 30\%$ in all 1000 populations. Not surprisingly, these probabilities are significantly lower than the 64% that Equation 4 yields for the worst-case scenario with the same values of k and f. However, even for samples from the median synthetic population, the chance of missing at least one fraction $\geq 0.05$ exceeds the 5% that results from the erroneous use of Equation 1. In only a little over 5% of all randomly generated populations is there less than a 5% chance of missing at least one fraction $\geq 0.05$ of the population when 60 grains are measured.

In addition to a numerical analogue to the analytical parameter $p_{max}$, it is also possible to obtain a numerical version of $M_{opt}$ (Figure 4). This value will generally be larger than its analytical equivalent for the worst-case scenario. For example, to reduce the chance of missing at least one fraction $\geq 0.05$ of the population to less than 5% while still measuring only 60 grains, the maximum number of fractions that can be used in the age histogram is $M_{opt} = 6$ (as opposed to $M_{opt} = 2$ in the worst-case scenario).
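The percentile curves, the numerical $p_{max}$ and the numerical $M_{opt}$ can be sketched as follows. This is a hedged illustration, not the code behind Figure 4: it assumes flat-Dirichlet populations, it reads $M_{opt}$ as the largest M reached before the 95th percentile curve first rises to the target p, and it uses fewer populations (100) and a shorter range of M (1 to 60) than the text to keep the run time short. The names `p_percentiles`, `sweep` and `largest_safe_M` are hypothetical.

```python
import numpy as np

PERCENTILES = (5, 50, 95, 99, 100)

def p_percentiles(M, k, f, n_pop, n_samples, rng):
    """Percentiles of the estimated p over n_pop random populations of size M."""
    p = np.empty(n_pop)
    for i in range(n_pop):
        props = rng.dirichlet(np.ones(M))  # assumption: flat Dirichlet
        relevant = props >= f
        counts = rng.multinomial(k, props, size=n_samples)
        # A sample fails when any relevant fraction holds no grain.
        p[i] = (counts[:, relevant] == 0).any(axis=1).mean()
    return np.percentile(p, PERCENTILES)

def sweep(M_values, k, f, n_pop, n_samples, seed=0):
    """One row of percentiles per value of M."""
    rng = np.random.default_rng(seed)
    return np.array([p_percentiles(M, k, f, n_pop, n_samples, rng)
                     for M in M_values])

def largest_safe_M(M_values, p_curve, p_target):
    """Largest M before p_curve first reaches p_target (None if M_values[0] fails)."""
    above = np.nonzero(p_curve >= p_target)[0]
    if above.size == 0:
        return int(M_values[-1])
    if above[0] == 0:
        return None
    return int(M_values[above[0] - 1])

M_values = np.arange(1, 61)
curves = sweep(M_values, k=60, f=0.05, n_pop=100, n_samples=200)

p95 = curves[:, PERCENTILES.index(95)]       # 95th percentile curve
p_max = p95.max()                            # numerical analogue of p_max
M_peak = int(M_values[np.argmax(p95)])       # M at which the curve peaks
M_opt = largest_safe_M(M_values, p95, 0.05)  # numerical analogue of M_opt

print("p_max:", p_max, "at M =", M_peak, "| M_opt:", M_opt)
```

Because each p is estimated from a finite number of samples and populations, the resulting curves are noisy, and the peak position and $M_{opt}$ will vary somewhat with the random seed.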

By tracing the evolution of the numerical $p_{max}$ with k and f, Figure 5 provides the numerical analogue to Figure 3. It allows a quick estimate of the number of grains required for certain key values of f and p. For example, when 95% confidence is desired that no fraction $\geq 0.05$ is missed in 95% of all randomly generated populations, at least $\sim 95$ grains have to be dated. This estimate is less than the 117 grains that are necessary in the worst-case scenario, but greater than the 60 grains that Equation 1 implies. Alternatively, when 60 grains are dated, we can be 95% certain that no fraction $f_{act} \geq 0.07$ was missed. As might be expected, this numerical estimate falls between the worst-case scenario ($f_{act} = 0.085$) and the result from Equation 1 (f=0.05).
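The two analytical benchmarks that bracket the numerical estimate can be checked with a short calculation. Equations 1 and 4 are not reproduced in this section, so the sketch below rests on two assumptions: that Equation 1 has the form $p = (1-f)^k$, and that the worst case is a population of M = 1/f equally sized fractions, for which the probability of missing at least one fraction follows from the inclusion-exclusion principle. The function names are hypothetical.

```python
from math import comb

def p_equation1(k, f):
    """Assumed form of Equation 1: chance that k grains all miss one
    particular fraction of size f."""
    return (1.0 - f) ** k

def p_worst_case(k, M):
    """Assumed worst case: M equal fractions of size 1/M.

    Inclusion-exclusion over the number n of fractions left empty:
    p = sum_{n=1..M} (-1)^(n-1) * C(M, n) * (1 - n/M)^k
    """
    return sum(
        (-1) ** (n - 1) * comb(M, n) * (1.0 - n / M) ** k
        for n in range(1, M + 1)
    )

def min_grains(p_func, p_target, k_max=1000):
    """Smallest k (up to k_max) for which p_func(k) <= p_target, else None."""
    for k in range(1, k_max + 1):
        if p_func(k) <= p_target:
            return k
    return None

f = 0.05
M = round(1 / f)  # 20 equal fractions in the assumed worst case

print("Equation 1, k = 60:", p_equation1(60, f))
print("worst case, k = 60:", p_worst_case(60, M))
print("worst case, grains needed for p <= 0.05:",
      min_grains(lambda k: p_worst_case(k, M), 0.05))
```

Under these assumptions, the printed values can be compared with the 5%, the 64% and the 117 grains quoted in the text.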

Pieter Vermeesch 2004-05-19