As a thought experiment, we can consider a curve comprising just a single (calibrated) date of an organic sample.

## (a) Directly interpreting a summed probability distribution

The sample has a single (point) true date of death, and the curve tells us how believable each possible date is. Neither the sample’s existence nor the true date of its death waxes and wanes through time. Likewise, we cannot interpret the SPD of a small dataset across a narrow time period as representing the fluctuations of a population through time; instead, it represents how believable each year is as a possible point estimate for sample 1 or sample 2 or sample 3, etc. It is this ‘or’ component (the summing) that restricts the interpretation of the curve: the SPD is not the single best explanation of the data, nor even a single explanation of the data, but rather a conflation of many possible explanations simultaneously, each of which is marred by the artefacts inherited from the calibration wiggles.

We deliberately used the word explanation, since the SPD is merely a convolution of two datasets: the raw ¹⁴C/¹²C ratios with their errors, and the calibration curve with its error ribbon. Therefore, the SPD provides an excellent graphical representation of the data by compressing a large amount of information into a single plot, and its value in data representation should not be disparaged. However, the SPD is not a model and cannot be directly interpreted to draw reliable inferences about the population dynamics.
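To make the 'or' component concrete, the following is a minimal sketch of how an SPD can be assembled: each sample is calibrated against the curve to give a probability for every calendar year, and the per-sample distributions are then summed. The calibration curve here (`toy_curve`) is entirely hypothetical, as are the three sample ages and errors; a real analysis would use a published calibration curve.

```python
import math

def toy_curve(cal_bp):
    """Hypothetical calibration curve (an assumption, not a real curve).

    Returns the expected 14C age and the curve error for a calendar year BP.
    The sine term stands in for calibration wiggles.
    """
    mu = cal_bp + 40.0 * math.sin(cal_bp / 60.0)
    sigma = 15.0
    return mu, sigma

def calibrate(c14_age, c14_err, cal_grid):
    """Probability of each calendar year for one sample, normalised to sum to 1."""
    dens = []
    for cal in cal_grid:
        mu, curve_sd = toy_curve(cal)
        # Combine measurement error and curve error in quadrature.
        sd = math.sqrt(c14_err ** 2 + curve_sd ** 2)
        z = (c14_age - mu) / sd
        dens.append(math.exp(-0.5 * z * z) / sd)
    total = sum(dens)
    return [d / total for d in dens]

def spd(samples, cal_grid):
    """Sum the calibrated distributions year by year, then divide by sample count."""
    pds = [calibrate(age, err, cal_grid) for age, err in samples]
    return [sum(column) / len(pds) for column in zip(*pds)]

cal_grid = list(range(4000, 5001))                 # calendar years BP
samples = [(4500, 30), (4550, 25), (4700, 40)]     # hypothetical (14C age, error)
curve = spd(samples, cal_grid)
```

Each value in `curve` is the average belief that the given year is the true date of sample 1 or sample 2 or sample 3, which is why the result describes the data rather than a population trajectory.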

## (b) Simulation methods to reject a null model

Recognizing the need for a more robust inferential framework, by 2013 methods were developed that moved away from mere data representation, and instead focused on directly modelling the population. An exponential (or any other hypothesized shape) null model could be proposed, and many thousands of simulated datasets could then be generated under this model and compared to the observed dataset. The SPD was no longer the end product; instead, it was used to generate a summary statistic. The summary statistics from each simulated SPD (and the observed SPD) could then be compared, a p-value calculated and (if deemed significant) the hypothesized model could be rejected [25,26]. This approach was successful in directly testing a single hypothesized population history and was widely adopted [12,27–33] as the field moved towards a model-based inferential framework.
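The logic of such a test can be sketched as follows. This is a deliberately simplified illustration rather than the published procedure: it skips calibration entirely, works on calendar dates directly, and substitutes binned proportions for the SPD when computing the summary statistic. The exponential rate is fixed by hand, and the function names and the clustered example dataset are hypothetical.

```python
import math
import random

def sample_null(n, rate, t_min, t_max, rng):
    """Draw n dates from a truncated exponential null via the inverse CDF."""
    k = 1.0 - math.exp(-rate * (t_max - t_min))
    return [t_min - math.log(1.0 - rng.random() * k) / rate for _ in range(n)]

def expected_props(rate, t_min, t_max, n_bins):
    """Proportion of dates the null model expects in each equal-width bin."""
    span = t_max - t_min
    width = span / n_bins
    k = 1.0 - math.exp(-rate * span)
    return [(math.exp(-rate * i * width) - math.exp(-rate * (i + 1) * width)) / k
            for i in range(n_bins)]

def statistic(dates, expected, t_min, t_max):
    """Summary statistic: squared deviation of binned proportions from the null."""
    n_bins = len(expected)
    width = (t_max - t_min) / n_bins
    counts = [0] * n_bins
    for d in dates:
        counts[min(int((d - t_min) / width), n_bins - 1)] += 1
    n = len(dates)
    return sum((c / n - e) ** 2 for c, e in zip(counts, expected))

def monte_carlo_p(observed, rate, t_min, t_max, n_bins=10, n_sims=2000, seed=1):
    """Share of simulated datasets at least as extreme as the observed one."""
    rng = random.Random(seed)
    expected = expected_props(rate, t_min, t_max, n_bins)
    obs_stat = statistic(observed, expected, t_min, t_max)
    exceed = 0
    for _ in range(n_sims):
        sim = sample_null(len(observed), rate, t_min, t_max, rng)
        if statistic(sim, expected, t_min, t_max) >= obs_stat:
            exceed += 1
    return (exceed + 1) / (n_sims + 1)

# Hypothetical observed dates, all clustered mid-range (clearly not exponential).
clustered = [4400 + 200 * i / 59 for i in range(60)]
p_clustered = monte_carlo_p(clustered, 0.002, 4000, 5000)
```

A small `p_clustered` indicates that the null model rarely produces data as extreme as the observed dataset, so the hypothesized shape would be rejected. A large p-value only means the model has not been rejected, not that it has been confirmed.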

## (c) Other approaches to directly modelling the population

The inferential limits of the SPD and the importance of directly modelling population fluctuations have been approached with various underlying model structures. The OxCal program offers kernel density models, while the R package Bchron employs Bayesian Gaussian mixture models. Both approaches can provide models of the underlying population by performing parameter searches and are based on sound model likelihood approaches. However, Gaussian-based models (both mixture models and kernels) are by nature complex curves with constantly changing gradients. No doubt real population levels also fluctuate through time with complex and relentless change, but this leaves us with a model that can only be described graphically and cannot be easily summarized in terms of dating key demographic events.

Furthermore, these methods do not address how reasonable the model structure is in the first place. There are two approaches to achieve this. Firstly, a goodness-of-fit (GOF) test can establish if the observed data could have been reasonably produced by the model. This is essentially the approach taken by the simulation methods mentioned above where the p-value provides this GOF, and allows the model to be rejected if it is a poor explanation of the data. Secondly, a model selection process can be used to ensure unjustifiably complex models are rejected in favour of the simplest plausible model with the greatest explanatory power.
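The second approach, model selection, can be illustrated with a small sketch. It assumes the Bayesian information criterion (BIC) as the selection criterion, which is one common choice and is named here as an assumption; it also assumes dates are known exactly in calendar years, ignoring calibration. The two candidate models (uniform and truncated exponential), the grid search over rates, and both example datasets are hypothetical.

```python
import math

def loglik_uniform(dates, t_min, t_max):
    """Log-likelihood of a uniform model (no free parameters)."""
    return -len(dates) * math.log(t_max - t_min)

def loglik_exponential(dates, rate, t_min, t_max):
    """Log-likelihood of a truncated exponential model with one rate parameter."""
    if abs(rate) < 1e-12:
        return loglik_uniform(dates, t_min, t_max)   # limit as rate -> 0
    span = t_max - t_min
    log_norm = math.log(rate / (1.0 - math.exp(-rate * span)))
    return sum(log_norm - rate * (d - t_min) for d in dates)

def fit_exponential(dates, t_min, t_max):
    """Crude maximum-likelihood fit by grid search over candidate rates."""
    rates = [i * 1e-4 for i in range(-100, 101)]
    best = max(rates, key=lambda r: loglik_exponential(dates, r, t_min, t_max))
    return best, loglik_exponential(dates, best, t_min, t_max)

def bic(loglik, n_params, n_obs):
    """BIC: lower is better; each extra parameter costs log(n_obs)."""
    return n_params * math.log(n_obs) - 2.0 * loglik

def compare(dates, t_min, t_max):
    n = len(dates)
    rate, ll_exp = fit_exponential(dates, t_min, t_max)
    return {
        "uniform": bic(loglik_uniform(dates, t_min, t_max), 0, n),
        "exponential": bic(ll_exp, 1, n),
        "rate": rate,
    }

# Hypothetical datasets: evenly spread dates, and dates skewed towards t_min.
flat = [4000 + 1000 * (i + 0.5) / 50 for i in range(50)]
k = 1.0 - math.exp(-4.0)
skewed = [4000 - math.log(1.0 - (i + 0.5) / 50 * k) / 0.004 for i in range(50)]

flat_result = compare(flat, 4000, 5000)
skewed_result = compare(skewed, 4000, 5000)
```

For the evenly spread dates the extra rate parameter buys no improvement in likelihood, so the simpler uniform model has the lower BIC; for the skewed dates the improvement outweighs the penalty and the exponential model is preferred.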