This chapter deals with modelling issues as distinct from the challenges of statistical inference for latent fields and hyperparameters. Throughout, model parameters are taken to be known and inference details for the forward problem are suppressed, to be dealt with in later chapters. Specifically, the parameters of the forward stage of the model are treated as known; inference on the inverse stage is then used to assess different forward models.
The novel contributions contained in this chapter relate to model choice. Specifically, the following questions are addressed: Under what circumstances may a large, multivariate model, such as those required for the motivating palaeoclimate problem, be broken down into a series of independent, smaller and more manageable inferential tasks? How might one proceed with such a decomposition? How might the validity or accuracy of the decomposition be assessed? Finally, when a model may not be decomposed directly, are there augmentations to the data that might facilitate decomposition?
When dealing with highly multivariate datasets, such as the RS10 pollen and climate dataset, several modelling choices present themselves: how to model the latent parameters of the hierarchical model, how to model the hyperparameters, and which likelihood to use for the data given the parameters. It is necessary to make clear the motivations and justifications for each of these choices, which requires the use of “model fit” techniques. Cross-validation is the tool selected in this work, the details of which are presented in Section 4.2.
Section 3.1 introduces the type of inverse problem investigated in this thesis for a single spatial process generating counts across locations.
Section 3.2 sets out the motivation for decomposing large, joint models into independent modules. A definition of decomposable models is given and conditions under which models may and may not be exactly disjoint-decomposed are presented. Finally, sources of interaction preventing decomposition are discussed.
A fully-Gaussian case in Section 3.3 allows for the introduction of several key modelling issues in a Normal context. The tractability and familiarity of the multivariate normal model are used to present modelling issues that apply in a wider context of multivariate modelling. Specifically, non-decomposable models are developed in the context of the multivariate normal model.
Departures from normality in Section 3.4 introduce additional issues related to count data. Novel models for such data are also introduced in that section in the form of specialised likelihood functions, such as zero-inflated models.
Section 3.5 deals with the constrained space associated with compositional data analysis. Some pitfalls of analysis on this space are described. Finally, novel models for specifying complex yet decomposable models for compositional data are presented.
Finally, conclusions are drawn from the work detailed in the preceding sections. These conclusions are carried forward into the later chapters.
Returning to the toy problem described in Section 2.5.2, a univariate process varies smoothly across a location space. That section demonstrated the effect of differing model hyperparameters on the inverse problem: the inverse predictive distributions were found to be multimodal due to the shape of the response surface. The shape of the response surface also determines the accuracy with which the inverse problem (prediction of location given count) may be solved.
To recap, the goal is to infer the unknown location $l_{new}$ given training counts $\tilde{Y}$ at known training locations and a new count $y_{new}$.
As per Equation (2.24),
$$\pi(l_{new} \mid y_{new}, \tilde{Y}) \;\propto\; \pi(y_{new} \mid l_{new}, \tilde{Y}) \, \pi(l_{new}) \qquad (3.1)$$
A flat prior $\pi(l_{new})$ is used.
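To make the computation concrete, the following is a minimal sketch of evaluating Equation (3.1) over a grid of candidate locations. The forward model here is a hypothetical Poisson response surface (`forward_rate`), standing in for a forward stage fitted to the training data; the function names, grid, and counts are illustrative assumptions rather than the thesis's actual model. With a flat prior, the inverse predictive is simply the forward likelihood renormalised over the grid, and a non-monotone response surface is what produces the multimodality noted above.

```python
import numpy as np
from scipy.stats import poisson

def forward_rate(l):
    """Hypothetical fitted response surface: expected count at location l.
    Its non-monotone shape is what can make the inverse problem multimodal."""
    return (20.0 * np.exp(-0.5 * ((l - 0.3) / 0.15) ** 2)
            + 20.0 * np.exp(-0.5 * ((l - 0.8) / 0.10) ** 2)
            + 1.0)

def inverse_predictive(y_new, grid):
    """Unnormalised pi(l_new | y_new) = pi(y_new | l_new) * pi(l_new),
    with a flat prior pi(l_new), renormalised over the grid."""
    like = poisson.pmf(y_new, mu=forward_rate(grid))  # forward likelihood
    post = like * 1.0                                  # flat prior
    return post / np.trapz(post, grid)                 # normalise on the grid

grid = np.linspace(0.0, 1.0, 501)
density = inverse_predictive(y_new=15, grid=grid)
# Locations on either side of each bump give similar expected counts,
# so the density typically shows more than one mode.
print(grid[np.argmax(density)], density.max())
```

Any more realistic forward model (for example, one with posterior uncertainty in its parameters) would replace the plug-in Poisson likelihood above, but the grid-based renormalisation step is unchanged.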