MOSAIC

MOSAIC is a tool for modelling multiway admixture using dense genotype data.
See our 2019 paper in Genetics for details,
or start with this post for a short lay summary.

Given a set of potentially admixed haplotypes (targets) and multiple labelled sets of potentially related haplotypes (panels),
MOSAIC will infer the most recent admixture events occurring in the targets in terms of the panels.

None of the panels needs to be a good surrogate for the unseen mixing populations, because MOSAIC infers parameters controlling:
1. The stochastic relationship between panels and ancestral populations.
2. Timings and ancestry proportions of the admixture events.
3. Recombination rates before and after admixture.
4. Mutation / error rates for the haplotype copying.
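
To make items 2 and 3 more concrete, here is a minimal Python sketch. It is not MOSAIC's implementation: MOSAIC is an R package and fits a richer haplotype copying model. The sketch assumes a single pulse of admixture, a uniform genetic map, and that the ancestry along a haplotype is redrawn from the ancestry proportions at a rate equal to the number of generations since admixture per Morgan. The function and parameter names are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_local_ancestry(length_morgans, generations, proportions, n_sites, seed=0):
    """Simulate an ancestry label at each of n_sites evenly spaced sites.

    Assumption (illustration only): ancestry is redrawn from `proportions`
    at rate `generations` per Morgan, as in a simple single-pulse model.
    """
    rng = random.Random(seed)
    labels = list(range(len(proportions)))
    step = length_morgans / n_sites
    # Probability that at least one redraw event falls between adjacent sites.
    p_redraw = 1.0 - math.exp(-generations * step)
    current = rng.choices(labels, weights=proportions)[0]
    path = []
    for _ in range(n_sites):
        if rng.random() < p_redraw:
            # A redraw may pick the same ancestry again, so not every
            # redraw produces a visible switch.
            current = rng.choices(labels, weights=proportions)[0]
        path.append(current)
    return path


def count_switches(path):
    """Count visible ancestry changes between adjacent sites."""
    return sum(1 for a, b in zip(path, path[1:]) if a != b)
```

Under these assumptions, older admixture (a larger `generations` value) tends to produce more switches and therefore shorter ancestry tracts, which is the kind of signal that makes admixture timing estimable from local ancestry patterns.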

MOSAIC also improves phasing in light of the admixture model and estimates local ancestry along the genome.

Code

Click to download the R package (version 1.5.1), README.txt, and the command line interface mosaic.R.

Alternatively, clone the git repository, which includes these files and the source code.

Please read the manual MOSAIC.pdf for details on running the code, illustrated with a simulation example.

Example data are available in a folder here: exddata, which is also included in the package once installed.

Some potentially useful conversion tools include convert_from_haps.R to convert shapeit2 output haps files to the format required by MOSAIC and convert_to_haps.R to convert back again.

Human Genome Diversity Panel

A browser of MOSAIC results on an extended version of the Human Genome Diversity Panel demonstrates the flexibility of our approach.
Click on the map below to open an interactive Google Maps interface to the results.

Note that each population is modelled as a mosaic of all the others, so admixture is characterised in terms of how these particular 95 populations are stochastically related, based only on the individuals in these samples. The mantra that absence of evidence is not evidence of absence therefore applies: where we do not return a clear admixture model, we are simply saying that the small number of individuals in such a "population" is not well characterised as admixed with respect to the other small samples of individuals. Had we observed data on more individuals in more populations, we might well have found evidence of admixture.

(Also available as a simple text list.)