MOSAIC is a tool for modelling multiway admixture using dense genotype data.
See the 2019 paper in Genetics for details.
Given a set of potentially admixed
haplotypes (targets) and multiple labelled sets of potentially related haplotypes (panels),
MOSAIC will infer the most recent admixture events occurring in the targets in terms of the panels.
None of the panels needs to be a good surrogate for the unseen mixing populations, because
MOSAIC will infer parameters controlling:
1. The stochastic relationship between panels and ancestral populations.
2. Timings and ancestry proportions of the admixture events.
3. Recombination rates before and after admixture.
4. Mutation / error rates for the haplotype copying.
MOSAIC also improves phasing in light of the admixture model and estimates local ancestry along the genome.
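To make the link between admixture timing, ancestry proportions and local ancestry more concrete, here is a minimal Python sketch of a toy simulation. It is not MOSAIC's implementation and does not use its code. It assumes a simplified model in which, between adjacent markers separated by d Morgans, an ancestry switch-point occurs with probability 1 - exp(-T d) for an admixture event T generations ago, after which the ancestry is redrawn from the mixing proportions. The function name, parameters and marker spacing are hypothetical choices made for illustration.

```python
import math
import random


def simulate_local_ancestry(positions_morgans, proportions, generations, seed=0):
    """Simulate ancestry labels for one haplotype at the given marker positions.

    Toy model (an assumption for illustration, not MOSAIC's code):
    - positions_morgans: increasing genetic positions in Morgans
    - proportions: mixing proportions, one per ancestry
    - generations: number of generations since a single admixture event
    """
    rng = random.Random(seed)
    labels = list(range(len(proportions)))
    # Ancestry at the first marker is drawn from the mixing proportions.
    current = rng.choices(labels, weights=proportions)[0]
    ancestry = [current]
    for prev, pos in zip(positions_morgans, positions_morgans[1:]):
        distance = pos - prev
        # Probability of at least one recombination since admixture in this gap.
        p_event = 1.0 - math.exp(-generations * distance)
        if rng.random() < p_event:
            # Redraw the ancestry; it may stay the same by chance.
            current = rng.choices(labels, weights=proportions)[0]
        ancestry.append(current)
    return ancestry


# Hypothetical usage: 1000 markers spaced 0.001 Morgans apart, two ancestries.
positions = [i * 0.001 for i in range(1000)]
labels = simulate_local_ancestry(positions, [0.3, 0.7], generations=10)
switches = sum(a != b for a, b in zip(labels, labels[1:]))
print("observed ancestry switches:", switches)
```

In this toy model, older events (larger T) produce shorter ancestry tracts, which is the kind of signal that lets admixture dates be estimated from local ancestry patterns.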
Or you can clone from the git repository that includes these and the source code.
Please read the manual, MOSAIC.pdf, for details on running the code, illustrated with a simulation example.
A folder of example data to try is available here: example data.
Some potentially useful conversion tools include convert_from_haps.R, which converts SHAPEIT2 output haps files to the format required by MOSAIC, and convert_to_haps.R, which converts back again.
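For readers unfamiliar with the haps layout, the following Python sketch shows how such a file is organised. It is not a port of convert_from_haps.R, and it does not produce MOSAIC's input format, which is described in the manual rather than here. It relies only on the standard SHAPEIT2 haps convention: one line per SNP, five leading columns (chromosome, SNP ID, position, first allele, second allele), then two 0/1 columns per individual. The function names and the example lines are hypothetical.

```python
def parse_haps_line(line):
    """Split one haps line into site information and 0/1 alleles."""
    fields = line.split()
    # Five site columns plus two haplotype columns per individual.
    if len(fields) < 7 or (len(fields) - 5) % 2 != 0:
        raise ValueError("unexpected number of columns in haps line")
    chrom, snp_id, position, allele0, allele1 = fields[:5]
    return {
        "chrom": chrom,
        "id": snp_id,
        "position": int(position),
        "alleles": (allele0, allele1),
        "haps": [int(x) for x in fields[5:]],
    }


def haps_to_haplotypes(lines):
    """Transpose SNP-major haps lines into one allele list per haplotype."""
    sites = [parse_haps_line(line) for line in lines if line.strip()]
    positions = [site["position"] for site in sites]
    haplotypes = [list(column) for column in zip(*(site["haps"] for site in sites))]
    return positions, haplotypes


# Hypothetical example: two SNPs, two individuals (four haplotypes).
example_lines = [
    "1 rs1 1000 A G 0 1 1 0",
    "1 rs2 2000 C T 1 1 0 0",
]
positions, haplotypes = haps_to_haplotypes(example_lines)
print(positions)
print(haplotypes)
```

Consecutive pairs of haplotypes in the result belong to the same individual, so the first two lists above are the two phased haplotypes of the first individual.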
It's worth pointing out here that each population is modelled as a MOSAIC of all others; thus admixture is characterised in terms of how these particular 95 populations are stochastically related, based only on the individuals in these samples. The mantra that absence of evidence is not evidence of absence therefore holds true: in those cases where we do not return a clear admixture model, we are simply saying that the small number of individuals in such a "population" are not well characterised as admixed with respect to the other small samples of individuals. Had we observed data on more individuals in more populations, we might well have found evidence of admixture.
(Also available as a simple text list.)