| In [[genetics]], '''coalescent theory''' is a retrospective model of population genetics. It attempts to trace all [[allele]]s of a [[gene]] shared by all members of a population to a single ancestral copy, known as the [[most recent common ancestor]] (MRCA; sometimes also termed the ''coancestor'' to emphasize the coalescent relationship). The inheritance relationships between alleles are typically represented as a ''gene genealogy'', similar in form to a [[phylogenetic tree]]. This gene genealogy is also known as the ''coalescent.'' Understanding the statistical properties of the coalescent under different assumptions forms the basis of coalescent theory.
| | |
The coalescent runs models of [[genetic drift]] backward in time to investigate the [[genealogy]] of [[Antecedent (genealogy)|antecedents]].{{ref|Rosalind}} In the simplest case, coalescent theory assumes no [[genetic recombination|recombination]], no [[natural selection]], and no [[gene flow]] or population structure. Later advances extend the basic coalescent to include recombination, selection, and virtually any arbitrarily complex evolutionary or demographic model in population genetic analysis. The mathematical theory of the coalescent was originally developed in the early 1980s by [[John Kingman]].{{ref|Kingman00}}
| |
| | |
| ==Theory==
| |
| | |
Consider two distinct [[haploid]] [[organisms]] that differ at a single [[nucleotide]]. Tracing the ancestry of these two individuals backwards in time eventually reaches the MRCA, at which point the two lineages are said to have ''coalesced''.
| |
| | |
| ===Time to coalescence===
| |
A useful analysis based on coalescent theory seeks to predict the time elapsed between the introduction of a mutation and the emergence of a particular allele or gene distribution in a population. This period equals the time since the most recent common ancestor existed.
| |
| | |
The probability that two [[Lineage (evolution)|lineage]]s coalesce in the immediately preceding generation is the probability that they share a parental DNA sequence. In a diploid population with a constant [[effective population size]], there are 2''N<sub>e</sub>'' copies of each locus and therefore 2''N<sub>e</sub>'' "potential parents" in the previous generation. Under a random mating model, the probability that two alleles share a parent is thus 1/(2''N<sub>e</sub>'') and, correspondingly, the probability that they do ''not'' coalesce is 1 − 1/(2''N<sub>e</sub>'').
| |
| | |
Going back generation by generation, the time to coalescence is [[geometric distribution|geometrically distributed]]: the probability that coalescence occurs exactly ''t'' generations back is the probability of ''non''coalescence in the ''t'' − 1 more recent generations multiplied by the probability of coalescence at the generation of interest:
| |
| | |
| :<math>P_c(t) = \left( 1 - \frac{1}{2N_e} \right)^{t-1} \left(\frac{1}{2N_e}\right).</math>
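The formula can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names, the value ''N<sub>e</sub>'' = 50, the seed and the number of replicates are all illustrative assumptions. It evaluates ''P<sub>c</sub>''(''t'') directly and compares it with a simulation in which each of two lineages picks one of the 2''N<sub>e</sub>'' parental copies uniformly at random in every generation.

```python
import random


def coalescence_pmf(t, ne):
    """P_c(t): probability that two lineages first share a parent exactly t generations back."""
    p = 1.0 / (2 * ne)
    return (1.0 - p) ** (t - 1) * p


def simulate_coalescence_time(ne, rng):
    """Step back one generation at a time until both lineages pick the same parental copy."""
    t = 0
    while True:
        t += 1
        # Each lineage chooses its parent uniformly among the 2*ne copies.
        if rng.randrange(2 * ne) == rng.randrange(2 * ne):
            return t


rng = random.Random(1)  # fixed seed, chosen arbitrarily for repeatability
ne = 50                 # illustrative effective population size
times = [simulate_coalescence_time(ne, rng) for _ in range(5000)]
mean_time = sum(times) / len(times)

print(coalescence_pmf(1, ne))  # probability of coalescing in the first generation back
print(mean_time)               # simulated mean, to be compared with 2*ne
```

With these settings the simulated mean should fall close to 2''N<sub>e</sub>'' = 100 generations, although individual replicates vary widely.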
| |
| | |
For sufficiently large values of ''N<sub>e</sub>'', this geometric distribution is well approximated by the continuous [[exponential distribution]]
| |
| | |
| :<math>P_{c}(t) = \frac{1}{2N_e} e^{-\frac{t-1}{2N_e}}.</math>
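The step from the geometric form to the exponential form can be made explicit. As a short worked derivation, using only the first-order approximation <math>\ln(1-x) \approx -x</math> for small <math>x = 1/(2N_e)</math>:

:<math>\left(1 - \frac{1}{2N_e}\right)^{t-1} = \exp\left[(t-1)\ln\left(1 - \frac{1}{2N_e}\right)\right] \approx \exp\left(-\frac{t-1}{2N_e}\right).</math>

Multiplying both sides by <math>1/(2N_e)</math> recovers the exponential approximation of <math>P_c(t)</math>; the error of the approximation shrinks as <math>N_e</math> grows.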
| |
| | |
This exponential distribution has both its [[expected value]] and its [[standard deviation]] equal to 2''N<sub>e</sub>''; therefore, although the ''expected'' time to coalescence is 2''N<sub>e</sub>'', actual coalescence times vary widely. Note that coalescence time is measured as the number of generations back to the coalescence event, not in calendar time, though the latter can be estimated by multiplying 2''N<sub>e</sub>'' by the average time between generations.
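The conversion from generations to calendar time is simple arithmetic, sketched below. The function name and the example values (''N<sub>e</sub>'' = 10,000 and 25 years per generation) are illustrative assumptions rather than estimates taken from the literature. The median is included to show how skewed the distribution is: for an exponential distribution it is ln 2 times the mean.

```python
import math


def coalescence_time_summary(ne, generation_years):
    """Summarise pairwise coalescence time under the exponential approximation."""
    mean_generations = 2 * ne                   # expected value
    sd_generations = 2 * ne                     # standard deviation equals the mean
    median_generations = 2 * ne * math.log(2)   # median of an exponential with mean 2*ne
    return {
        "mean_generations": mean_generations,
        "sd_generations": sd_generations,
        "median_generations": median_generations,
        "mean_years": mean_generations * generation_years,
    }


# Illustrative values only: ne and generation_years are assumptions.
summary = coalescence_time_summary(ne=10_000, generation_years=25)
print(summary["mean_generations"])  # 20000
print(summary["mean_years"])        # 500000
```

Because the standard deviation is as large as the mean, a single observed coalescence time says little about ''N<sub>e</sub>'' on its own.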
| |
| | |
| ===Neutral variation===
| |
Coalescent theory can also be used to model the amount of variation in [[DNA]] sequences expected from genetic drift and mutation alone. This value is termed the mean [[heterozygote|heterozygosity]], represented as <math>\bar{H}</math>. Mean heterozygosity is calculated as the probability of a mutation occurring at a given generation divided by the probability of any "event" at that generation (either a mutation or a coalescence). If <math>\mu</math> is the mutation probability per lineage per generation, the probability of a mutation in either of the two lineages is approximately <math>2\mu</math>, while the probability of a coalescence is <math>1/(2N_e)</math>. Thus the mean heterozygosity is equal to
| |
:<math>
\begin{align}
\bar{H} &= \frac{2\mu}{2\mu + \frac{1}{2N_e}} \\
&= \frac{4N_e\mu}{1+4N_e\mu} \\
&= \frac{\theta}{1+\theta}
\end{align}
</math>
| |
| | |
Here <math>\theta = 4N_e\mu</math>. For <math>4N_e\mu \gg 1</math>, the vast majority of allele pairs differ at one or more sites in their [[nucleotide]] sequence.
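The two forms of the expression for <math>\bar{H}</math> can be compared numerically. This is a minimal sketch; the function names and the parameter values are hypothetical and chosen only to span small, moderate and large <math>\theta</math>.

```python
def heterozygosity_from_rates(ne, mu):
    """Mutation probability divided by the probability of any event (mutation or coalescence)."""
    p_mutation = 2 * mu            # a mutation in either of the two lineages
    p_coalescence = 1 / (2 * ne)   # the two lineages share a parent
    return p_mutation / (p_mutation + p_coalescence)


def heterozygosity_from_theta(ne, mu):
    """Equivalent form written in terms of theta = 4 * ne * mu."""
    theta = 4 * ne * mu
    return theta / (1 + theta)


# Illustrative (ne, mu) pairs giving theta of about 0.04, 1 and 400.
for ne, mu in [(1_000, 1e-5), (10_000, 2.5e-5), (1_000_000, 1e-4)]:
    print(ne, mu, heterozygosity_from_rates(ne, mu), heterozygosity_from_theta(ne, mu))
```

The two functions agree up to floating-point rounding, and the value approaches 1 as <math>\theta</math> becomes large, matching the statement about <math>4N_e\mu \gg 1</math>.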
| |
| | |
| ==Graphical representation==
| |
| Coalescents can be visualised using [[dendrogram]]s which show the relationship of branches of the population to each other. The point where two branches meet indicates a coalescent event.
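A genealogy of this kind can be generated by simulation and written as nested parentheses, which is the form most tree-drawing tools take as input. The sketch below is an assumption-laden illustration rather than an established algorithm from the literature: it follows a small sample backwards under the random-parent model described in the Theory section, and the function name, sample size, population size and seed are all hypothetical. Branch lengths are omitted from the tree string, but the generation of each coalescent event is recorded.

```python
import random


def simulate_genealogy(n_samples, ne, rng):
    """Trace n_samples gene copies back in time until one ancestral lineage remains.

    Returns the genealogy as a nested-parentheses string and a list of
    (generation, merged_subtree) coalescent events, most recent first.
    """
    lineages = [str(i) for i in range(n_samples)]
    events = []
    generation = 0
    while len(lineages) > 1:
        generation += 1
        # Every lineage picks one of the 2*ne parental copies uniformly at random.
        parents = {}
        for lineage in lineages:
            parents.setdefault(rng.randrange(2 * ne), []).append(lineage)
        next_lineages = []
        for children in parents.values():
            if len(children) > 1:
                # Lineages sharing a parent coalesce into a single ancestral lineage.
                merged = "(" + ",".join(children) + ")"
                events.append((generation, merged))
                next_lineages.append(merged)
            else:
                next_lineages.append(children[0])
        lineages = next_lineages
    return lineages[0], events


tree, events = simulate_genealogy(n_samples=5, ne=100, rng=random.Random(7))
print(tree)
for generation, subtree in events:
    print(generation, subtree)
```

Each printed event corresponds to a point in the dendrogram where branches meet. In a finite population more than two lineages can occasionally pick the same parent in one generation, so a node may have more than two children.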
| |
| | |
| ==Applications==
| |
| ===Disease gene mapping===
| |
| | |
Coalescent theory is increasingly applied to the mapping of disease genes. Although this application is still in its infancy, several researchers are actively developing coalescent-based algorithms for the analysis of human genetic data.{{ref|Morris02}}{{ref|Browning06}}{{ref|Zöllner}}
| |
| | |
| ===The genomic distribution of heterozygosity===
| |
| | |
The human [[single-nucleotide polymorphism]] (SNP) map has revealed large regional variations in heterozygosity, more than can be explained by ([[Poisson distribution|Poisson-distributed]]) random chance.{{ref|The international SNP map working group}} In part, these variations could be explained by assessment methods, the availability of genomic sequences, and possibly the standard coalescent population genetic model. Population genetic processes could have a major influence on this variation: some loci presumably have comparatively recent common ancestors while others have much older genealogies, so the regional accumulation of SNPs over time could differ considerably. The local density of SNPs along chromosomes appears to cluster in accordance with a [[Taylor's law|variance to mean power law]] and to obey the [[Tweedie distributions|Tweedie compound Poisson distribution]].{{ref|Kendal}} In this model, the regional variations in the SNP map are explained by the accumulation of multiple small genomic segments through recombination, where the mean number of SNPs per segment is [[gamma distribution|gamma distributed]] in proportion to a gamma-distributed time to the most recent common ancestor for each segment.{{ref|Tavare}}
| |
| | |
| | |
| ==History==
| |
| | |
Coalescent theory is a natural extension of the more classical [[population genetics]] concept of neutral evolution and is an approximation to the Fisher–Wright (or Wright–Fisher) model for large populations. It was discovered independently by several researchers in the 1980s,{{ref|Kingman82}}{{ref|Hudson83a}}{{ref|Hudson83b}}{{ref|Tajima83}} but the definitive formalisation is attributed to Kingman.{{ref|Kingman82}} Major contributions to its development have been made by [[Peter Donnelly]],{{ref|Donnelly95}} [[Robert Griffiths (mathematician)|Robert Griffiths]], [[Richard R Hudson]]{{ref|Hudson91}} and [[Simon Tavaré]].{{ref|Donnelly95}} These contributions include incorporating variations in population size,{{ref|Slatkin01}} recombination and selection.{{ref|Kaplan88}}{{ref|Neuhauser97}} In 1999 [[Jim Pitman]]{{ref|Pitman99}} and [[Serik Sagitov]]{{ref|Sagitov99}} independently introduced coalescent processes with multiple collisions of ancestral lineages. Shortly afterwards, the full class of exchangeable coalescent processes with simultaneous multiple mergers of ancestral lineages was discovered by [[Martin Möhle]] and [[Serik Sagitov]]{{ref|Mohle01}} and by [[Jason Schweinsberg]].{{ref|Schweinsberg00}} A further approach, due to Bertoin in 2006, defines the same exchangeable coalescent processes (following a treatment of fragmentation processes) and has become a powerful way to study coalescent theory in a purely mathematical setting.
| |
| | |
| ==Software==
| |
| | |
A large body of software exists both for simulating data sets under the coalescent process and for inferring parameters such as population size and migration rates from genetic data.
| |
* [http://staff.washington.edu/brendano/treesimj TreesimJ] - forward simulation software allowing sampling of genealogies and data sets under diverse selective and demographic models.
| |
| * [http://beast.bio.ed.ac.uk/ BEAST] - [[Bayesian inference|Bayesian]] [[Markov chain Monte Carlo|MCMC]] inference package with a wide range of coalescent models including the use of temporally sampled sequences.
| |
| * [http://www.daimi.au.dk/~mailund/CoaSim/index.html CoaSim] - software for simulating genetic data under the coalescent model.
| |
* [http://www.daimi.au.dk/~mailund/GeneRecon/ GeneRecon] - software for the fine-scale [[linkage disequilibrium]] mapping of disease genes using coalescent theory, based on a [[Bayesian inference|Bayesian]] [[Markov chain Monte Carlo|MCMC]] framework.
| |
| * [http://www.stats.ox.ac.uk/%7Egriff/software.html genetree] software for estimation of [[population genetics]] parameters using coalescent theory and simulation (the [[R (programming language)|R]] package popgen). See also [http://mathgen.stats.ox.ac.uk/software.html Oxford Mathematical Genetics and Bioinformatics Group]
| |
| * [http://www.sph.umich.edu/csg/liang/genome/ GENOME] - rapid coalescent-based whole-genome simulation{{ref|Liang07}}
| |
* [http://genfaculty.rutgers.edu/hey/software#IMa2 IMa] - implements the Isolation with Migration model of the earlier IM program, using a method that provides estimates of the joint posterior probability density of the model parameters. IMa also allows log-likelihood-ratio tests of nested demographic models. It is based on a method described in Hey and Nielsen (2007 PNAS 104:2785–2790). IMa is faster than IM, gives access to the joint posterior density function, and can be used for most (but not all) of the situations and options that IM supports.
| |
| * [http://popgen.csit.fsu.edu/ Migrate] - [[Maximum likelihood]] and [[Bayesian inference]] of migration rates under the n-coalescent. The inference is implemented using [[Markov chain Monte Carlo|MCMC]]
| |
| * [http://kimura.univ-montp2.fr/~rousset/Migraine.htm Migraine] - A program which implements coalescent algorithms for a maximum likelihood analysis (using [[Importance sampling|Importance Sampling]] algorithms) of genetic data with a focus on spatially structured populations {{ref|Rousset07}}.
| |
| * [http://evolution.gs.washington.edu/lamarc Lamarc] - software for estimation of rates of population growth, migration, and recombination.
| |
| * [http://home.uchicago.edu/~rhudson1/source/mksamples.html MS & MShot] - Richard Hudson's original program for generating samples under neutral models {{ref|Hudson02}} and an extension which allows [[recombination hotspots]]{{ref|Hellenthal06}}.
| |
| * [http://www.mabs.at/ewing/msms/ msms] - An extended version of ms that includes selective sweeps{{ref|Ewing}}.
| |
| * [http://walnut.usc.edu/~magnus/software/ SARG] - Structure Ancestral Recombination Graph by Magnus Nordborg
| |
* [http://cmpg.unibe.ch/software/simcoal2/ simcoal2] - software to simulate genetic data under the coalescent model with complex demography and recombination
| |
* [http://darwin.uvigo.es/ Recodon and NetRecodon] - software to simulate coding sequences with inter/intracodon recombination, migration, growth rate and longitudinal sampling.{{ref|Arenas07}}{{ref|Arenas10}}
| |
| * [http://www.coaltree.net/ COAL] - Program for computing gene tree probabilities and simulating gene trees in species trees under the coalescent model {{ref|Degnan05}}.
| |
| * [http://raphael.leblois.free.fr/#softwares IBDSim] - A computer package for the simulation of genotypic data under general isolation by distance models {{ref|Leblois09}}.
| |
| | |
| ==References and notes==
| |
| ===Articles===
| |
| {{refbegin|2}}
| |
| * {{note|Arenas07}}Arenas, M. and Posada, D. (2007) Recodon: Coalescent simulation of coding DNA sequences with recombination, migration and demography. [http://www.biomedcentral.com/1471-2105/8/458 ''BMC Bioinformatics'' '''8''': 458]
| |
* {{note|Arenas10}}Arenas, M. and Posada, D. (2010) Coalescent simulation of intracodon recombination. [http://www.genetics.org/cgi/content/abstract/184/2/429 ''Genetics'' '''184'''(2): 429–437]
| |
| * {{note|Browning06}}Browning, S.R. (2006) Multilocus association mapping using variable-length markov chains. [http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1474089&blobtype=pdf ''American Journal of Human Genetics'' '''78''':903–913]
| |
| * {{note|Degnan05}} Degnan, JH and LA Salter. 2005. Gene tree distributions under the coalescent process. Evolution 59(1): 24-37. [http://www.coaltree.net/images/DegnanSalter.pdf pdf from coaltree.net/]
| |
| * {{note|Donnelly95}}Donnelly, P., Tavaré, S. (1995) Coalescents and genealogical structure under neutrality. ''Annual Review of Genetics'' '''29''':401–421
| |
| * {{note|Hellenthal06}}Hellenthal, G., Stephens M. (2006) msHOT: modifying Hudson's ms simulator to incorporate crossover and gene conversion hotspots [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl622v1 ''Bioinformatics'' '''AOP''']
| |
| * {{note|Hudson83a}}Hudson RR (1983a) Testing the constant-rate neutral allele model with protein sequence data. ''Evolution'' '''37''': 203–207 [http://www.jstor.org/cgi-bin/jstor/printpage/00143820/di000260/00p01517/0.pdf?backcontext=table-of-contents&dowhat=Acrobat&config=jstor&userID=825fc2c5@uwa.edu.au/01cce4405d00501b61540&0.pdf JSTOR copy]
| |
| * {{note|Hudson83b}}Hudson RR (1983b) Properties of a neutral allele model with intragenic recombination. ''Theoretical Population Biology'' '''23''':183–201.
| |
| * {{note|Hudson91}}Hudson RR (1991) [http://home.uchicago.edu/~rhudson1/popgen356/OxfordSurveysEvolBiol7_1-44.pdf Gene genealogies and the coalescent process.] ''Oxford Surveys in Evolutionary Biology'' '''7''': 1–44
| |
| * {{note|Hudson02}}Hudson RR (2002) Generating samples under a Wright–Fisher neutral model. [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/18/2/337 ''Bioinformatics'' '''18''':337–338]
| |
| * {{note|Kendal}}Kendal WS (2003) An exponential dispersion model for the distribution of human single nucleotide polymorphisms. ''Mol Biol Evol'' '''20''': 579–590
| |
| * Hein, J., Schierup, M., Wiuf C. (2004) ''Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory'' Oxford University Press ISBN 978-0-19-852996-5
| |
* {{note|Kaplan88}}Kaplan, N.L., Darden, T., Hudson, R.R. (1988) The coalescent process in models with selection. ''Genetics'' '''120''':819–829
| |
| * {{note|Kingman82}}Kingman, J.F.C. (1982) On the Genealogy of Large Populations. ''Journal of Applied Probability'' '''19A''':27–43 [http://www.jstor.org/cgi-bin/jstor/printpage/00219002/sp050067/05x2498b/0?frame=noframe&userID=825fc2c5@uwa.edu.au/01cce4405d00501b61540&dpi=3&backcontext=table-of-contents&backurl=/cgi-bin/jstor/listjournal/00219002/sp050067%3fframe%3dframe%26dpi%3d3%26userID%3d825fc2c5@uwa.edu.au/01cce4405d00501b61540%26config%3djstor&config=jstor JSTOR copy]
| |
| * {{note|Kingman00}}Kingman, J.F.C. (2000) Origins of the coalescent 1974–1982. [http://www.genetics.org/cgi/content/full/156/4/1461 ''Genetics'' '''156''':1461–1463]
| |
| * {{note|Liang07}}Liang L., Zöllner S., Abecasis G.R. (2007) GENOME: a rapid coalescent-based whole genome simulator. ''Bioinformatics'' [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/12/1565 '''23''': 1565–1567]
| |
| * {{note|Mailund05}}Mailund, T., Schierup, M.H., Pedersen, C.N.S., Mechlenborg, P.J.M., Madsen, J.N., Schauser, L. (2005) CoaSim: A Flexible Environment for Simulating Genetic Data under Coalescent Models [http://www.biomedcentral.com/1471-2105/6/252/abstract ''BMC Bioinformatics'' '''6''':252]
| |
| * {{note|Mohle01}}Möhle, M., Sagitov, S. (2001) A classification of coalescent processes for haploid exchangeable population models ''The Annals of Probability'' '''29''':1547–1562
| |
| * {{note|Morris02}}Morris, A. P., Whittaker, J. C., Balding, D. J. (2002) Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies [http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=384946&blobtype=pdf ''American Journal of Human Genetics'' '''70''':686–707]
| |
| * {{note|Neuhauser97}}Neuhauser, C., Krone, S.M. (1997) The genealogy of samples in models with selection [http://www.genetics.org/cgi/reprint/145/2/519 ''Genetics'' '''145''' 519–534]
| |
| * {{note|Pitman99}}Pitman, J. (1999) Coalescents with multiple collisions ''The Annals of Probability'' '''27''':1870–1902
| |
| *{{note|Rosalind}}Harding, Rosalind, M. 1998. New phylogenies: an introductory look at the coalescent. pp. 15–22, in Harvey, P. H., Brown, A. J. L., Smith, J. M., Nee, S. New uses for new phylogenies. Oxford University Press (ISBN 0198549849)
| |
| * {{note|Rosenberg}}Rosenberg, N.A., Nordborg, M. (2002) Genealogical Trees, Coalescent Theory and the Analysis of Genetic Polymorphisms. ''Nature Reviews Genetics'' '''3''':380–390
| |
| * {{note|Sagitov99}}Sagitov, S. (1999) The general coalescent with asynchronous mergers of ancestral lines ''Journal of Applied Probability'' '''36''':1116–1125
| |
| * {{note|Schweinsberg00}}Schweinsberg, J. (2000) Coalescents with simultaneous multiple collisions ''Electronic Journal of Probability'' '''5''':1–50
| |
* {{note|Slatkin01}}Slatkin, M. (2001) Simulating genealogies of selected alleles in a population of variable size ''Genetical Research'' '''78''':49–57
| |
| * {{note|Tajima83}}Tajima, F. (1983) Evolutionary Relationship of DNA Sequences in finite populations. ''Genetics'' '''105''':437–460
| |
| * {{note|Tavare}}Tavare S, Balding DJ, Griffiths RC & Donnelly P. 1997. Inferring coalescent times from DNA sequence data. ''Genetics'' '''145''': 505–518.
| |
| * {{note|The international SNP map working group}}The international SNP map working group. 2001. A map of human genome variation containing 1.42 million single nucleotide polymorphisms. ''Nature'' '''409''': 928–933.
| |
| * {{note|Zöllner}}Zöllner S. and [[Jonathan K. Pritchard|Pritchard J.K.]] (2005) Coalescent-Based Association Mapping and Fine Mapping of Complex Trait Loci [http://pritch.bsd.uchicago.edu/publications/ZollnerAndPritchard05.pdf ''Genetics'' '''169''':1071–1092]
| |
| * {{note|Rousset07}}Rousset F. and Leblois R. (2007) Likelihood and Approximate Likelihood Analyses of Genetic Structure in a Linear Habitat: Performance and Robustness to Model Mis-Specification [http://raphael.leblois.free.fr/Papiers/RoussetLeblois2007MBE.pdf ''Molecular Biology and Evolution'' '''24''':2730–2745]
| |
| * {{note|Leblois09}}Leblois R., Estoup A. and Rousset F. (2009) IBDSim: a computer program to simulate genotypic data under isolation by distance [http://raphael.leblois.free.fr/Papiers/LebloisEtAl.2009MolEcolRess_IBDSim.pdf ''Molecular Ecology Resources'' '''9''':107-109]
| |
* {{note|Ewing}}Ewing, G. and Hermisson J. (2010) MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. [http://bioinformatics.oxfordjournals.org/content/26/16/2064.full ''Bioinformatics'' '''26'''(16):2064–2065]{{refend}}
| |
| | |
| ===Books===
| |
| {{refbegin}}
| |
| * Hein, J; Schierup, M. H., and Wiuf, C. ''Gene Genealogies, Variation and Evolution – A Primer in Coalescent Theory''. [[Oxford University Press]], 2005. ISBN 0-19-852996-1.
| |
| * Nordborg, M. (2001) [http://walnut.usc.edu/papers/references/copy_of_2002/wiley.pdf/view Introduction to Coalescent Theory]
| |
| * Chapter 7 in Balding, D., Bishop, M., Cannings, C., editors, ''Handbook of Statistical Genetics''. Wiley ISBN 978-0-471-86094-5
| |
| * Wakeley J. (2006) ''An Introduction to Coalescent Theory'' Roberts & Co ISBN 0-9747077-5-9 [http://www.roberts-publishers.com/wakeley/ Accompanying website with sample chapters]
| |
| *{{note|Rice}} Rice SH. (2004). ''Evolutionary Theory: Mathematical and Conceptual Foundations''. Sinauer Associates: Sunderland, MA. See esp. ch. 3 for detailed derivations.
| |
| * Berestycki N. "Recent progress in coalescent theory" 2009 ENSAIOS Matematicos vol.16
| |
| * Bertoin J. "Random Fragmentation and Coagulation Processes"., 2006. Cambridge Studies in Advanced Mathematics, 102. [[Cambridge University Press]], Cambridge, 2006. ISBN 978-0-521-86728-3;
| |
| * Pitman J. "Combinatorial stochastic processes" Springer (2003)
| |
| {{refend}}
| |
| | |
| ==External links==
| |
| * [http://www.pandasthumb.org/archives/2004/05/evomath_3_genet.html EvoMath 3: Genetic Drift and Coalescence, Briefly] — overview, with probability equations for genetic drift, and simulation graphs
| |
| | |
| {{Population genetics}}
| |
| | |
| [[Category:Population genetics]]
| |
| [[Category:Statistical genetics]]
| |