Here is a copy of the program I wrote that did the estimation of Nm in the two papers I wrote with Wayne Maddision. The program is not what I would call user friendly but it does run in its present form in a standard Unix system using the pc compiler. Unfortunately, many of the newer Unix systems including the Sun Sparcstation and the NeXT do not come with Pascal compilers. I myself have not used it for four or five years. The program can be converted to run on other computers with other operating systems. My understanding is that people at Cornell (Richard Harrison's lab) and Washington University (Alan Templeton's lab) have it running on IBM compatibles but I do not know the details. The only changes I think you would have to make to get it running on a non-Unix system would be to write or copy a new random number generator to replace the one I have used. I used a Unix utility (random) that generates a uniformly distributed integer between 1 and maxint (which is predefined in a Unix system). The other external function (srandom) sets the initial random number seed. You need a function, "unif" that generates a random number that is uniformly distributed between 0 and 1. The program takes its input from standard input (i.e. the screen) and writes the results to standard output. Here is what it should should do. -- It will tell you that the maximum number of genes in the sample is 256. That is the total number of individuals sampled, not the number of distinct haplotypes. You can increase that number by changing allelelimit. It will then ask for the number of geographic locations sampled and the number of genes (i.e. the number of individuals) sampled from each. -- It will then ask for the value of s that you have computed from your tree. This program does not compute s for you. It is easy enough to do by hand and can be done fairly easily using MacClade, if you have a copy of that. -- The program will ask for the number of replicates, on which each value of Nm is computed. Make that number small at first, say 10. You can increase it later once you have an idea of what range of Nm values you need. -- The program will then ask you if you want confidence limits. Say no at first because you will have a chance to get them later. -- It will then ask you for low and high estimates of Nm. If you have no idea, make the interval large. If you guess wrong, the program will tell you and try to extend the range. -- The program proceeds to do a binary search on the estimate of Nm starting from the low and high values you give it. That is, it divides the interval in half and then determines which half of the interval the estimate is in. It moves the high or low value and does it again. Because each estimate is done by simulation, it cannot be any more precise than the number of replicates in the simulation permits. While the program is running, it prints out values of slow or shigh that correspond to the boundaries of reduced interval that is being searched. That is mainly to let you know the program is doing something. -- The program will stop when the difference between the high and low estimates of Nm differ by less than 0.1. -- The program will then ask you if you want to try again. I suggest increasing the number of replicates to 100 before you get confidence limits and then use that number again to get confidence limits. That will give you some idea of the variability in estimates for your parameter values. This program is extremely slow and I would not suggest running it on a machine for which you have to pay by the minute. The running time depends strongly on the data. Higher estimates of Nm, which correspond to higher s values, make it run much slower. One problem that has come up with using this method for estimating Nm is what to do with a haplotype that is found in more than two geographic locations. In the 1989 paper, we discussed the case with two locations only. You have to increase s by one for each additional location greater than two. For example, a haplotype found in four locations would increase s by three. Please feel free to write with questions. Montgomery Slatkin September 26, 1995 __________________________ program estimator(input,output); const dlim = 64; {Cannot use sets with more than 64 demes} dlim2 = 128; {= 2 * dlim} allelelimit = 256; {The maximum number of genes in sample.} caselimit = 20; type locationtype = 1..dlim; locationset = set of locationtype; allelerecord = record state : locationset; end; allelevector = array[1..allelelimit] of allelerecord; demevector = array[1..dlim] of real; demetype = record k : integer; allele : allelevector; coal, mig : real end; poptype = array[1..dlim] of demetype; var Nm, span, slower, supper, Nmlower, Nmupper, sdev, sreal, tN : real; precision, Nmlow, Nmhigh, Nmmid, slow, shigh, smid : real; nmax, Ne, nprev, numfrom : integer; n : array[1..dlim] of integer; s0, caseno, maxcase, tsplit, maxrep, dmax, dmax2 : integer; h : array[0..allelelimit] of integer; initseed : integer; pop, temppop : poptype; conlim, testchar : char; function random : integer ; external; procedure srandom (i : integer); external; function unif : real; begin unif := random / maxint end; function randint(max : integer) : integer; {generates a random integer between 1 and max} var j : integer; begin j := trunc(unif * max) + 1; if j <= max then randint := j else if j > max then randint := max else randint := 0 end; {randint} function poisson(param : real) : integer; var i : integer; x : real; begin i := -1; x := 1.0; repeat i := i+1; x := x * unif until x < param; poisson := i end; {poisson} function expdist(a : real) : real; {This generates an exponentially distributed random variable with parameter a} begin expdist := -ln(unif) / a end; {expdist} procedure readvalues; const Nefactor = 1000; var d, spanmax : integer; procedure panmictic; {This simulates the coalescent events in a panmictic populattions with these sample sizes. Not implemented yet.} begin span := 1000.0; spanmax := 1000 end; {panmictic} begin {readvalues} writeln; writeln('The maximum total number of genes allowed in sample is ', allelelimit : 1); writeln; write('Number of locations sampled? '); readln(numfrom); if numfrom < 2 then begin writeln('Number of locations must be 2 or larger. Try again.'); halt end; nmax := 0; for d := 1 to numfrom do begin write('n[', d : 1,'] = '); readln(n[d]); if n[d] < 0 then begin writeln('Only positive sample sizes are allowed. Try again.'); halt end; nmax := nmax + n[d] end; if nmax > allelelimit then begin writeln('There are too many alleles, ', nmax : 1, ', in sample.'); writeln('You may try to increase the limit by increasing ''allelelimit'' inthe program and recompiling'); halt end; writeln; write('Minimum number of migration events, s = '); readln(sreal); s0 := round(sreal); panmictic; if s0 < numfrom - 1 then begin writeln('Value of s is smaller than the number of locations sampled. Try again.'); halt end else if s0 = numfrom - 1 then begin writeln('You have concordance between the phylogeny and the geographic loocations of the samples.'); writeln('See Slatkin, 1989, Genetics 121: 609-612 to estimate the upper bound on Nm for this case.'); halt end else if (sreal >= span) and (s0 <= spanmax) then begin writeln('Your value of s, ', s0 : 1,' is larger than the average s for a panmictic population with these sample sizes, ', span : 4 : 2); writeln('No estimate of Nm is possible because the data appear to have been drawn from a panmictic population.'); halt end else if s0 > spanmax then begin writeln('Your value of s, ', s0: 1,' is larger than the maximum possiblen with these sample sizes, ', spanmax : 1); writeln('Check your calculations of s and try again.'); halt end; tN := 20.0; {This may be changed} dmax := 5 * numfrom; if dmax > dlim then dmax := dlim; dmax2 := 2 * dmax; Ne := Nefactor * sqr(nmax); tsplit := round (tN * Ne); for d := numfrom + 1 to dmax do n[d] := 0; end; {readvalues} function sbar(Nm : real) : real; var m, ave, m2 : real; procedure hfill; {This is the procedure that fills a histogram, h[s], that contains the distribution of outcomes of the replicate simulaions. sbar and sdev are calculated from h[s].} var repno, t, a, d, s, kt, asum : integer; netprob, y : real; c : array[0..dlim2] of real; {c[0] must be set to 0.0} merged : demetype; procedure coalesce(var p : demetype); var a, a1, a2 : integer; begin kt := kt - 1; with p do begin if k > 2 then begin a1 := randint(k); repeat a2 := randint(k) until a1 <> a2; end else if k = 2 then begin a1 := 1; a2 := 2 end else begin writeln(***A coalescence occurred when it should not have***); halt end; if a1 > a2 then begin {swap them} a := a1; a1 := a2; a2 := a end; if allele[a1].state * allele[a2].state <> [] then allele[a1].state := allele[a1].state * allele[a2].state else begin allele[a1].state := allele[a1].state + allele[a2].state; s := s + 1 end; for a := a2 + 1 to k do allele[a-1] := allele[a]; end {with p} end; {coalesce} procedure emigrate(d : integer); {This takes a random allele from pop[d] and inserts it into a random pop} var a, i, demeto : integer; begin i := randint(pop[d].k); {This is the allele that moves} if dmax = 2 then demeto := 3 - d else repeat demeto := randint(dmax) until demeto <> d; with pop[demeto] do begin k := k + 1; allele[k] := pop[d].allele[i]; coal := k * (k - 1) / (2 * Ne); mig := m * k end; with pop[d] do begin k := k - 1; for a := i to k do allele[a] := allele[a+1]; coal := k * (k - 1) / (2 * Ne); mig := m * k end end; {emigrate} procedure setc; {pop[i].coal contains the coalescent probabilities in deme i, pop[i].mig contains the probabilities of emigration from deme i, c[i] contains the relative cumulative probabilities used for deciding which event happens when something happens.} var d : integer; begin netprob := 0.0; for d := 1 to dmax do with pop[d] do netprob := netprob + coal + mig; for d := 1 to dmax do c[d] := c[d-1] + pop[d].coal / netprob; {c[0] has been set to 0.0} for d := dmax + 1 to 2 * dmax do c[d] := c[d-1] + pop[d-dmax].mig / netprob end; {setc} procedure initializereplicate; var a, d : integer; procedure setprobs; var d : integer; begin for d := 1 to dmax do with pop[d] do begin coal := k * (k - 1) / (2 * Ne); mig := k * m; end end; {setprobs} begin {initializereplicate} kt := nmax; for d := 1 to dmax do with pop[d] do begin k := n[d]; for a := 1 to k do allele[a].state := [d]; end; {with pop} setprobs; setc end; {initializereplicate} procedure finishreplicate; begin {finishreplicate} h[s] := h[s] + 1; end; {finishreplicate} begin {hfill} c[0] := 0.0; {This is needed later} for a := 0 to allelelimit do h[a] := 0; for repno := 1 to maxrep do begin initializereplicate; t := 0; s := 0; if netprob > 0.0 then repeat t := t + round(expdist(netprob)) + 1; {when the next event occurs} if t <= tsplit then begin y := unif; d := 1; while y >= c[d] do d := d + 1; if d > dmax2 then d := dmax2; {unif can generate a number > 1.0} if d <= dmax then begin coalesce(pop[d]); with pop[d] do begin k := k - 1; coal := k * (k - 1) / (2 * Ne); mig := k * m end {with pop[d]} end {d <= dmax} else emigrate(d-dmax); setc end {t <= tsplit} until (t >= tsplit) or (kt = 1) or (netprob = 0.0); if kt = 1 then {only one allele is left} finishreplicate else {merge the demes and continue} begin with merged do begin k := kt; asum := 0; for d := 1 to dmax do for a := 1 to pop[d].k do begin asum := asum + 1; allele[asum] := pop[d].allele[a] end end; {with merged} for t := merged.k downto 2 do begin coalesce(merged); {t is not time} with merged do k := k - 1 end; finishreplicate end {else} end {for repno} end; {hfill} procedure stats; var s : integer; p : array[0..allelelimit] of real; begin ave := 0.0; m2 := 0.0; for s := 0 to allelelimit do p[s] := h[s] / maxrep; for s := 0 to allelelimit do begin ave := ave + s * p[s]; m2 := m2 + sqr(s) * p[s] end; m2 := m2 - sqr(ave) end; {stats} begin {sbar} m := Nm / Ne; hfill; stats; sdev := sqrt(m2); sbar := ave end; {sbar} function estimate(s : real) : real; {this does the binary search on the value of Nm given the sample sizes and s} begin slow := sbar(Nmlow); if slow > s then repeat writeln('The estimate of Nm is less then your minimum estimate.'); writeln('sbar(Nmlow) = ', slow : 4 : 2,' > ', s : 3 : 1); Nmlow := Nmlow / 2.0; writeln('New minimum estimate of Nm is ', Nmlow : 4 : 2); slow := sbar(Nmlow) until slow < s; writeln('slow = ', slow : 4 : 2); shigh := sbar(Nmhigh); if shigh < s then repeat writeln('The estimate of Nm is greater then your maximum estimate.'); writeln('sbar(Nmhigh) = ', shigh : 4 : 2,' < ', s : 3 : 1); Nmhigh := Nmhigh * 2.0; writeln('New maximum estimate of Nmhigh is ', Nmhigh : 4 : 2); shigh := sbar(Nmhigh) until shigh > s; writeln('shigh = ', shigh : 4 : 2); precision := 0.1; {this may be adjustable} repeat Nmmid := (Nmhigh + Nmlow) / 2.0; smid := sbar(Nmmid); writeln('Nmmid = ', Nmmid : 4 : 2, ', smid = ', smid : 4 : 2); if smid < s then begin Nmlow := Nmmid; slow := smid end else begin Nmhigh := Nmmid; shigh := smid end until (Nmhigh - Nmlow) < precision; estimate := Nmmid end; {estimate} begin {main} readvalues; repeat writeln; write('Maximum number of replicates in simulation? '); readln(maxrep); writeln; write('Do you want confidence limits on Nm? '); readln(conlim); writeln; write('Minimum estimate of Nm? '); readln(Nmlow); write('Maximum estimate of Nm? '); readln(Nmhigh); Nm := estimate(sreal); writeln; writeln('Estimate of Nm is ', Nmmid : 3 : 1); if (conlim = 'y') or (conlim = 'Y') then begin slower := sreal - 2 * sdev; supper := sreal + 2 * sdev; if slower > (numfrom - 1.0) then begin Nmlow := Nm / 16.0; Nmhigh := Nm; Nmlower := estimate(slower) end else Nmlower := 0.0; if supper < span then begin Nmlow := Nm; Nmhigh := 16.0 * Nm; Nmupper := estimate(supper) end else Nmupper := -1.0; writeln; if Nmupper > 0.0 then writeln('There is 95% confidence that Nm lies between ', Nmlower : 3 : 1, ' and ', Nmupper : 3 : 1) else writeln('There is 95% confidence that Nm lies between ', Nmlower : 3 : 1, ' and infinity') end; {if conlim} writeln; write('Do you want to change the number of replicates and try again? '); readln(testchar); if testchar = 'y' then begin write('Do you want confidence limits this time? '); readln(conlim) end until (testchar <> 'y') end.