Here is a copy of the program I wrote that did
the estimation of Nm in the two papers I wrote with Wayne Maddision.  The program
is not what I would call user friendly but it does run in its present form in a standard
Unix system using the pc compiler.  Unfortunately, many of the newer Unix systems
including the Sun Sparcstation and the NeXT do not come with Pascal compilers.
I myself have not used it for four or five years.

The program can be converted to run on other computers with other operating systems.
My understanding is that people at Cornell (Richard Harrison's lab) and Washington
University (Alan Templeton's lab) have it running on IBM compatibles but I do not
know the details.  The only changes I think you would have to make to get it running
on a non-Unix system would be to write or copy a new random number generator to
replace the one I have used.  I used a Unix utility (random) that generates a uniformly
distributed integer between 1 and maxint (which is predefined in a Unix system).  The
other external function (srandom) sets the initial random number seed.  You need a
function, "unif" that generates a random number that is uniformly distributed between 0
and 1.

The program takes its input from standard input (i.e. the screen) and writes the results
to standard output.  Here is what it should should do.

--  It will tell you that the maximum number of genes in the sample is 256.  That is the
total number of individuals sampled, not the number of distinct haplotypes.  You can
increase that number by changing allelelimit.  It will then ask for the number of
geographic locations sampled and the number of genes (i.e. the number of individuals)
sampled from each.

--  It will then ask for the value of s that you have computed from your tree.  This
program does not compute s for you.  It is easy enough to do by hand and can be done
fairly easily using MacClade, if you have a copy of that.

-- The program will ask for the number of replicates, on which each value of Nm is
computed.  Make that number small at first, say 10.  You can increase it later once you
have an idea of what range of Nm values you need.

-- The program will then ask you if you want confidence limits.  Say no at first because
you will have a chance to get them later.

-- It will then ask you for low and high estimates of Nm.  If you have no idea, make the
interval large.  If you guess wrong, the program will tell you and try to extend the range.

-- The program proceeds to do a binary search on the estimate of Nm starting from the
low and high values you give it.  That is, it divides the interval in half and then
determines which half of the interval the estimate is in.  It moves the high or low value
and does it again.  Because each estimate is done by simulation, it cannot be any more
precise than the number of replicates in the simulation permits.  While the program is
running, it prints out values of slow or shigh that correspond to the boundaries of
reduced interval that is being searched.  That is mainly to let you know the program is
doing something.

-- The program will stop when the difference between the high and low estimates of
Nm differ by less than 0.1.

-- The program will then ask you if you want to try again.  I suggest increasing the
number of replicates to 100 before you get confidence limits and then use that number
again to get confidence limits.  That will give you some idea of the variability in
estimates for your parameter values.

This program is extremely slow and I would not suggest running it on a machine for
which you have to pay by the minute.  The running time depends strongly on the data.
Higher estimates of Nm, which correspond to higher s values, make it run much
slower.

One problem that has come up with using this method for estimating Nm is what to do
with a haplotype that is found in more than two geographic locations.  In the 1989
paper, we discussed the case with two locations only.  You have to increase s by one
for each additional location greater than two.  For example, a haplotype found in four
locations would increase s by three.

Please feel free to write with questions.

Montgomery Slatkin
September 26, 1995

__________________________

program estimator(input,output);
  const
    dlim = 64; {Cannot use sets with more than 64 demes}
    dlim2 = 128;  {= 2 * dlim}
    allelelimit = 256;  {The maximum number of genes in sample.}
    caselimit = 20;

  type
    locationtype = 1..dlim;
    locationset = set of locationtype;
    allelerecord = record
		    state : locationset;
		   end;
    allelevector = array[1..allelelimit] of allelerecord;
    demevector = array[1..dlim] of real;
    demetype = record
	    k : integer;
	    allele : allelevector;
	    coal, mig : real
	  end;
    poptype = array[1..dlim] of demetype;

  var
    Nm, span, slower, supper, Nmlower, Nmupper, sdev, sreal, tN : real;
    precision, Nmlow, Nmhigh, Nmmid, slow, shigh, smid : real;
    nmax, Ne, nprev, numfrom : integer;
    n : array[1..dlim] of integer;
    s0, caseno, maxcase, tsplit, maxrep, dmax, dmax2 : integer;
    h : array[0..allelelimit] of integer;
    initseed : integer;
    pop, temppop : poptype;
    conlim, testchar : char;

  function random : integer ; external;

  procedure srandom (i : integer); external;

  function unif : real;
    begin
      unif := random / maxint
    end;

  function randint(max : integer) : integer;
  {generates a random integer between 1 and max}
    var
      j : integer;
    begin
      j := trunc(unif * max) + 1;
      if j <= max then randint := j
        else if j > max then randint := max
        else randint := 0
    end; {randint}

  function poisson(param : real) : integer;
    var
      i : integer;
      x : real;
    begin
      i := -1;
      x := 1.0;
      repeat
        i := i+1;
        x := x * unif
      until x < param;
      poisson := i
    end; {poisson}

  function expdist(a : real) : real;
  {This generates an exponentially distributed random variable with parameter a}
    begin
      expdist := -ln(unif) / a
    end; {expdist}

  procedure readvalues;
    const
      Nefactor = 1000;
    var
      d, spanmax : integer;

    procedure panmictic;
    {This simulates the coalescent events in a panmictic populattions with these
     sample sizes.
     Not implemented yet.}
      begin
	span := 1000.0;
	spanmax := 1000
      end; {panmictic}

    begin {readvalues}
      writeln;
      writeln('The maximum total number of genes allowed in sample is ', allelelimit : 1);
      writeln;
      write('Number of locations sampled? '); readln(numfrom);
      if numfrom < 2 then
	begin
	  writeln('Number of locations must be 2 or larger. Try again.');
	  halt
	end;
      nmax := 0;
      for d := 1 to numfrom do
	begin
	  write('n[', d : 1,'] = ');
	  readln(n[d]);
	  if n[d] < 0 then
	    begin
	      writeln('Only positive sample sizes are allowed. Try again.');
	      halt
	    end;
	  nmax := nmax + n[d]
	end;
      if nmax > allelelimit then
	begin
	  writeln('There are too many alleles, ', nmax : 1, ', in sample.');
	  writeln('You may try to increase the limit by increasing ''allelelimit'' inthe program and recompiling');
	  halt
	end;
      writeln;
      write('Minimum number of migration events, s = ');
      readln(sreal);
      s0 := round(sreal);
      panmictic;
      if s0 < numfrom - 1 then
	begin
	  writeln('Value of s is smaller than the number of locations sampled.  Try again.');
	  halt
	end
      else if s0 = numfrom - 1 then
	begin
	  writeln('You have concordance between the phylogeny and the geographic loocations of the samples.');
	  writeln('See Slatkin, 1989, Genetics 121: 609-612 to estimate the upper bound on Nm for this case.');
	  halt
	end
      else if (sreal >= span) and (s0 <= spanmax) then
	begin
	  writeln('Your value of s, ', s0 : 1,' is larger than the average s for a panmictic population with these sample sizes, ', span : 4 : 2);
	  writeln('No estimate of Nm is possible because the data appear to have been drawn from a panmictic population.');
	  halt
	end
      else if s0 > spanmax then
	begin
	  writeln('Your value of s, ', s0: 1,' is larger than the maximum possiblen with these sample sizes, ', spanmax : 1);
	  writeln('Check your calculations of s and try again.');
	  halt
	end;
      tN := 20.0; {This may be changed}
      dmax := 5 * numfrom;
      if dmax > dlim then dmax := dlim;
      dmax2 := 2 * dmax;
      Ne := Nefactor * sqr(nmax);
      tsplit := round (tN * Ne);
      for d := numfrom + 1 to dmax do n[d] := 0;
    end; {readvalues}

  function sbar(Nm : real) : real;
    var
      m, ave, m2 : real;

  procedure hfill;
  {This is the procedure that fills a histogram, h[s], that contains the
   distribution of outcomes of the replicate simulaions.
   sbar and sdev are calculated from h[s].}
    var
      repno, t, a, d, s, kt, asum : integer;
      netprob, y : real;
      c : array[0..dlim2] of real; {c[0] must be set to 0.0}
      merged : demetype;
     procedure coalesce(var p : demetype);
      var
	a, a1, a2 : integer;
      begin
	kt := kt - 1;
	with p do
	  begin
	    if k > 2 then
	      begin
		a1 := randint(k);
		repeat a2 := randint(k) until a1 <> a2;
	      end
	    else if k = 2 then
	      begin
		a1 := 1;
		a2 := 2
	      end
	    else
	      begin
		writeln(***A coalescence occurred when it should not have***);
		halt
	      end;
	    if a1 > a2 then
	      begin {swap them}
		a := a1;
		a1 := a2;
		a2 := a
	      end;
	    if allele[a1].state * allele[a2].state <> [] then
	      allele[a1].state := allele[a1].state * allele[a2].state
	      else
		begin
		  allele[a1].state := allele[a1].state + allele[a2].state;
		  s := s + 1
		end;
	    for a := a2 + 1 to k do allele[a-1] := allele[a];
	  end {with p}
      end; {coalesce}

    procedure emigrate(d : integer);
    {This takes a random allele from pop[d] and inserts it into a random pop}
      var
	a, i, demeto : integer;
      begin
	i := randint(pop[d].k);  {This is the allele that moves}
	if dmax = 2 then demeto := 3 - d
	  else repeat demeto := randint(dmax) until demeto <> d;
	with pop[demeto] do
	  begin
	    k := k + 1;
	    allele[k] := pop[d].allele[i];
	    coal := k * (k - 1) / (2 * Ne);
	    mig := m * k
	  end;
	with pop[d] do
	  begin
	    k := k - 1;
	    for a := i to k do allele[a] := allele[a+1];
	    coal := k * (k - 1) / (2 * Ne);
	    mig := m * k
	  end
      end; {emigrate}
     procedure setc;
    {pop[i].coal contains the coalescent probabilities in deme i,
     pop[i].mig contains the probabilities of emigration from deme i,
     c[i] contains the relative cumulative probabilities used
     for deciding which event happens when something happens.}
      var
	d : integer;
      begin
	netprob := 0.0;
	for d := 1 to dmax do
	  with pop[d] do netprob := netprob + coal + mig;
	for d := 1 to dmax do
	  c[d] := c[d-1] + pop[d].coal / netprob;  {c[0] has been set to 0.0}
	for d := dmax + 1 to 2 * dmax do
	  c[d] := c[d-1] + pop[d-dmax].mig / netprob
      end; {setc}

    procedure initializereplicate;
      var
	a, d : integer;

      procedure setprobs;
	var
	  d : integer;
	begin
	  for d := 1 to dmax do
	  with pop[d] do
	    begin
	      coal := k * (k - 1) / (2 * Ne);
	      mig := k * m;
	    end
	end; {setprobs}
	
      begin {initializereplicate}
	kt := nmax;
        for d := 1 to dmax do
        with pop[d] do
	  begin
	    k := n[d];
	    for a := 1 to k do allele[a].state := [d];
	  end; {with pop}
	setprobs;
	setc
      end; {initializereplicate}
       procedure finishreplicate;
      begin {finishreplicate}
	h[s] := h[s] + 1;
      end; {finishreplicate}

    begin {hfill}
      c[0] := 0.0; {This is needed later}
      for a := 0 to allelelimit do h[a] := 0;
      for repno := 1 to maxrep do
	begin
	  initializereplicate;
	  t := 0;
	  s := 0;
	  if netprob > 0.0 then
	  repeat
	    t := t + round(expdist(netprob)) + 1; {when the next event occurs}
	    if t <= tsplit then
	      begin
	        y := unif;
	        d := 1;
	        while y >= c[d] do d := d + 1;
		if d > dmax2 then d := dmax2;  {unif can generate a number > 1.0}
	        if d <= dmax then
		  begin
		    coalesce(pop[d]);
		    with pop[d] do
		      begin
	    	        k := k - 1;
	    	        coal := k * (k - 1) / (2 * Ne);
	    	        mig := k * m
		      end {with pop[d]}
		  end {d <= dmax}
	          else emigrate(d-dmax);
		setc
	       end {t <= tsplit}
	  until (t >= tsplit) or (kt = 1) or (netprob = 0.0);
	  if kt = 1 then {only one allele is left}
	    finishreplicate
	  else {merge the demes and continue}
	    begin
	      with merged do
		begin
		  k := kt;
		  asum := 0;
		  for d := 1 to dmax do
		  for a := 1 to pop[d].k do
		    begin
		      asum := asum + 1;
		      allele[asum] := pop[d].allele[a]
		    end
		end; {with merged}
	      for t := merged.k downto 2 do
		begin
		 coalesce(merged); {t is not time}
		 with merged do k := k - 1
		end;
	      finishreplicate
	    end {else}
	end {for repno}
    end; {hfill}
     procedure stats;
      var
        s : integer;
	p : array[0..allelelimit] of real;
      begin
        ave := 0.0;
        m2 := 0.0;
	for s := 0 to allelelimit do p[s] := h[s] / maxrep;
        for s := 0 to allelelimit do
          begin
            ave := ave + s * p[s];
            m2 := m2 + sqr(s) * p[s]
          end;
        m2 := m2 - sqr(ave)
      end; {stats}

    begin {sbar}
      m := Nm / Ne;
      hfill;
      stats;
      sdev := sqrt(m2);
      sbar := ave
    end; {sbar}

  function estimate(s : real) : real;
  {this does the binary search on the value of Nm given the sample sizes and s}
    begin
      slow := sbar(Nmlow);
      if slow > s then
        repeat
          writeln('The estimate of Nm is less then your minimum estimate.');
          writeln('sbar(Nmlow) = ', slow : 4 : 2,' > ', s : 3 : 1);
	  Nmlow := Nmlow / 2.0;
          writeln('New minimum estimate of Nm is ', Nmlow : 4 : 2);
	  slow := sbar(Nmlow)
        until slow < s;
writeln('slow = ', slow : 4 : 2);
      shigh := sbar(Nmhigh);
      if shigh < s then
        repeat
          writeln('The estimate of Nm is greater then your maximum estimate.');
          writeln('sbar(Nmhigh) = ', shigh : 4 : 2,' < ', s : 3  : 1);
	  Nmhigh := Nmhigh * 2.0;
          writeln('New maximum estimate of Nmhigh is ', Nmhigh : 4 : 2);
	  shigh := sbar(Nmhigh)
        until shigh > s;
writeln('shigh = ', shigh : 4 : 2);
      precision := 0.1;  {this may be adjustable}
      repeat
        Nmmid := (Nmhigh + Nmlow) / 2.0;
        smid := sbar(Nmmid);
      writeln('Nmmid = ', Nmmid : 4 : 2, ', smid = ', smid : 4 : 2);
        if smid < s then
	  begin
	    Nmlow := Nmmid;
	    slow := smid
	  end
        else
	  begin
	    Nmhigh := Nmmid;
	    shigh := smid
	  end
      until (Nmhigh - Nmlow) < precision;
      estimate := Nmmid
    end; {estimate}

  begin {main}
    readvalues;
    repeat
      writeln;
      write('Maximum number of replicates in simulation? '); readln(maxrep);
      writeln;
      write('Do you want confidence limits on Nm? '); readln(conlim);
      writeln;
      write('Minimum estimate of Nm? '); readln(Nmlow);
      write('Maximum estimate of Nm? '); readln(Nmhigh);
      Nm := estimate(sreal);
      writeln;
      writeln('Estimate of Nm is ', Nmmid : 3 : 1);
      if (conlim = 'y') or (conlim = 'Y') then
	begin
	  slower := sreal - 2 * sdev;
	  supper := sreal + 2 * sdev;
	  if slower > (numfrom - 1.0) then
	    begin
	      Nmlow := Nm / 16.0;
	      Nmhigh := Nm;
	      Nmlower := estimate(slower)
	    end
	    else Nmlower := 0.0;
	  if supper < span then
	    begin
	      Nmlow := Nm;
	      Nmhigh := 16.0 * Nm;
	      Nmupper := estimate(supper)
	    end
	    else Nmupper := -1.0;
          writeln;
	  if Nmupper > 0.0 then
	    writeln('There is 95% confidence that Nm lies between ', Nmlower : 3 : 1,
		    ' and ', Nmupper : 3 : 1)
	    else writeln('There is 95% confidence that Nm lies between ', Nmlower : 3 : 1,
		    ' and infinity')
	end; {if conlim}
      writeln;
      write('Do you want to change the number of replicates and try again? ');
      readln(testchar);
      if testchar = 'y' then
	begin
	  write('Do you want confidence limits this time? ');
	  readln(conlim)
	end
    until (testchar <> 'y')
  end.