Downloading MDBlocks:

MDBlocks is distributed as source code in a gzipped tar file. The archive also includes the source code for ranlib and a makefile.

 

Data Format For MDBlocks:

To use MDBlocks, you must have phased haplotype data. The format is simple. The file must begin with information on the number of haplotypes (Seqs) and the number of Markers in each. The first two lines must be:

Seqs: XXX
Markers: YYY

Here XXX is the number of haplotypes in the data set and YYY is the number of SNPs in each haplotype. The format is strict: the two lines must appear in this order, a colon must immediately follow both "Seqs" and "Markers", and there must be a space between the colon and the number (XXX or YYY above).

After the two header lines, each line is one haplotype. The first entry on each line must be a string with no white space in it, for example a name given to the haplotype. The remaining columns, delimited by white space (tabs or spaces), indicate the SNP allele carried at each locus. SNP alleles must be coded as 0 or 1. "-1" denotes missing data. "-2" denotes "ambiguous heterozygosity." This case arises when phasing is done using trios, and the two parents and the child in the trio are all heterozygous at the locus. For the purposes of the "-2"s, the haplotypes are considered to be paired. Hence, if a "-2" appears in haplotype n, where n is an odd number, then there must also be a "-2" at the same locus in haplotype (n+1). A small example data set looks like:

Seqs: 10
Markers: 15
Ex1	1	1	1	0	0	-2	0	1	0	1	1	0	0	0	0	
Ex2	1	-1	1	0	0	-2	1	0	0	0	0	1	1	1	0	
Ex3	0	0	0	0	1	1	0	1	0	1	1	0	0	0	0	
Ex4	0	0	0	0	1	0	1	0	1	0	0	1	1	0	0	
Ex5	0	0	0	1	1	0	0	0	0	0	0	1	1	0	0	
Ex6	0	0	0	0	1	0	1	0	0	0	0	1	1	0	-1	
Ex7	0	0	0	0	1	0	0	0	0	-1	0	1	1	0	0	
Ex8	0	0	0	0	1	0	1	0	0	0	0	1	1	0	0	
Ex9	0	0	-2	0	1	0	0	0	0	0	0	1	1	0	1	
E10	1	1	-2	0	0	0	0	0	0	0	0	1	1	0	1
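
The format rules above can be checked before running the program. The following is a minimal Python sketch, not part of the MDBlocks distribution; the function name parse_mdblocks is hypothetical. It assumes the header numbers are non-negative integers, that blank lines after the header may be skipped, and that the "-2" pairing rule applies in both directions within each consecutive pair (the text above states it only for the odd-numbered haplotype).

```python
import re

# Allowed codes: 0/1 alleles, -1 missing data, -2 ambiguous heterozygosity.
ALLOWED = {0, 1, -1, -2}


def parse_mdblocks(text):
    """Parse and validate MDBlocks-style input; return (names, rows)."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("file must begin with 'Seqs:' and 'Markers:' lines")

    # Strict header: keyword, colon, one space, then the number.
    m_seqs = re.fullmatch(r"Seqs: (\d+)\s*", lines[0])
    m_mark = re.fullmatch(r"Markers: (\d+)\s*", lines[1])
    if not m_seqs or not m_mark:
        raise ValueError("header must be 'Seqs: N' followed by 'Markers: M'")
    n_seqs, n_markers = int(m_seqs.group(1)), int(m_mark.group(1))

    names, rows = [], []
    for line in lines[2:]:
        fields = line.split()  # splits on tabs or spaces
        if not fields:
            continue  # assumption: blank lines are ignored
        name, alleles = fields[0], [int(x) for x in fields[1:]]
        if len(alleles) != n_markers:
            raise ValueError(f"{name}: expected {n_markers} alleles, got {len(alleles)}")
        bad = set(alleles) - ALLOWED
        if bad:
            raise ValueError(f"{name}: invalid allele codes {sorted(bad)}")
        names.append(name)
        rows.append(alleles)

    if len(rows) != n_seqs:
        raise ValueError(f"expected {n_seqs} haplotypes, got {len(rows)}")

    # Haplotypes are paired as (1,2), (3,4), ...; a -2 must be shared by both.
    for i in range(0, n_seqs - 1, 2):
        for locus, (a, b) in enumerate(zip(rows[i], rows[i + 1]), start=1):
            if (a == -2) != (b == -2):
                raise ValueError(
                    f"unpaired -2 at locus {locus} in {names[i]}/{names[i + 1]}"
                )
    return names, rows
```

Calling parse_mdblocks(open("myfile.txt").read()) on a well-formed file returns the haplotype names and a list of integer rows; any violation raises ValueError with a short description.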

 

Command Line Options:

MDBlocks is invoked by the command "MDBlocks". It takes a single required argument, which is the data file name. The file name may be followed by two options, "-g" and "-s". The "-g" option makes the program use IADP (the iterated approximate dynamic programming algorithm), which is typically much faster than IDP (the iterated dynamic programming algorithm), the default used when "-g" is not given. The "-s" option lets you specify two seeds for the random number generator that is used in dealing with missing data. The two seeds must be positive integers and must immediately follow the "-s". Three example command lines are:

MDBlocks myfile.txt
MDBlocks myfile.txt -g
MDBlocks myfile.txt -g -s 23786 98733
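
For scripted runs, the command lines above can be assembled programmatically. The sketch below is an illustration only: the helper names build_command and run_mdblocks are hypothetical, and it assumes the compiled executable is named "MDBlocks" and can be found on the PATH.

```python
import subprocess


def build_command(data_file, use_iadp=False, seeds=None, program="MDBlocks"):
    """Return the argument list for one MDBlocks run."""
    cmd = [program, data_file]  # the data file is the single required argument
    if use_iadp:
        cmd.append("-g")  # IADP instead of the default IDP
    if seeds is not None:
        s1, s2 = seeds  # exactly two seeds are expected
        if not all(isinstance(s, int) and s > 0 for s in (s1, s2)):
            raise ValueError("seeds must be two positive integers")
        cmd += ["-s", str(s1), str(s2)]
    return cmd


def run_mdblocks(data_file, use_iadp=False, seeds=None):
    """Run MDBlocks and return its standard output as text."""
    result = subprocess.run(
        build_command(data_file, use_iadp, seeds),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, build_command("myfile.txt", use_iadp=True, seeds=(23786, 98733)) yields the same arguments as the third command line above.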

 

Program Standard Output:

MDBlocks directs output both to standard output and to various output files. The standard output produced by running the above data file with the command "MDBlocks myfile.txt -g" looks like:

Program MDBlocks Version 1.0
Released 8 MAY 03
written by Eric C. Anderson (dr_eriq@uclink.berkeley.edu)
       and John A. Novembre (novembre@socrates.berkeley.edu)
Copyright (c) by The Regents of the University of California
Please see user documentation for full software agreement.

Seeds: 1269384 6471
Ex1 Ex2 Ex3 Ex4 Ex5 Ex6 Ex7 Ex8 Ex9 E10 

Filling 4 Unresolved Heterozygosity Holes:


Filling 3 Missing Data Holes:

Computing Matrices, a = 0
Computing Matrices, a = 1
Computing Matrices, a = 2
Computing Matrices, a = 3
Computing Matrices, a = 4
Computing Matrices, a = 5
Computing Matrices, a = 6
Computing Matrices, a = 7
Computing Matrices, a = 8
Computing Matrices, a = 9
Computing Matrices, a = 10
Computing Matrices, a = 11
Computing Matrices, a = 12
Computing Matrices, a = 13
Computing Matrices, a = 14
Assumed R = 1, Calculated R = 2 
Continuing iterations...
Assumed R = 2, Calculated R = 3 
Continuing iterations...
Assumed and Calculated R values are converged to 3

Outputting detailed score summary to CodeLengths.tex

The following line is designed to be easy to grep (using the @) out of the stdout.

@146.499746,4,0 5 13 15 ,0.000 0.000 , 2 2 2,648177177,1603874186
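
Because the summary line is the only one that begins with "@", it is easy to pull out of captured standard output. The following Python sketch (the function name grep_summary_fields is hypothetical) finds that line and splits it on commas. It deliberately returns the fields as raw strings and makes no assumption about what each field means.

```python
def grep_summary_fields(stdout_text):
    """Return the comma-separated fields of the '@' line, or None if absent."""
    for line in stdout_text.splitlines():
        if line.startswith("@"):
            # Drop the leading '@' and trim stray spaces around each field.
            return [field.strip() for field in line[1:].split(",")]
    return None
```

Applied to the output shown above, this returns seven strings, the first being "146.499746" and the last "1603874186".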

The synopsis of this is as follows:

 

Program File Output:

There are two files that the program generates:

  1. "CodeLengths.tex": a LaTeX file that gives a detailed breakdown of the description length for the block designation found by MDBlocks. Run LaTeX on it to get an easy-to-read table.
  2. "FileName.log": for each data file, a log file is created on the first run and then appended to each time MDBlocks is run on that data file. It records the starting time, starting seeds, ending seeds, and ending time of each program run.

 

Copyright Notice:

Copyright (c). The Regents of the
University of California (Regents). All Rights Reserved.

Permission to use, copy, modify, and distribute this software and its
documentation for educational, research, and not-for-profit purposes,
without fee and without a signed licensing agreement, is hereby granted,
provided that the above copyright notice, this paragraph and the
following two paragraphs appear in all copies, modifications, and
distributions. Contact The Office of Technology Licensing, UC Berkeley,
2150 Shattuck Avenue, Suite 510, Berkeley, CA 94720-1620, (510) 643-7201,
for commercial licensing opportunities. Created by Eric C. Anderson and
John A. Novembre,
Department of Integrative Biology, University of California, Berkeley.

IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS,
ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF
REGENTS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY,
PROVIDED HEREUNDER IS PROVIDED "AS IS". REGENTS HAS NO OBLIGATION TO
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.