Software for Haplotype Inference

References:
Huang, Y.-T., Chao, K.-M., and Chen, T., 2005, “An Approximation Algorithm for Haplotype Inference by Maximum Parsimony,” Journal of Computational Biology, 12: 1261-1274.
PowerPoint Presentation

Given a set of genotypes, we wish to find a minimum subset of haplotypes that can resolve these genotypes. This problem has been shown as NP-hard.

項目符號

For short genotypes, we design and implement a software called SDPHapInfer.

項目符號

For long genotypes, we design and implement a software called LongHapInfer.

Note that the zip files are compressed using WinZip. If you encounter any problem, please contact Prof. Kun-Mao Chao.

SDPHapInfer

SDPHapInfer is implemented in MatLab. The SDPHapInfer formulates the haplotype inference problem as the following integer quadratic programming (IQP) problem. The objective function is to minimize the number of haplotypes, and the constraint functions are to resolve each genotype.

To solve the IQP problem, we propose an iterative SDP-relaxation algorithm which finds a solution within a factor of O(log n) of the optimal solution. The SDP problem is solved via an external MatLab library called SDPA-M.

項目符號

Download SDPHapInfer. This zip file contains four MatLab scripts: SDPHapInfer.m, ReadGenoFile.m, Invert.m, Chk_Resolve.m, and Find_Index.m.

項目符號

Input file format:
First line: #_of_genotypes #_of_SNPs. Second line: first genotype; Third line: second genotype; ...
For each locus on the genotype, we represent the homozygous wild type as 0, homozygous mutant as 1, and heterozygous site as 2. The following is an example containing 3 genotypes with 7 SNPs.
3 7
1022101
1001222
2211002

項目符號

To run this program, type "SDPHapInfer" under the MatLab command line, and you will see a prompt for input of the file name.
>>SDPHapInfer
****************************************
*********** SDPHapInfer v0.4 ***********
****************************************
Please specify the input file name: test.dat
...

項目符號

The SDPHapInfer will generate an output file *.out (e.g., test.dat.out) that contains solutions.
2211002 = 0011000 + 1111001 ;
1022101 = 1001101 + 1010101 ;
1001222 = 1001101 + 1001010 ;

Number of iterations: 1
Number of selected haplotypes: 5
List of selected haplotypes:
0011000
1001010
1001101
1010101
1111001

 

LongHapInfer

LongHapInfer is implemented in MatLab. This algorithm partitions the genotypes into segments, infers subhaplotypes for each segment, and concatenates subhaplotypes in different segments for the final solution. LongHapInfer requires an external MatLab library called SDPA-M.

項目符號

Download LongHapInfer. This zip file contains four MatLab scripts: LongHapInfer.m, ReadGenoFile.m, Invert.m, Chk_Resolve.m, and Find_Index.m.

項目符號

Input file format:
First line: #_of_genotypes #_of_SNPs. Second line: first genotype; Third line: second genotype; ...
For each locus on the genotype, we represent the homozygous wild type as 0, homozygous mutant as 1, and heterozygous site as 2. The following is an example containing 3 genotypes with 7 SNPs.
3 7
1022101
1001222
2211002

項目符號

To run this program, type "LongHapInfer" under the MatLab command line, and you will see a prompt for input of the file name.
>>LongHapInfer
****************************************
*********** LongHapInfer v0.1 ***********
****************************************
Please specify the input file name: test.dat
...

項目符號

The LongHapInfer will generate an output file *.out (e.g., test.dat.out) that contains solutions.

 

 

This site was last updated 03/29/06