Software for Finding Robust Tag SNPs

References:
- Huang, Y.-T., Zhang, K., Chen, T., and Chao, K.-M., 2005, “Selecting Additional Tag SNPs for Tolerating Missing Data in Genotyping,” BMC Bioinformatics, 6: 263.
- Chang, C.-J., Huang, Y.-T., and Chao, K.-M., 2006, “A Greedier Approach for Finding Tag SNPs,” Bioinformatics, 22: 685-691.

In the genotyping process, some tag SNPs may be genotyped as missing data, and the resulting ambiguity can make two distinct haplotypes indistinguishable. This ambiguity can be avoided by genotyping a larger set of SNPs, called "robust tag SNPs." To find robust tag SNPs, we propose two greedy algorithms and one iterative linear programming (LP) relaxation algorithm. The greedy algorithms are implemented in Java, and the iterative LP-relaxation algorithm is implemented in Perl. In addition, we implement a Java program that enumerates the possible combinations of SNPs to find the optimal solution. The following programs are freely available for academic use.

- The program for the first greedy algorithm
- The program for the second greedy algorithm
- The program for the iterative LP-relaxation algorithm
- The program for finding the optimal solution
- The program based on a greedier approach (this approach finds better solutions than the greedy algorithms)

Note that the zip files are compressed using WinZip. If you encounter any problems, please contact Prof. Kun-Mao Chao.
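To make the idea of robust tag SNPs concrete, here is a minimal Java sketch that checks whether a chosen set of tag SNPs still separates every pair of haplotype patterns when up to m of them are missing. It is not taken from the distributed programs. It relies on one observation: if every pair of patterns differs at m + 1 or more of the chosen SNPs, at least one distinguishing SNP remains after any m are lost. The class name, method names, and the three sample patterns are hypothetical, and treating an n allele as never distinguishing is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative check only; not taken from the distributed programs. */
class RobustTagCheck {

    /** True if SNP column j separates patterns a and b: both alleles known and different. */
    static boolean distinguishes(String a, String b, int j) {
        char x = a.charAt(j);
        char y = b.charAt(j);
        // Assumption: a missing allele ('n') can never be used to tell two patterns apart.
        return x != 'n' && y != 'n' && x != y;
    }

    /**
     * True if every pair of patterns is distinguished by at least m + 1 of the
     * given tag SNPs (0-based column indices), so any m of them may go missing.
     */
    static boolean toleratesMissing(List<String> patterns, int[] tags, int m) {
        for (int p = 0; p < patterns.size(); p++) {
            for (int q = p + 1; q < patterns.size(); q++) {
                int count = 0;
                for (int j : tags) {
                    if (distinguishes(patterns.get(p), patterns.get(q), j)) {
                        count++;
                    }
                }
                if (count < m + 1) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical block of three haplotype patterns over 7 SNPs.
        List<String> block = Arrays.asList("1100110", "1111000", "0101011");
        System.out.println(toleratesMissing(block, new int[] {0, 2}, 0));       // true
        System.out.println(toleratesMissing(block, new int[] {0, 2}, 1));       // false
        System.out.println(toleratesMissing(block, new int[] {0, 2, 3, 6}, 1)); // true
    }
}
```

In this example SNPs 0 and 2 are enough when nothing is missing, but the first two patterns differ at only one of them, so tolerating one missing SNP requires a larger set.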

The program for the first greedy algorithm:

- The first greedy algorithm is implemented in Java. To run this program, first download the Java Development Kit (JDK) from http://java.sun.com, then compile the Java sources and run the program.
- Source code: RunG1.java and Greedy1.java.
- Java classes: run these classes directly to skip the compilation step.
- Input file format: BlockID haplotype_frequency haplotype_pattern (see an example). In the haplotype patterns, 1 denotes the major allele, 0 the minor allele, and n a missing allele. The following input file describes two haplotype blocks, each containing 2 haplotype patterns over 7 SNPs:

    Block1 2 1100110
    Block1 1 11110n0
    Block2 4 1100101
    Block2 3 11n0101

- Running this program requires two input parameters.
- Parameter 1: the number of missing SNPs allowed.
- Parameter 2: the input file containing haplotype blocks.

- For example, to tolerate 3 missing SNPs with the input file test.dat, type the following on the command line:

    java RunG1 3 test.dat

- The program generates an output file "Greedy1_3.txt" containing the robust tag SNPs for each block.
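The input format described above (block ID, haplotype frequency, haplotype pattern, separated by whitespace) can be read with a few lines of Java. The following is a sketch for readers who want to inspect or validate their own input files, not the parser used by RunG1. The class and method names are hypothetical, and reading the frequency as an integer is an assumption based on the sample file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative reader for the three-column input format; names are hypothetical. */
class HaplotypeBlockReader {

    /** One input line: a haplotype pattern and its frequency. */
    static final class Haplotype {
        final int frequency;
        final String pattern;

        Haplotype(int frequency, String pattern) {
            this.frequency = frequency;
            this.pattern = pattern;
        }
    }

    /** Groups the lines by block ID, keeping blocks in the order they first appear. */
    static Map<String, List<Haplotype>> parse(List<String> lines) {
        Map<String, List<Haplotype>> blocks = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            if (!fields[2].matches("[01n]+")) {
                throw new IllegalArgumentException("Pattern may contain only 0, 1, n: " + line);
            }
            // Assumption: the frequency column is an integer, as in the sample file.
            int frequency = Integer.parseInt(fields[1]);
            blocks.computeIfAbsent(fields[0], id -> new ArrayList<>())
                  .add(new Haplotype(frequency, fields[2]));
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<Haplotype>> blocks = parse(Files.readAllLines(Paths.get(args[0])));
        blocks.forEach((id, haps) ->
                System.out.println(id + ": " + haps.size() + " haplotype patterns"));
    }
}
```

Run on the sample file above, this groups the four lines into two blocks of two patterns each.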

The program for the second greedy algorithm:

- The second greedy algorithm is implemented in Java. To run this program, first download the Java Development Kit (JDK) from http://java.sun.com, then compile the Java sources and run the program.
- Source code: RunG2.java and Greedy2.java.
- Java classes: run these classes directly to skip the compilation step.
- Input file format: BlockID haplotype_frequency haplotype_pattern (see an example). In the haplotype patterns, 1 denotes the major allele, 0 the minor allele, and n a missing allele. The following input file describes two haplotype blocks, each containing 2 haplotype patterns over 7 SNPs:

    Block1 2 1100110
    Block1 1 11n1000
    Block2 4 1100101
    Block2 3 110010n

- This program requires two input parameters.
- Parameter 1: the number of missing SNPs allowed.
- Parameter 2: the input file containing haplotype blocks.

- For example, to tolerate 3 missing SNPs with the input file test.dat, type the following on the command line:

    java RunG2 3 test.dat

- The program generates an output file "Greedy2_3.txt" containing the robust tag SNPs for each block.
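This page does not describe how the two greedy algorithms choose SNPs. For orientation only, the following Java sketch shows a generic greedy heuristic for this kind of problem: it repeatedly picks the SNP that helps the largest number of pattern pairs that still lack m + 1 distinguishing tag SNPs. It should not be read as the logic of Greedy1.java or Greedy2.java. The class name, method names, and sample patterns are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Generic greedy heuristic for illustration; not the authors' Greedy1 or Greedy2. */
class GreedyTagSketch {

    /** True if SNP column j separates a and b: both alleles known and different. */
    static boolean distinguishes(String a, String b, int j) {
        char x = a.charAt(j);
        char y = b.charAt(j);
        return x != 'n' && y != 'n' && x != y;
    }

    /** Returns 0-based SNP indices such that every pair has at least m + 1 distinguishing tags. */
    static List<Integer> select(List<String> patterns, int m) {
        int n = patterns.get(0).length();
        List<int[]> pairs = new ArrayList<>();
        for (int p = 0; p < patterns.size(); p++) {
            for (int q = p + 1; q < patterns.size(); q++) {
                pairs.add(new int[] {p, q});
            }
        }
        int[] need = new int[pairs.size()]; // distinguishing tags each pair still requires
        Arrays.fill(need, m + 1);
        boolean[] chosen = new boolean[n];
        List<Integer> tags = new ArrayList<>();

        while (true) {
            int best = -1;
            int bestGain = 0;
            for (int j = 0; j < n; j++) {
                if (chosen[j]) {
                    continue;
                }
                int gain = 0; // number of unsatisfied pairs that SNP j would help
                for (int k = 0; k < pairs.size(); k++) {
                    int[] pr = pairs.get(k);
                    if (need[k] > 0 && distinguishes(patterns.get(pr[0]), patterns.get(pr[1]), j)) {
                        gain++;
                    }
                }
                if (gain > bestGain) {
                    best = j;
                    bestGain = gain;
                }
            }
            if (best < 0) {
                break; // no remaining SNP helps: either finished or infeasible
            }
            chosen[best] = true;
            tags.add(best);
            for (int k = 0; k < pairs.size(); k++) {
                int[] pr = pairs.get(k);
                if (need[k] > 0 && distinguishes(patterns.get(pr[0]), patterns.get(pr[1]), best)) {
                    need[k]--;
                }
            }
        }
        for (int remaining : need) {
            if (remaining > 0) {
                throw new IllegalStateException("No SNP set in this block tolerates " + m + " missing SNPs");
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical block of three haplotype patterns over 7 SNPs.
        List<String> block = Arrays.asList("1100110", "1111000", "0101011");
        System.out.println(select(block, 1)); // [0, 2, 3]
    }
}
```

A greedy choice like this is fast but carries no guarantee of returning the smallest possible set, which is why an exhaustive program is also provided for comparison.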

The program for the iterative LP-relaxation algorithm:

- The iterative LP-relaxation algorithm is implemented in Perl; the linear programs are solved by an external program called lp_solve. To run this program, first install a Perl environment (http://www.activestate.com/).
- Download (containing the Perl script ILPRelax.pl and the external program lp_solve.exe).
- Input file format: BlockID haplotype_frequency haplotype_pattern (see an example). In the haplotype patterns, 1 denotes the major allele, 0 the minor allele, and n a missing allele. The following input file describes two haplotype blocks, each containing 2 haplotype patterns over 7 SNPs:

    Block1 2 1100110
    Block1 1 1111n00
    Block2 4 1100101
    Block2 3 110010n

- This program requires two input parameters.
- Parameter 1: the number of missing SNPs allowed.
- Parameter 2: the input file containing haplotype blocks.

- For example, to tolerate 3 missing SNPs with the input file test.dat, type the following on the command line:

    perl ILPRelax.pl 3 test.dat

- The program generates an output file "ILP_3.txt" containing the robust tag SNPs for each block.
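For readers unfamiliar with LP relaxation, the selection problem can be written as an integer program along the following lines. This is one natural formulation, given as an assumption; the exact model that ILPRelax.pl passes to lp_solve may differ. The notation is introduced here for illustration: x_j is 1 if SNP j is chosen, D(p, q) is the set of SNPs at which patterns p and q differ, n is the number of SNPs in the block, and m is the number of missing SNPs allowed (Parameter 1).

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{j=1}^{n} x_j \\
\text{subject to}\quad & \sum_{j \in D(p,q)} x_j \ge m + 1
    && \text{for every pair of patterns } p < q, \\
& x_j \in \{0, 1\} && j = 1, \dots, n.
\end{aligned}
```

The LP relaxation replaces the last constraint with 0 ≤ x_j ≤ 1, which makes the problem solvable by an LP solver at the cost of possibly fractional values. How the fractional solution is turned into a set of SNPs in each iteration is not described on this page.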

The program for finding the optimal solution:

- The program for finding the optimal solution is implemented in Java. To run this program, first download the Java Development Kit (JDK) from http://java.sun.com, then compile the Java sources and run the program.
- Source code: RunOPT.java and Optimize.java.
- Java classes: run these classes directly to skip the compilation step.
- Input file format: BlockID haplotype_frequency haplotype_pattern (see an example). In the haplotype patterns, 1 denotes the major allele, 0 the minor allele, and n a missing allele. The following input file describes two haplotype blocks, each containing 2 haplotype patterns over 7 SNPs:

    Block1 2 1100n10
    Block1 1 1n11000
    Block2 4 1100101
    Block2 3 1100101

- Running this program requires two input parameters.
- Parameter 1: the number of missing SNPs allowed.
- Parameter 2: the input file containing haplotype blocks.

- For example, to tolerate 3 missing SNPs with the input file test.dat, type the following on the command line:

    java RunOPT 3 test.dat

- The program generates an output file "OPT_3.txt" containing the robust tag SNPs for each block.
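The introduction says the optimal program enumerates the possible combinations of SNPs. A minimal sketch of that idea in Java follows: it tries every subset of SNP columns as a bitmask and keeps the smallest one that gives each pair of patterns at least m + 1 distinguishing SNPs. This is not the code in Optimize.java. The names, the sample patterns, and the 20-SNP limit are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Brute-force enumeration for illustration; not the authors' Optimize.java. */
class ExhaustiveTagSketch {

    /** True if SNP column j separates a and b: both alleles known and different. */
    static boolean distinguishes(String a, String b, int j) {
        char x = a.charAt(j);
        char y = b.charAt(j);
        return x != 'n' && y != 'n' && x != y;
    }

    /** True if the subset encoded by mask gives every pair at least m + 1 distinguishing SNPs. */
    static boolean tolerates(List<String> patterns, int mask, int m) {
        int n = patterns.get(0).length();
        for (int p = 0; p < patterns.size(); p++) {
            for (int q = p + 1; q < patterns.size(); q++) {
                int count = 0;
                for (int j = 0; j < n; j++) {
                    if ((mask & (1 << j)) != 0
                            && distinguishes(patterns.get(p), patterns.get(q), j)) {
                        count++;
                    }
                }
                if (count < m + 1) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Returns a smallest tolerant subset as 0-based SNP indices, or null if none exists. */
    static List<Integer> smallest(List<String> patterns, int m) {
        int n = patterns.get(0).length();
        if (n > 20) {
            // Arbitrary cap for this sketch: the loop below visits 2^n subsets.
            throw new IllegalArgumentException("Too many SNPs for exhaustive search");
        }
        int bestMask = -1;
        for (int mask = 0; mask < (1 << n); mask++) {
            // Skip subsets that cannot improve on the best one found so far.
            if (bestMask >= 0 && Integer.bitCount(mask) >= Integer.bitCount(bestMask)) {
                continue;
            }
            if (tolerates(patterns, mask, m)) {
                bestMask = mask;
            }
        }
        if (bestMask < 0) {
            return null;
        }
        List<Integer> tags = new ArrayList<>();
        for (int j = 0; j < n; j++) {
            if ((bestMask & (1 << j)) != 0) {
                tags.add(j);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical block of three haplotype patterns over 7 SNPs.
        List<String> block = Arrays.asList("1100110", "1111000", "0101011");
        System.out.println(smallest(block, 1));
    }
}
```

Because the number of subsets doubles with every additional SNP, this approach is practical only for small blocks.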

The program based on a greedier approach:

- GPT.rar
- readme.txt
- GPT.exe
- testfile

 

 

This site was last updated 03/29/06