Background: All genetic association analysis relies on a specific feature of the human genome: linkage disequilibrium (LD). LD describes the nonrandom association of alleles (variants) at different sites across the genome. Variants that are physically linked tend to be statistically correlated, so much of the genetic data across the genome is redundant. In fact, sets of single genetic variants can be collapsed into what are called haplotype blocks, each of which can capture the variation at all variant sites in its region of the genome. Performing association analysis on these haplotypes instead of on single variants can increase the power of gene-mapping studies, because fewer tests are performed and each test carries more genetic information. In the search for disease-associated genes, this structure within the genome can be used in a two-stage mapping process: first identify regions of the genome that may be related to the disease, and then look for more complex models within those regions. There are several approaches for inferring haplotypes in genetic association studies, and these approaches may be coupled with data-mining methods (such as Multifactor Dimensionality Reduction) to detect complex predictive models. Using both real and simulated data, this project will investigate the performance and power of the collapsing approach to detect complex predictive models. Freely available data simulation and analysis tools, implemented in the R language and as Unix/Linux applications, will be used for the empirical comparison and the real data analysis. Students will use NCSU's supercomputing cluster, through the High Performance Computing (HPC) center, to support these computationally intensive comparisons.
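To make the idea of LD concrete, the sketch below computes two standard pairwise LD measures for a pair of biallelic sites: the disequilibrium coefficient D = p_AB - p_A * p_B and the squared correlation r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). It is written in Python purely for illustration and is not part of the project's R tooling. The function name `ld_stats`, the two-character haplotype encoding, and the toy samples are all assumptions made for this example, and the haplotypes are assumed to be already phased.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic sites from phased two-site haplotypes.

    Each haplotype is a 2-character string: "A"/"a" is the allele at the
    first site and "B"/"b" the allele at the second site. This encoding is
    a hypothetical convention chosen for the example.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)

    # Haplotype and allele frequencies estimated from the sample.
    p_AB = counts["AB"] / n
    p_A = (counts["AB"] + counts["Ab"]) / n
    p_B = (counts["AB"] + counts["aB"]) / n

    # D measures the departure from independence between the two sites.
    D = p_AB - p_A * p_B

    # r^2 rescales D^2 by the allele-frequency variances; it is undefined
    # when either site is monomorphic in the sample.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else float("nan")
    return D, r2


# Toy data (made up for illustration, not drawn from any real dataset).
complete_ld = ["AB"] * 5 + ["ab"] * 5            # alleles always travel together
no_ld = ["AB", "Ab", "aB", "ab"] * 5             # alleles assort independently

print(ld_stats(complete_ld))
print(ld_stats(no_ld))
```

In the first toy sample the two sites are perfectly correlated (r^2 of 1), so genotyping one site tells you the other; in the second they are independent (r^2 of 0). The redundancy that haplotype collapsing exploits corresponds to runs of neighboring sites with high pairwise r^2.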