Background: The genetic etiology of a complex trait may follow any of a myriad of possible functional models, involving heterogeneity (where different genetic models produce the same trait outcome), epistasis (gene-gene interactions), and gene-environment interactions. Real datasets add further noise, including missing data and many types of error. For many novel data-mining approaches, the impact of these factors is largely unknown, and their relative ability to detect such models in the presence of noise needs to be understood. Data simulations will therefore be used to empirically compare several commonly used data-mining approaches designed for candidate gene studies, including both novel, computer-intensive methods and traditional ones (e.g. Multifactor Dimensionality Reduction, Random Forests, and logistic regression). Because there are so many potential sources of noise, the comparison will be divided between two teams of students (Project 1a and Project 1b) so that it can be more extensive. Insights on the relative performance of these methods will then guide the analysis of a real pharmacogenomics dataset. Both the empirical comparison and the real data analysis will use freely available data simulation and analysis tools, implemented in the R language and as Unix/Linux applications. Students will use NCSU's supercomputing cluster, through the High Performance Computing (HPC) center, to carry out these computationally intensive comparisons.
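As a rough illustration of the kind of data simulation described above, the following Python sketch generates case-control data under a two-locus epistatic penetrance model and then injects genotyping error and missing genotypes. It is a minimal sketch under stated assumptions: the penetrance table, minor allele frequencies, and the function names `sample_genotype`, `add_noise`, and `simulate_case_control` are hypothetical and are not the project's actual tools, which are to be R and Unix/Linux applications. Heterogeneity and environmental factors are not modeled here.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical two-locus penetrance table P(affected | g_A, g_B). Genotypes are
# coded 0/1/2 (number of minor alleles). Risk rises only when BOTH loci carry a
# minor allele, i.e. an interactive (epistatic) effect. Values are illustrative.
PENETRANCE = [
    [0.05, 0.05, 0.05],
    [0.05, 0.25, 0.25],
    [0.05, 0.25, 0.25],
]

Row = Tuple[Optional[int], Optional[int], int]  # (observed SNP A, observed SNP B, status)


def sample_genotype(maf: float, rng: random.Random) -> int:
    """Draw a genotype under Hardy-Weinberg equilibrium: two independent alleles."""
    return int(rng.random() < maf) + int(rng.random() < maf)


def add_noise(g: int, rng: random.Random,
              error_rate: float, missing_rate: float) -> Optional[int]:
    """Turn a true genotype into an observed one with missingness and miscalls."""
    if rng.random() < missing_rate:
        return None  # genotype not called
    if rng.random() < error_rate:
        return rng.choice([x for x in (0, 1, 2) if x != g])  # miscalled genotype
    return g


def simulate_case_control(n_cases: int, n_controls: int,
                          maf_a: float = 0.3, maf_b: float = 0.3,
                          error_rate: float = 0.0, missing_rate: float = 0.0,
                          seed: int = 1) -> List[Row]:
    """Rejection-sample individuals until both the case and control quotas are met."""
    rng = random.Random(seed)
    rows: List[Row] = []
    cases = controls = 0
    while cases < n_cases or controls < n_controls:
        a, b = sample_genotype(maf_a, rng), sample_genotype(maf_b, rng)
        # Disease status depends on the TRUE genotypes, before any noise is added.
        affected = rng.random() < PENETRANCE[a][b]
        if affected and cases < n_cases:
            cases += 1
        elif not affected and controls < n_controls:
            controls += 1
        else:
            continue  # this group's quota is already full; discard the draw
        rows.append((add_noise(a, rng, error_rate, missing_rate),
                     add_noise(b, rng, error_rate, missing_rate),
                     int(affected)))
    return rows
```

For example, `simulate_case_control(200, 200, error_rate=0.05, missing_rate=0.02)` returns 400 rows whose noisy genotypes could then be analyzed by each method under comparison, and repeating this over a grid of error and missing-data rates is one way to probe how performance degrades with noise.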