NCSU Dept of Statistics

Computation for Undergraduates in Statistics Program


2010 - 2011 Projects


Assessing the Relative Performance of Data-Mining Approaches

Background: The genetic etiology of complex traits can follow many possible functional models, including heterogeneity (where multiple models result in the same trait outcome) and epistasis (gene-gene and gene-environment interactions), and datasets carry additional sources of noise, such as missing data and many types of error. The impact of these factors on many novel data-mining approaches is largely unknown, so the relative performance of these approaches in detecting such models in the presence of noise needs to be understood. Data simulations will be used to compare empirically several commonly used data-mining approaches designed for candidate gene studies, both novel computer-intensive methods and traditional ones (e.g., Multifactor Dimensionality Reduction, Random Forests, and logistic regression). Because the number of potential noise sources is large, two teams of students will work with these methodologies to perform a more extensive empirical comparison (divided into Project 1a and Project 1b). Insights into the relative performance of these methods will guide the analysis of a real dataset in pharmacogenomics. The empirical comparison and the real data analysis will use freely available data simulation and analysis tools, implemented in the R language and in Unix/Linux applications. Students will run these computationally intensive comparisons on NCSU's supercomputing cluster through the High Performance Computing (HPC) center.
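
To make the idea of a simulation-based comparison concrete, here is a minimal sketch. It is written in Python for illustration only, although the project itself uses R, and it is not the project's actual tooling. Everything in it is an assumption chosen for the example: the function names `simulate` and `cell_rule_accuracy`, the XOR-style two-locus penetrance values (0.9 and 0.1), the allele frequency of 0.5, and the noise rates. The classifier is a simplified genotype-cell labelling loosely in the spirit of Multifactor Dimensionality Reduction, not the MDR software. The sketch simulates a purely epistatic case-control dataset with phenotype error and missing genotypes, then compares a single-locus rule against a two-locus rule on held-out data.

```python
import random
from collections import defaultdict


def simulate(n, flip_rate=0.0, missing_rate=0.0, seed=0):
    """Return n rows (g1, g2, status) under an assumed epistatic two-locus model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Genotypes coded as minor-allele counts, assuming allele frequency 0.5.
        g1 = rng.choice([0, 1, 1, 2])
        g2 = rng.choice([0, 1, 1, 2])
        # XOR-style penetrance: high risk when exactly one locus is heterozygous,
        # so neither locus has a marginal effect on its own.
        high_risk = (g1 == 1) != (g2 == 1)
        status = int(rng.random() < (0.9 if high_risk else 0.1))
        if rng.random() < flip_rate:      # noise source: phenotype error
            status = 1 - status
        if rng.random() < missing_rate:   # noise source: missing genotype, locus 1
            g1 = None
        if rng.random() < missing_rate:   # noise source: missing genotype, locus 2
            g2 = None
        rows.append((g1, g2, status))
    return rows


def cell_rule_accuracy(train, test, loci):
    """Label each genotype cell high or low risk on train; return test accuracy."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for row in train:
        cell = tuple(row[i] for i in loci)
        if None not in cell:
            counts[cell][row[2]] += 1
    # A cell is called high risk when cases outnumber controls in training data.
    high = {cell for cell, (controls, cases) in counts.items() if cases > controls}
    correct = total = 0
    for row in test:
        cell = tuple(row[i] for i in loci)
        if None in cell:
            continue  # complete-case analysis: skip rows with missing genotypes
        total += 1
        correct += int((cell in high) == (row[2] == 1))
    return correct / total


if __name__ == "__main__":
    for flip in (0.0, 0.2):
        data = simulate(4000, flip_rate=flip, missing_rate=0.05, seed=1)
        train, test = data[:2000], data[2000:]
        single = cell_rule_accuracy(train, test, (0,))
        joint = cell_rule_accuracy(train, test, (0, 1))
        print(f"flip_rate={flip}: single-locus={single:.3f}, two-locus={joint:.3f}")
```

Under these assumptions, the single-locus rule should perform near chance because the model has no marginal effects, while the two-locus rule recovers the interaction and loses accuracy as phenotype error increases. A fuller comparison would repeat this over many simulated datasets and noise settings, which is what makes such studies computationally intensive.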

Copyright 2011 NCSU Department of Statistics