Computation for Undergraduates in Statistics Program


2010 - 2011 Projects


Crashing Building Blocks: Using Genomic Structure to Collapse Data

Background: All genetic association analysis relies on a specific feature of the human genome: linkage disequilibrium (LD). LD describes the nonrandom association of alleles (variants) at different sites across the genome. Genetic variants are both physically linked and statistically correlated, so much of the genetic data across the genome is redundant. In fact, sets of single genetic variants can be collapsed into what are called haplotype blocks, which can identify all variant sites in their region of the genome. Performing association analysis on these haplotypes instead of on single variants can increase the power of gene-mapping studies, because fewer tests are performed and each test carries more genetic information. In finding disease-associated genes, this structure within the genome can be used in the mapping process: first to identify regions of the genome that may be related to the disease, and then to look for more complex models within these genomic regions. There are several approaches for inferring haplotypes in genetic association studies, and these approaches may be coupled with data-mining methods (such as Multifactor Dimensionality Reduction) to detect complex predictive models. Using both real and simulated data, this collapsing approach will be investigated for its performance and power to detect complex predictive models. Freely available data simulation and analysis tools, implemented in the R language and Unix/Linux applications, will be used for the empirical comparison and real data analysis. Students will use NCSU's supercomputing cluster, through the High Performance Computing (HPC) center, to aid in these computationally intensive comparisons.
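
As a rough illustration of the redundancy that LD creates, the sketch below computes the standard pairwise LD measure r^2 between two biallelic sites and collapses a set of sites into haplotype counts. It is only a minimal sketch in Python (the project itself names R as its language), and it is not the project's method: the function names `ld_r2` and `collapse_block` and the toy phased haplotypes are assumptions made up for this example.

```python
from collections import Counter

# Toy phased haplotypes (invented for illustration): one tuple per chromosome,
# one 0/1 allele code per variant site.
HAPLOTYPES = [
    (0, 0, 0, 1),
    (0, 0, 0, 0),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
    (0, 0, 0, 1),
    (1, 1, 1, 0),
    (0, 0, 0, 0),
    (1, 1, 1, 1),
]


def ld_r2(haplotypes, i, j):
    """Return r^2 between biallelic sites i and j (alleles coded 0/1)."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n          # frequency of allele 1 at site i
    p_b = sum(h[j] for h in haplotypes) / n          # frequency of allele 1 at site j
    p_ab = sum(h[i] * h[j] for h in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                             # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # A monomorphic site carries no LD information; report 0 in that case.
    return d * d / denom if denom > 0 else 0.0


def collapse_block(haplotypes, sites):
    """Count the distinct haplotypes observed across the given sites."""
    return Counter(tuple(h[s] for s in sites) for h in haplotypes)


if __name__ == "__main__":
    print("r^2(site 0, site 1):", ld_r2(HAPLOTYPES, 0, 1))
    print("r^2(site 0, site 3):", ld_r2(HAPLOTYPES, 0, 3))
    print("block over sites 0-2:", dict(collapse_block(HAPLOTYPES, [0, 1, 2])))
```

In this toy data, sites 0 to 2 always carry the same allele, so three single-variant tests could be replaced by one test on a two-allele haplotype block, while site 3 varies independently and would stay outside the block.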

Copyright 2011 NCSU Department of Statistics