ADNI data is made available to researchers around the world. As such, there are many active research projects accessing and applying the shared ADNI data. To further encourage Alzheimer’s disease research collaboration, and to help prevent duplicate efforts, the list below shows the specific research focus of the active ADNI investigations. This information is requested annually as a requirement for data access.
|Principal Investigator's Name:||David Fardo|
|Institution:||University of Kentucky|
|Proposed Analysis:||(I have recently joined Paul Crane's group; the analysis plan is repeated below.) Pathway-wide association study (PWAS) of AD. 0. Background. SNPs form the unit of analysis for Phase I genome-wide association studies (GWAS), such as the ADGC study recently accepted for publication. SNPs are convenient because they are what is directly observed. Each SNP is in linkage disequilibrium (LD) with nearby DNA, and an association between a SNP and a phenotype is rarely due to a causal relationship at that SNP itself. A true signal for a SNP is almost always due to one or more causal variants in LD with it. There is an important analogy to psychometric tests, in which the observed data (item responses) are rarely of interest in themselves. We propose to apply recent approaches to pathway analysis to the ADGC data to uncover additional signals that warrant follow-up in subsequent analyses. 1. Dataset to be analyzed. Initially the ADGC data and, if possible, all of the datasets analyzed in Stages 1 and 2 of Naj et al. [1]. 2. Variable(s) to be requested and analyzed. SNP-level data, case-control status, and age: essentially all of the variables analyzed in Naj et al. 3. Analyses to be performed. We propose to follow the steps outlined in a recently published paper by Zhao et al. [2]: 3a. Map SNPs to genes. Zhao et al. mapped a SNP to a gene if it lay within the gene or within 5 kb upstream or downstream of it. If a SNP mapped to multiple genes under this definition, they assigned it using a hierarchical scheme (coding > intronic > 5′ UTR > 3′ UTR), following [3]. Gene IDs, gene names, and chromosomal start and end positions were downloaded from NCBI’s Genome database (http://www.ncbi.nlm.nih.gov/Genomes/). 3b. Identify pathways. Zhao et al. used the KEGG Pathway Database (201 annotated pathways) and the BioCarta Pathway Database (320 annotated pathways), accessed through DAVID (Database for Annotation, Visualization, and Integrated Discovery).
3c. Focus on pathways with ≥15 genes (247 pathways). As time and resources allow, we will repeat the analysis with pathways of at least 10 genes and of at least 5 genes. 3d. Use principal component analysis to capture variation in minor allele frequency across SNPs within each gene. 3e. For each pathway, enter the gene-level principal components into regression models predicting disease status, and use LASSO to select models containing 1, 2, …, K genes. 3f. Compare models using AIC and BIC. 3g. Use the number of genes in the best candidate model to guide 1,000 permutations in which disease status is randomly reassigned and the steps above are repeated. 3h. Determine false discovery rates and family-wise error rates from the permuted data sets. 4. Members of the proposed analysis team. Our group includes members at the University of Washington (Paul Crane, MD MPH; Shubhabrata Mukherjee, PhD; S. McKay Curtis, PhD; and Emily Trittschuh, PhD), Boston University (Robert Green, MD MPH, soon to be at Harvard University, and Rick Sherva, PhD), Indiana University (Andy Saykin, PhD, and Li Shen, PhD), Brigham Young University (John “Keoni” Kauwe, PhD), and the University of Kentucky (David Fardo, PhD). 5. Data to be added (e.g., expression data, eSNP data, data from other studies). None. 6. Data cleaning methods. As in Naj et al. 7. Population structure analysis. As in Naj et al. 8. Timeline. We will work closely with the analysis group(s) pursuing other, similarly structured analyses. The method of Zhao et al. is one of several recent approaches to pathway analysis. We anticipate working alongside other groups pursuing similar analyses, sharing results, and pursuing joint publication(s) that combine our results with those from other techniques. At this stage the ideal analytic strategy is unknown, so the strongest publication would include results from several different approaches. We anticipate that this project will take roughly a year. Deliverables. A published manuscript. References: 1. Naj A, Jun G, Beecham G, et al. Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer’s disease. Nature Genetics 2011; in press. 2. Zhao J, Gupta S, Seielstad M, Liu J, Thalamuthu A. Pathway-based analysis using reduced gene subsets in genome-wide association studies. BMC Bioinformatics 2011;12:17. 3. Torkamani A, Topol EJ, Schork NJ. Pathway analysis of seven common diseases assessed by genome-wide association. Genomics 2008;92(5):265-72.|
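The SNP-to-gene mapping in step 3a (5 kb flanks, then the coding > intronic > 5′ UTR > 3′ UTR hierarchy) can be sketched as follows. This is a minimal illustration, not the pipeline of Zhao et al.: the `annotation` input (each SNP's functional class relative to each gene), the alphabetical tie-break, and the toy gene table are all hypothetical.

```python
from dataclasses import dataclass

# Priority from step 3a: a lower rank wins when a SNP maps to several genes.
# Classes outside this list (e.g. a SNP only in a 5 kb flank) rank last.
ANNOTATION_RANK = {"coding": 0, "intronic": 1, "5utr": 2, "3utr": 3}
FLANK_BP = 5_000  # 5 kb upstream and downstream


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # chromosomal start position (bp)
    end: int    # chromosomal end position (bp)


def candidate_genes(chrom, pos, genes, flank=FLANK_BP):
    """Genes whose body, extended by `flank` bp on each side, contains the SNP."""
    return [g for g in genes
            if g.chrom == chrom and g.start - flank <= pos <= g.end + flank]


def map_snp(chrom, pos, genes, annotation):
    """Assign one SNP to at most one gene.

    `annotation` (hypothetical input) maps gene name -> the SNP's functional
    class relative to that gene, e.g. {"GENE_A": "3utr"}.
    Ties at the same rank are broken alphabetically (an arbitrary assumption).
    """
    hits = candidate_genes(chrom, pos, genes)
    if not hits:
        return None
    last = len(ANNOTATION_RANK)
    best = min(hits, key=lambda g: (ANNOTATION_RANK.get(annotation.get(g.name), last), g.name))
    return best.name


# Toy gene table (made-up names and coordinates, for illustration only).
genes = [Gene("GENE_A", "19", 1_000_000, 1_010_000),
         Gene("GENE_B", "19", 1_012_000, 1_020_000)]
```

With this toy table, a SNP at 1,013,000 bp that is intronic to GENE_B maps to GENE_B even though it also falls inside GENE_A's 5 kb downstream flank.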
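The pathway-size filter in step 3c, repeated at the 15-, 10-, and 5-gene thresholds, might look like the sketch below. The optional `available_genes` argument is an assumption: it restricts the count to genes that actually received SNPs in step 3a. The plan does not say whether pathway size is counted before or after that restriction.

```python
def filter_pathways(pathways, min_genes, available_genes=None):
    """Keep pathways with at least `min_genes` distinct member genes.

    pathways: dict pathway name -> iterable of gene names.
    available_genes: optional set of genes with mapped SNPs (assumption);
    if given, only those genes count toward the pathway size.
    """
    keep_only = set(available_genes) if available_genes is not None else None
    kept = {}
    for name, members in pathways.items():
        genes = set(members)
        if keep_only is not None:
            genes &= keep_only
        if len(genes) >= min_genes:
            kept[name] = sorted(genes)
    return kept


def filter_by_thresholds(pathways, thresholds=(15, 10, 5), available_genes=None):
    """Run the filter once per threshold, as in the 15 / 10 / 5 gene plan."""
    return {m: filter_pathways(pathways, m, available_genes) for m in thresholds}
```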
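Step 3d reduces each gene to a few principal components. The sketch below assumes genotypes are coded as minor-allele counts (0, 1, 2) in an individuals × SNPs matrix, and it uses power iteration with deflation on the sample covariance matrix so that it runs without third-party libraries. The plan does not state how many components to keep per gene, so `n_components` is left as a parameter. In practice a linear-algebra library's SVD would be the usual choice.

```python
import math
import random


def _center(X):
    """Subtract each SNP's (column's) mean minor-allele count."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def _matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def gene_pcs(X, n_components, iters=200, seed=0):
    """Top principal components of one gene's genotype matrix.

    X: individuals x SNPs, entries are minor-allele counts (0, 1, 2).
    Returns (eigenvalues, loadings, scores); scores[i][k] is individual i's
    value on PC k, ready to be used as a regression covariate.
    """
    Xc = _center(X)
    n, p = len(Xc), len(Xc[0])
    # Sample covariance matrix of the SNPs (p x p).
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    eigenvalues, loadings = [], []
    for _ in range(min(n_components, p)):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration
            w = _matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, _matvec(C, v)))  # Rayleigh quotient
        eigenvalues.append(lam)
        loadings.append(v)
        # Deflate so the next pass finds the next-largest component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(a * b for a, b in zip(row, v)) for v in loadings] for row in Xc]
    return eigenvalues, loadings, scores
```

Two SNPs in perfect LD (identical columns) collapse onto a single component carrying all of the variance, which illustrates how gene-level PCs absorb redundancy among correlated SNPs.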
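For step 3e, one way to obtain models with 1, 2, …, K genes is to fit an L1-penalized (LASSO) logistic regression along a decreasing grid of penalties and record the first model of each size. The sketch below uses proximal gradient descent (ISTA) with warm starts. It assumes a gene counts as selected when any of its PCs has a nonzero coefficient; a group LASSO would instead select whole genes. The function names and `col_gene` mapping are hypothetical, and this is not presented as the fitting procedure of Zhao et al.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(x, thr):
    return math.copysign(max(abs(x) - thr, 0.0), x)


def lasso_logistic(X, y, lam, beta=None, b=None, iters=500):
    """Minimise mean logistic loss + lam * sum|beta_j| by ISTA.

    The intercept b is unpenalised. Step size 1/L uses
    L = ||[1, X]||_F^2 / (4n), an upper bound on the gradient's Lipschitz constant.
    """
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    if b is None:
        ybar = sum(y) / n  # assumes both cases and controls are present
        b = math.log(ybar / (1.0 - ybar))
    t = 4.0 * n / (n + sum(v * v for row in X for v in row))
    for _ in range(iters):
        resid = [_sigmoid(b + sum(bj * xj for bj, xj in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= t * sum(resid) / n
        beta = [_soft_threshold(bj - t * gj, t * lam) for bj, gj in zip(beta, grad)]
    return beta, b


def models_by_size(X, y, col_gene, max_genes, n_lambdas=20, min_ratio=0.01):
    """col_gene[j] names the gene whose PC is column j of X.

    Returns {k: set of genes} for the first model on the penalty path
    that uses exactly k genes, for 1 <= k <= max_genes.
    """
    n = len(X)
    ybar = sum(y) / n
    # Smallest penalty at which every coefficient is zero.
    lam_max = max(abs(sum(row[j] * (yi - ybar) for row, yi in zip(X, y))) / n
                  for j in range(len(X[0])))
    beta, b, found = None, None, {}
    for k in range(n_lambdas):
        lam = lam_max * min_ratio ** (k / (n_lambdas - 1))
        beta, b = lasso_logistic(X, y, lam, beta, b)  # warm start
        selected = {col_gene[j] for j, bj in enumerate(beta) if bj != 0.0}
        if 1 <= len(selected) <= max_genes and len(selected) not in found:
            found[len(selected)] = selected
    return found
```

Warm starts keep successive fits cheap because each solution starts near the previous one. Sizes can be skipped when two genes enter at the same penalty, so some values of K may be absent from the result.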
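Step 3f compares candidate models by AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n the number of subjects. The helper names below are hypothetical. The sketch assumes that each candidate's log-likelihood comes from a logistic model (for example, an unpenalized refit of the genes LASSO selected) and that k counts the intercept.

```python
import math


def logistic_loglik(y, probs):
    """Bernoulli log-likelihood of 0/1 outcomes under fitted probabilities."""
    return sum(math.log(p) if yi == 1 else math.log(1.0 - p) for yi, p in zip(y, probs))


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * loglik


def best_model(candidates, n_obs, criterion="bic"):
    """candidates: dict model name -> (loglik, n_params).

    Returns the name of the model minimising the chosen criterion.
    """
    def score(item):
        ll, k = item[1]
        return aic(ll, k) if criterion == "aic" else bic(ll, k, n_obs)
    return min(candidates.items(), key=score)[0]
```

Because BIC's per-parameter penalty (ln n) exceeds AIC's (2) once n ≥ 8, the two criteria can disagree. For example, with n = 100, a 2-gene model that improves the log-likelihood by 1.5 over a 1-gene model is preferred by AIC but not by BIC.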
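Steps 3g and 3h estimate error rates by re-running the analysis with disease status permuted. The sketch below assumes a hypothetical `stat_fn(labels)` that reruns the per-pathway analysis (with the model size fixed as in step 3g) and returns one statistic per pathway, larger meaning stronger. It computes single-step max-statistic (Westfall–Young style) FWER-adjusted p-values and a simple permutation estimate of FDR at each observed statistic. These are common choices, not necessarily the estimators the authors intended.

```python
import random


def permutation_error_rates(y, stat_fn, n_perm=1000, seed=1):
    """Permutation-based FWER and FDR for a set of pathway statistics.

    y: case/control labels. stat_fn(labels) -> {pathway: statistic}.
    Returns (fwer, fdr), each a dict keyed by pathway.
    """
    rng = random.Random(seed)
    observed = stat_fn(y)
    null_stats = []
    for _ in range(n_perm):
        perm = list(y)
        rng.shuffle(perm)  # randomly reassign disease status
        null_stats.append(stat_fn(perm))

    # FWER: compare each observed statistic with the permutation
    # distribution of the maximum statistic across all pathways.
    max_null = [max(s.values()) for s in null_stats]
    fwer = {pw: (1 + sum(m >= obs for m in max_null)) / (n_perm + 1)
            for pw, obs in observed.items()}

    # FDR at threshold = observed statistic: expected number of null
    # pathways passing (averaged over permutations) / number observed passing.
    fdr = {}
    for pw, obs in observed.items():
        n_called = sum(o >= obs for o in observed.values())
        expected_false = sum(v >= obs for s in null_stats for v in s.values()) / n_perm
        fdr[pw] = min(1.0, expected_false / n_called)
    return fwer, fdr
```

The `1 +` in the numerator and denominator of the FWER estimate keeps an adjusted p-value from being reported as exactly zero with a finite number of permutations.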