Leach- GWAS


WR: example of GWAS helping with *disease prevention / risk prediction*

*Type 1 diabetes* is caused by *immune destruction* of the beta-islet cells of the pancreas. If a therapy can be developed to prevent this, genetic screening will aid prevention. *GWAS has identified over 50 loci associated with the disease, together explaining about ¾ of the disease risk.* Type 1 diabetes is ideal for prevention because it meets the criteria for effective genetic risk prediction: early identification is important (it matters when effective preventive or disease-altering therapies exist), heritability is high, the identified genetic loci explain a large portion of the risk, and genotyping scores show high discriminative value [5]. *However, there is no effective preventive therapy as yet.* *Once an effective therapy that protects the beta-islet cells of the pancreas is found, this will be an ideal case.*

Impact of GWAS *(WR)*

*Pharmacogenomics* leads towards personalised medicine. There have been trials on how an individual's SNPs affect drug response, focusing on cardiovascular disease treatment, e.g. *warfarin*. However, *pharmacogenomic* studies suggest that GWAS may have limited impact here, because most SNPs have such a tiny effect that they are unlikely to influence drug response.

how we identified genetic variants

- International HapMap Consortium (now archived) (http://hapmap.ncbi.nlm.nih.gov/)
- *1000 Genomes Project* (http://www.internationalgenome.org/). All variants with MAF > 1%. 3 phases.
- dbSNP: single nucleotide polymorphism database (https://www.ncbi.nlm.nih.gov/projects/SNP/).

impact of GWAS

- Identifies new genetic pathways behind disease - Diseasome (many genes behind one disease) (*'molecular sub-phenotypes'*) - Individuals with the same disease may have different genetic causes - LEADS TO PERSONALISED MEDICINE (however, WR: the impact so far is limited, e.g. warfarin)

• Understanding of the genetic architecture of complex traits is still quite limited in terms of:

- number of genetic variants involved. - their allele frequencies. - their effect sizes. - their interactions.

what's required for continuous (quantitative) trait stats tests

- the trait must be normally distributed for each genotype (t test etc) - the variance must be the same for each genotype - the data may need transforming, e.g. to a log scale

Why do we have the missing heritability problem (why isn't all variance explained by GWAS?)

- novel/rare variants are often missed - alleles/SNPs with low penetrance or small effect sizes are often missed - most studies only look at SNPs; what about structural and other variants? - epistasis - GxE interaction • Several examples in humans, e.g. FTO alleles increase body weight and obesity risk, but this effect can be reduced by exercise

Limitations of GWAS Big list

1) The association may not be a true association: it could be an artefact, or the SNP could be CLOSE to a QTL without being the QTL itself.
2) Most GWAS have used populations of *European descent*. This means that *SNPs are missed* that potentially have a large impact on phenotype. Clinical application must take this into account: when evaluating the predictive value of genetic models, *they often perform best in the population from which they were developed*, and their performance can be affected by differences between populations in genotype frequencies, phenotypic effect sizes and disease incidence [46].
3) Difficulty in finding causal variants. The associated SNP may be *within* or *close to* a gene that is relevant to the trait of interest. The goal is to identify this gene and its variants (alleles) that confer different disease risks, down to the Quantitative Trait Nucleotide (QTN). In reality, *some associations are nowhere near a functional gene, e.g. a variant on chr 9p21 associated with heart attack is 150 kb from the nearest gene.* *WR*: recent data demonstrating *strong correlations of GWAS-defined signals with enhancer elements and other regulatory regions* [33,34] have, however, renewed enthusiasm for exploring the functional implications of these signals.
4) Generalisability/applicability of SNP associations. SNP associations identified in one population are not always transferable to other populations. Different populations have different histories, leading to different LD structure, so the markers segregating with a causal gene may differ.
5) Most trait variation remains unexplained. The bulk of the genetic variance underlying trait heritability has still not been explained, e.g. > 30 markers associated with Crohn's disease explain < 10% of genetic variance. This is the so-called "missing heritability" problem. A not uncommon view is that GWAS was a useful early technology that should largely be supplanted by sequencing approaches to detect rarer variants of large effect [22]. Why? Epistasis, the focus on SNPs, GxE interaction (being overcome), *novel/rare variants often missed*, *alleles with small effect sizes often missed*.
6) Focus on *common polymorphisms*. Most studies focus on variants with a MAF of 5% or more, which means that lower-frequency variants with higher phenotypic impact are missed. BUT deeper sequencing-based characterization of genomic variation [17,18], fine mapping [26,27], imputation [28,29] and denser single-nucleotide polymorphism (SNP) arrays are extending the reach of GWAS to ever lower ranges of minor allele frequency.
7) Issue of *assessing LD*. Although LD made the GWAS approach possible (by allowing a single 'tag' SNP to serve as a proxy for much of the surrounding genetic variation with which it is inherited), parsing such signals to a single causative variant (if one exists) can require substantial additional research. *E.g.* an early *GWAS for childhood-onset asthma* flagged a locus containing ORMDL3, GSDMB and other genes. *This signal has yet to be isolated to one of these plausible and tightly linked genes in the 17q21 locus, delaying the understanding of the potential role, if any, of these genes in asthma pathogenesis and the potential application of this finding to diagnostic or therapeutic decision making.*
8) *Missing heritability problem*: there are many reasons for it, covered on another card.

Alternative hypotheses for genetic traits (2)

1) CDCV hypothesis (common disease common variant hypothesis) Common complex traits are largely due to common variants with small to modest effect size. This was the focus of human genetics at the turn of the 21st century. 2) CDRV hypothesis (common disease rare variant hypothesis) Common complex traits are largely due to the *summation of low frequency variants with high penetrance.*

population structure leading to false positives in GWAS

If there are unknown subgroups/families in the sample, allele frequencies can differ between the subgroups. Traits can also differ between the subgroups, so an allele can look associated with the trait purely because of subgroup membership, WITHOUT THE ALLELE BEING RELATED TO THE PHENOTYPIC TRAIT OF INTEREST!

Overview of GWAS

1. Spectrum of human genetic variation. 2. Design of association studies. 3. Principles of association analysis. 4. Statistical methods and challenges. - don't know detail 5. What has been achieved so far? 6. Limitations of GWAS - independent learning

Challenges in studying genetics of human populations

1. incomplete penetrance/ phenocopy 2. small family size 3. non-random mating 4. G x E interaction 5. cultural transmission of traits 6. G x E correlation 7. common environment effects 8. social, ethical + political factors

type 2 diabetes environmental factors

1. Pollution *disturbs glucose metabolism, causing insulin resistance* (odds ratio (OR) up to 11.5 for those with pesticide exposure). 2. post-paternal: a baby of 9 pounds or above = more likely.

what does association in GWAS mean

1. true association 2. in linkage disequilibrium with true QTL 3. artefact of population admixture

the classic variant stat

3 million differences between 2 humans

WR: other disease linked to GWAS (non SNP)

Genome-wide CNV studies have identified a role for CNVs in schizophrenia etc. Schizophrenia CNV case:control studies found that *1* people with schizophrenia have significantly MORE RARE CNVs than controls, and *2* significantly LARGER (>100 kb) CNVs than controls.

Types of variant

Common MAF>1% Rare MAF <1% novel

Spectrum of human genetic variation

Common variants have minor allele frequency (MAF) > 1%. Rare variants have minor allele frequency < 1%. Novel/de novo variants occur only in a single family/individual. most variants are neutral (do not contribute to phenotype)
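
The MAF thresholds above can be sketched in code. This is a minimal illustration only: the function names and genotype counts are hypothetical, and treating a MAF of exactly 1% as rare is an assumption, since the card does not say which side the boundary falls on.

```python
def minor_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """MAF at a biallelic SNP from counts of the three genotypes."""
    n_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    alt_freq = (2 * n_alt_hom + n_het) / n_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1 - alt_freq)


def classify_variant(maf, threshold=0.01):
    """Label a variant 'common' (MAF > 1%) or 'rare' (otherwise)."""
    # Assumption: a MAF exactly at the threshold is labelled rare.
    return "common" if maf > threshold else "rare"


# Hypothetical genotype counts, for illustration only.
maf_a = minor_allele_frequency(90, 10, 0)    # 10 minor alleles out of 200
maf_b = minor_allele_frequency(998, 2, 0)    # 2 minor alleles out of 2000
label_a = classify_variant(maf_a)
label_b = classify_variant(maf_b)
```

Novel/de novo variants are not covered by this sketch, because they are defined by where they occur (a single family or individual) rather than by a frequency cut-off.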

Confounding variables

Confounding occurs when an extraneous factor, or a set of factors, *can at least partially explain an apparent association, or the lack of an apparent association, between a risk factor and the outcome.* E.g. gender, rather than the gene, explains the link you think you see between the gene and case status. Confounders are factors other than the independent variable that cause differences between the experimental group and the control group, e.g. age, gender.

FTO gene

Fat mass and obesity-associated gene. 44-65% of people have at least one copy. Stimulates excessive food intake. Physical activity can attenuate the gene's influence.

stats tests for GWAS

For *categorical GWAS*:
• Basic concept: comparing the frequency of genetic variants between 2 groups (case/control)
• E.g. case-control studies
• Can use stats similar to chi-squared: - Fisher exact test (good for small groups) - *Cochran-Armitage trend test*
For *continuous GWAS*:
• E.g. continuous/quantitative traits: blood pressure, cholesterol level etc
• ANOVA
• Linear regression
- Both require: - the trait to be normally distributed for each genotype (t test etc) - the variance to be the same for each genotype - the data may need transforming, e.g. to a log scale
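
As a rough sketch of the categorical case, the following compares allele counts between cases and controls with a plain Pearson chi-squared test on a 2x2 table (1 degree of freedom, no continuity correction). The function name and the counts are hypothetical and chosen only for illustration; a real analysis would normally use an established statistics package.

```python
import math


def allelic_chi_squared(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-squared test (1 df) on a 2x2 table of allele counts."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail of the chi-squared
    # distribution can be written with the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: the alternative allele is more common in cases.
chi2_example, p_example = allelic_chi_squared(120, 80, 90, 110)
```

With very small counts the chi-squared approximation becomes unreliable, which is why the card lists the Fisher exact test for small groups.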

WR *drug treatment influenced by GWAS*

GWAS identified an association between ITPA variants and ribavirin toxicity. This helps guide treatment options for hepatitis C. *ALSO* pharmacogenomics leading to personalised medicine, e.g. with warfarin?

Benefit of GWAS

Higher-resolution mapping of QTLs (because it's association mapping!)

*1000 genomes project*

It seeks to integrate data on all types of variation that might cause human disease. Produced an 'integrated map' of 38 million SNPs and 1.4 million indels.

how gene x environment issue is being overcome

Large prospective cohorts are being established to facilitate G x E studies, e.g. the National Children's Study in the US and the Avon Longitudinal Study of Parents and Children in the UK.

location of SNP associations in GWAS

Lots are in DNase I peaks. Some are in TF binding sites.

Cochran-Armitage trend test

Most commonly used test of association between genotypes and case-control status in a GWAS. *Y axis = the risk* *(cases / (cases + controls))*. X axis = genotype value (0, 1, 2). If pi = qi for every genotype, there is no statistically significant association (pi = proportion of genotype i in cases; qi = the corresponding proportion in controls).
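
A minimal sketch of the trend test, using the standard textbook form of the statistic with genotype weights 0, 1, 2 and a two-sided normal approximation for the p-value. The function name and the genotype counts are hypothetical; this is an illustration of the idea, not the implementation used by any particular GWAS package.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2 x k table of genotype counts.

    cases[i] and controls[i] are the numbers of individuals with
    genotype value weights[i]. Returns (z, two_sided_p_value).
    """
    total_cases = sum(cases)
    total_controls = sum(controls)
    n = total_cases + total_controls
    col_totals = [r + s for r, s in zip(cases, controls)]

    # Trend statistic: weighted difference between case and control counts.
    t_stat = sum(
        w * (r * total_controls - s * total_cases)
        for w, r, s in zip(weights, cases, controls)
    )

    # Variance of the statistic under the null hypothesis of no association.
    squared_term = sum(w * w * c * (n - c) for w, c in zip(weights, col_totals))
    cross_term = sum(
        weights[i] * weights[j] * col_totals[i] * col_totals[j]
        for i in range(len(weights))
        for j in range(i + 1, len(weights))
    )
    variance = (total_cases * total_controls / n) * (squared_term - 2 * cross_term)

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for genotype values 0, 1, 2.
z_trend, p_trend = cochran_armitage_trend(cases=[20, 30, 50], controls=[50, 30, 20])
# Same genotype proportions in cases and controls (pi = qi): no trend.
z_null, p_null = cochran_armitage_trend(cases=[10, 20, 30], controls=[10, 20, 30])
```

In the second call the proportions are identical in both groups, so the statistic is zero, matching the "pi = qi" case on the card.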

GWAS method

1) Select "cases": individuals with the disorder.
2) Select "controls": individuals without the disorder. *Control for population stratification! AGE + GENDER*
3) Collect high-throughput genotype data, e.g. Affymetrix genome-wide human SNP array. This tests loads of SNPs throughout the genome. NOTE: most current GWAS are not fully "genome wide": not every SNP is assessed, usually just the common ones (some common SNPs and most rare SNPs are NOT INCLUDED in genotyping).
4) Set up hypotheses. Null hypothesis: there is no association between the marker (e.g. SNP) genotype and the trait. Alternative hypothesis: there is a significant association between the marker genotype and the trait.
5) Do stats tests + TEST FOR ASSOCIATION on every SNP marker.
6) *CALCULATE OR + P VALUE FOR EACH MARKER*
7) PRODUCE A *MANHATTAN PLOT*
GWAS tests for association of each marker/SNP with disease status in tens of thousands of individuals.

incomplete penetrance

Not all individuals with a mutant allele genotype show the mutant phenotype

How do we measure strength of associations?

Odds ratios (odds of it being in cases compared to controls)

MANHATTAN PLOT

Plot of results from a genome scan (GWAS); shows which genomic regions are significantly associated with the traits of interest

*GOOD WR* clinical uses of GWAS

Risk prediction, disease classification, drug development + drug toxicity

Wider reading - advancement on GWAS

There have been studies of rare variants, e.g. genome-wide CNV studies: 1) family-based studies of de novo CNVs 2) case:control analysis of CNV burden: *CNVs >100 kb reported in schizophrenia*; *more rare CNVs in people with schizophrenia than in controls* 3) single-marker association of target regions/genes

example disease with GWAS

Type 2 diabetes • Genes in several different pathways are involved, e.g. pathways affecting pancreatic β-cell function, and pathways affecting glucose levels and obesity

*linkage disequilibrium*

When a pair of alleles from two loci are *inherited together in the same gamete more/less often than random chance would predict*. *ALLELES ARE CORRELATED WITH ONE ANOTHER*
- A departure from 0.25 means there is a non-random distribution (in the case of 2 markers with 2 alleles at each marker)
- This correlation in structure is called LD
- LD varies between different regions of the genome + different populations
- LD means that instead of seeing all allele combinations equally, you see some more than others

Diseasome

A disease map that connects diseases that share genes with altered expression. Reveals relationships among diseases that were not obvious from traditional medical science: the genetic backgrounds behind seemingly different diseases overlap. Alleles may confer resistance to some diseases and increase the risk of others, e.g. *PTPN22*.

Significance levels for GWAS

α = *5 × 10⁻⁸* (this is 0.05/1,000,000, i.e. a Bonferroni correction for about one million independent tests). Replication in an independent sample is then also required afterwards!
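
The arithmetic behind the threshold, as a small sketch. The function name is hypothetical, and the figure of one million tests is the one used on the card.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# 0.05 family-wise error rate spread over one million tests.
gwas_threshold = bonferroni_threshold(0.05, 1_000_000)
```

A SNP would then be called genome-wide significant only if its p-value falls below `gwas_threshold`.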

Negatives of GWAS

can give false positives (if confounding variables/ subpopulations)

*The 2 types of GWAS*

Direct: directly *genotype all SNPs*, e.g. by DNA sequencing. Can directly identify functional SNPs (e.g. SNP 8) and so *does not rely on the existence of LD*.
Indirect: this is *most GWAS*. Genotype a *subset of SNPs* (including functional and tag SNPs), e.g. using SNP microarrays. Relies on LD between genotyped SNPs and causal SNPs to identify significant associations.

IBS

identity by state: what we assume for GWAS (cf. IBD, identity by descent)

all stats methods for QTL mapping

likelihood ratio

LD

linkage disequilibrium

linked genes vs linkage disequilibrium

Linked: physically on the same chromosome. Closely linked genes are rarely separated by recombination. LD: genes/alleles that are correlated; they *are often closely linked BUT DON'T HAVE TO BE LINKED*, they *might be on separate chromosomes!* They *just have to be correlated*: this is a statistical relationship between genes/alleles. E.g. selection may favour LD between alleles on different chromosomes.

OR

Odds ratio: measures the strength of GWAS associations • OR > 1 = genetic variant INCREASES risk of the disease • OR = 1 = variant makes no difference to risk (equally common in the case and control groups) • OR < 1 = genetic variant gives protection against the disease
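
A minimal sketch of how an odds ratio is obtained from a 2x2 table of variant carriers and non-carriers in cases and controls. The function name and the counts are hypothetical, for illustration only.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds of carrying the variant in cases divided by the odds in controls."""
    odds_in_cases = case_carriers / case_noncarriers
    odds_in_controls = control_carriers / control_noncarriers
    return odds_in_cases / odds_in_controls


# Hypothetical counts: the variant is more common among cases, so OR > 1.
or_risk = odds_ratio(300, 700, 200, 800)
# Same carrier proportion in both groups, so OR = 1.
or_neutral = odds_ratio(200, 800, 200, 800)
# Variant is less common among cases, so OR < 1 (protective).
or_protective = odds_ratio(100, 900, 200, 800)
```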

*Examples of false positives in GWAS*

• Presence of subpopulations (population structure) • Confounding causes

linked alleles

physically located on the same chromosome, whereas alleles in linkage disequilibrium can be on 2+ chromosomes

linkage analysis methods

single marker analysis pairwise analysis (IM, CIM, MIM)

Phenocopy

someone gets the disease without the allele

linear regression

Statistical test used on *continuous GWAS* data, i.e. quantitative traits such as blood pressure, height etc. Needs normally distributed data (therefore *cannot be used for categorical case:control data*). Y axis = phenotypic value. X axis = genotype value (0, 1, 2).
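
A minimal sketch of the regression itself: ordinary least squares of phenotype on genotype value (0, 1, 2), which assumes an additive effect per allele copy. The function name and the data are hypothetical; the sketch returns only the fitted line and leaves out the significance test on the slope.

```python
def genotype_regression(genotypes, phenotypes):
    """Least-squares slope and intercept of phenotype on genotype value."""
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx            # change in phenotype per extra allele copy
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: each extra allele copy adds 2 units to the trait.
example_genotypes = [0, 1, 2, 0, 1, 2]
example_phenotypes = [10.0, 12.0, 14.0, 10.0, 12.0, 14.0]
slope, intercept = genotype_regression(example_genotypes, example_phenotypes)
```

The slope is the estimated effect size of the allele; in a GWAS this fit would be repeated for every SNP and the slope tested against zero.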

haplotype blocks

stretches of the genome where recombination is infrequent and linkage disequilibrium is high These allele combinations are therefore in 'blocks'

PTPN22

the gene with an allele that confers resistance against Crohn's disease but increases the risk of other autoimmune diseases

Direct association

the causal SNP for the trait

tagSNPs

The few SNPs used to identify a haplotype: SNPs that are in the same haplotype block as the directly associated SNP. TagSNPs allow for indirect SNP studies.

Indirect association

the indirectly associated SNPs for the trait (tag SNPs, in the same haplotype block as the directly associated SNP, therefore are flagged up with the quantitative trait)

Linkage equilibrium

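When the allele carried by a chromosome at one locus is independent of the allele it carries at the other locus. E.g. with 2 markers with 2 alleles each, there are 4 possible haplotypes; at linkage equilibrium each haplotype's frequency equals the product of its allele frequencies, so each is expected at a frequency of 0.25 when all allele frequencies are 0.5.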
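
A small sketch of how departure from linkage equilibrium can be quantified for two biallelic markers, using the standard coefficient D (observed haplotype frequency minus the product of the allele frequencies) and the squared correlation r². The function name, haplotype labels and frequencies are hypothetical.

```python
def ld_statistics(hap_freqs):
    """D and r^2 from haplotype frequencies keyed 'AB', 'Ab', 'aB', 'ab'."""
    p_a = hap_freqs["AB"] + hap_freqs["Ab"]   # frequency of allele A
    p_b = hap_freqs["AB"] + hap_freqs["aB"]   # frequency of allele B
    # D is zero when the two loci are independent (linkage equilibrium).
    d = hap_freqs["AB"] - p_a * p_b
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Linkage equilibrium: all four haplotypes at 0.25.
d_eq, r2_eq = ld_statistics({"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25})
# Linkage disequilibrium: 'AB' and 'ab' are seen more often than expected.
d_ld, r2_ld = ld_statistics({"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4})
```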

Frequency of OR values:

• Consider a risk allele "C" and wild type "T". Three genotypes: CC CT TT • Most odds ratios for heterozygous genotypes of the associated variants ≈ 1.1. • Most odds ratios for homozygous genotypes of the associated variants ≈ 1.5. • Very few variants have odds ratio ≥ 10. • Quantitative traits can be controlled by a large number of genes, each with small effects (small ORs)

Advantages of GWAS

• Previous knowledge of the genetics of the trait not required. • Can fine map QTL to 10-100kb because many recombination events have occurred in the history of the population. • Can reveal causal genes in an unbiased way. :D

How LD is used

• As long as a GWAS genotypes at least *one SNP in each haplotype block*, we can identify SNPs associated with the trait • SNPs in LD with the causal SNP can flag its location even when the causal SNP itself is not genotyped • This means most studies are 'indirect SNP studies' that use *'tag' SNPs*

