GG3 L12-16

Reprogramming (removal of epigenetic marks)

2 main phases of removal of epigenetic marks: 1. formation of gametes (primordial germ cells) - gametes have different gene requirements than the individual 2. early zygote development (blastocyst)

PCA of Europe

2 principal components - first = explains most variance - second = explains the second most. Countries cluster similarly (Finland = outlier - a lot of history associated with outside of W Europe). Tilt the plot = same 2 principal components (PC1 and PC2) line up with geography - PC1 = North/South cline in allele frequency - PC2 = East/West change. Outcome: strong correlation between geography and the genetic variation observed. Reason: isolation by distance - people mate with those who live close by

Genomic Personalised medicine

2. Genetic testing - Diagnostic, risk assessment - when can we use a diagnostic test? -when is risk assessment rather than diagnostic testing appropriate? -what makes a good diagnostic test/risk assessment? -what limits its usefulness? 3. Genome-guided treatment - pharmacogenomics -what limits the success?

KYP and kyp mutants

KYP (KRYPTONITE) = histone (!!!) methyltransferase (encodes SET domain protein) - contains SRA (SET and RING associated) domain - catalyses H3K9me2 (dimethylation of histone H3 lysine 9) - kyp mutant loses H3K9me2 but also CNG methylation ! kyp mutant = suppressor of clark kent; bizarrely = histone methylation needed for DNA methylation = not expected

Mammal and plant version of Maintenance of CG and CHG methylation

MAMMALS: UHRF protein with SRA domain = important for recruiting DNMT1 for CG methylation in mammals; less dependent on histone methylation in plants than animals but does involve 2 domains. PLANTS: for CNG methylation - have loop of 2 different proteins 1. DNA methyltransferase (CMT3) is recruited to H3K9me2 2. methylates DNA 3. DNA methylation recruits the histone methyltransferase (KYP) 4. adds to stability of the inheritance of this mark - probably why CHG is faithfully propagated through DNA replication in plants but not animals

MET1 and met1 mutants

MET1 = DNA methyltransferase; in the mutant (met1) = CG methylation is strongly reduced

Simple vs Complex genetic components of disease

Simple: - chromosomal e.g. Down syndrome -single gene (Mendelian) e.g. Cystic fibrosis, Huntington's disease Complex: - multiple genes e.g. breast cancer, Crohn's disease, diabetes

Diagnostic test or risk prediction using genetics

Single gene disorders - tests can be designed to pick up variants in a single gene e.g. cystic fibrosis - effectiveness depends on effect of environment = if environment dominates, genotype may have minor importance so test is not good. Complex disorders - no longer a simple relationship - risk prediction = either looking at important genes or combined effect of many genes - e.g. breast cancer

Polygenic score

a tool for risk prediction using multiple variants - used in research - commercial prediction. Not focusing on 1-2 faulty genes = looking at all the SNPs thought to be associated. For each individual, can get a risk score by summing the effects of the alleles the individual carries. Uses GWAS that studies the effect of each allele; then for one individual = you are homozygous at this one, homozygous at this other one, heterozygous at another = sum up effect of all the SNPs. SNP allele effects come from GWAS, which also informs which SNPs to include in the score. Better than single-gene tests for predicting risk when lots of variants contribute to risk
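
A minimal sketch of the summing step described above, assuming each SNP has a per-allele effect size taken from a GWAS. The SNP names and effect values are invented for illustration and are not from any real study.

```python
# Hypothetical per-allele effect sizes (e.g. log odds ratios from a GWAS).
# SNP names and numbers are made up for illustration only.
effects = {"snp1": 0.10, "snp2": -0.05, "snp3": 0.20}


def polygenic_score(genotype, effects):
    """Sum (effect size x number of effect alleles) over all scored SNPs.

    genotype maps SNP name -> count of effect alleles (0, 1 or 2).
    SNPs missing from the genotype are treated as 0 copies.
    """
    return sum(effects[snp] * genotype.get(snp, 0) for snp in effects)


# Homozygous for the effect allele at snp1, heterozygous at snp2, none at snp3.
person = {"snp1": 2, "snp2": 1, "snp3": 0}
score = polygenic_score(person, effects)
print(round(score, 3))
```

Real scores also need a choice of which SNPs to include and a reference distribution to interpret the number against; this sketch covers only the sum.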

Human Population Genomics

considers patterns in DNA sequence within and between populations infer past processes that have led to the genetic differences we observe

Missing heritability

detected variants only explain a low proportion of the genetic variance (or heritability) - small effect variants not detected - only detect causal variants in LD with tagging SNPs, which are common = rare variants not in LD with tagging SNPs are missed - too simplistic: just looking at "here is a variant = does it affect disease"; should use more complicated models

Modern Genomics Improvements

easy and relatively cheap - chips enable 500,000 or 1,000,000 SNPs to be genotyped - can impute (predict) additional SNP genotypes using sequence and high density SNP data - next-gen sequencing - lots of effort in exome sequencing - very few GWAS in the exome - still likely to be in the SNPs as they are coding SNPs

Cycling Spm + a2m1 in maize

Certain isolates of the element either cycled between inactive and active phases during development or underwent an inactivation event of longer duration and sufficient stability to be heritable, but which was nonetheless occasionally reversed. Red and white patches not due to excision - due to changes in the activity of the Spm transposon = inactive is red patches; red = where transposase no longer expressed. Stable changes in expression of the actual transposon and how it produces transposase; where methylated = can be quite stable

Causes of Diseases

genetic predisposition - spontaneous mutation - inherited; age, sex; environment

Genome-guided treatment e.g. Warfarin dosing

genetic variation influences the response of an individual to drug treatments - potential to improve therapy - drug, dose - cost effective - difficult to reliably assess and validate - need very large data sets - individuals with disorder and variety of drugs --- Warfarin dosing - anticoagulant - widely used (originally a poison against rats) - problem = high variability in response - want a dose that reduces clotting without leading to severe bleeding - 2 genes involved - 1. CYP2C9 (enzyme for metabolism of warfarin) = 30% variation (if you metabolise fast = need more) 2. VKORC1 (warfarin drug target) - variants in these associated with drug efficacy - individual genotypes being used to determine a dose that prevents clots but doesn't lead to severe bleeding

Human Epigenome

genome-wide DNA methylation patterns of all human genes

The epigenome

genome-wide DNA methylation patterns of all human genes in all major tissues Methylated cytosine - modest modification but important = changes in activity and heritable to some degree histone methylation -mono, di, tri of lysines -in some cases, could be inheritable -normally occur on the amino acid tail recognised by other proteins which leads to changes

Paradox of siRNA and DRM2

if a gene is methylated and stops it from being transcribed = how do you make these sRNAs that are important in recruiting DRM2? (maintenance of CHH methylation needs DRM2, which needs siRNA, which needs transcription of the gene) - but if this is silenced, then no siRNA will be made... Solution: they have 2 plant specific RNA polymerase = Polymerase 4 and Polymerase 5

Malaria

infectious Plasmodium agent transmitted by a vector. Environment is important - climate, location. 3 genetic components 1. in human host - resistance, susceptibility to infection and disease state 2. in agent - transmissibility and pathogenicity 3. in vector - ability to transmit malaria

Inferring ancestry using DNA

infer major geographic origin of ancestors from DNA. Male inheritance information from Y chromosome. Female inheritance information from mitochondrial DNA. Example: 1,387 individuals sampled across Europe, 200,000 SNPs - 50% of individuals placed within 310km of their origin - 90% within 700km - used grandparental origin = not very accurate

Variance described by GWAS

variance explained by all GWAS hits that are significant = numbers low; although heritability is 0.7-0.8 for schizophrenia, GWAS hits only explain 0.01; missing heritability = difference between pedigree and SNP based estimates

Selection with/without recombination

selection without recombination: - a favorable mutation arises and increases in frequency; selection with recombination: - Expect = each haplotype becomes a mosaic of ones in the past - unlikely to be maintained intact - Observe = a favorable mutation causes reduced diversity around the mutation

Breast cancer - cause

several genes found that increase the chance of having breast cancer BRCA1 and BRCA2 - if mutation in these = 50 to 80% chance of getting breast cancer other variants identified large effects usually rare (e.g. TP53, PTEN) <3% cases caused by inherited faulty gene

Heritability

the proportion of the phenotypic variance of a trait that is genetic -0 = none -1 = completely inheritable can estimate from sets of relatives - have expectation of the proportion of the genome that is shared

Demethylation

- passive -active - requires TET proteins and occurs only at restricted times in development = allows development to occur

How to analyse if something is methylated/unmethylated

treat DNA with sodium bisulfite - very reactive and modifies the DNA - changes unmethylated cytosine to uracil (read as T after sequencing); if the cytosine is methylated, the reaction doesn't happen and it remains as methylcytosine; therefore, where you find a T where there used to be a C (or an A opposite it on the other strand) = unmethylated base
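
A toy sketch of the read-out logic above, assuming a single strand and perfect conversion. The sequence and the methylated position are invented for illustration.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by sequencing.

    Unmethylated C -> U, which is read as T; methylated C is protected.
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference, converted):
    """For every C in the reference, report True if it is still C (methylated)."""
    return {i: converted[i] == "C"
            for i, base in enumerate(reference) if base == "C"}


reference = "ACGTCCGA"  # hypothetical sequence
converted = bisulfite_convert(reference, methylated_positions={1})
calls = call_methylation(reference, converted)
print(converted, calls)
```

Comparing treated reads against the untreated reference is what makes the C-to-T change informative; a T alone says nothing without knowing a C was there originally.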

Case-control association study

- collect a group of diseased individuals and healthy individuals - unlikely that at a single locus an individual will be homozygous for the disease-carrying allele, as usually multiple genes are involved - if locus has sufficiently large effect = expect difference in allele frequency between cases and controls - don't know the genes involved - wish to search throughout genome for DNA variants that segregate with the disease - require observable variants in DNA = normally SNPs or other molecular markers that we KNOW WHERE THEY ARE LOCATED; don't use sequence data - too expensive, time consuming and too many variants --- since only have genotyped tagging SNPs - causal variant is unlikely to be genotyped - can still locate the causal variant because tagging SNPs will be in LD with it (may be in same haplotype block) - usually located near causal variant but extent of LD can be large = several genes, so doesn't necessarily identify the gene; therefore, if T has high frequency in cases = suggests T is linked with disease = either: 1. in LD with causal locus 2. itself is contributing to disease ----- need to MATCH the case and control groups - come from same population, same sex distribution, same age

Cystic Fibrosis + Tests

1 in 2,500 have the disease; mutation in CFTR (cystic fibrosis transmembrane conductance regulator); produce sticky mucus that clogs lungs and digestive system; autosomal recessive; most common = 3bp deletion causing loss of aa phenylalanine at 508th position; 20 common mutants - some result in no protein produced - some stop regular function; therefore, variation in disease severity -- Tests: - see if carrier of more common mutations - can test placenta of unborn baby - routine testing of newborns - possible gene therapy - putting normal genes into lung tissue

Association analysis

1. Case-control study = used when considering a disease 2. Compare frequency of variant in cases compared with controls = must be of similar age, sex, geographic location; need to ensure differences are only due to the disease of interest; therefore, need to MATCH the case and control groups - come from same population, same sex distribution, same age

Types of Methylation (summary) in Arabidopsis

1. DRM2 required to put DNA methylation on in all contexts; once it's there: a) CG methylation copied by MET1 aided by VIM1 (SRA protein) b) CHG methylation copied by CMT3 aided by KYP (histone methyltransferase) c) CHH methylation = ASYMMETRIC = can't be copied like the symmetric ones - involves DRM2 - has to be put back on at each cell division

Causes of population differences

1. Mutation 2. Genetic drift 3. Migration 4. Selection

Haplotype diversity

1. Mutation occurs on green background 2. mutation is in LD with alleles on same homologous chromosome 3. length of ancestral chromosome inherited around mutation decreases over generations due to recombination = breaking down LD
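
A small sketch of step 3, using the standard result that under random mating the LD coefficient D between two loci decays as D_t = D_0 (1 - r)^t, where r is the recombination fraction per generation. The starting value and r below are arbitrary illustrations.

```python
def ld_after(d0, r, generations):
    """LD coefficient D after a number of generations of random mating.

    d0: initial LD, r: recombination fraction between the two loci (0 to 0.5).
    """
    return d0 * (1 - r) ** generations


# Tightly linked loci (r = 0.001) keep most of their LD after 100 generations,
# while unlinked loci (r = 0.5) lose it almost immediately.
for r in (0.001, 0.01, 0.5):
    print(r, ld_after(0.25, r, 100))
```

This is why the haplotype around an old mutation is short and the one around a recent mutation is long.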

Clinical Translation of Results from GWAS

1. Novel biological insights - clinical advances a) therapeutic targets b) biomarkers c) prevention 2. Personalised medicine - improved measure of individual processes a) diagnostics b) prognostics c) therapeutic optimisation

Diseases found to have related underlying cause

1. T2 diabetes and myocardial infarction 2. T2 diabetes and Crohn's disease 3. Crohn's disease and ulcerative colitis

DRM2

1. a de novo methyltransferase - puts on DNA methylation (2 examples before - CMT3 and MET1 - were maintenance enzymes = ensured that once the DNA methylation is there, it is copied); drm2 mutant = loss of de novo methylation, little effect on existing CG and CNG methylation; identified by looking at Arabidopsis genome sequence and finding proteins with homology to methyltransferases in animals 2. Required for maintenance of asymmetric methylation (CHH); CMT3 and MET1 required for CHG and CG which are symmetric, but DRM2 required for CHH - how is DRM2 recruited to the targets? - siRNA pathway component = found using a transgene assay

Test for significance in case-control studies

1. test for independence of marker alleles and disease status - using chi-square test - contingency table 2. logistic regression - same basis but also allows accounting for slight sex differences, or other things that might be important - environmental variables. Have to perform many tests - e.g. 700,000, one at each SNP. Need strict significance thresholds = prevent a large number of false positives (identifying association when none is present) - if 100 tests with 5% threshold - expect 5 tests to be significant by chance. Also need to make sure there is no population stratification = cases from a different genetic population to controls - use PCA, clustering to check
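
A minimal sketch of test 1 and the multiple-testing problem, assuming a 2x2 table of allele counts (T vs C alleles in cases vs controls) and a Bonferroni correction as one common choice of strict threshold. The counts are invented; the p-value uses the fact that a 1 d.f. chi-square is a squared standard normal.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (1 d.f.) for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical allele counts: cases carry 60 T / 40 C, controls 40 T / 60 C.
chi2 = chi_square_2x2(60, 40, 40, 60)
p = p_value_1df(chi2)

n_tests = 700_000
threshold = 0.05 / n_tests  # Bonferroni-corrected threshold
print(chi2, p, threshold, p < threshold)
```

With these counts the SNP would pass a single-test 5% threshold but fall far short of the corrected one, which is the point of the strict threshold.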

Limitations of GWAS

1. identified SNPs are not necessarily causal -are in LD with the causal SNP -LD regions can contain many genes 2. frequently associated SNPs are not in coding regions but between genes or in introns -could be in regulatory regions -could be in LD with coding SNP -could be in long range LD 3. usually associated with increased risk of disorder (rather than disorder itself) 4. most only have small effect, a few variants with large effect useful to identify pathways -e.g. multiple sclerosis and cytokine pathway

Identifying Selective sweeps

1. look for a reduction in diversity - selection at favorable allele - hitch-hiking of neighboring alleles 2. distribution of SNP allele frequencies = expect polymorphism to change in frequency 3. Haplotype structure - Extended Haplotype Homozygosity (EHH) - length of that haplotype will tell us about the ongoing selection 4. Comparative genomics = McDonald-Kreitman 5. Compare populations = long branch length in phylogenetic tree indicates directional selection
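
A rough sketch of the EHH statistic in point 3, assuming the usual definition: among chromosomes carrying the core allele, the probability that two chosen at random are identical over the whole stretch from the core marker out to a given marker. The haplotype strings are invented.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end):
    """Extended haplotype homozygosity between marker indices core and end.

    haplotypes: equal-length strings, all carrying the core allele.
    """
    lo, hi = min(core, end), max(core, end)
    n = len(haplotypes)
    if n < 2:
        return 0.0
    # Group chromosomes that are identical across the whole interval.
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


haps = ["AAAA", "AAAA", "AAAT", "AATT"]  # hypothetical carriers of core allele
for end in range(4):
    print(end, ehh(haps, core=0, end=end))
```

EHH starts at 1 at the core and falls with distance; a slow fall (long shared haplotype) is the signature of a recent sweep.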

Genetic screen for suppressors of clark kent epiallele

1.mutagenise the plants with a chemical 2.screen for plants with normal flowers (carrying a suppressor mutant) -Logic: might knock out a gene important for this methylation and might go back to normal 3. identify genes that are required for DNA methylation of clark kent alleles or other parts of pathway = identifying things that silence clark kent identified = CMT3 gene mutant -encodes a plant specific DNA methyltransferase -has Methyltransferase domain, chromodomain, BAH domain

What we need to make GWAS better

1. need large samples - Problem: need to pool individuals across different countries/geographies = problems with population stratification 2. Meta-analyses - analyse within each sample and then combine summary statistics (not original raw data, just summary stats); main aim: look for the same variant having the same effect in different populations 3. Functional evidence - can't just have statistical association - need functional evidence that the variant is having an effect on the disease of interest

RNA silencing (the basics + the plant one)

1. ssRNA copied by RDR (RNA-dependent RNA polymerase) into dsRNA 2. recognised by DICER = chops it up into short dsRNA (sRNA duplexes) 3. bound by ARGONAUTE = a) chops RNA transcripts with homology to sRNA b) recruits DNA methyltransferase = methylation. In plants: lots of different ones e.g. DNA methylation at heterochromatic or repeat regions: - involves 23-24nt sRNAs - leads to DNA methylation (related things called Piwis in animals) - if sequence the sRNA in plants = most have homology to transposons = TARGETING TRANSPOSONS - via ARGONAUTE which recruits DRM2; siRNA gives specificity to DRM2 by binding to nascent transcripts in the nucleus; many transposons make double stranded/aberrant transcripts that are targeted; if this directs DRM2 and methylation to the promoter = transcription is silenced

Factors affecting size of haplotype around a mutation

1. strength of selection = how fast the mutation is being fixed 2. age of mutation

Haplotype tagging

A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium that represents a group of SNPs called a haplotype; if look in half chromosome (recombination occurred between 2 blocks) = only see 3 types of haplotype (red, blue, mix); if genotype individual at first SNP and second SNP - tells us everything about this half haplotype. Summary: only two SNPs (T) required to determine the state of all others in the block. For both blocks together = only 3 SNPs required = only 5 haplotypes segregating (selection of SNPs that will cover the majority of variation in human genome)

cmt3 mutants

CMT3 - encodes a plant specific DNA methyltransferase - has Mtase domain, chromodomain, BAH domain; mutant - flower now looks normal = suppresses clark kent; how? - CMT3 is needed for CNG methylation - so mutant stops CNG methylation (clark kent is caused by heavy methylation) - in cmt3 mutant = CNG methylation is eliminated, CHH shows minor reduction and little effect on CG - the SUP promoter has CNG sequences but very few CG = therefore, CNG methylation important in silencing the SUP promoter in clark kent alleles - cmt3 mutants = have restored SUP expression. Along a chromosome - CG and CHG enriched around the centromere - cmt3 completely eliminates CHG but doesn't affect CG methylation

H3K9me2 and DNA methylation

CG, CHG, CHH all enriched at centromere = where transposons are; but looking at H3K9me2 = exactly same places distributed in genome = CORRELATED. CNG methylation needs histone methylation = if lose H3K9 methylation, lose CNG methylation. Why is CNG methylation dependent on H3K9me2? - related to the properties of the protein putting on the methylation (THE METHYLTRANSFERASES) - structure of CMT3 = - S-adenosylmethionine = where methyl group comes from to be added - BAH and chromodomain contact separate H3K9me2 tails = contact separate tails on different nucleosomes. What does this mean? - where there is histone methylation, that might recruit the DNA methylation enzyme which will methylate the DNA - if no modified histones, may not get the enzyme being brought into chromatin to methylate CHG - CMT3 binds the 2 K9s and brings the active site in to modify the DNA - explains why DNA methylation is associated with K9. Why is K9 histone methylation recruited to the DNA methylation regions? turns out there is an extra level: - kryptonite also has SRA domain = DNA binding domain - SRA domain binds methylated DNA ! - methylated cytosines are flipped out of helix and bound by SRA domain 1. SRA recruits kryptonite to the methylated DNA 2. modifies histones 3. recruits DNA methyltransferase enzyme 4. adds to stability and heritability of mark - SRA domain also found in VIM1 protein

Types of case-control studies

Candidate gene studies - when have prior knowledge of potential causal genes - used in the past = smaller sample size if do a candidate gene study Genome-wide association studies (GWAS) - scan variants throughout the genome

Genetic control of methylation; Heritability

Correlations in DNA methylation between relatives largely caused by underlying genetic similarity - if look at levels of relatedness from MZ twins, parent-offspring to unrelated = LEVEL OF CORRELATION IS COMPLETELY ASSOC WITH HOW RELATED THEY ARE - there is some correlation between parent-parent (would expect 0) so suggests there might be some common environmental effects; 20% variation due to DNA sequence variation. If look at heritability of methylation level at each site = 66% of sites show some evidence of transgenerational inheritance (h > 0) but also see high proportion where heritability is 0. Distributed throughout genome

Methylation in disease

CpG island hypermethylation - Alzheimer's disease (NEP) - Rheumatoid arthritis (DR3); CpG island hypomethylation - Multiple sclerosis (PADI2); Repetitive sequences aberrant methylation - ICF (immunodeficiency, centromeric instability and facial anomalies)

Methylation is mutagenic

CpGs under-represented in the genome; a methylated cytosine deaminates = thymine (not picked up as error); a normal cytosine deaminates = uracil (recognised and repaired); reason we see fewer CpGs = being deaminated to thymine
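
One way to quantify the under-representation is the observed/expected CpG ratio: the number of CG dinucleotides divided by the number expected from the C and G content alone. A minimal sketch, assuming the common formula obs/exp = (count of CG x length) / (count of C x count of G); the test sequences are toy strings, far too short to be meaningful biologically.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for a DNA sequence.

    A value well below 1 means fewer CG dinucleotides than the base
    composition predicts; CpG islands have values close to 1.
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * n / (c * g)


print(cpg_obs_exp("CCGG"), cpg_obs_exp("CATG"))
```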

Mendelian diseases

Cystic Fibrosis - autosomal recessive Huntington's disease - autosomal dominant Haemophilia A -sex-linked recessive gene these were the first diseases investigated observable pattern in inheritance = pedigree analysis clear link between phenotype and mutation in a certain gene OMIM = Online Mendelian Inheritance in Man

Sickle cell anemia

HbS allele = has single nucleotide mutation from A to T (GAG to GTG = glutamic acid to valine); mutation maintained in population as carriers have an advantage in malaria regions

Epigenetic modifications (broad)

DNA methylation -cytosine -adenine in bacteria methyl group can tag DNA and activate/repress genes Histone modifications (the binding of epigenetic factors to histone tails alters the extent to which DNA is wrapped around histones ) Chromatin remodelling Histone variants Non-coding RNAs

DNA methylation and transposon activity

DNA methylation densest in transposon-rich regions; if take mutants where knock out CG and CHG methyltransferases = lose a lot of DNA methylation; loss of DNA methylation = transposase genes are expressed = transposons become more active = disrupting genes e.g. if self the plants with lots of transposons = more and more abnormal as they accumulate transposon-induced mutations; therefore, DNA methylation RESTRICTS transposon activity. Experiment: - southern blot with digested DNA from different plants - hybridise with probe to specific transposons - WT = looks like there are 4 different copies - met1 cmt3 double mutants = getting extra copies and new copies

How to describe variation?

Differences in allele frequencies: 1. Heterozygosity 2. Fst 3. Principal Components analysis (PCA) / Clusters 4. Phylogenetic trees
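
A short sketch of measure 2, assuming the textbook definition Fst = (H_T - H_S) / H_T for one biallelic SNP in two equally sized populations, where H_S is the mean within-population expected heterozygosity and H_T is the expected heterozygosity of the pooled population. The allele frequencies are invented.

```python
def fst_two_populations(p1, p2):
    """Fst at one biallelic SNP from the allele frequency in each population."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # pooled population
    if h_t == 0:
        return 0.0  # allele fixed or absent everywhere: no variation to split
    return (h_t - h_s) / h_t


print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(0.2, 0.8))  # strongly differentiated
print(fst_two_populations(0.0, 1.0))  # fixed for different alleles
```

Fst runs from 0 (no difference in allele frequency) to 1 (populations fixed for different alleles).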

Obesity (BMI) GWAS - issues

Found SNPs in an intron of the FTO gene - associated with BMI; strange = no one could find a functional relationship - not changing the expression of FTO. Reason: SNPs associated with the promoters of FTO and IRX3 = homeobox gene further along the chromosome - SNPs controlling expression of IRX3 = those SNPs are not associated with expression of FTO in the brain; happen to lie within the FTO gene, in an intron = not the FTO gene that is causing the variation. Conclusion: NEED TO BE CAREFUL WHEN INFERRING FUNCTION

Processes that cause current Population differences

Genetic Drift = expect over time to increase homogeneity - faster in smaller populations - by chance, lose some haplotypes and others gain in frequency - end result = increased similarity between individuals. Population split = increased variation between populations due to drift. Bottleneck = reduced population size leads to loss of variation and increased effects of drift. Founder event = new group founded by few individuals = less diversity. Selection = if favorable mutation arises (or an existing one becomes beneficial) and increases in frequency

Modern Resources

Human Genome Project = 1990; International HapMap Project (2002) - genes assoc with human disease and response to pharmaceuticals - identify useful SNPs using 270 people from range of populations; 1000 Genomes Project = 2008 - sequencing project (4x coverage) of 2,500 individuals

Variation between humans; 1000 Genomes Project

Humans are 99.9% similar; only thinking about the 0.1% that differs. 1000 Genomes Project - sequenced over 2,500 individuals - average genome differs from human reference at 4-5 million sites = 99.9% of them are SNPs and short indels - individuals also have 2,000-2,500 structural variants (fewer but influence a lot of bases) - most variants are rare (64 million autosomal with frequency < 0.5%); within a genome most are common: 40,000 to 200,000 (i.e. 1-4%) have frequency < 0.5%

How was siRNA pathway identified

Identification using FWA Transgene Assay to find siRNA 1) took mutants in lots of different genes associated with different chromatin modifying pathways 2) found that: a. drm2 mutant b. rdr2 = an RNA-dependent RNA polymerase (ssRNA -> dsRNA) c. dcl3 = DICER (chops dsRNA into sRNA molecules) d. PolIV = novel plant-specific polymerase; all mutants unable to methylate FWA transgene, therefore ALL NEEDED FOR DE NOVO METHYLATION; these components DEFINE A SRNA PATHWAY IN PLANTS Problem:

Disease

Infectious Innate (congenital, degenerative, autoimmune) Social (behavioural, drug addiction)

Transgenerational inheritance of methylation

Inheritance of some phenotypes not explained by sequence differences; suggests there might be: - epigenetic marks transmitted - incomplete reprogramming (some site markers not removed) - messenger molecules transmitted in gametes directing reinstatement of epigenetic marks. Difficult to show in humans - need to exclude environment - need to exclude genetic variation (DNA sequence). Evidence in mice - Agouti viable yellow allele (more complicated) - Axin fused allele

Where does methylation occur?

Intergenic regions = between genes Repetitive elements Most CpG sites (60-80%) methylated NOT CpG islands = remain unmethylated to allow genes to be transcribed - methylated CpG = silences the gene (role in X inactivation) as get older - loss of general methylation but increase of methylation in CpG islands

Phylogenetic tree as Description of Variance

Maximum likelihood tree - see individuals that are geographically closely located also cluster together

Axin fused allele in mice

Metastable epiallele associated with kinky tail = developmental disorder - an allele with different expression due to epigenetic modification that is MAINTAINED throughout lifetime. SUMMARY: methylated IAP promoter = normal tail; unmethylated IAP promoter = kinky tail (truncated and normal form expressed); has a repetitive element (IAP) inserted between exon 6 and 7. Mechanism - IAP has strong promoter = causes transcription of the axin gene from exon 7 on top of transcription of the whole gene - this form of the gene will have the full long axin transcript but also the truncated form = causes kink in tail - methylation of IAP promoter = wild-type axin - no methylation = truncated axin. Mice homozygous for the axin-fused allele were taken in 2 forms (kinky and normal tail): see different proportion of offspring with the kink in tail; parents with kinky tail have more kinky tailed offspring = SOME MEMORY OF THE PHENOTYPE

Signatures of Natural selection

Positive selection = for advantageous variants a) hard sweeps from a favorable mutation b) soft sweeps from standing variation c) polygenic selection Purifying selection (selection against deleterious variants)

clark kent allele of SUPERMAN gene

SUPERMAN = Arabidopsis mutant - rather than 6 male reproductive organs - has 12 or more - SUP encodes a transcription factor expressed in the center of the flower - acts as a regulator of floral homeotic genes, controlling the development of the flowers of Arabidopsis thaliana plants. clark kent = 9 male organs (weak version of superman) - heritable - cross with superman gave mutants = they are in the same gene - if you self them sometimes get wild type - didn't find changes in DNA sequence despite being heritable - if took DNA from clark kent and propagated in E. coli and put it back = now wild type = cytosine methylation not copied when cloned into E. coli (doesn't have enzymes to maintain it). Why? mutation caused by DNA methylation = EPIALLELE. Wild type SUP = DNA unmethylated; sup mutants = unmethylated but sequence alterations that disrupt protein; clark kent = SUP DNA sequence is unaltered but DNA heavily methylated

Lactase persistence

Recent human evolution (5-10,000 years) gene-culture co-evolution in Europe - single allele causing this (13910 T) -upstream of LCT (lactase gene) -EHH > 1Mb (shorter) = evidence of sweep 9000 years ago in Africa and Middle East - several mutations identified -distinct from European -EHH > 2Mb = on-going selective sweep 3,000-7,000 years associated with large Fst = large diff in allele frequencies across populations = selective pressure that differed among populations (more pressure on populations with livestock than those without) associate with long haplotypes - because newer mutation

Breast cancer - risk

Risk assessment based on occurrence in relatives - family history; Screening - offered to subset of population; Environment - obesity, alcohol; Genetic testing - if you do carry rare variants, could get surgery

Mutation Types

SNPs = single base change; Deletions/insertions; Copy number variants (CNVs); Larger deletions and insertions (e.g. Alu); Inversions

Imputation

Statistical inference of unobserved genotypes using known haplotypes in a population (HapMap or 1000 Genomes Project) = allows testing for association between a trait of interest (a disease) and genetic variants that were not genotyped directly but whose genotypes have been statistically inferred (imputed); process of finding tagging SNPs done in reverse; can fill in missing SNPs - genotype at low density but go to reference populations that were sequenced (e.g. 1000 Genomes Project); impute up to 2 million SNPs in each individual by taking into account haplotypes present in population; helps tremendously in narrowing down the location of probable causal variants in genome-wide association studies, because it increases the SNP density (the genome size remains constant, but the number of genetic variants increases) and thus reduces the distance between two adjacent SNPs.
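
A deliberately simplified toy of the idea, assuming phased haplotypes written as strings, exact matching at the genotyped positions, and a majority vote among matching reference haplotypes. Real imputation software uses probabilistic haplotype models rather than exact matching; the function name and data here are hypothetical.

```python
from collections import Counter


def impute(observed, reference_haplotypes):
    """Fill in untyped SNPs ('?') from reference haplotypes.

    Keeps only reference haplotypes that agree with every typed position,
    then takes the most common allele among them at each untyped position.
    Returns the input unchanged if no reference haplotype matches.
    """
    matches = [h for h in reference_haplotypes
               if all(o == "?" or o == r for o, r in zip(observed, h))]
    if not matches:
        return observed
    filled = []
    for i, allele in enumerate(observed):
        if allele != "?":
            filled.append(allele)
        else:
            filled.append(Counter(h[i] for h in matches).most_common(1)[0][0])
    return "".join(filled)


reference = ["ACGT", "ACGT", "TGCA"]  # hypothetical sequenced reference panel
print(impute("A??T", reference))  # only the outer two SNPs were genotyped
```

This works only because the SNPs within a block are in LD, so a few typed positions identify the haplotype.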

VIM1 recruiting MET1 mechanism

Summary: The (VIM) proteins (Uhrf1 in vertebrates) recognize hemimethylated symmetrical (CG) dinucleotides generated by DNA replication and recruit (MET1) (Dnmt1 in vertebrates) to methylate these sites fully 1. Recognition of the hemi-methylated CpG site by SRA domain of VIM1 2. Recruitment of MET1 3. Transfer of hemi-methylated DNA to MET1

ENCODE

The Encyclopedia of DNA Elements; aim to identify all functional elements in the human genome. When first set up database - showed epigenetic modification - putting it in an enormous database; vertical columns = different experimental targets (DNA methylation, DNA binding, modified histones); Tier 1 and 2 = common; Tier 3 = more specialised

Heterozygosity

The expected proportion of heterozygotes (individuals carrying 2 different alleles) at a locus; also called diversity = probability that 2 alleles sampled at random are different. Example: - Heterozygosity decreases the further you are from Ethiopia - consistent with idea humans arose from Africa - serial Founder effect
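
The definition above corresponds to H = 1 - sum of squared allele frequencies: the chance that two randomly drawn alleles differ. A minimal sketch with invented frequencies:

```python
def expected_heterozygosity(allele_freqs):
    """Probability that two alleles sampled at random are different.

    allele_freqs: frequencies of all alleles at the locus, summing to 1.
    """
    return 1 - sum(p * p for p in allele_freqs)


print(expected_heterozygosity([0.5, 0.5]))  # two equally common alleles
print(expected_heterozygosity([0.9, 0.1]))  # one rare allele
print(expected_heterozygosity([1.0]))       # fixed: no diversity
```

Diversity is highest when alleles are equally common and drops to 0 when one allele is fixed, which is what repeated founder events push populations towards.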

Haplotype

a group of alleles in an organism that are inherited together from a single parent

Clustering

Vertical lines for each individual in data; clustered according to 7 subpopulations or ancestral populations; shows: african population mostly composed of african ancestry...

a2 locus Transposon in Maize

a2 locus with insertion of transposon = red pigment a2-m1 = defective Spm transposon insert = non-autonomous can't excise as lacks one of the ends recognised by Spm transposase (inverted repeat missing - can't jump at all) A2 was active despite transposon being inserted there as long as no other autonomous one in the genome Why? = transposon inserted in an intron a2 gene expressed and intron is spliced out along with transposon = RED PIGMENT if introduce autonomous transposon in genome = encodes transposase which binds to defective transposon and blocks expression inhibits splicing/transcription = no activity = WHITE PIGMENT therefore = gene under control of sequences in the transposon

Epiallele*

allele that behaves like a mutant but no change in DNA sequence = epigenetic change

VIM1 and vim1 mutants

also has SRA domain binds hemimethylated DNA in CG context region interacting with MET1 which is CG methyltransferase vim mutants - reduction of CG methylation -traditionally thought MET1 recognised hemimethylated DNA but no - it's another one: VIM1 (CHG methylation involves CMT3 and KRYPTONITE)

Genetic causes of disease

any mutation of DNA -inherited, familial -spontaneous -somatic small scale -point mutations (nonsense, missense, insertions, deletions) large scale -translocations, duplications, inversions non-nuclear/epigenetic

Epigenetics*

any potentially stable and heritable change in gene expression that occurs without a change in DNA sequence

Amylase as a Selective Sweep

as move away from locus - see the divergence rapidly increase with polymorphism frequency if find low diversity in the region = selection has operated here as it is well defined = must be old polymorphism and recombination has been able to break down length of haplotype

Role of methylation

cell differentiation and development -allow expression of restricted subsets of genes in each cell type -methylation in CpG rich promoters represses expression mitotic stability to maintain tissue identity -daughter cells must express same set of genes (a muscle cell must give rise to muscle cells) maintains genome integrity -without DNMT1 see abnormal karyotype with deletions, insertions and translocations -represses transposable elements (full role still to be determined)

How is methylation status inherited?

know the actual methylation itself is not inherited One idea: 1. SNP which is polymorphic in population - potential impact on methylation at a local methylation site -cis-acting: genotype at SNP influencing whether this CpG site gets methylated or not -trans-acting: SNP could be some distance from CpG site or even on different chromosome GWAS for a methylation site e.g. MHC region of Chrom 6 - methylation site located - are any SNPs in genome significantly associated with that methylation level? -Answer: YES! all located near CpG site itself = cis-acting could do this for all methylation sites far more cis-acting SNPs identified than trans-acting (but this is only including ones that could be detected and had large effect) If look in different way = rather than looking individually, put all the SNPs into a model simultaneously and look at variance explained found that cis-acting is actually explaining a lot less variation than trans-acting trans-acting can't be identified individually as effects too small

Drug addiction

large genetic component (50%) many environmental risk factors

Wellcome Trust Case Control Consortium

looked at 7 diseases -Bipolar disorder -Coronary artery disease -Crohn's disease -Hypertension -Rheumatoid arthritis -T1 diabetes -T2 diabetes genetic architecture of diff diseases, not surprisingly, different now more green peaks as increased the number of SNPs used and increased sample size

methylated QTL associated with disease

looked at 7 diseases Hypertension, Crohn's disease, Rheumatoid arthritis - our subset of SNPs are explaining more variation than randomly selected subset of SNPs Not working for = Diabetes, bipolar, coronary heart disease

Breast cancer- tests

low frequency of faulty genes = not worth screening whole population not all carriers of faulty genes will get breast cancer genetic tests are available to women with a high risk of faults in BRCA1, BRCA2, TP53 and PTEN

Methylation in cancer

low level of methylation = feature of cancer decreases during formation of cancer = causing activation of oncogenes CpG island methylation increases - CpG islands in promoters of tumor suppressor genes = silencing - silence suppressor gives selective advantage to cell CpG island hypermethylation Diagnosis = GSTP1 in prostate cancers, not benign growths Prognosis = miR-34b/c associated with metastasis Treatment = MGMT suitable for treatment with temozolomide Hypomethylation = removing methylation 1. in REPEATS and INTERGENIC regions -leads to genomic instability e.g. LINE family member L1 in many cancers such as breast, lung, liver 2. in CpG poor promoters -leads to activation of oncogenes e.g. S100P in pancreatic cancer

Age-related macular degeneration

major cause of visual impairment in adults over 50 inflammatory process identified genes in the complement system - Complement factor H -SNPs explain 60% population risk unusual GWAS because SNPs identified explained a LARGE proportion of population risk

Principal Components Analysis (PCA)

means of representing multi-dimensional data in fewer dimensions = combining SNP individual data into one variable = one principal component useful to identify patterns use allele frequencies or individual SNP data - have many thousands of SNPs per individual -reduce to much smaller number of variables
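
A rough sketch of how many SNP columns can be reduced to one principal component. It uses plain power iteration on mean-centred genotype counts so that no external library is needed; real analyses use dedicated software, and the toy genotype matrix below is invented purely for illustration.

```python
import math


def first_principal_component(genotypes, iterations=200):
    """Return (unit loading vector, per-individual scores) for PC1.

    genotypes: one row per individual, one column per SNP (0, 1 or 2 copies).
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    def project(v):
        # Score of each individual along direction v.
        return [sum(row[j] * v[j] for j in range(m)) for row in centred]

    v = [1.0 / math.sqrt(m)] * m  # arbitrary starting direction
    for _ in range(iterations):
        scores = project(v)
        # Multiply by X^T X, then renormalise (power iteration).
        w = [sum(centred[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v, project(v)


# Hypothetical data: individuals 0-1 and 2-3 come from two different groups.
toy = [[0, 0, 2], [0, 1, 2], [2, 2, 0], [2, 1, 0]]
loadings, scores = first_principal_component(toy)
print(scores)  # the two groups fall on opposite sides of zero
```

Each individual's three SNP values collapse to a single score, which is the "one variable" the card describes.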

Fst

measure of differences in allele frequencies between populations - comparing 2 populations or more measure of population differentiation due to genetic structure 0 = no difference between populations (no subdivision) -heterozygosity is the same whether pooled or averaged across populations 1 = complete division = very far apart - each population is fixed for a different allele Geographically closer groups are genetically more similar (lower Fst) Gst = Fst averaged over loci
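
A small sketch of the "pooled versus averaged heterozygosity" idea, assuming one biallelic locus, equally sized populations and the common form Fst = (Ht - Hs) / Ht. Function names are illustrative.

```python
def biallelic_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(pop_freqs):
    """Fst from the frequency of one allele in each (equally sized) population."""
    # Hs: heterozygosity averaged over the separate populations.
    hs = sum(biallelic_heterozygosity(p) for p in pop_freqs) / len(pop_freqs)
    # Ht: heterozygosity of the pooled population.
    p_bar = sum(pop_freqs) / len(pop_freqs)
    ht = biallelic_heterozygosity(p_bar)
    if ht == 0.0:
        return 0.0  # locus is monomorphic everywhere, no differentiation to measure
    return (ht - hs) / ht


print(fst([0.5, 0.5]))  # identical frequencies: no subdivision
print(fst([1.0, 0.0]))  # each population fixed for a different allele
print(fst([0.6, 0.4]))  # mild differentiation
```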

How to determine causal variants

using markers throughout genome (e.g. SNPs) Linkage - within families look for co-inheritance of disease and DNA variant (causal variant = variant contributing to the risk of disease) Association - across population look for co-existence of disease and DNA variant

Bisulphite sequencing

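method to determine proportion of cytosines that are methylated at a given position bisulphite converts unmethylated cytosine to uracil (read as T after sequencing) methylated cytosine doesn't get converted to uracil (still read as C)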
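
A minimal sketch of how the proportion could be read off sequencing data, assuming we already have the base called in each read at one reference cytosine: C means protected (methylated), T means converted (unmethylated). The function name and input format are assumptions for illustration.

```python
def methylation_proportion(read_bases):
    """Fraction of informative reads that still show C at a reference cytosine."""
    methylated = read_bases.count("C")    # protected from conversion
    unmethylated = read_bases.count("T")  # converted C -> U, sequenced as T
    informative = methylated + unmethylated
    if informative == 0:
        raise ValueError("no C or T reads cover this position")
    return methylated / informative


print(methylation_proportion("CCCT"))  # 3 of 4 reads methylated
```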

FWA (and effect on drm2 mutants)

methylated = FWA (normal flowering) unmethylated = fwa mutant (late flowering) typically a methylated and silenced gene occasionally loses methylation and expresses (like clark kent plants occasionally lose methylation and 1 in 1000 will be WT) FWA activated = late flowering phenotype if you take FWA transgene (unmethylated) and introduce it into a plant (by transformation) = transgene gets methylated (de novo) and is silenced so plants flower normally (not late) - gets recognised probably through sRNA drm2 mutants - drm2 but normal FWA = plants look normal because FWA gene remains methylated because the copying enzymes (MET1 and CMT3) are just copying the DNA methylation -fwa (unmethylated transgene) with WT DRM2= gets methylated = normal flowering -if fwa transgene with drm2 = no longer methylated = no longer initiation de novo fresh methylation

DNA methylation in mammals

methylation of cytosine mammals have CpG dinucleotides = makes life easier symmetrical and can be maintained through cell division (some non-CpG observed in embryonic stem cells and neural tissue) de novo DNA methyltransferases = DNMT3a and DNMT3b maintenance DNA methyltransferase (acts on hemimethylated DNA) = DNMT1 = corrects this and methylates the other cytosine

Selection

more favorable phenotypes leave more offspring

DNA methylation in plants

more sophisticated methylation than in animals have symmetric DNA methylation in CG context = why? because C on other strand so both will be methylated ALSO = CHG methylation (H = A, T or C) a lot of CHH = with no C on the other strand imparted by diff enzymes
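
The three contexts can be told apart from the two bases following a cytosine. A small sketch, assuming an uppercase or lowercase A/C/G/T sequence and looking at the given strand only; the function name is illustrative.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at index i as 'CG', 'CHG' or 'CHH' (H = A, C or T).

    Returns None if seq[i] is not C or too few downstream bases are available.
    """
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"   # symmetric: the G pairs with a C on the other strand
    if i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":
        return "CHG"  # also symmetric, one base further along
    return "CHH"      # no matching C on the other strand


print(cytosine_context("ACGT", 1))   # CG
print(cytosine_context("ACAGT", 1))  # CHG
print(cytosine_context("ACTAT", 1))  # CHH
```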

Variation between Populations

most variation is within populations (accounts for 90% variance) only slightly more genetic differences between individuals from different populations than 2 unrelated within a population

Geographic variation

mostly chance is causing variation - demography more than adaptation BUT some genes are selected 1.living in high altitudes 2. metabolising lactose 3. resistance to local diseases

Migration

movement of groups to unpopulated areas or individuals between populations

Adaptation and Human Disease

natural selection driven by pathogens (rather than diet or climate) affects pathways involved in immune function Disease susceptibility and incidence -differ geographically -how much is adaptive? = some is adaptive

Multiple Sclerosis and GWAS

neurological condition found that most genes identified are involved in autoimmune pathways not neurologic degenerative identified genes already targets of therapy association with vitD metabolism -explains geographic distribution - more common in extreme latitudes where less sun -possible route of therapy

SNP chips (panels)

now use around 700,000 SNPs on avg SNP chip predetermined SNP variants -selected as tagging SNPs -common e.g. the frequency of the less common allele = MAF (minor allele frequency) is >1% may tag other variants, such as CNVs (copy number variants)
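
A sketch of the MAF > 1% criterion, assuming genotypes are coded as the number of copies (0, 1 or 2) of one allele per diploid individual. The SNP names in the example are made up.

```python
def minor_allele_frequency(genotypes):
    """MAF from per-individual allele counts (0, 1 or 2) at one SNP."""
    freq = sum(genotypes) / (2 * len(genotypes))  # 2 alleles per individual
    return min(freq, 1.0 - freq)


def common_snps(panel, min_maf=0.01):
    """Names of SNPs whose minor allele frequency exceeds min_maf."""
    return [name for name, g in panel.items()
            if minor_allele_frequency(g) > min_maf]


# Hypothetical panel: one polymorphic SNP and one monomorphic SNP.
panel = {"snp_common": [0, 1, 2, 0], "snp_fixed": [0, 0, 0, 0]}
print(minor_allele_frequency(panel["snp_common"]))  # 3 of 8 alleles
print(common_snps(panel))
```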

GWAS*

observational study of a genome wide set of genetic variants in different individuals to see if any variant is associated with a trait ultimate goal: use genetic risk factors to make predictions about who is at risk and to identify the biological underpinnings of disease susceptibility for developing new prevention and treatment strategies

Linkage Disequilibrium (LD)

occurrence of combinations of alleles at 2 loci more often than expected the non-random association of alleles at 2 or more loci
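
"More often than expected" can be made concrete with the usual coefficient D = p(AB) - p(A)p(B) and its scaled form r^2. A minimal sketch, assuming phased two-locus haplotypes written as two-character strings; the allele labels are placeholders.

```python
def linkage_disequilibrium(haplotypes, allele_a="A", allele_b="B"):
    """Return (D, r2) for two biallelic loci from phased haplotypes like 'AB'."""
    n = len(haplotypes)
    p_a = sum(1 for h in haplotypes if h[0] == allele_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == allele_b) / n
    p_ab = sum(1 for h in haplotypes
               if h[0] == allele_a and h[1] == allele_b) / n
    d = p_ab - p_a * p_b  # observed minus expected under independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


print(linkage_disequilibrium(["AB", "AB", "ab", "ab"]))  # alleles always travel together
print(linkage_disequilibrium(["AB", "Ab", "aB", "ab"]))  # alleles combine at random
```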

GWAS successes

over 3800 published GWAS 600 diseases and traits 107,000 SNP trait associations reported Early successes -age-related macular degeneration (eye condition)

Solar radiation

variable variant in KRT77 gene = temperature homeostasis when comparing different regions of the world = the frequency of this allele decreased along with solar radiation even when accounting for demography saw this pattern = selected due to solar radiation

Inferring evolutionary history

past processes cause current population diff in allele and haplotype frequencies (genetic drift, bottleneck, founder, selection..) we can use these diff to infer those past processes (to some extent) -population size changes, splitting times 1. Allele frequency spectrum within a population - the distribution of allele frequencies across all loci -can predict under various models (e.g. population size, splitting times) -identify previous events consistent with observations 2. Phylogenetic trees to explore relationships between populations - various models and methods -correlation of allele frequencies - more similar between populations that split recently than those that split longer ago Putting methods together e.g. Human migration schematic - humans arose in Africa and left Africa and went both west into Europe and east into west world -number of bottlenecks -some admixture = indiv from both Europe and Africa (BASED ON CURRENT DNA and info on premodern humans = ancient DNA)
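
The allele frequency spectrum in point 1 is just a tally of how many loci carry the allele in k of the sampled chromosomes. A small sketch, assuming we are given the count of the derived allele at each locus; the input data are invented.

```python
def site_frequency_spectrum(derived_counts, n_chromosomes):
    """spectrum[k] = number of loci where the derived allele is seen k times."""
    spectrum = [0] * (n_chromosomes + 1)
    for count in derived_counts:
        if not 0 <= count <= n_chromosomes:
            raise ValueError("count outside 0..n_chromosomes")
        spectrum[count] += 1
    return spectrum


# Four loci sampled in 4 chromosomes: two singletons, one doubleton, one tripleton.
print(site_frequency_spectrum([1, 1, 2, 3], 4))
```

Demographic models predict different shapes for this tally, which is what allows comparison with the observed spectrum.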

PolIV and PolV

plant-specific RNA polymerases which transcribe methylated DNA (!!!) and maintain CHH methylation by recruiting DRM2 - SHH binds and recruits PolIV to transcribe methylated DNA - PolIV transcripts are made double stranded by RDR2 - Cleaved by DCL3 (Dicer-like) into 24nt sRNAs - Bound by AGO - Guide AGO to complementary scaffold RNAs transcribed from same locus by PolV - sRNA-bound AGO and PolV transcript interact - Recruit DRM2 Methylates C residues of DNA strand acting as template for PolV after every division = direct DRM2 to methylate those genes leads to silencing of transposons

Huntington's disease

progressive neurodegeneration certainty for carriers of a mutation in the huntingtin gene purely genetic factor except age of carrier = late onset

Alternative approaches to case-control studies

quantitative traits associated with disease e.g. cholesterol level, blood pressure, biochemical levels rather than looking at frequency difference, look for mean trait differences assoc with SNP allele/genotype
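
A sketch of the "mean trait difference by genotype" comparison, assuming genotypes are coded 0/1/2 and the trait is a single number per person. The values below are invented, and a real analysis would add a regression or other statistical test on top of these means.

```python
from collections import defaultdict


def mean_trait_by_genotype(genotypes, trait_values):
    """Average trait value within each genotype class at one SNP."""
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, trait_values):
        groups[genotype].append(value)
    return {g: sum(v) / len(v) for g, v in sorted(groups.items())}


# Hypothetical cholesterol-like measurements for six individuals.
genotypes = [0, 0, 1, 1, 2, 2]
trait = [4.0, 5.0, 5.0, 6.0, 6.0, 7.0]
print(mean_trait_by_genotype(genotypes, trait))
```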

Genetic Drift

random fluctuations in allele frequencies over generations, due to finite populations
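
Drift can be illustrated with a simple Wright-Fisher style simulation, an assumed model that the card does not name: each generation, every allele copy in a finite diploid population is drawn at random from the previous generation's frequency.

```python
import random


def wright_fisher(p0, n_individuals, generations, seed=None):
    """Allele frequency trajectory under pure drift (no selection or mutation)."""
    rng = random.Random(seed)
    n_alleles = 2 * n_individuals  # diploid population
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial sampling of the next generation's allele copies.
        copies = sum(1 for _ in range(n_alleles) if rng.random() < p)
        p = copies / n_alleles
        trajectory.append(p)
    return trajectory


# Smaller populations fluctuate more from generation to generation.
print(wright_fisher(0.5, n_individuals=10, generations=20, seed=1))
print(wright_fisher(0.5, n_individuals=1000, generations=20, seed=1))
```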

Haplotype blocks - choosing DNA variants to test

relies on fact that variants are associated with each other in haplotype blocks within a block = variants are in high LD (low diversity) individuals are created by mosaics of haplotypes from each of the blocks therefore = DON'T NEED ALL SNPs

Epigenetic clock

want to have indication of proportion of sites methylated = gives an indication of biological age of an individual runs alongside, but not always in parallel (one faster than the other) with chronological age may inform life expectancy predictions if plot = can see reasonably good agreement between the 2 'epigenetically old' or 'epigenetically young' 5 year increase in methylation age compared with chronological age = 21% greater mortality risk independent of other life-course predictors of aging and health e.g. possession of the e4 allele of APOE, education, childhood IQ, social class, diabetes, high blood pressure, cardiovascular disease looking at individual methylation sites: - none that are reproducibly associated with age = lots of variation - direction is associated with where CpG site is - if in CpG islands, methylation level increases over age if not in CpG island = decreasing over age - 24% sites associated with age -nearly 50/50 methylation sites either increase or decrease with age Not all methylation sites behaving the same way 2 different arrangements: both show change over time 1) epigenetic drift = change is not predictable over time - changes depending on individual 2) epigenetic clock = continuous increase over all individuals

Inheritance of DNA methylation

when DNA replicated = new strands are unmethylated but will always be a methyl group on one of the 2 strands (as symmetric DNA) if enzymes recognise hemimethylated DNA = put on new methyl group therefore, symmetric methylation thought to be mechanism for the inheritance of DNA methylation through cell division -was a big surprise that clark kent mutation is due to methylation why? because in animals DNA methylation is removed in embryogenesis and put back on = if a gene gets methylated during lifetime of organism, it is usually removed in the next generation = not heritable in plants = doesn't seem to happen - no reprogramming of DNA methylation in embryo development -changes in DNA methylation that occurred in some plant that originally got clark kent are actually heritable from gen to gen
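
The replicate-then-restore logic can be sketched as a toy model, assuming one symmetric site stored as a pair of booleans (top strand methylated, bottom strand methylated). This only illustrates the bookkeeping, not enzyme behaviour.

```python
def replicate(site):
    """Semi-conservative replication: each daughter keeps one old strand.

    The newly synthesised strand always starts unmethylated.
    """
    top, bottom = site
    return [(top, False), (False, bottom)]


def maintain(site):
    """Maintenance step: a hemimethylated site is restored to fully methylated."""
    top, bottom = site
    if top or bottom:
        return (True, True)
    return (False, False)  # nothing to copy from, so it stays unmethylated


def divide(site):
    """One cell division: replication followed by maintenance methylation."""
    return [maintain(daughter) for daughter in replicate(site)]


print(replicate((True, True)))  # both daughters are hemimethylated
print(divide((True, True)))     # mark is restored in both daughters
print(divide((False, False)))   # unmethylated state is also inherited
```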

GWAS results - Manhattan plot

y axis = the test statistic saying the strength of evidence for an association higher = stronger association plotting the -log10 of the p-value stringent threshold to account for large number of tests, here P < 5 x 10^-7 dot for each SNP shades of blue = chromosome green dots = significant
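
A sketch of how each dot's height and colour could be derived, assuming a base-10 logarithm and the P < 5 x 10^-7 threshold quoted above. The SNP names and p-values are made up.

```python
import math


def manhattan_points(p_values, threshold=5e-7):
    """Return (snp, -log10(p), is_significant) for each SNP."""
    points = []
    for snp, p in p_values.items():
        height = -math.log10(p)  # smaller p-value -> taller point
        points.append((snp, height, p < threshold))
    return points


# Hypothetical results for two SNPs.
for snp, height, significant in manhattan_points({"snp_1": 1e-8, "snp_2": 0.01}):
    print(snp, round(height, 2), "green" if significant else "blue")
```

On this scale the threshold sits at about 6.3, so only points above that line would be drawn in green.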

