GHD
Angelman Syndrome (AS) & Prader-Willi syndrome (PWS)
1 in 10,000-15,000 births. AS: hypotonia, speech defects, hyperactivity, seizures, movement ataxia, severe retardation, excessive inappropriate laughter. PWS: hypotonia, failure to thrive, hyperphagia, short stature, small hands/feet, hypogonadism, mild mental retardation, obsessive-compulsive mannerisms. The DNA mutation is the same for both; the different phenotypes are due to imprinting of the region
Telomerase function
1) Binds to the telomere & the internal RNA component aligns with the existing telomere repeats 2) Telomerase synthesises new repeats using its own RNA component as template 3) Repositions itself on the chromosome and the RNA template hybridises w/ the DNA once more. CRISPR-Cas9: can now do experiments in Euplotes, etc. Before, this was impossible in protozoa, so yeast was the model organism. Summary: polymerisation & hybridisation > translocation & rehybridisation > further polymerisation
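A minimal sketch of the align > polymerise > translocate cycle, assuming the Tetrahymena repeat TTGGGG and a template that specifies about 1.5 repeats (TTGGGGTTG). The function names and the starting sequence are hypothetical, chosen only for illustration.

```python
# DNA (5'->3') that one full pass along the RNA template would specify.
# Assumption: a Tetrahymena-like template covering ~1.5 TTGGGG repeats.
TEMPLATE_PRODUCT = "TTGGGGTTG"


def telomerase_round(telomere: str) -> str:
    """One round: hybridise the 3' end to the template, then copy to its end."""
    # Hybridisation: longest 3' suffix that matches the start of the product,
    # leaving at least one template position still to be copied.
    for k in range(len(TEMPLATE_PRODUCT) - 1, 0, -1):
        if telomere.endswith(TEMPLATE_PRODUCT[:k]):
            # Polymerisation: add the nucleotides the rest of the template specifies.
            return telomere + TEMPLATE_PRODUCT[k:]
    raise ValueError("3' end cannot base pair with the template")


def extend(telomere: str, rounds: int) -> str:
    """Repeat the round; each iteration stands for translocation + rehybridisation."""
    for _ in range(rounds):
        telomere = telomerase_round(telomere)
    return telomere


print(extend("TTGGGGTTGGGG", 2))  # TTGGGGTTGGGGTTGGGGTTG
```

After the first round only the last TTG can re-pair with the start of the template, so every later round adds exactly one 6 nt repeat.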
Why ds DNA good choice to encode genetic info
1) Ds DNA doesn't fall apart when phosphodiester backbone nicked- stabilised by H bonds and hydrophobic interactions between faces of individual base pairs 2) Nt bases protected from aqueous phase by phosphodiester backbone 3) Double helix can be melted under physiological conditions (enzymes for DNA repair/recombination; too high a temp = energetically costly). ds DNA melts at high temps into 2 ss (reversible: hybridisation/reassociation; absorption at 260nm increases as dsDNA melts: hyperchromicity; melting temp sensitive to GC content because GC pairs contribute greater stability than AT)
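To illustrate the GC dependence of melting temp, here is a rough sketch using the Wallace rule (Tm = 2(A+T) + 4(G+C) deg C). This rule is an assumption brought in for illustration; it is only a rule of thumb for short oligos and is not part of the notes. The two example sequences are made up.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate Tm in deg C for a short oligo: 2 per A/T, 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


at_rich = "ATATATATATATAT"  # hypothetical 14-mer, 0% GC
gc_rich = "GCGCGCGCGCGCGC"  # hypothetical 14-mer, 100% GC

for oligo in (at_rich, gc_rich):
    print(oligo, f"GC={gc_fraction(oligo):.0%}", f"Tm~{wallace_tm(oligo)} C")
```

Same length, but the GC-only oligo comes out with twice the estimated Tm of the AT-only one.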
Double helix stabilised by 2 types of non-covalent interaction
1) H bonds between bases 2) Hydrophobic stacking interactions between faces of the bases; stronger at high salt concentration
2 types of retrotransposons
1) Long terminal repeat (LTR): like retroviruses but cannot move between cells 2) Reverse transcriptase/endonuclease (RT/en) (non-LTR); autonomous & non-autonomous ~15% of human genome
Transposon genetics examples
1) Maize Transposons can cause unstable alleles; high rates of reversion Bz; bronze locus in maize Bz normal, bz point mutation, bz-m transposable element insertion 2) Drosophila
Reverse transcriptase; 3 activities
1) RNA template mediated DNA synthesis; reverse transcription- RNA primed (but can use a DNA primer). Key activity in biotech for making cDNA 2) DNA template mediated DNA synthesis; RNA primed 3) RNase H (H for hybrid): degrades RNA in RNA/DNA hybrids; endonuclease, active only on ds hybrids. Liberates short oligonucleotides. DNA pol mediated only
Gene amplification in whole organisms
1) Ribosomal DNA amp in Xenopus oocytes involves an extrachromosomal rolling circle 2) Maturation of macronucleus in ciliated protozoa e.g. Tetrahymena. Macronucleus- production of mRNA (proteins). Maturation involves lots of modification (gene amp). Micronucleus involved in meiosis. 3) Chorion gene amp in Drosophila follicle cells. Adaptation for massively increased rates of RNA synth, often from a few particular genes
Theories for explanation of extra genomic DNA
1) Selfish DNA- doesn't explain why tandem repeats are also bigger 2) Bulk DNA- adaptive change to increase nuclear vol & cell size; not tested at pop. level and why not adapt to other types of variation? 3) Metabolic cost: no correlation between cell division rate and genome size, bac cells often contain chromosome present in nested set of many copies, DNA only constitutes 2%-5% of dry weight of cell 4) Competition between genome growth (maladaptive) and selection: THE EXTRA DNA IS A MUTATIONAL LIABILITY 5) Power of natural selection is proportional to Ne. Although the fraction of the human genome that encodes protein is small, the functional elements of genes can extend over megabases, so variants affecting gene expression may occur Mb from any gene
Activation induced deaminase AID necessary for production of high affinity IgG in mammals and birds and for class switching in all vertebrates
1) Somatic hyper-mutation (SHM); necessary for high affinity antibodies Increases mutation rate range of affinity for antigen > greater response > bind/present to T-cell Selection > low affinity apoptosis > affinity increases over generations e.g. plasma cells 2) Class switch recombination (CSR) 3) Gene conversion in birds Range of antibodies 1 active gene > damaged > repair with pseudo-gene template (gene conversion of pool of pseudo-genes)
Main telomere functions
1) allow replication of extreme ends of the linear chromosome 2) protect the ends of the DNA molecule- stop end-end fusion
2 types of programmed gene amp
1) extrachromosomal - does not permanently affect genome of dividing cells or germline 2) intrachromosomal - alters genome; incompatible with chromosome segregation i.e. "dead-end" cells that do not divide (Drosophila follicle cells are also endopolyploid) but are specialised for high rates of chorion RNA/protein production
Selection works poorly in population of low effective size; low Ne
1) if natural selection favours increased body size it will also favour a reduced pop. size; this will reduce efficiency of natural selection and favour the accumulation of mildly deleterious variation such as an increase in non-coding DNA 2) if natural selection favours reduced body size it will also favour increased pop. size, which will increase efficiency of selection and reduce genome size. s > 1/Ne: selection effective; s < 1/Ne: allele fate determined by drift. LARGE SIZE: LARGE GENOME; SMALL SIZE: SMALL GENOME
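The 1/Ne threshold can be checked with a few lines. This is a sketch only: the selection coefficient and the two Ne values below are illustrative assumptions, not measured figures.

```python
def selection_is_effective(s: float, ne: float) -> bool:
    """True when |s| > 1/Ne; otherwise the allele's fate is set by drift."""
    return abs(s) > 1.0 / ne


s = 1e-5  # hypothetical mildly deleterious insertion

for label, ne in [("small Ne", 1e4), ("large Ne", 1e6)]:
    fate = "selection" if selection_is_effective(s, ne) else "drift"
    print(f"{label}: Ne={ne:.0e}, 1/Ne={1 / ne:.0e}, fate set by {fate}")
```

The same mutation is invisible to selection at low Ne but is removed at high Ne, which is the argument for extra DNA accumulating in large-bodied, small-population species.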
Genetic consequences of transposable elements
1) insertion of transposable elements may activate or disrupt a gene upon movement 2) can promote rearrangements by providing dispersed regions of homology
Transposons are useful as experimental mutagens; "transposon tagging"
1) put a transposon into a genome 2) get it to move by turning on transposase 3) select for your mutant; for example an auxotroph 4) seq the mutant & find where the transposon has landed 5) you have your gene w/o having to know anything about it beforehand. Chuck transposon in randomly lots of times
Neutral Indel Model
1) repeats aren't subject to selection 2) ~2% functional seq conserved between human and mouse 3) the proportion of constrained seq declines exponentially with evolutionary divergence
2 types of DNA transposons
1. Excisive mechanism e.g. Tn5, Tn10, P elements (cut & paste, not copy) 2. Replicative mech e.g. Tn3, bacteriophage Mu. Look diff, similar mechanistically
Sister chromatid exchange dangerous for circular chromosomes
1. Replication 2. SCE 3. Bridge 4. Breakage 5. Replication/repair 6. Fusion. Can cause formation of a dicentric chromosome that undergoes a Breakage Fusion Bridge (BFB) cycle. Likelihood of an SCE increases w/ chromosome size. An odd no. of SCEs on a circular chromosome will cause formation of a dicentric chromosome & trigger a BFB cycle, so large circular chromosomes are unstable. However, SCEs on linear chromosomes don't change their structure- more stable. Unequal SCE leads to duplications and deletions (accumulation of repeated seqs (tandem and dispersed) and expansion of eukaryotic genomes). BFB cycles can occur in human cells if they lose a telomere ******** REVISIT DIAGRAMS
ORF2 does most of the work
150 kDa protein. Includes endonuclease & RT domains. RT uses a DNA primer of LTR retroposons. Target primed reverse transcription: primer generated by endonuclease activity. Needs primer/template
RNA editing Apolipoprotein B (ApoB)
2 forms (same RNA but different proteins due to RNA editing). -ApoB100 (100%): 512 kDa, synthesised in liver, 2 domains; lipoprotein assembly & LDL receptor binding (slower), in LDL (Low Density Lipoprotein) & VLDL (Very Low Density Lipoprotein), transports lipids from liver to tissues, 36 hr cycle, codon 2153 stays CAA. -ApoB48 (48%): synthesised in small intestine, lipoprotein assembly domain only, required for synthesis of chylomicrons by gut (from gut to liver), 3 hr cycle, glutamine codon (CAA) 2153 > stop codon UAA by APOBEC-1. Triglyceride in both. 11 APOBEC genes in human. All APOBEC3, AID & APOBEC1 have demonstrated ssDNA C to U activity (deamination)
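A tiny sketch of the editing step: C to U deamination at the first position of the CAA codon gives UAA. The codon table holds only the two codons mentioned in the notes, and the function name is hypothetical.

```python
# Only the two codons from the notes; not a full genetic code table.
CODONS = {"CAA": "Gln", "UAA": "Stop"}


def deaminate(codon: str, pos: int) -> str:
    """Return the codon after C -> U deamination at the given position."""
    if codon[pos] != "C":
        raise ValueError("deamination acts on C only")
    return codon[:pos] + "U" + codon[pos + 1:]


unedited = "CAA"                 # codon 2153 in the unedited (ApoB100) mRNA
edited = deaminate(unedited, 0)  # APOBEC-1 edit in the small intestine

print(unedited, CODONS[unedited], "->", edited, CODONS[edited])
```

One base change turns a glutamine codon into a stop, so translation of the edited mRNA ends early and gives the shorter ApoB48.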
How can cut and paste transposons increase in copy number when the mechanism is conservative?
2 ways in which excisive transposons spread 1. jump ahead of replication fork 2. jump in G2 & use sister for repair a) transposition > replication b) transposition > DSB (double strand break) Interruptions in the DSB repair process can lead to defective or degenerate elements DIAGRAM Degenerate transposon in the AC/DS system Activator (Ac) element Poly A site Have repeats but not active transposase Dissociation (Ds) elements Incomplete transposase gene Degenerate transposons Often have the wheels (terminal repeats) No engine (transposase) Sometimes both wheels and engine not working; they're stuck True of both replicative/excisive Ac Ds element is an example of a degenerate transposon: Ac (activator) can mobilise Ds (dissociator) SINEs are retrotransposon version (LINEs)
Ds DNA
H bonds between bases, two strands anti-parallel arrangement, wrapped around each other in plectonemic coils
ENCODE (Encyclopedia of DNA Elements)
2003 National Human Genome Research Institute (NHGRI) Project to identify all functional elements in human genome seq Pilot phase then tech development phase BROADLY 10% FUNCTIONAL (Protein coding 1.22%, bound by transcription factors 8.10%) Lindblad Toh et al: 5.4% mammalian genome in total shows evidence of constraint 30% constrained seqs associated w/ protein coding 29.7% intronic 38.6% intergenic IF SEQ COMMON IN ALL SPECIES ASSUMED ESSENTIAL TO SURVIVAL (OTHERWISE MUTATED)
Cancer genomes have mutational signature that can be ID'd by analysing DNA seqs of cancers
21 breast cancers: mutations at 5MeCpG, characteristic of Apobec3B (other types of signature, e.g. sunlight (melanoma), tobacco smoke (lung cancer)) Rainfall plots show signature B mutations are frequently clustered: KATAEGIS (have characteristics of Apobec3B products- arise from its action on ss DNA)
L1 transposition in cancer cells
290 samples analysed; 2756 somatic L1 retrotranspositions 53% of samples had at least 1 event 24% events mobilise 3' seqs of which ~50% mobilised unique seqs alone & no L1 seqs 95% events from 72 L1, 50% from only 4 "hot" L1s 5% of events arose from elements that were products of cancer transposition
HIV therapy
3 key targets of anti-retroviral drugs: 1) reverse transcriptase 2) integrase 3) protease. Many drugs that inhibit these have been developed: NRTI; nucleoside RT inhibitor. NNRTI; non-nucleoside RT inhibitor. INSTI; integrase strand transfer inhibitor. PI; protease inhibitor. Early on just used individual inhibitors but resistance emerged rapidly, so now use cocktails of 3 drugs; NRTI+NRTI+INSTI (or NRTI+NRTI+PI) e.g. Triumeq or Stribild
PCR based mutagenesis
4 primer method: allows a specific change to be introduced somewhere in the middle of a PCR product; uses 4 primers in 3 PCR reactions; cheap/straightforward. E.g. AT -> CG. 1st PCR: forward primer & reverse primer w/ mutant base incorporated; amp gives a frag about half size w/ the mutation incorporated at 1 end (whenever a primer is used, it introduces its base). 2nd PCR reaction introduces desired mutation at 1 end of the other half of the original frag. The 2 half frags overlap (CG) and can be put together into a longer seq; however, strands can pair w/ the wrong partner. Denature the 2 mutant half products & allow the strands from each to base pair via the region of overlap; extended by DNA polymerase (Klenow polymerase starts synth of the missing DNA regions). 3rd PCR: amp the large frag containing the mutation using the original forward primer & the second reverse primer
Targeting to 5'UTR
Morpholino complementary to the 5'UTR of a specific mRNA will base pair w/ it & inhibit translation. Start AUG initiates translation; can block translation w/ morpholino binding to 5'UTR, between cap & AUG. Translation machinery interacts w/ 5' cap (modified G nt); specifically blocked
Activity & position of 82 intact L1 elements in human genome
6 L1's responsible for majority of new transpositions; all members of L1-Ta subfamily Bigger=more active Only a few members move around
Immunoblots (Western blots)
Signal = amount of protein: amount of product depends on how much ab is bound; product is coloured, so easy to detect (2ndary ab linked to enzyme, i.e. labelled). Complements whole genome approach: how highly expressed? Measure protein corresponding to gene. 1) separate proteins using SDS PAGE (gel electrophoresis; SDS = detergent that denatures proteins & coats them w/ neg charge) 2) transfer proteins to nylon membrane by Western blot (DNA & RNA are detected via hybridisation; proteins need antibodies) 3) add an ab produced in mice to recognise a specific protein (primary ab; unlabelled) 4) add a labelled rabbit anti-mouse ab (2ndary ab; recognises mouse ab, binds to primary on nylon membrane) to detect the bound primary ab. DETECT LABELLED AB: 2ndary ab linked to enzyme (alkaline phosphatase, horseradish peroxidase), detected by adding substrate
Polytenization
A kind of endopolyploidy w/ in-register chromatid pairing
Endoreduplication, endomitosis & endocycling
A) Cell division cycle G1 > S > G2 > M B) Endoreplication Endomitosis: G1 > S > G2 > skip M, keep cycling Endocycling: G > S
Cytidine deaminases as mutagens
APOBEC enzymes mutate DNA and contribute to the development of tumours and tumour drug resistance
Retroviruses
HIV = retrovirus Discovered 1908-11 - viruses that caused leukaemia/sarcomas in chickens 60s - established they have RNA genome, called RNA tumour viruses Temin's proviral hypothesis- must get into host genome 1970 - found to contain reverse transcriptase (converts seq info in RNA to DNA)
HIV closely related to retrotransposons
HIV, AIDS: 35mil deaths New infections in decline due to provision of anti-retrovirals. Provision variable, e.g. Russia infection rate increasing.
Consequences of polyploidy
Advantages over diploid relatives: Increased cell size- larger fruits, flowers and organism Advantageous characteristics from multiple species (allopolyploids) (immediate hybrid vigour) Whole genome duplications allow long term evolution of genetic novelty Possible disadvantages: Sterility (problems forming "balanced" gametes during meiosis), esp. for odd-numbered polyploids
DNA transposons
All transposable elements have TSDs (Target Site Duplications). UTR- UnTranslated Region. DNA transposons: DR<-ITR--transposase--ITR->DR. Retrotransposons, autonomous: LTR: -LTR-->Gag-Prt-Pol(RT&EN)-Env-LTR--> (HERV); non-LTR: TSD-5'UTR--ORF1--ORF2(EN/RT/C)--3'UTR-TSD (L1 Element). Non-autonomous: TSD------TSD (LR) (Alu Element); TSD. <--Alu-like---VNTR --SINE-R--TSD (SVA Element) ??????
Real time quantitative PCR (qPCR)
Allows the synthesis of a PCR product to be followed over time. Important in precise quantitation of the amount of starting template; sensitive gene expression analysis. Kinetic approach: measures how the amount of product changes w/ each PCR cycle, in the early stages while amplification is still exponential (log-linear). Can use a reporter probe oligont in the PCR reaction: emits fluorescence in proportion to the amount of PCR product present; fluorescence measured & followed as the reaction proceeds. OR Sybr green: binds only ds DNA & fluoresces. Reporter probes for real time PCR: dual labelled probes e.g. TaqMan have fluorophore at 5' end, quencher at 3' end; fluorophore released by 5' exonuclease activity of Taq during extension (emits fluorescence on excitation). The cycle at which the amount of fluorescence exceeds a set threshold is the C(T) value. The C(T) value can be converted to amount of starting molecules using a standard curve
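A sketch of the standard curve step, assuming C(T) falls linearly with log10 of the starting copy number. The five standards below are invented numbers (not real data) and the helper names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical dilution series: starting copies -> measured C(T).
standards = {1e3: 28.04, 1e4: 24.72, 1e5: 21.40, 1e6: 18.08, 1e7: 14.76}

slope, intercept = fit_line(
    [math.log10(copies) for copies in standards],
    list(standards.values()),
)


def copies_from_ct(ct: float) -> float:
    """Invert the standard curve: C(T) -> estimated starting copies."""
    return 10 ** ((ct - intercept) / slope)


# Amplification efficiency implied by the slope (1.0 = perfect doubling).
efficiency = 10 ** (-1 / slope) - 1

print(f"slope={slope:.2f}, intercept={intercept:.2f}, efficiency={efficiency:.2f}")
print(f"C(T) 21.40 -> {copies_from_ct(21.40):.3g} copies")
```

A slope near -3.32 corresponds to the product doubling every cycle, which is why the fit has to use the exponential phase.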
Basic minimal insertion sequence structure
Also called simple transposons 1) ends-inverted repeats: genetically required, in cis 2) TNP (transposase): genetically required, trans-acting 3) short direct repeats; generated at time of insertion; not part of transposon *** SDR - end - TNP ORF - end - SDR
HIV more complex - VIF which confers resistance to APOBEC
Also important additional proteins- vpr *rev* vpu *tat* nef
Polyploidy
Alteration in the no. of chromosome sets per cell No. of chromosomes in a basic, ancestral set (x the monoploid number) (n the haploid number is no. of chromosomes in gamete n=23=x for human (diploid)) According to no. of chromosome sets in somatic cells: 2x=diploid, 3x=triploid, etc E.g. wheat, hexaploid, has 42 chromosomes per somatic cell (6 sets of 7 chromosomes, i.e. x=7) Wheat gametes contain 3 sets, haploid chromosome no. n=21 (n doesn't = x)
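The x versus n distinction can be worked through as arithmetic. A minimal sketch, assuming an even ploidy level so that gametes carry half the somatic count; the function name is hypothetical.

```python
def chromosome_numbers(somatic_count: int, sets: int) -> dict:
    """Return monoploid number x and haploid (gametic) number n."""
    if somatic_count % sets or somatic_count % 2:
        raise ValueError("count must divide evenly into sets and into gametes")
    return {
        "x": somatic_count // sets,  # chromosomes in one basic set
        "n": somatic_count // 2,     # chromosomes in a gamete
    }


wheat = chromosome_numbers(42, sets=6)  # hexaploid, 6x = 42
human = chromosome_numbers(46, sets=2)  # diploid, 2x = 46

print("wheat", wheat)  # x=7, n=21
print("human", human)  # x=23, n=23
```

Only in diploids does n equal x; in wheat each gamete carries 3 sets.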
Non-autonomous non-LTR retrotransposons
Alu elements related to signal recognition particle RNA; otherwise SINEs almost always derived from tRNA. Referred to as SINE elements. Prominent in GC-rich regions of human genome. One new Alu insertion per 20 births (human/chimp comparisons). Human Alu elements: 280bp seq that lacks introns, 1mil copies in genome. Rely upon other mobile elements to integrate & PolIII for transcription (internal promoter)
Studying gene function using RNA interference
Andrew Fire & Craig Mello Nobel prize 2006
Diagnosis of chromosome abnormality
Array analysis- will miss balanced changes. Karyotyping- will miss small changes. FISH- specific probes for specific analyses
DNA-based methods for clinical analysis of chromosomes
Array-based genomic hybridisation (lots of data) Uses molecular technique to measure DNA copy across genome (signatures) Can identify gains & losses not visible by karyotyping Can use comparative genomic hybridisation (array-CGH) or analysis of signal strength within each sample PCR for trisomy
Klenow polymerase
Artificial- less stable, heat sensitive DNA polymerase I cleaved with subtilisin removes 5' to 3' exo activity (commercial enzyme produced by expression of truncated gene in E.coli) Heat sensitive- inactivated if left at 75 deg for 20 mins Error rate- low, high fidelity enzyme DNA poly I holoenzyme (3 domains): polymerase domain, 3'->5' exonuclease domain (can degrade nucleic acid strand, degrade different ends, proofreading- if mistake, degrade + resynth), 5'->3' exonuclease domain -----> proteolysis results in large fragment (Klenow frag used in seq reaction) consisting of poly domain and 3'->5' domain, and a separate 5'->3' domain
High throughput Sanger sequencing
Automated gel sequencing - 36 samples per gel (each gives 450bp of sequence) - 9 gels in 24hrs - about 150kb per day. Capillary sequencers use fine capillaries instead of gel - 96 capillary machine - 750kb per day
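The throughput figures can be checked with quick arithmetic. The sketch below uses only the numbers in the notes plus the 3000 Mb human genome size; the "days for 1x coverage" figure is an illustrative calculation for a single machine, not a historical claim.

```python
# Gel-based machine: samples per gel x read length x gels per day.
gel_bp_per_day = 36 * 450 * 9

# Capillary machine: daily output as quoted in the notes.
capillary_bp_per_day = 750_000

human_genome_bp = 3_000_000_000  # 3000 Mb


def days_for_one_pass(bp_per_day: int) -> float:
    """Days for one machine to read the genome once (1x, no overlap)."""
    return human_genome_bp / bp_per_day


print(f"gel: {gel_bp_per_day} bp/day, {days_for_one_pass(gel_bp_per_day):.0f} days")
print(f"capillary: {capillary_bp_per_day} bp/day, "
      f"{days_for_one_pass(capillary_bp_per_day):.0f} days")
```

36 x 450 x 9 gives 145,800 bp, matching the "about 150kb" per day quoted for gels.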
Polyploids may form 1 of 2 ways
Autopolyploids (all chromosomes from same species; 4x=AAAA) by spontaneous premeiotic endoreduplication e.g. strawberry Allopolyploids (chromosome sets from diff species; 4x=AABB) by interspecific hybridisation and endoreduplication diploid species neither has pairing partner hybrid would be inviable w/o endoreduplication (we need it to get 2 copies of each AB -> AABB) functional gametes - need pairing partner @ 1st division of meiosis Parents closely related A and B sets often partially homologous - homEologous e.g. bread wheat AABBDD
Types of double helix
B: ds DNA mostly (0.34nm/3.4 Angstroms per bp, ~10bp per turn, 3.4nm spacing per turn, major/minor grooves for seq specific protein binding- proteins such as transcription factors bind to/recognise specific seqs in ds DNA by binding to major groove) A: ds RNA and RNA-DNA duplexes Z: left handed (alternating purines and pyrimidines, e.g. GCGCGCGC, biological function unproven)
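The B-form dimensions can be turned into a quick length estimate. A sketch using 0.34 nm per bp and ~10 bp per turn from the card, with the 3000 Mb human genome as a worked case; constant and function names are hypothetical.

```python
RISE_NM_PER_BP = 0.34  # B-DNA rise per base pair
BP_PER_TURN = 10       # approximate, as in the notes


def length_nm(bp: float) -> float:
    """Contour length of B-form ds DNA in nanometres."""
    return bp * RISE_NM_PER_BP


def turns(bp: float) -> float:
    """Number of helical turns."""
    return bp / BP_PER_TURN


human_bp = 3_000_000_000
print(f"one turn: {length_nm(BP_PER_TURN):.1f} nm")
print(f"haploid human genome: {length_nm(human_bp) / 1e9:.2f} m, "
      f"{turns(human_bp):.0e} turns")
```

One turn comes out at 3.4 nm, and a haploid human genome stretched out as B-DNA is about 1 m long.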
How much of genome is functional?
Biochemistry: what binds, what is transcribed Bioinformatics: what is conserved or constrained and mutates less frequently than expected (divide genome into windows & identify those with excess of conserved residues as compared to local ancestral repeat) NIM suggests: 8.2% constrained by purifying selection, ~5% shows evidence of lineage specific constraints, only 2.2% constrained in both mouse and human GENERAL RULE: 10% functional, ~1.5% coding
Nucleic acid hybridisation is fundamental to biology & molecular genetics
Biology: mRNA translation-tRNA, codon~anticodon. Molecular genetics: PCR, Southern blotting, FISH (fluorescence in situ hybridisation), DNA sequencing
Tetrahymena telomeres work in yeast- things get weird
Blackburn & Szostak made linear plasmid w/ Tetrahymena telomeres & put into yeast; the DNA at end no longer Tetrahymena but new Selectable marker gene+ori (ars) flanked by telomeric DNA (TTGGGG) Put into yeast > gene flanked by yeast telomeric DNA (TG1-3)n - no template to mediate; special mechanism Applicable to all eukaryotes Mechanism = telomerase Template = telomerase RNA
RNA came before DNA
Both can self-assemble on basis of complementarity and store info but only RNA is catalytically active. RNA world- RNA performs both info storage and catalysis as starting point for life (DNA later). EVIDENCE: 1) DNA precursors made from RNA precursors by ribonucleotide reductase 2) 2 different thymidylate synthetases that methylate dUMP in many organisms (suggests T evolved twice): ThyA; ThyX is coupled 3) PBS1 & PBS2 phage of B.subtilis contain U instead of T in their DNA. Ribonucleotide reductase: reduces ribonucleotides to deoxyribonucleotides (removes 2' OH)
Cytochrome c oxidase- mitochondrial heteroplasmy
COX (cytochrome c oxidase) +/- comes later in life- loss of vision mum > child (mitochondrial DNA)
DDE transposons (class best characterised)
CUT OUT/PASTE IN Original seq Cut out transposon Paste into target seq DDE motif in transposase: aspartic acid... aspartic acid... glutamic acid Schematic of Tn5 action; mech I Excision > integration > repair enzymes > duplication of target seq Tn5 transposon DNA between donor DNA > transposase binding > cleavage > transposase + target DNA > target capture > strand transfer ****** Source of target site duplication "TSD" mech II Target DNA > transposase > nicks > transposon > DNA polymerase
Advantages of RNA-seq
Can accurately measure transcription of poorly expressed & v highly expressed genes (wide dynamic range) - microarrays (less accurate at high freq) & chips have a much narrower range (saturated) Can distinguish between alt spliced mRNA variants from the same gene Can distinguish between transcripts from diff alleles of the same gene
Gene amp cancer
Can occur in cells in culture & cancer cells; amplified genes exist either as small extrachromosomal elements (double minutes- signs of cancer) or integrated into a chromosome (homogeneously staining regions, HSRs). V useful for protein production. Double minutes- unstable. HSRs- stable. Oncogene amp often occurs in solid tumours: MYCN oncogene amp in 20% of neuroblastomas, visible as HSRs; EGFR (receptor) in 40% of gliomas, as double minutes; ERBB2 (growth factor receptor gene) in breast, ovarian and gastric cancers
CRISPR-Cas9 syst in Strep
Cas9 gene > Cas9 protein w/ RuvC & HNH domains (each domain is a ss nuclease & cuts 1 strand) >>> direct repeats (CRISPR repeats- identical in seq) w/ spacer DNA corresponding to seqs from foreign DNA- usually phage seqs. RNA products contain spacer seq + part of the direct repeat (common seq at end of each). Invading DNA seqs captured & incorporated into the genome as spacers between CRISPR repeats. 1) the spacer is transcribed to RNA which also contains part of the CRISPR repeat- crRNA 2) base pairs w/ tracrRNA which contains a complementary seq to the CRISPR repeat segment 3) paired RNAs interact w/ the Cas9 protein to form the active nuclease 4) the active nuclease is then targeted to copies of the invading DNA (e.g. phage) by base pairing between the crRNA & 1 strand of the DNA (Cas9 protein cuts the DNA)
Cytidine deaminases
Catalyse cytidine deamination and play important roles in: RNA editing Innate immunity to retroviruses and transposable elements Adaptive immunity: immunoglobulin gene diversification Cancer; unregulated expression generates general mutator phenotype in tumor subclones
When retroviruses integrate
Cause wide variety of diff mutations Insertion of a retrovirus can be an important genetic event e.g. tutor, hairless mouse, insertion of retrotransposon upstream of duplicated amylase gene in primates
CGH (comparative genomic hybridisation)
Cells from normal control > extract DNA. Cells from patient > extract (target) DNA. BOTH: label w/ 2 diff fluorochromes > mix in equal quantities, hybridise to microarray of clones (cohybridised against array) > read red & green fluorescence (each labelled DNA finds its corresponding target) > work out red:green ratio for each spot; align to database of clones. E.g. equal red/green- yellow. If red, loss of green from patient. Used to identify gain or loss of chromosomal regions. Array analysis can also demonstrate small changes
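A sketch of how a per-spot ratio could be turned into a gain/loss call, following the card's colour scheme (patient DNA green, control DNA red). The log2 cutoff of 0.3 and the intensity values are illustrative assumptions, not a clinical standard.

```python
import math


def call_spot(patient_green: float, control_red: float,
              threshold: float = 0.3) -> str:
    """Classify one array spot from its patient:control intensity ratio."""
    log_ratio = math.log2(patient_green / control_red)
    if log_ratio > threshold:
        return "gain"      # more patient signal: spot looks green
    if log_ratio < -threshold:
        return "loss"      # less patient signal: spot looks red
    return "balanced"      # equal signal: spot looks yellow


# Hypothetical intensities standing in for copy number (control = 2 copies).
for name, patient, control in [("trisomic region", 3, 2),
                               ("deleted region", 1, 2),
                               ("normal region", 2, 2)]:
    print(name, call_spot(patient, control))
```

A single-copy gain in a diploid gives log2(3/2), about 0.58, and a single-copy loss gives log2(1/2) = -1, so both clear the assumed cutoff.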
Ploidy
Chromosome no. within a cell/within the cells of an organism
Chromothripsis
Chromosome shattering. Looks like catastrophic fragmentation & rearrangement of 1 chromosome, usually w/ loss of material. Infreq in population. Most e.g.s seen in cancer genomes (~2% of all cancers)- chromosomes scrambled. Can also occur in normal cells: rearrangement of chromosome 2 led to cure of WHIM syndrome (lost CXCR4 gene mutation- lost segment containing harmful mutation). Original chromosome seq > catastrophic breakage > rearrangement > lost chromosomal material
Numbering of human chromosomes (Paris nomenclature)
Chromosomes numbered according to (decreasing) size (except 21 & 22- 22 slightly bigger). Metacentric, submetacentric or acrocentric. Short arms labelled p(etit). Long arms labelled q. Arms labelled outwards from centromere. Satellites- short arms of 13, 14, 15, 21 & 22 (all acrocentric). Contain rDNA & repetitive seqs (all contain tandem repeats of rDNA)
DS DNA
Complex viruses; Herpes, smallpox, T4, lambda, as well as cellular life
So telomerase...
Contains RNA & protein component RNA component: TER, TERC, TR- telomerase RNA Variable in length, contains 1.5 copies of the C(y)A(x) telomere repeat, >1kb in yeast Acts as template for T(x)G(y) synth Protein component: (TERT) In vitro: only TER & TERT (EST2p) required for activity In yeast in vivo EST1p & EST3p also required In humans: NOP10, NHP2, GAR1 & dyskerin
Non LTR retrotransposons mediated by RT/en
Copy out/copy in Directly copied into genome Doesn't make ds DNA
Reverse transcriptases
Copy seq info from RNA to DNA. p123 & Est2p: RT domains compared w/ those of other RTs. Mutate motifs (which look like RT) - put back into yeast - look at phenotype - if functioning as RT: short telomeres - finite lifespan. p123 = EST2 = TERT. Most closely related to RT of non-LTR retrotransposons & uses a DNA primer. Mutations in the RT domain of EST2 cause yeast to senesce- YES RT
Overview of reverse transcription
Critical role of R (repeated) seq 5'CAP/R/U5-PBS---viral RNA---PPT-U3/R/AAA3' ~reverse transcription~ U3/R/U5---viral DNA---U3/R/U5 PBS; primer binding site PPT; poly-purine tract
APOBEC 3D, F, G & H restrict HIV propagation
Cytidine deamination restricts propagation Deamination > Uracil DNA glycosylase (removes U) > endonuclease > severed viral cDNA ss OR deamination > RTase (2nd strand cDNA synthesis, fixation) > replication > ds DNA has T (instead of U) APOBEC3G exerts innate antiretroviral immune activity against retroviruses, most notably HIV, by interfering with proper replication. However, lentiviruses such as HIV have evolved the Viral infectivity factor (Vif) protein in order to counteract this effect. Vif interacts with APOBEC3G and triggers the ubiquitination and degradation of APOBEC3G via the proteasomal pathway.
DNA pyrimidine bases
Cytosine (carbonyl =O, amino group NH2), Thymine (NH, carbonyl, methyl (CH3)), Uracil (same as thymine w/o methyl)
FISH analysis
DNA clone > fluorescent labelling & denaturation chromosome preparation on microscope slide > denature DNA in situ BOTH COMBINE > fluorescent signal on chromosome fluorescent probes hybridise w/ chromosomes
Methylation of mammalian genomes
DNA methylation pattern is erased in the early embryo & re-established at abt the time of implantation, followed by specific alterations in methylation e.g. de novo methylation & repression of genes necessary for pluripotency (specialisation), X inactivation. Some seqs escape this de- & re-methylation process (e.g. imprinted genes). Monosomy incompatible w/ life. X dosage balance- male/female diff: Drosophila turn up X in males; mammals do the opposite (turn down one X in females)
Transposons in human & mammalian genomes
DNA transposons 2.4-2.7% of human genome Members of Mariner/TC1 class are present. Also found in insects suggesting that horizontal transfer has occurred in the past Not active in most mammals Exception maybe vespertilionid bat
Assaying for L1 activity; a Trojan horse assay
Determination of L1 retrotransposition kinetics in cultured cells ***L1 element put in marker gene w/ intron - marker has opp. orientation to L1, intron has same orientation. Splices out RNA Transcribes reverse orientation of EGFP Transcription/reverse transcription & integration Now w/o intron Expression from integrated genomic EGFP EGFP antibiotic resistance EGFP cassette- now codes marker function, integrated/expressed Reverse transcription- RNA intermediate Transfect HeLa cells Selection Fluorescence activated cell scanning (if grows/glows) Determine if transposition occurring If right order- transposed, express proteins
The sequencing reaction
Dideoxy seq. Dideoxynucleotide triphosphates (Ps involved in the way nts are added; the H in place of the 3' OH is what allows DNA to be sequenced with this method). Template DNA. Klenow fragment of DNA polymerase I (from E.coli; used in DNA replication, where pol I replaces RNA primers w/ DNA). Sequencing primer (starts DNA synth). 4 reactions - GATC (identify position). Include one dideoxy NTP in each reaction. 2 Ps lost on incorporation at end of ss DNA. No 3'OH: chain terminates (chain terminator); OH > H, so extension blocked. Primer is paired with partner template strand & extended by DNA poly. Template DNA, primer & Klenow in eppendorf tubes w/ dGTP (ddCTP), dATP (ddTTP), dTTP (ddATP), dCTP (ddGTP) (one radioactively labelled for each of the 4 reactions). Nested set of fragments; dideoxy specific in each reaction. Continue for about 500-600nts. Complementary bp- if incorporates a ddNTP, synth stops. Denature (separate radioactively labelled strand from template) then analyse by polyacrylamide gel electrophoresis & x ray film (autoradiograph). High resolving power, diff positions on gel, GATC in diff lanes. Reading gel from bottom up gives seq from primer into unknown DNA. *********** LOOK AT BOOKLET. Template DNA: early methods used ss DNA, obtained by inserting DNA into M13 or phagemid vector; vector transferred to E.coli; ss DNA made in E.coli
Genome size scales with complexity of organism when many organisms are compared: but large metazoa are outliers and pose "C value paradox"
Genome size (Mb): E.coli - 4.5; S.cerevisiae - 15; C.elegans - 100; D.melanogaster - 180; Human - 3000; Lilies - 90,000; Lungfish - 139,000
Bases, nucleosides and nucleotides
E.g. Adenine (A), Adenosine, Adenosine monophosphate (AMP), Deoxyadenosine, Deoxyadenosine monophosphate (dAMP)
Eukaryotic genomes not always bigger than prokaryotic
E.g. Cryptomonad nucleomorph (eukaryotic): 551kb, 3 linear chromosomes, 531 genes, 1 pseudogene, coding regions of some genes shorter than homologues of free living organisms, most genes needed for self-perpetuation of nucleomorph: histones, DNA replication, spindle, cell cycle control; no complex regulation needed. E.g. eukaryotic parasite Encephalitozoon cuniculi (neurological pathogen of rabbits), an intracellular fungal parasite: 2.9Mb, 11 linear chromosomes, 1,997 protein encoding genes, 85% shorter than in yeast, mainly concerned with transmission of genetic info. Both are smaller than the genome of the prokaryote E.coli (4.5Mb)
Chromosome elimination
E.g. sciarid flies where paternal chromosomes are selectively eliminated at 3 stages of life cycle
LTR retrotransposons are similar to retroviruses but lack envelope proteins
E.g. yeast Ty elements contain gag & pol but not env genes Life cycle- LTR retrotransposon w/ cDNA > integration > transcription in chromosomes/nucleus > mRNA > translation > VLP formation & reverse transcription Seq organisation like retroviruses Not active currently in human genome but evidence for recent activity in so far as endogenous retroviruses (HERV) are present in human genome at sites from which they are absent in higher primates Empty LTRs arise by homologous recombination and some are unique to human 7% human genome Early transposon element (Etn) active in mice & have caused several lab mutants- probably mobilised by retroviral infection bc no autonomous LTR retrotransposons found in mice
General approaches
Early methods: 1970s & 80s - use ss circular template DNA & a complementary primer containing the required mutation Extend the primer to copy the template using DNA poly (probs Klenow) (before PCR was available) ds DNA w/ 1 mutant strand Introduce into E.coli & allow to replicate- approx. half of the resulting molecules will have 2 mutant strands wt in E.coli, mutant in vitro In practice never works this well- so select against replication of the original non-mutant DNA strands, otherwise lose mutation
Cassette method
EcoRI, PstI (restriction enzymes) Cut target plasmid DNA, replace w/ mutant DNA Synthesise 2 complementary oligonucleotides Ligate cassette to cut plasmid, pair artificial strands Don't usually have 2 nearby restriction sites in practice
Summary that illustrates roles of the ENDs and transposase. Genome w/ 2 transposable elements which are either wt or lack either the inverted repeats (end-) or the transposase (tnp-)
Element 1    Element 2    Transposition?
wt           wt           1 & 2
ends-        ends-        neither
tnp-         tnp-         neither
ends-        wt           only 2
tnp-         wt           1 & 2
tnp-         ends-        only 1
ENDs act in cis (same element) & TNP acts in trans (diff element) Needs ends to be recognised- no ends, won't transpose
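The cis/trans rule can be checked against every row of the summary with a short Python sketch. The function name and genotype labels are illustrative choices, not from the lecture material.

```python
def transposing_elements(elements):
    """elements: dict of name -> genotype ('wt', 'ends-' or 'tnp-').
    Returns the names of the elements expected to transpose."""
    # Transposase acts in trans: one working tnp gene anywhere is enough.
    # An ends- element still makes transposase.
    transposase_available = any(g != "tnp-" for g in elements.values())
    # Ends act in cis: an element must carry its own inverted repeats.
    return [name for name, g in elements.items()
            if transposase_available and g != "ends-"]

rows = [
    ("wt", "wt"), ("ends-", "ends-"), ("tnp-", "tnp-"),
    ("ends-", "wt"), ("tnp-", "wt"), ("tnp-", "ends-"),
]
for e1, e2 in rows:
    print(e1, e2, transposing_elements({"1": e1, "2": e2}))
```

Running the loop reproduces the six rows: an element moves only if it has its own ends and some element in the genome supplies transposase.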
UBE3A
Encodes ubiquitin-protein ligase loss of function mutations in 1-20% sporadic AS cases account for most familial patients disease only associated w/ maternal inheritance protein product adds ubiquitin to wide range of proteins targets them for degradation UBE3A expressed only from maternal allele in brain specifically hippocampal neurons & cerebellar Purkinje cells could explain the neurological clinical features mice lacking maternal UBE3A show neurological defects: impaired motor skills & spatial learning, abnormal EEG ONLY GENE INVOLVED IN AS(?) all classes of AS mutation share "cardinal features" of syndrome large deletions- more severe clinical presentation suggests haploinsufficiency for co-deleted genes that aren't imprinted may contribute to phenotype
Endoreduplication
Endocycling- a form of endoreplication in which the cell avoids mitosis entirely. Endocycling often results in the production of polytene chromosomes containing many sister chromatids bound together. E.g. trichomes, fly salivary glands No cell division- want lots of enzymes. Many polyploid- lots of chromosomes, e.g. amylase (fly) Endomitosis is a form of endoreplication in which the cell undergoes some aspects of mitosis but fails to complete the process. If mitosis aborts between anaphase A and anaphase B sister chromatids separate but are encapsulated into the same nucleus. If mitosis proceeds a bit farther, multiple nuclei form in the endoreplicating cell. E.g. megakaryocytes (platelets), hepatocytes
Life cycle of retrovirus, e.g. HIV
Endocytosis > fusion > core release > reverse transcription > transport > provirus integration in host genome > transcription > translation > virion maturation
In some species changes in chromosome structure/no. is part of normal development; programmed
Endopolyploidy Polytenization Chromosome elimination/diminution Gene amplification
Gene amp III: amp of chorion genes in Drosophila oocyte nurse cells
Follicle cells Chorion gene region Replication origin (repeated re-use of origins (escape from re-replication controls) ???????????
Transposons
Found in both pro&eukaryotes 3% human genome; no activity in last 37mya but 23,000 elements inserted between 40-63mya Carry genes encoding proteins required for transposition but often need host encoded proteins e.g. DNA polymerase/gyrase Can promote rearrangements directly (transposition event often associated w/ ds break which can trigger BFB cycle) or indirectly (regions of homology that can act as sites for recombination- better characterised in bac than in eukaryotes)
Epigenetics
Heritable change in phenotype that doesn't involve change in DNA seq DNA methylation proposed as one mechanism E.g. tortoiseshell cat
Assay for telomerase allowed purification; start w/ primer
(TTGGGG)4 (primer) + d*GTP + dTTP Think primer telomeric DNA - need activity that produces it - think DNA polymerase w/ primer in there - need assay - need primer ss DNA mol (TTGGGG)4 - mashed Tetrahymena & added primer- radioactively labelled d*GTP & d*TTP - enzyme capable of making telomeric DNA - incorporate d*GTP into polymer extension of initial (T2G4)4 primer - many (T2G4)4 - 4 labelled radioactively - size fractionate gel electrophoresis (products, DNA larger than (T2G4)4 - longer strands than shown (telomerase elongated it) - ID of telomerase incorporation of DNA radioactivity - glow on petri dish ????? Products: (TTGGGG)4(TTG*G*G*G*)n Tetrahymena syst - make many telomeres - macronuclear development Long radioactive G rich strand of telomeric DNA; size fractionate by gel electrophoresis d*GTP = radioactive dGTP Critical observations: A) enzyme B) EST mutants
Telomeres
(ends) diff from centromeres (sometimes near middle) Cap the ends of chromosomes Physical ends of eukaryotic chromosomes Consist of specific DNA seqs & proteins Key structures for cell survival
Life cycle of retrovirus - LTR retrotransposon omits infection
*** 1) mRNA translated into polyproteins for GAG & POL which are proteolytically digested into functional proteins for viral nucleocapsid, RT and IN 2) Spliced to produce message for env polyprotein 3) Proteins make a virus particle which incorporates the mRNA and this is reverse transcribed into dsDNA upon infection gag/pol/env - ORFs encode polyproteins PBS; primer binding site PPT; poly-purine tract tRNA primer The seq of a retroviral genome reveals that it includes 3 main genes, each of which encodes a polyprotein that is subsequently cleaved by PR into its constituent proteins R - U5 ----- gag ------- pol ------- env ---- U3 - R gag: MA/CA/NBP. pol: PR/RT/IN. env: SU/TM. MA - matrix CA - capsid NBP - nucleic acid binding protein PR - protease RT - reverse transcriptase IN - integrase SU - surface protein TM - transmembrane protein Inhibit protease- disrupts retroviral life cycle
How recombination between the LTRs of an integrated retro-element can generate a solo LTR
*** host genome > retrotransposon integration > host genome-LTR--integrated retroelement genome--LTR-host genome > homologous recombination between LTRs > solo LTR left in host genome; the internal seq & the other LTR are lost (excised as a circle)
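A toy Python sketch of the recombination step, treating the locus as a string. The tokens such as `[LTR]` and `gag-pol` are placeholders rather than real sequences, and the function name is hypothetical.

```python
def recombine_ltrs(locus, ltr):
    """Homologous recombination between the first and last copy of `ltr`.
    Returns (chromosome_after, excised_circle)."""
    first = locus.find(ltr)
    last = locus.rfind(ltr)
    if first == -1 or first == last:
        raise ValueError("need two LTR copies to recombine")
    # The chromosome keeps a single (solo) LTR between the host flanks.
    chromosome_after = locus[:first] + ltr + locus[last + len(ltr):]
    # The internal region leaves with the equivalent of one LTR, as a circle.
    excised_circle = locus[first + len(ltr):last + len(ltr)]
    return chromosome_after, excised_circle

locus = "hostA" + "[LTR]" + "gag-pol" + "[LTR]" + "hostB"
solo, circle = recombine_ltrs(locus, "[LTR]")
print(solo)
print(circle)
```

The chromosome is left with one LTR at the original insertion site, which is why solo ("empty") LTRs mark positions where a full element once sat.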
Protein production in E.coli
***DIAGRAM Multiple cloning site- short region (50bps)- restriction enzyme cut sites- DNA mutations here RBS- ribosome binding site Expression vector, e.g. circular ds DNA Selectable marker (antibiotic?) cDNA- NO INTRONS- made by copying mRNA- E.coli- no machinery to splice out introns Promoter- turn on/off
Mitochondrial genome plays important role in energy production
***DIAGRAM many proteins concerned w/ ATP production mitochondrial DNA (own DNA/translation system) individual heteroplasmy mutant COX variation- working enzyme/no functional product not epigenetics- changing genetics 16.6kb genome encodes components of OXPHOS system & 24 RNAs
The CRISPR-Cas9 syst as a gene editing tool
***Diagrams by altering RNA seq you can target the Cas9 nuclease to any seq in the genome you want to change when used as a gene editing tool the 2 RNAs are joined together to make a single mol (gRNA) guide RNA replacing 2 separate mols end of gRNA folds up on itself, incorporates w/ Cas9 protein- active nuclease gRNA containing crRNA & tracrRNA seqs in the same mol expressed in target cells the Cas9 protein is expressed in the target cells & interacts w/ the fusion RNA to make the active nuclease the active nuclease is targeted to the DNA by base pairing between the crRNA & 1 strand of the DNA
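The targeting step (base pairing between the crRNA part of the gRNA and one DNA strand) can be sketched as a string search in Python. The genome and guides below are made up and shortened for illustration, and the function names are hypothetical. Real Cas9 also needs a PAM next to the site, which these notes do not cover and this sketch ignores.

```python
def reverse_complement(dna):
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(dna))

def find_guide_sites(genome, guide_rna):
    """Return (strand, start) for each site whose sequence matches the guide.
    The guide base-pairs with the opposite strand at that site.
    '-' strand starts are given in reverse-complement coordinates."""
    protospacer = guide_rna.upper().replace("U", "T")
    sites = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        start = seq.find(protospacer)
        while start != -1:
            sites.append((strand, start))
            start = seq.find(protospacer, start + 1)
    return sites

genome = "TTGACCGTACGATCGGATTACAGG"  # hypothetical target DNA
print(find_guide_sites(genome, "GACCGUACGA"))
print(find_guide_sites(genome, "UCGUACGGUC"))
```

Changing only the guide string retargets the search, which mirrors the point that altering the RNA seq redirects the same Cas9 protein.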
Constructing a microarray
Genomic DNA or cDNA Amplify by PCR using gene specific primers Spot PCR products onto poly-lysine coated glass slide Dry DNA onto slide Fix by UV crosslinking (lysine) Denature DNA to make ss Precise position on slide
Chromosome diminution
Germline lineages - chromosome > cell division Somatic cell lineages - chromatin diminution > eliminated DNA is enriched for germline-expressed genes > cell division > altered somatic chromosomes
Chromosome banding
Giemsa (G-banding) Dark bands (replicate late in S phase, contain more condensed chromatin, low gene density) Light bands (replicate early in S phase, less condensed chromatin, high gene content & transcriptionally active) Other types of chromosome banding: Q banding (fluorescent dye), R banding (reverse pattern to G banding), C banding (stains constitutive heterochromatin)
Consequences of movement: insertional mutagenesis by Alu's & L1's
Human: 0.27% of all human disease mutations Duchenne muscular dystrophy Type 2 retinitis pigmentosa Factor VIII of haemophilia A CF (Mouse: LTR retroposons contribute to 10% mutations)
Cytosine deamination
Hydrolytic reaction (amino group to carbonyl) Cytidine deamination potentially mutagenic, sometimes biologically exploited (Cytidine is a nucleoside molecule that is formed when cytosine is attached to a ribose ring via a β-N1-glycosidic bond.)
ChIP process
ID of in vivo binding sites for a specific regulatory protein. Express epitope tagged protein in target cells, e.g. yeast cells (in nucleus); want to know where its binding sites on the DNA are. Treat cells (formaldehyde- crosslinking agent; crosslinks proteins to DNA/other proteins. Crosslink TRF to all binding sites on genome > protein cross linked to DNA > break up) to stabilise protein:DNA (& protein:protein) interactions. Isolate DNA + linked proteins. Break up DNA: some small frags crosslinked w/ TRF, some not- identify. Isolate DNA frags linked to the tagged protein using a specific ab that recognises the epitope tag. Reverse protein:DNA crosslinks, then identify (what the frags are) the presence of individual DNA frags using PCR: if not cross linked, no product; presence of PCR product- detection. 2 PCR primers- DNA seq in mix; product- was immunoprecipitated (purified). ChIP-chip: isolate DNA frags bound to tagged protein using a specific ab; label & identify ALL bound frags by hybridisation to a microarray containing intergenic regions (which spot they hybridise to)
Segregation at meiosis I in triploids
If x=1, 2 possibilities for pairing and segregation If x=2+, chance of forming balanced gamete = 0.5^(x-1) Random segregation Possibilities for pairing: univalent, bivalent, trivalent (difficult) CROSSING OVER E.g. commercial banana x=11; chance of fusion of 2 balanced gametes (n+2n) is 1/1024 x 1/1024 = ~1 in a million (effectively sterile; no seeds formed) *even numbered allopolyploids can produce balanced gametes at high freq and exist as fully-fertile polyploid species propagated by seed (i.e. all the chromosomes can pair up as bivalents & segregation is orderly) ***TRIPLOID W/ 2 CHROMOSOMES IN KARYOTYPE X=2 DIAGRAM (triploid- 3, 1 in middle can go either way, 4 possibilities for daughter cells)
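The banana figure can be checked with exact fractions in Python. This is a small worked calculation assuming each chromosome segregates independently, as the card's formula implies. The function names are illustrative.

```python
from fractions import Fraction

def p_balanced_gamete(x):
    """Chance a triploid with x chromosomes per set makes a balanced
    (all-n or all-2n) gamete: 0.5^(x-1)."""
    return Fraction(1, 2) ** (x - 1)

def p_two_balanced_gametes(x):
    """Chance that two independently formed gametes are both balanced."""
    return p_balanced_gamete(x) ** 2

banana_gamete = p_balanced_gamete(11)
banana_seed = p_two_balanced_gametes(11)
print(banana_gamete)  # 1/1024
print(banana_seed)    # 1/1048576
```

With x=11 the per-gamete chance is 1/1024, so a viable fusion needs two roughly one-in-a-thousand events, about one in a million.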
Retroviral insertion may activate oncogene expression in 4 ways
Importance of LTR as transcriptional activator 1) insertion leading to production of deletion mutant; transcription initiated from 3'LTR 2) LTR enhancer activating transposon 3) insertion & production of fusion protein 4) insertion downstream & production of mRNA w/ altered stability
DNA methylation in vertebrates
In human germ line, transitions at CpG are 18.2x higher than at non-CpG sites CpG: 2 adjacent nucleotides on same strand (palindromic in ds DNA) C of CpG 70-80% methylated in mammals Generally silences transcription; X inactivation, genomic imprinting, repression of transposable elements DNA methyltransferases convert hemi to fully methylated 5-methyl C is hypermutable and CpG is found at only 25% of the level calculated on basis of C and G abundance. TpG and CpA elevated. This is called CpG suppression However, 200bp-~1kb stretches of unmethylated genome show no CpG suppression and elevated C+G; called CpG islands Bulk genome: 40% C+G, CpG islands 65% C+G CpG islands often found around promoters of genes Symmetry of CpG methylation allows for epigenetic inheritance (DMT- DNA methyltransferase flips bases)
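CpG suppression is usually expressed as an observed/expected ratio. The Python sketch below uses one common form of the expected count, (number of C x number of G) / length. That formula is an assumption here, since the notes only say "calculated on basis of C and G abundance". The example sequences are made up.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from C and G abundance."""
    seq = seq.upper()
    observed = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    expected = seq.count("C") * seq.count("G") / len(seq)
    return observed / expected if expected else 0.0

island_like = "CGCGCGCG"   # hypothetical CpG-rich stretch
suppressed = "CATGCATG"    # hypothetical stretch with C and G but no CpG
print(gc_content(island_like), cpg_obs_exp(island_like))
print(gc_content(suppressed), cpg_obs_exp(suppressed))
```

On real data a ratio near 0.25 would correspond to the bulk-genome suppression described above, while CpG islands sit much closer to 1.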
Autonomous RT/en transposable elements
In human- L1 elements 500,000 L1's in human genome ~1kb Mostly rearranged by nested deletions of 5' ends but only ~100 are intact In human reference genome ~90 are active but 6 highly active elements have 84% of total activity; active elements often allelic/dimorphic In mouse ~3000 active Often called LINE elements Between 1 in 2 & 1 in 33 humans has new L1 insertion PolyA signal is weak & so transcription can continue to next polyA giving rise to transduction of 3' flanking seqs
Organisation of L1 elements in the human genome; a few are intact (~100), most (500,000) truncated
Intact full length elements contain a promoter and are potentially mobile Flanking direct repeats 5' truncated elements that lack a promoter & are stuck (relic elements) ORF1; nucleic acid binding protein ORF2; ENdo nuclease, & RT
More retroviral lifecycle
Integrase in nucleus- nick in host DNA > viral ds incorporated > RNA polymerase makes mRNA > envelope proteins on infected cell- embeds on membrane > 2 membranes fuse, envelopes infect T helper cells > conformational change > into host cell > protease breaks up polyproteins > form mature components for virion > all on surface released > many virions > repeat cycle
Analysing how genomes work
Isolate genome region as DNA & obtain its seq Make mutations to investigate function If region encodes a protein, express the protein to study its function If region is expressed, check expression level and tissue specificity of the gene using expression assays Test its effect on rest of the genome- whole genome studies Test its role in vivo- functional genomics
Chromosome no. varies between species
Jumping jack ant, females have 1 pair (males haploid) Adder's tongue fern, 631 pairs Plains viscacha rat, 56 pairs
Rates of integration
L1 (LINE) ~1/20 live births have a new L1 Alu (SINE) ~1/20 live births have a new Alu Human variation
LINEs & SINEs
LINEs: L1 elements that when complete can move; include RT/en <5kb, POLII transcript SINEs: short; use LINE machinery to move; said to be non-autonomous LINEs MOBILISE SINEs
RT/en retrotransposons
LINEs: long interspersed nuclear elements, 4-7kb, transcribed by Pol II; in human, L1's SINEs: short interspersed nuclear elements, 80-400bp long, transcribed by Pol III; in human, Alu Alu parasitic upon L1's
LTR retrotransposons: make an RNA copy first, turn it into DNA & then integrate it using a DDE enzyme
LTR retrotransposons- transposons that've learnt how to convert their mRNA to DNA (use DDE enzyme as their integrase) CUT OUT, PASTE IN: delete previous insert COPY OUT, PASTE IN: original conserved, new (copied) Drug companies want to develop transposition activity assays, e.g. HIV: first step- screen for inhibitors of retroviral integrases Transposition assay, small mols
The IMPACT(tm) system
Lac system Target gene - intein - chitin binding domain expresses the required protein w/ an intein tag & a chitin binding domain at the C terminus Chitin beads in a purification column The expressed protein can be purified by binding to chitin beads in a purification column Add DTT to induce self cleavage of the intein Adding DTT (dithiothreitol) induces the intein to cleave itself & the chitin binding domain from the required protein Elute & collect protein
Morpholino structure
Like phosphodiester DNA (chemically more stable)- same relative orientation phosphorodiamidate linkage diff backbone morpholine ring ***DIAGRAM
Single gene mutations
Loss of function mutation in UBE3A gene in AS (ubiquitin) No single gene mutations found in PWS
Protein purification by affinity tag
Made protein in E.coli- purify N terminal -------------- C terminal 6x histidine residues- incorporate into protein (can include in any of these spaces but only one) DNA encode Isolate histidine by reacting w/ nickel Protein down column Side chains interact w/ nickel Nickel associated w/ it (Ni(2+) - nitrilotriacetic acid - spacer - resin matrix) His tag interacts w/ nickel Elute protein from column His tag purification Histidine, imidazole (use to knock protein from column- competes w/ histidine to bind to nickel) kDa- molecular weight His tag purified efficiently
Mutations & protein function
Make mutant gene Introduce DNA into cells Cells produce mutant protein Test effect of mutation (want to be produced at same level as normal) in vivo Physiological expression Introduce DNA into cells Cells produce mutant protein Purify mutant protein Test effect in vitro High level, regulated expression More common
Macronuclei
Many chromosomes & telomeres Some ciliates: 25x10(6) telomeres Unusual chromosomes: v short & each gene occupies 1 chromosome & is amped Means for the ~20,000 genes there're ~1mil chromosomes So high conc. of telomeres
Telomeric DNA
Many repeats of a short unit; length varies between organisms TG rich on DNA strand running 5' to 3' towards the chromosome end Tetrahymena TTGGGG length = ~200bp Oxytricha TTTTGGGG = 36 bases Yeast TGGG(1-3) length = ~700bp Mice (TTAGGG)n length = ~50-100kb Humans (TTAGGG)n length = ~1-20kb
Single stranded RNA
Many viruses; HIV, flu, Ebola, Polio, measles
SS DNA
Many viruses; M13, many sea water viruses
Illumina/Solexa
Market leader ££££ Uses reversible labelled dNTP terminators (can change conditions- reverse terminate) Add DNA polymerase & incorporate 1st nt Wash away unincorporated nts & DNA poly Detect nt incorporated from fluorescent tag Cleave tag, reverse terminator group Cycle repeats Terminator stops - tag colour of base. - reverse terminator continue seq Many frags of DNA seq at same time (in parallel) Each frag attached in diff position on a flow cell & then amplified in that position by PCR to give many copies of the same frag in the same location The seq is read for each position on the flow cell which generates multiple DNA seqs
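The incorporate / detect / cleave cycle can be sketched for a single cluster in Python. The tag colours and the template are hypothetical (real instruments use their own dye sets), and the function name is illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical tag colours, one per base.
TAG_COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}

def sequence_by_synthesis(template_3to5, cycles):
    """One reversible terminator is added per cycle, imaged, then unblocked."""
    read = []
    colours = []
    for position in range(min(cycles, len(template_3to5))):
        # Polymerase adds exactly one nt because the terminator blocks extension.
        base = COMPLEMENT[template_3to5[position]]
        # Detect the fluorescent tag to call the base.
        colours.append(TAG_COLOUR[base])
        read.append(base)
        # Cleaving the tag and reversing the terminator lets the loop continue.
    return "".join(read), colours

read, colours = sequence_by_synthesis("TACG", cycles=3)
print(read, colours)
```

On a flow cell the same loop runs for every cluster at once, which is where the parallelism comes from.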
Analysing gene expression using microarray (glass slide)
Measure mRNA levels of many genes simultaneously Cellular mRNA converted to labelled cDNA Labelled cDNA is denatured & hybridised to the microarray/chip Amount of labelled cDNA binding to each gene spot is proportional to the original amount of the corresponding mRNA Many mRNA molecules (highly expressed gene; many corresponding) -> convert to labelled cDNA -> bind to array (strong signal) Few mRNA molecules (poorly expressed, not many mRNAs corresponding) -> convert to labelled cDNA -> weak
Microarrays vs. gene chips
Microarrays (more flexible): Can be custom made by individual labs Relatively cheap Straightforward to use May be less sensitive than gene chips May be less consistent Gene chips: Only commercially available More expensive May be less straightforward to use More sensitive than custom microarray More consistent Biochip market 2010 - $3.5bil (70% gene chips) Projected $11 billion by 2018 Biochip 2015 - $7.6bil Projected $17.7bil by 2020 Market grown
Microarrays & chips
Microarrays - cDNA Gene chips - oligonucleotides Allow expression analysis of many genes simultaneously
Polyploidy and mitosis/meiosis
Mitosis: mitotic spindle binds to centromeres of sister chromatids Tension applied across 2 centromeres Separate -> daughter cells Meiosis: chromosomes need pairing partner or inaccurate or random segregation - wrong chromosome no. E.g. trisomy 21 4 daughter cells 1I . 1I . 11I . I I + 1I = 1II (trisomy)
How do eukaryotic cells replicate the ends of their linear chromosomes?
Model organisms: Ciliated protozoa e.g. Tetrahymena thermophila, Oxytricha nova (large single celled eukaryotes w/ elaborate cell structure) Yeast Mice Mammalian cells
Why don't bac show C value paradox?
Most bac chromosomes are circular Genomes with circular chromosomes; always small <10Mb; no C value paradox Linear; very large (up to 30,000Mb)- show C value paradox Large circular chromosomes are unstable, bacteria are small & have large effective population sizes so their genomes aren't going to grow as extra non functional DNA will be effectively removed by selection
Telomeres not like ds breaks so they must be protected
Muller (1941) & McClintock (38) Not recognised as ds breaks in DNA although they are Break- cell either repairs break or dies
Evidence that protein TERT works in vivo
Mutants w/ extra short telomeres 3 groups: EST mutants in yeast - EVER SHORTER TELOMERES EST 1 2 3 (genes) Protein component of Euplotes telomerase closely homologous to EST2- encodes reverse transcriptase Shares motifs with RTs; motifs lie within RT domains
HERVs (human endogenous retrovirus)
Nearly all are retrotransposition defective HERV-K (lysine tRNA complementary) polymorphic; transposed since chimp/human split Used as source of seqs for genes, e.g. Syncytin. Promotes trophoblast cell fusion that's necessary for placental development 7.7% of genome; 2.7x10(5) copies Also 0.6% LTR retrotransposons that lack ENV gene Syncytiotrophoblast is formed by cell fusion mediated by syncytin - endogenous retrovirus epithelial covering of the embryonic placental villi
Consequences of having linear chromosomes
Need special machinery to protect and replicate the ends of chromosomes ~ telomeres Repeated seqs are common in eukaryotic genomes: tandem/interspersed
All DNA polymerases
Need template & primer Proceed in 5'-3' direction
Third Gen seq systems - single molecule sequencing
Next gen seq systems generate short reads (~100-300nt) Longer seqs are assembled by piecing these together based on overlaps at ends Long repetitive regions found in multiple copies in complex genomes are a problem- length, location, etc Third gen systems generate much longer reads- higher quality genome seq Pacific Biosciences SMRT (single molecule real time) Single mol seq- long reads RSII & Sequel systems Sequel- half of all reads more than 20kb
Measuring mRNA steady state
Northern blots PCR based approaches Microarrays/gene chips & sequencing assays
Changes in ploidy
Not all cells of an organism always have the same amount of DNA and diff varieties of a single species may contain diff numbers of copies of the same genome A change in the no. of chromosome sets in the germ line can give rise to a new variety or species May also occur somatically by chance & be pathological or programmed to be part of the developmental program of an organism This gives rise to ENDOPOLYPLOIDY & POLYTENIZATION Entire chromosomes may also be 'eliminated' during development Specific sequences may be removed; 'chromosome diminution' Many agriculturally important plants are polyploid- strawberries (octoploid bigger than diploid) Also amphibians (Xenopus laevis 4N & tropicalis 2N) SIZE DIFF
Random PCR mutagenesis
Not sophisticated Error prone PCR More Mg2+ See where mistakes are/effect on function Then: pool, clone into vector, seq
Multi-nucleate cells (temporary endopolyploidy)
Nuclear multiplication w/o cell division; germ line genome conserved e.g. oocytes
Uses of cytogenetic analysis in the clinic
Often used when child suspected of having genetic syndrome often (not always) associated w/ changes in chromosome no. or large deletions Clinical geneticist will ask for chromosome test (standard karyotype/array analysis/fluorescent in situ hybridisation (FISH)) Cytogenetics in prenatal testing- may also be used in preimplantation genetic diagnosis
Replicative transposons
Original cut of transposon is only a nick/only one strand at each end is initially ligated Element then replicated Produces as intermediate a "co-integrate" structure Co-integrate is resolved by resolvase (as TnpR of Tn3) & at specific site (as res of Tn3) e.g. of site specific recombination Replicative transposons include resolvase: structure of Tn3 Left inverted repeat (38bp) - transposase (tnpA) - resolvase (tnpR) - Beta-lactamase (bla) - right inverted repeat (38bp) mRNAs 2 types of mutation: res or tnpR cause accumulation of a "co-integrate structure" DIAGRAM*** transposase in inside direct repeat of Tn out resolvase co-integrate Strand transfers in replicative transposition Key point: the 3'OH groups look like primers for DNA replication & function as such ***DIAGRAM ds plasmids, ss nicks caused by transposase, staggered break > ligation > replication of the ligated strand occurs > copy transposon, co-integrate, original transposon > internal resolution site, resolvase acts on the IRS causing ds break > site specific recombination, ds breaks In an IRS mutant the resolvase cannot bind, resulting in a build up of co-integrates Resolvase resolves the single co-integrate into 2 separate molecules Complex transposons in E.coli carry drug resistance genes around
Evolutionary consequences of non-LTR retrotransposition
Past 6myr of human evolution: 2,000 L1's, 7,000 Alus & 1000 SVAs = 8Mb extra DNA Ongoing expansion creates polymorphisms; presence & absence at orthologous loci useful in evolutionary studies
Treatment as prevention
Patients taking combined anti-retroviral therapy (cART) have 96% reduction in HIV transmission to uninfected partner In communities w/ high levels of HIV infected individuals the use of cART is associated w/ reduced rates of new HIV diagnoses If resistance emerges then shift to diff combination so patients need to be monitored for viral RNA in serum
Cycle sequencing (PCR-like)
Performed directly on ds DNA Uses ONE PRIMER only Reactions cycled through rounds of denaturation, annealing and extension Uses dideoxy method Thermostable polymerase - Taq (modified to allow efficient incorporation of fluorescent (instead of radioactive) ddNTPs) Thermal cycle sequencing Known seq on end of unknown > denature DNA & anneal primer to template > extend primer using heat stable poly, include one ddNTP/reaction (only labelled when dideoxy incorporated) > generates nested set of products > denature DNA and anneal primer to template > analyse products
Nucleic acid bonds
Phosphodiester, glycosidic 5'-3' direction (5' = 5' ribose sugar) - ALWAYS INDICATE 5' 3' ENDS IN DIAGRAM
DNA sequencing
Plus & minus method (Sanger and Coulson 1975) 50bp DNA in several days Chemical method (Maxam & Gilbert 1977) 500-600bp half day, many chemicals that interact w/ DNA (hazardous) Enzyme method (Sanger, Nicklen and Coulson 1977) dideoxy sequencing (seq up to 2000, used to sequence human genome)
DNA more stable than RNA
Presence of 2'OH group in ribose sugar in RNA makes RNA alkali labile; breaks up. 2'OH acts as catalyst for hydrolysis. DNA has thymine instead of uracil. Cytidine is unstable and the presence of thymine allows the products of cytidine instability to be repaired.
Transposons summary
Pro/eukaryotes Carry genes encoding proteins required for transposition but often need host encoded proteins e.g. DNA polymerase/gyrase Can promote rearrangements both: directly (transposition event often associated w/ ds break which can trigger BFB cycle) indirectly (regions of homology that can act as sites for recombination) Better characterised in bac than eukaryotes Needs both transposase & inverted repeats to move around Intact transposons rare, defective abundant
Telomerase TRT associates w/ telomeres through specific protein interactions w/ TPP1
Probably necessary in order to regulate the process of telomeric DNA synth
Problem of insertional mutagenesis associated w/ gene therapy
Problem- strong enhancer effect of U3 often caused activation of oncogenes Removal of U3 from LTR & use of weak internal promoter solves many probs
What were the selective agents that led to the evolution of the DNA cytidine deaminase?
Probs defence against mobile elements such as LTR and non-LTR retrotransposons
How ss RNA is reverse transcribed into ds DNA
Process of reverse transcription of the retroviral genome 4h per 9kb genome, 1 mis-incorporation per 10(4) to 10(6) residues Error prone so rapid evolution- drug resistance, e.g. HIV. tRNA primer is present in the viral particle (anneals to PBS) DNA synth > RNase H (degrades template RNA- left w/ ss) (minus strand strong stop DNA- complementary to RNA, tRNA primer at binding site) > first strand transfer (synth of ss DNA, corresponding to binding site) > DNA synth RNase H (x2) (RNA DNA hybrid degrades RNA template) > RNase H > second strand transfer (synth displaces linear ds DNA) > DNA synth
A promoter system for regulated gene expression
Promoter- lac operon. LacZ, Y & A (enzymes for lactose). LacI - encodes repressor which binds operator to prevent transcription. Presence of lactose- lactose binds LacI repressor & prevents repression. Lac promoter- controls expression. Lac repressor- binds to operator, turns off. CAP (catabolite activator protein) boosts strength of promoter & only binds in absence of glucose. -35 & -10 seqs of lac promoter (where RNA pol binds) differ from the consensus for RNA pol. Also leaky: when turned off there's some background transcription happening. Has been modified by adding part of the trp promoter, which has a match to the -35 seq - splice these 2 promoters- makes TAC promoter. So regulatable + high strength (matches -35 seq). In practice can't just use the TAC promoter like this. Tightly regulated gene expression: pET plasmid has target gene regulated by lac promoter & T7 promoter. Lac promoter repressed by presence of LacI (lacI present on the pET plasmid). The T7 promoter requires T7 RNA poly for activation. The chromosome of E.coli: contains T7 gene 1 regulated by a lac promoter; T7 gene 1 encodes T7RNAP; lacI encoded elsewhere in chromosome. But bc of leaky lac operon, some T7RNAP is made. pLysS plasmid contains T7 lysozyme gene; T7 lysozyme inactivates T7RNAP. On addition of IPTG (lactose analogue) lac repressor on chromosomal lac operon dissociates; T7RNAP made in large amounts; overwhelms the T7 lysozyme; T7RNAP binds T7 promoter, inducing expression of target gene. Lac repressor on pET plasmid dissociates, allowing expression of target gene
Female meiotic cycle
Prophase/first trimester/commitment to meiosis *=(starts in utero) Dictyate arrest/second trimester/birth/some of adulthood/follicle formation/oocyte growth Divisions (ovulation)/meiosis II/second arrest/fertilisation Primordial germ cells undergo mitotic proliferation Meiosis initiated after 11-12wks gestation Enters prophase & completes recombination Homologues remain paired/connected by crossovers Enter arrest stage (dictyate) Oocyte resumes meiosis & completes meiosis I Cell arrests again in meiosis II Fertilisation triggers completion of meiosis II
What's the catalytic component of telomerase?
Purified from Euplotes Involved in replicating telomeres Structure: 230kd Contains an RNA; 66kd & a protein component Protein (isolated) component: 123kd (TERT) & 66kd Reverse transcriptase motifs in the catalytic subunit of telomerase Lundblad - 3 complementation groups of genes where mutations gave phenotype - shorten telomeres & finite lifespan - won't replicate telomeres if mutant yeast telomerase
Whole plasmid method using a commercial kit
QuikChange method (1996)- still widely used Current kit price £320 for 10 reactions (2017) Uses ds plasmid DNA as template & 2 complementary PCR primers containing the required mutation(s) The circular plasmid DNA is amplified by PCR then the 2 original template strands (unmutated) are specifically digested (restriction enzyme) leaving just 2 complementary mutant strands Can be introduced into E.coli & replicated
Northern blots
RNA blots- transfer RNA from an agarose gel to a nylon membrane Blot probed by hybridisation Use a radioactively labelled ss DNA probe (bind to RNA) to detect mRNA corresponding to a specific gene Sensitivity (not v) Quantitative Low tech- simple Separate (isolate) RNA molecules of diff size by agarose gel electrophoresis Transfer RNA to a nylon membrane by blotting Add radioactive ss DNA- this binds to the mRNA that corresponds to the gene Measure the amount of bound radioactive DNA to measure the amount of the specific RNA in the original sample X ray film detects radioactive bands- autorad
RT-PCR
RT = reverse transcriptase Measure amount of PCR Product to measure amount of mRNA but... PCR reactions only linear for initial cycles 20-30 NOT quantitative mRNA > RT > mRNA/DNA > Taq > (2nd cycle) amplified product
Gal4 activates GAL genes
Regulate- turn on/off Allow utilisation of galactose as C source Transcription activated by Gal4 (binds as dimer) GAL genes highly regulated- expression ONLY when glucose absent & galactose present Gal4 activator dimer on UAS -> transcription complex, core promoter, GAL gene GAL promoter everything before GAL gene Activates transcription from core promoter GAL promoter used for inducible gene expression Gal4 activator dimer on UAS -> transcription complex, core promoter, gene encoding required protein GAL promoter Haploid yeast strain containing wt GAL4 gene- Gal4 levels limiting- relatively low level expression of the target protein Gal4p produced from own promoter Activator limited Modified yeast strain containing the GAL4 gene linked to a STRONG promoter - Gal4 levels boosted - leads to high level expression of the target protein Gal4p produced from the GAL1 promoter
Uracil glycosylase
Removes U in DNA, breaks bond between sugar and U. First step in BER (Base Excision Repair). This repairs the U product of cytosine deamination. BER also removes oxidised and alkylated bases.
BER
Removes U, restores C AP endonuclease (apurinic/apyrimidinic endonuclease) Polβ has dRP lyase activity/removes it (5'dRP = deoxyribose phosphate) Short patch: Polβ Long patch: Polδ/ε Damaged base > glycosylase > AP site > AP endonuclease > 3'OH & 5'dRP SHORT PATCH: 5'dRP lyase > polymerase (non-displacing synthesis) > ligase (dRP lyase is mechanistically distinct from an endonuclease) LONG PATCH: 3'OH > polymerase displacing synthesis (displaces dRP) > flap endonuclease (left w/ nick) > ligase (seals nick)
Endopolyploidy
Repeated chromosome replication w/o cell division (E.g. nurse cells of Drosophila egg chamber; DNA levels up to 4000 X C )
Getting around: intermediates
Replicative: target site cut, don't cut transposon completely, ds break only 1 strand each end cleaved ends ligated into new site Excisive: donor site, ds breaks, transposable element
Typical protein is about 333 aa's in length
Requires ~1kb of coding info
Retroviruses & retroposons are important as mutagens but can also be used as tools; retroviral vectors
Retroviral vectors often used in gene therapy bc they transduce at v high frequencies DIAGRAM Need to express gag, pol, etc or can't move around/infect Need psi (Ψ) seq to infect- packaging signal that lets the RNA be packaged into the viral capsid INSERT > transcription (encodes for particular protein) > reverse transcription (capacity to infect target cell)
Difference between ribose & deoxyribose
Ribose has OH on the 2' carbon (bottom right of the ring as usually drawn), deoxyribose has H
(Semi) Automated Sanger sequencing (human genome)
Same basic reactions as cycle sequencing Products labelled w/ fluorescent tags (diff tag for each base, better than radioactive) Use a single tube reaction w/ 4 diff tagged ddNTPs plus 4 untagged dNTPs (1 reaction not 4) Labelled fragments detected by laser detector Fluorescent tags - peaks visualised = base
Gene expression assays
Several ways to measure the activity of a gene Measure the amount of the protein product (indirectly via a reporter enzyme assay e.g. LacZ or directly via a protein blot) Measure steady state level of mRNA- the most common way Measuring ongoing transcription
Gene chips
Similar in principle Hybridise cDNA to a series of oligonts synthesised on a silica surface Each chip approx. 2cm(2) Available commercially Can measure levels of 47,000 gene transcripts using series of 16x25mer oligont pairs to measure each gene transcript
How retroviruses integrate III
Similar mech to transposons 1) integrase generates 2 base recessed 3' ends in LTRs 2) integrase generates staggered ends in target DNA 3) integrase links recessed 3' ends of LTR to staggered 5' ends of target
Bac genomes range in size from ~0.5Mb to 10Mb and don't show C value paradox
Small genome (470 genes): specialists usually obligate parasites Large genome (10,000 genes): metabolic generalists often undergoing some form of development Mainly circular
Third Gen seq systems - Oxford Nanopore DNA seq
Small/highly portable The MinION sequences individual DNA molecules as they pass through protein nanopores on a synthetic polymer membrane, generating sequence data in real time Adapter holds each nt transiently and allows base ID as it passes through the pore Voltage applied so the current flows through each nanopore and is detected by a sensor The DNA to be sequenced is sheared into large sized frags Adapters joined to the ends of each frag Loop adapter (joins the 3' end of one strand to the 5' end of the other) 5' adapter (provides target for the processive enzyme which feeds the DNA through the nanopore) The DNA is added to the flow cell and each frag is guided to a nanopore where it is unwound by the processive enzyme and the resulting ss is passed through the pore 1 nt at a time Each nt is held briefly by the adapter as it passes through the pore, resulting in the flow of current being disrupted The diff nts disrupt the current in diff ways allowing them to be ID'd Actual process- machine reads patterns generated by seqs of 5 nts at a time The seq of the 5 nts is ID'd by comparing the pattern they generate to patterns known to be generated by every diff combination of the 5 nts which are held in the computer memory Homopolymers (long runs of nts w/ the same base) are hard to sequence accurately When 1 strand has passed through the pore the original partner strand follows bc the strands are joined by the loop adapter- this allows the same frag to be sequenced twice in one read The nanopores are deployed on a sensor array chip in a flow cell (currently the sequences generated by 512 nanopores can be recorded per cell) Each nanopore can potentially sequence the whole of a very long DNA frag (reported that sequencing of both strands of a DNA molecule 950kb in length was possible) When an individual nanopore has finished sequencing one frag it can start on another Each pore generates sequence at a rate ranging from 40 to 500bp/sec The sequence data generated from each nanopore is collected and analysed as the DNA molecule traverses the pore ADVANTAGES: lightweight, portable & easy to use. Data becomes available as the DNA is sequenced, no need to wait for a complete sequencing run Generates long sequence reads from individual molecules Cheap set-up cost - no need for expensive sequencing machines DISADVANTAGES: DNA samples require preparation before they can be sequenced Relatively high error rate (esp. homopolymers) Relatively high cost per bp sequenced New developments: PromethION - 48 flow cells, each w/ 3000 pores (not commercially available) SmidgION - smallest available sequencing device designed to plug into a mobile phone (not yet available)
Use of T as a base in DNA does not solve the problem posed by cytidine deamination completely
Some of the bases in DNA covalently modified 1 important mod is methylation of cytosine on the 5 position Deamination of C generates U which in DNA can be ID'd as wrong and removed Deamination of 5-methyl cytosine generates T; methylated C is often the site of mutation in genomes where C is methylated T cannot be recognised as wrong; mutations don't get repaired as well
DS RNA
Some viruses; blue tongue, rotavirus
Site directed mutations
Sophisticated- targeted Ability to change one or a few specific bases in a cloned DNA frag Change aa seq of a protein Change the seq of a regulatory element/other functional region RNA polymerase II ---------> polyadenylation site Promoter- start codon- exons/introns- stop codon Changes in coding exon Nobel prizes- Michael Smith/Kary Mullis
Telomeres summary
Specialised RT Prevents shortening ends- adds telomeres to end Cells lacking telomerase shorten telomeres gradually Template = only small segment of RNA that it carries by itself Requires 3' end as primer Synth proceeds in 5'-3' direction Synthesises 1 repeat, then repositions itself Specific interactions w/ each telomere
Gene therapy for inherited immune-deficiency
Stem cells > transfect w/ retrovirus > back into patient > could develop leukaemia Lack U3 - no leukaemia Ex vivo culture Transduction w/ γ-retroviral vector > conditioning of patient, infusion of modified stem/progenitor cells > isolation of HSPCs from bone marrow After therapy- long-term reconstitution of lymphoid lineages
Retrovirus virion
Surface envelope protein (SU) Transmembrane envelope protein (TM) Membrane Matrix protein (MA) Capsid (CA) RNA genome, bound by nucleocapsid (NC) *phospholipid membrane Integrase (IN), reverse transcriptase (RT) and protease (PR) in capsid w/ RNA genome
Contribution of meiotic errors to human disease
Syndromes due to abnormal chromosome number (autosomes) Aneuploidy- where an organism/cell has only 1 or a few chromosomes added/missing Trisomy of only 3 autosomes compatible w/ normal human embryological development: Down (21), Patau (13)- v severe, most babies die within 6months, Edwards (18)- v severe, 50% die in first month
Sequencing primers
Synthetic ss DNA oligonucleotide (artificial) ~20 bases in length Complementary to the known DNA seq next to the unknown seq (often the cloning vector) Ss primer at 5' end, complementary to known seq on 3' end of other strand (3'OH end left > right) Make template ss for bottom strand, synthesise, allow to sequence at same time Usually polylinker in cloning vector restriction sites ??????????????????
Analysis of protein function by alanine scanning
Systematically change aa's within a functional domain to alanine (binding) Assay the effect of each mutation on protein function Alanine bc simple side chain- any change from removal not addition of alanine
Making random mutations
TAQ polymerase makes mistakes See what effect is- study function Change conc. of magnesium ions- more mistakes Approx. error rates TAQ poly approx. 1 error per 50,000 bases inserted Tfl poly approx. 1 per 500,000 bases Pfu poly approx. 1 per 1.3mil (heat stable, proof-reading)
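Rough worked example of what these error rates mean in practice. A minimal sketch only: it assumes the commonly used approximation that mutations per product = error rate x template length x number of duplications, and the 1 kb template and 20 duplications are made-up illustration values, not from the notes.

```python
# Approximate error rates quoted in the notes (errors per base inserted).
ERROR_RATE = {
    "Taq": 1 / 50_000,
    "Tfl": 1 / 500_000,
    "Pfu": 1 / 1_300_000,
}


def expected_errors(polymerase: str, template_len: int, duplications: int) -> float:
    """Expected number of mutations per final product molecule.

    Assumption: errors accumulate independently at a constant rate,
    so the expectation is simply rate * length * duplications.
    """
    return ERROR_RATE[polymerase] * template_len * duplications


if __name__ == "__main__":
    # Hypothetical experiment: 1 kb gene, 20 effective duplications.
    for name in ERROR_RATE:
        print(name, round(expected_errors(name, 1_000, 20), 3))
```

With these illustrative numbers Taq gives roughly 0.4 mutations per 1 kb product, which is why it suits random mutagenesis, while a proof-reading enzyme like Pfu is preferred when the copy must be faithful.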
Reverse transcription of L1 RNA into ds genomic DNA
Target primed reverse transcription RT is often incomplete Most L1's < 1kb Loose target site consensus 5'AAAAATATTT/3' ************ DIAGRAM ds break > nick 3' tail > DNA synth > nick > more synth > acts as primer(?)
Microscopy: karyotype analysis
Technique to visualise human chromosomes Can be applied to other organisms Some techniques use fluorescent DNA probes (FISH) Chromosomes best seen during cell division (metaphase) To visualise chromosomes, cells need to be dividing Chromosomes visualised separately in metaphase; sufficiently separate, best for analysis DNA stained w/ fluorescent compound propidium iodide or DAPI Metaphase- late S phase, paired double helices held at centromeres by residual cohesins (after most removed) Chromosomes = 2n, DNA = 4C (C = amount of DNA in nucleus) M phase meta -> ana Centromere splits 2n, 4C > 4n, 4C (2 chromatids) > 2n, 2C (2 daughter cells, 1 chromatid in each)
Chromosome analysis in medicine
Techs to visualise human chromosomes: DNA-based methods (array, PCR), microscopy (karyotype, FISH) Test for loss or gain of whole chromosomes, translocations (swap over bits of chromosome), deletions (Kbs-Mbs) or other rearrangements Clinical applications: diagnosis of genetic syndromes, prenatal diagnosis Only large-scale changes, not point mutations (bases)
Telomerase RNA: includes template
Telomerase contains an essential RNA component (template) as well as protein Where's RT? Assay not pure enough to locate The RNA component includes the template complementary to 1.5 copies of the telomeric repeat Info from the RNA template is copied into DNA In vitro mutagenesis experiments - mutate seq - mutate product
New tech - pyrophosphate sequencing
Template + primer + polymerase NO dNTPs Add dATP Used to extend primer? YES - 1st base = A Remove unincorporated dATP Add dTTP Used to extend primer? No - 2nd base NOT T Remove unincorporated dTTP Add dGTP, etc (DNA)n+dXTP > (w/polymerase) > (DNA)n+1 + PPi PPi > (sulfurylase) > ATP ATP > (luciferase) > light FLASH OF LIGHT Break genome into pieces > add DNA adaptors to ends of each piece & bind the pieces to a separate microbead > encase each bead in oil (keeps separate) containing PCR components > amplify original DNA piece x10mil > now each bead contains 10mil identical copies of the original piece of genome > remove oil & denature the DNA on the beads to give ss fragments > load each bead into one well of the seq apparatus > sequence DNA using pyrophosphate method (25mil bp in 4hrs) > 2+ of base incorporated- more intense flash > record order of flash
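The flow-by-flow logic above (add one dNTP, light only if it is incorporated, brighter flash for 2+ of the same base) can be sketched as a toy simulation. Assumptions for illustration only: the template is written in the order the polymerase reads it, the dNTPs are flowed in a fixed A, T, G, C cycle, and the light signal is exactly proportional to the number of bases incorporated. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pyrosequence(template: str, flow_order: str = "ATGC", max_flows: int = 100):
    """Simulate pyrosequencing; return a list of (dNTP added, light signal).

    The signal is the number of bases incorporated in that flow:
    0 = no flash, 1 = flash, 2+ = more intense flash (homopolymer run).
    """
    # Bases the polymerase must add, in order (complement of the template).
    to_add = [COMPLEMENT[b] for b in template.upper()]
    pos = 0
    flows = []
    for i in range(max_flows):
        if pos == len(to_add):
            break  # primer fully extended
        base = flow_order[i % len(flow_order)]
        signal = 0
        # Polymerase keeps incorporating while the next required base matches.
        while pos < len(to_add) and to_add[pos] == base:
            signal += 1
            pos += 1
        flows.append((base, signal))  # unincorporated dNTP is then washed away
    return flows


def call_sequence(flows) -> str:
    """Rebuild the synthesised strand from the recorded order of flashes."""
    return "".join(base * signal for base, signal in flows)


if __name__ == "__main__":
    flows = pyrosequence("TTACG")
    print(flows)
    print(call_sequence(flows))
```

The homopolymer weakness of the method shows up here too: a run of identical bases is read only from the intensity of a single flash, so in a real instrument long runs are easy to miscount.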
Points about retroviral integration
Tends to integrate into bent/underwound DNA seqs Once integrated no way out (compare w/ transposons). Our genomes are littered w/ the genomes of the retrotransposons that have been active since the origin of life However, recombination between the LTRs deletes most of the viral genome & leaves a solo LTR/empty LTR If integrated into the germline then called an endogenous provirus HIV has not integrated into germline. Only genomes of T cells. However, possible that in some individuals HIV has integrated into germline & is endogenous retrovirus.
Other structural chromosome abnormalities
Terminal deletion (to be stable must be capped by a telomere) Interstitial deletion (2 breakpoints, loss of material between) Inversion (bit of chromosome flips round- could affect middle of gene, can still be balanced, probs for next gen) Duplication Ring chromosome (uniting short/long arms in ring chromosome)
Telomerase RNAs
Tetrahymena - 159nt Yeast - ~1150nt Humans (hTR) - 451nt Discontinuous synthesis of telomeric DNA
Purines/pyrimidines
The combination of a 5‐membered carbohydrate ring and a purine or pyrimidine is called a nucleoside. The rings are numbered as shown in the following figure. The two rings of a nucleoside or nucleotide must be distinguished from each other, so the positions of the sugar carbons are denoted with a '(prime) notation. If one or more phosphates exist on the carbohydrate, the combination is called a nucleotide.
Processed pseudogene
The proteins of the L1 element will occasionally bind to mRNA & RT the seq info into the genome generating a processed pseudogene Reverse transcribe- integrate cDNA Adding the 3' end of LINE to mRNA enhanced its retroposition rate in culture
How to look at metaphase chromosomes
To view mitotic chromosomes, cells stimulated to divide- simple technique, was routinely used in genetic diagnostic labs, Giemsa produces banding patterns that are chromosome specific 0.5ml blood in culture medium > add phytohemagglutinin > culture 48-72 hrs > add colcemid (arrests cells in metaphase) > add hypotonic KCl; fix in methanol: acetic acid; drop onto microscope slide > brief trypsin digestion > stain w/ Giemsa
Transposable elements
Transposons: entirely through DNA (excisive and replicative- look v similar mechanistically) Retrotransposons: RNA intermediate Although these 2 types are diff in many aspects of their mechanisms of movement there are fundamental similarities Together 1-5% of bacterial genomes; 40% human, mouse, rice 75% maize, 85% barley, 98% iris Active elements but many dead elements which cannot move Contribute to spontaneous mutation, genetic rearrangements, horizontal transfer of genetic material In bac transposons often associated w/ antibiotic resistance genes Cells must repress transposition to ensure genetic stability
Ciliates
Useful for studying telomeres Have 2 nuclei; one big, one small Micronucleus typical eukaryotic chromosomes; many genes, linear & v long; small Macronucleus v unusual chromosomes; one gene linear & v short 1-15kb; each mini-chromosome amplified big
Direct sequencing (RNA-Seq)
Uses new seq tech to seq cDNA (from cells you're interested in) & measure mRNA freq Total RNA (isolate RNA from cell, mRNA will stick- elute - attach to cellulose in cell - polyA tail binding w/ Ts)-> isolate mRNA using oligo dT (mRNA has polyA tail (other RNAs don't have), oligo dT = mix of RNAs (t/c/etc- don't wanna seq all)) -> fragment mRNA & convert to cDNA using random primers (generate lots of cDNA) -> seq DNA (highly expressed, lots of cDNA) -> map each seq to a region of the genome- gene -> measure no. of reads per gene Many mRNA molecules (highly expressed gene) -> convert to cDNA -> seq DNA (occurs many times in seq output) Few mRNA molecules (poorly expressed) -> convert to labelled cDNA -> occurs few times in seq output
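The last two steps (map each read to a gene, then count reads per gene) can be sketched as below. Illustration only: the gene names, coordinates and read positions are made up, each read is reduced to a single mapped position on one chromosome, and real pipelines use an aligner and normalise for gene length and sequencing depth.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (start, end), inclusive coordinates.
GENES = {"geneA": (100, 500), "geneB": (800, 1200)}


def count_reads_per_gene(read_positions, genes=GENES) -> Counter:
    """Count how many mapped reads fall inside each annotated gene."""
    counts = Counter({name: 0 for name in genes})
    for pos in read_positions:
        for name, (start, end) in genes.items():
            if start <= pos <= end:
                counts[name] += 1
                break  # genes do not overlap in this toy annotation
    return counts


if __name__ == "__main__":
    # geneA gets more reads, so it is read as the more highly expressed gene.
    reads = [120, 130, 450, 900, 700]
    print(count_reads_per_gene(reads))
```

A read at position 700 lies between the two genes and is simply not counted, which mirrors reads that map to intergenic regions.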
Programmed changes in no. of chromosome/genome sets per cell
Usually to provide increased RNA synthesis rates/cell; usually incompatible w/ normal chromosome segregation & maintenance of intact genomes
Euploidy
Variation in the no. of complete sets of chromosome
Aneuploidy
Variation in the no. of particular chromosomes within a set
Tetrahymena/Oxytricha life cycles
Vegetative proliferation > starvation > cyst > refeeding (loops back round) starvation > cell mating > macronuclear development (all chromosomes amped x100s & broken into small pieces) > vegetative proliferation Mating=2 cells, each w/ new micro/macro nucleus Organisation of Tetrahymena macro nuclear rDNA; particularly abundant <---rDNA gene--- ---rDNA gene---> <---telomeric DNA (TTGGGG)n (tandemly repeated 5'-3') Purified & used to define the basic features of telomeric DNA Seq organisation; conserved in evolution
Individual genome seqs cost
Venter (2007) - 4yrs, $100mil Watson (2008) - Next Gen, 4.5 months, $1.5mil Individual human genome (2010) - $25,000-50,000 2013 - less than $5,000 Illumina HiSeq X Ten (2014) - ~£800 per genome seq, 320 human genomes per week NovaSeq 5000 & 6000 (2017) - machine $985,000, promise to reduce cost per genome below $100
End replication problem
Watson (1971) & Olovnikov (73) RNA primer near end of the chromosome on lagging strand can't be replaced w/ DNA since DNA polymerase must add to a primer seq Do chromosomes get shorter w/ each replication? ***DIAGRAM Lagging strand template/leading strand template Discontinuous & continuous DNA synth (RNA primer) RNA primers removed, creating gaps One gap can be filled, the other can't
Next Gen Sequencing
Whole genome sequenced in 4.5 months (original human genome project took 13 years), cost about $1.5 million (original $2.7 billion), 454 sequencing generates short reads (about 330bp), needs reference sequence to assemble these
Life tech/applied biosystems - SOLiD Ion Torrent (Roche)
Works in similar way but detects H+ released- causes slight decrease in pH, increase in acidity
Non-coding RNA & X inactivation
XIST gene essential for X inactivation product of XIST = an RNA (function at RNA level), no protein produced only expressed in cells containing at least 2 X chromosomes higher expression in cells w/ more X chromosomes cells "count" no. of Xs only expressed from inactive chromosome- not only gene from inactive X
Not all genes on Xi silenced
XO produces Turner's syndrome XXY - Klinefelter's some genes escape X inactivation genes on pseudoautosomal region of the short arm of the X chromosome these have homologues on the Y chromosomes (so 2 copies required per cell) some other genes also escape inactivation leads to gene dosage effects in aneuploidy of X chromosome
Interspersed repeats promote genome rearrangements
XX males derived from pseudoautosomal region exchanges usually Alus Often found at junctions of unequal sister chromatid exchange in segmental duplications
Telomere length
Yeast 300bp Mice 20-100kb Humans 5-20kb In almost all organisms w/ telomerase the DNA is GC rich & G rich strand runs towards chromosome end
Protein production in yeasts
Yeast provides eukaryotic environment for protein production Haploid & diploid life cycle Haploid lab strains don't switch mating type Selectable markers- LEU2 (leucine), TRP1, URA3 complement mutations in chromosomal genes- allow transformed cells to grow in absence of leucine, tryptophan or uracil respectively
PCR-based gene targeting in yeast (active syst. for homologous recombination)
allows manipulation of any seq gene disruption, partial deletion & tagging are all possible PCR amplifies selectable marker gene disruption uses plasmid (e.g. pFA6a-KanMX4 (Kan cassette- resistance to Kanamycin) as template for PCR) amp a selectable marker (the Kan cassette) w/ targeting ends (allow replacement of part of genome w/ selectable marker) ***DIAGRAM OF pFA6a-KanMX4 template for PCR E.coli ori / Amp R (resistance gene) downstream (correspond to DNA) long PCR primers upstream G418 RES KanMX4- linear DNA, transform into yeast cells PCR product (KanMX4) & seqs that flank gene you're trying to delete crosses over (homologous recombination event- breaks) w/ genomic DNA (gene 1)- breaks yeast chromosome, yeast cells die check if right gene disrupted- not random integration (PCR)- extract DNA, use PCR primers for up/down stream product has KanMX4 if correct 2 either side integrates > replaced by KanMX4 > test w/ ab
ChIP Seq
an alternative to ChIP-chip DNA frags isolated by chromatin IP are ID'd by high throughput sequencing rather than from hybridisation to a microarray
Anti-sense morpholinos
artificial ss "DNA-like" molecules bp w/ other nucleic acids e.g. RNA & ss DNA introduced into embryos by microinjection (pair w/ specific mRNA- blocks translation of mRNA, protein product no longer produced) pair w/ complementary RNA & inhibit functions e.g. translation
In bac vs. as gene editing tool
bac: protospacer/CRISPR repeats > crRNA + tracrRNA > crRNA tracrRNA hybrids > Cas9: crRNA- tracrRNA complex > target DNA site cleavage by Cas9: crRNA- tracrRNA complex gene editing tool: crRNA/tracrRNA fusion > gRNA > cas9: gRNA complex > target DNA site cleavage by Cas9: gRNA complex single RNA mol targeting is achieved by base pairing between 20nts from the gRNA & 20 nts in the genome the genome seq must be adjacent to a PAM seq (NGG for Cas9) for recognition to occur the syst cleaves seqs that match the 5' 20nts of the gRNA & which are adjacent to a PAM seq the RuvC & HNH domains of Cas9 each cleaves a diff DNA strand modified Cas9 protein w/ 1 or the other domain mutated forms ss breaks (nicks)- nickase activity introduce break @ specific site in chromosome (extra DNA at certain point) > homology directed repair in presence of homologous seq > homology directed repair in presence of manipulated seq > manipulated seq inserted into chromosome
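The targeting rule above (20 nts matching the gRNA, immediately followed by an NGG PAM) can be sketched as a simple site finder. A minimal sketch, assuming only the strand given is scanned (the reverse complement is ignored) and with a hypothetical function name; it says nothing about off-target matches or cutting efficiency.

```python
import re


def find_cas9_targets(seq: str):
    """Return (start, protospacer, pam) for every 20-nt site followed by NGG.

    start is the 0-based position of the first nt of the 20-nt target.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Made-up sequence: 20 nt of target followed by the PAM "TGG".
    example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
    for start, protospacer, pam in find_cas9_targets(example):
        print(start, protospacer, pam)
```

The 20-nt protospacer returned for a site is the sequence a gRNA would need to match at its 5' end to direct Cas9 there.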
CRISPR-Cas9 system
based on RNA-targeted nuclease from Streptococcus pyogenes bac- primitive I.S. biological function as part of an immune system to protect the bacterium from foreign nucleic acids - viruses, etc developed to allow DNA breaks to be introduced at specific seqs in humans, mice, rats, frogs, zebra fish, Drosophila & various plants (many cells) introducing a specific DNA break allows genome editing introduce DNA break at a specific position in vivo both strands broken 2 repair mechanisms: cell repairs using non-homologous end joining pathway > mis-repair results in insertions & deletions cell repairs using homology-directed repair (in presence of suitable template to copy) > if template DNA is provided it can be used to introduce precise changes such as point mutations
Genomic imprinting
both paternal & maternal alleles of autosomal gene expressed? for most genes true ~100 human genes only paternal OR maternal copy expressed imprinting: diff in expression of alleles according to parent of origin (nothing to do w/ DNA seq) maternally/paternally inherited allele repressed epigenetic phenomenon (to do w/ marks on chromatin) understanding stems from study of rare genetic conditions parent specific imprint imposed on gamete/early zygote > imprint persists in somatic cells > imprint erased in germ cells > cycle repeats genomic imprinting- pattern of methylation differential "marking" of maternally & paternally inherited alleles only happens for small set of genes (will either be maternal/paternal) associated w/ parent-of-origin, allele-specific methylation of CpG residues (if 1 mutated- healthy 1 may be repressed) leads to differential expression of alleles after fertilisation "imprint" marks parental origin of each allele (1 active, 1 silenced) if 1 copy of the gene inactivated by mutation, effect of deletion will depend on parent of origin imprint "reset" during gametogenesis (can't be inherited by next gen, not erased in early embryo) so that parent of origin is correctly marked
Chromosome translocations
can be identified by karyotyping often "balanced"- not always pathogenic (can still be prob if child inherits) not balanced- gain/loss of material if the breakpoint (ends of arms swapped) within a gene- can cause problems (through- disrupt function) may be associated w/ specific cancers (esp. leukaemias) often cause probs in meiosis
Microdeletion syndromes
chromosomal microdeletions (1-5Mb) deletion causes haploinsufficiency- one copy not enough not visible on karyotype but detected by molecular techniques also known as "contiguous gene deletion syndromes" arise due to unequal crossover events in meiosis genetic disorders: DiGeorge, Williams, Angelman & Prader-Willi diagnosis: clinical symptoms karyotype can appear normal suspect particular deletions FISH (too small to see using chromosome banding- deletion 3-5Mb) FISH w/ DNA probe to part of commonly deleted region confirm diagnosis
RISC
contains ribonuclease ("slicer") which cleaves target RNA & destroys it an argonaute protein (Ago) is key component most organisms contain multiple argonaute proteins encoded by a fam of genes Ago proteins about 100kDa & contain 2 key domains: PAZ & PIWI PAZ near N terminus, binds the 2 nt 3' overhang of the siRNA PIWI near C, endonuclease activity in some Ago proteins e.g. Ago2 in humans = SLICER
Williams-Beuren Syndrome (WBS)
contiguous gene disorder abt 1 in 15,000-20,000 births caused by 1.4Mb microdeletion on chromosome 7q symptoms: supravalvular aortic stenosis (SVAS), hypercalcaemia, mild retardation, distinctive facial features (elfin), distinctive behavioural abnormalities clue to cause of WBS from family w/ inherited SVAS but no other WBS symptoms cytogenetics showed all affected individuals carried balanced translocation t(6;7)(p21.1;q11.23) disrupting elastin gene on chromosome 7q11 individuals w/ SVAS but not WBS have mutations of the elastin gene haploinsufficiency of elastin results in SVAS in WBS deletion is abt 1.4Mb & many genes deleted haploinsufficiency of 1+ of these additional genes responsible for other WBS characteristics diagnosis confirmed by FISH (tests for deletion of elastin- gene in deleted region)
Genome wide ID of protein: chromatin interactions
custom made microarrays can be used to ID all the genome binding sites for a known protein (seqs correspond to intergenic regions as well as genes) the microarray contains seqs corresponding to intergenic regions DNA frags isolated by ChIP are labelled & hybridised to the array array seqs that bind to ChIP products correspond to intergenic regions bound by the transcription factor (bind to microarray)
Biological role of co-suppression?
defence mechanism helps protect cell against RNA viruses that replicate via a dsRNA intermediate helps protect against transposable element (stop spread) movements that involve dsRNA intermediates components also involved in processing small RNAs (microRNAs) used in development & other cellular mechanisms e.g. miRNAs (microRNAs)
Comparative analysis using a 2 channel array
differentially expressed DNA 2 diff fluorescent labels at same time cDNA that corresponds a) growth condition 1 cells - RNA - cDNA labelled w/ Cy3 b) growth condition 2 cells - RNA - cDNA labelled w/ Cy5 BOTH mix & hybridise to microarray - green (expressed only in condition 1), yellow (expressed in 1 & 2), red (expressed only in condition 2) no fluorescence; not expressed which genes turned on in response to growth conditions
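The colour readout can be sketched as a simple classification of each spot from its two channel intensities. Illustration only: it assumes Cy3 (green) labels condition 1 and Cy5 (red) labels condition 2, and the background threshold of 100 intensity units is a made-up value; real analyses work with normalised Cy5/Cy3 ratios rather than a hard cut-off.

```python
def classify_spot(cy3: float, cy5: float, background: float = 100.0) -> str:
    """Classify one array spot from its Cy3 and Cy5 intensities.

    background is a hypothetical cut-off below which a channel
    is treated as showing no signal.
    """
    green = cy3 > background  # cDNA from condition 1 hybridised
    red = cy5 > background    # cDNA from condition 2 hybridised
    if green and red:
        return "yellow: expressed in conditions 1 & 2"
    if green:
        return "green: expressed only in condition 1"
    if red:
        return "red: expressed only in condition 2"
    return "no fluorescence: not expressed"


if __name__ == "__main__":
    # Made-up intensities for four spots.
    for cy3, cy5 in [(900, 20), (850, 910), (15, 700), (30, 40)]:
        print(cy3, cy5, "->", classify_spot(cy3, cy5))
```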
In female mammals, 1 X inactivated
dosage compensation (over 1000 genes on X chromosome) X chromosome inactivation (XCI)- first proposed by Mary Lyon in 1961 1 X silenced, then forms dense body (Barr body) stably maintained in silent state inactive X highly methylated (keeps quiet) female mammals = mosaics seen in coat colour patterning in cats (many genes that control fur colour on X chromosome) patches due to clonal expansion of embryonic cells carrying diff alleles patches inherited from original cell inactivation humans: females w/ null mutation in gene for glucose-6-phosphate dehydrogenase (enzyme function) (G6PD) are mosaics random inactivation during early development masks effects of deleterious X chromosome mutations in females
Recombination important for correct segregation in meiosis
during meiosis I prophase homologous chromosomes pair & synapse recombination (crossing over) occurs during meiosis I at chiasmata required for correct separation of homologous chromosomes at anaphase & as byproduct also generates genetic diversity
RNAi in C. elegans
first developed for C. elegans worms can be injected w/ long dsRNA, fed bac expressing a specific dsRNA or soaked in solution containing dsRNA- all result in specific silencing ***??? most common (feed bac)- plasmid contains DNA corresponding to mRNA you want to make ds RNA taken up flanked by 2 promoters for T7 polymerase phage- express 1 at both end transcription of both strands between promoters, get dsRNA dsRNA taken up by cells Dicer, RISC loading- cut up siRNAs on multiple RISC complexes target corresponding RNA
Human embryos- high freq of aneuploidy
gene dosage effects: normal function needs 2 copies (some need to be present in right amount, sensitive to too much/little, e.g. 1 or 3 chromosomes not viable) trisomies associated w/ increased maternal age most cases of Turner syndrome (45, X) have single, maternally derived X chromosome spontaneous abortions- some compatible w/ life, some seen in embryos not life still births overrepresented- compatible w/ life, produce phenotype
Telomerase needs other proteins to find the telomere
hTERT is synthesised in the cytoplasm & associates w/ the chaperones HSP90 & p23 hTR co-transcriptionally binds dyskerin, NOP10, NHP2 & NAF1, w/ NAF1 subsequently being replaced by GAR1 ***DIAGRAM hTERT into nucleus hTR there - combine via Reptin+Pontin? Into cajal body in nucleus S phase- telomere Then G2/M/G1 Mutations in any- disease w/ short telomeres: Dyskerin-NHP2-NOP10 NAF1 GAR1 TCAB1 p23 HSP90 Telomerase ~250 copies/cell, needs to be recruited to the telomeres, role of TPP1
Epigenetic effects
heritable genetic changes but don't depend on primary DNA sequence changes heritable; cell to daughter cell epigenetic changes important in development, X inactivation & imprinting associated w/ DNA methylation
DNA methylation
in mammals, methyl groups added to 5 position of some cytosines to form 5-methyl cytosine (5MeC) almost entirely restricted to CpG (methylated) nts (not found at GpC- not methylated) 5MeC bps w/ guanine in same way as unmodified nt DNA methylation- signal for controlling gene expression
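The CpG vs GpC distinction is just the order of the two bases read 5' to 3'. A minimal sketch that lists candidate methylation sites in a sequence; the function name and the example sequence are made up, and finding a CpG says nothing about whether that site is actually methylated.

```python
def cpg_sites(seq: str):
    """Return 0-based positions of the C in every CpG (5'-CG-3') dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


if __name__ == "__main__":
    # Made-up sequence: contains two CpGs and one GpC (which is not a site).
    print(cpg_sites("ACGTGCCGA"))
```

Because CG reads the same on the opposite strand (palindromic), every site found here has a partner CpG on the other strand.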
Functional genomics
inhibiting gene function ID function/what products do/which part of genome from observe effects on whole genome PCR based gene targeting (yeast) anti-sense morpholinos (animal embryos) RNAi (C. elegans) mammalian cell culture the CRISPR-Cas9 system (animals, plants, fungi- many diff types of cells)
RNAi in mammalian cells
long dsRNAs cannot be used bc they produce an anti-viral response (general reduction in translation & RNA breakdown- RNA degrades, protein synth shut down) to avoid this, individual siRNAs are introduced or expressed directly loaded onto RISC complex, targeted to corresponding mRNA
Heteroplasmy
makes it difficult to predict severity of disease in child inheriting mutation primordial germ cell (some mutant, some normal mitochondrion) > primary oocytes > oocyte maturation & mtDNA amp > mature oocytes > fertilisation EITHER high level of mutation (affected offspring) intermediate (mildly affected) low (unaffected) Leber hereditary optic neuropathy (LHON) type of eye problem, rapid disease progression in males also heart rhythm abnormality? 50% patients have G to A substitution at nt 11778 (in ND4 gene) these mutations are missense & occur in genes that encode OXPHOS components
Mammalian meiosis/gamete formation diff between males/females
males - 4 gametes per meiosis females - 1 gamete (of 4 possible products- 1 oocyte, 3 polar bodies: cells not oocyte extruded as polar bodies)
Many medically important chromosome changes arise in meiosis
meiosis I maternal & paternal homologue > prophase I/MI (homologues pair up/recombine) > anaphase I (1 in each direction-correct segregation, normally equal division, approx. diploid in DNA content) reductional division: separates homologous chromosomes, produces secondary gametocytes, sexually dimorphic meiosis II P/M II > AII (needs to split at centromere, 4 haploid gametes) equal division separates sister chromatids: produces gametes, sexually dimorphic
Mitochondrial mutations & disease
mitochondrial genome small (16.6kb); mutation rate higher than nuclear genome; mitochondrial diseases: broad spectrum, often affect CNS, heart, muscle (high energy); matrilineal inheritance; heteroplasmy can occur (a mixed pop. of normal & mutant genomes within same cell); extent of heteroplasmy (heterozygous mixed cytoplasm) can vary between mother & child (0-100%); somatic mutations may contribute to ageing
Deletions in chromosome 15q11-q13 cause PWS or AS
most cases of PWS & AS due to de novo deletions 1 of the most common recurring constitutional deletions in humans deletion on paternal allele- PWS (M repressed normal, P expressed) maternal- AS (vice versa) within this deletion, 2Mb region contains cluster of genes, some imprinted usually diagnosis by FISH not all patients have deletions so another diagnostic test may be necessary
Syndromes due to abnormal sex chromosome no.
much less deleterious than wrong no. of autosomes: Y carries few genes, X inactivation reduces effect of extra X chromosomes; Turner syndrome (45, X) - females, short stature, infertility, normal intelligence, increased risk of organ abnormalities; Klinefelter syndrome (47, XXY) - males, infertile, tall w/ female distribution of body fat; (47, XXX) - females, taller, mostly undiagnosed; (47, XYY) - males, tall, other features uncertain
Methylation patterns inherited in cell division
palindromic CG strands; maintenance methylase methylates CpG in new strand if the CpG in the template strand is methylated; no methylation in CpG of 1 strand = hemimethylated, DNMT1 maintenance methylase sorts this out; DNMT1 responsible for maintenance of methylation; other DNA methylases carry out de novo methylation
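A toy sketch of the maintenance rule above, assuming each CpG site is just a True/False methyl mark per strand. `replicate` and `maintenance_methylate` are hypothetical names, not a real library API, and de novo methylation is deliberately left out.

```python
def replicate(parent_marks):
    """Semi-conservative replication: the parental strand keeps its marks,
    the newly made strand starts with no methyl groups (hemimethylated duplex)."""
    return list(parent_marks), [False] * len(parent_marks)


def maintenance_methylate(template_marks, new_marks):
    """DNMT1-like rule: methylate a CpG on the new strand only where the
    CpG on the template strand is already methylated."""
    return [new or template for template, new in zip(template_marks, new_marks)]


parent = [True, False, True]            # made-up pattern over three CpG sites
template, nascent = replicate(parent)   # nascent strand has no marks yet
restored = maintenance_methylate(template, nascent)
print(restored == parent)               # pattern is copied to the new strand
```

Unmethylated template sites stay unmethylated on the new strand, which is why the pattern, and not just the presence of methylation, is inherited.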
Co-suppression/quelling
phenomenon seen in plants & certain fungi (Neurospora) introduction of extra copies of a gene (transgenes) leads to "turning off" (gene already there + extra) of both the transgenes & the equivalent plant gene e.g. purple colour gene (Chalcone synthase) in petunia (expect darker/deeper purple- either white/mixed)
Co-suppression results from the (accidental) production of dsRNA
production of dsRNA as consequence of aberrant transcription of an introduced transgene (incorporated but not transcribed properly- transcription from both ends of transgene results in 2 complementary RNAs that pair) is the major trigger for co-suppression; dsRNA contains paired antiparallel RNA strands (similar to dsDNA) held together by H bonds; dsRNA formation: transcription of both strands of same region of DNA will generate complementary RNA molecules- these can pair to form dsRNA; can also get dsRNA (form most used for RNA interference) from ssRNA containing 2 complementary regions (hairpin/fold up)
Gene amp II: maturation of micronucleus in ciliates
rDNA Extrachromosomal replication from single copy Generates 15,000 linear rDNA palindromes Associated w/ rearrangement and multiplication of macronuclear genome Stable ends to linear DNA molecule (telomere-like function)
Gene amp I: rDNA in Xenopus oocytes
rDNA (genes for 18S, 5.8S & 28S rRNA and spacer seqs); growing oocytes e.g. Xenopus: extrachromosomal rolling circle replication; Rolling circle: 1) DNA 'nicked' 2) 3' end elongated using un-nicked DNA as leading strand template; 5' end displaced 3) displaced DNA = lagging strand template and is made ds via series of Okazaki frags 4) replication of both 'un-nicked' and displaced DNA 5) displaced DNA circularises; 3' end primes DNA synth; replication from 3' end displaces 5' end - displaced strand is copied; somatic cell 900 rRNA genes, oocyte 1mil rRNA genes; 31pg rDNA vs 3pg C-value
Meiosis as mechanism for generating diversity
recombination during meiosis I closer polymorphisms (DNA variants) are, the less likely a crossover is to occur between them also random assortment of chromosomes into gametes gametes have diff combinations of gene variants can use DNA markers to track inheritance of these polymorphisms & this is used in linkage analysis 4 products of meiosis: 2 non-recombinant chromosomes, 2 recombinant
Analysis of reciprocal translocation
suspect inherited chromosome abnormality chromosome test in someone (child) w/undiagnosed syndrome/someone (mother) w/history of miscarriage mother- balanced translocation child- inherited 1 normal & 1 translocated chromosome (not both); has 3 copies of 1, only 1 of other CGH- extra genetic material shown as dots above line loss is dots below mother constitutional carrier of balanced translocation pedigree suggests translocation present in 1 of maternal grandparents mitotic division normal, prob in meiosis possible outcomes of chromosome segregation in 1st meiotic division: for fam- risk of probs in future pregnancies (more likely unbalanced), risk substantial... can it be quantified? what kind of testing needed?
Large scale RNAi studies
systematic functional analysis of the C. elegans genome using RNAi; 16,757 dsRNAs specific to C. elegans genes (more than 85% of the genome), each expressed in a diff bac strain; each strain fed individually to C. elegans & the resulting phenotype determined; allowed the functions of 1722 genes to be determined
RNA interference (RNAi)
technique that exploits preexisting cellular machinery to switch off expression of specific genes in order to study their function; Fire 1998- synthesised complementary ssRNA in vitro corresponding to unc-22 gene, injected either alone (ssRNA)/paired (dsRNA) into C. elegans; only dsRNA had significant effect on unc-22 expression, causing twitching phenotype (consequence of low level unc-22 protein); a dsRNA (200nt+) corresponding to the gene of interest may be introduced into plant, C. elegans or Drosophila cells, leads to dramatic reduction in protein product of targeted gene; dsRNA (in target cells) cleaved by Dicer to generate siRNAs (small interfering RNAs); siRNAs (also ds) 21-25nt in length & contain 2 unpaired nt at the 3' end of each strand (overhangs); Dicer is a member of the RNase III family- a dsRNA specific nuclease that requires ATP; siRNA duplex incorporated into RNA-induced silencing complex (RISC) (w/ other proteins); siRNA duplex unwound in an ATP-dependent step: 1 strand remains associated w/ RISC (guide strand), the other is degraded (passenger strand); method acts on product: the ssRNA in the active complex then targets it to a complementary mRNA which is destroyed (halves)- POST-TRANSCRIPTIONAL GENE SILENCING; cleavage occurs approx. in middle of the region where the guide RNA is paired to the target RNA; siRNA pathway: viral infection > transcription of sense & antisense strands (dsRNA) > Dicer > siRNAs > RISC/mRNA > mRNA halves
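A simplified sketch of the last step above (guide strand in RISC pairs w/ complementary mRNA, which is cut approx. in the middle of the paired region). Assumptions: perfect complementarity only, the cut is placed at the midpoint of the paired site, and the function names and sequences are hypothetical. Real RISC tolerates some mismatches and this does not model that.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def slice_target(mrna, guide):
    """Return the two mRNA halves if the guide pairs perfectly with a site
    in the mRNA, otherwise None (no silencing in this toy model)."""
    site = reverse_complement(guide)      # mRNA sequence the guide pairs with
    start = mrna.find(site)
    if start == -1:
        return None
    cut = start + len(site) // 2          # approx. middle of the paired region
    return mrna[:cut], mrna[cut:]


mrna = "AUGGCUAGCUAGGAUCCGAUCGAUUACGGCAUCGAUAA"   # made-up target mRNA
guide = reverse_complement(mrna[5:26])            # 21 nt guide for that site
print(slice_target(mrna, guide))
```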
Diagnosis of Down's
traditional: karyotype (advantage: distinguishes translocation Down's: 2 chromosome 21 joined- transmission); interphase FISH (decondensed chromosomes in clump, 21 probe, count fluorescent dots (3 not 2)); array analysis; PCR of short tandem repeat (microsatellite) typing (3 alleles or 2 unequal alleles for chr21 STRs); STR: homozygous = 1 peak, Down's = 3 peaks or 2:1 unequal
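A rough sketch of how the STR peak patterns above could be read for a single chr21 marker. The function name and the `tolerance` cut-off are illustrative assumptions, not validated diagnostic thresholds; a single peak is treated as uninformative because it cannot distinguish 2 copies from 3.

```python
def interpret_str(peak_areas, tolerance=0.3):
    """Classify one chr21 STR marker from its allele peak areas.
    tolerance is an illustrative cut-off, not a clinical threshold."""
    peaks = sorted(peak_areas, reverse=True)
    if len(peaks) == 1:
        return "uninformative (homozygous, 1 peak)"
    if len(peaks) == 3:
        return "trisomy pattern (3 alleles)"
    if len(peaks) == 2:
        ratio = peaks[0] / peaks[1]
        if abs(ratio - 2.0) <= tolerance:
            return "trisomy pattern (2:1 unequal)"
        if abs(ratio - 1.0) <= tolerance:
            return "disomy pattern (1:1)"
        return "inconclusive"
    raise ValueError("expected 1 to 3 peaks for a single marker")


# made-up peak areas for four hypothetical markers
for areas in ([1000], [1000, 980], [2000, 1000], [900, 1000, 950]):
    print(areas, "->", interpret_str(areas))
```

In practice several chr21 markers would be typed, since any one marker can be homozygous and therefore uninformative.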
No single gene mutation in PWS
unlike AS, probably combination of genes involved v complex- difficult to assign genes to aspects of the phenotype, severe neuroendocrine disturbance, syndrome difficult to treat
Trisomy 21
usually occurs due to errors in meiosis; 70% cases due to non-disjunction in 1st meiotic division in the mother (could also happen in father); Down's phenotype: aspects common to all patients: cognitive impairment, characteristic facial appearance; present in many but not all individuals: congenital heart defect, acute megakaryocytic leukaemia- rare in general pop. (also higher freq of leukaemia of all types); association w/ Alzheimer disease: 20% at 40-49yrs, 100% at 70; extra copy of gene for amyloid precursor protein
Use of microarrays & sequencing assays to study protein function (where proteins interact within genome) Chromatin immunoprecipitation (ChIP)
widely used for analysis of interactions between regulatory proteins & the genome; e.g. ID of transcription factor protein binding at promoters & enhancers; transcription factors, e.g. TRF that interacts w/ lots of genes to direct their expression (ID those genes, etc); e.g. ID all binding sites in genome for protein you're interested in, compare under diff conditions; ChIP often uses an epitope tagged protein: tag protein of interest w/ epitope (saves raising antibodies for lots of diff proteins- epitope relies on abs); use an antibody (against epitope) to isolate the protein (+ interacting factors- proteins & DNA) from cell extracts- immunoprecipitation (use ab against epitope- purify); the epitope tag is usually put at 1 terminus of the protein to avoid disrupting function (e.g. add DNA that encodes aa seq @ 5' end of gene, aa's incorporated @ N terminus); usually incorporates several copies of seq into protein; e.g. of epitopes: Myc tag/Flag tag; epitope = short seq of aa's which can easily/strongly be recognised by commercial abs
CpG islands CGI
~25000 per haploid human genome 50% associated with transcriptional start sites