Lecture 11 - DNA Sequencing
• ABI377 - Still polyacrylamide gel-based system - Did allow computer calling of bases - Expensive to buy and maintain - Image manipulation possible
-Cycle Sequencing.
• The same technologies can be applied to RNA: simply convert it to cDNA and proceed to adaptor ligation, providing a transcriptional snapshot of a cell with temporally and spatially relevant information
-RNA Sequencing - RNAseq.
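The conversion step can be sketched in a few lines. This is a minimal model, assuming reverse transcription can be reduced to base pairing: the first-strand cDNA is the reverse complement of the mRNA, with U in the RNA pairing with A in the DNA. The function name `first_strand_cdna` is hypothetical, and primers and enzymes are not modelled.

```python
# Pairing used to build first-strand cDNA from an RNA template.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """Return first-strand cDNA (5'->3') for an mRNA written 5'->3'.

    Reverse transcriptase copies the RNA 3'->5', so the cDNA is the
    reverse complement of the RNA, written with DNA letters (T, not U).
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))

print(first_strand_cdna("AUGGCCUAA"))  # TTAGGCCAT
```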
• 4 iterations of sequencing technology, all with the same basic methodology: - Sanger Di-Deoxy sequencing - Cycle sequencing (dye-terminator method) - Capillary sequencing - Next Generation Sequencing (NGS)
-Sequencing methods.
• Fragment gDNA as for NGS • Generate a library of 'probes' whose sequence maps to the region of interest • Attach a bead/hook to each probe • Hybridise the probes to the gDNA, then pull the hybrids out of solution via the bead (e.g. magnetic) • Dissociate the targeted gDNA from the bead and perform NGS on the targeted region
-Target sequencing.
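The capture step can be illustrated as a filter over fragments. This is a minimal sketch, assuming exact substring matching on either strand is a fair stand-in for hybridisation. Real probes tolerate mismatches and capture partially overlapping fragments. The names `capture` and `reverse_complement` and the example sequences are hypothetical.

```python
SWAP = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(SWAP)[::-1]

def capture(fragments, probes):
    """Keep fragments a probe could hybridise to.

    Exact matching on either strand stands in for hybridisation here.
    """
    targets = set(probes) | {reverse_complement(p) for p in probes}
    return [f for f in fragments if any(t in f for t in targets)]

fragments = ["AAGGCTTACG", "TTTTCCCCAA", "CGTAAGCCTT"]
probes = ["GGCTTA"]
print(capture(fragments, probes))  # ['AAGGCTTACG', 'CGTAAGCCTT']
```

Both strands of the targeted region are retained, because the third fragment is the reverse complement of the first.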
• Sequencing is very similar to PCR, relying on the incorporation of nucleotides to elongate a new strand from a ssDNA template. By using di-deoxy nucleotides (ddNTPs), however, we can 'terminate' this process, which is fundamental to the sequencing methodology • A ddNTP such as ddATP lacks the 3'-OH group needed to form the next phosphodiester bond, so once it is incorporated no further elongation is possible.
-Sanger Di-Deoxy Sequencing.
• Rather than a slab gel, capillary sequencing uses a thin tube filled with polymer; DNA runs through this, past a detector which reads the fluorescence and provides a digital output of signal intensity as a chromatogram - Much higher voltage (9,000 V) - Faster run time
-Capillary Sequencing.
• Genomic DNA is fragmented, by enzyme digestion or sonication, and sequence-specific adaptors are ligated onto the fragments with DNA ligase to form the sequencing library
-Next Generation Sequencing.
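Library construction can be sketched as random fragmentation followed by adding adaptors to both ends. The sketch makes several simplifying assumptions. A single genome copy is cut into contiguous pieces, whereas real sonication of many copies gives overlapping fragments. The adaptor sequences `ADAPTOR_5P` and `ADAPTOR_3P` are hypothetical placeholders, not real kit adaptors.

```python
import random

# Hypothetical adaptor sequences, for illustration only (not from any real kit).
ADAPTOR_5P = "ACGTACGT"
ADAPTOR_3P = "TGCATGCA"

def fragment(genome, min_len, max_len, rng):
    """Cut one copy of the genome at random points into contiguous pieces,
    a crude stand-in for sonication or enzymatic digestion."""
    pieces, start = [], 0
    while start < len(genome):
        end = min(start + rng.randint(min_len, max_len), len(genome))
        pieces.append(genome[start:end])
        start = end
    return pieces

def ligate_adaptors(pieces):
    """Add the 5' and 3' adaptors to every fragment to form the library."""
    return [ADAPTOR_5P + piece + ADAPTOR_3P for piece in pieces]

rng = random.Random(42)  # fixed seed so the example is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(200))
library = ligate_adaptors(fragment(genome, 20, 50, rng))
for molecule in library[:3]:
    print(molecule)
```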
• 1977 - Frederick Sanger develops rapid DNA sequencing (Nobel Prize in Chemistry, 1980) • 1983 - PCR developed for amplification of DNA • 1990 - Human Genome Project (HGP) begins • 1995 - Haemophilus influenzae becomes the first free-living organism to have its full genome sequenced • 1996 - Bermuda Principles drafted on free access to HGP data • 1999 - Human chromosome 22 sequence released
-Sequencing timeline.
• All sequencing methodologies use a variation of the Sanger method, which itself mimics DNA replication seen in vivo - only the detection method and scale differ • Sequencing can be very accurate and can be used for mutation detection in some instances
-Summary.
• 4 lanes per sequence • Lanes are loaded in the order A-C-G-T, so if the gel is read the wrong way round (lanes reversed), the complementary strand is read instead • This sequence reads: • 5' - TTCACCTAATCATACTCTCACAATGAATATGTGTTAAAAATAATAAAG - 3' • Resolution of Sanger sequencing is approx. 700-900bp • The limit is the resolution of the gel, not the polymerase • Gels remain radioactive - half-life of α32P-dCTP = 14 days - half-life of α35S-dATP = 87 days • Intensity of the autoradiograph is modulated by the length of exposure of the photographic film to the gel
-Sanger Di-Deoxy Sequencing.
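Reading the gel can be modelled as sorting bands by fragment length, because the shortest fragments run furthest. The sketch also shows why the A-C-G-T loading order matters: reading the lanes back to front turns every base into its complement. This assumes one clean band per position. `bands_from_sequence` and `read_gel` are hypothetical helper names.

```python
LANE_ORDER = "ACGT"  # lanes loaded left to right in this order
SWAP = str.maketrans("ACGT", "TGCA")

def bands_from_sequence(seq):
    """Band positions (fragment lengths) expected in each lane for a sequence."""
    return [[i + 1 for i, b in enumerate(seq) if b == base] for base in LANE_ORDER]

def read_gel(lanes):
    """Read bases from the bottom of the gel (shortest fragment) upwards.
    lanes[k] holds the fragment lengths seen in the k-th lane from the left."""
    call_at = {}
    for base, lengths in zip(LANE_ORDER, lanes):
        for length in lengths:
            call_at[length] = base
    return "".join(call_at[length] for length in sorted(call_at))

seq = "TTCACCTAATCATACTCTCACAATGAATATGTGTTAAAAATAATAAAG"
lanes = bands_from_sequence(seq)
print(read_gel(lanes) == seq)                        # True
print(read_gel(lanes[::-1]) == seq.translate(SWAP))  # True: lanes read back to front
```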
• Exome - the coding portion of the genome, ~1% or 30,000,000bp • Whole Genome Sequencing (WGS) remains cost-prohibitive for large-scale analyses • Whole Exome Sequencing (WES) is far more cost-effective but only examines coding variants • Targeted Sequencing also remains attractive for in-depth study of specific regions:
-Next Generation Sequencing.
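The scale difference between WGS and WES is simple arithmetic. This sketch uses the figures given in these notes: a 3,000,000,000bp genome, an exome of ~1%, and 30x coverage. It assumes the same depth for both, purely for comparison. The helper name `bases_to_sequence` is hypothetical.

```python
GENOME_BP = 3_000_000_000  # genome size used in these notes
EXOME_PERCENT = 1          # exome is ~1% of the genome

def bases_to_sequence(target_bp, coverage):
    """Total bases of sequence needed to cover a target at a given depth."""
    return target_bp * coverage

exome_bp = GENOME_BP * EXOME_PERCENT // 100
wgs = bases_to_sequence(GENOME_BP, 30)
wes = bases_to_sequence(exome_bp, 30)
print(f"exome = {exome_bp:,} bp")
print(f"WGS at 30x = {wgs:,} bases; WES at 30x = {wes:,} bases ({wgs // wes}x less)")
```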
• You can sequence unknown DNA by cloning it first: - Fragment genomic DNA into pieces - Clone the fragments into a DNA plasmid - Use primers that bind to the plasmid - Sequence through the cloned insert • Week 7 (Fri) and Week 9 will cover in detail how to cut, clone and manipulate DNA fragments
-Sanger Di-Deoxy Sequencing.
• The key to Sanger sequencing was the incorporation of a radio-nucleotide alongside non-labelled bases. Typically α32P-dATP, α32P-dCTP or α35S-dATP are used
-Sanger Di-Deoxy Sequencing.
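Gels stay radioactive because activity decays exponentially. After t days, the fraction of activity remaining is 0.5^(t / half-life). This sketch uses the half-lives quoted in these notes: 14 days for 32P and 87 days for 35S. `fraction_remaining` is a hypothetical helper name.

```python
# Half-lives (days) as quoted in these notes.
HALF_LIFE_DAYS = {"32P": 14, "35S": 87}

def fraction_remaining(isotope, days):
    """Fraction of the original activity left after `days` of decay."""
    return 0.5 ** (days / HALF_LIFE_DAYS[isotope])

for isotope in HALF_LIFE_DAYS:
    print(isotope, f"{fraction_remaining(isotope, 28):.2f}")
```

After four weeks, a 32P gel has dropped to a quarter of its activity, while a 35S gel still retains most of its activity.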
- Capillaries allow a longer run than a slab gel - Ability to run from a 96-well plate - Fully automated for up to 96 samples - Output measured as an intensity, not a colour - Allows calibration against internal standards - Mutation detection possible
-Capillary Sequencing.
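One way to use an internal standard is to convert each peak's migration time into a size by interpolating between the standard's known peaks. This sketch uses plain linear interpolation, the simplest choice. Instrument software may use other sizing methods. The standard's times and sizes below are hypothetical values.

```python
from bisect import bisect_left

def size_fragment(time, standard_times, standard_sizes):
    """Estimate a fragment's size (bp) from its migration time by linear
    interpolation between the two flanking peaks of an internal size standard.
    standard_times must be sorted ascending, paired with standard_sizes."""
    if not standard_times[0] <= time <= standard_times[-1]:
        raise ValueError("peak lies outside the size-standard range")
    i = max(1, bisect_left(standard_times, time))
    t0, t1 = standard_times[i - 1], standard_times[i]
    s0, s1 = standard_sizes[i - 1], standard_sizes[i]
    return s0 + (s1 - s0) * (time - t0) / (t1 - t0)

# Hypothetical standard: peaks at these scan times correspond to these sizes.
times = [1000, 1400, 1800, 2200]
sizes = [100, 200, 300, 400]
print(size_fragment(1600, times, sizes))  # 250.0
```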
• 2000 - Drosophila melanogaster genome sequenced • 2001 - First draft of human genome released (composite of several individuals) • 2002 - Mouse genome sequenced • 2003 - Human Genome Project completion announced (cost $3 billion) • 2007 - James Watson's genome released; first single human genome ever sequenced • The first human genome took 13 years and cost $3 billion; we can now do the same for $1,000 in a matter of hours
-Sequencing timeline.
• Next Generation Sequencing (NGS) is the current pinnacle of sequencing technology, allowing us to sequence an entire genome for $1,000 in a matter of hours • RNA sequencing (RNAseq) is the latest frontier in sequencing technology, but we are now coming up against the sheer volume of data and the question of how to interpret it
-Summary.
-Group of automated techniques used for rapid DNA sequencing
Next generation sequencing
• Signal intensity is converted to a peak - the higher the peak, the higher the intensity. These are plotted in a graph called a 'sequence trace' or 'chromatogram' • Resolution breaks down as fragment length increases, so sequencing resolution is still around 700-900bp • Colours of the original dyes are maintained, except orange (G), which does not show up well and so is converted to black in the computer-generated chromatogram
-Capillary Sequencing.
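At its simplest, base calling from a chromatogram picks the dye channel with the highest intensity at each peak. This sketch assumes peaks have already been located and each is given as an (A, C, G, T) intensity tuple. The intensities are made up, and real base callers also model peak spacing and shape. `call_bases` is a hypothetical name.

```python
CHANNELS = "ACGT"  # order of the four dye channels in each tuple

def call_bases(peaks):
    """Call each peak as the dye channel with the highest intensity.
    peaks: one (A, C, G, T) intensity tuple per peak in the trace."""
    return "".join(CHANNELS[max(range(4), key=lambda i: p[i])] for p in peaks)

# Made-up intensities for four consecutive peaks.
trace = [(5, 80, 3, 4), (90, 2, 6, 1), (4, 3, 2, 70), (3, 1, 85, 6)]
print(call_bases(trace))  # CATG
```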
• Sequencing requires: - DNA template (from PCR) - Sequencing primer (1, not a pair) - Radio-nucleotide - dNTPs - Polymerase • 4 tubes per sequencing read, each with a different ddNTP - ddATP, ddCTP, ddGTP, ddTTP • The 4 reactions are heated to generate ssDNA, annealed to allow primer binding, and extended to allow the polymerase to incorporate nucleotides
-Sanger Di-Deoxy Sequencing.
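The four-tube set-up can be modelled by asking, for each ddNTP, at which positions of the new strand that base is added. Each such position gives one possible terminated fragment length. This sketch assumes the template is written 3'->5' from just after the primer site, so that lengths count only the bases added by the polymerase. `sanger_reactions` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_reactions(template_3to5):
    """Fragment lengths (bases added after the primer) in each ddNTP tube.

    template_3to5: template strand written 3'->5', starting just after the
    primer site, i.e. in the order the polymerase copies it.
    """
    new_strand = "".join(COMPLEMENT[b] for b in template_3to5)
    return {f"dd{base}TP": [i + 1 for i, b in enumerate(new_strand) if b == base]
            for base in "ACGT"}

print(sanger_reactions("AAGTGG"))
```

For the template 3'-AAGTGG-5', the new strand is 5'-TTCACC-3'. The ddTTP tube therefore holds fragments of 1 and 2 bases, and the ddCTP tube holds fragments of 3, 5 and 6 bases.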
• Extension one base at a time allows the incorporated base to be monitored in one of several ways, depending on the methodology. All rely on adding single dNTPs sequentially:
-Next Generation Sequencing.
• The use of signal intensity allows 'dual peaks' to be shown accurately - something not possible in autoradiographs or cycle sequencing • When sequencing DNA, you are actually sequencing both copies of each chromosome from the individual you isolated DNA from - if they are heterozygous at a base, this will show on a chromatogram, as will a heterozygous deletion or insertion
-Capillary Sequencing.
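A dual peak can be flagged when the second-strongest channel reaches a set fraction of the strongest. The position is then reported with the standard IUPAC ambiguity code for the two bases, for example R for A/G. The 0.3 threshold and the function name are hypothetical choices for illustration, not a validated cut-off.

```python
CHANNELS = "ACGT"
# IUPAC ambiguity codes for two-base combinations.
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
         frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}

def call_with_dual_peaks(peaks, ratio=0.3):
    """Call bases, reporting an ambiguity code where two peaks overlap.
    ratio: hypothetical threshold for the second peak relative to the first."""
    calls = []
    for p in peaks:
        order = sorted(range(4), key=lambda i: p[i], reverse=True)
        top, second = order[0], order[1]
        if p[second] >= ratio * p[top]:
            calls.append(IUPAC[frozenset(CHANNELS[top] + CHANNELS[second])])
        else:
            calls.append(CHANNELS[top])
    return "".join(calls)

# Made-up (A, C, G, T) intensities: homozygous A, then A/G, then C/T.
trace = [(90, 2, 6, 1), (50, 3, 45, 2), (4, 60, 3, 55)]
print(call_with_dual_peaks(trace))  # ARY
```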
• Extension one base at a time allows the incorporated base to be monitored in one of several ways, depending on the methodology: • The Illumina sequencing method uses fluorescently labelled, reversibly terminated dNTPs: after each single-base extension, the fluorescence is imaged to identify the incorporated base, and the fluorophore is then cleaved off so the next cycle can proceed
-Next Generation Sequencing.
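One cycle per base can be sketched for a single cluster as follows. A blocked, labelled base is added, its colour is recorded, and the colour is decoded back to a base. The label and block are then removed before the next cycle. The colour assignment is hypothetical, and real instruments use their own dye chemistries. The sketch ignores errors and phasing.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical base-to-colour mapping, for illustration only.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
DECODE = {colour: base for base, colour in DYE.items()}

def sequence_by_synthesis(template_3to5, cycles):
    """Call one base per cycle for a single cluster.
    template_3to5: template strand in the order the polymerase reads it."""
    read = []
    for position in range(min(cycles, len(template_3to5))):
        added = COMPLEMENT[template_3to5[position]]  # one blocked, labelled base added
        colour = DYE[added]                          # cluster imaged in this colour
        read.append(DECODE[colour])                  # colour decoded back to a base
        # Label and block are then cleaved so the next cycle can extend.
    return "".join(read)

print(sequence_by_synthesis("TACGGA", 4))  # ATGC
```

The number of cycles caps the read length, which matches the point that cycles equate to read length.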
• Polymerase can incorporate: - The correct dNTP and continue extension - The radio-nucleotide and continue extension - A ddNTP and stop extension • dNTPs are in excess, so termination is not immediate
-Sanger Di-Deoxy Sequencing.
• A homozygous deletion will only show when compared to a control • As with PCR, you should always sequence a positive and/or negative control sample as a reference
-Capillary Sequencing.
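Comparing a sample against a control sequence is what exposes a homozygous deletion: the trace itself looks clean, but a block of bases is missing. This sketch uses Python's standard `difflib.SequenceMatcher` as a simple stand-in for proper sequence alignment. The sequences are made up, and `differences` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

def differences(control, sample):
    """List the non-matching blocks between a control and a sample sequence."""
    matcher = SequenceMatcher(None, control, sample, autojunk=False)
    return [(tag, control[i1:i2], sample[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

control = "ATGCGTACCTGA"
sample = "ATGCGCCTGA"  # same sequence with "TA" missing on both alleles
print(differences(control, sample))  # [('delete', 'TA', '')]
```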
• Extending and reading one base at a time reveals each fragment's sequence; multiple fragments are then aligned bioinformatically to produce the full sequence. The number of cycles equates to the 'read length'
-Next Generation Sequencing.
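To show how short reads can be combined, this sketch uses a greedy overlap merge, a simple de novo assembly approach. It repeatedly joins the pair of reads with the longest suffix/prefix overlap. In routine resequencing, reads are instead usually aligned to a reference genome, which this sketch does not show. The reads, `min_len`, and function names are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by largest overlap until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))  # ['ATGGCGTACCTGA']
```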
• The dNTPs, radio-nucleotide and ddTTP form a 'pool' of nucleotides which can be incorporated by the DNA polymerase (this can be Taq, but remember what we said about proof-reading ability and error rates) • With many copies of the starting ssDNA, you will end up with ddTTP incorporated at every T position in this tube (and, across all 4 tubes, a ddNTP at every position), giving many fragments, each differing by a single base in length...
-Sanger Di-Deoxy Sequencing.
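Because dNTPs are in excess, each copy has only a small chance of terminating at any given T. Suppose that chance is p at every T. Then the expected share of copies ending at the k-th T is p(1-p)^(k-1), and the remainder run off the end. This sketch assumes a single, constant p. In reality the chance depends on the ddNTP:dNTP ratio and the polymerase. The names and the value 0.1 are hypothetical.

```python
def termination_profile(new_strand, base, p_dd):
    """Expected share of copies ending at each occurrence of `base`.

    new_strand: the strand being synthesised, 5'->3' from the primer.
    p_dd: chance that the polymerase adds the ddNTP rather than the normal
    dNTP wherever `base` is needed (kept low because dNTPs are in excess).
    Returns ({fragment length: share}, share never terminated).
    """
    profile = {}
    still_extending = 1.0
    for i, b in enumerate(new_strand):
        if b == base:
            profile[i + 1] = still_extending * p_dd  # terminates here
            still_extending *= 1 - p_dd              # carries on past this base
    return profile, still_extending

profile, runoff = termination_profile("TTCACCTAAT", "T", 0.1)
print(profile, runoff)
```

Every T position receives some terminated fragments, with gradually fewer at longer lengths.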
• Error rate estimates in NGS vary from 0.1% to 1% depending upon technology • 99.9% accuracy sounds good, but with 3,000,000,000bp in the genome, this is 3 million errors in a single pass! Increased coverage therefore counteracts these errors - 10x-30x coverage is usual • In practice we do not sequence the entire genome - repetitive elements, telomeres, GC-rich regions etc. are hard to cover
-Next Generation Sequencing.
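The effect of coverage can be estimated with a binomial model. A base is miscalled by consensus only if at least half of the reads covering it are wrong. This sketch makes several assumptions: errors are independent between reads, all wrong reads agree on the same wrong base (a pessimistic case), and ties count as failures. Systematic errors would break the independence assumption. The function names are hypothetical.

```python
from math import comb

GENOME_BP = 3_000_000_000  # genome size used in these notes

def expected_errors(genome_bp, error_rate):
    """Expected number of miscalled bases in a single pass over the genome."""
    return genome_bp * error_rate

def consensus_error(coverage, error_rate):
    """Chance that at least half of the reads covering a base are wrong.
    Assumes independent errors and, pessimistically, that all wrong reads
    agree on the same wrong base; a tie is counted as a failure."""
    needed = (coverage + 1) // 2  # wrong reads needed to reach at least half
    return sum(comb(coverage, k) * error_rate ** k * (1 - error_rate) ** (coverage - k)
               for k in range(needed, coverage + 1))

print(f"{expected_errors(GENOME_BP, 0.001):,.0f} errors in a single pass at 99.9% accuracy")
for depth in (1, 10, 30):
    print(depth, expected_errors(GENOME_BP, consensus_error(depth, 0.001)))
```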