Chapter 9: DNA Sequencing (Textbook)

The sequencing ladder

After a sequencing reaction using fluorescent dye terminators, excess dye terminators are removed with columns or beads or by ethanol precipitation. Spin columns or bead systems bind the sequencing fragments so that residual sequencing components can be rinsed away with buffers. Alternatively, the dye terminators are bound onto specially formulated magnetic beads, and the sequencing ladder is recovered from the supernatant while the beads are held by a magnet applied to the outside of the tube or plate. The fragments of the sequencing ladder are completely denatured before running on a gel or capillary. Denaturing conditions (50°C to 60°C, formamide, urea denaturing gel) are maintained so that the fragments are resolved strictly according to size. Secondary structure affects migration speed and lowers the quality of the sequence. Before loading in a gel or capillary instrument, the ladders are cleaned to remove residual dye terminators, precipitated, and resuspended in formamide. The ladders are heated to 95°C to 98°C for 2 to 5 minutes and placed on ice just before loading.

Bisulfite DNA sequencing

Also known as methylation-specific sequencing, this is chain termination sequencing designed to detect methylated cytosine nt. Methylation of cytosine residues to 5-methylcytosines in DNA is an important part of the regulation of gene expression and chromatin structure, affecting cell differentiation and diseases including certain types of cancer. (Further details are on p. 237 of the textbook.) The PCR amplicons are sequenced by Sanger sequencing or pyrosequencing.

Sequence interpretation

Base calling: the process of identification of bases in a sequence by sequencing software. It is analogous to the inspection of gel bands for quality, clarity, and separation. Interpretation of sequencing data from a dye terminator reaction depends on the quality of the electropherogram, which in turn depends on the quality of the template, the efficiency of the sequencing reaction, and the cleanliness of the sequencing ladder.
Problems seen on the electropherogram:
1. Failure to clean the sequencing ladder properly results in bright flashes of fluorescence (dye blobs) that obliterate parts of the sequence read. The seq read around this area is not accurate.
2. Poor starting material results in poor-quality sequence that cannot be read accurately.
A clean baseline is seen on good sequence, where only one color peak is present at each nt position. Automatic seq reading software will not accurately call a poor seq. Sequencing software also shows the certainty of each base call in the sequence. When the base call is not clear, the letter N replaces A, C, T, or G. Less than optimal sequences are not accurately readable by software but may be readable by an experienced operator.
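The idea of replacing an unclear call with N can be sketched in a few lines of Python. This is only an illustration, not how any real base-calling software works: the per-position intensity dictionaries, the function names, and the 0.7 "purity" cutoff are all assumptions made for the example.

```python
BASES = "ACGT"


def call_base(signal, min_purity=0.7):
    """Return the base with the strongest peak, or 'N' when the call is unclear.

    signal maps each base to its peak intensity at one position.
    min_purity is a hypothetical cutoff: the share of the total signal
    that the strongest peak must reach for the call to be accepted.
    """
    total = sum(signal.get(b, 0.0) for b in BASES)
    if total <= 0:
        return "N"  # no signal at all, nothing to call
    best = max(BASES, key=lambda b: signal.get(b, 0.0))
    purity = signal.get(best, 0.0) / total
    return best if purity >= min_purity else "N"


def call_sequence(trace, min_purity=0.7):
    """Call every position of a trace (a list of per-position signals)."""
    return "".join(call_base(s, min_purity) for s in trace)


# Made-up intensities: three clean positions and one mixed A/C position.
trace = [
    {"A": 950, "C": 20, "G": 15, "T": 15},
    {"A": 10, "C": 30, "G": 900, "T": 25},
    {"A": 400, "C": 380, "G": 20, "T": 10},
    {"A": 5, "C": 10, "G": 12, "T": 870},
]
print(call_sequence(trace))  # AGNT
```

The mixed third position has two peaks of similar height, so no single base dominates and it is reported as N, which matches the "only one color peak at each nt position" criterion for a good read.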

Automated Fluorescent Sequencing

Manual vs. automated, in brief: in manual sequencing, radioactive nt are used to label the primers or sequencing fragments, and all four lanes of a gel are used. All fragments give the same radioactive signal on the gel rather than diff fluorescent colors. In automated sequencing, fluorescent dyes attached to the primers (dye primer) or to the ddNTPs (dye terminator) label the sequence. One lane of a gel or one capillary is used, because the diff nt fluoresce in diff colors.
The chemistry is the same as that described for manual sequencing, using ds templates and cycle sequencing. Because cycle sequencing (unlike manual sequencing) does not require the sequential addition of reagents to start and stop the reaction, it was more easily adaptable to early high-throughput applications and automation. Universal systems combined automated DNA isolation of the template and setup of the sequencing reactions. Electrophoresis and reading of the sequencing ladder were also automated.
A requirement for automated reading of the DNA sequence ladder is the use of fluorescent dyes instead of radioactive nt to label the primers or sequencing fragments. The fluorescent dyes used are fluorescein, rhodamine, and BODIPY dye derivatives. Automated sequence readers excite the dyes with a laser and detect the emitted fluorescence at specific wavelengths. The dyes used for sequencing have distinct "colors", or peak wavelengths of fluorescence emission, that can be distinguished by automated sequencers. The advantage of having four distinct colors is that all four of the reaction mixes can be read in the same lane of a gel or on a capillary. Fluorescent dye color, rather than lane placement, assigns the fragments as ending in A, T, G, or C in the sequencing ladder.
Two approaches to automated Sanger sequencing (the book also calls it automated fluorescent sequencing):
1. Dye primer
2. Dye terminator
The goal of both is to label the fragments synthesized during the sequencing reaction according to their terminal ddNTP.
Thus, fragments ending in ddATP, read as A in the sequence, are labeled with a "green" dye; fragments ending in ddCTP, read as C, are labeled with a blue dye. G is black or yellow, and T is red. This facilitates reading of the sequence by automated sequencers.
In dye primer sequencing, the four diff fluorescent dyes are attached to four separate aliquots of the primer. The dye molecules are attached covalently to the 5' end of the primer during chemical synthesis, resulting in four versions of the same primer with diff dye labels. The primer labeled with each "color" is added to one of four separate reaction tubes, one each with ddATP, ddCTP, ddGTP, and ddTTP. After addition of the remaining components of the sequencing reaction and of a heat-stable polymerase, the reaction is subjected to cycle sequencing in a thermal cycler. The products of the sequencing reaction are thus labeled at the 5' end with the dye color associated with the ddNTP at the end of the fragment, because each tube has only one color of dye primer and one type of ddNTP. Because the primer is labeled, the products are labeled. The products are resolved together in one lane of a gel or in a capillary.
Dye terminator sequencing is performed with one of the four fluorescent dyes covalently attached to each of the ddNTPs instead of to the primer. The primer is unlabeled. The major advantage is that all four sequencing reactions are performed in the same tube (or well of a plate) instead of in four separate tubes, because the fragments can be distinguished directly by their ddNTPs. After addition of the rest of the reaction components and cycle sequencing, the product fragments are labeled at the 3' end. As with dye primer sequencing, the color of the dye corresponds to the ddNTP that terminated the strand. Dye terminator sequencing has become the Sanger sequencing method of choice.

Bioinformatics

Bioinformatics is the merger of biology with information technology. Part of the practice in this field is biological analysis in silico, that is, by computer. Bioinformatics dedicated specifically to handling sequence info is a form of computational biology.
In some cases, such as heterozygous mutations, there may be more than one base, or mixed bases, at the same position in the sequence. Polymorphic or heterozygous sequences are written as consensus sequences, or a family of sequences, with proportional representation of the polymorphic bases. There is a universal nomenclature for mixed, degenerate, or wobble bases. The base designations in the IUB code are used to communicate consensus sequences and for computer input of polymorphic seq data.
In addition to the interpretation of sequence variants, sequence information is also used in epidemiology, to speciate organisms, or to find homologies within or between species. These applications involve database searches with comparisons of large regions of DNA. The Basic Local Alignment Search Tool (BLAST) is a system for homology searches. BLAST searches GenBank, a large database maintained by NCBI. Searches can be made of nucleic acid and amino acid sequences. A search is performed by selecting a nt or protein search and entering a sequence (the query). Limits and parameters can be added to the search, such as the type of organism to search (human, mouse, or other), exclusions and limits of organism or sample type, and the program. The program can optimize for highly similar sequence matches (megablast) or for imperfect matches. Selecting less than perfect matches also allows cross-species matches of phylogenetically conserved sequences, which can lead to the identification of important protein domains or clues to protein fn. The search generates a number of matches, or hits, with a diagram showing the alignments of the matching sequences and a color code indicating the best matches. Another section of the search results lists E-values.
The E-value (Expect value) describes the number of matches to the query expected by chance when searching a database of a particular size. It decreases exponentially with the quality of a match. Very low E-values (10^-12) would be associated with a perfect match for a given query sequence. (My thinking: a low E-value means we can be fairly confident the hit is a real match, because few sequences would match that well by chance; a high E-value means many sequences would match that well by chance, so confidence in the match is low.) From a web search: the E-value (expectation value) is a corrected bit score adjusted to the sequence database size. The E-value therefore depends on the size of the sequence database used. Since large databases increase the chance of false-positive hits, the E-value corrects for the higher chance.
Further info, including the matched gene name and its organism, the source of the matched sequence and the location within that sequence, a base-to-base or a.a.-to-a.a. comparison, and the plus or minus strand of the matched nt seq, is accessed by selecting the sequence name. In addition to the identification of new sequences, queries such as these are also useful for test and primer design. Whenever a new primer or probe seq is chosen, it is useful to query the primer or probe seq to confirm that it belongs to the correct species and is not duplicated in multiple places in a genome. Primers and probes with multiple potential binding sites will produce mis-primes and off-target products.
Bioinformatics includes handling and updating of information for software tools and databases. This need is the driving force behind the development of high-powered, reliable computer systems for storage as well as organization.
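As a small illustration of how mixed bases are written with the IUB code, the sketch below collapses aligned sequences of equal length into one consensus string. The lookup table uses the standard ambiguity letters (R, Y, S, W, K, M, B, D, H, V, N); the function name and the assumption that the input is already aligned and contains only A, C, G, and T are mine.

```python
# Standard IUB/IUPAC nucleotide ambiguity codes, keyed by the set of bases.
IUB_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R",  # purine
    frozenset("CT"): "Y",  # pyrimidine
    frozenset("GC"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(sequences):
    """Collapse aligned, equal-length sequences into one IUB consensus string.

    Assumes the sequences are already aligned and contain only A, C, G, T;
    any other character raises KeyError.
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    columns = zip(*(s.upper() for s in sequences))
    return "".join(IUB_CODES[frozenset(col)] for col in columns)


# A heterozygous C/T position is written as Y.
print(consensus(["ACGT", "ATGT"]))  # AYGT
```

Note that this sketch records only which bases are present at a position; it does not capture the proportional representation of each base that the notes mention.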

Advanced Concept

DNA sequences with high GC content can be difficult to read due to intrastrand hybridization in the template DNA. Reagent preparations that include 7-deaza-dGTP or dITP instead of standard dGTP improve the resolution of the bands (peaks) in regions that exhibit GC band compressions, that is, bunching of peaks close together so that they are not resolved, followed by several peaks running farther apart. (Note to self: p. 232 of the textbook is still unclear to me.)
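To see where a template might be prone to such compressions, one can scan it for GC-rich stretches. The sketch below is only a rough screening aid: the window size and the 0.7 cutoff are arbitrary assumptions, not values from the textbook.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq, window=20, threshold=0.7):
    """Return (start, fraction) for every window at or above the threshold.

    window and threshold are hypothetical screening values.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        frac = gc_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


print(gc_fraction("ATGC"))  # 0.5
print(gc_rich_windows("ATATGGCCGGATAT", window=6, threshold=1.0))  # [(4, 1.0)]
```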

Pyrosequencing (on study guide)

Designed to determine a DNA sequence without having to make a sequencing ladder. Relies on the generation of light (luminescence) when nts are added to a growing strand of DNA. No gels, fluorescent dyes, or ddNTPs.
Reaction mix: ss DNA template, sequencing primer, sulfurylase, and luciferase, plus the two substrates adenosine 5'-phosphosulfate (APS) and luciferin.
One of the four dNTPs is added to the reaction at a time, in a predetermined order. If the nt is complementary to the base in the template strand next to the 3' end of the primer, DNA polymerase extends the primer. Pyrophosphate (PPi) is released with the formation of the phosphodiester bond between the dNTP and the primer. The PPi is converted by sulfurylase to ATP, which is used to generate a luminescent signal by luciferase-catalyzed conversion of luciferin to oxyluciferin. The process is repeated, with each of the four nt again added sequentially to the reaction. The generation of a signal indicates which nt is the next correct base in the sequence.
The results from a pyrosequencing reaction, a pyrogram, consist of peaks of luminescence associated with the addition of the complementary nt. If the seq contains a repeated nt such as GG or CC, the result is a double-height dG or dC peak. The nt seq is called based on the order in which the nt were introduced to the seq reaction and on the peak heights. One primer is biotinylated for isolation of the ss template. The system is regenerated by the addition of apyrase, which degrades residual free dNTP and ATP.
Most useful for short to moderate seq analysis. It is therefore used mostly for detection of previously known mutations or single-nt polymorphisms (SNPs) and for typing (re-sequencing) rather than for generating new sequences. It has less throughput capacity than the chain termination method (which is more popular for determining DNA seq).
Applications: mutation detection, infectious disease typing, and DNA methylation analysis.
Advanced concept: Pyrosequencing requires an ss sequencing template.
Methods using streptavidin-conjugated beads have been devised to prepare the template. First, the region of DNA to be sequenced is PCR-amplified with one of the PCR primers covalently attached to a biotin molecule. The ds amplicons are then immobilized onto the beads and denatured with NaOH. After several washes to remove the non-biotinylated complementary strand (and all other reaction components), the sequencing primer is added and annealed to the pure ss DNA template.
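Calling a sequence from the dispensation order and the peak heights can be sketched as follows. This is a simplified illustration under stated assumptions: peak heights are taken to be clean multiples of a single-incorporation height (`unit_height`, a name invented here), and real pyrograms are noisier than the made-up numbers below.

```python
def decode_pyrogram(dispensation_order, peak_heights, unit_height=1.0):
    """Turn a pyrogram into the sequence of the newly synthesized strand.

    dispensation_order: the nt added at each step, e.g. "GCTAGCTA".
    peak_heights: luminescence measured after each addition.
    unit_height: assumed height of a single-nt incorporation; a peak of
    about twice this height is read as a repeated nt (e.g. GG).
    """
    if len(dispensation_order) != len(peak_heights):
        raise ValueError("one peak height is needed per dispensed nt")
    called = []
    for nt, height in zip(dispensation_order, peak_heights):
        copies = round(height / unit_height)  # 0 means no incorporation
        called.append(nt * copies)
    return "".join(called)


# Made-up pyrogram: double-height peaks for G and C, no signal for some nt.
order = "GCTAGCTA"
heights = [2.0, 1.0, 0.0, 1.1, 0.0, 1.9, 0.9, 0.0]
print(decode_pyrogram(order, heights))  # GGCACCT
```

The decoded string is the strand being synthesized, so it is complementary to the template.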

RNA Sequencing

Early methods to sequence RNA made use of ribonucleases to cut non-labeled RNA at specific nts. Another approach was to infer the mRNA sequence from the a.a. sequence. The RNA transcript seq can be determined by sequencing its complementary DNA; however, sequencing errors may occur, mostly from the cDNA synthesis step. Direct sequencing of RNA has been proposed based on single nt sequencing technology and virtual terminator nt. mRNA is captured by immobilized poly(dT) oligomers. For RNA species without poly(A) tails, an initial treatment with poly(A) polymerase is performed to add a 3' A-tail. The 3' ends of the captured RNA are chemically blocked to prevent extension in the sequencing step. Four reversibly dye-labeled nt are then sequentially added. An image is taken, the extension inhibitors are cleaved, and alternating C, T, A, or G nt are added, with imaging, cleavage, and rinsing between each nt addition. After this process is repeated many times, the collected images are aligned and used to build the seq from each poly(dT) anchor.

Sequence Quality

Instrument collection and sequencing software batch the sequences for each sample based on the bar codes and identify the nt order in the process of base calling. Each base is assessed for quality of imaging (or conductance detection) and given a Phred score; a score of 2 to 3 on a log10 scale (100- to 1,000-fold certainty of a correct call, reported as Q20 to Q30 on the standard Phred scale) is acceptable. Each seq is then compared to a ref seq through read alignment. Ref sequences are considered "normal" in that there are no known significant variants; however, there is no real "normal" seq. Variations from the ref may be the majority allele in the population, with the ref seq carrying the minor allele. In humans, reference genome hg19 is frequently used. Ref sequences are free of known disease-related alleles, at least those found in the target panels.
The next step is variant identification based on comparison with the reference sequence. There are diff types of variants, including single-nt variants (SNVs), small insertions and/or deletions of nts (indels), rearrangements of sequences (e.g., translocations), and copy-number variants (CNVs; amplification or deletion of larger regions). Each of these types is handled differently by comparison software.
- Constitutional (genetically inherited) SNVs are identified in some programs based on a specific range of expected allele frequencies (variant allele/reference allele) for homozygosity or heterozygosity.
- Indels (up to 20 bp) can be identified by realignment, that is, multiple alignments (offset by one or more bases) that minimize base mismatches. Indels and even larger rearrangements can be detected by overlapping reads of paired-end-primed sequences or by points of seq divergence from 5' and 3' end reads (split-read analysis).
- Translocation breakpoints are often within introns or repetitive DNA sequences, or they contain overlying sequence changes at the breakpoint, posing further challenges for variant identification.
Once aligned, sequence variations from the reference (variants) are arranged in a variant call file (VCF). The VCF is a text file that may be archived for further reference.
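The link between a Phred score and the certainty of a base call can be made concrete with the standard definition Q = -10 x log10(P), where P is the probability that the call is wrong. The sketch below assumes that definition; under it, 100- to 1,000-fold certainty corresponds to Q20 to Q30. The function names are my own.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


for q in (20, 30):
    p = phred_to_error_probability(q)
    # Q20 is about 1 wrong call in 100, Q30 about 1 in 1,000.
    print(f"Q{q}: about 1 wrong call in {round(1 / p):,}")
```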

Allele Dropout

Loss of library fragments from the sequenced regions. This causes inaccurate assessment of variant allele frequencies. Primers can be designed to produce overlapping sequences to cover less optimal regions. Paired-end or mate-pair primers produce coupled sequence fragments separated by 30-50 kb. By overlapping these reads, large variations not detectable in a few hundred bp, such as translocations, can be detected. Both primer- and probe-based selections are affected by GC-rich sequencing targets. Secondary structure lowers the binding of primers and probes. GC-rich sequences also "clamp" primers in amplicon-based enrichment, lowering PCR efficiency. AT-rich regions may also be subject to poor hybridization, leading to loss of sequencing template fragments.
See also, on Illumina paired-end sequencing: https://seekdeep.brown.edu/illumina_paired_info.html

Sequencing platforms

NGS was introduced as a pyrosequencing technology. The two most frequently used methods in clinical applications are ion-conductance and reversible dye terminator sequencing. Both involve sequencing by synthesis and can be compared, chemically, to pyrosequencing and Sanger sequencing.
Ion-conductance sequencing TLDR: amplification on beads by ePCR, then the beads are placed on a chip for ion-based sequencing, where H ions are given off as nt are added and the drop in pH identifies the nt sequence.
In ion-conductance sequencing, indexed libraries (gene panels) are amplified using primers immobilized on microparticles (beads) in an aqueous oil emulsion, using adapters on the library fragments complementary to the immobilized primers. The beads carrying the amplicons (sequence templates) are placed on a solid surface (gene chip). The captured fragments are subjected to the addition of nt in a predetermined order. If the nt is complementary to the sequencing template, DNA poly catalyzes the formation of a phosphodiester bond. A hydrogen ion is released (along with pyrophosphate) upon formation of the phosphodiester bond. The hydrogen ion lowers the pH of the reaction by a specific amount recorded by the sequencer. This identifies the nt. The reaction occurs hundreds of thousands of times, producing sequence information from millions of sequencing panel library fragments.
Reversible dye terminator sequencing TLDR: the panel is amplified by bridge PCR through primer-binding sites complementary to primers immobilized on the flow cell. Amplification in place on the solid surface produces batches, or polonies, of sequencing templates distributed evenly across the flow cell. Fluorescently labeled nt are then added to the flow cell and images are taken to determine the seq.
In reversible dye terminator sequencing, captured or amplified fragments are hybridized to immobilized primers on a solid surface (flow cell).
The fragments hybridize to the immobilized primers and are amplified by bridge PCR into collections of products, or polonies. The proper conc (6 to 20 pMol) of library DNA introduced to the flow cell ensures that the polonies are evenly spaced on the flow cell. The polonies are sequenced in place by the sequential addition of fluorescently labeled nts. If a nt is complementary to the template next to the primer, DNA poly extends the primer. As in Sanger sequencing, each nt is labeled with a specific color of fluor. An image is taken of the flow cell after each nt addition (cycle), recording the presence of each added nt color and its location. After imaging, the fluorescent dyes are removed, and the next nt is added. Hundreds of thousands of polonies are sequenced simultaneously in this way.
Other sequencing platforms, such as sequencing by ligation and nanopore sequencing, are used in research applications.
Sequencing by ligation uses a pool of labeled oligonucleotide probes and DNA ligase to identify the template sequence through the known probe sequences. It uses short, fluorescently labeled oligomers that hybridize in short increments if they are complementary to the DNA template. The DNA template is anchored to a glass slide. If the oligo is complementary to the template, it is ligated, and two bases are detected at a time. The oligonucleotide is then cleaved, followed by the next round of ligation. Each time, two new nts are detected.
Nanopore sequencing (or long-read single-molecule sequencing) has the advantage of not requiring fragmentation and amplification of the template DNA. It uses protein pores (ion channels) through which one strand of each long dsDNA molecule (up to 1 Mb) is drawn. Each nt is identified by the characteristic change in current it causes as it passes through the pore. It can also be used for direct RNA sequencing.

Next-Generation Sequencing (Massive Parallel Sequencing) Examples

Pyrosequencing, Bead array, Ligation, Phospholinked fluorescent nucleotides, Ion conductance, Reversible dye terminator, Single-molecule long range

Chemical Maxam-Gilbert

Required a double- or single-stranded version of the DNA region to be sequenced, with one end radioactively labeled. The labeled fragment, or template, was aliquoted into four tubes. Each aliquot was treated with a diff chemical, with or without high salt. Upon addition of a strong reducing agent, such as 10% piperidine, the ss DNA would break at specific nt. Each base modifier carries out its own reaction. The base modifiers are dimethylsulphate, formic acid, hydrazine, and hydrazine + salt. After the reactions, the fragments are separated by size on a denaturing polyacrylamide gel. The sequence can be inferred from the bands on the film. The lane in which a band appears identifies the nt. The gel is read from the bottom (5') to the top (3') of the sequence. The size of the fragments gives the order of the nucleotides. Bands in the purine (G+A) or pyrimidine (C+T) lane were called based on whether they were also present in the G-only or C-only lane.
Pro: efficient for determining short runs of sequence data.
Con: not practical for high-throughput sequencing of long fragments. Also, the hazardous chemicals hydrazine and piperidine require more elaborate precautions for use and storage. Thus it was replaced by dideoxy chain termination.
Advanced: To make a radioactive sequence template, 32P ATP is added to the 5' end of a DNA fragment using polynucleotide kinase, or to the 3' end using terminal transferase plus alkaline hydrolysis to remove excess adenylic acid residues. DS fragments labeled at only one end are also produced by using an RE to cleave a labeled fragment asymmetrically, and the cleaved products are isolated by gel electrophoresis. Alternatively, denatured single strands are labeled separately, or a "sticky" end of an RE site is filled in, incorporating radioactive nt with DNA polymerase.
Side note: polyacrylamide gels (6-20%) are used for sequencing. Bromophenol blue and xylene cyanol loading dyes are used to monitor the migration of fragments. Run times range from 1 to 2 hours for short fragments (up to 50 bp) to 7-8 hr for longer fragments (more than 150 bp).
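The lane-comparison rule for calling bands can be sketched as below. This is a simplified model: each band is represented only by its fragment size, sizes are treated as consecutive positions, and the lane names and function name are assumptions made for the example.

```python
def read_maxam_gilbert(lanes):
    """Call a sequence from band sizes in the G, G+A, C+T, and C lanes.

    lanes maps each lane name to the fragment sizes seen in that lane.
    A band in G+A is G if it also appears in the G lane, otherwise A;
    a band in C+T is C if it also appears in the C lane, otherwise T.
    Bands are read from the smallest (bottom of the gel) to the largest.
    """
    g, ga, ct, c = (set(lanes[name]) for name in ("G", "G+A", "C+T", "C"))
    sequence = []
    for size in sorted(g | ga | ct | c):
        if size in ga:
            sequence.append("G" if size in g else "A")
        elif size in ct:
            sequence.append("C" if size in c else "T")
        else:
            sequence.append("N")  # band missing from the combined lane
    return "".join(sequence)


# Made-up gel: sizes 1-5 correspond to the sequence GATCA.
gel = {"G": [1], "G+A": [1, 2, 5], "C+T": [3, 4], "C": [4]}
print(read_maxam_gilbert(gel))  # GATCA
```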

Targeted libraries

Routine clinical sequencing of human DNA does not include the entire genome. The regions to be sequenced, such as a few genes or the whole exome, determine the gene panel, and these regions are enriched by probe hybridization or by amplification with region-specific primers. Probes are biotinylated oligonucleotides complementary to specific gene regions. Targeted fragments to be sequenced are selected by hybridization with the biotinylated probe and captured with streptavidin-coated beads. The captured regions are ligated to adapters carrying primer-binding sites (or amplified with primer-binding sites included with short oligo probes) so that all reactions can proceed under the same amplification conditions in a single PCR reaction. Probes may be short oligomers that can be extended across the region to be sequenced. The selected regions can then be amplified with tailed primers to add barcodes and sequencing primer-binding sites. Probe-based enrichment has the advantage of capturing sequences surrounding the region of interest and providing information from neighboring sequences. The presence of surrounding regions should be balanced, because too much additional sequencing will affect the accurate sequencing of the targeted regions. The balance depends on the average length of the DNA fragments.
Amplicon-based target libraries are selected by multiplex PCR with gene-specific primers tailed with binding sites for a secondary primer set. After amplification, the secondary primers are tailed with index sequences that identify (bar code or index) fragments from multiple samples in the same sequencing reaction and with adapter sequences complementary to immobilized oligonucleotides anchored in the sequencing platform. These steps may be combined by tailing the initial multiplex PCR primers with the index and adapter sequences.

NGS Library Preparation

Sequencing library: a collection of DNA fragments to be sequenced. Reversible dye terminator and ion-conductance sequencing are performed on DNA fragments less than 1,000 bp in length. Genomic DNA is fragmented by a number of methods, including shearing with high-frequency acoustic energy, sonication, nebulization (forcing DNA molecules through a small opening), or enzymatic treatments. Particular methods and how they are used produce differently sized fragments (100 to 1,000 bp). The median fragment size can be checked by gel electrophoresis or microfluidics. The starting DNA conc and the DNA conc of the library are best measured by fluorometry. Fragmented DNA produced by enzymatic or physical methods may be used directly for whole-exome or whole-genome sequencing.
The fragments will have a mixture of 5' and 3' overhangs, some phosphorylated. To facilitate ligation to synthetic adapters, the ss fragment ends are removed or filled in with nuclease or polymerase treatment; that is, the ss ends are repaired to ds ends (end repair) after fragmentation. The 5' ends are phosphorylated. The 3' ends can be adenylated to further enable ligation to adapters with T overhangs. The adapters carry primer-binding sites for PCR amplification of the library. The adapters also allow the seq to bind to the chip. See https://www.youtube.com/watch?v=-kTcFZxP6kM
Adapters are short synthetic dsDNA pieces carrying sequences complementary to a single primer pair. The adapters may also contain short sequences that identify the sample (indexing or barcoding). This allows analysis of multiple samples in the same reaction, as the sequencing software will put together sequences from fragments with the same barcode. Indexes can be on tailed primers for PCR or included in adapters. Small genomes, such as those of microbes or plasmids, can be simultaneously fragmented and ligated to sequencing adapters in a single reaction tube.
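How software groups reads by barcode (demultiplexing) can be sketched as follows. This is a toy model: it assumes every barcode has the same length and sits at the very start of each read, which is a simplification made for the example (platforms may read the index separately), and the sample names are invented.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by sample according to the barcode at the start of each read.

    barcodes maps a barcode sequence to a sample name. Assumes all barcodes
    have the same length and are located at the start of the read.
    Reads with an unknown barcode go to "undetermined", untrimmed.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("barcodes must be non-empty and of equal length")
    n = lengths.pop()
    by_sample = defaultdict(list)
    for read in reads:
        index = read[:n]
        if index in barcodes:
            by_sample[barcodes[index]].append(read[n:])  # trim the barcode
        else:
            by_sample["undetermined"].append(read)
    return dict(by_sample)


barcodes = {"ACGT": "sample1", "TGCA": "sample2"}
reads = ["ACGTTTGACC", "TGCAGGATCC", "ACGTCCATGG", "GGGGAAAATT"]
print(demultiplex(reads, barcodes))
```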

Sequencing DNA input (Advanced concept)

Sequencing technologies differ in the amount of input genomic DNA they require. The lower limits range from 10 to 50 ng of DNA. For sequencing tumor DNA from fixed tissue, 140 mm^2 of tissue with at least 30% tumor is recommended. Too little starting material increases PCR artifacts.

Annotation

TLDR: Not every variant is of biological or clinical consequence. Some variants are synonymous, or silent, with regard to protein sequence. Others are common polymorphisms found in the population. Therefore, annotation is performed to identify critical variants.
There are several components of annotation. The confidence in the variant call is determined by the sequence quality and coverage. Coverage is critical for confident detection of variants that are of low frequency in the sample, such as somatic mutations in heterogeneous tumor tissue. Coverage of at least 500x (total of forward and reverse sequences) is recommended for detection of somatic variants. The chromosomal and sequence location of the variant in context with the reference sequence is identified, along with the type of variant (SNP, insertion, deletion, or complex).
The variant is then subjected to filtering (TLDR: filtering keeps the variants that affect protein and removes common polymorphisms). SNPs are compared to previously reported variants identified as human genome polymorphisms with the SNP's rs identification number. Variants may be categorized as genetic or somatic in origin and, if genetic, as homozygous or heterozygous with the reference allele. For gene panels and exome sequencing, variants will likely be found in gene-coding regions and adjacent intronic sequences, although intergenic areas may also be covered. The particular gene affected and the location of the variant in the exon, intron, or intergenic sequences are noted. For variants found in introns, any effects on splicing are assessed. Variant effects on protein can also be estimated using algorithms such as PolyPhen and SIFT. Silent variants will not change the a.a. seq, but codon usage may have an effect on translation efficiency. Conservative a.a. substitutions, or those late in the protein seq, have less effect on protein fn than nonconservative mutations located early in the protein seq.
Algorithms provide scores to indicate the degree of damage to protein structure or fn caused by the seq variant. Variants that remain after filtering may be annotated by searching in disease-specific databases, such as the Cancer Genome Atlas (TCGA), the Catalogue of Somatic Mutations in Cancer (COSMIC), My Cancer Genome, the Leiden Open (Source) Variation Database (LOVD), and the Human Gene Mutation Database (HGMD). These databases and others contain population and clinical data associated with previously observed variants. The info from these databases can assist with the interpretation of the clinical effect of a variant. Final reports of variants may contain info from databases, including effects on therapeutic treatments (especially targeted therapies), clinical trials, and prognosis. The clinical significance of a variant may differ with the heterogeneity of disease states as well as patient characteristics and demographics (age/gender).
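The coverage and filtering steps can be illustrated with a toy filter. Everything here is a sketch under stated assumptions: the `VariantCall` record is hypothetical, variant allele frequency is computed as variant reads over total reads (one common convention), the 500x coverage default comes from the notes above, and the 5% frequency cutoff is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical record for one variant position."""
    name: str
    ref_reads: int                     # reads supporting the reference allele
    alt_reads: int                     # reads supporting the variant allele
    known_polymorphism: bool = False   # e.g. already listed with an rs number

    @property
    def coverage(self):
        return self.ref_reads + self.alt_reads

    @property
    def vaf(self):
        """Variant allele frequency: variant reads / total reads (assumed)."""
        return self.alt_reads / self.coverage if self.coverage else 0.0


def filter_variants(calls, min_coverage=500, min_vaf=0.05):
    """Keep calls with enough coverage and frequency that are not known polymorphisms."""
    return [
        c for c in calls
        if c.coverage >= min_coverage
        and c.vaf >= min_vaf
        and not c.known_polymorphism
    ]


calls = [
    VariantCall("v1", ref_reads=450, alt_reads=50),    # kept
    VariantCall("v2", ref_reads=100, alt_reads=100),   # coverage too low
    VariantCall("v3", ref_reads=300, alt_reads=300, known_polymorphism=True),
    VariantCall("v4", ref_reads=990, alt_reads=10),    # frequency too low
]
print([c.name for c in filter_variants(calls)])  # ['v1']
```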

Dideoxy Chain Termination (Sanger) Sequencing On study guide

TLDR: A modification of the replication process. A short, synthetic, ss DNA fragment (primer) complementary to sequences just 5' to the region of DNA to be sequenced is used for priming dideoxy sequencing reactions. There are 4 reaction tubes, each with a diff ddNTP. Run the gel, then expose it to x-ray film. The sequence is read from the bottom of the gel (smallest, 5'-most) to the top (largest, 3'-most) fragments, across or within lanes, to determine the identity and order of nucleotides in the sequence.
For detection of the products of the sequencing reaction, the primer is attached covalently at the 5' end to a 32P-labeled nt or fluorescent dye-labeled nt. Manual dideoxy sequencing requires an ss version of the fragment to be sequenced (template) and a primer. The sequence of the template is determined by extension of the primer in the presence of dideoxynucleotides. The primer provides a free 3' OH. The primer anneals just before the sequence being determined. A previously used alternative detection strategy was to incorporate 32P- or 35S-labeled deoxynucleotides into the nt sequencing reaction mix (internal labeling).
Modified dideoxynucleotide (ddNTP) derivatives are added to the reaction mixture. These lack the hydroxyl group found on the 3' ribose carbon of the deoxynucleotides (dNTPs). DNA synthesis stops upon incorporation of a ddNTP into the growing DNA chain (chain termination) because, without the hydroxyl group at the 3' sugar carbon, the 5' to 3' phosphodiester bond cannot be established to incorporate a subsequent nt. The newly synthesized chain terminates with the ddNTP. That 3' hydroxyl is required for forming the bond with the phosphate group of another nt.
A 1:1 mixture of template and radioactively labeled primer is placed into four separate reaction tubes in sequencing buffer containing the sequencing enzyme and the ingredients necessary for polymerase activity.
Mixtures of all four dNTPs and one of the four ddNTPs are then added to each tube, with a diff ddNTP in each of the four tubes. The ratio of ddNTPs/dNTPs is critical for generating a readable sequence. If the conc of ddNTPs is too high, polymerization will terminate too frequently, early along the template. If the ddNTP conc is too low, termination will be infrequent or will not occur. The reaction begins with the addition of DNA polymerase to the four tubes. After about 29 mins the reactions are terminated by adding a stop buffer, which consists of 20 mM EDTA to chelate cations and stop enzyme activity, formamide to denature the products of synthesis, and gel loading dyes (bromophenol blue and/or xylene cyanol). All four rxns are carried out for equal times to provide consistent band intensities in all four lanes of the sequencing gel. In summary, the components required for DNA synthesis (template, primer, enzyme, buffers, dNTPs) are mixed with a diff ddNTP in each of four tubes. The newly synthesized strands of DNA terminate at each opportunity to incorporate a ddNTP. The resulting synthesis products are a series of fragments ending in A (ddATP), G (ddGTP), T (ddTTP), or C (ddCTP). All fragments from a given tube end in the same ddNTP. The sets of synthesized fragments are then loaded onto a denaturing polyacrylamide gel, with the products of each of the four sequencing reactions in adjacent lanes labeled A, C, G, or T, corresponding to the ddNTP in each reaction tube. Once the gel is dried and exposed to x-ray film, the fragment patterns are visualized by the signal of the 32P-labeled primer (or incorporated deoxynucleotide). The four-lane gel electrophoresis pattern of the products of the four sequencing reactions is called a sequencing ladder. The ladder is read to deduce the DNA sequence.
From the bottom of the gel, the smallest (fastest-migrating) fragment is the one in which synthesis terminated closest to the primer. The identity of the ddNTP at a particular position is determined by the lane in which the band appears. If the smallest band is in the ddATP lane, then the first base is A. The next larger fragment is the one that terminated at the next position on the template. The sequence is read from the bottom of the gel (smallest, 5'-most) to the top (largest, 3'-most) fragments, across or within lanes, to determine the identity and order of nucleotides in the sequence. Capacity for a sequence read averages over 500 bases per read. Con: larger bands on a sequencing gel can be compressed, limiting the length of sequence that can be read on a single gel run. Advancement: with heat-stable enzymes, the sequencing reaction can take place in a thermal cycler (cycle sequencing). With cycle sequencing, timed manual starting and stopping of the sequencing reactions is not necessary, and more reactions can be done at once: 96 sequencing reactions (4 per fragment, so 24 fragments) fit in a 96-well plate. Advanced concepts: PCR products are currently used as sequencing templates. Residual components of the PCR rxn, esp primers and nucleotides, can interfere with the sequencing reaction and lower the quality of the sequencing ladder. PCR amplicons can be cleaned using solid-phase (column or bead) matrices, alcohol precipitation, or enzymatic digestion with alkaline phosphatase. Alternatively, amplicons can be run on an agarose gel and the bands eluted. This method provides not only a clean template but also confirmation of the product being sequenced. It is especially useful when PCR reactions are not completely free of misprimed bands or primer dimers.
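
Reading a four-lane ladder from the bottom of the gel upward can be sketched in a few lines of Python. This is only an illustration: the function name `read_ladder` and the example band positions are made up for this card, and each band is represented simply by its fragment length past the primer. The string it returns is the newly synthesized strand, read 5' to 3'.

```python
def read_ladder(lanes):
    """Read a four-lane sequencing gel from bottom (shortest) to top (longest).

    lanes maps each ddNTP lane label ("A", "C", "G", "T") to the fragment
    lengths (in nucleotides past the primer) of the bands seen in that lane.
    Returns the newly synthesized strand, 5' to 3'.
    """
    bands = []
    for base, lengths in lanes.items():
        for length in lengths:
            bands.append((length, base))
    # The smallest fragment migrates fastest and terminated closest to the primer,
    # so sorting by length reproduces the bottom-to-top reading order.
    bands.sort()
    return "".join(base for _, base in bands)


# Hypothetical gel: the smallest band is in the ddATP lane, so the first base is A.
gel = {
    "A": [1, 5],
    "C": [3],
    "G": [2, 6],
    "T": [4],
}
print(read_ladder(gel))  # AGCTAG
```

Each band's lane gives the base and its position on the gel gives the order, which is all the manual read uses.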

The Human Genome Project

TLDR, order of sequenced genomes: herpesvirus, bacteria. The first complete genome sequence of a clinically important organism was that of Epstein-Barr virus. Its 170,000-bp sequence was determined using the M13 template preparation/chain termination manual sequencing method. The idea of sequencing the human genome came about in 1985 and 1986. It was controversial because of the risk that the $2 to $5 billion cost of the project might not justify the info gained, most of which would be sequences of "junk" or non-gene-coding DNA. There was also no available technology up to the massive task: the sequencing automation and the computer power necessary to assemble the 3 billion bases of the human genome into an organized sequence of 23 chromosomes had not yet been developed. In 1987 the first automated DNA sequencing machine was announced. Advances in the chemistry of the sequencing procedure were accompanied by advances in the biology of DNA mapping, with methods such as pulsed-field gel electrophoresis, restriction fragment length polymorphism analysis, and transcript identification. Methods were developed to clone large (500 kbp) DNA fragments in artificial chromosomes, providing long contiguous sequencing templates. Finally, application of capillary electrophoresis to DNA resolution made the sequencing procedure even more rapid and cost-efficient. NIH established the Office of Human Genome Research. The plan was to complete 20 Mbp of sequence of model organisms by 2005. To organize and compare the growing amount of sequence data, the BLAST and Gene Recognition and Assembly Internet Link (GRAIL) algorithms were introduced. For the human sequence, the decision was made to use a composite template from multiple individuals rather than a single genome from one donor. Human DNA was donated by 100 anonymous volunteers; only 10 of these genomes were sequenced. Not even the volunteers knew if their DNA was used for the project. To ensure accurate and high-quality sequencing, all regions were sequenced 5 to 10 times.
The Human Genome Project ran from 1990 to 2000 (working draft); the final sequence came in 2003. A second project with the same goal was also started. In 1992 The Institute for Genomic Research (TIGR) was established; it completed the first sequence of a free-living organism (Haemophilus influenzae) and the sequence of the smallest free-living organism (Mycoplasma genitalium). The company Celera was then started, and Venter proposed to complete the human genome sequence in 3 yrs for $300 million, faster and cheaper than the NIH project. Thus there was a competitive effort on two fronts to sequence the human genome. The two projects approached the sequencing differently. The NIH method (hierarchical shotgun sequencing) was to start with sequences of known regions in the genome and "walk" further away into the chromosomes, always aware of where the newly generated sequences belonged in the human genome map. Celera's approach (whole-genome shotgun sequencing) was to start with 10 equivalents of the human genome cut into small fragments and randomly sequence the lot. Then, powerful computers would find overlapping sequences and use those to assemble the billions of bases of sequence into their proper chromosomal locations. This method met with skepticism because the human genome contains large amounts of repeated sequences, which are difficult to sequence and map properly. A random sequencing method would repeatedly cover areas of the genome that are more easily sequenced and miss more difficult regions. Moreover, assembly of the whole sequence from scratch with no chromosomal landmarks would take a prohibitive amount of computer power. Eventually the NIH project modified its approach to include both methods. The result of the competition was that the rough draft of the sequence was completed by both projects in June 2000, earlier than either group had proposed. Both groups published their versions of the genome. This 2000 version was a rough draft, as there were still areas of missing sequence and sequences yet to be placed.
Only chromosomes 21 and 22, the smallest of the chromosomes, had been fully completed. Over the years the other chromosome sequences were finished. Remaining errors, gaps, and complex gene rearrangements will take years to resolve. The size of the entire genome is 2.91 Gbp (2.91 billion bp). The genome was initially calculated as 54% AT and 38% GC, with 8% of the bases still to be determined. Chromosome 2 is the most GC-rich chromosome (66%), and chromosome X has the fewest GC bp (25%). The number of genes, estimated at 20,000 to 30,000, was much lower than expected. The average size of a human gene is 27 kbp. Chromosome 19 is the most gene-rich per unit length (23 genes/Mbp). Chromosomes 13 and Y have the fewest genes per bp (5 genes/Mbp). Only about 2% of the sequences code for genes. Between 30% and 40% of the genome consists of repeat sequences. A single-base difference between two random individuals is found approx every 1,000 bases along the human DNA sequence. With proper mapping info, a gene for any disease, already sequenced, can now be found by computer in a matter of minutes. However, most diseases and normal states are driven by a combination of genes as well as by environmental influences, not by a single gene.
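
The whole-genome shotgun idea described above (sequence random fragments, then let a computer find overlaps and stitch them together) can be illustrated with a toy greedy assembler. This is a minimal sketch under loud assumptions: the function names, the `min_len` overlap threshold, and the three short reads are hypothetical, and real assemblers are far more sophisticated than this.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps; leave the pieces as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


# Three hypothetical overlapping reads from one short region.
print(greedy_assemble(["ATGGCGT", "GCGTACA", "TACAGGT"]))  # ['ATGGCGTACAGGT']
```

The sketch also hints at why repeats caused skepticism: if the same stretch occurs in several places, more than one merge looks equally good and overlaps alone cannot say where a read belongs.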

Variant Associations with Phenotype

The Human Haplotype Mapping Project: aka the HapMap Project. Goal: to find blocks of sequences that are inherited together, marking particular traits and possibly disease-associated genetic lesions. This would reduce the number of polymorphisms required to examine the entire collection of genome/phenotype associations from the 10 million polymorphisms that exist to roughly 500,000 haplotypes. It revealed more than 1,000 disease-associated regions of the genome, covering commonly occurring conditions such as coronary artery disease and diabetes. However, it was retired as other resources, like the 1000 Genomes Project, became more comprehensive references for population genomics. The 1000 Genomes Project: provides a resource of structural variants in diff populations. The project has reconstructed the genomes of 2,504 individuals from 26 populations by whole-genome sequencing, deep exome sequencing, and dense microarray genotyping in labs in the US, UK, China, and Germany. Over 88 million variants (84.7 million SNPs, 3.6 million short insertions/deletions, and 60,000 structural variants) were verified. The resulting database includes more than 99% of single-nt variations with a frequency greater than 1%. Data from the 1000 Genomes Project are a component of NGS variant assessment, providing more patient-specific interpretation of the clinical significance of variants. All variants from the 1000 Genomes Project are submitted to archives such as dbSNP. The technology developed as part of the Human Genome Project made sequencing a routine method in the clinical lab. In the clinical lab, sequencing is actually resequencing, or repeated analysis of the same sequence region, to detect mutations or to type microorganisms, making the task even more routine. HapMap used microarray technology to detect SNPs. NGS projects use next-gen sequencing, which is broader because it covers more SNPs. Thus there are a lot of SNPs from NGS projects that are not reported in HapMap. Massively parallel, or next-generation, sequencing has supplemented or replaced Sanger sequencing in many clinical labs. Accurate and comprehensive sequence analysis is one of the most promising areas of molecular diagnostics.

Electrophoresis

The fluorescent dye colors, rather than lane assignment, distinguish which nt is at the end of each fragment. Running all four reactions together increases throughput and also eliminates lane-to-lane migration variations that affect accurate reading of the sequence. The migrating fragments pass a laser beam and a detector in the automated sequencer. The laser beam excites the dye attached to each fragment, causing the dye to emit fluorescence that is captured by the detector. The detector converts the fluorescence to an electrical signal that is imaged by the computer software as a flash or peak of color. Fluorescent detection equipment yields results as an electropherogram rather than a gel pattern. Just as the gel sequence is read from the smallest (fastest-migrating) fragments to the largest, the sequencing software reads or "calls" the bases from the smallest (fastest-migrating) fragments that first pass the detector to the largest, based on the dye emission wavelength; that is, the software calls the base by the "color" of the fluorescence of the fragment as it passes the detector. The electropherogram is a series of peaks of the four fluorescent dyes as the bands of the sequencing ladder migrate by the detector. The software assigns one of four colors (red, black, blue, or green), each associated with one of the fluorescent dyes, and a text letter to the peaks for ease of interpretation. As with manual sequencing, the ratio of ddNTPs/dNTPs is key to the length of the sequence read (how much of the template sequence can be determined). Too many ddNTPs will result in a short sequence read. Too low a conc of ddNTPs will result in loss of sequence data close to the primer but give a longer read, because the sequencing enzyme will polymerize further down the template before it incorporates a ddNTP into the growing chain. The quality of the sequence (height and separation of the peaks) improves away from the primer and begins to decline at the end.
At least 400 to 500 bases can be easily read with most sequencing chemistries.
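
Calling bases by dye color, as described above, can be sketched as picking the strongest of the four dye signals at each peak. This is an illustrative sketch only, not how any particular instrument's software works: the dye-to-base mapping below is an assumed convention (the notes list the four colors but not which base each one marks), and `call_bases` and its `min_ratio` threshold are hypothetical. An unclear peak is reported as N, as in the Sequence interpretation card.

```python
# Assumed color convention for illustration only.
DYE_TO_BASE = {"green": "A", "blue": "C", "black": "G", "red": "T"}


def call_bases(peaks, min_ratio=2.0):
    """Call one base per peak from four-color fluorescence intensities.

    peaks is a list of dicts (dye color -> signal intensity), ordered as the
    fragments pass the detector, smallest fragment first.
    A peak is called only if its strongest dye is at least min_ratio times
    the second strongest; otherwise it is reported as N.
    """
    calls = []
    for signal in peaks:
        ranked = sorted(signal.items(), key=lambda kv: kv[1], reverse=True)
        (top_dye, top), (_, second) = ranked[0], ranked[1]
        if top > 0 and top >= min_ratio * second:
            calls.append(DYE_TO_BASE[top_dye])
        else:
            calls.append("N")
    return "".join(calls)


# Hypothetical trace: two clean peaks, then one with two dyes of similar height.
trace = [
    {"green": 900, "blue": 30, "black": 20, "red": 10},
    {"green": 15, "blue": 20, "black": 800, "red": 25},
    {"green": 400, "blue": 20, "black": 10, "red": 380},
]
print(call_bases(trace))  # AGN
```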

Gene Panels

The size and application of the sequencing library depend on the selection of genes to be sequenced, or gene panels. Gene panels are probes or primer sets designed to amplify specific genes, regions, or entire exomes (all protein-coding sequences). NGS might also be performed to compare sequences of many organisms (rRNA genes for microbial speciation) or to detect large numbers of possible base differences in a highly polymorphic gene such as CFTR. Gene panels have been designed for disease states such as cardiomyopathies, muscular dystrophy, or cancers. These panels range from a few (fewer than 20) target genes to more than a thousand target genes, such as those used for solid organ cancers. "Hot-spot" panels target regions of specific genes known to affect treatment response, disease state, or clinical condition. Variants in these regions are referred to as "actionable" mutations; that is, a therapeutic or medical measure might be taken as a result of the presence of the variant. Targeted panels include critical genes in particular diseases, such as hematological-cancer-specific panels for lymphoid or myeloid disorders or solid-tumor-specific panels for lung, colon, breast, or other cancers. Very large panels of up to 3,000 genes may produce variants of unknown significance that must be assessed by pathologists and oncologists on a patient-specific basis. Whole-exome sequencing is a method of gene discovery. This approach, more challenging with regard to interpretation, has proven beneficial in cases of suspected inherited gene variants. Initially beyond the scope of clinical analysis, whole-exome and even whole-genome sequencing have been increasingly incorporated in special cases. For routine clinical laboratory work, small to medium-size panels (15 to 500 genes) account for the majority of sequencing procedures.

Sequence Software Programs

These programs can compare two sequences, or test sequences with reference sequences, to identify mutations or polymorphisms. Regardless of whether a sequence variant (change from a reference sequence) is found, it is important to sequence both complementary strands of DNA to confirm the sequence data. This is especially critical for confirmation of mutations or polymorphisms in a sequence. What mutations/polymorphisms look like: 1. Alterations affecting a single base pair may be subtle on an electropherogram, especially if the alteration is in the heterozygous form, or mixed with the normal reference sequence. Ideally a genetically heterozygous mutation appears as two peaks of equal height but diff colors, directly one on top of another, that is, at the same position in the electropherogram. The overlapping peaks should be about half the height of single-base peaks. 2. Heterozygous deletions or insertions (e.g., BRCA frameshift mutations) affect all positions of the sequence downstream of the mutation. Ex: with a heterozygous dinucleotide deletion, two sequences are overlaid from the deletion onward: the normal sequence (A followed by G) and the normal sequence minus the two bases, so a T signal also appears there because T is the next nt after the deletion. Somatic mutations in clinical specimens are sometimes difficult to detect because they may be diluted by normal sequences that mask the somatic change. Software for capillary electrophoresis comes with the instrument or is available online. Software that interprets, compares, or otherwise manipulates sequence data is sometimes supplied with the instrument or is available online. Interpreting disease association and pathogenic significance requires the use of sequence databases and clinical trial information, available on public websites and in institutional "data commons" collections.
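
The ideal heterozygous pattern described above (two peaks of similar height at the same position) can be sketched as a simple peak-height comparison. This is a hedged illustration, not the method used by any particular program: `call_position` and the `het_ratio` cutoff are hypothetical, and the two-letter results use the standard IUPAC ambiguity codes (for example R for A/G), which these notes do not themselves mention.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def call_position(heights, het_ratio=0.5):
    """Call one electropherogram position from the four base peak heights.

    heights maps "A", "C", "G", "T" to the peak height at this position.
    If the second-highest peak is at least het_ratio of the highest, the
    position is reported as heterozygous using an IUPAC ambiguity code.
    """
    ranked = sorted(heights.items(), key=lambda kv: kv[1], reverse=True)
    (b1, h1), (b2, h2) = ranked[0], ranked[1]
    if h1 <= 0:
        return "N"  # no signal at all
    if h2 >= het_ratio * h1:
        return IUPAC[frozenset((b1, b2))]
    return b1


print(call_position({"A": 1000, "C": 10, "G": 40, "T": 5}))  # A (single clean peak)
print(call_position({"A": 480, "C": 10, "G": 450, "T": 5}))  # R (A/G heterozygote)
```

A somatic variant diluted by normal sequence would give a much smaller second peak and could fall below any fixed cutoff like this, which is why the notes stress confirming on both strands.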

Manual Sequencing

Two types: Chemical (Maxam-Gilbert) Sequencing and Dideoxy Chain Termination (Sanger) Sequencing

Direct Sequencing

Two types: manual and automated. Direct determination of the nucleotide sequence, or DNA sequencing, is the most definitive molecular method to identify genetic lesions or polymorphisms, especially when looking for changes affecting only 1 or 2 nt.

Difficult areas to sequence:

centromeres, polymorphic regions (major histocompatibility complex [MHC])

next-generation sequencing for

genomic or population analyses

Link comparing popular types of sequencing such as sequencing by synthesis and pyrosequencing

https://www.youtube.com/watch?v=jFCD8Q6qSTM&t=122s sequencing by synthesis illumina https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology.html

Next-Generation Sequencing (on study guide)

Need to watch! https://www.youtube.com/watch?v=jFCD8Q6qSTM&t=122s Oligo arrays did not provide genomic-scale sequence data with single-bp resolution. Next-generation sequencing (NGS), also called massively parallel sequencing, was designed to sequence large numbers of templates carrying millions of bases simultaneously, in a run that takes a few hours. NGS technology has achieved gigabytes of sequencing data for a minimal cost, making genomic studies a routine component of both research and clinical analysis. NGS technologies include pyrosequencing, reversible dye terminator sequencing, ion-conductance sequencing, single-molecule sequencing, and sequencing by ligation. NGS requires novel methods of template preparation, such as emulsion PCR and bridge PCR, or single-molecule capabilities. NGS requires strong computer support as well as terabytes of storage space to accommodate large raw data sets. To prepare for NGS, clinical laboratories establish secure information channels and allocate space for preparation, loading, and operation of sequencers. Report templates are designed by the laboratory or by commercial vendors and bioinformatics services. Two NGS technologies account for the majority of clinical sequencing applications: ion-conductance (pH) and reversible dye terminator sequencing. Both methods require the preparation of a sequencing library: sets of 100- to 500-bp fragments representing the regions to be sequenced. A library can represent a whole genome or a few specific gene regions where critical variants are likely to occur.

