Genomics Test 1

If I sequenced 53,000 plasmids of approximately 800 bp, what is the probability that I have sequenced any given sequence of a 1000 Mb genome?

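0.0415. Use N = ln(1-P)/ln(1-a/b), where N = 53,000 (number of clones), P = the unknown probability that the library contains the desired piece of DNA, a = 800 bp (average size of the DNA insert), and b = 1000 Mb converted to bases (size of the genome, 1 x 10^9 bp). Solving for P gives P = 1 - (1 - a/b)^N = 0.0415.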
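
The arithmetic in this answer can be checked with a short Python sketch. The function names are hypothetical and chosen only for illustration; the formula is the one given in the card, rearranged to solve for P.

```python
import math

def library_probability(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Probability that a given sequence is in the library: P = 1 - (1 - a/b)^N."""
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones

def clones_needed(p: float, insert_bp: float, genome_bp: float) -> float:
    """Number of clones for a target probability: N = ln(1-P) / ln(1 - a/b)."""
    return math.log(1.0 - p) / math.log(1.0 - insert_bp / genome_bp)

# 53,000 clones, 800 bp inserts, 1000 Mb genome (1e9 bases)
p = library_probability(53_000, 800, 1_000_000_000)
print(round(p, 4))  # 0.0415
```

Feeding the result back into `clones_needed` should return roughly 53,000, which is a quick way to confirm the two forms of the formula agree.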

Describe three key differences between Illumina and Sanger sequencing.

1. Illumina sequencing requires small fragments so that they can be bridge amplified; the total output is very large (Gb) but is made up of short reads. Sanger sequencing can use much larger fragments, giving longer reads but a smaller total output. 2. Coverage is higher in Illumina than in Sanger. 3. Illumina sequences many fragments in parallel, but Sanger sequences one template at a time. 4. They use different amplification methods: Illumina uses bridge amplification on a flow cell, and Sanger uses PCR-like amplification.

How does a phred value of 10 compare to 30?

10: 1/10 error (chance of calling the wrong base; 90% accuracy). 30: 1/1000 error (99.9% accuracy).
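
A minimal sketch of the Phred relationship, assuming the standard definition Q = -10 log10(P), so P = 10^(-Q/10). The function name is hypothetical.

```python
def phred_to_error(q: float) -> float:
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

# Compare a few quality scores side by side.
for q in (10, 24, 30):
    p = phred_to_error(q)
    print(f"Q{q}: about 1 error in {1 / p:.0f} calls, accuracy {100 * (1 - p):.2f}%")
```

Each step of 10 in the Phred score lowers the error probability tenfold, which is why Q30 is 100 times more reliable than Q10.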

How many plasmids of 1.5 kb are needed to cover a 145 kb BAC at 15x coverage (not a trick question)?

1450 ((145/1.5)x15)
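
The same calculation as a small Python sketch. The function name is hypothetical; rounding up is an added assumption, since a fraction of a clone cannot be sequenced.

```python
import math

def clones_for_coverage(target_bp: int, insert_bp: int, coverage: float) -> int:
    """Clones needed so that (clones x insert size) / target size = coverage."""
    return math.ceil(target_bp * coverage / insert_bp)

# 145 kb BAC, 1.5 kb plasmid inserts, 15x coverage
print(clones_for_coverage(145_000, 1_500, 15))  # 1450
```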

Differences between 454, sanger, and Illumina

454-- cheapest and fastest. DNA is fragmented and sequenced as shorter reads of 400-600 bp. Bead-based sequencing. Light is emitted by luciferase. Illumina-- DNA is fragmented and sequenced as shorter reads. Uses bridge amplification and sequencing by synthesis. Flow cell-based sequencing. Fluorescent reversible terminators. Sanger-- DNA is copied and fragmented. ddNTP-based sequencing. Fluorescent terminating bases.

Was the H. influenzae genome assembled only from the shotgun clones? Explain.

No. 78% was covered by lambda clones. They were needed because some fragments of the genome would be nonclonable in a high-copy plasmid, since they would produce proteins that are deleterious to the E. coli host cells.

If a genome is sequenced at 30x coverage, approximately what percentage of the genome remains unsequenced?

A fraction of about 9.36 x 10^-14 (about 9.36 x 10^-12 percent), so essentially all of the genome is sequenced. Probability of missing a given spot = e^(-coverage).
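
A quick check of this number in Python, assuming reads land at random so that the chance a base is missed is e^(-coverage). The function name is hypothetical.

```python
import math

def unsequenced_fraction(coverage: float) -> float:
    """Fraction of the genome expected to be missed at a given coverage: e^(-c)."""
    return math.exp(-coverage)

for c in (1, 5, 30):
    f = unsequenced_fraction(c)
    print(f"{c}x coverage: fraction missed = {f:.3g} ({100 * f:.3g}%)")
```

Note that the value is a fraction; multiply by 100 to express it as a percentage.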

pcr duplication

A bias originating from PCR. During PCR some fragments may be preferentially amplified and thus more present within your library. These duplicate fragments can waste sequencing resources by having the same fragment sequenced multiple times.

chimeric sequence

A chimeric sequence has pieces of DNA from 2 distinct genomic positions. This usually takes place when a DNA strand under amplification is terminated early and binds to a foreign DNA and replication continues.

Cluster generation

A cluster of bridge amplified DNA (very similar to PCR, but with bridge amplification).

What is a flow-cycle/ sequencing cycle?

A cycle in which a single type of dNTP is washed over the beads carrying template DNA within the wells. If the dNTP is incorporated, light is released, detected by the instrument, and the base call is recorded. The remaining dNTPs are removed and the next type of dNTP is washed over.
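
The flow-by-flow logic can be sketched as a toy simulation. This is only an illustration under stated assumptions: `read` is the strand being synthesized (not the template), the flow order "TACG" and the function name are assumptions made for the example, and the signal is idealized as an exact count of incorporated bases.

```python
def simulate_flowgram(read: str, flow_order: str = "TACG", n_flows: int = 8):
    """Return (base, bases incorporated) for each flow.

    One nucleotide type is flowed at a time. Every consecutive matching base
    in the read is incorporated in that flow, so a homopolymer run gives a
    proportionally larger signal; a non-matching flow gives a signal of 0.
    """
    signals = []
    pos = 0  # next position in the read waiting to be synthesized
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals

print(simulate_flowgram("TTACGG", n_flows=4))
# [('T', 2), ('A', 1), ('C', 1), ('G', 2)]
```

In a real run the signal for a long homopolymer is not a clean integer, which is why homopolymer lengths are a common source of error on flow-based platforms.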

Phospholinked nucleotides

A different colored linker (fluorophore) is attached to the triphosphate chain which is naturally cleaved as the polymerase incorporates the nucleotide

homo-polymer

A homopolymer is formed from multiple repeats of the same monomer (AAAAAAAAAAAAAAAAAAAA...).

nucleotide ambiguities

A location on a DNA sequence where the nucleotide at a specific position is not clear. Instead we use ambiguities showing it could be any combination of nucleotides depending on the letter abbreviation used.
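
As an illustration, the standard IUPAC ambiguity letters can be written as a lookup table. The helper name `matches` is hypothetical; the letter meanings are the usual IUPAC nucleotide codes.

```python
# IUPAC nucleotide codes: each letter stands for a set of possible bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def matches(pattern: str, seq: str) -> bool:
    """True if every base in seq is allowed by the ambiguity code at that position."""
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern, seq)
    )

print(matches("ARN", "AGT"))  # True: R allows G, N allows any base
print(matches("ARN", "ACT"))  # False: R does not allow C
```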

Bioanalyzer

A machine that measures the amount and size range of DNA or RNA in a sample using chip-based electrophoresis. It also estimates concentration, but that estimate is not reliable.

What is a physical map and how does it differ from a genome sequence?

A physical map is constructed from restriction enzyme cutting sites, STS markers, FISH, etc to see through a map how contigs overlap. A genome sequence relies on the combination of DNA sequence letters to determine overlap.

PTP or chip

A plate that has many microscopic wells that allow for sequencing of millions of fragments in a 454 reaction.

Chromatogram

A visible record showing the result of separation of the components of a mixture by chromatography. Higher peaks show higher confidence scores, in order to determine nt.

sff file

An SFF file encodes flowgrams from 454 pyrosequencing

What step used human manipulation during the construction of the FPC fingerprint data?

Assembling contigs together by comparing end clones from each contig to other end contigs and allowing a lower (50%) stringency. This reduced the number of contigs by five fold. Also identifying chimeric contigs and getting rid of them.

BAC library

Bacterial Artificial Chromosome. Bacterial clones with DNA inserted from a single organism.

DNA shearing

Breaking genomic DNA into small multi-hundred bp fragments.

Gap closure

Closing the gaps left in the consensus sequence after assembly

CMOS factory

Complementary metal-oxide semiconductor. The type of factory that makes the common semiconductor chips which encode the sequence data from the sequencing run.

Sanger sequencing

DNA is amplified and then denatured to produce a template strand. A primer is annealed near the 3' end of this strand, and the DNA is divided among four reaction vessels. DNA pol and all four dNTPs are added to each of the four vessels. Modified ddNTPs are also added, one specific to each vessel (ddATP, ddTTP, ddCTP, ddGTP). DNA pol adds dNTPs to the growing strand until a ddNTP is incorporated, which produces DNA fragments of different lengths in each reaction vessel. The fragments are separated on a polyacrylamide gel (because it can separate DNA strands that differ in length by 1 nt). The sequence is read from the bottom up.

How does pyrosequencing work?

DNA pol adds nts. ATP sulfurylase generates ATP, and luciferase consumes that ATP as it converts luciferin to oxyluciferin, producing light. Magnetic beads are coated with ssDNA fragments, PCR is run on each bead, and each bead is placed into a separate microscopic well. These wells receive DNA pol, ATP sulfurylase, luciferase and luciferin.

Pyrosequencing/ 454

DNA sequencing technology that is based on the generation and detection of a pyrophosphate group liberated from a nucleotide triphosphate. The pyrophosphate is converted to ATP and then energy is released as light. Remaining unincorporated nts and ATP are degraded by apyrase.

Dephasing and sequence phase

Dephasing is "noise" within the fluorescence. It could happen from a portion of the templates on a bead not incorporating the maximal number of nucleotides, so they are one step behind the other templates on the same bead. This can happen successively causing more and more dephasing.

How is nt label different with PacBio technology than other platforms?

Each nt is labeled with a different colored fluorophore on its phosphate chain, so the strand that is synthesized is natural DNA. A light pulse is produced while the base is held in the detection volume. During incorporation the phosphate chain is cleaved and the attached fluorophore is released.

emPCR

Emulsion PCR takes place when sheared gDNA fragments are attached to adaptors, which are bound to beads containing complementary adaptors (one fragment per bead). Oil is then added to the water/DNA/buffer mixture to compartmentalize each bead. The typical PCR process of denaturation, annealing, and replication then takes place on each bead.

FASTA file and quality file relationship

FASTA file shows the nucleotides sequence indicated by single letters. Quality file shows the error rate of the incorporated nts by the DNA pol. These together show the sequence and the accuracy of it.

FPC

Fingerprint data - the pattern of DNA migration in a gel to determine which stretch of DNA is found in the gel (size, restriction enzyme cuts).

Why is amplification of templates often required for the fluorescent detection of an added base during sequencing?

Fluorescent imaging system is not sensitive enough to detect signal from a single template molecule and needs the amplification of template molecules on a solid surface.

HMW DNA

High Molecular Weight DNA (genomic DNA larger than 150Kb)

What are sequencing adaptors?

In the case of Illumina sequencing, they are the Y-shaped adapters that have a double-stranded end and a non-basepaired end (the Y-end) and are ligated onto both ends of your sheared and repaired DNA. The Y-shape ensures that after PCR amplification, each molecule will have a P5 and a P7 adaptor on its respective ends. In general, sequencing adapters are DNA of known sequence that is added to the ends of genomic DNA fragments and is required by the high-throughput sequencing platform: Y-shaped adaptors for Illumina (or P5 and P7), A and B adaptors for 454 and Ion Torrent, etc.

Why is a large amount of glutamate needed to culture H. influenzae?

Glutamate is directed into the TCA cycle. In the absence of TCA cycle enzymes (whose genes aren't found in the genome), glutamate serves as the source of carbon for biosynthesis of the amino acids that branch from the TCA cycle, so the free-living organism H. influenzae can stay alive and produce energy in the form of ATP.

What is two base encoding in SOLiD sequencing and why is this superior to other methods?

It is an encoding used with sequencing by ligation rather than sequencing by synthesis. Each base is interrogated twice, so a true SNP changes two adjacent color-space calls; therefore, in order to miscall a SNP, two adjacent colors must be miscalled.
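
A small sketch of why a SNP changes two adjacent colors. It assumes the commonly described two-base scheme in which identical neighbors (AA, CC, GG, TT) give color 0 and the remaining dinucleotides give colors 1 to 3; that scheme can be written as an XOR of 2-bit base codes. The function name and example sequences are made up for illustration.

```python
# 2-bit codes chosen so that XOR of neighboring bases gives the color (0-3).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_color_space(seq: str) -> list[int]:
    """Encode each overlapping dinucleotide as one color."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]

reference = "ATGGCA"
variant = "ATGACA"  # single base change (G to A) at index 3

ref_colors = to_color_space(reference)
var_colors = to_color_space(variant)
changed = [i for i, (r, v) in enumerate(zip(ref_colors, var_colors)) if r != v]

print(ref_colors)  # [3, 1, 0, 3, 1]
print(var_colors)  # [3, 1, 2, 1, 1]
print(changed)     # [2, 3] -> two adjacent colors differ
```

Because each base takes part in two overlapping dinucleotides, a real substitution alters both colors, while an isolated single-color difference is more likely a measurement error.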

SMRT sequencing

It is literally Single Molecule Real-Time sequencing. The polymerase is immobilized at the base of the ZMW (zero-mode waveguide) chamber and each of the four types of nucleotides is labeled with a different fluorophore, attached to the gamma phosphate of each dNTP. The fluorophore is cleaved off upon nucleotide incorporation. Its fluorescence is detected by the computer.

Why is size selection used in preparation of DNA libraries?

It is used to have a defined size range of DNA molecules that you use to make your libraries so that you know that the DNA is the correct size for your sequencing platform and also you know the distance between paired-end reads during genome assembly. No sequencing platform could sequence whole intact chromosomes thus they need to be broken down into known size fragments.

What does a phred value of 24 mean?

It means that there is about a 1 in 251 chance that the base call is wrong (10^(-2.4) is about 0.004). The software is over 99% confident in its base call.

In sanger sequencing, which nts are fluorescently labeled?

Label can be attached to 5' end of primer so all bands appear at same intensity OR dNTP can be labeled which results in smaller frags showing up as dimmer bands.

Two types of DNA shearing

Mechanical shearing (nebulizing), and acoustic shearing (Covaris)

minimum tiling path

Minimum set of BACs that contain a whole chromosome with minimal overlap
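
One way to picture the idea is a greedy interval cover. This is only a sketch under a strong assumption: that each BAC already has known start and end coordinates on the chromosome. In practice the overlaps come from fingerprint or end-sequence data, and the clone names and coordinates below are invented.

```python
def minimum_tiling_path(clones, start, end):
    """Pick a small set of clones covering [start, end] using a greedy cover.

    clones: list of (name, clone_start, clone_end) tuples.
    At each step, choose the clone that begins within the covered region
    and extends farthest to the right.
    """
    path = []
    covered_to = start
    ordered = sorted(clones, key=lambda c: c[1])
    i = 0
    while covered_to < end:
        best = None
        # Consider every clone that starts inside the region covered so far.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no clone extends past position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path

bacs = [("BAC1", 0, 150), ("BAC2", 50, 180), ("BAC3", 100, 300),
        ("BAC4", 280, 450), ("BAC5", 290, 400)]
print(minimum_tiling_path(bacs, 0, 450))  # ['BAC1', 'BAC3', 'BAC4']
```

BAC2 and BAC5 are skipped because other clones cover the same region while reaching farther, which keeps both the clone count and the overlap small.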

Describe the enzymatic activities that take place on each Illumina colony.

On each Illumina colony there is a lawn of forward and reverse oligonucleotides with single stranded DNA bound to some of them. These single stranded DNA molecules bind to the opposite oligonucleotide and DNA polymerases duplicate the strands, which are then denatured so those strands can bind again and duplicate again with the DNA polymerase. Enzymes also play a role in the sequencing reaction by adding single nucleotides with blockers with reversible terminator fluorescent molecules. These fluorescent molecules are cleaved by other enzymes to allow the next nucleotide to add.

How many colors of light are used in pyrosequencing/ 454?

One color, because bases are washed across one at a time and there is only one enzyme (luciferase) producing the light.

cyclic reverse termination

Only one base is added per round. A reversible terminator is on every nucleotide to prevent multiple additions in one round. Using the four-color chemistry, each of the four bases has a unique emission, and after each round, the machine records which base was added, the nt is cleaved, and then washing is performed before next round.

PGM

Personal genome machine. Meant for small genomes and targeted sequencing

STS marker

Sequence Tagged Sites are short sequences of gDNA that can be uniquely identified and recognized in a DNA sequence.

Why does DNA need to be sheared?

So the NGS machine has fragments small enough that it can sequence the length of the fragment.

SPRI beads

Solid phase reversible immobilization beads. For purification of PCR amplified colonies to prep for sequencing. Magnetic only in magnetic field so they don't clump and fall out of a solution.

What needs to be reversed on the incorporated nt?

The 3' blocker and fluorescent dye attached to the incorporated nucleotide need to be cleaved off after each round of incorporation and detection to allow a free 3'-hydroxyl and the 3' end to continue extension.

Why do we need large amounts of template for each individual fragment?

The chain will extend a longer amount before the terminating analogue is inserted due to probability.

SOLiD sequencing does not use DNA polymerase. What does it use instead?

The complementary probe hybridizes to the template sequence and is ligated by ligase.

flow cell

The glass base plate with 8 channels to which oligonucleotides are bound; the sheared DNA fragments are immobilized on these oligonucleotides.

How were lambda libraries used in the assembly process?

The lambda clones were used to close the gaps in the assembly consensus sequence by looking at the ends of the contigs surrounding a gap and designing oligonucleotide probes for those sites. Upon finding two probes a good distance apart, primers were ordered based on those sequences to close the gaps from the lambda clone library DNAs.

What are the differences between 454 and Ion Torrent sequencing?

The library prep methods for the two are almost identical, involving emPCR and hairy beads. Both use PTPs to trap the beads for sequencing, both flow in one type of nucleotide at a time (SNA), and both can have multiple bases added at once. The difference is the method of detection: 454 converts the pyrophosphate released from the incorporation of a nucleotide into light that is detected, whereas Ion Torrent detects hydrogen ions that are released upon nucleotide incorporation and measures a pH change.

max. read length

The maximum length of the sequence reads output by Illumina.

PostLight

The method of sequencing involving semiconductor chips. This greatly reduces the cost of sequencing because light and fluorescence are not involved; a chemical change is sensed instead.

What is meant by 'PostLight' sequencing?

The method of sequencing involving semiconductor chips. This greatly reduces the time of sequencing because light and fluorescence are not involved. The signal is written directly to a computer chip.

chip density

The more dense the chip the more information it can hold.

Ionogram

The name of the format you can view base calls in after an Ion Torrent sequencing run. Shows # of bp incorporated per well and flow # with corresponding bp.

color space

The three-dimensional space, established because color perception is based on the outputs of three cone types, that describes the set of all colors. Translated to nt or base pair to be understood.

What is the max output of an Illumina run?

The total output of an Illumina run on a HiSeq 2500 HT v4 is ~1 Tb and an individual lane can sequence ~62.5 Gb.

What are the three types of beads used in 454 sequencing?

There are four types of beads: The first is the streptavidin coated beads that are used to ensure that only DNA fragments with both an A and a B adapter are in the library. The second type is the primer coated capture beads that capture the original strand of DNA and are the base of replication for emPCR. The third type of bead is enzyme coated with ATP sulfurylase and luciferase to create the light used for base detection. The fourth type is packing beads used to fill remaining space in all the wells of the pico-titer plate.

In addition to 'glutamate', name two other unique things identified in the genome sequence.

There was more than a sixfold redundancy across the genome. H. influenzae is rich in AT pairs. G+C nt content is 38%.

bridge amplification

When a DNA fragment on an Illumina colony with attached ends binds to other identical attached ends in the same colony. The sequence between these attached ends is duplicated (forming a bridge between the two) and then the ends are released and they bind to other similar ends.

A-tailing

Where a sheared DNA sequence is adenylated with an overhanging 3'- A at each end at which site the adaptors will bind.

Scaffolds

a compilation of DNA sequence contigs into one digital chromosome, or section of a chromosome.

physical map contig

a contig made up of physical map markers (STS, FISH, Restriction Enzyme Cut Sites) NOT DNA sequence data

sequence contig

a contig that is made of DNA sequence data

Physical gap

a gap between contigs/scaffolds whose DNA sequence is not found within clone library

Sequence gap

a gap in the DNA sequence whose clone is in our library

Sequence contigs

a group of overlapping cloned segments that have been assembled by a computer system to form contigs

restriction patterns

a pattern from a BAC that is cut by a restriction enzyme and run out on a gel to determine the exact cutting sites (and position/amount of overlap) in relation to the other contigs

vector sequence

a plasmid that contains a priming site for sequencing primer for where your unknown fragment would be inserted.

How does Illumina prepare a DNA library?

a. HMW genomic DNA is sheared (see question 2), size selected and quantified. b. The ends of the DNA fragments are repaired so they are blunt and have a 5'-phosphate on each end. c. An additional adenosine is added to the 3'ends d. Y-shaped adaptors are ligated to each end of the DNA molecules e. A few rounds of PCR are performed so each molecule now has a double stranded P5 adaptor on one end and a double stranded P7 adaptor on the other. f. The complete library is now quantified and then denatured to make it single stranded g. The library is added to the flow cell and bridge amplification or cluster generation is performed on the cBOT machine.

biotinylated-adapters

attaching a biotin molecule to the 5' end of the adapter so that end will bond to the "streptavidin coated beads" to select for correctly linked templates.

primer-coated capture beads

beads coated with oligonucleotides that are complementary to the adapters that bind to the bead. This allows the bead to bind the DNA, allowing duplication. These beads after replication (when they are 'hairy') are deposited on the picotiter plate and covered in enzyme coated beads and packing beads.

ddNTP

dideoxyribonucleoside triphosphate. It lacks a hydroxyl group (OH) at the 3' carbon, which prevents phosphodiester bond formation

FISH

fluorescent in situ hybridization. Allows fluorescent probes to be hybridized to DNA sequence to show location of gene on chromosome

coverage

number of times a sequence was sequenced

flowgram/pyrogram/ionogram

output file that shows base calls for the 454 reaction (usually in SFF file)

contig

overlapping sequence reads that have been assembled by a computer program to form contigs

minION

portable, real-time device for DNA and RNA sequencing. Each flow cell can generate as much as 30 Gb of DNA sequence data. A third-generation sequencing format that uses nanopores and the detection of electrical changes as single stranded DNA passes through the pore one nucleotide at a time to accomplish single molecule sequencing. This format is portable, plugs into a USB port, and can produce long reads whose length is determined by the size of the template DNA and the amount of time the sequencing is allowed to run.

Phred value

property logarithmically related to the base-calling error probability. phred=30, error rate is 1/1000.

SBL

sequence by ligation. The method of sequencing used by SOLiD where no DNA polymerase is used, but instead, DNA ligase is used to ligate labeled oligonucleotides that reveal the sequence through hybridization.

SBS

sequence by synthesis. Sequencing like Illumina, Sanger, 454, Ion torrent where DNA polymerase is used to add nucleotides to the growing DNA molecule which are then detected to indicate the sequence (with fluorescence, pyrophosphate production or voltage/pH change).

singleton

sequence reads that don't assemble to any contig. Probably debris or errors.

SNA

single nt addition. when one type of nucleotide is added at a time and then washed off

FASTQ file

text file containing sequence data from clusters that pass filter on the flow cell. Shows nt sequence and quality score. The line starting with @ contains the sequence ID. One or more lines contain the sequence. A new line starting with + is empty or repeats the sequence ID. The final line contains the quality scores.
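
A minimal sketch of reading one FASTQ record. Two assumptions are made here that the card does not state: the record is the common four-line form (sequence on a single line), and the quality characters use the Phred+33 ASCII offset. The function name and the example read are hypothetical.

```python
def parse_fastq_record(lines):
    """Parse a four-line FASTQ record into (read_id, sequence, phred_scores)."""
    header, sequence, plus, quality = [ln.rstrip("\n") for ln in lines]
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: Phred+33 encoding, so '!' is Q0 and 'I' is Q40.
    phred = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, phred

record = ["@read1\n", "ACGT\n", "+\n", "II5+\n"]
read_id, seq, quals = parse_fastq_record(record)
print(read_id, seq, quals)  # read1 ACGT [40, 40, 20, 10]
```

Each quality character maps to exactly one base, so the sequence and quality lines must be the same length.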

FASTA file

text-based format for representing nt sequence. nts are represented as single letters. Each record starts with header line > followed by sequence ID. Next line is actual sequence.
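
A short sketch of reading FASTA text into a dictionary. The function name is hypothetical, and joining wrapped sequence lines is an added assumption, since FASTA files often split a long sequence over several lines.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each header (text after '>') to its sequence, joining wrapped lines."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}

example = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n"
print(parse_fasta(example))  # {'seq1': 'ACGTTTGA', 'seq2': 'GGCC'}
```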

consensus sequence

the assembled sequences with ambiguities resolved by coverage

assembly

the assembly of all the contigs sequenced to create one long continuous strand of DNA composed of many contigs and scaffolds.

DNA library

the library of sheared DNA fragments that will be attached to the flow cell base plate

316/318

two types of chips for the PGM

Thermocycler sequencing

use thermophilic DNA pol and heat denaturation to separate dsDNA. Primers also have to withstand high temps.

Ion Torrent Sequencing

uses a semiconductor chip to translate chemical information into base calls. A sample is cut up into millions of fragments and then each fragment is attached to its own bead. Fragments are replicated on their respective beads. Beads flow across the chip, one in each well. The chip is flooded with one dNTP. Incorporation releases an H+ ion, which the Ion Torrent system detects as a pH change in the solution in the well. The dNTP is washed out and the process is repeated with the next nt.

Dephasing

when a sequence becomes out of phase. In the case of Illumina sequencing this usually happens when a nucleotide lacks either a fluorophore or a terminator, so the extension doesn't stop after only one base and the strand skips a space in the frame. This is also caused by homopolymeric runs in 454 sequencing

WGS

whole genome shotgun sequencing. Genome is fragmented, sequenced as small fragments, and then assembled into a larger whole

