MmBio 468 Exam 1

Sanger sequencing

(Chain terminator method): method in which DNA is amplified and denatured, primers are added, and ddNTPs are incorporated into the growing strands, terminating extension at that base.

Why is size selection used in the preparation of DNA libraries?(Illumina)

- A defined size range of DNA is the correct size for your sequencing platform.
- You know the distance between paired-end reads during genome assembly.
- No sequencing platform can sequence whole intact chromosomes, so they need to be broken down into fragments of known size.

If I sequenced 53,000 plasmids of approximately 800 bp, what is the probability that I have sequenced any given sequence of a 1000 Mb genome (see slides 8-10)?

0.0415, or a 4.15% chance that you sequenced it.
• N = ln(1-P) / ln(1-a/b)
• N is the number of clones (53,000)
• P is the probability of the library containing the desired piece of DNA (the unknown)
• a is the average size of the DNA insert (800 bp)
• b is the size of the genome (1000 Mb, or 1,000,000,000 bases)
• Solving for P: P = 1 - (1 - a/b)^N
• P = 1 - (1 - 800/1,000,000,000)^53,000
• P = 1 - 0.958486 = 0.0415, or a 4.15% chance of the piece of DNA being in the library
Or use the probability of missing a spot, p = e^(-coverage):
• Coverage = (53,000 x 800) / 1,000,000,000 = 0.0424
• p = e^(-0.0424) = 0.9585, or 95.85%
• 100% - 95.85% = 4.15% chance of the piece being there
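
As a quick check, the two calculations in this answer can be sketched in Python. The function names are illustrative only and are not from the course slides; the numbers are the ones given in the question.

```python
import math

def prob_in_library(n_clones, insert_bp, genome_bp):
    """Exact form: P = 1 - (1 - a/b)^N."""
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones

def prob_from_coverage(n_clones, insert_bp, genome_bp):
    """Approximation: P = 1 - e^(-coverage), with coverage = N*a/b."""
    coverage = n_clones * insert_bp / genome_bp
    return 1.0 - math.exp(-coverage)

# Values from the question: 53,000 clones, 800 bp inserts, 1000 Mb genome.
p_exact = prob_in_library(53_000, 800, 1_000_000_000)
p_approx = prob_from_coverage(53_000, 800, 1_000_000_000)
print(f"exact: {p_exact:.4f}, coverage approximation: {p_approx:.4f}")
```

Both forms agree to four decimal places here because a/b is tiny, which is why the e^(-coverage) shortcut is usually good enough.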

If a genome is sequenced at 30x coverage, approximately what percentage of the genome remains unsequenced?

A fraction of about 9.36x10^-14 (roughly 9.36x10^-12 percent), so essentially zero percent of the data is left unsequenced. Probability of missing a spot = e^(-coverage) = e^(-30).
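
To see how quickly the unsequenced fraction shrinks, here is a minimal sketch of the e^(-coverage) rule. The coverage values other than 30x are chosen only for illustration, and the function name is hypothetical.

```python
import math

def fraction_missed(coverage):
    """Expected fraction of the genome hit by zero reads at this coverage."""
    return math.exp(-coverage)

# 30x is the value from the question; the others are for comparison only.
for c in (1, 5, 10, 30):
    print(f"{c:>2}x coverage: {fraction_missed(c):.3g} of the genome unsequenced")
```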

How many plasmids of 1.5 kb are needed to cover a 145 kb BAC at 15x coverage (not a trick question)?

1450 = ((145/1.5)x15)
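
The same arithmetic as a small Python sketch. The function name is made up for illustration, and rounding up to a whole clone is an assumption that makes no difference for these numbers.

```python
import math

def clones_needed(target_bp, insert_bp, coverage):
    """Clones required so that total insert length = coverage x target length."""
    return math.ceil(target_bp * coverage / insert_bp)

# 145 kb BAC, 1.5 kb plasmid inserts, 15x coverage (values from the question).
n = clones_needed(145_000, 1_500, 15)
print(n)
```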

PCR duplication(Illumina)

A bias originating from PCR. During PCR some fragments may be preferentially amplified and thus more present within your library. These duplicate fragments can waste sequencing resources by having the same fragment sequenced multiple times.

Chimeric sequence(Illumina)

A chimeric sequence has pieces of DNA from 2 distinct genomic positions. This usually takes place when a DNA strand under amplification is terminated early and binds to a foreign DNA and replication continues.

Cluster generation(Illumina)

The generation of clusters of bridge-amplified DNA (very similar to PCR, but with bridge amplification). Takes place inside the cBot machine.

What is a flow-cycle (or more generally a 'sequencing cycle')?(454)

A cycle in which a single type of dNTP is washed over the beads containing the template DNA within wells in the PTP; if it is incorporated opposite the template DNA, light is released, detected by the machine, and the base call is recorded. Remaining dNTPs are removed and the next type is washed over.

Phospholinked nucleotides (PacBio)

A different colored fluorescent molecule is attached to the gamma phosphate of each nucleotide, which is naturally cleaved off as the polymerase incorporates the nucleotide

homo-polymer(454)

A homopolymer is formed from multiple repeats of the same monomer (AAAAAAAAAAAAAAAAAAAA...).

nucleotide ambiguities(WGS)

A location in a DNA sequence where the nucleotide at a specific position is not clear. Instead we use ambiguity codes (R, Y, W, S...) showing it could be any combination of nucleotides, depending on the letter abbreviation used. These ambiguities can be resolved by looking at enough individual sequences that cover the ambiguity to see what the most common nucleotide is (the consensus).

Bioanalyzer

A machine which quantifies the size range of DNA or RNA in a sample. It also gives a quantification of the concentration, but this is not reliable, and other methods should be used.

Pyrosequencing (454)

A method of DNA sequencing based on the sequencing by synthesis principle developed in 1996. Pyrosequencing relies on the production of pyrophosphate after a nucleotide is added to the DNA copy of the template. This pyrophosphate is converted to ATP and then the energy of the ATP is released as light as the result of many different enzymatic reactions. The remaining unincorporated nucleotides and ATP are then degraded by apyrase so the next nucleotide wash and sequencing by synthesis can begin.

cyclic reversible termination(Illumina)

A method that comprises nucleotide attachment, fluorescence detection, and cleavage as a repeatable, stacking process (see question 5 below).

What is a physical map and how does it differ from a genome sequence?

A physical map is based on the use of restriction enzyme cutting sites, STS markers, FISH, and other data to map which contigs overlap, whereas genome sequences rely on the combination of DNA sequence 'letters' to determine overlap.

PTP or chip (Pico-titer Plate)(454)

A plate that has many microscopic wells that allow for sequencing of millions of fragments in a 454 reaction.

Describe the file format definition of a FASTA file, a quality file, and their relationship.

A quality file shows all of the quality scores that correspond to the sequence in a FASTA file. A FASTA file has a line with the sequence name followed by the sequence on new lines. A quality file has a line with the sequence name, then a line of sequence, then a line with a "+", followed by lines of symbols that have numeric values.

SPRI beads

A simple method of DNA cleanup and purification using paramagnetic beads to prep for sequencing. It is simple and reproducible enough to be automated. SPRI beads are also used for size selection.

Why is amplification of templates often required for the fluorescent detection of an added base during sequencing?(Illumina)

A single fluorescent molecule isn't visible to the Illumina computer sensor, so multiple amplified templates are necessary to create a fluorescence signal strong enough for the computer to visualize.

sff file(454)

An SFF file encodes flowgrams from 454 pyrosequencing.

DNA shearing

Breaking genomic DNA into small multi-hundred bp fragments.

Gap closure

Closing the gaps left in the consensus sequence after assembly

Describe the activities that take place during the pyrosequencing reaction (enzymes, beads, and substrates).(454)

DNA Polymerase - Adds nucleotides. Only one nucleotide type is added at a time.
ATP Sulfurylase - Generates ATP from pyrophosphate released by incorporation of dNTPs.
Luciferase - ATP bonds broken and converted to light.
Apyrase - Degrades excess nucleotides and ATP.
Beads (see question 5 below)

Dephasing and sequence phase(454)

Dephasing is "noise" within the light signal. It can happen when a portion of the templates on a bead do not incorporate the maximal number of nucleotides, so they are one step behind the other templates on the same bead. This can happen successively, causing more and more dephasing.

How is the nucleotide label different with PacBio technology than other platforms? (PacBio)

Each of the four types of nucleotides is labeled with a different fluorophore, attached to the gamma phosphate of each dNTP. The fluorophore is cleaved off upon nucleotide incorporation by the polymerase anchored at the bottom of the ZMW well. Its fluorescence is detected by the machine.

emPCR(454)

Emulsion PCR takes place when sheared gDNA fragments are attached to adaptors, which are bound to beads containing complementary adaptors (one fragment per bead). Oil is then added to the water/DNA/buffer mixture to compartmentalize each bead in tiny aqueous droplets. The typical PCR process of denaturation, annealing, and replication then takes place on each bead.

DNA polymerase

Enzyme that adds nucleotides to a growing strand of DNA.
Primer: short single-stranded DNA that serves as the starting point for DNA synthesis.
Quality value: score that indicates the probability that the called nucleotide is correct.

FPC

Fingerprint data - the pattern of DNA migration in a gel to determine which stretch of DNA is found in the gel (size, restriction enzyme cuts).

FISH

Fluorescent In Situ Hybridization - a technique which allows fluorescent probes to be hybridized to a complementary DNA sequence to show the location of that site or gene on a chromosome.

How does Sanger Sequencing work?(Sanger)

For Sanger sequencing, you must amplify the DNA, denature it, add primers, then add nucleotides as well as ddNTPs. Different lengths of DNA are produced and separated via electrophoresis. You can determine the sequence via the gel.

HMW DNA

High Molecular Weight DNA (genomic DNA larger than 150 kb)

Describe three key differences between Illumina and Sanger sequencing.

In Illumina sequencing small fragments must be created in order to be bridge amplified, but the total output is very large (Gb) though composed of small fragments, whereas in Sanger sequencing much larger fragments can be used and amplified, resulting in longer reads, but a smaller total output. Also they have two different amplification methods. Illumina uses flow cell bridge amplification and Sanger uses PCR-like amplification. You also get much higher coverage in Illumina sequencing than in Sanger. Illumina is massively parallel whereas Sanger is one sequence at a time.

What are sequencing adapters?(Illumina)

In the case of Illumina sequencing, they are the Y-shaped adapters that have a double-stranded end and a non-basepaired end (the Y-end) and that are ligated onto both ends of your sheared and repaired DNA. The Y-shape ensures that after PCR amplification, each molecule will have a P5 and a P7 adaptor on its respective ends. In general, sequencing adapters are DNA of known sequence added to the ends of genomic DNA fragments, as required by the high-throughput sequencing platform: Y-shaped adaptors for Illumina (or P5 and P7), A and B adaptors for 454 and Ion Torrent, etc.

SMRT sequencing

It is literally Single Molecule Real-Time sequencing. The polymerase is immobilized at the base of the ZMW chamber and each of the four types of nucleotides is labeled with a different fluorophore, attached to the gamma phosphate of each dNTP. The fluorophore is cleaved off upon nucleotide incorporation. Its fluorescence is detected by the machine.

SOLiD sequencing does not use DNA polymerase. What does it use instead? (solid sequencing)

It uses DNA ligase (SBL) to ligate hybridized oligonucleotides to the growing DNA molecule during sequencing. Unlike most other sequencing technologies, no DNA polymerase is involved.

Name two types of DNA shearing(Illumina)

Mechanical shearing (nebulizing), and acoustic shearing (Covaris)

Was the H. influenzae genome assembled only from the shotgun clones?

No. Approximately 78% of the genome was covered by lambda clones.

Describe the enzymatic activities that take place on each Illumina colony.(Illumina)

On each Illumina colony there is a lawn of forward and reverse oligonucleotides with single stranded DNA bound to some of them. These single stranded DNA molecules bind to the opposite oligonucleotide and DNA polymerases duplicate the strands, which are then denatured so those strands can bind again and duplicate again with the DNA polymerase. Enzymes also play a role in the sequencing reaction by adding single nucleotides with blockers with reversible terminator fluorescent molecules. These fluorescent molecules are cleaved by other enzymes to allow the next nucleotide to add.

PGM

Personal Genome Machine, Ion Torrent's current sequencing machine meant for small genomes and targeted sequencing.

STS marker

Sequence Tagged Sites are short sequences of gDNA that can be uniquely identified and recognized in a DNA sequence.

Color space (solid)

Sequences that are detected and encoded as colors rather than bases. They are then aligned by color and not sequence, or directly translated into sequence space (As, Ts, Gs and Cs). Color space is used in SOLiD sequencing and assembly.

Why does DNA need to be sheared?(Illumina)

So the NGS machine has fragments small enough that it can sequence the length of the fragment.

BAC library

Stands for Bacterial Artificial Chromosome. It is a group of bacterial clones that have the DNA from a single organism inserted into their own. These libraries can cover any amount of DNA, proportional to the number of colonies. Each individual BAC can contain an ~130kb fragment of genomic DNA

What needs to be 'reversed' on the incorporated nucleotide?(Illumina)

The 3' blocker (3′-O-azidomethyl) and fluorescent dye attached to the incorporated nucleotide need to be cleaved off after each round of incorporation and detection to allow a free 3'-hydroxyl and the 3' end to continue extension.

flow cell (or channel)(Illumina)

The glass base plate with 8 channels to which oligonucleotides are bound; the sheared DNA fragments are immobilized on these oligonucleotides.

How were lambda libraries used in the assembly process?

The lambda clones were used to close the gaps in the assembly consensus sequence by looking at the ends of the contigs surrounding a gap and designing oligonucleotide probes for those sites. Upon finding two probes a good distance apart, primers were ordered based on those sequences to close the gaps from the lambda clone library DNAs.

What are the differences between 454 and Ion Torrent sequencing? (ion torrent)

The library prep methods for the two are almost identical but the difference is the method of detection. 454 converts the pyrophosphate released from the incorporation of a nucleotide into light that is detected, whereas Ion Torrent detects hydrogen ions that are released upon nucleotide incorporation and measures a pH change.

max. read length(Illumina)

The maximum length of a sequence read output by Illumina.

PostLight (ion torrent and 454)

The method of sequencing involving semiconductor chips. This greatly reduces the cost of sequencing because light and fluorescence are not involved. The signal is written directly to a computer chip.

What is meant by 'PostLight' sequencing? (Ion Torrent)

The method of sequencing involving semiconductor chips. This greatly reduces the time of sequencing because light and fluorescence are not involved. The signal is written directly to a computer chip.

chip density

The more dense the chip used in the 316 or 318 format the more individual beads and sequences you can get off a single chip/run.

Ionogram

The name of the format you can view base calls in after an Ion Torrent sequencing run.

pyrogram/flowgram/ionogram(454)

The output file that shows base calls for the 454 reaction (usually in the sff file format described above).

minimum tiling path

The tiling path is the minimum set of BACs that contain a whole chromosome with the minimum overlap possible.

What is the maximum output of an Illumina run? Per lane?(Illumina)

The total output of an Illumina run on a HiSeq 2500 HT v4 is ~1 Tb and an individual lane can sequence ~62.5 Gb.

CMOS factory

The type of factory that makes common semiconductor chips. These chips are used in Ion Torrent sequencing.

2-base encoding

The type of encoding for sequencing done with the SOLiD method, where each incorporation of an oligonucleotide gives you the sequence for two nucleotides instead of just one.

How many types of beads are used in 454 sequencing and what is their function?

There are four types of beads: The first is the streptavidin coated beads that are used to ensure that only DNA fragments with both an A and a B adapter are in the library. The second type is the primer coated capture beads that capture the original strand of DNA and are the base of replication for emPCR. The third type of bead is enzyme coated with ATP Sulfurylase and Luciferase to create the light used for base detection. The fourth type are packing beads used to fill remaining space in all the wells of the pico-titer plate.

What does a phred value of 24 mean?

There is about a 0.40% chance that the nucleotide is wrong (error probability = 10^(-24/10) ≈ 0.00398).
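
The conversion behind this answer is the standard Phred relation Q = -10 x log10(P). A minimal sketch, with function names chosen here for illustration:

```python
import math

def phred_to_error_prob(q):
    """Probability that the base call is wrong for Phred score q."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score corresponding to error probability p."""
    return -10 * math.log10(p)

# Q10 and Q20 are the textbook reference points; Q24 is the card's question.
for q in (10, 20, 24, 30):
    print(f"Q{q}: {phred_to_error_prob(q) * 100:.3f}% chance the base is wrong")
```

For Q24 this comes out to roughly 0.4%, or about 1 wrong call in 250.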

How many fluorescent colors are used in pyrosequencing? Why?(454)

There is just one color of light because the bases are washed across one at a time and there is only one enzyme producing light. In fact it is not even fluorescent light that is emitted.

In addition to 'glutamate', name two other unique things identified in the genome sequence.

There were none of the NtrC class regulators found in E. coli, suggesting a different regulatory system from E. coli. Also there was no CpxR regulator found. (Anything from the paper was a correct answer here).

Why is a large amount of glutamate needed to culture H. influenzae?

They found in the paper that it lacks specific genes that code for enzymes in the TCA (Tricarboxylic acid) cycle required to gain carbon for the synthesis of amino acids. Glutamate can be converted to alpha-ketoglutarate to be usable in that cycle.

What is two base encoding in SOLiD sequencing and why is it theoretically superior to other methods? (Solid sequencing)

Two base encoding means that each oligonucleotide that hybridizes to the unknown sequence you are sequencing matches two consecutive bases at a time. The color of the fluorophore that is detected therefore represents not one nucleotide but a combination of two nucleotides, and has to be translated into sequence by knowing the identity of the previous nucleotide. This is theoretically superior in that each base is actually sequenced twice using the SOLiD method, thus increasing the confidence of the base call, especially for SNPs that may be in the genome.
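
A rough sketch of how a color call depends on two adjacent bases and why decoding needs the previous base. The XOR mapping below (A=0, C=1, G=2, T=3) is the commonly described SOLiD color code and is an assumption here, since the card does not give the table; the function names are illustrative.

```python
# Assumed 2-bit base codes; the color for a dinucleotide is the XOR of its codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {v: k for k, v in BASE_CODE.items()}

def encode_colors(seq):
    """One color (0-3) per pair of adjacent bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]

def decode_colors(first_base, colors):
    """Translate colors back to bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)

colors = encode_colors("ATGGC")
print(colors)                      # one color per dinucleotide
print(decode_colors("A", colors))  # recovers the original sequence
```

Because each base contributes to two neighboring colors, a real SNP changes two adjacent colors, while a single changed color is more likely a measurement error.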

316/318

Two types of chips for the PGM

bridge amplification(Illumina)

When a DNA fragment on an Illumina colony with ligated adaptor ends hybridizes to other complementary attached primers on the plate. The sequence between these attached primers is duplicated (forming a bridge between the two), and then the ds molecules are denatured and they hybridize to other complementary primers, thus amplifying again and again.

A-tailing(Illumina)

Where a sheared DNA sequence is adenylated with an overhanging 3'- A at each end at which site the adaptors will bind.

WGS

Whole Genome Shotgun sequencing method, in which the genome is fragmented and a specific size range of fragments is sequenced.

Scaffolds

a compilation of DNA sequence contigs into one digital chromosome, or section of a chromosome. This still contains sequence gaps or physical gaps

physical map contig

a contig made up of physical map markers (STS, FISH, Restriction Enzyme Cut Sites) NOT DNA sequence data

sequence contig

a contig that is made of DNA sequence data

Fastq

a file format used to store a nucleotide sequence and its quality scores
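
A minimal sketch of reading FASTQ records and turning the quality symbols into numeric scores. It assumes unwrapped four-line records and the Phred+33 ASCII offset, neither of which is stated on the card; the read name and sequence in the example are made up.

```python
def parse_fastq(text):
    """Parse 4-line FASTQ records into (name, sequence, quality scores)."""
    lines = text.strip().splitlines()
    records = []
    for i in range(0, len(lines), 4):
        name = lines[i][1:]          # drop the leading '@'
        seq = lines[i + 1]
        # lines[i + 2] is the '+' separator line
        quals = [ord(ch) - 33 for ch in lines[i + 3]]  # assumed Phred+33
        records.append((name, seq, quals))
    return records

example = "@read1\nACGT\n+\nII5!\n"  # hypothetical record
records = parse_fastq(example)
print(records)
```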

Physical gap

a gap between contigs/scaffolds whose DNA sequence is not found within our clone library

Sequence gap

a gap in the DNA sequence whose DNA is in our clone library

MinION

a new third-generation sequencing format that uses nano-pores and the detection of electrical changes as single stranded DNA passes through the pore one nucleotide at a time to accomplish single molecule sequencing. This format is portable, plugs into a usb port and can sequence long read lengths which are determined by the size of the template DNA and the amount of time the sequencing is allowed to run.

ddNTP (Sanger)

a nucleotide missing the hydroxyl groups on both the 2' and 3' carbons. Without a 3'-OH, no further nucleotide can be added to it

restriction patterns

a pattern from a BAC that is cut by a restriction enzyme and run out on a gel to determine the exact cutting sites (and position/amount of overlap) in relation to the other BACs

Vector sequence(Sanger) and Thermocycler

Vector: a piece of DNA that is used to transfer DNA into a cell.
Thermocycler: lab machine used to amplify DNA using PCR.

Describe the process of Illumina DNA library preparation (i.e. up to, but not including the sequencing cycles).(Illumina)

a. DNA is sheared, size selected and quantified.
b. The ends are repaired so they are blunt and have a 5'-phosphate on each end.
c. Adenosine is added to the 3' ends.
d. Y-shaped adaptors are ligated to each end.
e. Rounds of PCR are performed so each molecule now has a double stranded P5 adaptor on one end and a double stranded P7 adaptor on the other.
f. The complete library is quantified and denatured to make it single stranded.
g. The library is added to the flow cell and bridge amplification or cluster generation is performed on the cBot machine.

biotinylated-adapters(454)

attaching a biotin molecule to the 5' end of the adapter so that end will bond to the "streptavidin coated beads" to select for correctly linkered templates.

primer-coated capture beads(454)

beads coated with oligonucleotides that are complementary to the adapters, which bind to the bead. This allows the bead to bind the DNA, allowing duplication during emPCR. After replication (when they are 'hairy'), these beads are deposited on the picotiter plate and covered in enzyme coated beads and packing beads.

In Sanger sequencing what types of dNTPs are used? What is their function? Which are fluorescently labeled?(Sanger)

ddATP, ddCTP, ddTTP, and ddGTP are used. They are nucleotides that are missing the hydroxyl groups at both 2' and 3', so they terminate extension. All four are fluorescently labeled to determine where each nucleotide is in the sequence.

Large amounts of template for an individual fragment are needed. Why?(Sanger)

ddNTPs will stop the sequences at many different places. This way we can determine where all the nucleotides are.

Fasta file

format in which there is a sequence name, end of line, sequence, and hard return. For nucleotides or amino acids

Chromatogram(Sanger)

graph that shows the peaks of each fluorescently labeled nucleotide

Phred value

measure of the quality of each called nucleotide during DNA sequencing. Q=10 means a 1 in 10 chance the call is wrong; Q=20 means 1 in 100

capillary electrophoresis(Sanger)

DNA molecules are separated by size as they migrate through a capillary tube under an electric field

Contig

overlapping sequence reads that have been assembled by a computer program to form a continuous sequence

Sequence contigs

overlapping sequence reads that have been assembled by a computer program to form contigs

Describe three differences between 454, Ion Torrent, Illumina, and Sanger sequencing.

see pic

Singleton

sequence reads that do not assemble to any contig. Often these are contaminants or sequencing errors

SBL

sequencing by ligation. The method of sequencing used by SOLiD where no DNA polymerase is used, but instead, DNA ligase is used to ligate labeled oligonucleotides that reveal the sequence through hybridization.

SBS

sequencing by synthesis. Sequencing like Illumina, Sanger, 454, Ion torrent where DNA polymerase is used to add nucleotides to the growing DNA molecule which are then detected to indicate the sequence (with fluorescence, pyrophosphate production or voltage/pH change).

SNA

single nucleotide addition, sequencing like 454 or Ion Torrent where one type of nucleotide is added at a time and then washed off. Homopolymeric runs in the sequence would result in multiple nucleotides of the same kind being added in a row, giving off twice, three times, four times, etc. the amount of light or pH change.
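
A toy sketch of single nucleotide addition, showing how a homopolymer run produces a proportionally larger signal in one flow. It models an idealized, noise-free run; the TACG flow order, the function name, and the example sequence are assumptions for illustration, and the input is the strand being synthesized rather than the template.

```python
def ideal_flowgram(synthesized, flow_order="TACG", max_flows=400):
    """Return (base, count) per flow: bases incorporated when that dNTP is washed over."""
    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesized) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        # Every consecutive matching base is added in the same flow.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals

signals = ideal_flowgram("TTAGGGC")
print(signals)
```

A count of 0 means the washed nucleotide did not match and no signal is produced; a count of 3 for G corresponds to the GGG run giving about three times the single-base signal.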

Consensus sequence

the assembled sequences with ambiguities resolved by coverage.

Assembly

the assembly of all the contigs sequenced to create one long continuous strand of DNA composed of many contigs and scaffolds.

DNA library(Illumina)

the library of sheared DNA fragments that will be attached to the flow cell base plate

Coverage

the number of times that a sequence was sequenced. (individual overlapping reads)

size selection(Illumina)

the selection of fragment sizes that will be used in the Illumina sequencing reaction

Dephasing (Illumina)

when a sequence becomes out of phase. In the case of Illumina sequencing this is usually when some of the templates in an individual cluster incorporate nucleotides that either lack a fluorophore or a terminator so the extension doesn't stop with only one base and it skips a space in the frame. Thus a portion of the cluster will incorporate the wrong nucleotide in the next round. This is also caused by homopolymeric runs in 454 sequencing

