Chapter 8 Flash Cards
*8.15 Some restriction enzymes leave sticky ends, while others leave blunt ends. It is more efficient to clone DNA fragments with sticky ends than DNA fragments with blunt ends. What is the best way to efficiently clone a set of DNA fragments having blunt ends?
Use a restriction site linker, a short segment of double-stranded DNA that contains a restriction site. The linker can be efficiently ligated onto blunt-ended DNA fragments. -Digestion of the resulting DNA fragment with the restriction enzyme will then produce fragments with sticky ends. These sticky ends allow for efficient ligation into plasmids digested with the same restriction enzyme. -To clone DNA fragments that contain the restriction site found in the linker, use an adapter instead: a short, double-stranded piece of DNA with one sticky end and one blunt end.
*8.4 A new restriction endonuclease is isolated from a bacterium. This enzyme cuts DNA into fragments that average 4,096 base pairs long. Like many other known restriction enzymes, the new one recognizes a sequence in DNA that has twofold rotational symmetry. From the information given, how many base pairs of DNA constitute the recognition sequence for the new enzyme?
(The average length of the fragments produced indicates how often, on average, the restriction site appears.) -If DNA has equal amounts of A, T, C, and G, the chance of finding one specific base pair at a particular site is 1/4. -The chance of finding two specific base pairs at a site is (1/4)^2. -The chance of finding n specific base pairs at a site is (1/4)^n. Since (1/4)^6 = 1/4,096, the enzyme recognizes a 6-base-pair site.
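The reasoning above can be checked numerically. The following is a minimal sketch, assuming equal base frequencies as the card does; the function name is hypothetical and chosen only for illustration.

```python
import math

def site_length_from_fragment_size(avg_fragment_bp: float) -> float:
    """Solve (1/4)^n = 1/avg_fragment_bp for n (assumes 25% each of A, T, G, C)."""
    return math.log(avg_fragment_bp, 4)

# Average fragment size given in problem 8.4
n = site_length_from_fragment_size(4096)
print(round(n))  # number of base pairs in the recognition site
```

Rounding is used because a logarithm computed in floating point may differ from a whole number by a tiny amount.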
8.6 About 40% of the base pairs in human DNA are G-C. On average, how far apart (in base pairs) will the following sequences be? a. two BamHI sites b. two EcoRI sites c. two NotI sites d. two HaeIII sites
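(graph on pg 125) -With 40% G-C, the probability of finding a G-C (or a C-G) base pair at a given position is 0.20. -The probability of finding a T-A (or an A-T) base pair at a given position is 0.30. -Multiply the probabilities for each position in the recognition site; the average distance between sites is 1 divided by that product.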
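The card gives the per-position probabilities but not the final numbers, so here is a hedged sketch of the arithmetic. The recognition sequences in `SITES` are not given in the card; they are the standard sites for these four enzymes and are supplied here as an assumption. The function names are hypothetical.

```python
def site_probability(site: str, gc_fraction: float) -> float:
    """Probability that a given position starts this site, given the G-C fraction."""
    p_gc = gc_fraction / 2        # chance of G (or of C) at one position
    p_at = (1 - gc_fraction) / 2  # chance of A (or of T) at one position
    p = 1.0
    for base in site.upper():
        p *= p_gc if base in "GC" else p_at
    return p

# Assumed recognition sequences (top strand, 5'->3'); not stated in the card
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "NotI": "GCGGCCGC", "HaeIII": "GGCC"}

def average_spacing(site: str, gc_fraction: float = 0.40) -> float:
    """Average distance in bp between two occurrences of the site."""
    return 1 / site_probability(site, gc_fraction)

spacings = {name: round(average_spacing(seq)) for name, seq in SITES.items()}
for name, bp in spacings.items():
    print(f"{name}: about {bp:,} bp apart")
```

Under these assumptions the sketch gives roughly 6,944 bp for BamHI, 3,086 bp for EcoRI, 390,625 bp for NotI, and 625 bp for HaeIII.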
8.3 Restriction endonucleases are naturally found in bacteria. What purposes do they serve?
-They protect their hosts from infection by invading viruses and degrade any potentially infectious foreign DNA taken up by the cell. -They digest DNA at specific sites, so foreign DNA will be cut up. (To prevent cleavage of its own DNA, a bacterium modifies the sites recognized by its restriction enzymes, which prevents cleavage at those sites.)
8.1 Before a genome is sequenced, its DNA must be cloned. What is meant by a DNA clone, and what materials and steps are used to clone genomic DNA?
-A DNA clone is a section of a DNA molecule that has been inserted into a vector molecule (ex. plasmid, phage, BAC, YAC) and then replicated to form many identical copies when the vector is transformed into cells and the cells are propagated. -Genomic DNA is isolated and then cleaved with a restriction enzyme to produce molecules with sticky ends that can be inserted into a cloning vector; fragments of the desired size are selected by agarose gel electrophoresis. The sticky ends are annealed to complementary sticky ends in a similarly cleaved cloning vector. The nicks in the phosphodiester backbone are sealed using DNA ligase, and the recombinant molecule is transformed into a host cell. The host cell is propagated, and the DNA clone can be purified from the cells.
*8.35 How has genomic analysis provided evidence that Archaea is a branch of life distinct from Bacteria and Eukarya?
-Sequencing has shown that archaeal genes are not uniformly similar to those of Bacteria or Eukarya. -Archaeal genes for cell division, energy production, and metabolism are similar to their counterparts in Bacteria. -Archaeal genes for DNA replication, transcription, and translation are similar to their counterparts in Eukarya.
8.38 The C-value paradox (see Chapter 2, pp. 23-24) states that there is no obvious relationship between an organism's haploid DNA content and its organizational and structural complexity. Discuss, citing data from the genome sequencing, whether there is also a gene-number paradox or a gene-density paradox.
-The table shows no straightforward relationship between gene number and organizational complexity. Rice has more genes than humans do, so there is a gene-number paradox. -Gene density is not constant in eukaryotes. -Because this information exists for only a limited number of species, it is difficult to state accurately whether organizational complexity is related to gene density.
8.9 The plasmid pBluescript II is a plasmid cloning vector used in E. coli. What features does it have that makes it useful for constructing and cloning recombinant DNA molecules? Which of these features are particularly useful during the sequencing of a genome?
-It has an origin of replication that results in a high copy number, which facilitates purification of plasmid DNA. -It contains many unique restriction sites in a polylinker (multiple cloning site), which facilitates cloning fragments of DNA obtained after cleavage with a variety of restriction enzymes. -Its polylinker is inserted near the 5' end of the lacZ gene, which encodes β-galactosidase, allowing blue/white screening. Useful during genome sequencing: -The high copy number means plasmid DNA can be more easily purified. -Blue/white screening detects colonies harboring plasmids with inserts. -A set of universal sequencing primers can be used to easily obtain the sequence of the insert in different clones.
8.36 The genomes of many different organisms, including bacteria, rice, and dogs, have been sequenced. Choose three phylogenetically diverse organisms. Compare the rationales for sequencing their genomes, and describe what we have learned from sequencing each genome.
...
*8.2 The ability of complementary nucleotides to base pair using hydrogen bonding, and the ability to selectively disrupt or retain accurate base pairing by treatment with chemicals (e.g., alkaline conditions) and/or heat is critical to many methods used to produce and analyze cloned DNA. Give three examples of methods that rely on complementary base pairing, and explain what role complementary base pairing plays in each of these methods.
1. Binding of complementary sticky ends present in a cloning vector and a DNA fragment prior to their ligation by DNA ligase: base pairing of the sticky ends positions the fragment and the vector so that DNA ligase can covalently attach the insert and vector molecules. 2. Annealing of labeled nucleic acid to a complementary single-stranded DNA fragment on a microarray: because it is based on complementary base pairing, the microarray can be used for the detection of specific sequences. 3. Annealing of an oligo(dT) primer to a poly(A) tail during the synthesis of cDNA from mRNA: complementary base pairing between the primer and the mRNA defines where reverse transcriptase will initiate RNA-directed DNA synthesis. 4. Annealing of a primer to a template during a DNA sequencing reaction: this defines where the DNA sequencing reaction will start. In each example, base pairing allows nucleotides to interact in a sequence-specific manner that is essential for the procedure's success.
*8.13 Genomic libraries are important resources for isolating genes and for studying the functional organization of chromosomes. List the steps you would use to make a genomic library of yeast in a plasmid vector. In what fundamental way would you modify this procedure if you were making the library in a BAC vector?
1. Isolate high-molecular-weight yeast genomic DNA by isolating nuclei, lysing them, and gently purifying their DNA. 2. Cleave the DNA into fragments of 5-10 kb, an appropriate size for insertion into a plasmid vector. This can be done by cleaving the DNA with Sau3A for a limited time and then selecting fragments of an appropriate size by either sucrose density centrifugation or agarose gel electrophoresis. 3. Digest a plasmid vector such as pBluescript II with BamHI. This will leave sticky ends that can pair with those left by Sau3A. Treat the digested plasmid with alkaline phosphatase to prevent it from recircularizing when mixed with DNA ligase. 4. Mix the purified, Sau3A-digested yeast genomic DNA with the prepared plasmid vector and DNA ligase. 5. Transform the recombinant DNA molecules into E. coli. 6. Recover colonies with plasmids by plating on media with ampicillin and X-gal. Each colony will have a different yeast DNA insert, and all of the colonies together comprise the yeast genomic library. ----- In a BAC vector, much larger DNA fragments, 200 to 300 kb in size, would be used.
*8.10 A colleague has sent you a 2-kb DNA fragment excised from a plasmid cloning vector with the enzyme PstI (see Table 8.1 for a description of this enzyme and the restriction site it recognizes). a. List the steps you would take to clone the DNA fragment into the plasmid vector pBluescript II (shown in Figure 8.4), and explain why each step is necessary. b. How would you verify that you have cloned the fragment?
a. 1. Prepare the pBluescript II vector so that it has the same sticky ends as the DNA fragment and does not recircularize when mixed with DNA ligase. 2. Linearize the circular pBluescript II vector by digesting it with PstI. 3. Treat the digested vector with alkaline phosphatase. This removes its 5' phosphates, leaving only 5'-OH groups at its two ends, and prevents recircularization when it is mixed with DNA ligase; without this treatment, most colonies will not have inserts. 4. Mix the vector with the 2-kb DNA fragment and add DNA ligase. Because the insert DNA has not been treated with phosphatase, it retains its 5' phosphate groups, and its 5' ends can be ligated to the sticky ends of the digested vector. 5. Transform E. coli with the ligation reaction and plate the cells on medium containing ampicillin and X-gal. Ampicillin ensures that only bacteria containing the pBluescript II plasmid will grow. X-gal allows colonies with inserts to be identified: if the fragment was not inserted into the PstI site, the lacZ gene will function, β-galactosidase will be made, X-gal will be cleaved, and the colony will be blue. If the fragment was inserted into the PstI site, it will have disrupted the lacZ gene, no β-galactosidase will be made, and the colony will be white. b. Independently confirm the results obtained using the blue/white screening. Select white colonies and prepare plasmid DNA from each colony. Digest the prepared DNAs with PstI, and separate the digestion products by size using agarose gel electrophoresis. A colony with the correct insert should produce two bands: a 2-kb band corresponding to the insert and a 3-kb band corresponding to the pBluescript II vector.
*8.18 When Celera Genomics sequenced the human genome, they obtained 13,543,099 reads of plasmids having an average insert size of 1,951 bp, and 10,894,467 reads of plasmids having an average insert size of 10,800 bp. a. Dideoxy sequencing provides only about 500-550 nucleotides of sequence. About how many nucleotides of sequence did Celera obtain from sequencing these two plasmid libraries? To what fold coverage does this amount of sequence information correspond? b. Why did they sequence plasmids from two libraries with different-sized inserts? c. They sequenced only the ends of each insert. How did they determine the sequence lying between the sequenced ends?
18. a. 500 x (13,543,099 + 10,894,467) = 1.22x10^10 nucleotides. (1.22x10^10)/(3x10^9 bp in the haploid human genome) = about fourfold coverage. b. Clones must be sequenced from libraries with different-sized inserts so that sequences can be assembled in locations where repetitive DNA elements are inserted. If a plasmid with a 2-kb insert has unique sequence at one end but repetitive sequence at the other end, it will not be possible to continue assembling the sequence past the repeat with this plasmid. A plasmid with a 10-kb insert can have, at one end, unique sequence also found in the plasmid with the 2-kb insert and, at the other end, sequence that lies past the repetitive element and can be assembled with unique sequence from other plasmids. c. The sequence of the central region is obtained from the sequences of overlapping clones during sequence assembly.
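The coverage arithmetic in part a can be reproduced with a few lines. This is a small sketch using the read counts from the question, the lower bound of 500 nucleotides per read, and the 3x10^9 bp genome size used in the answer; the variable names are chosen only for illustration.

```python
# Read counts from problem 8.18
reads_2kb_library = 13_543_099
reads_10kb_library = 10_894_467

read_length = 500      # nucleotides per dideoxy read (lower bound from the question)
genome_size = 3e9      # bp in the haploid human genome, as used in the answer

total_nt = read_length * (reads_2kb_library + reads_10kb_library)
coverage = total_nt / genome_size

print(f"total sequence: {total_nt:.3g} nt")
print(f"coverage: about {coverage:.1f}-fold")
```

Using 550 nucleotides per read instead would raise the estimate only slightly, so "about fourfold" holds across the stated range.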
*8.19 a. What features of pBluescript II facilitate obtaining the sequence at the ends of an insert? b. Devise a strategy to obtain the entire sequence of a 7-kb insert in pBluescript II. c. Devise a strategy to obtain the entire sequence of a 200-kb insert in pBeloBAC11
19. a. If the sequence of an insert in pBluescript II is unknown, it is not possible to design and synthesize a sequencing primer targeted directly to it. To circumvent this issue, the pBluescript II vector has universal sequencing primer sites that flank the multiple cloning site. The sequencing primers are positioned so that DNA polymerase can extend from the primer to obtain the sequence of the ends of the insert. b. If dideoxy sequencing is used, only several hundred bases of sequence are obtained from one sequencing reaction, so the 7-kb insert is sequenced in steps. First, obtain the sequence of the ends of the insert using the universal primers present in the pBluescript II vector. Second, use the sequence obtained to design new primers, and use them to obtain an additional 500 bases of DNA sequence. (Annealing of a sequencing primer to one strand of a double-stranded DNA fragment defines the point from which DNA sequence can be obtained.) Assemble this sequence with that previously obtained, based on the overlap between the sequences. You will then have about 950 bases of sequence at each end. Design primers that are about 900 bases from the ends of the insert and use them in a third set of sequencing reactions. Continue in this way until the sequences from the two ends meet. The sequence obtained from one end of the insert will be reversed and complementary to the sequence obtained from the other end of the insert. c. The primer walking approach explained in part b could be used, though it would be tedious and time consuming. In addition, repetitive sequences might cause problems: if you inadvertently designed a primer within a repetitive sequence, you would not obtain unambiguous sequence information from that primer. It is more efficient to obtain the sequence by using a whole-genome shotgun cloning approach. Make plasmid libraries with 2-kb and 10-kb inserts from the pBeloBAC11 clone, sequence the ends of the inserts of enough clones to obtain sevenfold coverage, and then assemble that sequence using computerized algorithms.
*8.25 A set of hybrid cell lines containing a single copy of the same human chromosome from 10 different individuals was genotyped for 26 SNPs, A through Z. The SNPs are present on the chromosome in the order A, B, C, . . . Z. Table 8.C lists the SNP alleles present in each cell line. State which SNPs can serve as tag SNPs, and which haplotypes they identify. What is the minimum number of tag SNPs needed to differentiate between the haplotypes present on this chromosome?
25. A tag SNP is a SNP having alleles that can identify the alleles at other SNPs in a haplotype. To identify a tag SNP, identify the haplotypes present in a set of samples and then determine whether a SNP within a haplotype can uniquely specify that haplotype. In this dataset, start to identify haplotypes by comparing pairs of samples. Focusing initially on two columns of the table, proceed down the columns row by row to identify pairs of cell lines that share neighboring SNP alleles in one chromosomal region. At some point as you proceed down the columns, two cell lines that shared neighboring SNP alleles in earlier rows will no longer share the same SNP allele. This likely marks the end of a shared haplotype. Draw a line underneath the last row where SNP alleles are shared, and scan the rows above the line to see if any other samples share the haplotype you have identified. Shade or color code the squares within the table to keep track of the haplotype. Once you have identified a haplotype in one set of adjacent rows, examine the remaining columns in a similar manner to identify the other haplotypes in that region, and shade or color code these. Then proceed further down the table, analyzing additional rows in the same manner to identify the next set of haplotypes. After you have completed an analysis of all rows of the table, go back and see if any neighboring haplotypes can be combined and extended. Finally, examine the haplotype regions to identify tag SNPs (SNPs with alleles that uniquely specify the haplotype). One approach is to exclude SNPs with alleles that are seen in more than one haplotype, since these cannot serve as tag SNPs. The 26 SNPs define 5 sets of haplotypes, so a minimum of 5 tag SNPs are needed to differentiate between them.
The percentage of SNPs that are shared between individuals assigned to different racial groups can be determined by comparing the results of hybridizations with target DNA from individuals of different racial groups. If tag SNPs were detected using this approach, one could quantify the percentage of haplotypes that are shared among different racial groups.
*8.5 An endonuclease called AvrII ("a-v-r-two") cuts DNA whenever it finds the sequence 5'-CCTAGG-3' 3'-GGATCC-5' a. About how many cuts would AvrII make in the human genome, which contains about 3 x 10^9 base pairs of DNA and in which 40% of the base pairs are G-C? b. On average, how far apart (in base pairs) will two AvrII sites be in the human genome? c. In the cellular slime mold Dictyostelium discoideum, about 80% of the base pairs in regions between genes are A-T. On average, how far apart (in base pairs) will two AvrII sites be in these regions?
A. With 40% G-C, the chance of finding a G-C (or a C-G) base pair at a given position is 0.20, and the chance of finding an A-T (or a T-A) base pair is 0.30. The AvrII site contains four G-C/C-G and two A-T/T-A base pairs, so the chance of finding the 6-bp site is (0.20)^4 x (0.30)^2 = 0.000144. A genome with 3x10^9 bp has about 3x10^9 different 6-bp sequences, so the number of sites in the human genome is (0.000144) x (3x10^9) = 432,000. B. 3x10^9 bp / 432,000 sites = 1/0.000144 = 6,944 bp between sites. C. With 80% A-T base pairs, the chance of finding the 6-bp site is (0.10)^4 x (0.40)^2 = 0.000016, so two AvrII sites will be 1/0.000016 = 62,500 bp apart.
*8.7 The average size of fragments (in base pairs) observed after genomic DNA from eight different species was individually cleaved with each of six different restriction enzymes is shown in Table 8.B. a. Assuming that each genome has equal amounts of A, T, G, and C, and that on average these bases are uniformly distributed, what average fragment size is expected following digestion with each enzyme? b. How might you explain each of the following? i. There is a large variation in the average fragment sizes when different genomes are cut with the same enzyme. ii. There is a large variation in the average fragment sizes when the same genome is cut with different enzymes that recognize sites having the same length (e.g., ApaI, HindIII, SacI, and SspI). iii. Both SrfI and NotI, which each recognize an 8-bp site, cut the Mycobacterium genome more frequently than SspI and HindIII, which each recognize a 6-bp site.
A. With 25% each of A, G, C, and T: the chance of finding a 6-bp site is (1/4)^6 = 1/4,096, so ApaI, HindIII, SacI, and SspI are expected to produce fragments averaging 4,096 bp. The chance of finding an 8-bp site is (1/4)^8 = 1/65,536, so SrfI and NotI are expected to produce fragments averaging 65,536 bp. B. i. This could reflect: 1. nonrandom arrangements of base pairs in the different genomes; 2. different base compositions of the genomes (genomes rich in A-T base pairs will have fewer sites for enzymes recognizing sites containing only G-C base pairs). ii. This could reflect: 1. nonrandom arrangement of base pairs in that genome; 2. the base composition of that genome. iii. Possibilities: 1. the Mycobacterium genome is rich in G-C base pairs and poor in A-T base pairs; 2. a nonrandom arrangement of base pairs so that 5'-AA-3', 5'-TT-3', 5'-AT-3', and/or 5'-TA-3' sequences are rare. (Data for SacI suggest that the 5'-AG-3' and 5'-CT-3' sequences that are part of the HindIII site are not rare.)
8.33 Annotation of genomic sequences makes them much more useful to researchers. What features should be included in an annotation, and in what different ways can they be depicted? For some examples of current annotations in databases, see the following websites: http://www.yeastgenome.org/ http://flybase.org (Drosophila) http://www.tigr.org/tdb/e2k1/ath1/ (Arabidopsis) http://www.ncbi.nlm.nih.gov/genome/guide/human/ (humans) http://genome.ucsc.edu/cgi-bin/hgGateway (humans) http://www.h-invitational.jp/
An annotation can vary in its level of completeness depending on what can be inferred based on homology and what experimental data are available. An annotation could include: 1. The location of a transcribed region, depicted graphically with links embedded within the symbols used to depict the transcript. a. Physical map coordinates i. Links to cDNA clones, SNPs, and other DNA markers in the region. b. Genetic map coordinates i. Links to genes and mapping data in the region. c. Location of introns, exons, alternative splice sites, alternative promoters, poly(A) addition sites, etc. i. Physical map coordinates ii. Graphical depictions of transcript structures 2. The types of evidence for a transcribed region a. Prediction based on computer algorithms that assess possible sequences of a promoter, conceptual ORF(s), appropriate splice sites, homology with other genes, etc. b. Analysis of cDNA c. Documentation of splice sites and alternative transcript forms by comparison of cDNA and genomic sequences, etc. 3. The inferred function of the gene product a. Links to reports, database entries, publications b. Evidence supporting the inferred function i. Inference based on homology ii. Experimental evidence c. Information on pathways and processes involving the gene. 4. Information on gene expression levels a. Information on where the gene is expressed b. Information on levels of expression in different tissues 5. Information about gene regulation a. Genes regulated in a similar manner. b. Unique features of the gene and its regulation.
8.22 How does pyrosequencing differ from dideoxy chain-termination sequencing? What advantages does it have for large-scale sequencing projects?
Both start with a DNA template, DNA polymerase, and a sequencing primer. Dideoxy sequencing determines the sequence of nucleotides using a chain-termination mechanism, while pyrosequencing does not. In dideoxy sequencing, dideoxynucleotides labeled with four fluorescent dyes are present during DNA synthesis; when a labeled dideoxynucleotide is added, chain termination occurs. Capillary gel electrophoresis is then used to determine the sequence. Pyrosequencing determines the sequence by enzymatic detection of the pyrophosphate released when a base is incorporated by DNA polymerase. In pyrosequencing, a single DNA molecule is attached to a solid microscopic bead and placed into a microscopic well of the pyrosequencer. The sequencing reaction mixture contains a primer, DNA polymerase, and three other enzymes. The four dNTPs are added and removed so that only one dNTP is present in the reaction at one time. If the dNTP matches the next base, polymerase will add it to the primer and a molecule of pyrophosphate is released. A second enzyme uses the pyrophosphate to produce ATP, and a third enzyme uses the ATP to produce light. The pyrosequencer detects the amount of light released and matches it to which dNTP was present in the reaction. This is how the sequence of the growing DNA chain is determined. Pyrosequencing is advantageous over dideoxy sequencing for large-scale sequencing projects because it allows massive amounts of sequencing data to be collected in parallel. A pyrosequencer has 2000 microscopic wells, each of which can carry out its own pyrosequencing reaction. The sequencer can be used to obtain 20 million nucleotides of genome sequence in about 6 hours.
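The one-dNTP-at-a-time logic described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the flow order `TACG`, the example template, and the function name are all hypothetical, and the light signal is idealized as exactly proportional to the number of bases incorporated in a flow.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def pyrosequence(template_3to5: str, flow_order: str = "TACG", max_cycles: int = 50):
    """Simulate dNTP flows over a template read 3'->5' (downstream of the primer).

    Returns (flowgram, called_sequence), where flowgram is a list of
    (dNTP, light_signal) pairs and called_sequence is the new strand, 5'->3'.
    """
    pos = 0
    flowgram = []
    called = []
    for _ in range(max_cycles):
        for dntp in flow_order:
            signal = 0
            # Polymerase keeps adding this dNTP while it matches the next template base
            while pos < len(template_3to5) and COMPLEMENT[template_3to5[pos]] == dntp:
                signal += 1
                pos += 1
            flowgram.append((dntp, signal))
            called.append(dntp * signal)
        if pos == len(template_3to5):
            break
    return flowgram, "".join(called)

# Hypothetical 5-base template, written 3'->5'
flowgram, sequence = pyrosequence("AAGTC")
print(flowgram)
print(sequence)
```

A flow with no matching base gives a signal of zero, and a run of identical bases gives a proportionally larger signal in a single flow, which is how the simulated read distinguishes one T from two.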
*8.8 What features are required in all vectors used to propagate cloned DNA? What different types of cloning vectors are there, and how do these differ from each other?
Features: 1. the ability to replicate within a host cell, conferred by an origin of replication. 2. a dominant marker that allows for their selection in a host cell. 3. one or more unique restriction sites for DNA insertion. 4. YACs also have CEN sequences to ensure their proper segregation during cell division. Vectors: 1. Plasmids hold less than 10 kb of DNA and can replicate at high copy number within bacterial cells. 2. BACs hold up to 300 kb of DNA and are present in a single copy in bacterial cells. They are the preferred vector for large clones in physical mapping studies of genomes because they do not undergo rearrangements. (Disadvantages of E. coli cloning vectors are that AT-rich sequences are difficult to clone in E. coli and that some sequences are poisonous to E. coli when cloned.) 3. YACs hold between 0.2 and 2 Mb of DNA and are present in one copy per cell. They hold large inserts and have been useful for the construction of physical maps of genomes. They are limited because they can undergo rearrangements and are often chimeric (holding DNA from more than one site in the genome).
8.31 What is the difference between a gene and an ORF? Explain whether all ORFs correspond to a true gene, and if they do not, what challenges this poses for genome annotation.
Genes encode more than just polypeptides, while an ORF is a sequence in mRNA that encodes only a polypeptide. An ORF begins with a start codon and ends at a stop codon. Not all ORFs function as genes.
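Why "not all ORFs function as genes" is an annotation problem becomes clearer when you see how little an ORF search actually checks. The sketch below is a deliberately naive illustration, not an annotation tool: it scans one strand of a DNA-letter sequence (T instead of U) in three reading frames for ATG followed by an in-frame stop codon. The function name, the `min_codons` threshold, and the example sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 2):
    """Return (start, end, orf) tuples for ATG...stop stretches in all three frames.

    start/end are 0-based, end-exclusive positions on the given strand.
    min_codons counts codons from ATG up to (not including) the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs

# Hypothetical toy sequence
orfs = find_orfs("CCATGAAATAGCC")
print(orfs)
```

Any stretch that passes this test is only a candidate: the search knows nothing about promoters, splicing, or expression, so short ORFs in particular can arise by chance and need other evidence before being called genes.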
8.37 In which type of organisms does gene number appear to be related to genome size? Explain why this is not the case in all organisms.
Genes make up most of the genome of Bacteria and Archaea. Since gene density is high, there is a general relationship between the number of genes and genome size. In eukaryotes, gene density varies over a wide range, so the number of genes is not always related to genome size.
*8.16 The human genome contains about 3 x 10^9 bp of DNA. How many 200-kb fragments would you have to clone into a BAC library to have a 90% probability of including a particular sequence?
N = ln(1-p)/ln(1-f), where N = the necessary number of recombinant DNA molecules, p = the probability of including one particular sequence, and f = the fractional proportion of the genome in a single recombinant DNA molecule. With p = 0.90 and f = (2x10^5)/(3x10^9), N = 34,538.
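The formula is easy to evaluate directly. Below is a minimal sketch using the values from the card; the function name is hypothetical, and the result is rounded up because a fraction of a clone is not meaningful.

```python
import math

def clones_needed(p: float, insert_bp: float, genome_bp: float) -> int:
    """N = ln(1 - p) / ln(1 - f), with f = insert size / genome size, rounded up."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - p) / math.log(1 - f))

# p = 0.90, 200-kb inserts, 3x10^9-bp genome (values from the card)
n_clones = clones_needed(0.90, 2e5, 3e9)
print(n_clones)
```

Raising p has a large effect because of the logarithm: asking for 99% instead of 90% doubles ln(1 - p), and so roughly doubles the number of clones.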
No Neurospora questions on tests
No Answer Needed
8.23 Do all SNPs lead to an alteration in phenotype? Explain why or why not.
No, some are silent. -If a SNP does not lie in a DNA sequence that is transcribed and does not lie in a gene regulatory region, it will not affect gene function and could be silent. -If a SNP does lie in a transcribed sequence but, after mRNA processing, does not alter the amino acid inserted into a polypeptide chain, it will not cause a missense or nonsense mutation and could be silent.
*8.27 Mutations in the dystrophin gene can lead to Duchenne muscular dystrophy. The dystrophin gene is among the largest known: it has a primary transcript that spans 2.5 Mb, and it produces a mature mRNA that is about 14 kb. Many different mutations in the dystrophin gene have been identified. What steps would you take if you wanted to use a DNA microarray to identify the specific dystrophin gene mutation present in a patient with Duchenne muscular dystrophy?
Prepare a DNA microarray consisting of oligonucleotides that collectively represent the entirety of the normal dystrophin gene, including SNPs known to be present in normal individuals as well as known point mutations. Isolate DNA from the blood of an individual affected with DMD, label it with dye, and hybridize the chip with the labeled DNA under conditions that require a precise match. The site of the mutation can be located by identifying the region of the gene where no hybridization signal is seen with any of the normal oligonucleotides. If the mutation corresponds to a previously known mutation, it can be identified by hybridization to the oligonucleotide representing that mutation. 28. During assembly, the raw sequences obtained from a genome sequencing project are pieced together in the order in which they are found in the genome. This is done by aligning overlapping sequences and, when a sequencing read containing unique sequence at one end terminates with repetitive sequence, identifying other clones containing unique sequences that span the length of the repetitive element. The assembly process produces a working draft of the genomic sequence. A working draft of a genome sequence contains many gaps that must be filled, as well as sequencing errors. The finishing process fills in these gaps and addresses the errors. It results in a highly accurate sequence with less than one error per 10,000 bases and as many gaps as possible filled in. After the complete sequence of a genome is obtained, it is annotated. During annotation, genes and other important sequence features are identified. Genes can be identified by comparing genomic sequence with cDNA sequence obtained from sequencing clones in cDNA libraries. Genes and other important sequence features can also be predicted through the analysis of sequences using computerized algorithms. SNPs can be identified by comparing the sequences of different individuals in a population. Annotation begins the process of assigning functions to all the genes of an organism.
*8.30 Eukaryotic genomes differ in their repetitive DNA content. For example, consider the typical euchromatic 50-kb segment of human DNA that contains the human T-cell receptor. About 40% of it is composed of various genome-wide repeats, about 10% encodes three genes (with introns), and about 8% is taken up by a pseudogene. Compare this to the typical 50-kb segment of yeast DNA containing the HIS4 gene. There, only about 12% is composed of a genome-wide repeat, and about 70% encodes genes (without introns). The remaining sequences in each case are untranscribed and either contain regulatory signals or have no discernible information. Whereas some repetitive sequences can be interspersed throughout gene-containing euchromatic regions, others are abundant near centromeres. What problems do these repetitive sequences pose for sequencing eukaryotic genomes? When can these problems be overcome, and how?
Repetitive sequences pose two problems. First, some repetitive sequences are unclonable unless they are extremely complex. Second, repeats complicate sequence assembly: with a small (2-kb) insert, one end of a clone can be unique sequence while the other is repetitive, so the clone cannot be used to assemble sequence past the repeat. A larger (10-kb) insert is more likely to span the repetitive sequence and link it to unique sequence on the far side.
*8.21 In a sequencing reaction using dideoxynucleotides that are labeled with different fluorescent dyes, the DNA chains produced by the reaction are separated by size using capillary gel electrophoresis and then detected by a laser eye as they exit the capillary. A computer then converts the differently colored fluorescent peaks into a pseudocolored trace. Suppose green is used for A, black for G, red for T, and blue for C. What pattern of peaks do you expect to see on a sequencing trace if you carry out a dideoxy sequencing reaction after the primer 5'-CTAGG-3' is annealed to the following single-stranded DNA fragment? 3'-GATCCAAGTCTACGTATAGGCC-5'
The primer anneals to the fragment, and DNA polymerase extends it at its 3' end by adding bases complementary to the template. Chains will be prematurely terminated when a ddNTP is incorporated. Since the ddNTPs are labeled with dyes, every extension product that terminates with a given base will carry the same fluorescent label. After separation by capillary gel electrophoresis, the shortest chains exit first, so the order of colored peaks gives the sequence of the newly synthesized strand read 5' to 3'.
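The expected peak pattern can be worked out mechanically from the template and the color scheme in the question. The sketch below uses only values given in problem 8.21; the variable names are chosen for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Color scheme from the question
DYE = {"A": "green", "G": "black", "T": "red", "C": "blue"}

template_3to5 = "GATCCAAGTCTACGTATAGGCC"  # written 3'->5', as in the question
primer_5to3 = "CTAGG"

# The primer pairs with the 3' end of the template
assert all(COMPLEMENT[t] == p for t, p in zip(template_3to5, primer_5to3))

# Bases added to the primer's 3' end, in order of synthesis (5'->3')
extension = "".join(COMPLEMENT[b] for b in template_3to5[len(primer_5to3):])

# Shorter chains exit the capillary first, so peaks appear in this order
peaks = [DYE[b] for b in extension]

print("new strand 5'-" + extension + "-3'")
print(", ".join(peaks))
```

The first peak corresponds to the first base added after the primer, so the trace begins red, red, blue, green, black for T, T, C, A, G.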
*8.34 One powerful approach to annotating genes is to compare the structures of cDNA copies of mRNAs to the genomic sequences that encode them. Indeed, a large collaboration involving 68 research teams analyzed 41,118 full-length cDNAs to annotate the structure of 21,037 human genes (see http://www.h-invitational.jp/). a. What types of information can be obtained by comparing the structures of cDNAs with genomic DNA? b. During the synthesis of cDNA (see Figure 8.15), reverse transcriptase may not always copy the entire length of the mRNA and so a cDNA that is not full length can be generated. Why is it desirable, when possible, to use full-length cDNAs in these analyses? c. The research teams characterized the number of loci per Mb of DNA for each chromosome. Among the autosomes, chromosome 19 had the highest ratio of 19 loci per Mb while chromosome 13 had the lowest ratio of 3.5 loci per Mb. Among the sex chromosomes, the X had 4.2 loci per Mb while the Y had only 0.6 loci per Mb. What does this tell you about the distribution of genes within the human genome? How can these data be reconciled with the idea that chromosomes have gene-rich regions as well as gene deserts? d. When the research teams completed their initial analysis, they were able to map 40,140 cDNAs to the available human genome sequence. Another 978 cDNAs could not be mapped. Of these 978 cDNAs, 907 cDNAs could be roughly mapped to the mouse genome. Why might some (human) cDNAs be unable to be mapped to the human genome sequence that was available at the time although they could be mapped to the mouse genome sequence? (Hint: Consider where errors and limited information might exist.)
a). Comparison of cDNA and genomic DNA sequences can define the structure of a transcription unit by elucidating the locations of intron-exon boundaries, the poly(A) site, and the approximate location of the promoter region. Comparison of different full-length cDNAs representing the same gene can identify the use of alternative splice sites, alternative poly(A) sites, and alternative promoters. b). Analysis of full-length cDNAs provides information about the entire open reading frame, about the site at which transcription starts and where the promoter lies, and about the location of the poly(A) tail. Partial-length cDNAs provide only some of this insight. Multiple partial cDNA sequences could be compared and assembled for more information, but they are hard to assemble because of the use of alternative splice sites, alternative promoters, and/or alternative poly(A) sites. c). Some chromosomes have more genes than others do. Within a chromosome, some regions are gene rich and some are gene deserts. More data are needed to know the relationship between a chromosome's overall gene density and how its gene-rich regions and gene deserts are distributed. d). Two possible explanations: 1. Some regions of the genome sequence are incorrectly assembled, so the cDNAs cannot be mapped to just one region. 2. Some of the genes are in regions that have not yet been assembled, perhaps because they are too difficult to clone. As the genome sequence is revised, these issues should be resolved.
8.17 A biochemist studies a protein with antifreeze properties that he found in an Antarctic fish. After determining part of the protein's amino acid sequence, he decides he would like to obtain the DNA sequence of its gene. He has no experience in genome analysis and mistakenly thinks he needs to sequence the entire genome of the fish to obtain this information. When he asks a more knowledgeable colleague about how to sequence the fish genome, she describes the whole-genome shotgun approach and the need to obtain about 7-fold coverage. The biochemist decides that this approach provides far more information than he needs and so embarks on an alternate approach he thinks will be faster. He decides to sequence individual clones chosen at random from a library made with genomic DNA from the Antarctic fish. After sequencing the insert of a clone, he will analyze it to see if it contains an ORF with the sequence of amino acids he knows are present in the antifreeze protein. If it does, he will have found what he wants and will not sequence any additional clones. If it does not, he plans to keep obtaining and analyzing the sequences of individual clones sequentially until he finds a clone that has the sequence of interest. He thinks this approach will let him sequence fewer clones and be faster than the whole-genome shotgun approach. He must decide which vector to use in building his genomic library. He can construct a library made in the pBluescript II vector with inserts that are, on average, 7 kb, a library made in the vector pBeloBAC11 with inserts that are, on average, 200 kb, and a library made in a YAC vector with inserts that are, on average, 1 Mb. He assumes that any library he constructs will have an equally good representation of the 2x10^9 base pairs in a haploid copy of the fish genome, that the antifreeze gene is less than 2 kb in size, and that (somehow) he can easily obtain the sequence of the DNA inserted into a clone. a. 
Given the biochemist's assumptions, what is the chance that he will find the antifreeze gene if he sequences the insert of just one clone from each library? Based on this information, which library should he use if he wants to sequence the fewest number of clones? b. When he tries to sequence the insert of the first clone he picks from the library suggested by a colleague in (a), he realizes that he does not enjoy this type of lab work. So, he hires a technician with experience in genomics, assigns the project to her, and goes to Antarctica to catch more fish. He tells her to sequence the inserts of enough clones to be 95% certain of obtaining at least one insert containing the antifreeze gene and says he will analyze all of the sequence data for the presence of the antifreeze gene after he returns. How many clones should she sequence to satisfy this requirement if he constructed the genomic library in a plasmid vector? a BAC vector? a YAC vector? c. What advantages and disadvantages does each of the different vectors have for constructing libraries with cloned genome DNA? d. Suppose the Antarctic fish has a very AT-rich genome and the biochemist propagated the genomic library using E. coli. Will the library be representative of all the sequences in the genome of the fish?
a). The chance that the biochemist will find the antifreeze gene if he sequences the insert of just one clone from a library is f, where f is the fractional proportion of the genome in a single recombinant DNA molecule. For the plasmid library, f = (7x10^3)/(2x10^9) = 3.5x10^-6. For the BAC library, f = (2x10^5)/(2x10^9) = 1x10^-4. For the YAC library, f = (1x10^6)/(2x10^9) = 5x10^-4. The YAC library has the largest f, so it requires sequencing the fewest clones. b). N is the necessary number of recombinant DNA molecules, p is the probability of including one particular sequence, and f is the fractional proportion of the genome in a single recombinant DNA molecule: N = ln(1-p)/ln(1-f). Here, p = 0.95. Use the values of f determined in part a. For a plasmid vector, N = ln(1-0.95)/ln(1-(3.5x10^-6)) = 8.6x10^5. For a BAC vector, N = ln(1-0.95)/ln(1-(1x10^-4)) = 3.0x10^4. For a YAC vector, N = ln(1-0.95)/ln(1-(5x10^-4)) = 6.0x10^3. c). Plasmid clones are easier to manipulate than BAC or YAC clones, and their high copy number in cells facilitates their purification. This, together with the ease of constructing plasmid libraries with inserts smaller than 10 kb, makes them useful for sequencing genomes using a whole-genome shotgun approach. But plasmid vectors hold less DNA than BAC and YAC vectors, so in situations like this problem, where a particular clone must be identified in a library, many more clones need to be analyzed when a plasmid library is used. BACs are bacterial artificial chromosomes and are present in cells as one copy. They can accept inserts of up to 300 kb and can be manipulated like plasmids, but purification is not as simple. Their sequences are stable and do not undergo rearrangement in the cell, which has made them the preferred vector for making large clones.
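The arithmetic in parts a and b can be checked with a short script. This is a minimal sketch in Python, assuming only the genome size and average insert sizes given in the question; the names `GENOME_BP`, `INSERT_BP`, and `clones_needed` are made up for illustration.

```python
import math

GENOME_BP = 2e9  # haploid fish genome size given in the question

# Average insert sizes (bp) for the three libraries in the question
INSERT_BP = {"plasmid": 7e3, "BAC": 2e5, "YAC": 1e6}


def clones_needed(p, f):
    """Number of clones N = ln(1 - p) / ln(1 - f).

    p: desired probability of recovering the sequence at least once
    f: fraction of the genome carried by a single clone
    """
    # log1p(-x) equals ln(1 - x) and stays accurate for very small f
    return math.log1p(-p) / math.log1p(-f)


for vector, insert_bp in INSERT_BP.items():
    f = insert_bp / GENOME_BP
    n = clones_needed(0.95, f)
    print(f"{vector}: f = {f:.1e}, N = {n:.2e}")
```

The printed N values should agree with the rounded answers in part b. The larger the insert, the larger f and the fewer clones are needed for the same 95% certainty.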
*8.32 Once a genomic region is sequenced, computerized algorithms can be used to scan the sequence to identify potential ORFs. a. Devise a strategy to identify potential prokaryotic ORFs by listing features accessible by an algorithm checking for ORFs. b. Why does the presence of introns within transcribed eukaryotic sequences preclude direct application of this strategy to eukaryotic sequences? c. The average length of exons in humans is about 100-200 bp, while the length of introns can range from about 100 to many thousands of base pairs. What challenges do these findings pose for identifying exons in uncharacterized regions of the human genome? d. How might you modify your strategy to overcome some of the problems posed by the presence of introns in transcribed eukaryotic sequences?
a. Prokaryotic ORFs reside in transcribed regions and follow a bacterial promoter recognized by a sigma factor. The promoter contains the consensus sequences that attract the sigma factor; downstream of it, a Shine-Dalgarno sequence precedes the start codon. b. Eukaryotic introns are transcribed but not translated. They are spliced out of the primary mRNA before it is translated. If introns are not accounted for, translating the genomic sequence directly could introduce additional amino acids or a frameshift. c. The small average size of exons relative to the range of sizes for introns makes it challenging to predict whether a region with only a short set of in-frame codons is used as an exon. Such regions could have arisen by chance or be the remnants of exons no longer used due to mutation in splice-site signals. d. Eukaryotic introns typically contain a GU at their 5' ends, an AG at their 3' ends, and a branch-point consensus sequence near their 3' ends. To identify eukaryotic ORFs in DNA sequences, scan the sequences following a eukaryotic promoter for the presence of possible introns by searching for sets of these three consensus sequences. Then try to translate the sequences obtained if the potential introns are removed, testing whether a long ORF with good codon usage can be generated. Since alternative mRNA splicing exists at many genes, more than one possible ORF may be found in a given DNA sequence.
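Part a can be illustrated with a very reduced ORF scanner. This is a minimal sketch in Python, not a real gene finder: it assumes that a candidate ORF is any ATG followed in the same reading frame by a stop codon, it checks all three frames on both strands, and it ignores promoters and Shine-Dalgarno sequences entirely. The function names and the `min_codons` threshold are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end) for each ATG...stop run.

    start and end are 0-based, end-exclusive positions on the strand
    that was scanned ("+" is seq, "-" is its reverse complement).
    min_codons counts the start and stop codons too.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
```

Running this on genomic eukaryotic DNA would break ORFs at introns, which is the problem described in part b; the modification in part d amounts to removing candidate introns before calling a scanner like this one.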
8.24 Researchers at Perlegen Sciences sought to identify tag SNPs on human chromosome 21. After determining the genotypes at 24,047 common SNPs in 20 hybrid cell lines containing a single, different human chromosome 21, they used computerized algorithms to identify haplotypes containing between 2 and 114 SNPs that cover the entire chromosome. A total of 2,783 tag SNPS were selected from SNPs within these blocks. a. What is a SNP marker? b. How do haplotypes arise in members of a population? c. What is a hapmap? d. What is a tag SNP? e. What advantages were there for the researchers to use hybrid cell lines instead of genomic DNA from 20 different individuals? f. The 20 individuals whose chromosome 21 was used in this analysis were unrelated and had different ethnic origins. Do you expect the haplotypes and number of tag SNPs to differ if... i. the cell lines were established from blood samples drawn at a large family reunion. ii. the cell lines were established from unrelated individuals, but their ancestors originated in the same geographical region.
a.) A SNP marker is a single nucleotide polymorphism: a simple single-base-pair alteration present in some individuals at one particular chromosomal site. b.) (A haplotype is a set of specific SNP alleles at particular SNP loci that lie close together in one small region of a chromosome.) Differences in SNP alleles among members of a population lead to the population having a set of haplotypes. Because recombination is uncommon within a small region of the genome, SNPs in one small region of a chromosome tend to be inherited together. SNPs at a set of loci lying within a recombination cold spot tend to be inherited together in a population and result in a haplotype that is shared by different members of the population. c.) A hapmap is a haplotype map: a complete description of all of the haplotypes known in a set of tested human populations, including information on their chromosomal locations. d.) A tag SNP is a SNP within a haplotype whose genotype can be used to identify the other SNP alleles in the haplotype. SNPs within a haplotype are inherited together because they lie in a recombination cold spot. This makes it possible to infer the other SNP alleles in the haplotype using a single measurement, that of the tag SNP. e.) By using hybrid cell lines containing a single copy of chromosome 21, the researchers could infer which alleles are present on a single chromosome. Suppose they had instead analyzed samples obtained from the blood of different individuals, which contain two copies of chromosome 21. Then they would not be able to determine the arrangement of SNP alleles on one copy of chromosome 21. For example, consider two SNP loci, A and B, in one region of chromosome 21 and an individual who has the genotype A1 A2 B1 B2. The individual could have alleles A1 and B1 on one copy of chromosome 21 and A2 and B2 on the other copy, or A1 and B2 on one copy of chromosome 21 and A2 and B1 on the other copy. 
By using hybrid cell lines with just one copy of chromosome 21, the researchers were able to determine which SNP alleles were present in one haplotype. f i.) Though a population may have many different SNP alleles and haplotypes, not all of these will be seen in members of just one family. Since individuals within a family share more of their genome, they will share more haplotypes than unrelated individuals will. Consequently, if the cell lines were established from blood samples drawn at a large family reunion, there would be fewer haplotypes than if the lines were established from unrelated individuals. If there are fewer haplotypes, there should also be fewer tag SNPs. Indeed, since recombination is generally infrequent in any one chromosomal region, related individuals are expected to share haplotypes over larger segments of their genome. Therefore, fewer tag SNPs can be used to identify their haplotypes. f ii.) Some SNPs may have arisen only in certain populations, and so some haplotypes may be seen only in certain populations. In addition, some haplotypes may be more frequent in a particular population. Therefore, individuals whose ancestors originated in the same geographical region, though unrelated as family members, are likely to share more haplotypes than do individuals whose ancestors originated in distinct geographical regions. Therefore, the number of haplotypes and the number of tag SNPs are likely to be fewer in individuals with ancestors from the same geographical region than in individuals with ancestors from different geographical regions.
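The phase ambiguity described in part e (genotype A1 A2 B1 B2) can be enumerated explicitly. This is a minimal sketch in Python, assuming an unphased diploid genotype is given as one pair of alleles per locus; the function name `possible_phasings` is made up for illustration.

```python
from itertools import product


def possible_phasings(genotypes):
    """Enumerate the distinct haplotype pairs consistent with a genotype.

    genotypes: list of (allele, allele) tuples, one per SNP locus.
    Returns a set of frozensets, each holding the two haplotypes
    (tuples of alleles) carried on the two chromosome copies.
    """
    first, *rest = genotypes
    phasings = set()
    # Fix the orientation of the first locus, then try both
    # orientations of every other locus relative to it.
    for choice in product(*[(g, g[::-1]) for g in rest]):
        hap1 = (first[0],) + tuple(g[0] for g in choice)
        hap2 = (first[1],) + tuple(g[1] for g in choice)
        phasings.add(frozenset([hap1, hap2]))
    return phasings


for pair in possible_phasings([("A1", "A2"), ("B1", "B2")]):
    print(sorted(pair))
```

For two heterozygous loci there are two possible arrangements, matching the example in part e. A hybrid cell line carries only one copy of chromosome 21, so the alleles read from it form one haplotype directly and nothing needs to be enumerated.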
8.39 In the United States, 3-5% of public funds used to support the Human Genome Project were devoted to research to address its ethical, legal, social, and policy implications. Some of the results are described in the website http://www.ornl.gov/sci/techresources/Human_Genome/elsi/elsi.shtml. After exploring this website, answer the following questions: a. Summarize the main ethical, legal, social, and policy issues associated with the human genome project. b. Why is legislation necessary to protect an individual's genetic privacy? What such legislation currently exists? c. What are the pros and cons of gene testing? d. Both presymptomatic and symptomatic individuals are subject to gene testing for an inherited disease. How are gene tests used in each situation, and how do the concerns about using gene testing differ in these situations? e. Are laboratories that conduct genetic testing regulated by law?
a.) The main issues associated with the project are: i. The fair use of genetic information. ii. Maintaining privacy and confidentiality. iii. Whether an individual's genetic differences have a psychological impact and lead to stigmatization. iv. How genetic information is used in reproductive decision making and reproductive rights. v. Clinical issues relevant to the education of health-service providers, patients, and the general public in genetic capabilities, scientific limitations, and social risks; and relevant to the implementation of standards and quality-control measures in testing procedures. vi. Uncertainties associated with gene tests for susceptibilities and complex conditions linked to multiple genes and gene-environment interactions. vii. Conceptual and philosophical implications regarding human responsibility, free will versus genetic determinism, and concepts of health and disease. viii. Health and environmental issues concerning genetically modified foods and microbes. ix. Commercialization of products, including property rights and accessibility of data and materials. b.) Legislation is needed to protect individuals from discrimination based on information derived from genetic tests. The US has the Genetic Information Nondiscrimination Act (GINA), enacted in 2008. c.) Pros: gene tests can clarify a diagnosis and point physicians to the right one, help families avoid having children with devastating diseases, and identify people who have preventable risks. Cons: the tests do not always give clear results, some tests check for only one disease although more could be tested for, they are subject to laboratory error, they can promote anxiety, there are not currently treatments for all the diseases diagnosed by genetic testing, and they can expose individuals to stigma or discrimination that can outweigh the benefits. d.) In presymptomatic individuals, gene tests are used to find out whether a person is at risk of developing a disease. In symptomatic individuals, they are used to clarify the diagnosis of a particular disease. 
The concerns differ: a test can confirm or rule out a diagnosis, or tell someone whose parent died of a disease whether they carry the same mutant allele. For a symptomatic person, the result may let doctors choose a better therapeutic strategy. For an asymptomatic person, it may cause excess anxiety and lead to a need for psychiatric care. e.) The FDA does not regulate tests that are developed in laboratories, because these are categorized as services. Only certain states have regulations to evaluate the accuracy and reliability of genetic testing.
8.26 Some features that we commonly associate with racial identity, such as skin pigmentation, hair shape, and facial morphology, have a complex genetic basis. However, it turns out that these features are not representative of the genetic differences between racial groups—individuals assigned to different racial categories share many more DNA polymorphisms than not—supporting the contention that race is a social and not a biological construct. How could you use DNA chips to quantify the percentage of SNPs that are shared between individuals assigned to different racial groups? Table on page 215
Compare the SNPs present in individuals assigned to different racial groups using DNA chips (probe arrays) containing thousands of SNPs that are representative of SNPs throughout the entire genome. -Use a probe array that has, for each of several thousand representative SNPs, a set of oligonucleotides that match the common allele and all possible variant alleles. -Each hybridization assesses the SNPs present in one individual. It is performed by labeling fragments of genomic DNA from that individual as target DNA and hybridizing the labeled DNA to a SNP probe array. -An individual's SNP alleles are determined by observing the pattern of fluorescence on the probe array and comparing it to the locations of the oligonucleotides for each SNP allele. -The percentage of SNPs that are shared between individuals assigned to different racial groups can be determined by comparing the results of hybridizations with target DNA from individuals from different racial groups. If tag SNPs were detected using this approach, one could quantify the percentage of haplotypes that are shared in different racial groups.
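The final comparison step can be sketched in a few lines. This is a minimal sketch in Python under loudly stated assumptions: each individual's chip result has already been reduced to a dictionary mapping a SNP identifier to a genotype call, and "shared" is taken to mean an identical call at a SNP scored in both individuals. The SNP names and calls are hypothetical toy values, not real data.

```python
def percent_shared(calls_a, calls_b):
    """Percentage of SNPs scored in both individuals with identical calls.

    calls_a, calls_b: dicts mapping a SNP id to a genotype string,
    with alleles sorted within the string (for example "AG", not "GA").
    """
    common = calls_a.keys() & calls_b.keys()
    if not common:
        raise ValueError("no SNPs were scored in both individuals")
    shared = sum(1 for snp in common if calls_a[snp] == calls_b[snp])
    return 100.0 * shared / len(common)


# Hypothetical calls for two individuals (illustration only)
person_1 = {"snp1": "AA", "snp2": "AG", "snp3": "CC", "snp4": "TT"}
person_2 = {"snp1": "AA", "snp2": "GG", "snp3": "CC", "snp4": "TT"}

print(percent_shared(person_1, person_2))
```

Averaging this value over many pairs drawn from the same group and from different groups would give the comparison the question asks about. A stricter or looser definition of "shared" (for example, at least one allele in common) would change the numbers.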
8.28 Three of the steps in the analysis of a genome's sequence are assembly, finishing, and annotation. What is involved in each step, and how do they differ from each other?
During assembly, the raw sequences obtained from a genome sequencing project are pieced together in the order in which they are found in the genome. This is done by aligning overlapping sequences and, when a sequencing read containing unique sequence at one end terminates in repetitive sequence, by identifying other clones containing unique sequences that span the length of the repetitive element. The assembly process produces a working draft of the genomic sequence. A working draft of a genome sequence contains many gaps that must be filled, as well as sequencing errors. The finishing process fills in these gaps and addresses the errors. It results in a highly accurate sequence with less than one error per 10,000 bases and as many gaps as possible filled in. After the complete sequence of a genome is obtained, it is annotated. During annotation, genes and other important sequence features are identified. Genes can be identified by comparing the genomic sequence to cDNA sequences obtained by sequencing clones in cDNA libraries. Genes and other important sequence features can also be predicted through the analysis of sequences using computerized algorithms. SNPs can be identified by comparing the sequences of different individuals in a population. Annotation begins the process of assigning functions to all the genes of an organism.
*8.11 E. coli, like all bacterial cells, has its own restriction endonucleases that could interfere with the propagation of foreign DNA in plasmid vectors. For example, wild-type E. coli has a gene, hsdR, that encodes a restriction endonuclease that cleaves DNA that is not methylated at certain A residues. Why is it important to inactivate this enzyme by mutating the hsdR gene in strains of E. coli that will be used to propagate plasmids containing recombinant DNA?
If the enzyme is not inactivated, the restriction enzyme encoded by the hsdR gene will cleave any DNA transformed into E. coli that contains the appropriate recognition sequence. This makes it impossible to clone DNA containing a recognition sequence that is not methylated at the A in that sequence.
8.29 What is a cDNA library, and from what cellular material is it derived? How is a cDNA synthesized, and how do the steps used to clone a cDNA differ from the steps used to clone genomic DNA? How are cDNA sequences used to help annotation of a sequenced genome?
A cDNA library is a large collection of cloned DNA sequences that are complementary to, and derived from, cellular mRNA, which serves as the template for synthesizing the DNA strand. The first step is to isolate cellular mRNA by passing it over a column to which oligo(dT) chains are attached. These bind the A nucleotides of the poly(A)+ mRNA. The captured mRNAs are released from the column and used as templates for reverse transcription, which produces double-stranded DNA-mRNA hybrid molecules. Next, RNase H partially degrades the RNA strand of the DNA-mRNA hybrid, DNA polymerase uses the remaining RNA fragments on the single-stranded DNA as primers, and ligase joins the new DNA pieces together. There are two ways to make sticky ends: 1. A restriction site linker is ligated onto the ends of the cDNA. For this to work, methylated nucleotides must be used during cDNA synthesis so that restriction sites within the cDNA are protected. An unmethylated linker is then added to the ends and cleaved by the restriction enzyme. 2. The cDNA is ligated to an adapter, a molecule with one blunt end and one sticky end. Annotation can be aided by sequencing clones from cDNA libraries. The cDNA sequences contain neither introns nor non-transcribed sequences, so analysis of cDNA is a reliable way to define the exact boundaries of exons. It also provides evidence that a genomic sequence is transcribed. One drawback is that not all cDNAs in a cDNA library will be full length, but even an incomplete copy can identify a region containing a gene.
8.20 Explain how the whole-genome shotgun approach to sequencing a genome differs from the biochemist's approach described in Question 8(c). What information does it provide that the biochemist's approach does not? What does it mean to obtain 7-fold coverage, and why did his colleague advise him to do this?
refer to book
8.12 E. coli is a commonly used host for propagating DNA sequences cloned into plasmid vectors. Wild-type E. coli turns out to be an unsuitable host, however: the plasmid vectors are "engineered," and so is the host bacterium. For example, nearly all strains of E. coli used for propagating recombinant DNA molecules carry mutations in the recA gene. The wild-type recA gene encodes a protein that is central to DNA recombination and DNA repair. Mutations in recA eliminate general recombination in E. coli and render E. coli sensitive to UV light. How might a recA mutation make an E. coli cell a better host for propagating a plasmid carrying recombinant DNA? (Hint: What type of events involving recombinant plasmids and the E. coli chromosome will recA mutations prevent?) What additional advantage might there be to using recA mutants, considering that some of the E. coli cells harboring a recombinant plasmid could accidentally be released into the environment?
The recA mutation helps prevent recombination between the host chromosome and the plasmid vector. This restricts propagation of the plasmid to the cytoplasm and maintains the integrity of the cloned sequences. It also makes the host cell less viable if it is accidentally released into the environment, because it is less efficient at DNA repair and is sensitive to UV light.