Biochemistry Ch9: DNA-Based Information Technologies
electroporation
In an alternative method, cells incubated with the plasmid DNA are subjected to a high-voltage pulse. This approach, called electroporation, *transiently renders the bacterial membrane permeable to large molecules*.
fusion or chimeric protein
Changes can also be introduced that involve more than one base pair. Large parts of a gene can be deleted by cutting out a segment with restriction endonucleases and ligating the remaining portions to form a smaller gene. *Parts of two different genes can be ligated to create new combinations.* The product of such a fused gene is called a fusion protein.
DNA cloning
DNA cloning involves separating a specific gene or DNA segment from a larger chromosome, attaching it to a small molecule of carrier DNA, and then replicating this modified DNA thousands or millions of times through both an increase in cell number and the creation of multiple copies of the cloned DNA in each cell. The result is selective amplification of a particular gene or DNA segment.
restriction fragment length polymorphisms (RFLPs)
DNA fingerprinting is based on *sequence polymorphisms*, slight sequence differences (usually single base-pair changes) between individuals, 1 bp in every 1,000 bp, on average. Each difference from the prototype human genome sequence (the first one obtained) occurs in some fraction of the human population; every individual has some differences. *Some of the sequence changes affect recognition sites for restriction enzymes, resulting in variation in the size of DNA fragments produced by digestion with a particular restriction enzyme*. These variations are *restriction fragment length polymorphisms (RFLPs)*.
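The way a single base change alters a fragment pattern can be sketched in a few lines of Python. This is only an illustration: the two allele sequences are made up, and the enzyme is assumed to be EcoRI (site GAATTC, cut after the first G); `digest` is a hypothetical helper, not a library function.

```python
def digest(seq, site, offset):
    """Return fragment lengths after cutting a linear sequence at every
    occurrence of `site`; the cut falls `offset` bases into the site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]

SITE, OFFSET = "GAATTC", 1  # assumed enzyme: EcoRI, G^AATTC

# Made-up allele carrying two recognition sites.
allele_1 = "ATGC" * 3 + SITE + "TTAGC" * 4 + SITE + "CCGTA" * 2
# Same sequence with one base changed inside the second site (GAATTC -> GAGTTC).
allele_2 = allele_1[:40] + "G" + allele_1[41:]

fragments_1 = digest(allele_1, SITE, OFFSET)
fragments_2 = digest(allele_2, SITE, OFFSET)
print(fragments_1)  # three fragments
print(fragments_2)  # two fragments: the second site is gone
```

The single substitution removes one cut, so two of the fragments from the first allele merge into one longer fragment in the second. That length difference is what a Southern blot would reveal.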
sticky ends
Some restriction endonucleases make staggered cuts on the two DNA strands, leaving two to four nucleotides of one strand unpaired at each resulting end. These unpaired strands are referred to as sticky ends, because they can base-pair with each other or with complementary sticky ends of other DNA fragments.
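A minimal Python sketch of how a staggered cut produces an unpaired end. It assumes a palindromic recognition site that is cut at mirror-image positions on the two strands; `sticky_end` is a hypothetical helper, and the EcoRI, PstI, and SmaI cut positions are supplied as known examples rather than taken from these notes.

```python
def sticky_end(site, offset):
    """site: palindromic recognition sequence, top strand, 5'->3'.
    offset: number of bases of the site that lie 5' of the top-strand cut.
    The bottom strand is assumed to be cut at the mirror-image position."""
    mirror = len(site) - offset
    lo, hi = sorted((offset, mirror))
    unpaired = site[lo:hi]  # bases left single-stranded, read 5'->3'
    if lo == hi:
        kind = "blunt"
    elif offset < mirror:
        kind = "5' overhang"
    else:
        kind = "3' overhang"
    return kind, unpaired

print(sticky_end("GAATTC", 1))  # EcoRI, G^AATTC
print(sticky_end("CTGCAG", 5))  # PstI,  CTGCA^G
print(sticky_end("CCCGGG", 3))  # SmaI,  CCC^GGG
```

For EcoRI the unpaired stretch is AATT, which is its own reverse complement. That is why any two EcoRI ends can base-pair with each other.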
genomics and proteomics
Techniques for DNA cloning paved the way to the modern fields of genomics and proteomics, the study of genes and proteins on the scale of whole cells and organisms.
PCR is highly sensitive
This technology is highly sensitive: PCR can detect and amplify as little as one DNA molecule in almost any type of sample.
limitations of using plasmids as vectors
Transformation of typical bacterial cells with purified DNA (never a very efficient process) becomes *less successful as plasmid size increases*, and it is difficult to clone DNA segments *longer than about 15,000 bp* when plasmids are used as the vector.
restriction endonucleases (enzymes)
Restriction endonucleases (also called restriction enzymes) recognize and cleave DNA at specific DNA sequences (recognition sequences or restriction sites) to generate a set of smaller fragments.
DNA ligases
The DNA fragment to be cloned can be joined to a suitable cloning vector by using DNA ligases to link the DNA molecules together. The recombinant vector is then introduced into a host cell, which amplifies the fragment in the course of many generations of cell division.
reverse transcriptase PCR (RT-PCR)
Used to amplify RNA sequences. The first step uses reverse transcriptase to convert the RNA to DNA, which is then amplified by standard PCR.
quantitative or real-time PCR (Q-PCR)
Used to measure quantitative differences in gene expression levels by monitoring the accumulation of PCR product as the reaction proceeds.
the yeast two-hybrid system
(a) In this system for *detecting protein-protein interactions*, the *aim is to bring together the DNA-binding domain and the activation domain of the yeast Gal4 protein through the interaction of two proteins*, X and Y, to which each domain is fused. This interaction is *accompanied by the expression of a reporter gene*. (b) The two fusions are created in separate yeast strains, which are then mated. The *mated mixture is plated on a medium on which the yeast cannot survive unless the reporter gene is expressed*. Thus, all surviving colonies have interacting protein fusion pairs. *Sequencing of the fusion proteins in the survivors reveals which proteins are interacting*.
protein complex isolation with epitope tag
*An epitope tag is a short protein sequence that is bound tightly by a well-characterized monoclonal antibody.* *If the cDNA is cloned next to a gene for an epitope tag, the resulting fusion protein can be precipitated by antibodies to the epitope*: the tagged protein is specifically precipitated from a crude protein extract by interaction with the antibody. Any other proteins that bind to the tagged protein precipitate as well, providing information about protein-protein interactions in a cell.
expression vectors
*Cloning vectors with the transcription and translation signals needed for the regulated expression of a cloned gene are often called expression vectors.* The rate of expression of the cloned gene is controlled by *replacing the gene's own promoter and regulatory sequences with more efficient and convenient versions* supplied by the vector. Generally, a well-characterized promoter and its regulatory elements are *positioned near several unique restriction sites for cloning*, so that genes inserted at the restriction sites will be expressed from the regulated promoter element. Some of these vectors incorporate other features, such as a bacterial ribosome binding site to enhance translation of the mRNA derived from the gene, or a transcription termination sequence.
plasmids and transformation
*Plasmids are circular DNA molecules that replicate separately from the host chromosome. Naturally occurring bacterial plasmids range in size from 5,000 to 400,000 bp.* They can be introduced into bacterial cells by a process called transformation. *The cells (generally E. coli) and plasmid DNA are incubated together at 0 °C in a calcium chloride solution, then subjected to a shock by rapidly shifting the temperature to 37 to 43 °C. For reasons not well understood, some of the cells treated in this way take up the plasmid DNA.* Some species of bacteria are naturally competent for DNA uptake and do not require the calcium chloride treatment.
oligonucleotide-directed mutagenesis
*When suitably located restriction sites are not present, an approach called oligonucleotide-directed mutagenesis can create a specific DNA sequence change*. A short synthetic DNA strand with a specific base change is annealed to a single-stranded copy of the cloned gene within a suitable vector. *The mismatch of a single base pair in 15 to 20 bp does not prevent annealing if it is done at an appropriate temperature*. The annealed strand *serves as a primer for the synthesis of a strand complementary to the plasmid vector*. This slightly mismatched duplex recombinant plasmid is then used to transform bacteria, where the *mismatch is repaired by cellular DNA repair enzymes*. About half of the repair events will remove and replace the altered base and restore the gene to its original sequence; *the other half will remove and replace the normal base, retaining the desired mutation*. Transformants are screened (often by sequencing their plasmid DNA) until a bacterial colony containing a plasmid with the altered sequence is found.
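The design of the mutagenic oligonucleotide can be sketched as follows. This is an illustration under stated assumptions: the gene fragment is an arbitrary made-up sequence, `mutagenic_oligo` is a hypothetical helper, and the oligonucleotide is written in the sense of the strand it will become part of (it would anneal to the complementary single-stranded template).

```python
def mutagenic_oligo(gene, pos, new_base, flank=9):
    """Return (oligo, mismatches): a short strand spanning `flank` bases on
    each side of position `pos`, identical to the gene except at `pos`."""
    if gene[pos] == new_base:
        raise ValueError("new_base must differ from the original base")
    start = max(0, pos - flank)
    end = min(len(gene), pos + flank + 1)
    original = gene[start:end]
    oligo = gene[start:pos] + new_base + gene[pos + 1:end]
    mismatches = sum(a != b for a, b in zip(original, oligo))
    return oligo, mismatches

gene = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTT"  # arbitrary example sequence
oligo, mismatches = mutagenic_oligo(gene, pos=17, new_base="C")
print(oligo, len(oligo), mismatches)
```

With nine flanking bases on each side the oligonucleotide is 19 nucleotides long and carries exactly one mismatch, within the 15 to 20 bp range in which the notes say annealing is still possible.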
DNA cloning: 5 general procedures
1. Cutting DNA at precise locations. Sequence-specific endonucleases (restriction endonucleases) provide the necessary molecular scissors. 2. Selecting a small molecule of DNA capable of self-replication. These DNAs are called cloning vectors (a vector is a delivery agent). They are typically plasmids or viral DNAs. 3. Joining two DNA fragments covalently. The enzyme DNA ligase links the cloning vector and DNA to be cloned. Composite DNA molecules comprising covalently linked segments from two or more sources are called recombinant DNAs. 4. Moving recombinant DNA from the test tube to a host cell that will provide the enzymatic machinery for DNA replication. 5. Selecting or identifying host cells that contain recombinant DNA.
features useful in a cloning vector (ex. pBR322)
1. pBR322 has an *origin of replication, ori,* a sequence where replication is initiated by cellular enzymes. This *sequence is required to propagate the plasmid* and maintain it at a level of 10 to 20 copies per cell. 2. The plasmid contains two genes that *confer resistance to different antibiotics (tet R, amp R), allowing the identification of cells* that contain the intact plasmid or a recombinant version of the plasmid. 3. Several unique recognition sequences in pBR322 (PstI, EcoRI, BamHI, SalI, PvuII) are *targets for different restriction endonucleases*, providing sites where the plasmid can later be cut to insert foreign DNA. 4. The *small size of the plasmid (4,361 bp)* facilitates its entry into cells and the biochemical manipulation of the DNA.
how are DNA microarrays useful?
A microarray can *answer such questions as which genes are expressed at a given stage in the development of an organism*. The total complement of mRNA is isolated from cells at two different stages of development and converted to cDNA, using reverse transcriptase and fluorescently labeled deoxynucleotides. The fluorescent cDNAs are then mixed and used as probes, each hybridizing to complementary sequences on the microarray. In Figure 9-22, for example, the labeled nucleotides used to make the cDNA for each sample fluoresce in two different colors. The cDNA from the two samples is mixed and used to probe the microarray. *Spots that fluoresce green represent mRNAs more abundant at the single-cell stage; those that fluoresce red represent sequences more abundant later in development. The mRNAs that are equally abundant at both stages of development fluoresce yellow.* By using a mixture of two samples to measure relative rather than absolute abundance of sequences, the method corrects for variations in the amounts of DNA originally deposited in each spot on the grid and other possible inconsistencies among spots in the microarray. *The spots that fluoresce provide a snapshot of all the genes being expressed in the cells at the moment they were harvested—gene expression examined on a genome-wide scale*. For a gene of unknown function, the time and circumstances of its expression can provide important clues about its role in the cell.
DNA microarrays (Figure 9-22)
A microarray can be prepared from any known DNA sequence, from any source, generated by chemical synthesis or by PCR. The DNA is positioned on a solid surface (usually specially treated glass slides) with the aid of a robotic device capable of depositing very small (nanoliter) drops in precise patterns. UV light cross-links the DNA to the glass slides. Once the DNA is attached to the surface, the microarray can be probed with other fluorescently labeled nucleic acids. Here, mRNA samples are collected from cells at two different stages in the development of a frog. The cDNA probes are made with nucleotides that fluoresce in different colors for each sample; a mixture of the cDNAs is used to probe the microarray. Green spots represent mRNAs more abundant at the single-cell stage; red spots, sequences more abundant later in development. The yellow spots indicate approximately equal abundance at both stages.
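The color logic of a two-sample microarray can be sketched in Python. This is a rough illustration only: the intensity values are invented, `spot_color` is a hypothetical helper, and the two-fold cutoff for calling a spot green or red is an assumption, not something stated in the figure.

```python
import math

def spot_color(green, red, fold=2.0):
    """Classify a spot from its two fluorescence intensities (both > 0).
    green: signal from the single-cell-stage cDNA.
    red:   signal from the later-stage cDNA.
    fold:  assumed cutoff for calling a difference in abundance."""
    log_ratio = math.log2(red / green)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "red"      # more abundant later in development
    if log_ratio <= -cutoff:
        return "green"    # more abundant at the single-cell stage
    return "yellow"       # roughly equal abundance at both stages

# Invented intensities for three spots: (green, red)
spots = {"gene_a": (800, 100), "gene_b": (100, 900), "gene_c": (500, 550)}
colors = {name: spot_color(g, r) for name, (g, r) in spots.items()}
print(colors)
```

Because only the ratio of the two signals is used, a spot that received more or less DNA during printing changes both intensities together and leaves the call unchanged. This is the correction for spot-to-spot variation described above.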
the yeast two-hybrid system continued
A sophisticated genetic approach to defining protein-protein interactions is based on the properties of the Gal4 protein (Gal4p), which activates transcription of certain genes in yeast. Gal4p has two domains, one that binds to a specific DNA sequence and another that activates the RNA polymerase that synthesizes mRNA from an adjacent reporter gene. The domains are stable when separated, but activation of the RNA polymerase requires interaction with the activation domain, which in turn requires positioning by the DNA-binding domain. Hence, the domains must be brought together to function correctly. In this method, the protein-coding regions of genes to be analyzed are fused to the coding sequences of either the DNA-binding domain or the activation domain of Gal4p, and the resulting genes express a series of fusion proteins. If a protein fused to the DNA-binding domain interacts with a protein fused to the activation domain, transcription is activated. The reporter gene transcribed by this activation is generally one that yields a protein required for growth, or is an enzyme that catalyzes a reaction with a colored product. *Thus, when grown on the proper medium, cells that contain such a pair of interacting proteins are easily distinguished from those that do not.* *Typically, many genes are fused to the Gal4p DNA-binding domain gene in one yeast strain, and many other genes are fused to the Gal4p activation domain gene in another yeast strain*; the yeast strains are then mated and individual diploid cells grown into colonies. This *allows for large-scale screening for proteins that interact in the cell.*
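The selection logic of the screen can be sketched as a toy Python model. Everything here is hypothetical: the protein names are placeholders, and the set of interacting pairs stands in for what the experiment is meant to discover. The sketch only shows that a diploid cell survives when its two fusion proteins interact.

```python
from itertools import product

def surviving_pairs(baits, preys, interacting):
    """Mate every DNA-binding-domain fusion (bait) with every
    activation-domain fusion (prey); keep the pairs whose interaction
    would reconstitute Gal4p function and switch on the reporter gene."""
    survivors = []
    for bait, prey in product(baits, preys):
        if (bait, prey) in interacting:
            survivors.append((bait, prey))
    return survivors

baits = ["X1", "X2"]            # placeholder proteins fused to the DNA-binding domain
preys = ["Y1", "Y2", "Y3"]      # placeholder proteins fused to the activation domain
interacting = {("X1", "Y2"), ("X2", "Y3")}  # assumed ground truth for the toy model

colonies = surviving_pairs(baits, preys, interacting)
print(colonies)
```

Of the six possible diploids in this toy example only two would form colonies on the selective medium, which is why sequencing the survivors identifies the interacting partners.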
construction of cDNA
As more and more genome sequences become available, the utility of genomic libraries is diminishing and investigators are constructing more specialized libraries designed to study gene function. An example is a library that includes only those genes that are expressed (that is, are transcribed into RNA) in a given organism or even in certain cells or tissues. Such a library lacks the noncoding DNA that makes up a large portion of many eukaryotic genomes. *The researcher first extracts mRNA from an organism or from specific cells of an organism and then prepares complementary DNAs (cDNAs) from the RNA in a multistep reaction catalyzed by the enzyme reverse transcriptase. The resulting double-stranded DNA fragments are then inserted into a suitable vector and cloned, creating a population of clones called a cDNA library.* The search for a particular gene is made easier by focusing on a cDNA library generated from the mRNAs of a cell known to express that gene.
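The base-pairing behind cDNA synthesis can be sketched in Python. This is a simplified illustration: the short mRNA is made up, both function names are hypothetical, and the priming steps and enzymes of the real multistep reaction are left out.

```python
DNA_FOR_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_FOR_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """DNA strand complementary to the mRNA, written 5'->3'
    (the product of reverse transcription)."""
    return "".join(DNA_FOR_RNA[base] for base in reversed(mrna))

def second_strand(first):
    """DNA strand complementary to the first cDNA strand, written 5'->3'."""
    return "".join(DNA_FOR_DNA[base] for base in reversed(first))

mrna = "AUGGCUUAA"  # made-up message, 5'->3'
first = first_strand_cdna(mrna)
second = second_strand(first)
print(first, second)
```

The second strand has the same sequence as the mRNA with T in place of U, so the double-stranded cDNA carries the expressed sequence without any DNA that was not transcribed into the message.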
bacterial artificial chromosomes (BACs)
Bacterial artificial chromosomes are simply plasmids *designed for the cloning of very long segments (typically 100,000 to 300,000 bp) of DNA*. They generally *include selectable markers such as resistance to the antibiotic chloramphenicol (Cm R), as well as a very stable origin of replication (ori)* that maintains the plasmid at one or two copies per cell. DNA fragments of several hundred thousand base pairs are cloned into the BAC vector. The large circular DNAs are then *introduced into host bacteria by electroporation*. These procedures use host bacteria with mutations that compromise the structure of their cell wall, permitting the uptake of the large DNA molecules.
in vitro packaging with bacteriophage λ
Bacteriophage λ has a very efficient mechanism for delivering its 48,502 bp of DNA into a bacterium, and it can be used as a vector to clone somewhat larger DNA segments. Two key features contribute to its utility: 1. About *one-third of the genome is nonessential* and can be replaced with foreign DNA. 2. DNA is packaged into infectious phage particles only if it is *between 40,000 and 53,000 bp long*, a constraint that can be used to ensure packaging of recombinant DNA only. *Researchers have developed bacteriophage vectors that can be readily cleaved into three pieces, two of which contain essential genes but which together are only about 30,000 bp long.* The third piece, "filler" DNA, is discarded when the vector is to be used for cloning, and additional DNA is inserted between the two essential segments to generate ligated DNA molecules long enough to produce viable phage particles. In effect, the packaging mechanism selects for recombinant viral DNAs. Bacteriophage vectors permit the cloning of DNA fragments of up to 23,000 bp. Once the bacteriophage fragments are ligated to foreign DNA fragments of suitable size, the resulting *recombinant DNAs can be packaged into phage particles by adding them to crude bacterial cell extracts that contain all the proteins needed to assemble a complete phage. This is called in vitro packaging.* All viable phage particles will contain a foreign DNA fragment. The subsequent transmission of the recombinant DNA into E. coli cells is highly efficient.
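The packaging constraint above amounts to a simple size window, sketched here in Python using only the figures quoted in the text (about 30,000 bp of essential vector DNA, and 40,000 to 53,000 bp for a packageable molecule). The function name is hypothetical.

```python
ARMS_BP = 30_000                        # the two essential vector pieces together
PACK_MIN, PACK_MAX = 40_000, 53_000     # packageable DNA length, from the text

def is_packaged(insert_bp):
    """True if vector arms plus insert fall inside the packaging window."""
    total = ARMS_BP + insert_bp
    return PACK_MIN <= total <= PACK_MAX

min_insert = PACK_MIN - ARMS_BP
max_insert = PACK_MAX - ARMS_BP
print(min_insert, max_insert)

for insert in (0, 15_000, 23_000, 25_000):
    print(insert, is_packaged(insert))
```

Arms ligated together with no insert are too short to be packaged, which is how the packaging step selects for recombinants. The upper limit of the window is also where the 23,000 bp cloning capacity comes from.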
GFP-Tagged cDNA Library
Cloning of cDNA next to a gene for green fluorescent protein (GFP) creates a *reporter construct*. *RNA transcription proceeds through the gene of interest (insert DNA) and the reporter gene, and the mRNA transcript is then expressed as a fusion protein.* The GFP part of the protein is visible in the fluorescence microscope. The photograph shows a nematode worm containing a GFP fusion protein expressed only in the four "touch" neurons that run the length of its body.
site-directed mutagenesis
Cloning techniques can be used not only to overproduce proteins but to produce protein products subtly altered from their native forms. *Specific amino acids may be replaced individually by site-directed mutagenesis*. This powerful approach to studying protein structure and function *changes the amino acid sequence of a protein by altering the DNA sequence of the cloned gene*. If appropriate restriction sites flank the sequence to be altered, researchers can simply remove a DNA segment and replace it with a synthetic one that is identical to the original except for the desired change.
yeast artificial chromosomes (YACs)
E. coli cells are by no means the only hosts for genetic engineering. Yeasts are particularly convenient eukaryotic organisms for this work. As with E. coli, yeast genetics is a well-developed discipline. The genome of the most commonly used yeast, *Saccharomyces cerevisiae*, contains only 14 × 10^6 bp (a simple genome by eukaryotic standards, less than four times the size of the E. coli chromosome), and its entire sequence is known. Yeast is also very easy to maintain and grow on a large scale in the laboratory. *Plasmid vectors have been constructed for yeast, employing the same principles that govern the use of E. coli vectors described above*. Convenient methods are now available for moving DNA into and out of yeast cells, facilitating the study of many aspects of eukaryotic cell biochemistry. Some recombinant plasmids incorporate multiple replication origins and other elements that allow them to be used in more than one species (for example, yeast or E. coli). *Plasmids that can be propagated in cells of two or more different species* are called *shuttle vectors*.
DNA microarrays / genome chips
Major refinements of the technology underlying DNA libraries, PCR, and hybridization have come together in the development of DNA microarrays (sometimes called DNA chips), which allow the rapid and simultaneous screening of many thousands of genes. DNA segments from known genes, a few dozen to hundreds of nucleotides long, are amplified by PCR and placed on a solid surface, using robotic devices that accurately deposit nanoliter quantities of DNA solution. Many thousands of such spots are deposited in a predesigned array on a surface area of just a few square centimeters. *Once the chip is constructed, it can be probed with mRNAs or cDNAs from a particular cell type or cell culture to identify the genes being expressed in those cells.*
blunt ends
Other restriction endonucleases cleave both strands of DNA at the opposing phosphodiester bonds, leaving no unpaired bases on the ends, often called blunt ends.
photolithographic synthesis of DNA used to prepare a DNA microarray
Photolithography. This technique for *preparing a DNA microarray makes use of nucleotide precursors that are activated by light, joining one nucleotide to the next in a photoreaction* (as opposed to the chemical process illustrated in Fig. 8-38). A computer is programmed with the oligonucleotide sequences to be synthesized at each point on a solid surface. The surface is washed successively with solutions containing one type of activated nucleotide (A*, G*, etc.). As in the chemical synthesis of DNA, *the activated nucleotides are blocked so that only one can be added to a chain in each cycle*. A screen covering the surface is opened over the areas programmed to receive a particular nucleotide, and *a flash of light joins the nucleotide to the polymers in the uncovered areas*. This continues until the required sequences are built up on each spot on the surface. Many polymers with the same sequence are generated on each spot, not just the single polymer shown. Also, the surfaces have thousands of spots with different sequences (see Fig. 9-22); this array shows just four spots, to illustrate the strategy.
yeast artificial chromosomes (YACs) continued
Research work with large genomes and the associated need for high-capacity cloning vectors led to the development of yeast artificial chromosomes (YACs). YAC vectors contain all the elements needed to maintain a eukaryotic chromosome in the yeast nucleus: *a yeast origin of replication, two selectable markers, and specialized sequences needed for stability and proper segregation of the chromosomes at cell division*. Before being used in cloning, the vector is *propagated as a circular bacterial plasmid*. Cleavage with a *restriction endonuclease removes a length of DNA between two telomere sequences (TEL)*, leaving the telomeres at the ends of the linearized DNA. *Cleavage at another internal site divides the vector into two DNA segments*, referred to as *vector arms*, each with a different selectable marker. The *DNA fragments of appropriate size (up to about 2 × 10^6 bp)* are mixed with the prepared vector arms and ligated. The ligation mixture is then used to transform treated yeast cells with very large DNA molecules. Culture on a medium that requires the presence of both selectable marker genes ensures the growth of only those yeast cells that contain an artificial chromosome with a large insert sandwiched between the two vector arms. The stability of YAC clones increases with size (up to a point). Those with *inserts of more than 150,000 bp are nearly as stable as normal cellular chromosomes*, *whereas those with inserts of less than 100,000 bp are gradually lost during mitosis* (so generally there are no yeast cell clones carrying only the two vector ends ligated together or with only short inserts). YACs that lack a telomere at either end are rapidly degraded.
linkers and polylinkers
Researchers can create new DNA sequences by inserting *synthetic DNA fragments (called linkers)* between the ends that are being ligated. *Inserted DNA fragments with multiple recognition sequences* for restriction endonucleases (often useful later as points for inserting additional DNA by cleavage and ligation) are called *polylinkers*.
restriction-modification system
Restriction endonucleases are found in a wide range of bacterial species. Werner Arber discovered in the early 1960s that their biological function is to recognize and cleave foreign DNA (the DNA of an infecting virus, for example); such DNA is said to be restricted. *In the host cell's DNA, the sequence that would be recognized by its own restriction endonuclease is protected from digestion by methylation of the DNA, catalyzed by a specific DNA methylase.* The restriction endonuclease and the corresponding methylase are sometimes referred to as a restriction-modification system.
polymerase chain reaction (PCR)
The PCR procedure has an elegant simplicity. *Two synthetic oligonucleotides are prepared, complementary to sequences on opposite strands of the target DNA at positions just beyond the ends of the segment to be amplified. The oligonucleotides serve as replication primers that can be extended by DNA polymerase*. The 3′ ends of the hybridized primers are oriented toward each other and positioned to prime DNA synthesis across the desired DNA segment. (DNA polymerases synthesize DNA strands from deoxyribonucleotides, using a DNA template.) *Isolated DNA containing the segment to be amplified is heated briefly to denature it, and then cooled in the presence of a large excess of the synthetic oligonucleotide primers. The four deoxynucleoside triphosphates are then added, and the primed DNA segment is replicated selectively.*
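The geometry of the two primers can be sketched in Python. This is a toy illustration: the template sequence is made up, the helper names are hypothetical, and real primer design also weighs length, melting temperature, and specificity, none of which is modeled here.

```python
def reverse_complement(seq):
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

def amplicon(template, fwd, rev):
    """Return the stretch of the top strand bounded by the two primers,
    or None if either primer does not match. `fwd` matches the top strand;
    `rev` is complementary to it, so its 3' end points back toward `fwd`."""
    start = template.find(fwd)
    end = template.find(reverse_complement(rev))
    if start == -1 or end == -1 or end < start:
        return None
    return template[start:end + len(rev)]

# Made-up template, top strand 5'->3'
template = "TTTT" + "ATGCCGTA" + "GGGAAACCC" + "TCAGGCTA" + "AAAA"
fwd = "ATGCCGTA"                       # identical to the top strand
rev = reverse_complement("TCAGGCTA")   # anneals to the top strand
product = amplicon(template, fwd, rev)
print(rev, product)
```

Only the segment flanked by the two primer sites is returned. The flanking TTTT and AAAA are not copied, which mirrors the selective replication described above.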
what determines the size of the DNA fragments?
The average size of the DNA fragments produced by cleaving genomic DNA with a restriction endonuclease *depends on the frequency with which a particular restriction site occurs in the DNA molecule*; this in turn depends largely on the size of the recognition sequence.
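The dependence on recognition-sequence length can be made concrete with a short calculation. It rests on a simplifying assumption that is not in the notes: the four bases are taken to be equally frequent and randomly distributed, so a given n-bp site is expected about once every 4^n bp. Real genomes deviate from this. The function name is hypothetical.

```python
def expected_fragment_bp(site_length, bases=4):
    """Expected average spacing between occurrences of one specific
    site of `site_length` bp in random, equal-frequency sequence."""
    return bases ** site_length

sizes = {n: expected_fragment_bp(n) for n in (4, 6, 8)}
for n, bp in sizes.items():
    print(f"{n}-bp site: about one cut every {bp:,} bp")
```

Under this assumption a 6-bp cutter gives fragments averaging roughly 4,000 bp, while each two extra bases in the site make cuts sixteen times rarer.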
after 25 or 30 cycles of PCR
The cycle of heating, cooling, and replication is repeated 25 or 30 times over a few hours in an automated process, amplifying the DNA segment flanked by the primers until it can be readily analyzed or cloned. *PCR uses a heat-stable DNA polymerase, such as the Taq polymerase (derived from a bacterium that lives at 90 °C), which remains active after every heating step and does not have to be replenished*. Careful design of the primers used for PCR, such as including restriction endonuclease cleavage sites, can facilitate the subsequent cloning of the amplified DNA.
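The scale of the amplification follows from doubling at each cycle. The sketch below assumes ideal conditions, with every template copied in every cycle. The `efficiency` parameter is a hypothetical knob added for illustration (1.0 means perfect doubling), and real reactions plateau as reagents run out.

```python
def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Copies after `cycles` rounds if each round multiplies the
    number of molecules by (1 + efficiency)."""
    return start_copies * (1 + efficiency) ** cycles

for n in (25, 30):
    print(n, f"{int(pcr_copies(1, n)):,}")
```

Starting from a single molecule, 25 ideal cycles give about 3.4 × 10^7 copies and 30 cycles give about 1.1 × 10^9. This exponential growth is what lets PCR start from as little as one DNA molecule.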
how to detect restriction fragment length polymorphisms?
The detection of RFLPs relies on a specialized hybridization procedure called Southern blotting. DNA fragments from digestion of genomic DNA by restriction endonucleases are separated by size electrophoretically, denatured by soaking the agarose gel in alkali, and then blotted onto a nylon membrane to reproduce the distribution of fragments in the gel. The membrane is immersed in a solution containing a radioactively labeled DNA probe. A probe for a sequence that is repeated several times in the human genome generally identifies a few of the thousands of DNA fragments generated when the human genome is digested with a restriction endonuclease. Autoradiography reveals the fragments to which the probe hybridizes.
Type I restriction endonucleases (require ATP)
Type I restriction endonucleases cleave DNA at random sites that can be more than 1,000 base pairs (bp) from the recognition sequence.
Type II restriction endonucleases
Type II restriction endonucleases, first isolated by Hamilton Smith in 1970, are simpler, require no ATP, and cleave the DNA within the recognition sequence itself.
Type III restriction endonucleases (require ATP)
Type III restriction endonucleases cleave the DNA about 25 bp from the recognition sequence.
single nucleotide polymorphisms (SNPs)
What does all this information tell us about how much one human differs from another? Within the human population are millions of single-base differences, called single nucleotide polymorphisms, or SNPs (pronounced "snips"). Each human differs from the next by about 1 bp in every 1,000 bp. From these small genetic differences arises the human variety we are all aware of—differences in hair color, eyesight, allergies to medication, foot size, and even (to some unknown degree) behavior. Some of the SNPs are linked to particular human populations and can provide important information about human migrations that occurred thousands of years ago and about our more distant evolutionary past.
pulsed field gel electrophoresis (PFGE)
A variation of gel electrophoresis that allows the *separation of very large DNA segments*.
enhancing protein complex isolation with tandem affinity purification (TAP) tagged proteins
Tandem affinity purification (TAP) allows two successive purification steps, eliminating loosely associated proteins and minimizing nonspecific binding.