Ch. 9: Genomics
randomly amplified polymorphic DNA (RAPD, pronounced "rapid," NOT "raped.")
Definition: *A set of several genomic fragments amplified by a single PCR primer; somewhat variable from individual to individual; + / - heterozygotes for individual fragments can act as markers in genome mapping.*
motifs
Definition: *A short DNA sequence that is associated with a particular functional role (see robotics).* In other words and supplementary info: Motifs are short strings of base pairs characteristic of sites regulating particular events in gene expression or chromosome replication, such as 5' splice sites or origins of replication.
clone-based genome sequencing
Definition: *An approach to genome sequencing on a clone-by-clone basis.* In other words and other info: One of the two major routes of production of nucleotide sequence maps. In clone-based genome sequencing, a subset of clones from the physical map is fully sequenced. Then, the overlaps between sequences of the individual clones are used to assemble the entire sequence map of the genome.
high-throughput
Definition: *Highly automated methods used to generate large-scale data collections (see robotics) .*
genome projects
Definition: *Large-scale, often multilaboratory efforts required to sequence complex genomes.* In other words and supplementary info: Characterizing whole genomes is important to a fundamental understanding of the design principles of living organisms and for the discovery of new genes such as those that are involved in human genetic disease.
VNTRs (variable number tandem repeats)
Definition: *A chromosomal locus at which a particular repetitive sequence is present in different numbers in different individuals or in the two different homologs in one diploid individual.* In other words and supplementary info: Used as the basis of minisatellite markers, one of the two types of SSLPs. The VNTR loci in humans are 1- to 5-kb sequences consisting of variable numbers of a repeating unit from 15 to 100 nucleotides long.
Some differences in the convenience of RFLP and SSLP analyses.
(1) RFLP analysis requires a specific cloned single-copy DNA probe to be on hand in the laboratory for the detection of each individual marker locus. (2) Microsatellite analysis requires a single-copy DNA PCR primer pair for each marker locus. (3) Minisatellite analysis on the other hand requires just one probe that detects the core sequence of the repetitive element simultaneously at all sites of that repetitive element anywhere in the genome.
DNA breakpoint (broken end)
...
Interactome
...
Ordering by Clone Fingerprints
...
Proteome
...
SINEs (short interspersed elements)
...
STS content maps
...
Transcriptome
...
bioinformatics
...
consensus sequence
...
fluorescence activated chromosome sorting (FACS)
...
heterochromatin
...
long interspersed elements (LINEs)
...
minimum tiling path
...
mobile genetic elements
...
motifs
...
ordered clone sequencing
...
paired-end sequences
...
primer walking
...
proteome
...
pseudogenes
...
pulsed-field gel electrophoresis (PFGE)
...
reporter gene
...
retrotransposons
...
satellite DNA
...
scaffolds
...
sequence assembly
...
sequence-tagged sites (STSs)
...
structural genomics
...
tandem repeats
...
transposases
...
transposition
...
whole genome shotgun sequencing
...
fluorescent in situ hybridization (FISH)
A process used in "in situ hybridization mapping". In this process, the cloned DNA is labeled with a fluorescent dye, and a denatured chromosome preparation is bathed in this probe. When the probe binds to the chromosome in situ, the location of the cloned fragment is revealed by a bright fluorescent spot.
contig
Definition: *A set of ordered overlapping clones that constitute a chromosomal region or a genome*
DNA fingerprints
Definition: *The autoradiographic banding pattern produced when DNA is digested with a restriction enzyme that cuts outside a family of VNTRs and a Southern blot of the electrophoretic gel is probed with a VNTR-specific probe. Unlike true fingerprints, these patterns are not unique to each individual.* In other words and supplementary info:
single-copy DNA
Definition: *DNA sequences present in only one copy per haploid genome.* In other words and supplementary info: Often used as a probe for RFLPs. Such a probe is a cloned DNA fragment that comes from only one segment of the genome and that overlaps the restriction site.
nucleotide sequence maps
The highest-resolution maps of the genome. There are two major routes of production of sequence maps: (1) *clone-based genome sequencing* (see other notecard), and (2) *whole genome shotgun (WGS) sequencing* (see other notecard).
chromosome painting
An extension of FISH used in "in situ hybridization mapping." Rather than chromosome morphology landmarks, this technique uses a standard control set of probes homologous to known locations to establish the cytogenetic map. Sets of cloned DNA known to be from specific chromosomes or specific chromosome regions are labeled with different fluorescent dyes. These dyes then "paint" specific regions and identify them under the microscope when the set of labeled clones is used as probes in FISH. If a probe consisting of a cloned sequence of unknown location is labeled with yet another dye, then its position can be established in the painted array.
high-resolution genetic maps
An overview of how to analyze a whole genome: (1) As starting material, use low-resolution genetic maps derived from the recombinational mapping of heritable differences or cytogenetic mapping techniques. (2) Layer onto the low resolution genetic maps the locations of *DNA polymorphisms* (see flashcard).
high-resolution meiotic recombination maps
Basically, this type of mapping takes the data obtained from the study of specific genetic markers and uses them to determine the locations of all of these DNA segments relative to each other, thereby making a complete high-resolution map of the entire genome. This type of linkage mapping is based on the principles in Ch. 6; in other words, on analyzing recombinant frequency in dihybrid and multihybrid crosses. *Neutral DNA sequence variation* is used for locating markers for use in mapping. *In mapping, the biological significance of the DNA marker (if any) is not of relevance; the heterozygous site is merely a convenient reference point that will be useful in navigating the genome.* (A useful analogy: Markers are being used just as milestones were used by travelers in previous centuries. Travelers were not interested in the milestones (markers) per se, except as pointers to their final destinations.)
microsatellite markers
Definition: *A DNA difference at the equivalent location in two genomes due to different repeat lengths of a microsatellite.* In other words and supplementary info: One of the two (2 of 2) types of SSLPs now routinely used in genomics. Microsatellite markers are dispersed regions of the genome composed of variable numbers of dinucleotides repeated in tandem (i.e., a two-base-pair unit of double-stranded DNA repeated in tandem). Regions surrounding these individual microsatellite repeats can be detected by probes made with the help of the polymerase chain reaction (PCR).
DNA polymorphisms
Definition: *A naturally occurring variation in DNA sequence at a given location in the genome.* In other words and supplementary info: Molecularly defined differences between individuals. DNA polymorphisms are used for the layering because they are far more frequent than gene mutations that observably affect the morphology or viability of an organism, and so the potential density of the markers on high-resolution genetic maps is much higher than that of classical genetic maps.
functional genomics
Definition: *Studying the patterns of transcript and protein expression and molecular interactions at a genome-wide level.* In other words and supplementary info: The global study of the structure, expression patterns, interactions, and regulation of the RNAs and proteins encoded by the genome. Much of this analysis is focused on deciphering the structure and expression of gene products. *Functional genomics can be broken into three categories.* These three categories are defined by how many steps separate the products being assessed from the gene itself: the direct products of genes are the RNAs, and in turn, the mRNAs encode the proteins. *(1)* Thus, *characterizations of the structures of the transcripts and the polypeptides are of primary importance.* The genome encodes information determining when, where, and how much of each protein is expressed. *(2) Characterizing the expression patterns of the RNAs and proteins is then the second level of functional analysis.* *(3)* Finally, *individual RNA and polypeptide molecules typically do not function by themselves, but rather they interact within complexes with other molecules or more copies of themselves. Techniques exist to identify such interactions, and these form the third level of functional genomic analysis.*
robotics
Definition: *The application of automation technology to large-scale biological data collection efforts such as genome projects.* In other words and supplementary info: A special computer-driven automated technique for handling and sequencing millions of clones, with the same experimental manipulations carried out on each clone. Several advantages for genomics: (1) process flow rate can be much more rapid, allowing these projects to be *high throughput*, (2) personnel can focus on the more individualized and interpretive aspects of the genome projects, and (3) samples can be tracked more accurately (i.e., there will be more assurance that a sequence read truly comes from a particular clone stored in a freezer). Since any individual sequence "read" is subject to experimental error, computer programs are required to identify those sequences that overlap one another and statistically analyze their overlaps to determine the most likely correct sequence (the *consensus sequence*).
consensus sequence
Definition: *The inferred most likely real sequence of a segment of DNA based on multiple sequence reads of the same segment (see robotics).*
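Supplementary sketch (not from the textbook): one simple way to picture inferring a consensus from multiple reads is a column-by-column majority vote. This assumes the reads are already aligned and of equal length; the function name `consensus` and the example reads are made up for illustration, and real pipelines use statistical models and base-quality information rather than a plain vote.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote consensus over equal-length, pre-aligned reads.

    Illustrative only: ties are broken in favor of the base seen first
    in the column, and no quality scores are used.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be pre-aligned to the same length")

    result = []
    for column in zip(*reads):
        # most_common(1) gives the base observed most often in this column
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


# Hypothetical reads of the same segment; two of them carry one error each.
reads = ["ATCGGA", "ATCGGA", "ATTGGA", "ATCGCA"]
print(consensus(reads))  # ATCGGA
```

Each single-read error is outvoted by the other three reads, which is the intuition behind sequencing the same segment several times.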
physical maps (an array of clones covering the genome)
Definition: *The ordered and oriented map of cloned DNA fragments on the genome.* In other words and supplementary info: Physical maps are maps of physically isolated pieces of the genome --- in other words, maps of cloned genomic DNA. A complete physical map of a genome consists of a series of maps, one for each chromosome in the haploid chromosome set. For each chromosome, a complete map consists of a series of continuous overlapping cloned genomic DNA segments extending from one telomere of the chromosome to the other. These continuous overlapping clone sets can be thought of as giant chromosome walks. Two main values of physical maps: (1) Previously cloned genes and DNA markers on a genetic map can be localized to a specific clone on a physical map by hybridization techniques such as Southern blotting or PCR. This gives us a way to measure DNA locations and distances between these markers --- something that genetic maps cannot provide by themselves. (2) One of the main strategies for sequencing an entire genome requires a complete physical map. Thus, the availability of the physical map provides an intermediary in moving from genetic maps to complete sequence maps of the genome. Creating a complete physical map begins by amassing a large number of randomly cloned inserts. In some manner, sequence identities between end regions of the clones must be identified; these sequence identities are used to infer that the ends of the two clones overlap and thus the clones come from adjacent regions of the genome. A set of overlapping clones is called a *contig* (see flashcard). In early phases of a genome project, contigs are numerous and represent separate cloned segments of the genome. But, as more and more clones are characterized, clones are found that overlap two previously separate contigs, and these "joining clones" then permit the merger of the two contigs into one larger one.
This process of contig merging continues until eventually the number of contigs equals the number of chromosomes. At this point, if the contigs in this set extend out to the telomeres of the chromosomes, the physical map is complete. In summary: Genomic cloning proceeds by assembling clones into overlapping groups called contigs. As more data accumulate, the contigs extend the length of whole chromosomes.
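Supplementary sketch (not from the textbook): the contig-merging idea can be pictured as grouping clones that are linked by detected overlaps. The sketch below assumes the overlaps have already been found by some method and are given as pairs of clone names; `build_contigs` and the clone names are hypothetical. It only groups clones into contigs and does not order or orient them, which a real physical map must also do.

```python
def build_contigs(clones, overlaps):
    """Group clones into contigs, given pairs of clones known to overlap.

    Uses a small union-find structure: every clone starts as its own
    contig, and each detected overlap merges the two contigs involved.
    """
    parent = {c: c for c in clones}

    def find(c):
        # Follow parent links to the contig representative (path halving).
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two contigs

    groups = {}
    for c in clones:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(g) for g in groups.values())


# Early in a project: two separate contigs.
early = build_contigs(["c1", "c2", "c3", "c4"],
                      [("c1", "c2"), ("c3", "c4")])
print(early)  # [['c1', 'c2'], ['c3', 'c4']]

# A "joining clone" c5 overlaps both contigs and merges them into one.
later = build_contigs(["c1", "c2", "c3", "c4", "c5"],
                      [("c1", "c2"), ("c3", "c4"), ("c2", "c5"), ("c5", "c3")])
print(later)  # [['c1', 'c2', 'c3', 'c4', 'c5']]
```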
physical maps
Definition: *The ordered and oriented map of cloned DNA fragments on the genome.* In other words and supplementary info: Physical maps provide a view of how the clones from genomic clone libraries are distributed throughout the genome. These maps support positional cloning efforts, since in essence they are chromosome walks throughout the genome (see Chapter 8 for a description of chromosome walks), and they also provide logical criteria for selection of clones for production of the highest-resolution maps.
whole genome shotgun (WGS) sequencing
Definition: *The sequencing of ends of clones without regard to any information about the location of the clones.* In other words and supplementary info: One of the two major routes of production of nucleotide sequence maps. In whole genome shotgun (WGS) sequencing, portions of the inserts adjacent to junction points that contain vector sequences are sequenced from a great many random clones throughout the genome. Then, the overlapping sequence information is used to assemble the sequence of the entire genome and to reconstruct the physical map of the clones.
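Supplementary sketch (not from the textbook): the phrase "overlapping sequence information is used to assemble the sequence" can be illustrated by joining two reads where the end of one matches the start of the other. The functions `overlap_length` and `merge_reads`, the `min_overlap` threshold, and the reads are all assumptions for illustration; real assemblers tolerate sequencing errors and handle repeats, which this exact-match toy does not.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_overlap` bases exists.
    """
    max_k = min(len(left), len(right))
    for k in range(max_k, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads through their overlap, or return None if they don't overlap."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


# Hypothetical reads sharing the five bases GATTC.
print(merge_reads("ATCGGATTC", "GATTCAGGT"))  # ATCGGATTCAGGT
print(merge_reads("AAAA", "CCCC"))            # None
```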
neutral DNA sequence variation
Definition: *Variation in DNA sequence that is not under natural selection.* In other words and supplementary info: Neutral variation is variation that's not associated with any measurable phenotypic variation.
comparative genomics
Definition: *The analysis of the relationships of the genome sequences of two or more species.* In other words and supplementary info: The examination of the overall structure of a genome and comparing it to that of a taxonomically related species to understand the evolutionary events that have occurred subsequent to the divergence of these species.
restriction fragment length polymorphisms (RFLPs)
Definition: *A DNA sequence difference between individuals or haplotypes recognized as different restriction fragment lengths. For example, a nucleotide-pair substitution can cause a restriction-enzyme-recognition site to be present in one allele of a gene and absent in another. Consequently, a probe for this DNA region will hybridize to different-sized fragments within restriction digests of DNAs from these two alleles.* In other words and supplementary info: One of the molecular markers (1 of 3) that are employed to produce high-resolution meiotic maps. RFLPs are restriction enzyme recognition sites (Ch. 8) that are present in some strains and absent in others. Take an individual heterozygous for an RFLP and backcross it to an individual that's homozygous for one of the RFLP variants (presence or absence of the restriction site). Then take Southern blots of restriction-digested genomic DNA from the progeny and hybridize them with a probe that will distinguish the various genotypes for the RFLP. A *single-copy DNA* probe (see flashcard) is used for this purpose. By measuring the recombination frequency between this RFLP and other RFLP markers, a detailed RFLP map of the genome can be produced. The RFLPs can be mapped relative to one another or to genes of known phenotypic expression (i.e., linkage analysis).
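Supplementary sketch (not from the textbook): measuring recombination frequency between two RFLP markers in backcross progeny comes down to counting recombinant progeny over total progeny. In the sketch below each progeny is recorded as the pair of variants it inherited from the heterozygous parent ("+" for site present, "-" for site absent); the function name and the progeny counts are invented purely for illustration.

```python
def recombination_frequency(progeny, parental_types):
    """Fraction of progeny whose two-marker combination is non-parental."""
    if not progeny:
        raise ValueError("no progeny scored")
    recombinants = sum(1 for p in progeny if p not in parental_types)
    return recombinants / len(progeny)


# Assumed parental arrangement in the heterozygote: + + on one homolog, - - on the other.
parental = {("+", "+"), ("-", "-")}

# Made-up scores from 100 backcross progeny (marker 1, marker 2).
progeny = (
    [("+", "+")] * 45
    + [("-", "-")] * 43
    + [("+", "-")] * 7
    + [("-", "+")] * 5
)

rf = recombination_frequency(progeny, parental)
print(f"{rf:.0%} recombinants")  # 12% recombinants
```

With these made-up numbers, 12 of 100 progeny are recombinant, so the two markers would be placed about 12 map units apart under the usual convention from Ch. 6.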
simple-sequence length polymorphisms (SSLPs) aka short-sequence-length polymorphism.
Definition: *The presence of different numbers of short repetitive elements (mini- and microsatellite DNA) at one particular locus in different homologous chromosomes; heterozygotes represent useful markers for genome mapping.* In other words and supplementary info: The collective name for tandem repeat markers. A tandem repeat is a sequence that is repeated two or more times in the same orientation (e.g., ATCTCATCTCATCTCATCTC is a fourfold tandem repeat of the sequence ATCTC). Advantages over RFLPs: (1) A larger number of alleles that can act as specific tags, so that the input alleles from both parents (two from each parent) can all be tracked in a pedigree --- RFLPs have a limited number of alleles in a pedigree. (2) Heterozygosity for RFLPs can be low --- i.e., if one allele is relatively uncommon in relation to the other, the proportion of heterozygotes (the crucial individuals useful in mapping) will be low. SSLPs, however, show much higher levels of heterozygosity, which makes them more useful because the likelihood of having an SSLP difference located in a region of interest in a particular pedigree is quite high. Two types of SSLPs are now routinely used in genomics, *minisatellite* and *microsatellite* markers (see flashcards for both).
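Supplementary sketch (not from the textbook): the tandem repeat example on this card can be checked mechanically. The helper `tandem_repeat_count` is a made-up name; it only handles a sequence that consists entirely of the repeat unit, with no flanking DNA.

```python
def tandem_repeat_count(sequence, unit):
    """Return how many back-to-back copies of `unit` make up `sequence`.

    Returns 0 if `sequence` is not an exact tandem array of `unit`.
    """
    if not unit or len(sequence) % len(unit) != 0:
        return 0
    n = len(sequence) // len(unit)
    return n if unit * n == sequence else 0


# The example from this card: a fourfold tandem repeat of ATCTC.
print(tandem_repeat_count("ATCTCATCTCATCTCATCTC", "ATCTC"))  # 4

# Two hypothetical alleles at one SSLP locus differ only in repeat number.
allele_1 = "ATCTC" * 4
allele_2 = "ATCTC" * 7
print(tandem_repeat_count(allele_1, "ATCTC"),
      tandem_repeat_count(allele_2, "ATCTC"))  # 4 7
```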
SNPs, or single-nucleotide polymorphisms
Even with all of the above DNA markers, there still are not sufficient markers at sufficient density for some purposes. With the availability of a whole genome sequence, an even richer source of variation can be exploited --- SNPs, or single-nucleotide polymorphisms. Many of these differences are due to neutral variation, such as third codon position variants in degenerate codons for the same amino acid. Between any two human genomes, there is about one SNP difference in every 1000 base pairs of human DNA. Given a genome size of roughly 3 billion base pairs of DNA, this means there are about 3 million differences between any two of our genomes. Extensive projects to develop high-density SNP maps for humans and some model organisms have been carried out.
High-Resolution Cytogenetic Maps
High-resolution cytogenetic maps can be produced in a variety of ways, by relating the locations of DNA markers to cytogenetic landmarks such as chromosome bands and puffs, or to disruptions to the chromosomal integrity (rearrangement breakpoints). By correlating structural landmarks on chromosomes with the location of cloned probe DNA, extensive cytogenetic maps are produced.
genome projects
In other words and supplementary info: For complex eukaryotes with genomes in the range of several tens to some thousands of megabase pairs, sequencing requires a complex strategy and coordination of the efforts of many scientists and engineers: these are referred to as genome projects. These coordinated efforts for many microbes, for several of the important model genetic organisms, and for humans have culminated in first drafts or complete views of their genomic sequence.
whole genome mapping
In other words and supplementary info: The creation of molecular views of the genome at different levels of resolution. High-resolution maps are used to place molecularly defined differences (such as the presence or absence of a restriction enzyme cut site) on the sorts of linkage maps already discussed (Chapter 6) or on cytogenetic maps (as will be discussed in Chapter 11). These maps provide molecular landmarks for building the higher-resolution physical and sequence maps, and also provide molecular entry points for researchers interested in cloning genes with interesting phenotypes (positional cloning --- see Chapter 8). In other words,
whole-genome mapping
In other words and supplementary info: Whole genome mapping proceeds in several steps, each increasing in level of resolution, culminating in a full sequence map of each chromosome of the species. As a starting point, obtain low-resolution chromosomal maps of genes known to produce certain mutant phenotypes (they're called "low-resolution" because phenotypic analysis identifies only a small subset of the genes within a genome). Starting with these genetic linkage maps, whole genome molecular mapping proceeds through many steps with each step increasing in resolution. (1) Position genes and molecular markers on high-resolution genetic (recombinational and cytogenetic) maps of each chromosome. (2) Physically characterize and position individual cloned DNA fragments relative to one another to create a synthetic clone-based view of each chromosome. During this process, the high-resolution genetic map or maps of the genome will be anchored to the physical map. (3) Conduct large-scale genomic DNA sequence analysis to produce a complete sequence map of each chromosome. The genetic and physical maps can then be anchored to the sequence map.
In situ hybridization mapping
One method (1 of 3) for producing high-resolution cytogenetic maps. If a cloned DNA sequence is available, then it can be used to make a labeled probe for hybridization to chromosomes in situ. The logic of this approach is identical to any filter hybridization technique such as Southern blotting, except that here, largely intact chromosomes are the target for probe hybridization (rather than DNAs extracted onto the membrane). In this technique, the denatured and labeled probe is hybridized to preparations on microscope slides in which cells have been broken open and their chromosomes spread out. The DNA of the chromosomes has been denatured such that the labeled probe will hybridize to the sites of homologous sequences within the chromosomal DNA. If the individual chromosomes of the genomic set are recognizable through their morphology (banding patterns, size, centromere location, or other cytological features), the probe sequence can be mapped to the approximate position on the chromosome to which it hybridizes. The term "approximate" is used because this technique does not have the resolving power of recombinational mapping. Commonly used labels for probes are radioactivity and fluorescence.
Rearrangement breakpoint mapping
One method (2 of 3) for producing high-resolution cytogenetic maps. In Ch. 11, chromosomal rearrangements will be discussed. Chromosomal rearrangements are a class of mutations that result from the severing of the DNA backbone of a chromosome at one location and its rejoining with another similarly derived *DNA breakpoint* (broken end). In other words, a segment of chromosome now has new neighboring sequences. One helpful feature about rearrangement breakpoints is that they also serve as molecular landmarks. When cloned DNA spanning a breakpoint has been identified, the breakpoints are easily detected on Southern blots as two bands of hybridization, whereas in normal chromosomes there would be only one, or by FISH mapping as two sites of labeling instead of one.
Radiation hybrid mapping
One method (3 of 3) for producing high-resolution cytogenetic maps. This technique was designed to generate a higher-resolution map of molecular markers along a chromosome, and importantly, it does not require marker heterozygosity. Take radiation hybrid mapping of human genes as an example: The technique depends on the fact that in cell lines produced by forcing cultured human and rodent cells to hybridize (fuse) with one another, only a few human chromosomes will be retained by any resulting hybrid cells. These human chromosomes are then stably inherited in a clone of cells. The selection of human chromosomes that are retained in a given cell appears to be random. In radiation hybrid mapping, instead of whole chromosomes, each hybrid cell line contains a random set of human chromosome fragments. The procedure is to irradiate human cells with 3000 rads of X rays to fragment the chromosomes, and then fuse the irradiated cells with rodent cells to form a radiation hybrid mapping panel --- a series of clones, each containing a different random assortment of fragments of human chromosomes. Typically the fragments are integrated into the rodent chromosomes by X-ray-induced chromosome breakage and rejoining. DNA from each cell line in the radiation hybrid mapping panel is isolated, placed in separate spots on membranes, and denatured. A labeled single-copy human DNA probe is hybridized to such membranes, and sites of labeling identify cell lines containing a human chromosome fragment corresponding to the DNA homologous to the probe. After accumulating probe-hybridization information for many probes, the data are analyzed for co-retention of DNA markers in a manner analogous to that for mapping bacterial genes by cotransduction (Ch. 7). In other words, linkage can be determined. In other other words, the co-retention of different human markers in radiation hybrid mapping panels allows high-resolution mapping of the chromosomal loci of the DNA markers.
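Supplementary sketch (not from the textbook): the co-retention idea can be pictured with a toy panel. Each hybrid cell line is recorded as the set of human markers it scored positive for, and markers that lie close together should tend to show up in the same lines. The `coretention` score below (lines carrying both markers divided by lines carrying at least one) is a simple illustrative measure chosen for this card, not the statistical model used in actual radiation hybrid mapping, and the panel data are invented.

```python
def coretention(panel, marker_a, marker_b):
    """Share of cell lines carrying both markers, among lines carrying either one."""
    either = [markers for markers in panel.values()
              if marker_a in markers or marker_b in markers]
    if not either:
        return 0.0
    both = sum(1 for markers in either
               if marker_a in markers and marker_b in markers)
    return both / len(either)


# Made-up radiation hybrid panel: cell line -> human markers detected by hybridization.
panel = {
    "line1": {"M1", "M2"},
    "line2": {"M1", "M2", "M3"},
    "line3": {"M3"},
    "line4": {"M1", "M2"},
    "line5": {"M2", "M3"},
    "line6": set(),
}

print(coretention(panel, "M1", "M2"))  # 0.75
print(coretention(panel, "M1", "M3"))  # 0.2
```

In this toy panel M1 and M2 are usually retained together while M1 and M3 rarely are, which would suggest that M1 lies closer to M2 than to M3.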
DNA markers based on variable numbers of short-sequence repeats
One of the molecular markers (2 of 3) that are employed to produce high-resolution meiotic maps. RFLPs have been replaced by markers based on variation in the number of short tandem repeats (*SSLPs* --- see flashcard for SSLP definition) SSLPs have
RAPDs: DNA markers based on random PCR amplification
One of the molecular markers (3 of 3) that are employed to produce high-resolution meiotic maps. A single PCR primer designed at random will often by chance amplify several different regions of the genome. The single sequence amplifies only those DNA segments bracketed by nearby inverted copies of the primer sequence (typically a few hundred base pairs apart). The set of amplified DNA fragments is called *randomly amplified polymorphic DNA (RAPD)* (see flashcard). In a cross, some of the amplified bands may be unique to one parent, in which case they can be treated as heterozygous loci and used as DNA markers in mapping analysis.
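Supplementary sketch (not from the textbook): the "bracketed by nearby inverted copies of the primer" condition can be tried out on a toy sequence. Reading along one strand, a product runs from a site matching the primer to a downstream site matching the primer's reverse complement. The function names, the `max_len` cutoff, the primer, and the genome string are all assumptions for illustration, and the search uses exact matches only, whereas real primers can anneal imperfectly.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def rapd_products(genome, primer, max_len=2000):
    """List (start, length) of segments a single primer could amplify.

    A segment qualifies if it starts with the primer and ends with the
    primer's reverse complement, with the two sites not overlapping and
    the whole product no longer than `max_len`.
    """
    rc = reverse_complement(primer)
    last_start = len(genome) - len(primer)
    forward_sites = [i for i in range(last_start + 1) if genome.startswith(primer, i)]
    inverted_sites = [i for i in range(last_start + 1) if genome.startswith(rc, i)]

    products = []
    for f in forward_sites:
        for r in inverted_sites:
            length = r + len(primer) - f
            if r >= f + len(primer) and length <= max_len:
                products.append((f, length))
    return products


# Toy genome: primer site, ten bases of spacer, then the inverted copy.
primer = "GATTC"
genome = "AA" + "GATTC" + "T" * 10 + "GAATC" + "AA"

print(rapd_products(genome, primer))              # [(2, 20)]
print(rapd_products(genome, primer, max_len=15))  # []
```

A parent lacking either bracketing site at this position would give no band here, which is the kind of presence or absence difference that lets a RAPD band serve as a marker.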
minisatellite markers
One of the two (1 of 2) types of SSLPs now routinely used in genomics. Definition: *A type of repetitive DNA sequence based on short repeat sequences with a unique common core; used for DNA fingerprinting.* In other words and supplementary info: Minisatellite markers are based on variation of the number of *variable number tandem repeats (VNTRs)* (see flashcard for definition). If a VNTR probe is available and the total genomic DNA is cut with a restriction enzyme that has no target sites within the VNTR arrays, then a Southern blot will reveal a large number of different-sized fragments that are bound by the probe. Because of the variability in the number of tandem repeats from person to person, the set of fragments that are revealed by Southern blot analysis is highly individualistic. In fact, these patterns are sometimes called *DNA fingerprints* (see flashcard).
genomics
Studying biological patterns and processes from the perspective of the whole genome. Definition: *The cloning and molecular characterization of entire genomes.* In other words and supplementary info: "Genetics is local; genomics is global." Global approaches (i.e., genomics) allow us to identify ALL genes within the genome contributing to a system of interest according to some shared molecular property --- structural similarity, common time or anatomical site of expression, common change in expression levels when cells are exposed to an environmental agent, same chromosomal location, etc. Two reasons genomic analysis merits special attention: (1) Special experimental techniques are used to carry out the task of manipulating and characterizing large numbers of genes and large amounts of DNA (see *robotics* definition), and (2) analysis of whole genomes (i.e., interpretation of the *consensus sequence*) gives us new insights into global organization, expression, regulation, and evolution of the hereditary material. To interpret the informational content of the consensus sequence of a genome, a computer program that compares a new consensus sequence with databases of many different kinds of sequences is necessary. Such comparisons look for statistically significant levels of sequence similarity at the level of encoded proteins or DNA landmarks called *motifs*. In *comparative genomics*, other programs examine the overall structure of a genome and compare it to that of taxonomically related species.
