Genomics Exam 2
Explain how a microarray is like a Northern blot in reverse.
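In a Northern blot the sample RNA is immobilized on a membrane and hybridized with a single labeled probe in solution; a microarray reverses this by immobilizing many unlabeled probes on the slide and hybridizing the labeled sample (cDNA) in solution, so the expression of thousands of transcripts is measured at once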
Method to detect genetic variation based on SNPs that occur in restriction enzyme recognition sites.
RFLP (restriction fragment length polymorphism)
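A minimal sketch of the RFLP idea, assuming a short made-up linear sequence and the EcoRI recognition site GAATTC (cut after the first G). The sequences and the function name `fragment_lengths` are illustrative only; a real digest works on double-stranded DNA and is read from gel bands.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence, cuts G^AATTC

def fragment_lengths(seq, site=ECORI_SITE, cut_offset=1):
    """Return fragment lengths after cutting a linear sequence at every site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical alleles that differ by one SNP inside the recognition site
allele_1 = "ATGCGAATTCTTAGC"  # site intact
allele_2 = "ATGCGAGTTCTTAGC"  # A -> G destroys the site

print(fragment_lengths(allele_1))  # two fragments
print(fragment_lengths(allele_2))  # one uncut fragment
```

The differing fragment patterns are what distinguishes the two alleles.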
RNA-Seq experiments routinely yield over _____________ DNA sequence reads per RNA sample.
10,000,000
At the molecular level, describe 3 distinct types of variation or mutations that occur in genomes.
replacement mutations: change in amino acid; synonymous mutations: no change in amino acid; indels: insertion or deletion of one or more base pairs
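A rough sketch of how these three categories could be told apart for a single codon. The codon table is deliberately partial (four codons only) and the function name `classify_variant` is hypothetical; any codon outside the table raises a KeyError.

```python
# Partial codon table, for illustration only
CODON_TO_AA = {"GAA": "Glu", "GAG": "Glu", "GAT": "Asp", "GAC": "Asp"}

def classify_variant(ref_seq, alt_seq):
    """Classify a change in one codon as indel, synonymous or replacement."""
    if len(ref_seq) != len(alt_seq):
        return "indel"  # bases were inserted or deleted
    if CODON_TO_AA[ref_seq] == CODON_TO_AA[alt_seq]:
        return "synonymous"  # codon changed, amino acid did not
    return "replacement"  # amino acid changed

print(classify_variant("GAA", "GAG"))
print(classify_variant("GAA", "GAT"))
print(classify_variant("GAA", "GA"))
```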
How are genome-wide association studies used to identify genes contributing to complex diseases such as diabetes? Explain the typical study design.
scan and compare the genomes of healthy and diseased patients to identify SNPs and CNVs that are associated with the disease
Given the choice of using microsatellites or SNPs to genetically map historical human migration patterns, which method would you choose and why?
SNPs, because they arise only via point mutations, whereas microsatellites can change via replication slippage, unequal crossing over, or mutations extending or interrupting a series of repeats, i.e. microsatellite mutations are more common
What is the human microbiome project? Describe two examples of how this information may lead to improved health care in the future.
a project to identify and characterize the microorganisms associated with the human body; examples: personalized medicine for dietary health and for antibiotic use
Why don't we use QTL studies to map human disease genes?
because QTL mapping works best if you can control genetic variance while reducing environmental variance; this is often done by mating inbred parents to disrupt LD via recombination, which obviously can't be done with humans
What is the difference between biological and technical replication?
biological replication is multiple samples, i.e. RNA samples from different individuals; technical replication is multiple tests, i.e. the same RNA sample assayed repeatedly
Simultaneous comparison of transcriptome profiles across dozens of treatments and/or genetic mutants.
compendium
Amplification or deletion of genomic DNA regions over 10,000 bp in size.
copy number variants
What is the purpose of data normalization in a microarray experiment?
corrects for systematic errors like slide-to-slide variation, variation in dyes, etc.
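One simple way to picture normalization is global median centering of the log2 ratios on a slide. This is only a sketch of one common approach, not necessarily the method used in the course; it assumes most genes are unchanged, and the values and the name `median_center` are made up.

```python
from statistics import median

def median_center(log_ratios):
    """Shift all log2 ratios on one slide so that their median is zero."""
    slide_median = median(log_ratios)
    return [x - slide_median for x in log_ratios]

# Hypothetical slide where a dye bias pushes every ratio up by 0.5
raw = [0.5, 1.5, 0.5, -0.5, 2.5]
normalized = median_center(raw)
print(normalized)
```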
Which of the following approaches are suitable to fluorescently label cDNA for hybridization to a microarray slide? a. Direct labeling by incorporating the fluorescent dye during reverse transcription. b. Indirect labeling using a 3D reagent that is attached to an oligonucleotide probe that is complementary to part of the primer used in cDNA synthesis. c. Amplify RNA using biotinylated nucleotides, which can then be recognized by a fluorescently-labeled streptavidin-phycoerythrin compound. d. All of the above. e. None of the above.
d. All of the above.
Which of the following can serve as templates for identifying SNPs? a. cDNA clones b. BAC clones c. PCR amplification of target genomic DNA regions d. All of the above.
d. All of the above.
Which of the following methods is used for the detection of minisequencing single-base extension products? a. Hybridization to a microarray slide. b. Hybridization to microbeads followed by fluorescence-activated bead sorting. c. Liquid phase reaction followed by gel electrophoresis. d. All of the above.
d. All of the above.
Which of the following statements about coding SNPs is correct? a. May be located in the 5' or 3' UTR regions. b. May be located in the gene promoter region. c. May be located within introns. d. May be located within exons. e. May be located within intergenic regions.
d. May be located within exons.
Which of the following is not an example of an experimental design for a microarray experiment involving competitive hybridization? a. Reference sample designs. b. Split-plot designs. c. Loop designs. d. Square designs.
d. Square designs.
Which of the following statements about the distribution of SNPs is incorrect? a. The vast majority of SNPs occur infrequently (i.e., they are rare) in natural populations. b. Most SNPs are maintained in natural populations through a balance between mutations (creating new SNPs) and genetic drift (loss of SNPs). c. On average, humans have one SNP per kilobase between the two chromosomes of any individual. d. The level of nucleotide diversity in humans is constant across the genome. e. None of the above.
d. The level of nucleotide diversity in humans is constant across the genome.
What is the problem with using intensity ratios to express array data? How do we correct for this?
intensity ratios for down-regulated genes are compressed between 0 and 1, while ratios for up-regulated genes can range from 1 to infinity; this can be fixed with a log2 transformation, which treats up- and down-regulated genes symmetrically and centers them around zero
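A quick numeric check of the symmetry argument, using made-up intensities. The function name `log2_ratio` is just for illustration.

```python
import math

def log2_ratio(treatment, control):
    """Return log2 of the treatment/control intensity ratio."""
    return math.log2(treatment / control)

up = log2_ratio(4000, 1000)         # raw ratio 4.0 (4-fold up)
down = log2_ratio(1000, 4000)       # raw ratio 0.25 (4-fold down)
unchanged = log2_ratio(1000, 1000)  # raw ratio 1.0
print(up, down, unchanged)
```

On the raw scale the two 4-fold changes sit at 4.0 and 0.25; on the log2 scale they are equal in size and opposite in sign.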
Log2 transformed intensity ratios are used in microarray data to correct for compression of what type of genes?
down-regulated
In recombination mapping, linkage is broken when recombination occurs between the disease locus and what?
genetic marker
Nonrandom association of alleles.
linkage disequilibrium
This microarray format has DNA probes of 50-70 bp with one probe copy per gene, often located within the UTR.
long oligo probe
What is linkage disequilibrium?
nonrandom association of alleles, i.e. alleles at closely located loci tend to be inherited together through meiosis/recombination
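LD between two biallelic loci is often summarized with D and r^2. The sketch below assumes phased two-locus haplotypes written as strings such as "AB" or "ab"; the sample data and the name `ld_stats` are hypothetical, and both loci must be polymorphic or the division fails.

```python
def ld_stats(haplotypes):
    """Return (D, r^2) for two biallelic loci from phased haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # frequency of allele A
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # frequency of allele B
    p_ab = sum(h == "AB" for h in haplotypes) / n    # frequency of haplotype AB
    d = p_ab - p_a * p_b                             # zero when association is random
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

complete_ld = ["AB"] * 5 + ["ab"] * 5      # alleles always travel together
equilibrium = ["AB", "Ab", "aB", "ab"] * 3  # all combinations equally common

print(ld_stats(complete_ld))
print(ld_stats(equilibrium))
```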
Proportion of nucleotides that differ between a pair of alleles in a population.
nucleotide diversity
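A small sketch of the calculation, averaging the proportion of differing sites over all pairs of aligned alleles. The three sequences are made up and the function names are illustrative.

```python
from itertools import combinations

def pairwise_difference(seq1, seq2):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

def nucleotide_diversity(seqs):
    """Average pairwise difference over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)

alleles = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(alleles)
print(round(pi, 4))
```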
Statistical parameter that is based on p-values and is used to calculate a false discovery rate.
q-value
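As a related illustration, the Benjamini-Hochberg procedure turns a list of p-values into adjusted values that control the false discovery rate. Note the assumption: these BH-adjusted p-values are a commonly used stand-in and are not the same estimator as the q-value itself. The p-values and the name `bh_adjust` are made up.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.5]
adjusted = bh_adjust(pvals)
print([round(x, 4) for x in adjusted])
```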
Genetic mapping strategy that tests if a SNP is more common in affected (i.e., with disease), unrelated individuals than expected by chance.
association mapping
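The "more common than expected by chance" test can be sketched as a chi-square test on a 2x2 table of allele counts in cases and controls. The counts below are invented and the function name `allele_chi_square` is hypothetical; real studies also correct for testing many SNPs.

```python
import math

def allele_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Return (chi-square, p-value) for a 2x2 allele count table, 1 df."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    chi2 = numerator / denominator
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: allele A seen 60 times in cases, 40 times in controls
chi2, p = allele_chi_square(60, 40, 40, 60)
print(round(chi2, 2), round(p, 4))
```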
What is the purpose of data normalization in a microarray experiment? a. To account for and remove bias due to variable amounts of mRNA used as template for each sample. b. To identify genes that are differentially expressed between treatments. c. To account for and remove bias associated with incorporation and detection of multiple fluorescent dyes. d. A and B. e. A and C.
e. A and C.
Hierarchical clustering heat map features
-each row corresponds to a gene on a microarray -each column corresponds to a treatment class in an experiment -a dendrogram of the phylogenetic relationships between the expression patterns of each gene is shown on the left -the color-coding corresponds to transcript abundance relative to a control sample -up-regulated genes are colored red whereas down-regulated genes are colored blue
Volcano plot features
-vertical axis is the -log of P values -genes with high statistical significance are found in the upper half of the figure -horizontal axis is the log2 ratio of treatment/control or fold-change gene expression -up-regulated genes are to the right of "0" and downregulated genes are to the left of "0" -genes centered near "0" on the horizontal axis and located in the lower half of the vertical axis are "boring", i.e., no change in abundance relative to control and with a high P value -genes of interest are located in the top right (low p value and up-regulated expression relative to control) and the top left (low p value and down-regulated expression relative to control)
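The regions described above can be sketched in code. The cutoffs (2-fold change, p < 0.05) and the use of log base 10 on the vertical axis are assumptions for illustration, as are the function names.

```python
import math

def volcano_point(log2_ratio, p_value):
    """Return (x, y) plot coordinates for one gene."""
    return log2_ratio, -math.log10(p_value)

def volcano_region(log2_ratio, p_value, fold_cutoff=1.0, p_cutoff=0.05):
    """Name the region of the volcano plot where a gene falls."""
    significant = p_value < p_cutoff
    if significant and log2_ratio >= fold_cutoff:
        return "top right"     # up-regulated, low p-value
    if significant and log2_ratio <= -fold_cutoff:
        return "top left"      # down-regulated, low p-value
    if not significant and abs(log2_ratio) < fold_cutoff:
        return "lower middle"  # "boring": no change, high p-value
    return "other"

print(volcano_region(2.0, 0.001))
print(volcano_region(-2.0, 0.001))
print(volcano_region(0.1, 0.8))
```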
Size of the "tag" used in an RNA-Seq library.
200 bp
Indicate your preferred choice of SNP genotyping method/platform and provide a justification. You are a scientist examining flower timing in Arabidopsis thaliana populations spaced along a north to south gradient of the west coast of North America. Your experimental plan requires comparison of flowering times (phenotype) with genotypes for 10 genes previously identified as contributing to the trait.
A relatively small number of genes (10) are being examined in the study. For a targeted genotyping project like this it would be most appropriate to use one of the minisequencing methods such as single-base extension or pyrosequencing. These assays can be customized to target individual genes by designing primers immediately adjacent to SNP loci.
Statistical test to measure the significance of change for more than two groups.
ANOVA
Human genetic diversity is highest in populations currently living in this region of the world.
Africa
How does genetic variation in the CCR5 gene influence our susceptibility to "germs" such as the HIV virus?
CCR5 is part of a group of proteins that allow viruses into the cell; a variant of the CCR5 gene protects individuals from HIV infection because the virus cannot gain entry into the cell
A q-value is a measure of the proportion of genes at any significance level that are expected to be true positives. True False
False
For the statistical analysis of microarray data, an analysis of variance (ANOVA) is appropriate when comparing the mean expression of a gene between two treatments; however, more complicated experimental designs involving multiple treatments or levels of a treatment should utilize a Student's t-test. True False
False
High-density, short-oligonucleotide microarrays, such as those produced by Affymetrix, are created by physically spotting probes onto a glass slide using a robot with printing pins. True False
False
In recombination mapping, scientists determine if a SNP is more common in affected, unrelated individuals than expected by chance. True False
False
In single-base extension minisequencing, a single radioactively-labeled dideoxy nucleotide is added at the SNP position adjacent to the end of a genotyping primer. True False
False
Minisequencing methods for SNP detection rely on re-sequencing of entire cDNA clones or BAC clones. True False
False
One of the advantages of QTL mapping is the very high resolution that is achieved, allowing for precise identification of the specific gene(s) contributing to the trait under study. True False
False
Quantitative Trait Loci (QTLs) are physical locations on a chromosome that contribute to the mapping of Mendelian traits (i.e., where a single gene is causative of a single disease). These QTLs cannot be used to map genome variation that contributes to polygenic phenotypic variation. True False
False
Serial Analysis of Gene Expression (SAGE) is a method used to determine relative abundance of a fraction of all transcripts expressed in a population of cells or a tissue. True False
False
The majority of SNPs in the human genome show substantial geographic variation and are not shared among racial groups. True False
False
Typical microarray TIFF image files generate pixel intensities ranging from 0 to 65,536,000, thus allowing for accurate estimates of gene expression over five orders of magnitude after subtracting background noise. True False
False
Insertions or deletions less than 10,000 bp in size.
INDELS
How does the amount of variation within groups that are being compared influence the t-statistic?
More within-group variation = lower t value (the within-group variation is in the denominator of the t-statistic)
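A small numeric sketch of how within-group variation enters the t-statistic, using Welch's form of the two-sample t. The expression values are made up; both comparisons have the same difference in means and differ only in spread.

```python
import math
from statistics import mean, variance

def welch_t(group1, group2):
    """Two-sample t-statistic: difference in means over its standard error."""
    standard_error = math.sqrt(variance(group1) / len(group1)
                               + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / standard_error

# Same difference in means (2.0), different within-group variation
tight_control, tight_treated = [9.9, 10.0, 10.1], [11.9, 12.0, 12.1]
noisy_control, noisy_treated = [8.0, 10.0, 12.0], [10.0, 12.0, 14.0]

t_tight = welch_t(tight_treated, tight_control)
t_noisy = welch_t(noisy_treated, noisy_control)
print(round(t_tight, 2), round(t_noisy, 2))
```

The noisy groups give a much smaller t because the within-group variance sits in the denominator.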
This QTL breeding strategy produces offspring that are homozygous at nearly all loci.
RIL design (recombinant inbred lines)
Compare and contrast DNA profiling and racial profiling, as they relate to law enforcement.
Racial profiling is based on outward phenotype, such as skin tone, which may or may not be a reliable form of profiling. DNA profiling can be more reliable because it looks at genotype rather than phenotype and can tell much more about a person than just their skin color.
In this method, a single fluorescently-labeled dideoxy nucleotide is added at the SNP position adjacent to the end of a genotyping primer.
SBE (single-base extension)
Briefly describe how access to comprehensive SNP profiles for individuals, families and/or groups could be used to help identify disease-associated alleles.
Search for SNPs that are linked with loci containing disease genes. SNPs that are closest to the disease locus are more likely to be linked. SNP profiles can be generated for all the members of a family where at least some relatives present with the disease. Map location of the disease locus becomes more refined as the number of SNPs increases and as the pedigree becomes larger. Once a disease locus is identified, candidate genes in the region can be individually tested.
Indicate your preferred choice of SNP genotyping method/platform and provide a justification. You are a scientist conducting a genome-wide association study consisting of 1,000 cases of the polygenic disorder schizophrenia compared with 1,000 control individuals who do not display any mental health issues.
Since the target genes are unknown, the researcher would want to examine a dense, genome-wide set of SNPs. Therefore, a high-throughput genotyping platform would be most appropriate, such as the Illumina GoldenGate, Illumina Infinium or Affymetrix GeneChip platforms. For large studies such as this (i.e., 2,000 individuals and likely >100,000 SNPs) it is critical to use a platform that has a low cost per SNP in order to be cost effective.
The DNA sequence of any two humans is 99.6% identical. With so little genetic variation among us, how can population geneticists trace the ancestry of individuals?
The tiny variation due to SNPs is passed along through generations and offspring, allowing us to determine that if two people have similar SNPs, they are most likely related
A haplotype is a distinct combination of SNPs on a single chromosome at a locus. True False
True
A major advantage of methods such as Serial Analysis of Gene Expression (SAGE) and RNASeq is that there is no bias due to the choice of reference samples, choice of cDNA clones/probes, and/or hybridization artifacts. True False
True
A major confounding factor in the interpretation of association (linkage disequilibrium) mapping studies is the presence of genotype by environment interactions where different environments lead to the variable expression of genotypes. True False
True
A synonymous polymorphism creates a change in the triplet codon sequence that does not alter the amino acid encoded. True False
True
Although the number of single nucleotide polymorphisms (SNPs) in the human genome is greater than the number of copy number variants (CNVs), the latter involve a much greater number of nucleotides (i.e., CNVs constitute a larger proportion of the total polymorphic DNA content of the genome). True False
True
Among the SNP genotyping methods discussed, the Sequenom MassARRAY iPLEX GOLD system is unique in that it uses mass spectrometry to detect single-base extension products. True False
True
Both of Illumina's genotyping platforms (GoldenGate and Infinium) read the identity of a SNP based on hybridization of primers to microbeads containing molecular "bar codes". True False
True
Detection of SNPs on Variant Detector Arrays (a.k.a., SNP chips) relies on hybridization of sample DNA to 4 x 25-nucleotide probes that differ at a single base position in the middle of the probe (i.e., the SNP position). Identity of the SNP is determined based on the hybridization signal intensity, such that the highest signal intensity is obtained when there is a perfect match to the probe. True False
True
In QTL mapping, scientists seek to control the genetic variance and to minimize the environmental variance in the experimental design in order to maximize the proportion of phenotypic variance that is due to each QTL. True False
True
In a microarray experiment, technical replicates are repeated samples of the same biological material. For example, a single RNA preparation that is used in multiple, independent hybridizations to the same microarray platform. True False
True
In k-means clustering, the number of clusters is defined in advance, unlike hierarchical clustering. K-means clustering is most useful when the researcher has an a priori hypothesis about the number of clusters the genes should group into (e.g., according to biological criteria such as the number of life stages of an organism for a profile of gene expression during development). True False
True
Recombination mapping has been particularly effective for disease gene discovery in countries with large public health care systems that provide standardized care and centralized medical records, as well as detailed records of family ancestry (e.g., Iceland). True False
True
Restriction Fragment Length Polymorphisms (RFLPs) are the traditional molecular method for the identification of SNPs. This method is limited to the detection of SNPs that occur within restriction enzyme recognition sites. True False
True
The Illumina GoldenGate assay offers medium throughput SNP genotyping (up to 96,384 SNPs per sample) and relies on PCR amplification of short products using allele-specific primers. True False
True
The human genome effectively consists of tens of thousands of blocks of DNA, each up to 100 kb in size, that contain a small number of common haplotypes, each consisting of dozens or more SNPs in linkage disequilibrium. True False
True
The purpose of a log transformation on the base 2 scale is to produce symmetry of relative increases and decreases in expression about zero. True False
True
The two main reasons why intensity ratios are used in microarray analysis are that: 1) most researchers are primarily interested in changes in relative gene expression, not absolute expression; and 2) competitive hybridization involving two samples labeled with different fluorescent dyes eliminates substantial array-to-array variation from the data analysis. True False
True
To maximize the number of unique genes represented on a cDNA microarray, it is best to select sequence-verified cDNA clones from multiple libraries representing many tissues and treatment conditions. True False
True
Two commercial platforms used for the detection of SNPs in genome-wide association studies (GWAS) include the Affymetrix Gene Chips and Illumina Infinium genotyping assays. True False
True
Region of the human genome containing low diversity.
Y chromosome
Provide a scientific definition for the term "race".
a geographically isolated breeding population that shares certain characteristics in higher frequencies than other populations of that species
What is the generally accepted minimum sample size for a genome-wide association study (GWAS)? a. 1,000 individuals each (with and without the disease). b. 10 individuals each (with and without the disease). c. 100,000 individuals each (with and without the disease). d. 100 individuals each (with and without the disease). e. 10,000 individuals each (with and without the disease).
a. 1,000 individuals each (with and without the disease).
Which of the following statements about linkage disequilibrium (LD) is incorrect? a. LD is the random association of alleles. b. Recombination disrupts LD in proportion to the genetic distance between sites. c. LD is a powerful tool for the mapping of disease loci in large populations. d. SNPs separated by large distances (i.e., > 1 kilobase) can be in LD while SNPs located in between are in linkage equilibrium. e. None of the above.
a. LD is the random association of alleles.
In a volcano plot comparing statistical significance versus magnitude of effect, in which area of the plot do we find genes that are "boring", that is, no significance and no effect? a. Lower middle region. b. Upper middle region. c. Lower left and lower right regions. d. Upper left and upper right regions. e. None of the above.
a. Lower middle region.
Place these steps of an RNA-Seq experiment in the correct order. 1. Perform random primed cDNA synthesis to construct libraries. 2. Conduct next-generation sequencing to generate tens to hundreds of millions of short sequence reads. 3. Isolate polyA RNA and fractionate into ~200 base pieces. 4. Map sequence reads to a reference genome for the species and count the number of reads per gene to determine transcript abundance. a. 1,3,2,4 b. 3,1,2,4 c. 1,3,4,2 d. 1,2,3,4 e. 3,1,4,2
b. 3,1,2,4
Place these steps of a microarray experiment in the correct order. 1. Statistical analysis (e.g., normalize fluorescence data, t-tests, ANOVA) 2. Data mining (e.g., cluster analysis, gene ontology, validation) 3. Experimental design (e.g., biological question, select microarray platform, design hybridization series) 4. Technical performance (e.g., obtain samples, extract RNA, perform hybridizations) a. 1,2,3,4 b. 3,4,1,2 c. 2,4,3,1 d. 4,3,1,2 e. 3,4,2,1
b. 3,4,1,2
Which of the following factors will not influence the success of an association (linkage disequilibrium) mapping experiment? a. The heritability of the disease or trait. b. Biological function of the disease gene. c. The penetrance of the QTL effects. d. The number of genes that affect the disease or trait. e. Extent and uniformity of linkage disequilibrium between SNP markers and between markers and the disease locus.
b. Biological function of the disease gene.
Which of the following statements about color-coded representations (a.k.a., heat maps) of hierarchical clustering of gene expression is not correct? a. Raw fluorescence intensities are converted to false-color representations, such as red, green and black. b. Heat maps can only be used to compare gene expression in a single tissue, time point, or treatment condition. c. Some authors chose the combination of yellow and blue over red and green because individuals who are colorblind cannot distinguish between red and green. d. Heat maps allow for easy and rapid identification of co-regulated genes, especially when combined with a clustering method. e. Heat maps are displayed with genes on one axis and experimental treatments on the opposite axis.
b. Heat maps can only be used to compare gene expression in a single tissue, time point, or treatment condition.
Which of these breeding strategies is not an appropriate design for a QTL mapping study? a. Recombinant inbred lines. b. Random mating in natural populations. c. Backcross design. d. F2 design. e. None of the above.
b. Random mating in natural populations.
In Affymetrix microarray manufacturing, the ___________ prevents random binding of nucleotides to DNA probes on the array surface.
blocking compound
Pyrosequencing is a commonly used minisequencing method for SNP detection. Place these pyrosequencing reaction steps in the correct order: 1. Luciferase converts ATP to light. 2. DNA polymerase adds a nucleotide. 3. Apyrase degrades unincorporated nucleotides and ATP. 4. ATP sulfurylase converts pyrophosphate to ATP. a. 3,4,1,2 b. 2,4,3,1 c. 2,4,1,3 d. 3,4,2,1 e. 1,2,3,4
c. 2,4,1,3
What is the difference between non-coding, coding and regulatory SNPs?
coding SNPs are located in exons; non-coding SNPs are located in 5' and 3' UTRs, introns, and intergenic regions; regulatory SNPs are located in regulatory sequences that affect transcription, translation, splicing, and RNA stability
Which of the following strategies can be used to identify the candidate disease gene within a mapping interval? a. Sequence the mapped region to identify mutations in genes of affected individuals. b. Examine the predicted gene annotation (i.e., does this match the disease phenotype). c. Examine gene expression to determine if candidate genes are co-regulated with other genes associated with the disease. d. Examine gene expression to determine if candidate genes are expressed in affected tissues. e. All of the above
e. All of the above
What is the purpose of the international HapMap Project? a. Provide public access to the catalog of human genetic similarities and differences via the HapMap project web site. b. Document allele frequencies for millions of SNPs among individuals from multiple human populations. c. Use SNP and haplotype information to support discovery of disease-promoting alleles. d. Catalog all of the common haplotypes in the human genome. e. All of the above.
e. All of the above.
Which of the following statements describes an application of gene clustering of microarray data? a. Identification of genes that are co-regulated under the same treatment conditions. b. Graphical representation of data that allows for easier identification of gene expression patterns. c. Annotation of genes of unknown function. d. A and B e. All of the above.
e. All of the above.
Which of the following is not an appropriate application of microarray technology? a. Define genetic pathways. b. Molecular phenotyping. c. Dissection of regulatory mechanisms. d. Annotation of gene function. e. Detect protein-protein interactions
e. Detect protein-protein interactions
Explain how the Affymetrix Gene Chip can be used for SNP detection.
genomic DNA is fragmented, labeled, and hybridized to the array; signal intensities from probes that perfectly match the genomic DNA are compared to those from probes that differ at a single base (the SNP)
__________ blocks consist of SNPs in linkage disequilibrium.
haplotype
Gene clustering method that starts with a specified number of clusters.
k-means clustering
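A minimal one-dimensional sketch of k-means, assuming the number of clusters is fixed in advance by the starting centers. The log2 ratios, the starting centers and the name `kmeans_1d` are made up for illustration; real expression data would be clustered on whole profiles across treatments.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster numbers around a pre-specified number of centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each value joins its nearest center
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical log2 ratios for nine genes, clustered into k = 3 groups
log_ratios = [-2.1, -1.9, -2.0, 0.1, -0.1, 0.0, 2.0, 2.2, 1.8]
final_centers, final_clusters = kmeans_1d(log_ratios, centers=[-1.0, 0.0, 1.0])
print([round(c, 2) for c in final_centers])
```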
When applied to cancer diagnosis, transcriptomics can be used to generate what for a tumor?
molecular phenotype
Based on your readings in PGS, describe three genetic and/or experimental factors that affect the success of an association mapping experiment.
population structure (are any of the people related?); environment/epigenetics; genetics vs environment (how big is the genetic component?); penetrance vs expressivity; disease presentation (has the disease presented yet?)
Codon changes without a change in amino acid sequence.
synonymous SNP
The goal of microarray normalization is to correct for this type of error.
systematic errors
When the same RNA sample is repeatedly hybridized to different microarray slides it is called _________ .
technical replication
Provide one "pro" and one "con" of using each of the following methods when analyzing transcriptome data: Volcano plot Gene ontology Hierarchical clustering
volcano plot; gene ontology; hierarchical clustering
Describe two possible sources of DNA template for SNP genotyping.
whole genome DNA, cDNA libraries, BAC clones