Quiz 5
ENCODE project. The highest-priority set (Tier 1) includes two widely studied immortalized cell lines (GM12878 and K562) plus an embryonic stem cell line:
1) H1-hESC, the H1 human embryonic stem cell line 2) GM12878, a B-lymphoblastoid line (also being studied by the 1000 Genomes Project) 3) K562 erythroleukemia cells
5 main classes of repetitive DNA:
1. Interspersed repeats 2. Processed pseudogenes 3. Simple sequence repeats 4. Segmental duplications 5. Blocks of tandem repeats
Sequenced the whole genomes from a total of 1092 individuals from a pool of 14 different populations selected by their ancient migratory history and genetic relationship to each other.
1000 Genomes Project
The first complete genome to be sequenced was:
A bacteriophage.
A centromere that is located close to a telomere is called:
Acrocentric centromere
Method of separating biochemical mixtures based on a highly specific interaction such as that between antigen and antibody, enzyme and substrate, or receptor and ligand called:
Affinity chromatography
De Novo Seq Template Includes:
Ancient DNA seq; Metagenomics
Applications of resequencing:
Assessment of genomic changes in disease-associated regions; Sequencing of all human exons in multiple individuals; Sequencing of large sets of genes associated with cancer
First viral genome sequence
Bacteriophage MS2 (3,569 nucleotides; an RNA genome)
Alignment programs:
Bowtie, BWA
... is measured in base pairs or in picograms (pg) of DNA.
C-Value
One picogram of DNA corresponds to approximately 1 Gb.
C-Value
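As a rough worked example of the picogram-to-base-pair relationship on the cards above, the sketch below converts between the two units. The factor of 0.978 x 10^9 bp per pg is a commonly cited value and is an assumption here (the card rounds it to about 1 Gb); the function names and the 3.1 Gb example genome size are illustrative only.

```python
# Assumed conversion factor (not stated on the card, which rounds to ~1 Gb per pg).
BP_PER_PG = 0.978e9


def pg_to_bp(picograms):
    """Convert a DNA mass in picograms to an approximate number of base pairs."""
    return picograms * BP_PER_PG


def bp_to_pg(base_pairs):
    """Convert a number of base pairs to an approximate DNA mass in picograms."""
    return base_pairs / BP_PER_PG


if __name__ == "__main__":
    # 1 pg expressed in gigabases
    print(f"1 pg is about {pg_to_bp(1.0) / 1e9:.3f} Gb")
    # Illustrative haploid genome of 3.1 Gb expressed in picograms
    print(f"3.1 Gb is about {bp_to_pg(3.1e9):.2f} pg")
```

Either unit can therefore be used to report a C value; the conversion is a simple multiplication.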
The haploid genome size of eukaryotes, called the C value, varies enormously.
C-Value paradox
The observation that the range in C values does not correlate well with the complexity of the organism is called:
C-Value paradox
NGS method used to find transcription factor binding sites on DNA
ChIP-seq
Type of NGS used to find the locations of histone modifications (e.g., H3K4 monomethylation, H3K4me1)
ChIP-seq
Regulatory elements which predict the presence of genomic DNA features such as promoters, enhancers, silencers, and insulators are called:
CRMs (cis-regulatory modules)
The first multicellular eukaryote genome sequenced.
Caenorhabditis elegans (Worm)
a region that remains unstained with many dyes, appears as a constriction
Centromere
Chromatin immunoprecipitation (abbreviation)
ChIP
NGS application to find TF binding sites:
ChIP-seq
What is ChIP-sequencing?
ChIP-seq is a method used to analyze protein interactions with DNA
First human chromosome sequenced, in 1999
Chromosome 22 (49 Mb)
Protein-protein interaction method that works by selecting an antibody that targets a known protein believed to be a member of a larger complex of proteins. By targeting this known member with an antibody, it may become possible to pull the entire protein complex out of solution and thereby identify unknown members of the complex.
Co-immunoprecipitation (Co-IP)
The concept of pulling protein complexes out of solution, also referred to as a "pull-down".
Co-immunoprecipitation (Co-IP)
The interactions of two purified proteins can be measured with these techniques:
Co-immunoprecipitation (Co-IP). Affinity chromatography. Yeast two-hybrid system.
An example of a regulatory element, rich in the dinucleotide cytosine followed by guanine (CpG):
CpG islands
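To make the CpG dinucleotide on the card above concrete, here is a minimal sketch that counts CpG occurrences in a sequence and reports GC content together with an observed/expected CpG ratio. The observed/expected formula, (number of CpG x length) / (number of C x number of G), is a commonly used measure and is an assumption here, not something the card defines; `cpg_stats` is a hypothetical helper name.

```python
def cpg_stats(seq):
    """Return (CpG count, GC content, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently (assumed formula).
    expected = (c * g) / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return cpg, gc_content, ratio


if __name__ == "__main__":
    count, gc, ratio = cpg_stats("ATCGCGTTACGGA")
    print(count, round(gc, 2), round(ratio, 2))
```

Regions where both the GC content and this ratio are high are the kind of sequence usually meant by a CpG island; any specific cutoffs would be a further assumption.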
Transcript Assembly programs:
Cufflinks/Cuffmerge/Cuffcompare/Cuffdiff
NGS is used to study DNA methylation
DNA methylation: RRBS-seq (reduced representation bisulfite sequencing)
NGS is used to capture open chromatin
DNase-seq, FAIRE-seq
The sequencing method that determines the DNA sequence of an organism as completely as possible is called:
De novo sequencing
Examples of Transcriptome:
Defining regulated mRNA transcripts; Identifying and quantifying mRNA in samples
One of the challenges of Protein networks:
Different categories of network or pathway maps.
Examples of Model organisms
E. coli (prokaryote), Saccharomyces cerevisiae (eukaryote)
ENCyclopedia of DNA Elements
ENCODE
Goal of this organization to discover and define the functional elements encoded in the human genome:
ENCODE - ENCyclopedia of DNA Elements
The aim of this project is to build a catalogue of functional elements of the human genome. Functional elements: genes, transcripts, transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns.
ENCODE - ENCyclopedia of DNA Elements
A short (50-1500 bp) region of DNA that can be bound with proteins to activate transcription of a gene or genes
Enhancer
High-throughput sequencing used to assess the methylation status of a genome is called:
Epigenomics
The modification of DNA or chromatin through DNA methylation and/or through the posttranslational modification of histones is called:
Epigenomics
The study of the complete set of epigenetic modifications on the genetic material of a cell is known as:
Epigenomics (epigenome)
Heritable changes other than those involving the DNA sequence per se.
Epigenomics or epigenetics
Single-celled or multicellular organisms that are distinguished from prokaryotes by the presence of a membrane-bound nucleus, and a cytoskeleton.
Eukaryotes
NGS application to sequence personalized genomes:
Exome-seq
Which of the following NGS technologies is used for personalized genome sequencing?
Exome-seq
Type of NGS used to sequence all exons and not the whole genome
Exome-seq
Epigenomics is sampling the genomes of many organisms from a particular environmental site, such as the human gut (organismal) or an ocean region (environmental).
False
Forward genetics asks "What is the phenotype of this mutant?"
False
Reverse genetics asks "What mutants have this particular phenotype?"
False
"What mutants have this particular phenotype?"
Forward genetics
The "phenotype-driven" approach (start with an altered phenotype, then identify the genes responsible) is called:
Forward genetics
Transcriptome Template Includes:
Full-length transcripts; SAGE - Serial Analysis of Gene Expression; Non-Coding RNA
The genome-wide study of the function of DNA.
Functional Genomics
Organism's full hereditary information.
Genotype
Difference between the genotype and the phenotype.
Genotype is an organism's full hereditary information. Phenotype is an organism's observable characteristics.
One of the challenges of Protein networks
Great variation in protein composition and behavior of different protein pathways.
First genome of a free-living organism, the bacterium
Haemophilus influenzae
NGS application to measure Chromosomal interactions:
Hi-C
Which of the following NGS technologies is used to study inter- and intra- chromosomal interactions?
Hi-C
NGS is used to capture chromosomal interactions
Hi-C (3C, 4C, 5C)
One of the challenges of Protein networks:
How to assess the accuracy of a protein network?
30x human genome sequenced by
Illumina
Dominant technology in high-throughput sequencing:
Illumina
Technique of precipitating a protein antigen out of solution using an antibody that specifically binds to that particular protein.
Immunoprecipitation
Examples of ReSequencing:
Individual human; Assessment of genomic arrangements of disease associated regions; Seq mutations in cancer
Collection of manually drawn maps in six areas: metabolism, genetic information processing, environmental information processing, cellular processes, human diseases, drug development.
KEGG Kyoto Encyclopedia of Genes and Genomes
The database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies
KEGG Kyoto Encyclopedia of Genes and Genomes
Imaging the chromosomes during metaphase, when each chromosome is a pair of sister chromatids
Karyotype
Which of the following is not a DNA sequencing technology?
Mass spectrometry
Examples of Epigenetics:
Measuring methylation in cancer
Centromeres located near the middle of the chromosome:
Metacentric centromere
Sampling the genomes of many organisms from a particular environmental site such as the human gut (organismal), or an ocean region (environmental)
Metagenomics
Epigenetics Template Includes:
Methylation changes
Simple sequence repeats (1-6 bp long)
Microsatellites
Simple sequence repeats (12-500 bp long)
Minisatellites
Organisms that are studied intensively.
Model organisms
Shares almost all genes with humans. Close functional and structural relationship with human genome
Mus musculus (mouse)
Three principal genome browsers for eukaryotes:
NCBI offers Map Viewer. Ensembl offers browsers for dozens of genomes. UCSC offers genome and table browsers for dozens of organisms.
Organism's observable characteristics.
Phenotype
A region of DNA that initiates transcription of a particular gene.
Promoter
One of the challenges: There are no accepted benchmark data sets comparable to those available for fields such as sequence alignment and structural biology.
Protein networks
Genes that are not actively transcribed or translated:
Pseudogenes
These genes have a stop codon or frameshift mutation that interrupts an open reading frame, so they do not encode a functional protein.
Pseudogenes
They commonly arise from retrotransposition, or following gene duplication and subsequent gene loss.
Pseudogenes
NGS application to find Gene expression:
RNA-Seq
What NGS method is used to measure gene expression or transcript expression?
RNA-seq
Which NGS technology is used to measure gene expression?
RNA-seq
NGS application to Measuring methylation:
RRBS-seq
Which of the following NGS technologies is used to measure DNA methylation?
RRBS-seq
Application in which an available genome sequence permits the variation between individuals to be assessed.
Resequencing
This application can guide medical decisions, personalized medicine
Resequencing
Which of the following is not true for resequencing of a genome?
Resequencing cannot guide medical decisions.
"What is the phenotype of this mutant?" This question is raised by what type of genetics?
Reverse genetics
"gene-driven" approach, targeted deletion or disruption of gene
Reverse genetics
Programs for gene/protein network visualization
STRING, GeneMANIA
The first single-cell eukaryote genome sequenced
Saccharomyces cerevisiae (Eukaryote)
The first single-cell eukaryote whose genome was sequenced is:
Saccharomyces cerevisiae (yeast).
Examples of De Novo seq:
Seq > 1000 influenza genomes; Extinct Neanderthal Genomes; Human gut
A contiguous length of nucleotide bases that is generated using a sequencing machine is called:
Sequencing read
... structures characterized by tandem arrays of repetitive sequences found at the chromosome ends.
Telomeres
They provide stability to chromosomes by preventing the degradation of the chromosome end and by blocking the fusion of chromosome ends.
Telomeres
Ancient DNA projects allow the sequencing of historical samples. A special challenge is:
The DNA is often fragmented.
The consortium applied and compared several experimental and computational methods to annotate functional elements in a defined 1% (30 Mb) of the human genome.
The Pilot phase of the ENCODE project
One of the challenges of Protein networks:
The choice of experimental organism is important.
Why is genome sequencing important?
To obtain a 'blueprint': DNA directs all the instructions needed for cell development and function, and underlies almost every aspect of human health, both in function and dysfunction. To study gene expression in a specific tissue, organ or tumor. To study human variation. To study how humans relate to other organisms. To find correlations between genome information and the development of cancer, susceptibility to certain diseases, and drug metabolism (pharmacogenomics). Personalized Genomics/Medicine.
Splice junction mapper program:
TopHat
Microarray-based gene expression profiling to study RNA expression levels.
Transcriptome
Transposable elements (including retrotransposons), or "jumping genes"
Transposon
A knockout mouse is a genetically engineered mouse in which researchers have inactivated or "knocked out" an existing gene by replacing it or disrupting it with an artificial piece of DNA.
True
At least 30x coverage is required for human genome sequencing, meaning that each nucleotide has been sequenced 30 times on average.
True
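The 30x figure on the card above can be tied to read counts with the usual relation coverage = (number of reads x read length) / genome size. The sketch below is a minimal illustration; the 3.1 Gb genome size and 150 bp read length are assumed example values, and the function names are hypothetical.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest whole number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    genome_size = 3_100_000_000  # assumed haploid human genome size in bp
    read_length = 150            # assumed read length in bp
    n = reads_needed(30, read_length, genome_size)
    print(n, mean_coverage(n, read_length, genome_size))
```

Note that this is a mean: individual positions can end up with more or fewer than 30 reads, which is one reason a minimum average depth is specified.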
Co-IP is a powerful technique for analyzing protein-protein interactions.
True
Genome sequencing is important to study gene expression in a specific tissue/organ/tumor and human variation as well as to discover how genome information relates to development of cancer. T/F
True
Illumina is the dominant technology in high-throughput sequencing.
True
In Illumina multiplexing technology the following steps are followed: attach barcode, mix samples, sequence, identify and remove barcode.
True
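The last multiplexing step on the card above (identify and remove the barcode) can be sketched as a toy demultiplexer. This is a simplification under stated assumptions: the barcode is taken to be the first few bases of each read and all barcodes have the same length, whereas real Illumina runs typically read the index separately. The `demultiplex` function and the sample names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using a leading inline barcode, then strip the barcode."""
    if not barcode_to_sample:
        raise ValueError("at least one barcode is required")
    # Assumption: every barcode has the same length.
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        if barcode in barcode_to_sample:
            # Known barcode: keep only the insert sequence.
            by_sample[barcode_to_sample[barcode]].append(read[barcode_length:])
        else:
            # Unknown barcode: keep the read untouched for inspection.
            by_sample["undetermined"].append(read)
    return dict(by_sample)


if __name__ == "__main__":
    barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
    pooled = ["ACGTGGGG", "TGCACCCC", "ACGTTTTT", "NNNNAAAA"]
    print(demultiplex(pooled, barcodes))
```

Exact matching is used here for clarity; tolerating a mismatch in the barcode would be a separate design choice.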
Pseudogenes can be captured using RNA-seq.
True
The 1000 genomes project sequenced the whole genomes from a total of 1092 individuals from a pool of 14 different populations selected by their ancient migratory history and genetic relationship to each other.
True
The concept of pulling protein complexes out of solution is referred to as a "pull-down".
True
The draft sequence of the human genome was published in 2001 by 2 separate groups: the International Human Genome Sequencing Consortium and Celera Genomics. T/F
True
CpG islands are regions commonly found in... (direction) regulatory regions near the transcription start sites of genes
Upstream
One of the challenges of Protein networks:
What data to use?
ReSequencing Template Includes:
Whole Genomes; Genomic Regions; Somatic Mutations
NGS method used to sequence everything (the whole genome)
Whole genome-seq
Aim of 1000 genomes project:
to identify the SNPs that are present at 1% or greater frequency
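As a small illustration of the 1% threshold on the card above, the sketch below computes allele frequencies from alternate-allele counts and keeps variants at or above 1%. It assumes diploid (autosomal) sites, so each individual contributes two alleles; the variant names and counts are invented for the example, and the function names are hypothetical.

```python
def allele_frequency(alt_allele_count, num_individuals):
    """Alternate-allele frequency, assuming two alleles per individual."""
    return alt_allele_count / (2 * num_individuals)


def common_variants(alt_counts, num_individuals, threshold=0.01):
    """Names of variants whose allele frequency is at or above the threshold."""
    return [
        name
        for name, count in alt_counts.items()
        if allele_frequency(count, num_individuals) >= threshold
    ]


if __name__ == "__main__":
    # Made-up counts for 1092 individuals (2184 alleles per site).
    counts = {"snp_a": 22, "snp_b": 21, "snp_c": 500}
    print(common_variants(counts, 1092))
```

With 1092 individuals there are 2184 alleles per site, so under these assumptions an alternate allele has to be observed at least 22 times to reach 1%.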