MBIOS 301 Exam 4 Study Guide
functional protein microarrays
- consists of many different cellular proteins - used to probe the function of proteins note: requires HUGE investment of antibody development, and is mainly used by drug companies for drug discovery.
how is the DNA sequence found from the protein sequence?
- determine the possible codon sequences - these are used as a query sequence in a DNA database search - the gene sequence is then used to predict the amino acid sequence of the entire protein.
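note: a minimal Python sketch of the first step (determining the possible codon sequences), assuming the standard genetic code. The function name and the example peptides are illustrative only. It counts how many different DNA sequences could encode a short peptide, which shows why the query is degenerate.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse lookup: amino acid -> list of codons that encode it.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def count_codon_sequences(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    total = 1
    for aa in peptide:
        total *= len(CODONS_FOR[aa])
    return total


print(count_codon_sequences("MW"))   # Met and Trp each have a single codon
print(count_codon_sequences("MLS"))  # Leu and Ser each have six codons
```

Peptide stretches rich in Met and Trp give the fewest possible codon sequences, so they make the least ambiguous queries.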
antibody microarray
-collection of antibodies that recognize short peptides -used to assess the level of protein expression
shotgun sequencing
-easier and faster method -does not require an extensive physical map -clones from a genomic or chromosomal library are isolated randomly and sequenced multiple times. -overlapping sequences are matched together using computers.
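note: the computer matching step can be sketched as a greedy overlap merge. This is a toy illustration, not how production assemblers work; the function names, the minimum overlap of 3 bases, and the example reads are all assumptions made for the sketch.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no overlaps left; the remaining pieces stay separate
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Reads that share no overlap are returned as separate pieces, which is why each region has to be sequenced multiple times.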
post-translational modification
-irreversible changes may be necessary to produce a functional protein ex: proteolytic processing, disulfide bonds, attachment of prosthetic groups, sugars and lipids -reversible changes that transiently affect protein function. ex: phosphorylation, methylation, and acetylation
alternative splicing
-most important alteration that occurs in eukaryotes -single pre-mRNA is spliced into more than one version -often cell specific or related to environmental conditions
RNA editing
-much less common than alternative splicing -leads to changes in the coding sequence of mRNA
why are protein microarrays more challenging?
-proteins can more easily be damaged during microarray fabrication (sensitive) -synthesis and purification of proteins are more difficult.
mass spectrometry (determining amino acid sequence of a protein).
-separates peptides based on mass-to-charge ratio -peptide mass correlates with amino acid composition -amino acid composition can be searched against a database to determine gene identity
what are the two ways contigs form?
1. assembled piece of DNA sequence built from short sequencing reads (most common). 2. A set of overlapping DNA clones that represent a region of chromosome. note: chromosomal DNA is partially digested and fragments are cloned to create genomic DNA library. CRITICAL FACTOR!
the proteins a cell produces are dependent on:
1. cell type 2. stage of development 3. environmental conditions
what are characteristics of prokaryotes?
1. circular genome 2. <5Mb 3. 500-5,000 genes 4. "compact" genomes - gene density ~1/kb exception: linear chromosomes
what are the three ways programs are used in bioinformatics to analyze sequences?
1. locate specialized sequences (sequence element or motif). 2. locate predefined sequences and identify specific types of sequence organization or sequence elements. 3. locate a pattern of symbols
what are the three main phases of genomic analysis?
1. mapping-historically came first 2. sequencing-more popular 3. functional genomics
Goals of the Human Genome Project
1. obtain a genetic linkage map of the human genome 2. obtain a physical map of the human genome 3. obtain the DNA sequence of the entire human genome. 4. develop technology for the management of human genome information. 5. Analyze the genomes of other model organisms 6. develop programs focused on understanding and addressing the ethical, legal, and social implications from the results of the human genome project. 7. develop technological advances in genetic methodologies.
mass spectrometry steps:
1. protein purified from gel slice 2. protein fragmented by enzymes 3. protein fragments subjected to mass spectrometry 4. fragments are then fragmented again 5. mass spectrometry performed again
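note: steps 2 and 3 can be sketched in Python. This assumes trypsin as the enzyme (cuts after K or R, except when the next residue is P) and uses standard monoisotopic residue masses; the function names and the example sequence are illustrative only.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def trypsin_digest(protein):
    """Cut after K or R unless the next residue is P (assumed enzyme)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for pep in trypsin_digest("MKWVTFRPGAK"):
    print(pep, round(peptide_mass(pep), 4))
```

The measured masses are compared with masses predicted this way from database sequences. Note that L and I have the same mass, so mass alone cannot tell them apart.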
what are the steps of ChIP assays?
1. proteins cross-linked to DNA 2. DNA isolated from cells and broken into small pieces 3. antibody used to precipitate the protein-DNA complex. 4. DNA is purified and amplified with PCR 5. sequence of the amplified DNA can be identified by using it as a probe on a microarray. Note: most useful for transcription factors
BLAST
Basic Local Alignment Search Tool: used to search databases to find alignment between newly sequenced genome and genes that have already been identified in the same or different species. -identify homologous genes that are evolutionarily related in other organisms -using these searches it can help assign putative function to a sequence of interest.
comparative genomics
Compares genomes of different organisms to answer questions about genetics and other aspects of biology -incorporates the study of gene and genomic evolution. -explores the relationship between organisms and the environment -studies differences and similarities between organisms and their contributions to phenotype and life cycles.
molecular markers
DNA sequences that do not encode genes but can be mapped. Molecular markers also have to be POLYMORPHIC
Why is the proteome larger than the genome?
Due to alternative RNA splicing, post-translational modification, and RNA editing.
Next generation DNA sequencing
Ion Torrent/Proton, SMRT sequencing
computer program
a defined series of operations that can analyze data in a desired way.
Ion torrent/Proton
adds a single base at a time and monitors the release of protons; much cheaper, with no need to separate reaction products.
transcriptomes
after transcriptional regulation -in theory show all transcribed genes - usually Illumina methodology is used: extract mRNA, make it into cDNA, and sequence all of it
repetitive sequences
are also widespread in eukaryotic genomes (transposons) -the largest factor in the difference in genome sizes
single nucleotide polymorphisms (SNPs)
associated with disease conditions (sickle cell or cystic fibrosis).
basic research
characterization of genes and genomes
homologs
closely related genes (high DNA sequence similarity).
gene knockout collections
collections of knockout strains for every gene ex: NIH knockout mouse project
evolution
comparative genomics
Encyclopedia of DNA elements (ENCODE)
created with the aim of using experimental approaches and bioinformatics to identify and analyze functional elements that regulate expression of human genes.
RNAseq
current standard for demonstrating reproducibility=sequence a minimum of 3 biological replicates-libraries made from 3 identical RNA extractions. -also reveals gene expression in cells and tissues
Algorithm-based software programs
developed for creating DNA sequence alignments (lined up and compared).
agriculture
development of improved traits
SMRT sequencing
a different fluorochrome is attached to each type of base (all added at once = much quicker). fluorochromes are attached to a phosphate that is cleaved off the dNTPs and ends up in the pyrophosphate product. Works in REAL TIME.
pattern recognition
does NOT rely on specialized sequence info. identifies a pattern of symbols that can occur within any group of symbol arrangements.
Illumina sequencing
each base is fluorescently labeled, all have proprietary removable terminators. a. one base added at a time b. imaged to see which base is added c. terminator is removed d. repeat
proteome
entire collection of a species' proteins
proteomics
examines the functional roles of the proteins that a species can make.
e-value
expected value: statistical analysis comparing result with random chance - represents the number of times that a match or a better one would be expected to occur by random chance. note: the lower the e-value the more significant the match (0=identical match).
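note: for a single alignment score S, the expected number of chance matches is commonly written as E = K * m * n * e^(-lambda * S), where m is the query length and n is the database length. The sketch below just evaluates that formula. The values of K and lambda depend on the scoring system; the defaults here are placeholders chosen for illustration, not values used by any real BLAST run.

```python
import math


def expected_hits(score, query_len, db_len, k=0.1, lam=0.3):
    """E = K * m * n * exp(-lambda * S); k and lam are placeholder values."""
    return k * query_len * db_len * math.exp(-lam * score)


# A higher score gives a smaller e-value for the same query and database.
print(expected_hits(score=50, query_len=200, db_len=10**6))
print(expected_hits(score=100, query_len=200, db_len=10**6))
```

The formula also shows why the same score is less significant in a larger database: doubling n doubles E.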
what can DNA Microarrays be used for?
finding genetic variations, cell-specific gene expression, gene regulation, tumor profiling, microbial strain identification
structural genomics
focuses on sequencing genomes and analyzing them to identify genes and important sequences such as regulatory elements.
important note:
for many model organisms you can order specific gene "knockouts" and study the effects of losing those genes on phenotype.
bioinformatics gives insights into:
gene structure, gene function, relationships between genes and organisms, protein function, protein interactions, predicting drug structure and function
what is the first step in computer analysis of genetic data?
generation of a computer data file (collection of information in a form suitable for storage and manipulation). files are usually annotated and stored in a database.
genomics
genes and other nucleic acid sequences
orthologs
genes at the same locus in different species inherited from a common ancestor.
transcriptomics
genes expressed in a given sample
important note:
genome sequences have opened new possibilities for analyzing transcriptomes and proteomes.
functional motifs
helix-turn-helix, leucine zipper, or zinc finger motifs
physical mapping information:
historically involved making libraries of chromosomal DNA, very time and labor intensive. ultimate goal: obtain a complete contig for each type of chromosome.
proteomics
how proteins interact
medicine
identification of genetic bases of disease
important note:
identification of open reading frames requires translation of all 6 reading frames.
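note: the six reading frames are three offsets on the given strand plus three on its reverse complement. A minimal Python sketch, assuming the standard genetic code; the function names and the example sequence are illustrative only.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; leftover bases are ignored."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def six_frame_translation(dna):
    """Map frame label (+1..+3, -1..-3) to its translation."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translation("ATGGCCTAA").items():
    print(label, protein)
```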
Qualitatively
identifying which genes are expressed and which are not in a given sample.
homology
implies a common ancestry (similarity is due to homology).
gene density
in eukaryotes is very low compared to bacteria and varies even between chromosomes in a species -large variation in introns size and number between and within genomes
DNA microarrays (DNA chips)
labelled cDNA binding to a spot is detected by fluorescence (analysis of genes simultaneously). -Microarrays are being replaced by next generation DNA sequencing of cDNA.
important note:
mass spectrometry can also be used to identify protein covalent modifications.
similarity
means sequence similarity
quantitatively
measuring varying levels of expression of genes
important note:
new technologies have brought higher speeds, greater accuracy, and reduced costs of sequencing (extensions of the HGP).
blastn
nucleotide vs. nucleotide databases
Genbank
one of the most important genomic databases, Genbank annotates the following: genes, their regulatory sequences, their functions.
gene knockout
organism in which a specific gene has been inactivated. -bacterial knockouts are created using transposons -mammalian knockouts are created using homologous recombination. - plant knockouts are usually T-DNA insertions (transformed with a piece of DNA that inserts randomly into the genome).
contigs
overlapping fragments that collectively form one continuous DNA molecule within a chromosome.
sequence recognition
program identifies specific sequences
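note: a small Python sketch of sequence recognition using a regular expression. The pattern shown is a commonly cited TATA box consensus (TATA, then A or T, then A, then A or T); the function name and the example sequence are made up for illustration.

```python
import re


def find_motif(dna, pattern):
    """Return (position, matched text) for every match, including overlaps."""
    regex = re.compile(f"(?=({pattern}))")  # lookahead allows overlapping hits
    return [(m.start(), m.group(1)) for m in regex.finditer(dna)]


print(find_motif("GGTATAAATGC", "TATA[AT]A[AT]"))
```

Positions are 0-based, counted from the first base of the sequence.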
blastp
protein vs protein database
tblastn
protein vs translated nucleotide
restriction enzymes
recognize specific (usually 6 bases long) sequences and cleave the DNA at those sequences.
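note: a minimal Python sketch of a digest on one strand, using EcoRI (recognizes GAATTC and cuts after the G) as the example enzyme. The function name and the test sequence are illustrative; real digests act on both strands and leave sticky ends, which this sketch ignores.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut dna at every occurrence of site, cut_offset bases into the site."""
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # piece after the last cut
    return fragments


print(digest("AAGAATTCTTGAATTCGG"))
```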
paralogs
related genes within or between species (from duplication events in one genome).
physical mapping
relies on DNA cloning techniques - genes are mapped relative to each other (base pairs-measurement). - identifying overlapping clones across whole chromosome to create CONTIGS.
linkage mapping
relies on genetic crosses - genes are mapped relative to each other (map units-measurement).
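note: one map unit corresponds to 1% recombinant offspring, so the distance between two linked genes can be estimated from a cross. A small Python sketch; the function name and the offspring counts are invented for illustration, and the estimate is only reliable for fairly short distances.

```python
def map_distance(recombinant_offspring, total_offspring):
    """Map units = percent of offspring that are recombinant."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100 * recombinant_offspring / total_offspring


print(map_distance(120, 1000))  # 120 recombinants out of 1000 offspring
```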
cytogenetic mapping
relies on microscopy - genes are mapped relative to visible band locations.
copy number variants (CNVs)
segments of DNA that are duplicated or deleted.
pyrosequencing
sequencing by synthesis among others (always changing).
DNA microarrays
show the mRNA expression of thousands of genes simultaneously
polymorphic
show variation between individuals in a population.
protein microarrays
similar concept to DNA microarrays and are used to study: - protein expression - protein function - pharmacology
bioinformatics
software to analyze the huge amounts of data goal: extract information from genetic sequences with a mathematical/computational approach.
what is the purpose of the BLAST searches?
starts with a sequence and then locates homologous sequences in a large database.
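note: BLAST itself uses a fast heuristic, but the idea of a local alignment score can be sketched with the Smith-Waterman algorithm, which is a different and slower exact method. The scoring values (match 2, mismatch -1, gap -2), the function name, and the tiny example database are all assumptions made for this illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Rank a toy database by how well each entry matches the query.
query = "ACGT"
database = {"seq1": "TTACGTTT", "seq2": "GGACGGGG", "seq3": "TTTTTTTT"}
ranked = sorted(database, key=lambda name: local_alignment_score(query, database[name]), reverse=True)
print(ranked)
```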
open reading frames (ORFs)
stretches of nucleotides that when translated to protein generate a series of amino acids prior to a stop codon, suggestive of a protein-encoding gene.
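note: a minimal Python sketch of an ORF scan on one strand, assuming ORFs start at ATG and end at the first in-frame stop codon (TAA, TAG, or TGA). The function name, the minimum length, and the example sequence are illustrative. Running it again on the reverse complement would cover the other three frames.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ORF; min_codons includes the stop."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame + 1))
                start = None  # look for the next ORF in this frame
    return orfs


dna = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(dna):
    print(frame, dna[start:end])
```

Real genes are much longer than this example, so gene-finding programs typically use a far larger minimum length.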
Transcriptome analysis (global analysis of gene expression)
studies expression of genes qualitatively and quantitatively.
functional genomics
study of gene function based on the RNAs or possible proteins they encode as well as regulatory elements. - more typically now RNAseq is used to generate transcriptomes. - uses Nextgen sequencing to sequence cDNA made from specific tissue samples.
proteomics
study of the proteome (proteins produced) and how they interact with each other.
eukaryotic genomes
the basic features are similar in different species, genome size in eukaryotes is highly variable, but the number of actual genes is fairly consistent.
genome
the complete set of DNA in a single cell of an organism
functional genomics
the goal of functional genomics is to elucidate the roles of genetic sequences in a given species.
After denaturation...
the proteins are separated by molecular weight and identified by mass spectrometry.
genomics
the study of genomes
similarity score (identity value)
the sum of identical matches/total number of bases or amino acids aligned.
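note: a small Python sketch of this calculation for two sequences that are already aligned (same length, with "-" marking gaps). Counting gap columns in the total is an assumption of this sketch; programs differ on that convention. The function name and example sequences are illustrative.

```python
def identity_value(aligned_a, aligned_b):
    """Identical matches divided by the number of aligned columns."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return matches / len(aligned_a)


print(identity_value("ACGTAC", "ACGTTC"))  # 5 of 6 positions match
print(identity_value("AC-T", "ACGT"))      # gap column counts as a mismatch
```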
if proteins are extracted under native conditions
they are separated by their isoelectric point (no net charge)
what is the aim of gene annotation?
to identify and label important structural features of genes (known or unknown). Such as: - regulatory elements in promoters and enhancers - exons and introns (splice sites). - translation start and stop sites -polyadenylation site
blastx
translated nucleotide vs protein database
tblastx
translated nucleotide vs translated nucleotide
what is used to separate proteins?
two-dimensional gel electrophoresis
chromatin immunoprecipitation (ChIP) assays
used to determine if proteins can bind to a particular region of DNA.
genome/transcriptome annotation
using BLAST searches and motif databases to assign putative functions to all expressed genes (BLAST2GO).
most widely used method:
whole genome shotgun sequencing (made practical by decreasing cost and increasing computer power).