W: Exam 12 - 13Protein, 14recDNA and 15NCBIpg.1-9 Vids 107-120 - review 117-120!!!
restriction endonucleases
- recognition site is usually longer than the overhang (the overhang is made by a sticky-end-producing enzyme)
- sticky ends stick together bc the single-stranded overhangs are complementary to each other
- blunt ends: cut sites are exactly opposite one another; ligate poorly bc no extra stickiness, the ends just have to hit together the right way, which is rare
- Sau3AI: cuts on the outside of every 5'-GATC-3' / 3'-CTAG-5' it can find; bc it cuts outside the rec site, the sticky end is the same as the recognition site
- BamHI: cut occurs between the first 2 bases of the site (G^GATCC), so the sticky end is only 4 bases (look at the end) even tho the recognition seq is 6
- sticky end length = how many extra bases stick out past the cut site (don't count the base(s) BEFORE the cut)
beta protein domain - two immunoglobulin domains: BCAM
- recognize/bind to cells, involved in cell adhesion - many antiparallel beta strands - 1 small alpha helix
NM mode: preferred way to get NM DNA into FASTA file format
- select whole seq - send to: save as FASTA file, "create file" - set it so .fasta files open in TextEdit - can copy the header or just grab some of the material
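A saved FASTA file is just a ">" header line followed by sequence lines. A minimal Python sketch of pulling the header and sequence back out; the record below is a made-up placeholder, not a real NM entry, and `read_fasta` is a hypothetical helper name:

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second ">" means a new record; stop after the first
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    return header, "".join(chunks)

# Placeholder record for illustration only (assumption, not a real accession).
example = ">NM_EXAMPLE.1 hypothetical mRNA\nATGGCC\nTTAGGA\n"
header, seq = read_fasta(example)
print(header)
print(seq, len(seq))
```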
Continue your analysis of the primer from the previous question, but this time see if this primer will form a primer dimer with the other primer in the pair, which is: ATGCTATGATTCTGCG What is the DeltaG of the hetero-dimer prediction that we should most worry about?
-3.61
Analyze this primer candidate on the IDT website with their OligoAnalyzer tool: AAACATCGAGAGATC What is the DeltaG of the self-dimer prediction that we should most worry about?
-4.62
cloning basics
goal: get DNA fragments into a plasmid, ending with 1 type of bacteria with 1 type of plasmid with 1 type of DNA fragment (clone); need to:
1) create plasmid prep: just grow out E. coli containing plasmids and extract DNA
2) obtain DNA fragments from PCR OR digest genomic DNA to make a library
3) cut DNA fragment/plasmid with restriction enzyme, run on gel, cut out bands, purify DNA from the bands
4) ligate with DNA ligase at restriction enzyme junction, creating connected plasmid + DNA frag
5) transformation of E. coli competent cells (ready to receive DNA), shocked so they can take it up
6) plate transformed E. coli cells onto selective medium - only ones with plasmid grow
7) plasmid preps again of several colonies
8) test insertion by restriction digest/PCR
gene is in
green
Intrinsically Disordered Proteins......
have large sections of protein sequence which can form many alternate structures
tooltip
hovering over an mRNA/protein section gives a window with a lot of info - mRNA, title, location, where you can download it through GeneID, BLAST
gene model rendering
how you organize the info, with/without introns, gene bar
in "gene" menu of gene bank record
if gene = RPL21P4, means ribosomal protein L21 pseudogene 4; pseudogenes are nonfunctional gene copies scattered across the genome, often in introns of other genes, which is where this one landed - click gene id: should be hashed, going in opposite direction of BRCA1
within gene id click of promoter tab (gene bank record)
in eukaryotes:
- shared regulatory region of BRCA1 gene and antisense transcript
- talks about silencer elements and all the promoter does (bidirectional or not), has an interactive picture of it
- if bidirectional, promotes transcription of RNAs on either side!!
- red elements = promoter elements
- silencers = turn off promoters, negative regulation
- enhancer = positive regulation (showing both positive and negative regulation of transcription)
(proto-oncogene: normal gene that can cause cancer when mutated/overexpressed; long non-coding RNA listed as antisense gene, which acts to turn off gene)
For the gene above, staying in GeneID, where is this gene expressed the most? The least?
kidney, pancreas
lncRNA
long non-coding RNA (doesn't code for anything, modulates something)
For this Arabidopsis gene, STP1, in which tissue is it expressed the highest? [Hint: look at the plant drawings in Araport]
mature leaf
right panel of NCBI main page
most popular tools used to analyze seq (BLAST/genome)
Look up the entry for human multimerin 1 (MMRN1). Choose the RefSeq transcripts option. What is the Locus ID for Transcript Variant 1 and what is the length of the transcript?
not: NM_007351, 5001 bases
Go to the Genome Data Viewer for BRCA1 and zoom in on the promoter region just upstream of the BRCA1 transcript. Make sure you look upstream and not downstream! Which of the following is a transcription factor protein that binds to one of the promoter elements listed in the promoter for this gene?
not: Ppr23
What kind of enzyme is coded for by the gene located at 42,124,979 on human Chromosome 17? [Hint: Hover over the gene and click GeneID in the Tooltip box to find this information]
not: serine kinase
A Greek key motif is a beta sheet with:
one of the loops connecting the beta strands not being a hairpin loop
hashing around RNA (purple)
pseudogene
RNA is in
purple
coding regions are in
red
know how to fill in given 3 bases
reverse of the opposite (reverse complement): 1) take the opposite base: G-C, C-G, A-T, T-A 2) flip the result; e.g. GAA: opposite = CTT, flipped = TTC, so the 6-base site is GAATTC
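The "opposite, then flip" rule can be sketched in a few lines of Python. This assumes the recognition site is a perfect palindrome (true for standard 6-base sites); the function names are made up for illustration:

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq):
    """Step 1: swap each base for its opposite."""
    return "".join(COMPLEMENT[base] for base in seq.upper())

def reverse_complement(seq):
    """Step 2: flip the complemented bases end to end."""
    return complement(seq)[::-1]

def complete_palindromic_site(first_half):
    """Given the first half of a palindromic site, fill in the second half."""
    return first_half.upper() + reverse_complement(first_half)

print(complement("GAA"))                 # CTT
print(reverse_complement("GAA"))         # TTC
print(complete_palindromic_site("GAA"))  # GAATTC
print(complete_palindromic_site("GAT"))  # GATATC
```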
tooltip over gene itself
right click on gene name click: view GeneID (pulls up full report)
in genome browser
see part of human genome containing the gene searched
- tells you chromosome number, position in base pairs
- can see gene itself and all mRNAs (NM) + proteins (NP) + all different variants recorded for gene
- solid boxes are exons (just not zoomed in), introns between them
- taller mid line behind exon = mini exons, 2 short lines between = introns
- arrows = read from that direction (3' left)
- can drag labels around from top to bottom of colored lines
- clinical variation = shows disease area + mutations in dark blue
- RNA-seq data last, in blue bars
- many forms too
ApE - Restriction Enzyme Analysis
to find enzymes that cut twice:
- make SURE linear is on, top right of gene (click to change)
- graphic map + U: shows annotations next to unique restriction enzyme sites, same spot = same name, shows seq if you hover over, usually blunt vs sticky end cut; A^.... = 5', .....^A = 3'
- show all enzymes: enzyme selector, can pick restriction enzyme, shows # of cuts
- seq, compatible (overhangs), unique + six cutter (no >) and select: these have a six-base recognition site
NM mode: version vs variant
- variant (#) = variations of a seq, diff each time - BIOLOGICAL: lack portions of UTR, coding region, missing codons
- version (#) = iterations of the same seq
how to examine a gene bank record
1) search the BRCA1 gene, hit genome browser
2) right click on gene title and click GenBank view in pop-up menu
3) on GenBank page, click pull-down menu for format
* chromosome name in title - NC mode is whole chromosome, looking only at the position slice #..# (numbers with dots in between), which is XXX bp (on top)
* can see journals cited/references, authors
4) features area: source, gene, mRNA, CDS (codons), translated form
* mRNA annotation complement(join(#..#,#..#,...)): each #..# range is 1 exon, the commas skip the INTRONS
* each mRNA variant is labeled "transcript variant" (#1-6)
* CDS: list of exon ranges holding the codons of the mRNA; base 1 of the mRNA is 5'UTR, so the CDS can start at a later position and end at an earlier position (3'UTR)
5) some translated isoforms (in CDS) may be longer or shorter than others due to exon skipping (ribosomal protein pseudogene usually sits in introns)
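To make the complement(join(...)) reading concrete, here is a small Python sketch that splits a location string into exon ranges. The coordinates are invented for illustration, and the regex ignores partial-range markers such as ">" before an end coordinate; it is a rough parse, not a full GenBank location parser:

```python
import re

def parse_location(location):
    """Return (strand, exon ranges, spliced length) from a GenBank-style location."""
    # complement(...) means the feature is on the minus strand.
    strand = "-" if location.strip().startswith("complement(") else "+"
    # Every start..end pair is one exon; the gaps between pairs are introns.
    exons = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]
    spliced_length = sum(end - start + 1 for start, end in exons)
    return strand, exons, spliced_length

# Hypothetical coordinates, not taken from a real record.
strand, exons, length = parse_location("complement(join(100..150,300..420,900..1000))")
print(strand)      # -
print(len(exons))  # 3
print(length)      # 273
```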
find gene info shortcut
1) name of gene in search on mainpage (space) 2) name of organism its found in more specific so less number results
NCBI Navigation 1)
1) type name of gene in main page search bar, will see results by database menu, where the number by gene is HITS in blue circle; for example, does not mean 19,000 genes, just how many gene entries have been deposited (variants) * # by protein: protein accessions * # by PubMed Central: # of papers published
Open up the file "EngD cellulase" in ApE. What is the length of the major ORF?
1548
There is a unique HindIII site located in the DNA of "EngD cellulase". At which base does this HindIII site start?
1704
RefSeq listings
19,000 gene entries based on 1 gene, the 1 RefSeq represents them all
For human gene TMEM106A, on which exon is found the translation start site? [There are 4 variant mRNAs, and one is an oddball, so give me the answer that works for 3 out of the 4 variant mRNAs]
3
In "EngD cellulase", how many times does the restriction enzyme NdeI cut this sequence?
3
Look up the entry for ERBB2 and go to the RefSeqGene version. At the GenBank page, click the Graphics hyperlink. How many exons does this gene have? [Hover over black exons to get the numbers]
32
BamHI and Sau3AI
5' end is always LOT, ROB (left on top, right on bottom), even if strand is snipped
- BamHI: 5' sticky ends; 5' end is on left in upper strand, right in bottom strand, ALWAYS, so if bases/overhang are sticking out from top left or bottom right: 5' overhang
- PstI: 3' sticky ends; 3' end is on right in upper strand, left in bottom strand, ALWAYS, so if bases/overhang are sticking out from top right or bottom left: 3' overhang
- blunt: cut is at the same position in both top and bottom of rec seq, not offset
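A rough Python sketch of how the overhang follows from the cut position. It assumes a palindromic site cut symmetrically on both strands; `cut` is the number of top-strand bases to the left of the cut, and the cut positions listed are the standard ones for these enzymes (EcoRV is added only as a blunt example):

```python
def overhang(site, cut):
    """Return (end type, overhang bases) for a palindromic site.

    cut = number of top-strand bases left of the cut; the bottom strand
    is assumed to be cut at the mirror position (len(site) - cut).
    """
    n = len(site)
    mirror = n - cut
    bases = site[min(cut, mirror):max(cut, mirror)]
    if cut == mirror:
        kind = "blunt"
    elif cut < mirror:
        kind = "5' overhang"  # sticks out from top left / bottom right
    else:
        kind = "3' overhang"  # sticks out from top right / bottom left
    return kind, bases

print(overhang("GGATCC", 1))  # BamHI  G^GATCC
print(overhang("GATC", 0))    # Sau3AI ^GATC
print(overhang("CTGCAG", 5))  # PstI   CTGCA^G
print(overhang("GATATC", 3))  # EcoRV  GAT^ATC
```

BamHI and Sau3AI both come out as a 4-base 5' GATC overhang, which is why their ends are compatible.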
What is the molecular weight of the protein coded by the major ORF in the file "EngD cellulase"? [hint: try translating it]
56.0 kDa
NCBI find a gene
? button (help), legends, "graphical view legend"
to navigate genome
? on upper right hand corner, help, "navigating the seq viewer" under general section - pan arrows: go L/R (shows additional gene neighbors) can be dragged - zoom slider - A&T symbol: zoom to the sequence level (ATTG) - zoom to range - undo: go back
Here is the first three bases of the recognition site of a standard 6-base restriction enzyme: GAT. What are the next three bases on this strand?
ATC
What is one way to prevent getting your insert DNA put in backwards in the insertion site of the plasmid?
Cut the insert with two different restriction enzymes and clone into a multiple cloning site
On the NCBI Main Page, look up the Protein entry for mannan-binding lectin serine protease 1 isoform 2 precursor [Homo sapiens]; verify the Accession number: NP_624302. Go to the GenPept page. There is listed an article with the lead author Oroszlan in the Journal of Immunology. Look up the article with the PubMed link. What are the first words of the first sentence in the abstract?
Factor D (FD) is an essential element
NM vs NP
NM = mRNA = purple NP = protein = green
ApE - ORF Mapping
- ORF map shows reading frames 1, 2, 3 and -1, -2, -3 (top 3 are the only valid ones for a + RNA virus)
- jump from 3 to 1 if you have spaces
- vertical half line: start (Met); full line: stop
- cluster of short ones: mathematical, short, random; biological: start, space, stop
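The half-line/full-line picture can be mimicked with a short Python sketch that scans the three forward frames for ATG...stop stretches. This is an illustration of the idea, not how ApE itself is implemented, and the toy sequence is made up:

```python
STOPS = {"TAA", "TAG", "TGA"}

def forward_orfs(seq):
    """Return (frame, start, end) for each ATG..stop ORF in frames 1-3.

    Positions are 1-based; end includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # "half line": first Met opens the ORF
            elif start is not None and codon in STOPS:
                orfs.append((frame + 1, start + 1, i + 3))  # "full line": stop
                start = None
    return orfs

toy = "CCATGAAATAGGG"  # made-up sequence
orfs = forward_orfs(toy)
print(orfs)  # [(3, 3, 11)]
major = max(orfs, key=lambda o: o[2] - o[1])  # longest ORF = the "major" one
print(major)
```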
genetics of cloning
pUC plasmids have:
- ori: origin of replication, genetic segment recognized and replicated by DNA polymerase of E. coli (all circular DNA w/ ori will be replicated)
- AMP: ampicillin resistance gene, ensures only E. coli containing the plasmid grow on ampicillin; needed bc TRANSFORMATION is inefficient
- multiple cloning site (MCS): prevents backwards insertions (gene/DNA of interest going into cloning site could flip 180°) if you cut plasmid and fragment w/ BamHI + PstI instead of BamHI and BamHI
- X-gal: also need to ensure plasmids that have the AMP-res gene actually have the gene/DNA frag insert, bc LIGATION is inefficient; beta-galactosidase made from the plasmid splits X-gal in the plate into a blue product
* intact gene = blue bc B-galactosidase is active
* gene ruptured by DNA insert = no B-gal produced, no color made, GOOD
- you WANT colonies that grew (AMP) but are also white (X-gal)
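The two selections (ampicillin, then X-gal) boil down to a small decision table, sketched here in Python. The function name and return strings are made up for illustration:

```python
def colony_outcome(has_plasmid, has_insert):
    """Predict what a transformed cell does on an ampicillin + X-gal plate."""
    if not has_plasmid:
        return "no growth"     # no AMP resistance gene, killed by ampicillin
    if has_insert:
        return "white colony"  # insert disrupts the beta-gal gene: keep these
    return "blue colony"       # intact beta-gal cleaves X-gal: empty plasmid

for plasmid, insert in [(False, False), (True, False), (True, True)]:
    print(plasmid, insert, colony_outcome(plasmid, insert))
```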
curators of protein classification
- Pfam: main curator
- SCOP: structural classification of proteins, manual curation of protein classification
- CATH: class/architecture/topology/homology
reading a GenBank record through NM mode (mRNA!!!)
- REFERENCE SEQUENCE: seq version which replaces previous mRNAs
- in genome browser of searched gene: look at genes, NCBI menu (not Ensembl), select the PURPLE mRNA
- select NM option in GenBank view: can see length of mRNA, accession number, version number; LOCUS has the NM number; can click PubMed and full text, seq version
- diff variants: look down to COMMENT and find transcript variant information there
- all exons are merged together to give the codons for the entire gene; CDS says 195..5645, so everything before base 195 is the untranslated region (5'UTR)
- ORIGIN section: only goes up to the length of JUST the mRNA (written in DNA language)
- can click Graphics page: shows you JUST where EXONS are
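The CDS coordinates from the example (195 to 5645) can be turned into lengths with simple arithmetic. A small Python sketch; the 3'UTR is left out because the full mRNA length is not given in these notes:

```python
def cds_summary(cds_start, cds_end):
    """Return (5'UTR length, CDS length, codon count, leftover bases)."""
    five_prime_utr = cds_start - 1          # bases before the first codon
    cds_length = cds_end - cds_start + 1    # inclusive coordinates
    codons, leftover = divmod(cds_length, 3)
    return five_prime_utr, cds_length, codons, leftover

utr5, cds_len, codons, leftover = cds_summary(195, 5645)
print(utr5)        # 194
print(cds_len)     # 5451
print(codons)      # 1817 (this count includes the stop codon)
print(leftover)    # 0, so the CDS is a whole number of codons
```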
Open up the file "EngD cellulase" (from Canvas/Files/Sequences) in ApE. In what reading frame is the major (biggest) ORF? Approximately which base does this ORF start at?
Reading Frame 1; 240
In the basic cloning procedure, what is the order of steps?
Restriction digest, ligation, transformation, plating, plasmid preps
can you ligate BAMHI end to a Sau3AI end
yes, bc both leave the same 4-base 5' sticky end (GATC)
- Sau3AI is NOT a blunt cutter: it cuts outside its 4-base site (^GATC); BamHI cuts inside its 6-base site (G^GATCC)
- the hybrid always contains GATC, so Sau3AI can always cut it again (its sticky end is its recognition site, and that is maintained)
- BamHI can cut the hybrid again only if the base that came along with the Sau3AI fragment rebuilds GGATCC:
* BamHI end + Sau3AI end 5'-GATCC... -> GGATCC = BamHI can still cut
* BamHI end + Sau3AI end 5'-GATCA... -> GGATCA = BamHI can't cut anymore (bc we gained an A), Sau3AI still can
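The recut logic can be checked with a short Python sketch. It models only one orientation (the ...G left behind by BamHI joined to a Sau3AI fragment starting GATCN...), and the helper names are made up:

```python
BAMHI = "GGATCC"
SAU3AI = "GATC"

def junction(upstream_base, downstream_base):
    """Six-base window around a ligated GATC overhang."""
    return upstream_base + "GATC" + downstream_base

def can_recut(site, junction_seq):
    """An enzyme can recut if its recognition site is intact in the hybrid."""
    return site in junction_seq

# BamHI left end (...G) joined to a Sau3AI right end (GATCN...), N = any base.
for n in "ACGT":
    hybrid = junction("G", n)
    print(hybrid, "BamHI:", can_recut(BAMHI, hybrid), "Sau3AI:", can_recut(SAU3AI, hybrid))
```

Only the hybrid where N is C regenerates a BamHI site; Sau3AI recuts all four.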
Below are two restriction sites. The top one is "A" and the bottom is "B". You digest them and then ligate one to the other to form a hybrid restriction site. Which enzyme is always able to digest this hybrid restriction site? A: 5'-A|TGCAT-3' / 3'-TACGT|A-5' B: 5'-|TGCA-3' / 3'-ACGT|-5'
The enzyme which cuts B
ApE 1
a plasmid editor (free app) - tells you how many bases of RNA in "sequence" - changing number = what base you're on - tells you start, end, and length of selection in base pairs - ORF menu + find next: shows you next ORF beginning with ATG - can name/edit features (ORF), pick color for it, scroll to colored spot: tells you name of gene you're in - put cursor 30 bases before end or at end and hit find next again, repeat - can see if sections are shared between ORFs (overlapping) - new feature: name: replicase: color change
in RNaSeq menu
after looking up gene and clicking RNAseq transcripts, being on gene bank, click graphics
ApE - translating
always have to select something before you use translate - pick b/t 1 letter and 3 letter code - shows DNA complete translation - shows MOLECULAR WEIGHT (how many AA, position in genome)
middle of NCBI main page
Analyze: holds all tools needed for data analysis
boutique: for every gene, right click
and click picture link (annexin is Ca2+ binding protein localized in the cytoplasm) - PubMed ids and journals - genomics info - protein - where produced (time), function of protein, family, subcellular location and tissue-specificity - gene expression heatmap, becomes expressed after germination, in root not in shoot
A zinc finger comprises zinc... and what else?
beta sheet and alpha helix
The "Alpha + Beta" protein domain consists of:
beta strands and alpha helices strung separately from each other and beta strands are antiparallel
repeats are in
blue
format for chromosome, location, and spread
chr1: 1,500,000 / 200, where 200 = how wide of a spread of bases to look at when there
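A tiny Python sketch that splits a string in this format into its three parts. The regular expression is an assumption based only on the one example in these notes, not on NCBI documentation:

```python
import re

def parse_region(text):
    """Split 'chr#: position / spread' into (chromosome, position, spread)."""
    match = re.fullmatch(r"\s*(chr\w+)\s*:\s*([\d,]+)\s*/\s*([\d,]+)\s*", text)
    if match is None:
        raise ValueError(f"not in 'chr#: position / spread' format: {text!r}")
    chromosome, position, spread = match.groups()
    # Commas are only thousands separators, so strip them before converting.
    return chromosome, int(position.replace(",", "")), int(spread.replace(",", ""))

print(parse_region("chr1: 1,500,000 / 200"))  # ('chr1', 1500000, 200)
```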
feature ruler
click a particular RNA or protein - tells you how many bases found in this exon 1-42, 43-X, Y-Z - protein = minus 5'UTR, 3'UTR missing (ends early)
RefSeq transcripts
click any one, find gene bank of mRNA you pick
copy reverse complement
command only found in ApE
learn section of NCBI main page
conferences, recorded tutorials, manuals, documents, and FAQs
misc_features
could be promoter regions, miscellaneous recombination features
in "regulatory" menu of gene bank record
could see silencers: responsible for turning gene off - could see enhancers: positive regulation, can sit away from the promoter - could see promoter itself here: click gene id
ApE - editing
create PCR primer candidates: select bases that stick to a selected range
- upstream primer: highlight + right-click the BEGINNING BASES of top strand of an ORF, copy, paste in Word to load into IDT
- downstream primer: highlight + right-click the same # of LAST BASES of the ORF; don't plain copy, use COPY REVERSE COMPLEMENT
(length is shown up top, not in the pop-up; new features is a tab on top)
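The same two selections can be sketched in Python: the upstream primer is the first bases of the ORF as written, and the downstream primer is the reverse complement of the last bases. The ORF and the primer length here are made up for illustration; real primers also have to pass the IDT dimer checks:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Equivalent of ApE's 'copy reverse complement' for plain ACGT text."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def primer_candidates(orf, length=18):
    """Return (upstream, downstream) primer candidates for an ORF."""
    orf = orf.upper()
    upstream = orf[:length]                         # first bases, top strand
    downstream = reverse_complement(orf[-length:])  # last bases, flipped
    return upstream, downstream

toy_orf = "ATGAAACCCGGGTTTTAA"  # hypothetical ORF, not from EngD cellulase
up, down = primer_candidates(toy_orf, length=6)
print(up)    # ATGAAA
print(down)  # TTAAAA
```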
left panel of NCBI main page
different topics, can click to see available tools for a topic - proteins - literature
RefSeq transcripts + proteins caused by - click RefSeqGene to get to gene menu, same as genome browser
exon skipping which usually creates more transcript and protein variants than actual single gene variants
all find a gene info
expression info, bibliography, phenotypes, associated conditions, CNV, relationships with HIV, chemical pathways, interactions with other genes/proteins, general gene info, gene ontology (types of binding), what types of break repair it is involved in, cell structure connections
The beta-propeller structure consists of beta sheets with:
four consecutive beta strands, perpendicular to the radius of the propeller, in each "blade" so that each blade is a separate unit of sequence
if numbers are going down on scale, means
genome is being read backwards, so gene should be read in opposing direction - gene is on negative strand instead of positive strand of genome
vid
(ORF MAP, interpret them, 1,2,3 -1,-2,-3, reading frame is 3 bc half lines indicate start sites and full lines indicate stop sites, how to translate with 1 letter and 3 letter code - get the full peptide seq from DNA, number of amino acids, can copy reverse direction) wants to know you can open IDT and ApE
protein motif: helix-loop-helix
(also called leucine zipper) - 1 alpha helix - loop (long, lazy turn) - 1 alpha helix found in TF, 2 alpha helices bind to specific seq of DNA, other portion meant to stabilize - leucines stick together bc nonpolar, stability
dark green vs light green
(going R to L)
- exon: (tiny lines +) whole box with dark green/white arrows
- translated: translational start site is right BEFORE the 1st dark-green/white arrow box, going down in exons
- UTR: light green (usually big on 3' end - NOT)
- if gene variants underneath have light green boxes in diff places = exon skipping/splicing (then translational start site)
- arrows: always 5' to 3', just R to L or L to R
intrinsically disordered proteins (IDP/IUP)
- 1 alpha helix in center (stays same) - JUST 2 ends with tracing over time (structures that were collected over time, structure of protein changes over time besides alpha)
protein motif: greek key
- 1 antiparallel beta sheet (2 come in interior, third comes all the way around, final one out) - said wont be tested on
protein motif: zinc finger
- 1 beta sheet w/ 2 antiparallel strands in it - 1 small alpha helix - 2 histidine + 2 cysteine residues which hold a stabilizing zinc ion - motif found in TFs
protein motif: beta hairpin
- 2 neighboring beta strands running antiparallel to each other - with tight hairpin (not loop or turn) holding them together
protein motif: helix-turn-helix
- 2 alpha helices separated by turns
small proteins
- 3 DS bonds - STD (sequential tri-disulfide bonds) making it extremely stable - cysteine-stabilized - often used in venoms or microbial peptides (need to be stable so they can be stored for ready use)
protein motif: antiparallel beta sheet
- 3 antiparallel beta strands in 1 sheet (bc arrows go in opposite directions) (arrows point from N to C = polarity of protein)
beta protein domain - SH3 domain:
- 5 strand beta barrel
alpha/beta protein domains
- alternating parallel beta strands and alpha helices - strung together - TIM barrel from Triose Phosphate IsoMerase alpha-beta-alpha-beta-beta-alpha
beta protein domain - ten immunoglobulin domains: BCAM
- antigen-binding domain of antibody - alpha helices
beta protein domain - beta barrel:
- antiparallel beta sheet in 3D - porin allowing ions to go through
alpha + beta protein domains
- beta antiparallel strands, independent of alpha helices - alpha helices at beginning and end - all strung separately from each other alpha-beta-beta-beta-alpha
alpha protein domains:
- bromodomain: 4 alpha helices - globin-fold domain: 8 alpha helices surrounding a heme group with oxygen in middle
select range of something
- click and drag across K bar, then right click for a menu of possible options
for NCBI
- dont use safari, use chrome - dont use ipad (mac or pc only)
what NEB has***
- double digest finder (conditions needed to cut with 2 restriction enzymes) - enzyme finder: how compatible buffers would be (find by name, rec site, overhang seq) - biocalculator for ligation (mass to moles) - buffers, incubation - video tutorials - stuff you can order, guidelines, protocol
to go to seq level
- drag in menu so dark box of interest is between two gray middle arrows - select atm zoom to feature button - undo blue arrow in upper left to restore unzoomed view whenever
BRCA1 summary info
- gene encoding protein phosphorylated in nucleus, acts as tumor-suppressor, part of a large complex - BASC: BRCA1-associated genome surveillance complex - associated w/ RNA polymerase 2 and transcription, repair of double stranded breaks and NHEJ - alternative splicing alters functions (genes can share promoter machinery)
genomes and maps on left panel - genome - human genome
- how to find gene if you know the genome/species of an organism it belongs to and location - 1st: pick a chromosome (zoom out) - type location, gene name, or phenotype in search assembly, following format * chr#:(position#) / (spread#)
integrated DNA technologies
- OligoAnalyzer tool, need an IDT account
- type a primer seq and hit self-dimer for BOTH primers (bc 2 for PCR)
- also test upstream and downstream primers against each other (insert the second seq) and hit hetero-dimer
*** 2 rules for seeing if a primer is rejected (primer-dimer: primer that is complementary to itself or its partner, and can do PCR on itself):
- 3'-most base is base-paired (paired to anything else)
- delta G is LESS (more neg) than -3; more negative means more stable
- if it meets both, the dimer is stable enough to cause us problems *binds to itself*, have to reject it for PCR
- solution: add a base to the 3' end to extend the primer (for both self- and hetero-dimer)
- $15 for 1kb DNA
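The two-part rejection rule reads naturally as a single boolean test. A minimal Python sketch, using the -3 cutoff from these notes; the function name is made up, and whether the 3'-most base is paired has to be read off the OligoAnalyzer dimer drawing by eye:

```python
def reject_primer(delta_g, three_prime_base_paired, threshold=-3.0):
    """Reject only when BOTH rules are met for the worst dimer prediction."""
    too_stable = delta_g < threshold  # more negative than the cutoff
    return three_prime_base_paired and too_stable

# Delta G values from the practice questions; the pairing flags are hypothetical.
print(reject_primer(-4.62, three_prime_base_paired=True))   # True
print(reject_primer(-4.62, three_prime_base_paired=False))  # False
print(reject_primer(-2.50, three_prime_base_paired=True))   # False
```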
protein structure repositories
- Protein Data Bank (PDB), Swiss-Prot, PIR, PRF - Molecular Modeling Database (MMDB) provides access to PDB in NCBI
