MCB 110
Catabolite control of the lac operon
(a) Under conditions of high glucose, a glucose breakdown product inhibits the enzyme adenylate cyclase, preventing the conversion of ATP into cAMP (cyclic adenosine monophosphate). This matters because cAMP is the activator of another protein, CAP. (b) In low glucose, as E. coli becomes starved for glucose, the breakdown product is absent, so adenylate cyclase is active and cAMP is formed. (c) When cAMP (a hunger signal) is present, it acts as an allosteric effector, complexing with the CAP (catabolite activator protein) dimer. (d) The cAMP-CAP complex (not CAP alone) activates lac operon transcription by binding to a region within the lac promoter. CAP needs cAMP to adopt the structure that binds tightly to this promoter region, where it helps RNA polymerase get going. CAP sites are also present in other promoters, so cAMP-CAP is a global catabolite gene activator.
Forward and Reverse genetics. What does it all do?
"forward genetics" What causes this phenotype? Making mutations in the genome and seeing what happens. "reverse genetics" What does this gene do?
How many types of RNAs does a single polymerase make?
A single RNA polymerase makes all types of RNA (rRNA, tRNA and mRNA) in prokaryotes. Eukaryotes are a different story: three nuclear RNA polymerases divide the work.
Rho-dependent transcription termination
Accounts for ~50% of all E. coli terminations. Rho-loading (rut) site: a 60-100 nt RNA sequence that is relatively free of secondary structure and rich in C's. Rho forms an RNA-dependent hexameric helicase/ATPase that translocates along the RNA in the 5'-to-3' direction (using energy from ATP hydrolysis) and unwinds the RNA-DNA hybrid within the transcription bubble to terminate transcription. •Rho is a ring-shaped hexameric helicase (a RecA-family ATPase related to F1-ATPase) •Loads onto the rut site in the RNA and starts moving toward the polymerase •When Rho reaches the RNA exit tunnel, it pulls on the RNA and disrupts the RNA/DNA duplex
What do histone acetylases use as a cofactor?
Acetyl-CoA
What do histone methylases use as a cofactor?
S-adenosylmethionine (SAM, also called AdoMet)
Polyadenylation
- 5′ capping and 3′ polyadenylation are linked with each other and with transcription and other RNA processing steps - Capping is needed for RNA Pol II to continue transcription; polyadenylation is needed for efficient transcription termination - The C-terminal domain (CTD) of the largest subunit of RNA polymerase II, RPB1, couples transcription to mRNA processing: 1. the 5' capping enzyme complex is recruited by the partially phosphorylated CTD 2. additional CTD phosphorylation during elongation recruits the splicing machinery 3. transcription past the 3' end processing signals recruits the 3' end processing complex 4. the spliced mRNA is cleaved and polyadenylated
Ribonuclease P
- 5′ trimming of tRNAs is done by the endonuclease ribonuclease P - RNase P enzymes have an RNA component as well as proteins - The bacterial RNA component alone can cut RNA, and the protein part enhances the activity and may determine substrate range - The eukaryotic, archaeal and mitochondrial RNase P RNA component cannot cut RNA alone, but is essential for function - Introns are present in some tRNAs and rRNAs. tRNA splicing is catalyzed by protein factors - Some rRNA introns can catalyze their own removal - they are self-splicing. The precursor RNAs therefore act as ribozymes themselves
Other modifications
- Additional modifications of mRNA (RNA editing) further expand the range of molecules that can be produced. - Specific nucleotides can be modified, inserted or deleted - Insertions or deletions can be one or two nucleotides, or can be more extensive - Example: the Trypanosoma brucei NADH dehydrogenase 7 gene undergoes extensive editing. In the figure: - Black nucleotides are encoded in mitochondrial DNA - Blue asterisks show where uridines have been deleted - Red shows uridines that have been inserted - For such edited genes, it is difficult to predict the protein sequence from the DNA sequence alone
CTD sequentially recruits the different processing complexes:
- CTD becomes partially phosphorylated on transcription initiation and recruits capping enzyme - Elongation leads to more phosphorylation of CTD, which recruits splicing machinery - This also leads to recruitment of the cleavage and polyadenylation complex
Why is regulation of gene expression important?
- Cellular function & identity dictated by the set of macromolecules inside the cell. - Different macromolecules accumulate to different levels under different growth conditions and in different cell types and their expression must be properly controlled. - Diseases can be caused by aberrant control of gene expression: too much or too little of a protein; wrong time and wrong place for a protein.
Benefits of RNA processing
- Contributes to regulation of gene activity. - Diversity: many different RNAs can be produced from one gene by alternative splicing (removal of different combinations of introns; see figure), so one pre-mRNA can give rise to many gene products. - Quality control: defective mRNAs are detected and degraded. Surveillance pathways recognize RNAs that were transcribed or processed incorrectly, for example with a wrong nucleotide.
Deamination
- Cytidine to uridine deamination is also observed in the mRNA that encodes human apolipoprotein B - Deamination of one particular cytidine creates a stop codon, producing a shorter version of the protein (APOB48) - This occurs in the small intestine and is needed for lipid absorption from food - The long version of the protein (APOB100) is made in the liver and is involved in cholesterol transport
The two most common types of RNA edits:
- Deamination of adenosine to inosine (the most common edit in more complex eukaryotes). Inosine is interpreted as guanosine, so changes in the coding region can change the final protein sequence - Deamination of cytidine to uridine, which has been found to date mainly, but not exclusively, in plant mitochondrial and chloroplast mRNAs -Nucleotide conversions like these are widespread throughout life - Uridine insertions and deletions have only been found so far in mitochondrial genes of single-celled eukaryotes like trypanosomes - Cytidine insertions have only been observed in slime molds
Positive and negative regulation of the lac operon
- Glucose present, no lactose, low cAMP: the repressor is bound and CAP does not bind. RNA polymerase is not engaged, so there is no transcription (off state). - Glucose and lactose present, low cAMP: lactose has incapacitated the repressor, so the brake is off, but without cAMP-CAP the accelerator is not on. Transcription is very low. - No glucose, lactose present, high cAMP (e.g. the glucose has been used up): the repressor is already gone, and cAMP now activates CAP, giving high levels of transcription. The lac operon is switched on (induced).
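The three conditions above behave like a small truth table: the repressor is the brake and cAMP-CAP is the accelerator. A minimal sketch, assuming a simplified on/off model in which the repressor is released only when lactose is present and cAMP-CAP is active only when glucose is absent; "off", "low" and "high" are qualitative labels, not measured levels. The fourth combination (no glucose, no lactose) is also off, because the repressor stays bound even though cAMP-CAP is active.

```python
def lac_transcription(glucose: bool, lactose: bool) -> str:
    """Qualitative lac operon output in a simplified brake/accelerator model."""
    repressor_bound = not lactose  # allolactose (made from lactose) releases LacI from the operator
    cap_active = not glucose       # low glucose -> high cAMP -> cAMP-CAP binds the promoter
    if repressor_bound:
        return "off"               # brake on: no transcription, whatever CAP is doing
    if not cap_active:
        return "low"               # brake off but no accelerator: weak promoter on its own
    return "high"                  # brake off and accelerator on


if __name__ == "__main__":
    for glucose in (True, False):
        for lactose in (True, False):
            print(f"glucose={glucose!s:<5} lactose={lactose!s:<5} -> {lac_transcription(glucose, lactose)}")
```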
Polyadenylation at 3' end
- Polyadenylation at the 3′ end of eukaryotic mRNAs starts with an initial cleavage - This cleavage usually occurs after a CA that lies between a conserved AAUAAA hexamer and a downstream U- or GU-rich region - After cleavage, ~200 adenosines are added by poly(A) polymerase. Cleavage releases the mRNA and leaves a short tail of RNA still coming out of the polymerase; this leads, in a less precisely defined way, to termination as the polymerase keeps moving and then falls off. - A larger protein complex is required for polyadenylation than for 5′ capping, probably because recognizing the different polyadenylation sites in different mRNAs is more complex
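The site-recognition rule above (AAUAAA, then a CA, then a U/GU-rich region) can be turned into a simple sequence scan. A minimal sketch, assuming the CA lies 10-30 nt downstream of the hexamer and treating "U-rich" as at least 40% U in the next 30 nt; the function name and both numeric cutoffs are illustrative assumptions, and real cleavage-site prediction uses many more features.

```python
import re


def candidate_cleavage_sites(pre_mrna: str, min_gap: int = 10, max_gap: int = 30,
                             dse_len: int = 30, min_u_fraction: float = 0.4) -> list[int]:
    """Return 0-based positions just after candidate CA cleavage dinucleotides.

    Assumed rules: the C of the CA lies min_gap..max_gap nt after an AAUAAA hexamer,
    and the next dse_len nt (downstream element) are U-rich.
    """
    seq = pre_mrna.upper().replace("T", "U")
    sites = set()
    for m in re.finditer(r"(?=AAUAAA)", seq):      # lookahead also finds overlapping hexamers
        after_hexamer = m.start() + 6
        last_c = min(after_hexamer + max_gap, len(seq) - 1)
        for i in range(after_hexamer + min_gap, last_c):
            if seq[i:i + 2] != "CA":
                continue
            cut = i + 2                             # cleavage occurs after the A of CA
            downstream = seq[cut:cut + dse_len]
            if downstream and downstream.count("U") / len(downstream) >= min_u_fraction:
                sites.add(cut)
    return sorted(sites)


if __name__ == "__main__":
    toy = "AAUAAA" + "G" * 12 + "CA" + "UUGUUUGUUU" + "GGGG"  # hypothetical test sequence
    print(candidate_cleavage_sites(toy))
```

On the hypothetical toy sequence the only candidate is the position right after its single CA; the poly(A) tail would be added from that point.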
RNPs
- RNA modifications involve a large number of molecular complexes and are often specifically located in the cell (e.g. mRNAs are not exported from the eukaryotic nucleus until properly modified). Modified RNAs are recognized by proteins, and these proteins are found in specific places in the cell. - Many of the RNA processing complexes contain both protein and RNA (ribonucleoproteins; RNPs). - The RNA in RNPs can be structural, but can also be catalytic (ribozymes: RNA enzymes that act like proteins in catalytic active sites). The RNA can hold proteins in the right conformation or can sit right at the active site. The ribosome contains catalytic RNA. - Some RNPs contain guide RNAs that base pair with pre-RNAs and guide the RNP to the correct place for processing
RNA processing
- RNAs are synthesized from DNA templates, but the molecules produced are often not functional. These precursor RNAs (pre-RNAs) must be modified to make the mature, functional RNA. Examples: - Cleavage - Splicing - 5' capping - Polyadenylation - Editing: base insertion, base deletion, base modification
Endonuclease and Exonuclease
- Ribonucleases cleave RNAs into smaller parts and process them so they can work appropriately - Exonucleases successively remove nucleotides from the end of a transcript, most often in the 3′ to 5′ direction but sometimes 5′ to 3′ - Exonucleases are not usually sequence specific - Endonucleases cleave the RNA within the strand. Some are specific for double-stranded RNA, some for single-stranded - RNase III and RNase P are examples of endonucleases
poly(A) tail
- The 3′ ends of most eukaryotic mRNAs have about 200 adenosines added: this is a polyadenosine, or poly(A), tail - mRNAs have polyadenylation sites, where pre-mRNAs are cleaved and the poly(A) tail is added - Some mRNAs, such as cyclin D1 mRNA (see figure), have multiple polyadenylation sites, and these can participate in regulation - Polyadenylation at the distal site retains multiple regulatory sequences - Polyadenylation at the proximal site (closer to the 5' end) eliminates those regulatory sequences
How is the 5' cap added?
- The 5′ cap is added in three stages, shortly after the mRNA emerges from RNA Pol II (once about 20-30 nucleotides have been made) - First, an RNA 5′ triphosphatase removes a phosphate from the 5′ end - Second, a guanylyltransferase attaches a guanosine monophosphate (GMP) to the end in a 5′-5′ triphosphate linkage - Third, the guanine is methylated at N7 by a guanine-7-methyltransferase - In yeast, the three steps are done by different enzymes - In C. elegans and mammals, the first two reactions are done by a single enzyme
Adenine riboswitch
- The B. subtilis adenine riboswitch regulates adenine synthesis and transport. Gene expression depends on whether a terminator or an anti-terminator forms, which in turn depends on whether the aptamer binds adenine. - In low adenine (see figure), regions 2 and 3 form an anti-terminator and transcription proceeds - In high adenine (see figure), regions 3 and 4 form a hairpin that acts as a terminator, and transcription stops.
Trypanosome mRNA
- The mechanism of the extensive editing in trypanosome mitochondrial mRNA is understood, but the reasons for the editing are not - 20-50 nucleotide guide RNAs bind to mRNAs and define insertion and deletion locations - Guide RNAs first base pair with part of the mRNA before the process begins. An endonuclease recognizes and cuts the mRNA at a mismatch, and the guide RNA acts as the template for addition or deletion of uridines - Addition is catalyzed by a 3′ terminal uridylyl transferase, deletion by a 3′-5′ exonuclease, and the fragments are then ligated
CCA sequence
- The ways RNA can be an allosteric effector - The 3′ ends of tRNAs have a conserved CCA sequence; this is the attachment site for the amino acid. - The CCA is sometimes encoded by the tRNA gene but is most often added later by a CCA-adding enzyme. This allows regulation of tRNA abundance, because it helps determine how much of a particular functional tRNA is present. The 3' end of a tRNA is very exposed and susceptible to exonucleases, so an enzyme that monitors and repairs the tRNA pool helps keep enough functional tRNA around. - Addition of CCA is catalyzed by a single nucleotide-binding pocket - How does the enzyme know which nucleotide to add next? The pocket has three different conformations. When a tRNA arrives with nothing on its end, the enzyme assumes a shape that binds and adds the first C. It then shifts to a conformation that adds the second C, and then to one that adds the A.
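The sequential C, C, A addition described above can be sketched as a three-state machine in which the next nucleotide depends only on how much of CCA is already present. A minimal sketch, assuming we know where the tRNA body ends (the discriminator base) via a hypothetical `body_length` argument; the real enzyme recognizes the acceptor stem structure, which is not modeled. The same loop also repairs a partly degraded tail, matching the monitoring role above.

```python
CCA = "CCA"


def add_cca(trna: str, body_length: int) -> str:
    """Extend a tRNA 3' end to ...CCA one nucleotide at a time.

    body_length: length of the tRNA up to and including the discriminator base
    (a hypothetical input for this sketch).
    """
    tail = trna[body_length:]
    if CCA[:len(tail)] != tail:
        raise ValueError(f"unexpected 3' tail {tail!r}")
    while len(tail) < 3:
        # state 0 -> add first C, state 1 -> add second C, state 2 -> add A
        tail += CCA[len(tail)]
    return trna[:body_length] + tail


if __name__ == "__main__":
    body = "GCGGAUUUAG"                  # hypothetical 10-nt stand-in for a tRNA body
    print(add_cca(body, len(body)))      # nothing added yet: adds C, C, A
    print(add_cca(body + "CC", 10))      # repair: only the terminal A is missing
```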
Where does it occur?
- Transcription and processing of eukaryotic mRNA occurs in the nucleus - Translation occurs in the cytoplasm, so mature mRNAs must be exported from the nucleus - The protein factors needed for transport are loaded onto the mRNA during transcription, but polyadenylation is needed before the RNA-protein complex can be released from the transcription complex - Some mRNAs are located in specific regions of the cytoplasm - this requires "localization elements", usually found at the 3′ end. They also regulate translation
RNA can get modified
- tRNA and rRNA nucleotides are often chemically modified after transcription - Modifications can be small, like methylation or larger, like addition of threonine - Many modifications have been studied and they are usually essential for growth and survival - More than 80 modifications have been observed in tRNAs, and they increase the repertoire of shapes, structures, and stability of tRNA molecules - The most common rRNA modifications are ribose 2′-O-methylation and pseudouridylation - Many rRNA modifications are found in regions important for ribosome function
Pathways are complicated
- tRNA and rRNA processing can involve many steps and multiple pathways - The reasons why multiple pathways exist are not known - One idea: different pathways produce different ribosomes. Ribosomes interact with proteins that can guide them to particular mRNAs, so multiple pathways could yield ribosomes that bind different proteins and translate different mRNAs at different frequencies - Processing steps are coordinated with one another - Excision of bacterial rRNAs is performed by RNase III, which recognizes double-stranded RNAs - RNase III bends stem structures in pre-RNAs and cleaves the dsRNAs
tRNA and rRNA
- tRNA and rRNA transcripts are made as long precursors that must be processed. This allows coordinated expression of the different rRNAs: by making them in one long transcript and processing it appropriately, the cell gets mature RNAs that can work together in the ribosome. - An E. coli precursor encodes three rRNAs and several tRNAs - The S. cerevisiae precursor encodes three rRNAs - Encoding several RNAs in one precursor ensures that similar amounts of each RNA are made
AREs
- AU-rich elements (AREs) are important for mRNA stability - AREs are found in the 3′ UTRs of some mRNAs - The presence of an ARE directs poly(A) removal and decreases mRNA stability - c-fos in vertebrates encodes a short-lived transcription factor that promotes cellular growth; its mRNA contains an ARE - Some tumor-causing viruses express v-fos, which does not have the ARE - The v-fos transcript is not degraded properly, and this leads to excessive cellular growth
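A common way to flag a candidate ARE is to count copies of the AUUUA pentamer, a core motif in many AREs, in the 3′ UTR. A minimal sketch; the function names and the cutoff of three motifs are hypothetical, and real ARE classification uses more sequence context than a motif count.

```python
def count_auuua(utr3: str) -> int:
    """Count (possibly overlapping) AUUUA pentamers in a 3' UTR sequence (DNA or RNA letters)."""
    seq = utr3.upper().replace("T", "U")
    return sum(1 for i in range(len(seq) - 4) if seq[i:i + 5] == "AUUUA")


def looks_are_containing(utr3: str, min_motifs: int = 3) -> bool:
    # Hypothetical cutoff: several AUUUA copies suggests an ARE-type, short-lived mRNA
    return count_auuua(utr3) >= min_motifs


if __name__ == "__main__":
    utr = "GCAUUUAUUUAUUUAGC"   # hypothetical 3' UTR fragment with overlapping AUUUA copies
    print(count_auuua(utr), looks_are_containing(utr))
```

Overlapping matches are counted because tandem AUUUAUUUA repeats share their A residues.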
5' Capping
- Both ends of eukaryotic mRNAs are modified during transcription - The end modifications protect the mRNAs from nuclease degradation and help with protein interactions - The 5′ end is capped with a guanine nucleotide joined by a 5′-5′ triphosphate linkage, and this guanine is methylated at N7 (the 7-methylguanosine, or 5′, cap) - The 5′ cap is needed for efficient elongation and termination of the transcript, for mRNA processing and export from the nucleus, and for directing translation - In more complex eukaryotes, the ribose 2′-O of the second and sometimes third nucleotides is also methylated
Defective RNAs
- Defective endogenous RNAs are removed by specific decay mechanisms - mRNAs can have extra or missing stop codons, or other problematic coding sequences that stall translation - In eukaryotes, ribosomes on defective RNAs are recognized through interactions with proteins such as the exon junction complex (EJC), which recruit RNases to degrade the RNA
tmRNA
- In bacteria, ribosomes stalled on broken mRNAs are recognized by a complex containing tmRNA (an RNA that acts both as a tRNA and as an mRNA) - tmRNA binding allows the ribosome to finish translation using the tmRNA as template - tmRNA also aids recruitment of RNases
What is needed for cellular processes to function?
- Precise RNA-protein interactions are needed for cellular processes to function - The proteins have specific RNA-binding motifs, which differ from DNA-binding motifs because of the structural differences between RNA and DNA - The most common is the RNA-recognition motif (RRM), also known as the RNA-binding domain (RBD) or ribonucleoprotein (RNP) domain - It has two α-helices packed against a four-stranded β-sheet in a sandwich; this fold is very adaptable and can bind RNAs with different structures - For example, the RRMs in hamster nucleolin and human PABP bind very differently structured RNAs
Complexity of RNA editing
- RNA editing was observed by Seeburg and colleagues - mRNA is often examined by looking at cDNA clones, which are copies of mature mRNA made in vitro - The researchers noticed that some codons in glutamate channel genes did not match between the genomic DNA and the cDNA: arginine (R) appeared where the genomic copy specified glutamine (Q) - Many glutamate channel cDNA molecules were sequenced to test the frequency of editing. GluR-C and GluR-D are unedited and always have Q. Three genes (GluR-B, GluR-5 and GluR-6) are edited, but at different frequencies. - GluR-B is always edited to R, whereas only a subset of GluR-5 and GluR-6 cDNAs are edited
RNA stability
- RNA stability is described as RNA half-life: the time in which the amount of RNA is reduced by half - RNA half-lives range from <1 minute to an hour in E. coli and from ~20 mins to >24 hours in vertebrates - RNA stability is affected by several factors - The structures at the 5′ and 3′ ends are important. The 5′ cap in eukaryotic mRNAs protects against exonuclease digestion. Bacterial RNAs with a 5′-triphosphate are more stable than those with a 5′-monophosphate - Stem-loop structures at the 5′ and 3′ ends also contribute to stability. 3′ end stem-loops in bacteria, including those formed in Rho-independent termination, protect against 3′ to 5′ exonuclease activity - In bacteria, a 3′ poly(A) tail decreases stability. In eukaryotic mRNA the poly(A) tail increases stability, and processes that remove the tail decrease stability - Other RNA processes, like splicing, transport and translation, can affect half-lives by blocking or allowing access to degrading enzymes
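Half-life translates into the fraction of RNA remaining through first-order decay, N(t) = N0 · (1/2)^(t / t½), with rate constant k = ln 2 / t½. A minimal sketch, assuming a single first-order decay process and no new synthesis; the half-lives in the usage note are illustrative values taken from the ranges above.

```python
import math


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of an RNA population left after t minutes of first-order decay (no new synthesis)."""
    return 0.5 ** (t_min / half_life_min)


def decay_rate_constant(half_life_min: float) -> float:
    """First-order rate constant k (per minute), from t1/2 = ln 2 / k."""
    return math.log(2) / half_life_min


if __name__ == "__main__":
    print(fraction_remaining(60, 2))      # a 2-min bacterial mRNA after 1 hour
    print(fraction_remaining(60, 1440))   # a 24-hour vertebrate mRNA after 1 hour
```

For example, a 2-minute bacterial mRNA is down to roughly 10^-9 of its starting amount after an hour (30 half-lives), while a 24-hour vertebrate mRNA still has about 97% left.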
RNA-binding proteins
- RNA-binding proteins often have more than one binding motif - The eukaryotic PUF protein family, including Pumilio 1, has repeats of a three-helix motif with conserved amino acids, each repeat interacting with a single nucleotide - Some RNA-binding proteins have small regions rich in arginine and lysine (basic residues) - Some proteins with zinc finger DNA-binding domains also bind RNA
Degradation of RNA
-RNAs need to be degraded at some point, removing RNAs that are no longer needed and recycling the nucleotides -"Normal" RNAs (those the cell produces) are degraded in a different way to foreign and defective RNAs -Some RNAs, like rRNAs, are needed a lot, and are fairly stable. Others, like some mRNAs, are only required for short periods of time and so are rapidly degraded
Riboswitches
- Riboswitches are portions of a transcript that directly bind a small molecule; binding changes the RNA secondary structure and so regulates transcription or translation. They can stimulate or repress expression. - Riboswitches have two regions: the aptamer, which binds the metabolite, and an expression platform, which controls transcription or translation and can be in an on or off state.
Model for the action of chromatin remodeling complexes
1. Complex binding 2. "Loosening" of the chromatin structure (ATP used) 3. Remodeling: either octamer transfer or octamer sliding. These are multi-subunit enzymes that use ATP to move nucleosomes around. In most cases they loosen the chromatin and push nucleosomes aside, translocating them so that a region that was tightly packed becomes accessible.
Transcription of DNA into RNA by RNA polymerase: An overview
1. Requires a DNA template, the four ribonucleoside 5'-triphosphates (NTPs), and Mg2+. 2. De novo synthesis: does not require a primer. Low fidelity compared with DNA polymerase: ~1 error per 10^4-10^5 nucleotides (~10^5-fold higher than DNA pol). 3. Activity is highly regulated in vivo: at initiation, elongation and termination. 4. The nucleotide at the RNA 5' end retains all 3 phosphate groups; each subsequent nucleotide releases pyrophosphate (PPi) when added to the chain and retains a single phosphate. 5. The released PPi is subsequently hydrolyzed by pyrophosphatase to Pi, driving the equilibrium of the overall reaction toward chain elongation. 6. Growth of the transcript always occurs in the 5'-to-3' direction.
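Points 4 and 6 (template pairing and 5'-to-3' growth) and the error rate in point 2 can be illustrated in a few lines. A minimal sketch, assuming the template strand is written 3'->5' so the RNA can be read off 5'->3' in the same order; PPi release and the 5' triphosphate are only noted in comments, and the function names are hypothetical.

```python
# DNA template base -> ribonucleotide incorporated opposite it
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build RNA 5'->3' from a DNA template strand written 3'->5'.

    The first nucleotide keeps its 5' triphosphate; each later NTP is added to the
    growing 3'-OH with release of PPi (neither is modeled explicitly here).
    """
    rna = [COMPLEMENT[base] for base in template_3_to_5.upper()]
    return "".join(rna)


def expected_errors(length_nt: int, error_rate: float = 1e-4) -> float:
    """Expected misincorporations per transcript at a given per-nucleotide error rate."""
    return length_nt * error_rate


if __name__ == "__main__":
    print(transcribe("TACGGT"))       # template 3'-TACGGT-5' -> RNA 5'-AUGCCA-3'
    print(expected_errors(10_000))    # ~1 error in a 10-kb transcript at 1 in 10^4
```

At the 1-in-10^4 end of the range, a 10-kb transcript carries about one error on average; at 1 in 10^5 it is about 0.1.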
PIC components
1. TFIID (with TATA-binding protein, TBP) binds to TATA-box and bends the DNA 2. TFIIA and TFIIB associate, stabilizing TFIID binding and helping to open small bubble 3. Pol II and TFIIF bind to A/B/D and scan to find +1 site 4. TFIIE and TFIIH bind to Pol II, finish opening bubble and license initial transcription
Protospacer is not 100% specific
20-bp protospacer + NGG PAM --> ~10^-13 chance of a random match (22 specified bases). The guide RNA has high stringency, but the part farther from the PAM is less stringent and can tolerate some mismatches; the PAM-proximal "seed" region is the most stringent. Specificity rules are still being worked out. Fine for research and more. OK for therapeutics?
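The 10^-13 figure is just (1/4)^22: 20 protospacer bases plus the two G's of the NGG PAM. A minimal sketch that computes this and scans one DNA strand for 20-nt protospacers sitting immediately 5' of an NGG PAM; the function names are hypothetical, only the given strand is scanned, and mismatch tolerance (the real source of off-target cutting) is not modeled.

```python
import re


def protospacers(dna: str, length: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each NGG PAM with 20 nt of room on its 5' side."""
    seq = dna.upper()
    hits = []
    for m in re.finditer(r"(?=([ACGT]GG))", seq):   # lookahead catches overlapping PAMs
        pam_start = m.start()
        if pam_start >= length:
            hits.append((pam_start - length, seq[pam_start - length:pam_start], m.group(1)))
    return hits


def random_match_probability(specified_bases: int = 22) -> float:
    """Chance that a random site matches every specified base (20-nt protospacer + GG)."""
    return 0.25 ** specified_bases


if __name__ == "__main__":
    print(random_match_probability())               # about 5.7e-14
    print(protospacers("A" * 20 + "TGG" + "CC"))    # hypothetical target sequence
```

Multiplying 4^-22 (about 5.7 × 10^-14) by both strands of a ~3.1 × 10^9 bp human genome gives roughly 3.5 × 10^-4 expected exact random matches, which is why the open specificity question centers on partial matches rather than perfect ones.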
Pathways for degradation of eukaryotic mRNAs
In the deadenylation-dependent pathways, the poly(A) tail is progressively shortened by deadenylase until it reaches a length of 20 or fewer A residues, at which point the interaction with PABPI is destabilized, weakening interactions with the 5' cap and translation initiation factors. The deadenylated mRNA may either (1) be decapped by the decapping enzymes and then degraded by a 5' to 3' exonuclease (both are enriched in P-bodies); or (2) be degraded by a 3' to 5' exonuclease in cytoplasmic exosomes (multi-protein complexes capable of degrading various types of RNA molecules). Some mRNAs are cleaved internally by an endonuclease (e.g. Argonaute in RISC) and the fragments are degraded by an exosome. Other mRNAs are directly decapped before they are degraded by a 5' to 3' exonuclease.
Practical use of siRNAs
In the lab, one useful tool for silencing expression of a gene of interest is siRNA. siRNAs and short hairpin RNAs (shRNAs) are used to "knock down" expression of specific target genes in mammalian cells. There are 2 ways of doing knockdown studies. 1st: chemically synthesize the RNA duplex, with a 2-nucleotide 3' overhang on each strand (~21 nt per strand). This bypasses the entire Dicer trimming process. The synthesized siRNA duplex is delivered into the cell, and the RISC complex picks one of the strands to target the gene you want to silence. 2nd: make a DNA plasmid and transfect it into the cell so that it produces a short hairpin RNA (shRNA: sense and antisense sequences separated by a loop sequence). This resembles the processing intermediate and is cleaved by the cell's own Dicer enzyme; one of the resulting strands is then picked by RISC. This method produces a longer-lasting effect because the construct is often integrated into the chromosomes, so it is always present and the shRNA is continuously produced. In the 1st method, the siRNA is most likely degraded after it targets the gene.
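The two designs above differ mainly in how the two strands are packaged. A minimal design sketch, assuming a 19-nt target region, the common UU 3′ overhang convention, and UUCAAGAGA as one example of an shRNA loop; all three are illustrative choices, and `sirna_duplex` and `shrna_transcript` are hypothetical helper names. Real siRNA design also weighs GC content, strand asymmetry and off-target matches.

```python
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence."""
    return "".join(PAIRS[b] for b in reversed(seq))


def sirna_duplex(target_19nt: str, overhang: str = "UU") -> tuple[str, str]:
    """Return (sense, antisense/guide) strands: 19-nt duplex region + 2-nt 3' overhang each."""
    target = target_19nt.upper().replace("T", "U")
    if len(target) != 19:
        raise ValueError("expected a 19-nt target region")
    sense = target + overhang
    guide = revcomp_rna(target) + overhang   # the guide strand base pairs with the mRNA target
    return sense, guide


def shrna_transcript(target_19nt: str, loop: str = "UUCAAGAGA") -> str:
    """Sense - loop - antisense hairpin; Dicer later removes the loop."""
    target = target_19nt.upper().replace("T", "U")
    return target + loop + revcomp_rna(target)


if __name__ == "__main__":
    target = "GCAUGCAUGCAUGCAUGCA"   # hypothetical 19-nt target in an mRNA
    print(sirna_duplex(target))
    print(shrna_transcript(target))
```

The synthetic duplex enters RISC directly, while the hairpin must first be diced, which matches the trade-off described above between a one-time dose and continuous production.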
Thalassemia
Some forms are caused by mutations that produce splicing errors; here the disease-causing mutation is in noncoding (intron) sequence. This shows how important it is to cut out the intron precisely. Thalassemias are a group of inherited anemias characterized by defective synthesis of hemoglobin (the O2 transporter, with α2β2 subunits), caused by mutations in the α- or β-globin genes. Defective hemoglobin can lead to anemia. Example: a G-to-A mutation creates a new 3' splice site, leading to aberrant splicing of the β-globin gene. The new site tricks the splicing machinery into treating it as the end of the intron, so only part of the intron is removed and the rest of it is left in the mRNA. This introduces a premature stop codon, producing a truncated β-globin.
dCas9 is a DNA targeting platform
Cas9 is not only for making mutations in genes and rearranging the genome. Its nuclease active sites can also be inactivated so that neither strand is cut (dead Cas9, dCas9). The enzyme no longer cuts DNA but is still targeted to the chromosome by the guide RNA, so effector protein modules can be fused to it: activation domains (turn genes on) or repressor domains (turn genes off). Without making a mutation, gene expression can be modulated up or down.
In vivo assay for transcription factor activity
The next question is whether the factor can be shown to turn the gene on in the context of a living cell. This is tricky: a liver cell already makes the protein, so a reporter plasmid will be turned on there regardless. Instead, use a cell that does not make the protein, with 2 plasmids: a reporter gene driven by the control sequence, and a second plasmid encoding the protein (from the gene obtained after purifying it). The reporter is activated only if this protein is produced and is the right protein; a wrong protein will not recognize the sequence and gives no reporter response. The reporter alone would respond in a liver cell, but if, in a cell lacking the factor, it turns on only when the plasmid encoding the purified protein's gene is added, that protein must be responsible for turning it on. • Host cells should lack the gene encoding protein X and the reporter protein. • The production of reporter-gene RNA transcripts or the activity of the encoded protein can be assayed. • If reporter-gene transcription is greater in the presence of the X-encoding plasmid, then X is an activator; if transcription is less, then X is a repressor.
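The final decision rule in the bullets above compares reporter output with and without the X-encoding plasmid. A minimal sketch of that comparison as a fold change, assuming a hypothetical 2-fold cutoff on each side; real assays use replicates, normalization controls and statistics rather than a single threshold.

```python
def classify_factor(reporter_without_x: float, reporter_with_x: float,
                    min_fold: float = 2.0) -> str:
    """Classify protein X from reporter output in host cells that lack X.

    min_fold is a hypothetical cutoff chosen for illustration.
    """
    if reporter_without_x <= 0:
        raise ValueError("baseline reporter signal must be positive")
    fold = reporter_with_x / reporter_without_x
    if fold >= min_fold:
        return "activator"        # more reporter transcription when X is present
    if fold <= 1 / min_fold:
        return "repressor"        # less reporter transcription when X is present
    return "no clear effect"


if __name__ == "__main__":
    print(classify_factor(100, 800))   # hypothetical reporter readings
```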
How does it inhibit translation?
The current view is that translation inhibition is mostly a location issue: the RISC complex carrying the miRNA or siRNA goes to special places in the cytoplasm called P-bodies. P-bodies lack translation machinery but are enriched in RNA decay factors, so mRNAs targeted by RISC/miRNAs and RISC/siRNAs can have different fates once sequestered in these cytoplasmic foci.
Cellular Reprogramming
Nuclear Transfer: Take a cell (e.g. from skin), remove its nucleus, and transplant it into an enucleated egg. The result is a cell that has the capacity to self-renew but carries the nucleus of another cell. Cell Fusion: Use chemicals to fuse 2 cells, combining the genetic information of both. The trouble is that the fused cell has twice as many chromosome copies as it should. Direct Reprogramming: Easier and more effective. 4 transcription factors introduced into a skin cell can reverse its differentiation, converting it into an induced pluripotent stem (iPS) cell, which can then differentiate.
Catabolite repression
Observation: Glucose is a preferred sugar for E. coli, which uses glucose and ignores lactose in media containing both sugars. In these cells, β-galactosidase level is low, suggesting that the lactose-mediated de-repression at the operator site is not enough to turn on the lac operon due to the fact that the lac operon has a weak promoter. This phenomenon is called catabolite repression.
Metabolism of lactose in E. coli and the lac operon
The primary concern for E. coli is having enough food, because it needs nutrients to survive and replicate. Much of its genetic information is devoted to responding to the presence or absence of nutrients. A stretch of DNA in the E. coli genome contains an operon that controls a series of proteins required for efficient use of lactose. Monod discovered that feeding the bacteria different kinds of sugars turns different genes on or off. Bacteria favor sugars because they are easy to break down and convert into energy quickly. LacZ (β-galactosidase): enzyme that hydrolyzes the disaccharide lactose into galactose and glucose, which can then be metabolized to make ATP. Operon: P: promoter; O: operator, the all-important negative control element; then the series of structural genes. LacI: repressor. PI and LacI are not part of the lac operon; they could be anywhere in the chromosome and it would not matter, because LacI acts in trans: its own promoter (PI) drives LacI expression, and LacI recognizes the operator to keep the whole operon shut down when there is no lactose around. •Binds to two LacO sites and tightly bends DNA •Allosteric binding of allolactose (or mimic molecules) causes release of DNA
Strong vs weak promotor
Promoter: sequences recognized by activators, repressors, and sigma factors that facilitate RNAP loading •Almost always directly upstream of the +1 start site. Strong promoters: close to the consensus sequences (-35 TTGACA, -10 TATAAT) and spacing (17 bp). Weak promoters: contain substitutions in the -35 and -10 regions.
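"Close to consensus" can be made concrete by scoring a sequence against the σ70 consensus hexamers TTGACA (-35) and TATAAT (-10) with a spacer near the optimal 17 bp. A minimal sketch; the scoring scheme (one point per matching base, minus one point per bp away from a 17-bp spacer) and the function names are hypothetical, not a published promoter-strength model.

```python
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def matches(a: str, b: str) -> int:
    """Number of positions where two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def best_promoter_score(seq: str, spacers: range = range(15, 20)) -> tuple[int, int, int]:
    """Return (score, -35 start, spacer) for the best consensus match in seq.

    Score = matches to TTGACA + matches to TATAAT (max 12), minus 1 per bp the
    spacer differs from 17 (a hypothetical penalty).
    """
    seq = seq.upper()
    best = (-1, -1, -1)
    for i in range(len(seq) - 6):
        for spacer in spacers:
            j = i + 6 + spacer                    # start of the -10 hexamer
            if j + 6 > len(seq):
                break
            score = matches(seq[i:i + 6], MINUS35) + matches(seq[j:j + 6], MINUS10)
            score -= abs(spacer - 17)
            if score > best[0]:
                best = (score, i, spacer)
    return best


if __name__ == "__main__":
    strong = "TTGACA" + "A" * 17 + "TATAAT"   # perfect consensus, hypothetical
    weak = "TTGAGA" + "A" * 17 + "TAAAAT"     # substitutions in both boxes, hypothetical
    print(best_promoter_score(strong), best_promoter_score(weak))
```

The weak example scores lower because each substitution in the -35 or -10 box costs a point, mirroring the strong/weak distinction above.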
Two-step decoding process for translating nucleic acid sequences in mRNA into amino acid sequences in proteins
Making proteins from mRNA is a 2-step decoding process. Important recognition happens not only at the mRNA but also at the aminoacyl-tRNA synthetase. There are many synthetases, each with an active site that recognizes a specific amino acid and its specific tRNA. This reaction is very sensitive to the amounts of amino acids and proteins present. 1. The amino acid is linked to its tRNA in the synthetase's specific active site, using ATP; this forms a high-energy ester bond. 2. The tRNA binds to its codon (UUU in the figure), so the amino acid is selected by its codon.
Three types of RNA polymerase inside eukaryotic nuclei
RNA Pol I is present in the nucleolus. It transcribes the pre-rRNAs that are then processed to give rise to functional ribosomal RNAs. It is insensitive to the toxin α-amanitin, so α-amanitin does not inhibit production of these ribosomal RNAs. RNA Pol II is present in the nucleoplasm. It transcribes all protein-coding genes (making all mRNAs) as well as some small nuclear RNAs (RNAs that never leave the nucleus and are involved in splicing and other types of RNA processing) and microRNAs. It is very sensitive to α-amanitin. RNA Pol III is also in the nucleoplasm. It transcribes all tRNAs, the 5S ribosomal RNA, and some small nuclear RNAs.
What transcribes DNA to mRNA?
RNA Polymerase II. It reads the genome, but polymerases cannot discriminate between genes, promoters, and random DNA. It must be directed by other factors.
Cooperative binding of cAMP-CAP and RNAP on the lac promoter
RNA polymerase binds to the lac promoter, which is weak and does not do a very good job on its own. cAMP-CAP improves the ability of RNA polymerase to bind the site through a protein-protein interaction: CAP physically contacts the α subunits of RNAP. With this contact, RNA polymerase binds the lac promoter more efficiently, so cAMP-CAP and RNAP bind cooperatively.
What does RNA Pol II require?
Requires >85 associated factors and regulatory proteins to control transcription. These are not called a holoenzyme because, unlike sigma, they do not travel with the polymerase; they must be assembled piece by piece at the promoter of a given gene. Proximally bound activators and distal enhancers are binding sites for the same types of sequence-specific transcription factors, but distal sites require the DNA to loop back to the core promoter region, so that regulatory information located far away (either upstream or downstream of the gene) can act there.
What blocks the path of RNA transcript?
Rifampicin • Rif binds a non-conserved region of the transcript channel • RNAs >2 nts long bump into Rif • Rif still used to fight TB
Protein synthesis on a circular polyribosome in eukaryotic cells
Schematic drawing showing how a series of ribosomes (polyribosome) can simultaneously translate the same eukaryotic mRNA molecule, which is in circular form stabilized by interactions between proteins (PABPI and translation initiation factors eIF-4E and 4G) bound at the 3' and 5' ends. The 5' cap and 3' poly(A) tail have been shown to synergistically enhance translation. They likely do so through stabilizing mRNAs and probably also facilitating ribosome recapture on circularized mRNAs.
What is the repressor actually doing?
A single protein working as a tetramer: each colored ribbon is a subunit of the repressor (4 = tetramer). The repressor binds the operator tightly in the absence of inducer. The inducer causes a conformational change that weakens its ability to bind the operator. The lac operator sequence is a nearly perfect inverted repeat centered around the GC bp at position +11.
Computer-assisted search for human SALL1 gene enhancer that directs transcription of the gene in the developing limb
Another technique, made possible because sequencing has become so cheap and so quick. You can now take a chromosome from about 10 different organisms that diverged over various time scales (in this case from fish to mouse; all vertebrates) and compare where sequences have been conserved. Notice that certain regions of this chromosome (chromosome 16) have a fairly high degree of conservation across many different organisms. This is a big clue, especially because these sequences are noncoding. Finding high homology in a coding region is not unexpected. It is unexpected to find noncoding regions ("dark matter") that are highly conserved across hundreds of millions of years of evolution, which tells us that such a sequence is probably important. This is another way to computationally pick out regions that you think are important for regulating some gene. Tricky part: once you have found a conserved or hypersensitive site, which gene is it regulating? That is hard to know, because the gene can be far away. You need other techniques to work out the connections between these conserved or hypersensitive sites and the genes they regulate. With any of these techniques (hypersensitive sites, conserved sites, or others), what you end up with is a candidate: a region you think is important. Graphic representation of the conservation of DNA sequences within a corresponding region (~6 kb) in five different genomes reveals a region of ~500 bp of non-coding sequence that is conserved from fish to human.
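The conservation scan described above can be sketched as a sliding-window identity check over a multiple alignment. This is a minimal illustration, not the pipeline used for the SALL1 search: it assumes the genomes are already aligned to equal length (with '-' for gaps), and the window size and identity threshold are arbitrary placeholders.

```python
from typing import List


def window_identity(ref: str, other: str, start: int, width: int) -> float:
    """Fraction of identical columns between two aligned sequences in one window.

    Columns with a gap ('-') in either sequence count as mismatches.
    """
    matches = 0
    for r, o in zip(ref[start:start + width], other[start:start + width]):
        if r != "-" and r == o:
            matches += 1
    return matches / width


def conserved_windows(ref: str, others: List[str], width: int, step: int,
                      min_identity: float) -> List[int]:
    """Start positions of windows that are conserved in ALL other aligned genomes."""
    hits = []
    for start in range(0, len(ref) - width + 1, step):
        if all(window_identity(ref, o, start, width) >= min_identity
               for o in others):
            hits.append(start)
    return hits
```

For example, with a 20-column toy alignment where only the first 10 columns are shared, `conserved_windows("ACGTACGTACTTTTTTTTTT", ["ACGTACGTACGAGAGAGAGA", "ACGTACGTAACACACACACA"], 10, 10, 0.8)` flags only the window starting at 0.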
2 applications of induced pluripotent cells (iPSC) in biomedicine
Applications: 1. Regenerative medicine: literally create a replacement organ or cell type. 2. Disease modeling: the biggest problem in attacking a particular disease is that, to develop a drug, you must have a model in which to test it. Ideally you want that model to be genetically identical to the disease, which iPSCs derived from a patient's own cells can provide.
Flowchart of DNase-seq protocol. Experiment
The sequences that regulate the activity of the core promoter can be far away. How do you find them? Researchers had to come up with a number of indirect screening tricks to get some idea of where these control elements, especially distal elements, might sit. This is the first of several techniques for figuring out where a region of interest (a promoter, or a site bound by an enhancer-binding factor) might lie. It was discovered back in the '80s that if you very gently extract chromatin from the nucleus and treat it with a small amount of DNase I (a nuclease that cleaves double-stranded DNA), the enzyme scans the entire genome and cuts wherever naked DNA (not protected by a nucleosome) is available. Basically it reads the entire chromatin, every chromosome, and cuts wherever the DNA is accessible. It took a while to realize that if you do such an experiment and then accurately map all the places where the DNA was accessible, you get a good but rough surrogate for mapping all the locations where transcription factors might be bound. The principles are always the same: you have to label or pick out the fragments of interest and then determine their sequence so you can go back to your genome map and lay them out. A DNase I hypersensitive site usually means the DNA there is free of a nucleosome (nucleosomes are arranged with a regular periodicity). The reason this technique works is that DNA-binding proteins, the very transcription factors you care about, bind to the important regions and then come off, bind and come off, so those regions stay accessible. As a consequence, DNase I has a chance to get in there and cut. You can then ligate (chemically link) biotinylated linker sequences (red) to the cut ends, which allows the fragments to be pulled out. This is a way to recover the short sequences surrounding wherever DNase I cleaved.
This is happening across the entire genome, so you will get thousands of little fragments. There are many ways to identify these sequences. Briefly, cells are lysed with detergent to release nuclei, and the nuclei are digested with optimal concentrations of DNase I. DNase I-digested DNA is embedded in low-melt agarose gel plugs to reduce additional random shearing. DNA (while still in the plugs) is then blunt-ended, extracted, and ligated to biotinylated linker 1 (red bars). Excess linker is removed by gel purification. Biotinylated fragments (linker 1 plus 20 bases of genomic DNA) are digested with MmeI and captured by streptavidin-coated Dynal beads (brown balls). Linker 2 (blue bars) is ligated to the 2-base overhang generated by MmeI, and the ditagged 20-bp DNAs are amplified by PCR and sequenced by Illumina/Solexa. In the end, because DNA sequencing has become so fast and so cheap, the easiest approach is to take all the little fragments you pulled out (which represent all the places in the genome that are accessible to cleavage by the nuclease) and simply sequence them all. A computer program then maps them along the entire genome of whatever organism you are studying. You then ask where the DNA is cut most frequently, so that fragments line up: peaks tell us the DNA was cleaved there more often than in other regions, because that sequence is represented over and over. Now you have figured out all the accessible regions of chromatin; how is that helpful? The key thing that took a long time to figure out is that these hypersensitive sites very often coincide with sequences bound by a specific transcription factor. How would we know that? DNA footprinting. Once you have scanned the entire genome, you can take a fragment that overlaps one of these hypersensitive sites and ask whether something binds to that sequence, using DNA footprinting or another faster technique.
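The peak-finding step can be sketched as counting mapped cut sites in fixed-size bins and merging adjacent bins that pass a threshold. This is a toy version under stated assumptions: cut positions are 0-based coordinates on a single chromosome, and `bin_size` and `min_cuts` are hypothetical parameters; real DNase-seq analysis uses statistical background models rather than a fixed cutoff.

```python
from collections import Counter
from typing import List, Tuple


def call_hypersensitive_sites(cut_positions: List[int], bin_size: int,
                              min_cuts: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, cuts) for merged runs of bins with >= min_cuts cuts."""
    # Count how many DNase I cut sites fall into each bin.
    counts = Counter(pos // bin_size for pos in cut_positions)
    peaks: List[Tuple[int, int, int]] = []
    for b in sorted(counts):
        if counts[b] < min_cuts:
            continue  # too few cuts: treat as background
        start, end = b * bin_size, (b + 1) * bin_size
        if peaks and peaks[-1][1] == start:
            # Adjacent passing bin: extend the previous peak.
            prev_start, _, prev_cuts = peaks[-1]
            peaks[-1] = (prev_start, end, prev_cuts + counts[b])
        else:
            peaks.append((start, end, counts[b]))
    return peaks
```

With cuts at 105, 110, 120, 130, 215, 220, 230 and 900, 100-bp bins and a cutoff of 3, the two neighboring busy bins merge into one site spanning 100-300, while the lone cut at 900 is ignored.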
Key point: you need multiple experiments that are orthogonal in the way they are carried out to address the question of what is actually binding there. The advantage of this technique is that it scans the entire genome. Suppose you did the experiment with liver cells and got your peaks, and then did it on muscle cells and got your peaks. What would you expect to see when comparing the liver pattern with the muscle pattern? You would expect some differences and some similarities. Every cell has certain functions (housekeeping functions) that have to happen, and those peaks should be the same, but the differences are particularly interesting: the peaks that differ point to the genes that are special to liver or special to muscle cells. This supports the idea that different transcription factors drive the expression of different networks of genes in different cell types.
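Comparing liver and muscle peaks comes down to interval overlap. A minimal sketch, assuming each peak is a half-open (start, end) interval on one chromosome; the coordinates in the usage check are made up.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def compare_peaks(liver: List[Interval],
                  muscle: List[Interval]) -> Dict[str, List[Interval]]:
    """Split peaks into shared (e.g. housekeeping) and cell-type-specific sets."""
    shared = [p for p in liver if any(overlaps(p, q) for q in muscle)]
    liver_only = [p for p in liver if not any(overlaps(p, q) for q in muscle)]
    muscle_only = [q for q in muscle if not any(overlaps(q, p) for p in liver)]
    return {"shared": shared, "liver_only": liver_only, "muscle_only": muscle_only}
```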
Types of tiny RNAs. Where do they come from? Base pairing with target RNAs distinguishes miRNA and siRNA
Both are made in exactly the same way from the same kind of precursor; under one condition we call them miRNAs and under another we call them siRNAs. •miRNAs (micro RNAs; 21-25 nucleotides long) are processed from much longer precursors called pri-miRNAs; some are derived from excised introns and 5'/3' UTRs of some pre-mRNAs. We call them miRNAs when their pairing with the target has multiple mispairings (multiple bubbles). A mispaired miRNA does not cause degradation or cleavage of the target RNA, but instead causes translational inhibition of the target RNA. Even though there is mispairing, bases 2-7 of the small RNA (the "seed") have to be matched, otherwise nothing works; they are critical for targeting. •siRNAs (short interfering RNAs) are related to miRNAs and produced by the same mechanism. We call the small RNA an siRNA if it finds nearly 100% complementarity with the target RNA, which results in cleavage of the target RNA. Cleavage occurs across from nucleotides 10-11 of the small RNA.
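The miRNA/siRNA distinction above can be expressed as a rule: the seed (guide bases 2-7) must pair, and near-complete pairing leads to cleavage. This is a simplified sketch, not a real target-prediction algorithm: it considers Watson-Crick pairs only (ignoring G-U wobble), assumes the target site is given 5'->3' with the same length as the guide, and the 0.9 cutoff for "nearly 100%" is an arbitrary assumption.

```python
from typing import List

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_positions(guide: str, site: str) -> List[bool]:
    """For each guide nucleotide (5'->3'), does it Watson-Crick pair with the site?

    The site is also written 5'->3', so guide position i faces site position n-1-i.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    n = len(guide)
    return [(guide[i], site[n - 1 - i]) in WATSON_CRICK for i in range(n)]


def classify(guide: str, site: str, cleavage_min_fraction: float = 0.9) -> str:
    paired = paired_positions(guide, site)
    # Seed = guide nucleotides 2-7 (1-based), i.e. indices 1..6.
    if not all(paired[1:7]):
        return "no targeting"
    if sum(paired) / len(paired) >= cleavage_min_fraction:
        return "siRNA-like: target cleavage"
    return "miRNA-like: translational repression"
```

With a toy 12-nt guide `UAGCUUAUCAGA` (real small RNAs are 21-25 nt), its perfect complement `UCUGAUAAGCUA` is classified as siRNA-like, a site with four 3'-end mismatches but an intact seed (`AGACAUAAGCUA`) as miRNA-like, and a single seed mismatch (`UCUGAUAAGCAA`) abolishes targeting.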
Model of spliceosome-mediated splicing of pre-mRNA
Bringing everything together: very much like the PIC, with many components that all have a single mission. •Major components: five snRNPs (U1, U2, U4, U5 and U6 small nuclear ribonucleoprotein particles) containing 5 snRNAs (U1, U2, U4, U5 and U6 small nuclear RNAs, ranging from 107 to 210 nucleotides) and their associated proteins (6-10 per snRNP) assemble on the pre-mRNA to form the spliceosome. •There are a total of ~100 proteins in the spliceosome, some of which are not part of the snRNPs. These non-snRNP proteins may contribute to the specificity of recognition of the splice sites by snRNPs, and some of them have RNA helicase/ATPase activity (they hydrolyze ATP) to help rearrange base pairing in snRNAs and with the pre-mRNA during the splicing cycle.
CTD
C-terminal domain. It is very important for regulating the activity of the polymerase and for much of the processing of the mRNA (capping, splicing, polyadenylation) that happens during transcription, because it acts as a hook: depending on what modifications it carries, it binds enzymes that then act on the mRNA. This region is used to recruit different factors. It is a tail-like extension with many repeats (of the heptapeptide YSPTSPS) that can be phosphorylated.
Polyadenylation of mRNA at the 3' end
CPSF: cleavage and polyadenylation specificity factor. CStF: cleavage stimulatory factor. CFI & CFII: cleavage factors I & II. PAP: poly(A) polymerase. PABPII: poly(A)-binding protein II. - RNA is cleaved 10-35 nt 3' (downstream) of the AAUAAA (A2UA3) signal. - The binding of PAP prior to cleavage ensures that the free 3' end generated is rapidly polyadenylated. - PAP adds the first ~12 A residues to the 3'-OH slowly. - Binding of PABPII to the initial short poly(A) tail greatly accelerates polyadenylation by PAP. - The poly(A) tail stabilizes mRNA and enhances translation and export into the cytoplasm. - The polyadenylation complex is associated with the CTD of Pol II following initiation.
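Locating where cleavage is expected relative to the AAUAAA signal is simple arithmetic on positions. A sketch, assuming the pre-mRNA is written 5'->3' as an RNA string and distances are counted from the last A of the hexamer; real cleavage also depends on other sequence elements, which this ignores, and the function name is hypothetical.

```python
from typing import List, Tuple


def cleavage_windows(pre_mrna: str, signal: str = "AAUAAA",
                     min_dist: int = 10, max_dist: int = 35) -> List[Tuple[int, int]]:
    """Return (first, last) 0-based indices, inclusive, where cleavage is expected.

    One window per occurrence of the signal; windows running past the end of the
    sequence are clipped, and windows that would start past the end are dropped.
    """
    windows = []
    start = pre_mrna.find(signal)
    while start != -1:
        last_signal_nt = start + len(signal) - 1
        lo = last_signal_nt + min_dist
        hi = min(last_signal_nt + max_dist, len(pre_mrna) - 1)
        if lo <= hi:
            windows.append((lo, hi))
        start = pre_mrna.find(signal, start + 1)  # also catches overlapping signals
    return windows
```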
CRISPR array
CRISPR array --> tracrRNA:crRNA --> sgRNA: a 3-component system. You can transcribe the guide RNA with an RNA polymerase, express Cas9 protein in massive amounts, mix them together, and get an active nuclease carrying the RNA you made. You can introduce the nuclease into the cell as an RNP complex, or put it into a virus vector that carries and transcribes this RNA; the virus can also carry the gene encoding Cas9 itself. The virus then makes the protein and the RNA in the cells you want, creating a nuclease in the cell that gives you a DSB.
Where does CRISPR/Cas9 come from?
CRISPR/Cas9 comes from a bacterial immune system. The invading DNA contains a protospacer with a PAM sequence next to it. When a virus enters the cell, enzymes clip the protospacer piece out of the viral DNA and hand it to another set of enzymes that insert it into the CRISPR locus of the bacterial genome. This cut-and-insert event stays as a mark in the chromosome of that viral infection, provided the cell survived the infection. The locus builds up over successive infections (adaptation), and a promoter makes long RNAs that contain all of these repeats and spacers. Nucleases process these RNAs, and the processed RNAs associate with CRISPR-associated (Cas) proteins to form a complex that can then cut the DNA of any invading virus carrying a matching sequence. It is an adaptive immune system.
Cas9 can be massively multiplexed
You can make guide RNAs, synthesizing thousands of them at once and assembling them in complicated ways. Package them into viruses, each carrying a different guide along with the Cas9 enzyme. You can then infect cells in culture, stem cells, or organisms and look for phenotypes in complicated ways. This allows whole-genome screens for mutations that produce a particular phenotype: you select the viruses that give you your phenotype.
What wraps around the DNA target?
Cas9 + guide RNA wrap around the DNA target
How do we know that this base pairing between the U1 and the 5' splice site is important? What is the experimental evidence?
Classic experiment showing that base pairing is important. First, they made a mutation changing a G to an A in the 5' splice site of the pre-mRNA. This creates a bubble because the C in U1 and the A cannot base pair, so the U1 snRNA cannot properly pair with the 5' splice site. If you measure a splicing reaction containing this mutant, you see that splicing is damaged; it is blocked. But how do you know this is due to a defective interaction between the U1 snRNA and the 5' splice site? Maybe the mutation disrupted some other unknown step (e.g., ATP hydrolysis). Second experiment: make a compensatory mutation in the snRNA, changing the C to a U, which allows U1 to base pair with the mutant A again. When you measure splicing with the mutant pre-mRNA plus the compensatory U1, splicing is restored.
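The logic of the mutation/compensatory-mutation experiment can be checked by listing Watson-Crick mismatches between the 5' splice site and a U1 snRNA segment aligned antiparallel. The sequences are illustrative: a consensus-like 5' splice site (AG|GUAAGU) and a U1 segment written to be fully complementary to it; the G-to-A and C-to-U changes mirror the experiment described, but the exact positions are chosen for illustration.

```python
from typing import List

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def mismatches(splice_site: str, u1_segment: str) -> List[int]:
    """Indices in splice_site that fail to pair with u1_segment.

    Both strings are written 5'->3'; they pair antiparallel, so splice_site[i]
    faces u1_segment[n-1-i]. G-U wobble is deliberately not counted as a pair.
    """
    n = len(splice_site)
    return [i for i in range(n)
            if (splice_site[i], u1_segment[n - 1 - i]) not in WATSON_CRICK]


wild_type_ss, wild_type_u1 = "AGGUAAGU", "ACUUACCU"
mutant_ss = "AGGUAAAU"          # G -> A in the splice site
compensatory_u1 = "AUUUACCU"    # C -> U in U1, restores pairing with the A
```

The wild-type pair has no mismatches, the mutant splice site has one (the A facing C), and the compensatory U1 restores full pairing with the mutant site.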
Lattice light sheet illumination and better dyes
Confined illumination: good optical sectioning capability, low photobleaching and sample damage, and improved axial resolution.
Targeting of chromatin remodeling complexes to specific DNA sites
Conundrum: chromatin remodeling complexes are a lot like RNA Pol: they are promiscuous and don't know where they are supposed to go. The problem is that you don't want them to go everywhere, so something has to direct them to the right place. At least one mechanism is that the very same proteins we care about, like sigma factors and transcription factors, tell the complex where to go. The remodeling complex, which is not very specific, teams up with a very specific transcription factor in one of 2 ways: 1. It forms a complex with the transcription factor, and the complex then goes to the right place. 2. It finds a transcription factor that is already bound to DNA and then does its thing.
Eukaryotic DNA is wrapped into a highly compact form: Chromatin
DNA is not naked double-stranded DNA but is wrapped up in a structure called chromatin, a protein-nucleic acid assembly. It consists of histone proteins (positively charged proteins that form octamers) with DNA wrapped around them.
Gene Expression=
DNA-->RNA-->Protein DNA-->RNA (transcription: the "gatekeeper" of gene expression that is controlled by ~10% of our genome.) RNA--> Protein (translation)
Genome editing is all about DNA repair
DSB ---> NHEJ ---> deletion or insertion of nucleotides. Very error-prone repair: gives knockouts. What does X do? Or: DSB ---> HDR ---> uses a homologous donor DNA for precise insertion or modification (homology-directed repair can replace sequence). What does X do? Cure genetic diseases.
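Why NHEJ tends to give knockouts comes down to reading-frame arithmetic: an indel whose length is not a multiple of 3 shifts the reading frame downstream of the break. A minimal sketch (the function name and return labels are hypothetical):

```python
def nhej_outcome(indel_length: int) -> str:
    """Classify an NHEJ repair product in a coding sequence.

    indel_length > 0 for insertions, < 0 for deletions (in nucleotides).
    """
    if indel_length == 0:
        return "no change"
    # Python's % returns a non-negative result, so deletions work too (-3 % 3 == 0).
    if indel_length % 3 == 0:
        return "in-frame indel (codons added or removed)"
    return "frameshift (likely knockout)"
```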
Cas9 in more detail
Different Cas9s ---> different PAMs. The PAM allows the target DNA to interact with the protein and helps start the reaction. No matter what the guide RNA contains, the target DNA has to have the PAM next to it; it is a module that a given class of enzymes requires for its guide. The constant region of the guide RNA is always the same and always in the transcript; the protospacer-matching portion is specific to the target you are interacting with.
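Because Cas9 only cuts where the target carries the right PAM, a first step in designing a guide is scanning a sequence for PAMs. A sketch assuming SpCas9-style targeting (an NGG PAM directly 3' of a 20-nt protospacer, on either strand); a different Cas9 would use a different `pam` pattern, and the function name is hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_pam(site: str, pattern: str) -> bool:
    """'N' in the pattern matches any base."""
    return all(p == "N" or p == s for p, s in zip(pattern, site))


def find_targets(dna: str, pam: str = "NGG",
                 spacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """Return (strand, start, protospacer, pam_seq) for every usable PAM.

    start is the 0-based protospacer position on that strand's 5'->3' string.
    """
    hits = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for i in range(spacer_len, len(seq) - len(pam) + 1):
            site = seq[i:i + len(pam)]
            if matches_pam(site, pam):
                hits.append((strand, i - spacer_len, seq[i - spacer_len:i], site))
    return hits
```

For a quick check with a shortened 4-nt protospacer, `find_targets("ACGTAGG", spacer_len=4)` finds one target on the + strand, and the reverse complement `"CCTACGT"` yields the same target reported on the - strand.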
Chemistry of the lac operon
Lactose, a disaccharide, has to be transported into the cell, which is why galactoside permease is important. It then has to be hydrolyzed (by β-galactosidase) into its monomeric units, glucose and galactose, which can enter normal metabolic pathways.
ChIP-Seq
Genome-wide analysis of TF binding in a cell population. Where are the regulatory sequences and transcription factors that direct transcription? 1. Start with a population of cells and fix them with formaldehyde, which crosslinks proteins to the double-stranded DNA so they can't move. 2. Lyse the cells and sonicate to shear the chromatin. 3. Immunoprecipitate: you are interested in a particular protein, so use a specific antibody that recognizes it. (Alternatively, use CRISPR-Cas9 to genetically modify the protein so it carries a tag; then you only need an antibody that detects the tag, rather than making many different antibodies.) 4. Reverse the crosslinks and amplify the DNA. 5. Analyze (sequence and map) the DNA. Wherever there is a peak, that is where the protein is bound; the more fragments recovered at a site, the higher the peak, and the more places it binds, the more peaks. You can see where a protein binds throughout the genome and what its neighbors are. This gives you a survey of all the potential binding sites of a particular transcription factor.
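The "peak = where the protein is bound" idea can be sketched as fold enrichment of mapped reads per bin over what a uniform spread of the same reads would give. This is a simplification: real peak callers (such as MACS) compare against an input control and use statistical models; the uniform background, bin size and fold cutoff here are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def enriched_bins(read_starts: List[int], genome_length: int, bin_size: int,
                  min_fold: float) -> List[Tuple[int, int, float]]:
    """Return (bin_start, reads, fold_enrichment) for bins at or above min_fold."""
    if not read_starts:
        return []
    n_bins = -(-genome_length // bin_size)  # ceiling division
    expected = len(read_starts) / n_bins    # reads per bin if spread uniformly
    counts = Counter(pos // bin_size for pos in read_starts)
    return [(b * bin_size, counts[b], counts[b] / expected)
            for b in sorted(counts)
            if counts[b] / expected >= min_fold]
```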
What is the ground state for the lac operon?
Ground state: the promoter of the lacI gene is active and makes the protein LacI, which is a repressor. The repressor binds to the operator and keeps the whole system shut down, but ready to change if conditions change.
Transcription of most prokaryotic genes is regulated in units called operons (Jacob and Monod, 1960)
Monod noticed that bacteria grown in different media had different growth curves, with interesting bumps and grooves, and wanted to know why. After many years of work, Jacob and Monod worked out the first paradigm for how gene regulation works at the transcriptional level. They discovered some fundamental principles, some of which apply largely to bacteria and other prokaryotes and some of which apply across all organisms. They discovered the basic architecture of a gene: it has a start, which is a short DNA sequence called a promoter (it promotes transcription). They also found a principle that is mostly true in prokaryotes but not so true in eukaryotes: certain genes have to be expressed coordinately (they are part of a pathway where all the genes are required at the same time, or at least in an ordered fashion). Instead of a single promoter for a single gene, you have a promoter transcribing an operon. What really captured their attention was the operator, because they realized that even in bacteria, not all promoters respond at the same time, to the same extent, or under the same conditions. Promoters have to be turned on and off: a promoter that is not functioning, with no product being made, goes from an off state to an on state in a certain amount of time.
Heat shock response in sigma
High temperature induces the production of σ32, which binds to the core polymerase to form a unique holoenzyme for recognition of the promoters of heat-shock induced genes.
Sequence-specific DNA Affinity Chromatography
How do you get your hands on the protein? You have to purify it. Take the same DNA fragment, make multiple copies of it, and attach them to a resin, so that you have many copies of the recognition site at high concentration. Now if you pass a whole liver extract through this resin, the right proteins should bind, while nonspecific proteins flow through (or bind to nonspecific DNA). By repeating this multiple times, you can end up with a purified test tube containing only one protein (the one you care about).
Super-resolution imaging of TFs in live stem cells
How can we determine the temporal dynamics, order of events and 3D spatial disposition of TFs "working" in the nucleus of living cells? We must discriminate specific from non-specific interactions.
What is the synthetic version of the inducer for the lac operon?
IPTG: a non-metabolizable artificial inducer (it can't be cleaved). The operator only functions under certain circumstances: when there is lactose, you want to turn the genes on. So how do you shut off the repressor? The sugar lactose itself is an inducer that alters the function of the repressor, but in lab experiments we don't want the inducer to be fleeting: you wouldn't know its concentration, because it would be changing all the time. So a non-hydrolyzable version was designed. It has the same part of the molecule that binds to the repressor (its target), but it doesn't get hydrolyzed. It is a synthetic version of the inducer.
Targetable DNases introduce genomic breaks
If you have sequenced a gene and want to know what it does, you want to make a mutation in it or change it in some way. You start by finding a way to make a specific DSB right at that site. The oldest approach relied on zinc fingers. ZFN: protein-based targeting; mature technology. You can make zinc fingers that each recognize a specific 3-nucleotide sequence. If you string them together into a fusion protein that recognizes a sequence in gene X, you can fuse it to a nuclease domain that can cut (or nick) the DNA; on the other side of the cut site you fuse another one that recognizes the opposite side. This is very hard. TALEN: protein-based targeting, more modular. TALE domains can be engineered so that each recognizes a specific base; stringing them together lets you recognize a long sequence. Also difficult. Nature provided us with a system that uses the specificity of RNA base pairing to guide an enzyme that recognizes and cleaves duplex DNA: CRISPR/Cas9. RNA-based targeting: fast, cheap, easy.
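One reason longer recognition sequences matter: in a random genome, an n-bp site is expected to occur about 2 × G / 4^n times (both strands, genome size G). A back-of-the-envelope sketch, assuming uniform, independent base composition and G ≈ 3 × 10^9 bp for a human-sized genome: one arm of three zinc fingers (9 bp) is far from unique, while a pair of arms (18 bp) is expected to be.

```python
def expected_random_sites(site_length: int, genome_size: float = 3e9) -> float:
    """Expected chance occurrences of a specific site_length-bp sequence.

    Assumes every base is equally likely and independent, and counts both strands.
    """
    return 2 * genome_size / 4 ** site_length
```

Here `expected_random_sites(9)` is roughly 23,000, whereas `expected_random_sites(18)` is below 0.1.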
Germ line engineering.
If you want to make a mouse mutant for a disease the traditional way, you grow cells in culture and select, from thousands of different cells, the mutation you are looking for. You sequence it, go back to an embryo, raise the mouse, and then breed it in different ways. It takes a very long time to make a model to find out what your gene might be doing, and this approach is hardly used anymore. Now you can inject an RNP particle directly into a zygote and get a mutant in one generation, which takes a couple of months. A knockout can be made in a few months at little cost.
How do we visualize P-bodies (aka GW-bodies)?
Immunofluorescence staining: Argonaute and GW182 proteins co-localize in the processing bodies or P-bodies in the cytoplasm of human cells. Both proteins are crucial for the RNA silencing process.
How are the 5' splice site and branch point recognized by the U1 and U2 snRNPs?
The recognition is simple: it is base pairing between U1 snRNA and part of the 5' splice site sequence. The 5' end of U1 snRNA base pairs directly with sequences around the 5' splice site; that is the basis for recognition. U2 recognizes the branch point A also by base pairing, between part of the U2 snRNA and the sequences surrounding the branch point A. RNA uses sequence complementarity as the basis for recognition and proper alignment.
Negative regulation of the lac operon
The repressor blocks the expression of the Z, Y, and A genes by interacting with the operator (O). The inducer (lactose or IPTG) can bind to the repressor, inducing a conformational change that prevents its interaction with the operator (O). When this happens, RNA polymerase is free to bind to the promoter (P) and initiate transcription of the lac genes. The lac repressor, produced from the lacI gene by its own promoter, binds very tightly to this short operator sequence and prevents RNA polymerase from starting transcription. When you shift the bacteria from a medium without lactose to one containing lactose, the inducing molecule binds to a small cavity in the repressor and causes a conformational change, so the repressor can no longer bind efficiently to the operator DNA sequence. You have taken away the gate that kept RNA Pol from transcribing.
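The combined logic of negative control (repressor and inducer) and positive control (cAMP-CAP, made only when glucose is low) can be summarized as a small truth table. A sketch that collapses continuous levels into booleans and three output labels, which is a simplification; the function name is hypothetical.

```python
def lac_transcription(lactose_present: bool, glucose_high: bool) -> str:
    """Qualitative lac operon output for a given growth condition."""
    # Negative control: without inducer, LacI sits on the operator.
    repressor_bound = not lactose_present
    if repressor_bound:
        return "off"
    # Positive control: glucose breakdown product inhibits adenylate cyclase,
    # so cAMP-CAP forms (and helps RNAP bind the weak promoter) only at low glucose.
    cap_bound = not glucose_high
    return "high" if cap_bound else "low"
```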
lac repressor contacts two operator sequences simultaneously
The tetrameric lac repressor binds to the primary lac operator (O1) and one of two secondary operators (O2 or O3) simultaneously; the two structures are in equilibrium. •When both O2 and O3 are mutated, repression at the lac promoter is reduced by ~70-fold. Mutation of only O2 or O3 reduces repression 2-fold. There are multiple symmetrical binding sites, which ensures there is enough sequence to keep the operon shut down when it needs to be shut down. •The secondary operators function to increase the local concentration of lac repressor (~10 tetramers per cell) in the micro-vicinity of the primary operator. Locally increasing the concentration of the repressor gives a rapid response when and where it is needed. Part of the reason is that there are few tetramer molecules in the cell, so they have to be efficient at scanning the entire E. coli genome, finding these operator sites, and binding to them. One way to ensure that is to locally increase the effective concentration of the protein.
Hunting for DNA sequences and Proteins that Control Gene Expression
There are proteins that constantly scan the genome to make sure the machinery and proteins get to the right place. These so-called transcription factors turn out to be a large family of proteins, representing up to 10% of the genome's coding capacity. Unlike sigma factors, which bind right next to the start site of transcription, these proteins are freed from that physical constraint: they can bind in many places along the genome and somehow communicate with the machinery at the promoter (the start site of transcription). Enhancers are distal elements that can be far away. You are assembling a whole complex, a machinery with many parts (more than 85 components). First, you need to find the DNA control elements (promoters and enhancers) in the genome that are recognized by TFs and control transcription. Transcription factors recruit and instruct RNA Pol to initiate RNA synthesis at specific genes by recognizing and binding to DNA elements called promoters.
How do you remove histones and nucleosomes from a highly condensed region?
There is a whole protein machinery called chromatin remodeling factors: multi-subunit complexes that actually move nucleosomes around (they often don't remove them completely; they just shift them). This makes the DNA accessible, so RNA Pol, various transcription factors, and so on can bind.
miRNA/siRNA processing
These are made in 2 trimming steps. Part 1 occurs inside the nucleus. In this example we are looking at one particular gene that encodes a long precursor RNA (pri-miR-1-1) for a miRNA. One important feature of these precursors is that they contain quite extensive internal complementary regions, so they form long stem-loop structures. Some of those precursors can be excised introns or 5'/3' UTRs. The first step requires 2 proteins. Drosha: a nuclear double-stranded (ds) RNA-specific endoribonuclease (it cuts within a stretch of RNA) that processes pri-miRNAs in the nucleus. It recognizes the double-stranded RNA and makes a cut; the product is now called pre-miR-1-1, an intermediate precursor. DGCR8 (Pasha in Drosophila): a nuclear dsRNA-binding protein and partner of Drosha. Together they produce the intermediate pre-miRNAs: ~70-nt-long products processed from pri-miRNAs, ready to be shipped into the cytoplasm by Exportin5, a nuclear transport protein that exports pre-miRNAs to the cytoplasm through the nuclear pore. Part 2 occurs in the cytoplasm, beginning with the 70-nt intermediate, which is further trimmed by another pair of proteins. Dicer: a ribonuclease (RNase) III enzyme that processes the double-stranded intermediate into 21-25-bp miRNAs/siRNAs, leaving 2-nt overhangs at the 3' ends. The overhang is the signature of Dicer. TRBP: needed for dsRNA cleavage by Dicer and subsequent passage to the RISC complex. RISC: the minimal RNA-induced silencing complex consists of the Argonaute protein (the essential component) and a selected mature single-stranded (ss) mi/siRNA. The complete complex may also contain other proteins such as GW182, Dicer and TRBP. The complex takes 1 of the 2 strands of the finished duplex and loads it. This complex is then ready to target your mRNA, either inhibiting translation or causing cleavage of the target RNA. RISC cannot strictly distinguish between the 2 strands, so either one can be loaded.
It can pick both, but only 1 of them will be functional because only 1 strand will actually find the target mRNA.
How is information being read in a regulated fashion?
Think of chromatin not only as a packing apparatus but also as an unpacking apparatus. It turns out there are chemical signatures for parts of the chromosome that are actively transcribed (genes switched on) versus parts that are silenced (heterochromatic regions, highly condensed). If you had a machine that scanned these chemical signals, it would instantly tell you: this part of the chromosome is dark, don't bother; go to this side, where all the activity is happening. These signatures are chemical modifications (covalent changes) that can occur both on the DNA itself and on the proteins the DNA is wrapped around, the so-called nucleosomes made of histone proteins. There are many different modifications that act as signatures. Gene "switched on": an actively transcribed region (part of the chromosome carrying genes that have to be transcribed in a particular cell; it has to have room for RNA polymerase to sit and do its thing). 1. The DNA is not methylated at C (cytosine); methylated C is a mark of silent regions of the chromosome. 2. More importantly, the histone octamers (8 subunits) have regions that are unstructured and flop around. Lys or Arg residues in these regions can be modified, by either acetylation or methylation, and specific enzymes carry out these modifications. i. HAT: histone acetyltransferase, which transfers acetyl groups to certain residues of the histone proteins. Acetylation marks active genes. 3. The structure has to loosen up, going from a highly compacted, shut-down chromosome to a more active, loose state. Not shown: in a really compacted state (heterochromatic regions), you have not only histone octamers but also histone H1 or H5, specialized linker histones that further compact the structure. Off state (heterochromatic region): DNA methylation, which makes DNA less accessible.
Plus, all the histones are tightly compacted. A set of enzymes converts the loose state into the compacted state, including histone deacetylases (HDACs), which remove the acetyl groups that made access easier, making access harder. There are also histone methyltransferases (HMTs), enzymes that methylate various lysine residues in different histones; these marks tend to be signatures for shutting things down.
Alpha-amanitin
RNA Pol II is extremely sensitive to alpha-amanitin, a toxin produced by Amanita mushrooms. Alpha-amanitin binds very close to the bridge helix near the active site of the polymerase and inhibits elongation of the mRNA. In the structure, the clamp and jaw region is where the DNA duplex enters the polymerase, into what is called the cleft; there the template strand is copied into mRNA. The catalytic Mg is involved in the reaction, and there is an exit channel for the mRNA.
What specialized proteins control gene expression?
Transcription factors. These proteins were found to scan DNA, recognize short stretches of sequence, bind to them, and then collect other proteins to start transcription (like sigma, but more complicated: there are thousands of them). They are not constrained to binding at -10 and -35 but can bind all over the place. The TATA-binding protein recognizes the TATAA sequence; it is the closest thing in eukaryotes to sigma, because it forms a complex with RNA Pol (the pre-initiation complex). Other proteins can bind quite a distance away. Sequence-specific promoter- and enhancer-binding proteins: transcription factors (TFs).
Iron-dependent regulation of TfR mRNA degradation
Transferrin receptor (TfR) is needed for the import of iron into the cell. It does so by internalizing the iron-transferrin complex through receptor-mediated endocytosis. TfR expression is regulated in response to intracellular iron by the IRE-BP protein. The iron-bound IRE-BP is unable to bind IREs (iron-responsive elements), while the iron-free IRE-BP can bind to IREs and prevent the recognition of the AREs by TTP/BRF, protecting TfR mRNA from degradation when iron is low.
Translation of part of the leader mRNA to produce the leader peptide
Trp operon. This regulation depends on the level of charged tRNA^Trp. In the mRNA for the trp operon, before the start of the first polypeptide for Trp synthesis (TrpE), there is another ORF that encodes the leader peptide. It contains 2 tandem Trp codons, and this redundancy is important: deleting one of them disrupts the whole mechanism. Sequences in the leader form hairpin structures while RNA Pol is still transcribing, and the RNA can fold in alternative ways that adopt either a terminator or an antiterminator conformation. The RNA itself thus acts like an allosteric effector of transcription. Which structure forms depends on how the ribosome moves through the leader, so the outcome is sensitive to the rate of translation of this RNA. The presence of tandem Trp codons within the leader peptide is highly significant!
The trp operon: Two kinds of negative regulation
Tryptophan + trp repressor dimer --> tryptophan-repressor complex activated for DNA binding --> binds operator; blocks RNAP binding and represses transcription. Tryptophan is a co-repressor. The trp operon controls the synthesis of the enzymes required to make Trp. When Trp is low, the cell needs to express these enzymes to make more. When Trp is high, the Trp-bound repressor binds the operator and shuts transcription off, though only enough to leave a low basal level.
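The two negative controls described in the last two cards (the Trp-activated repressor acting on initiation, and leader-peptide translation acting on read-through) can be sketched as a small truth table. This is a qualitative model only, assuming each signal is simply "high" or "low"; in the cell the two inputs are correlated and the outputs are graded, and the function name is hypothetical.

```python
def trp_operon(trp_high: bool, charged_trna_trp_high: bool) -> dict:
    """Qualitative sketch of the two negative controls on the trp operon."""
    # Layer 1: Trp is a co-repressor. Trp-bound repressor binds the operator,
    # leaving only a low basal level of initiation.
    repressor_bound = trp_high
    # Layer 2: leader translation. Plenty of charged tRNA-Trp lets the ribosome
    # pass the tandem Trp codons quickly, so the leader folds into a terminator.
    # Scarce charged tRNA-Trp stalls it there and the antiterminator forms instead.
    leader_terminates = charged_trna_trp_high
    return {
        "initiation": "low basal" if repressor_bound else "high",
        "read_through": not leader_terminates,
        "full_operon_expressed": not repressor_bound and not leader_terminates,
    }


for trp in (False, True):
    print(trp, trp_operon(trp_high=trp, charged_trna_trp_high=trp))
```

Only the "Trp low, charged tRNA-Trp low" combination gives full expression; each layer alone can cut output.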
If you want to look at a population of only mature mRNA?
Select for the poly(A) tail, e.g., with oligo(dT) (only mRNA is supposed to have one).
The PhoR/PhoB two-component regulatory system in E. coli
In all these cases, the goal is to couple sensing mechanisms (receptors on the cell surface or in the periplasm that can tell what is happening in the environment and how it is changing over time) to changes in gene expression. Usually these sensors are not themselves the molecules that act as a repressor or activator. The signal has to go through a signaling cascade (chemical changes, some covalent and some binding events) that ultimately turns a gene off or on. In response to low phosphate concentrations in the environment and periplasmic space, a phosphate ion dissociates from the periplasmic domain of the sensor protein PhoR. This causes a conformational change that activates a protein kinase transmitter domain in the cytosolic region of PhoR. The activated transmitter domain transfers an ATP γ-phosphate to a histidine in the transmitter domain. This phosphate is then transferred to an aspartic acid in the response regulator PhoB. Phosphorylated PhoB then activates transcription from genes encoding proteins that help the cell respond to low phosphate, including phoA, phoS, phoE, and ugpB.
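The PhoR/PhoB phosphorelay above (phosphate release, His phosphorylation, transfer to Asp, gene activation) can be sketched as a chain of state changes. This is a minimal illustration, assuming on/off states; the class and function names are hypothetical and model only the steps named in the card, not kinetics or dimerization.

```python
from dataclasses import dataclass

PHO_GENES = ("phoA", "phoS", "phoE", "ugpB")  # genes named in the card


@dataclass
class PhoR:
    phosphate_bound: bool = True      # periplasmic domain holds Pi
    his_phosphorylated: bool = False  # phospho-His in the transmitter domain

    def sense(self, periplasmic_pi_low: bool) -> None:
        # Low Pi: phosphate dissociates, activating the cytosolic kinase domain.
        self.phosphate_bound = not periplasmic_pi_low

    def phosphorylate_his(self) -> None:
        # Active transmitter domain moves ATP's gamma-phosphate onto a His.
        if not self.phosphate_bound:
            self.his_phosphorylated = True


@dataclass
class PhoB:
    asp_phosphorylated: bool = False


def phosphotransfer(sensor: PhoR, regulator: PhoB) -> None:
    # His~P on PhoR is handed to an Asp on the response regulator PhoB.
    if sensor.his_phosphorylated:
        sensor.his_phosphorylated = False
        regulator.asp_phosphorylated = True


def respond(periplasmic_pi_low: bool) -> tuple:
    sensor, regulator = PhoR(), PhoB()
    sensor.sense(periplasmic_pi_low)
    sensor.phosphorylate_his()
    phosphotransfer(sensor, regulator)
    # Only phosphorylated PhoB activates the low-phosphate genes.
    return PHO_GENES if regulator.asp_phosphorylated else ()


print(respond(True))   # low phosphate: genes activated
print(respond(False))  # phosphate present: nothing activated
```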
Models for Transcriptional Regulation by Protein Acetylation
Going from a deacetylated state to an acetylated state changes the chemical composition of the nucleosome, so it can now be recognized by other proteins that may be involved in moving nucleosomes around or loosening the DNA. The chromatin goes from a repressed state to an active or competent state. An acetylated region doesn't necessarily mean the genes are on; they are ready to be turned on. Acetylation can also be a chemical signal for binding certain factors: for example, a positive activating factor might bind only to acetylated regions. So acetylation can create or eliminate a recognition site for the binding of another factor. It turns out that other proteins can be acetylated too, which affects their activity.
Self-splicing Groups 1 and 2
Without any participation of proteins whatsoever, pre-rRNA from the ciliate Tetrahymena can carry out self-splicing, removing the intron from the pre-rRNA. This type is called a Group 1 self-splicing intron. The chemistry is the same as in pre-mRNA splicing (2 transesterification steps), except that the first nucleophilic attack is made not by the branch-point A but by the 3'-OH of an external guanosine cofactor (a free G that is not covalently linked to, and not part of, the RNA). The intron is released in linear form, not as a lariat, because the 2'-OH is not used and no branch forms. Group 2 self-splicing introns are also found in pre-mRNAs, but in mitochondria or chloroplasts. Their mechanism is the same as the pre-mRNA splicing discussed earlier: they use a branch-point A.
Operon
a coordinated unit of gene expression consisting of several related genes and the common operator and promoter sequences that regulate their transcription. The mRNAs thus produced are "polycistronic": multiple genes on a single transcript.
DNase I footprinting
a technique for identifying protein-binding sites in DNA. 1. A DNA fragment is labeled at one end with 32P. 2. DNA samples are digested with DNase I in the presence and absence of a protein that binds to a specific sequence in the fragment. 3. A low concentration of DNase I is used so that, on average, each DNA molecule is cleaved just once (vertical arrows). 4. The DNA samples are then separated from protein, denatured to separate the strands, and electrophoresed. The resulting gel is analyzed by autoradiography, which detects only labeled strands and reveals fragments extending from the labeled end to the DNase I cleavage site. Where the bound protein protects the DNA from cleavage, the corresponding bands are missing, leaving a gap (the "footprint") that marks the binding site.
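The logic of steps 1-4 can be simulated: each molecule is cut once at a random accessible position, the labeled fragment runs from the labeled end to the cut, and the footprint is the set of band sizes present without protein but missing with it. This is a sketch assuming uniform DNase I cutting and a protein that fully shields an inclusive interval; real DNase I has sequence preferences and protection is partial at the edges. The function names are hypothetical.

```python
import random


def end_labeled_fragments(length, protected=None, n_molecules=5000, seed=1):
    """Sizes of labeled fragments after single-hit DNase I digestion.

    A cut after position p (1..length-1) leaves a labeled fragment of length p
    running from the labeled end. Positions inside `protected` (inclusive
    start, end) are shielded by the bound protein and cannot be cut.
    """
    rng = random.Random(seed)
    accessible = [p for p in range(1, length)
                  if not (protected and protected[0] <= p <= protected[1])]
    # Each molecule is cleaved once at a random accessible site.
    return {rng.choice(accessible) for _ in range(n_molecules)}


def footprint(length, protected):
    """Band sizes seen in the no-protein lane but missing with protein."""
    naked = end_labeled_fragments(length)
    bound = end_labeled_fragments(length, protected)
    return sorted(naked - bound)


print(footprint(60, (20, 30)))  # the gap in the ladder marks the binding site
```

With many molecules every accessible size shows up as a band, so the only difference between the two lanes is the protected interval.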
Typical E. coli promoters are recognized by
an RNA polymerase holoenzyme containing sigma70
What do transcription factors do?
bind specific DNA sequences, organize chromatin into looped domains, and recruit the molecular machinery that reads and transcribes DNA into RNA. They control gene expression. Only ~3% of mammalian DNA is coding sequence; much of the rest consists of regulatory sequences.
Epigenetics
refers to all the changes on the chromosome (in chromatin, not in the DNA sequence itself) that come and go during your lifetime. These changes respond to the environment, including metabolic changes. They are not permanent and are not necessarily passed down from generation to generation. They change constantly, yet they strongly affect the ability of DNA to program information and access to that information.
Core promoter elements in metazoans
There are initiator elements (Inr), literally where the first nucleotide is laid down. TATA-box binding protein (TBP) is the closest thing we have to an equivalent of sigma. There are also many other elements: an upstream element recognized by TFIIB (the TFIIB recognition element), which is degenerate, and downstream core elements recognized by a very complicated set of proteins that work together with TBP, called TBP-associated factors (TAFs). Whereas in bacteria a few proteins do all the work, in humans and other eukaryotes there is a great expansion, or elaboration, of the machinery. Much of this has to do with the machinery's ability to discriminate the right place to transcribe at the right time. Core promoter: sequences directly flanking the +1 start site, where the general transcription machinery assembles. Core promoter sequences are so degenerate that they give very weak signals for where the polymerase should start. Given a piece of DNA containing these sequences, the polymerase has some probability of starting accurately, but it will also start at other places. That creates a problem: something else is needed, like a police force or director, to make sure the cell isn't making a lot of mistakes by transcribing regions it shouldn't. The TATA box and Inr are frequently found in highly transcribed/regulated genes. However, they are not always found in the same gene promoter.
E. coli RNA polymerase holoenzyme bound to DNA: subunits and roles
α (alpha): binds regulatory sequences/proteins; β (beta): forms phosphodiester bonds; β' (beta'): binds DNA template; σ (sigma): promoter recognition; ω (omega): RNAP assembly
Interactions of various sigma factors of E. coli with the same core polymerase to form holoenzymes with different promoter-binding specificity
σ70: most genes; σ32: genes induced by heat shock; σ28: genes for motility and chemotaxis; σ38: genes for stationary phase and stress response; σ54: genes for nitrogen metabolism and other functions •Sigma factors have specificity for different -35 and -10 sequences •They facilitate polymerase engagement through 1D search •They cause the transition between closed and open conformations (strand opening) •Bacterial transcription activators often interact with sigma (e.g., CAP)
How is the spliceosome assembled, and how does it carry out the 2 steps of the chemical reaction?
• Begin with the pre-mRNA (which has a 5' splice site, a 3' splice site, and the branch-point A). U1 snRNP recognizes the 5' splice site and U2 recognizes the branch point. This lays the foundation for attracting the preformed U4/U6/U5 tri-snRNP, which joins the 2 snRNPs already bound. This complex cannot yet carry out splicing, because U4 masks the catalytic activity of U6 in the tri-snRNP prior to the actual transesterification reactions. •Massive rearrangements of base-pairing interactions among the snRNAs convert the inactive spliceosome into a catalytically active form: U1 and then U4 are released, and U2 and U6 come together to form the catalytic center (with U5). The spliceosome can now carry out the 1st transesterification reaction and then the second, which releases the intron.
Organization of Genes in the Eukaryotic Genome
• The human genome has 3 billion base pairs • Only ~3% of the DNA is protein-coding sequence; 97% is noncoding. There is a lot of conservation not only in the coding sequence but also in quite a few scattered, interspersed noncoding regions that are highly conserved. This tells you that over evolutionary time, these regions have been important. They are called "dark matter" because we don't understand what is going on in them. But much of it encodes short stretches of regulatory sequence (stop signs, start signs, etc.). The trouble is that these sites are computationally very hard to find, because they are so short and don't always look the same; it is hard to read the sequence and know what you are reading. • An important component of the remaining DNA is regulatory sequences (promoters). In bacteria, all the cis-regulatory control regions (so-called "molecular signposts") are relatively close to the transcription start site, where RNA polymerase binds; eukaryotes are different. • Some distal elements (enhancers) can reside 10-100 kb away from the gene they regulate, and in eukaryotes some sequences very far away (up to around a million base pairs) can influence transcription at a long distance. Enhancers can be upstream or downstream.
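Why short, degenerate sites are hard to find computationally can be made concrete: count how often a short motif would match by chance. This sketch assumes TATAWAW (W = A or T), one commonly cited form of the TATA-box consensus, and equal base frequencies; real genomes are not uniform (the human genome is AT-rich), so the number is only a rough order of magnitude. The helper names are hypothetical.

```python
import re

# IUPAC codes used here; W = A/T, S = C/G, R = A/G, Y = C/T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "S": "CG", "R": "AG", "Y": "CT", "N": "ACGT"}


def motif_pattern(motif):
    body = "".join(f"[{IUPAC[c]}]" for c in motif.upper())
    return re.compile(f"(?=({body}))")  # lookahead so overlapping sites count


def find_sites(seq, motif):
    """0-based start positions of motif matches on the given strand only."""
    return [m.start() for m in motif_pattern(motif).finditer(seq.upper())]


def match_probability(motif):
    """Chance that a random position matches, assuming equal base frequencies."""
    p = 1.0
    for c in motif.upper():
        p *= len(IUPAC[c]) / 4
    return p


def expected_chance_hits(motif, genome_bp, both_strands=True):
    return match_probability(motif) * genome_bp * (2 if both_strands else 1)


print(find_sites("GGTATAAAAGG", "TATAWAW"))
print(expected_chance_hits("TATAWAW", 3_000_000_000))
```

A 7-nt degenerate motif matches about once every 4,096 positions, so a 3-billion-bp genome scanned on both strands yields roughly 1.5 million chance matches, far more than the number of real promoters.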
Capping of the 5' end of nascent RNA transcripts with m7Gppp
• The methyl groups are derived from S-adenosylmethionine. •The cap is added after the nascent RNA molecules produced by RNA polymerase II reach a length of 25-30 nucleotides. • Guanylyltransferase is recruited and activated through binding to the Ser 5-phosphorylated Pol II CTD. • Capping helps stabilize mRNA and enhances translation, splicing and export into the cytoplasm.
Regulation of splicing occurs in the form of alternative splicing
• This is based on the fact that many genes contain multiple introns •The presence of multiple introns in many eukaryotic genes permits expression of multiple, related proteins from a single gene by means of alternative splicing, a key mechanism for producing different forms of proteins, called isoforms, in different types of cells and/or under different conditions. •This is important because over 80% of all human genes are alternatively spliced, which expands the coding capacity of our genome. Abnormal variations in splicing cause many genetic disorders and cancers.
TBP (TATA-binding protein)
•Conserved C-terminal domain •Dyad symmetry •Binds multiple transcription proteins •Binds in the minor groove and significantly bends DNA. TBP binds the DNA, recognizes the nucleotides that form the TATA box, and in the process bends the DNA very dramatically. Duplex DNA does not like to be bent; bending takes energy. As TBP binds the minor groove, residues in the concave part of its saddle contact the bases and recognize the sequence. It also helps that TA sequences are more bendable than other sequences, which is part of the recognition process.
Concept of alternative splicing: Cell type-specific splicing of fibronectin pre-mRNA in fibroblasts and hepatocytes
•Fibroblasts produce fibronectin with exons EIIIA and EIIIB, which allow the protein to adhere to proteins in the fibroblast plasma membrane and enable fibroblasts to stick to the extracellular matrix. •Hepatocytes produce fibronectin without EIIIA and EIIIB through a process called exon skipping. The resulting protein circulates in the serum and is important during the formation of blood clots.
siRNAs and miRNAs cause degradation and block translation of specific mRNAs, respectively
•If the short ssRNA in the RISC complex base-pairs extensively with the target mRNA, the Argonaute protein (a homolog of RNase H, which degrades the RNA of an RNA-DNA hybrid) cuts the phosphodiester bond of the target mRNA across from nucleotides 10 and 11 of the siRNA. •This process, termed RNA interference (RNAi), is an ancient cellular defense against certain viruses, especially RNA viruses that produce dsRNA intermediates. •Imperfect base pairing (with mismatches) between a miRNA and its target mRNA leads to translational inhibition by the miRNA and associated proteins. Efficient inhibition often requires the binding of 2 or more miRNAs to distinct complementary regions in the 3' UTR of the target mRNA; only then is translation blocked. This allows combinatorial regulation of target mRNA translation by separately regulating transcription of 2 or more different pri-miRNAs. •Many human genes (~60%) are regulated by roughly 1000 miRNAs, many of which are expressed only in specific cell types or at particular developmental stages. This regulation is key for many biological processes.
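The contrast in this card (extensive pairing leads to cleavage opposite guide nucleotides 10 and 11; imperfect pairing leads to translational repression) can be sketched by checking pairing between a guide and a target site. The decision rule here is a hypothetical simplification: "at most one mismatch, with guide positions 10 and 11 paired" counts as cleavage-competent; real target recognition depends on more than this. The guide sequence in the usage example is made up.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def perfect_target(guide: str) -> str:
    """Fully complementary target site (5'->3') for a guide RNA (5'->3')."""
    return "".join(PAIR[b] for b in reversed(guide))


def pairing(guide: str, site: str) -> list:
    """Pairing status per guide nucleotide (index 0 = guide 5' end)."""
    if len(guide) != len(site):
        raise ValueError("guide and site must be the same length")
    n = len(guide)
    # Antiparallel duplex: guide nt i pairs with site nt n-1-i.
    return [PAIR[g] == site[n - 1 - i] for i, g in enumerate(guide)]


def predicted_outcome(guide: str, site: str, max_mismatches: int = 1) -> str:
    paired = pairing(guide, site)
    central_paired = paired[9] and paired[10]  # guide nucleotides 10 and 11
    if paired.count(False) <= max_mismatches and central_paired:
        return "cleavage"
    return "translational repression"


def cleavage_products(guide: str, site: str) -> tuple:
    # The cut falls between the site nucleotides opposite guide nts 11 and 10.
    cut = len(guide) - 10
    return site[:cut], site[cut:]


g = "AUCGGAUCCAGUACGUUAGCA"  # hypothetical 21-nt guide
t = perfect_target(g)
print(predicted_outcome(g, t), cleavage_products(g, t))
```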
AU-rich element (ARE)-mediated mRNA decay (AMD)
•Most mRNAs with a very short half-life contain AREs such as AUUUA within their 3' UTR. •Rapid degradation of these mRNAs requires the RNA-binding protein TTP/BRF. Upon binding to an ARE, TTP/BRF binds and activates a deadenylase complex, causing deadenylation of the mRNA. TTP/BRF also interacts with the exosome, responsible for 3'-to-5' decay, and with the Dcp1/2 decapping complex and the Xrn1 exonuclease, responsible for 5'-to-3' decay. Interaction with the RISC complex may also contribute to AMD. •Impaired AMD contributes to cancer initiation and progression and is linked to inflammatory diseases such as Crohn's disease (an inflammatory bowel disease) and inflammatory arthritis.
Detection of alternative splicing by Northern blotting
•Northern blotting can be used to detect specific RNAs in complex mixtures. •Southern blotting detects specific DNA fragments in a complex mixture of genomic DNA. •Western blotting (immunoblotting) detects specific proteins with antibodies in a complex mixture of proteins. •Blotting techniques let you detect a specific target within a complicated mixture. Suppose you have a mixture of RNA. Northern blotting: 1st step: run gel electrophoresis to separate the RNA molecules by size; smaller molecules run faster, bigger ones slower. 2nd: make a replica of the gel by transferring the RNA molecules onto the surface of a nitrocellulose membrane, so that the RNA is exposed on the membrane surface. This is done in a "high-tech" transfer tank: a tray of liquid solution with a flat support. Put the gel on the support and put the nitrocellulose membrane on top of the gel, making sure not to trap air bubbles between the two. Put a stack of filter papers on top, place a brick on top, and wait. The RNA is transferred onto the surface of the nitrocellulose membrane. 3rd: make a labeled probe complementary to the target and use it to detect the RNA you want to see.
Nonsense-mediated decay (NMD) (don't memorize names for proteins, just know concept)
•P-bodies are sites where aberrant mRNAs harboring premature termination (nonsense) codons (PTC) are degraded by an RNA surveillance pathway called NMD. •The presence of PTC prevents the displacement of an exon-junction complex (EJC), of which UPF2 and UPF3 are key subunits, from the mRNA by the 1st, pioneer ribosome. •Recruitment of PTC-bearing mRNAs to P-bodies requires UPF1, an ATPase/RNA helicase. Entry into the P-body is not sufficient for mRNA decay; also required is ATP hydrolysis by UPF1 and ill-defined steps mediated by UPF2 and UPF3. •Some normal mRNAs are also present in P-bodies, where they may be stored as translationally repressed mRNAs. Unlike the PTC mRNAs, normal mRNAs can exit P-bodies to be translated, a process that requires UPF1-mediated ATP hydrolysis.
Why do we need those 5 snRNA molecules?
•RNA molecules play key roles in directing the alignment of splice sites (e.g. U1 and U2 base pairing with the pre-mRNA) and in carrying out the catalysis (a U2/U6 catalytic center).
Transcription initiation by RNAP holoenzyme
•Rather than generating a fork, RNAP makes a "bubble" of approx 31 nt •Induces (+) supercoils downstream and (-) supercoils upstream; topoisomerases are needed to relieve them. •No primer requirement •Highly processive •More error-prone than replication polymerases 1. The holoenzyme slides along the DNA, scanning for a promoter. 2. It finds a promoter and forms a closed complex on it. 3. The strands separate to form the open complex and initiation begins: rNTPs go in, PPi is released, and sigma separates from the core once phosphodiester bonds have formed.
mRNA lifespan affects pattern of protein synthesis
•The concentration of an mRNA is a function of both its rate of synthesis and its rate of degradation. • Most bacterial mRNAs are unstable, decaying rapidly with a typical half-life of only a few minutes. • Most mRNAs of higher eukaryotes have half-lives of many hours, and thus synthesis of the encoded proteins can persist long after transcription of the genes is terminated. • However, mRNAs encoding certain signaling molecules (e.g. cytokines) and early response regulators (e.g. MYC, FOS and JUN) in higher eukaryotes have relatively short half-lives (30 min or less).
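The first bullet can be written as a simple kinetic model: dM/dt = k_syn - k_deg·M, with first-order decay k_deg = ln 2 / t_half. The sketch below assumes constant synthesis and simple exponential decay (real decay pathways such as deadenylation-dependent decay are more complex); the half-lives in the loop are illustrative values matching the ranges in the card.

```python
import math


def decay_constant(half_life: float) -> float:
    """First-order degradation rate constant: k_deg = ln 2 / t_half."""
    return math.log(2) / half_life


def fraction_remaining(t: float, half_life: float) -> float:
    """Fraction of an mRNA left t time units after its synthesis stops."""
    return math.exp(-decay_constant(half_life) * t)


def steady_state_level(k_syn: float, half_life: float) -> float:
    """Setting dM/dt = k_syn - k_deg * M to zero gives M_ss = k_syn / k_deg."""
    return k_syn / decay_constant(half_life)


# Illustrative half-lives in minutes.
for label, t_half in [("bacterial mRNA", 3), ("cytokine mRNA", 30),
                      ("long-lived eukaryotic mRNA", 600)]:
    print(f"{label}: {fraction_remaining(60, t_half):.3g} left after 60 min")
```

One hour after transcription stops, a 3-minute mRNA is down to about one millionth, a 30-minute mRNA to one quarter, and a 10-hour mRNA still retains about 93%. At a fixed synthesis rate, the steady-state level scales directly with half-life.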
Activation of σ54-containing RNA polymerase at glnA promoter by NtrC
This is a situation where distance becomes a problem: unlike the lactose case, the promoter region controlling glutamine synthesis has an activator-binding sequence about 110-140 bases upstream. It is not actually called an activator sequence but an enhancer, a term devised to describe such sequences in humans and other vertebrates. •The glnA gene encodes glutamine synthetase, which synthesizes glutamine from glutamic acid and ammonia. • The σ54-containing RNA polymerase binds to the glnA promoter, forming a closed complex, before being activated. • In response to low levels of glutamine, a protein kinase called NtrB becomes active and phosphorylates (using ATP) dimeric NtrC, which then binds to two sequence elements (called enhancers) located at -108 and -140 relative to the start site. • The bound phosphorylated NtrC dimers interact with the bound σ54-polymerase, causing the intervening DNA to form a loop (the loop lets the distant NtrC reach σ54-polymerase). • Phosphorylated (activated) NtrC binds the distal sites and at the same time has surfaces that contact RNA polymerase in a way that allows the weak holoenzyme to start transcription. It does so in 2 ways: a conformational change, and ATP hydrolysis by NtrC, which helps push the polymerase forward by putting energy in. The ATPase activity of NtrC then stimulates the σ54-polymerase to unwind the DNA strands at the start site, forming an open complex. Transcription of the glnA gene can then begin.
Sp1 Binds to DNA via Three Zinc-Finger Domains
The structure has been solved: Sp1 binds DNA in an interesting way, through what are called zinc fingers.
Biochemical purification and molecular cloning of Human Transcription Factor Sp1, a Potent Activator
Sp1 recognizes specific GC-box DNA elements via Zn fingers, a classic DNA-binding motif. Sp1 was the first human sequence-specific transcription factor to be purified and cloned. It binds sequences that are very GC-rich.
Why should you care about genes, genomics and gene regulation?
Stem cell biology and regenerative medicine rely on this process: understanding how to manipulate gene expression and transcription factors to drive a cell into one differentiated state versus another, or to maintain its pluripotent stem cell state (a cell that can renew itself and give rise to every cell type in the body).
Transcription Factors Govern Cell-type Identity, Function, Differentiation and Development
Stem cells are able to self-renew and differentiate into other types of cells. Every cell in an organism carries the same set of genes, but only 10-30% are expressed in a given cell type. Specific combinations of TFs switch on select sets of genes to generate a huge diversity of cell types. Each cell type has a unique gene expression program that controls its function and identity.
Splicing reaction proceeds in 2 steps
Step 1. Cleavage at the 5' splice site and joining of the 5' end of the intron to the branch-point A within the intron, producing a lariat-like intermediate. Step 2. Cleavage at the 3' splice site and simultaneous ligation of the exons, excising the intron as a lariat-like structure. Both steps are transesterification reactions (not hydrolysis followed by ligation): (1) The 2'-OH of the branch-point adenosine attacks the 5' phosphate of the intron, cleaving a 3',5'-phosphodiester bond and forming a 2',5'-phosphodiester bond. (2) The newly freed 3'-OH of exon 1 makes an in-line attack on the 5' phosphate of exon 2, cleaving a phosphodiester bond and forming a new one that joins the exons. Because each reaction breaks one phosphodiester bond and makes one, the number of phosphodiester bonds is unchanged. The chemistry itself doesn't require energy, but ATP is used to bring the 5' splice site, 3' splice site and branch point together in their proper positions.
Order of events for RNA Pol II
We still don't fully understand all the parts involved or the order of events; the order of assembly could depend on the gene or cell type. The enzyme for mRNA synthesis is RNA Pol II, and it has to be recruited to the transcription start site at the core promoter. There are 3 kinds of elements: the core promoter (sequences directly flanking the +1 start site, where the general transcription machinery assembles), the proximal promoter (sequences close to +1 that regulate expression; position dependent), and the distal enhancer (sequences that can regulate a gene and are position independent). The system can be reconstituted in vitro. The great power of tearing the cell apart, isolating all the components and testing them is that you can change the order in which you add them back to the reaction and ask which orders work and which don't. Stepwise assembly of the pre-initiation complex showed that certain proteins have to get there first. One of the earliest is TATA-binding protein (TBP), which brings along a whole set of partners, the TBP-associated factors (TAFs). Next comes an important protein, TFIIB, which interacts intimately with RNA Pol II. RNA Pol II is then brought in together with TFIIF, followed by TFIIE and TFIIH, forming the closed complex.
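The add-back experiment described above (try different orders, see which ones work) can be mimicked by checking every ordering against a table of prerequisites. The dependency table is an assumption that encodes the canonical stepwise model (TFIID first, TFIIA and TFIIB after it, Pol II with TFIIF after TFIIB, then TFIIE, then TFIIH); as the card notes, the real order may vary by gene or cell type.

```python
from itertools import permutations

# Hypothetical prerequisite table following the canonical stepwise model.
REQUIRES = {
    "TFIID": set(),
    "TFIIA": {"TFIID"},
    "TFIIB": {"TFIID"},
    "PolII-TFIIF": {"TFIIB"},
    "TFIIE": {"PolII-TFIIF"},
    "TFIIH": {"TFIIE"},
}


def order_works(order) -> bool:
    """True if every factor's prerequisites are already on the promoter."""
    present = set()
    for factor in order:
        if not REQUIRES[factor] <= present:
            return False
        present.add(factor)
    return True


def working_orders():
    return [order for order in permutations(REQUIRES) if order_works(order)]


orders = working_orders()
print(len(orders), "of 720 possible addition orders assemble a PIC")
```

Under these rules only 5 orders work: the chain TFIID, TFIIB, Pol II-TFIIF, TFIIE, TFIIH is fixed, and TFIIA can join at any point after TFIID.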
Transcriptional elongation: movement of the transcription bubble (17 bp, 1.6 turns of B-DNA duplex)
Supercoiling of DNA during transcription requires topoisomerases to relieve stress and unwind the DNA
HiC-seq results
TAD: topologically associating domain. The graph shows interaction frequency.
How Initiation of Transcription Works
The TATA-box DNA sequence is recognized and bound by TBP. TATA-binding protein (TBP) binds the core promoter sequence TATA (part of a common consensus sequence). The TATA box is one of the best-characterized core promoter sequences, especially in in vitro biochemical studies of the human system, yet it is found in only a small percentage of human genes (10-20%); many other protein-coding genes do not have it. Pol II itself has no sequence specificity, so factors within the complex must recognize specific DNA sequences and then bring the polymerase and position it: the DNA sequence has to determine the start site of transcription. The sequences near the start site are called core promoter sequences, and the TATA box is one of them. It has a conserved set of nucleotides that proteins like TBP recognize, binding these sites specifically and with high affinity. This is a nucleating point: the other factors bind to TBP, and eventually Pol II is recruited there. Here you start to see the collaboration of these proteins. TBP is recognized by many other proteins, 12-15 different subunits that form a complex with it. TBP + 12-15 TAF subunits = TFIID (TAF = TBP-associated factor).
Transcription pre-initiation complex (PIC)
TFIIA + TFIIB + TFIID + TFIIE + TFIIF + TFIIH + RNA Pol II: an ensemble of multi-subunit transcription factors and cofactors. When Pol II starts to transcribe, much of this machinery falls off (like sigma), but along the way the polymerase picks up many other factors that process the RNA as it is transcribed. Transcription can be activated or repressed by proximal and distal promoter factors, sometimes acting through cofactors (Mediator). The PIC is what is needed for basal transcription, that is, transcribing genes at some level. On top of it, many factors (some cofactors and some gene-specific activators) work by binding specific sequences, in some cases close to the core promoter (where the general machinery assembles) and in others very far away; through cofactors they contribute both to assembly of the PIC and to clearance of the promoter by the polymerase. There are many layers of regulation built on top of the basic general machine.
Rho-independent prokaryotic transcription termination
The core RNA pol pauses after synthesizing a hairpin. If the hairpin is a true terminator, the RNA dissociates from the DNA template strand, because the rU-dA pairing in the RNA-DNA hybrid is unstable. Once the RNA is gone, the DNA duplex re-forms and the core RNA pol is released. •A GC-rich stem-loop forms just upstream of a U-rich sequence •Steric hindrance from the stem-loop and the free energy gained from its base pairing pull the U-rich sequence out of the active site
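The two bullet features (a GC-rich stem-loop immediately followed by a run of U's) are what a simple search for intrinsic terminators looks for. This is a deliberately naive sketch: it assumes a perfectly complementary stem of fixed length, Watson-Crick pairs only (no G-U pairs or mismatches), and the thresholds are hypothetical defaults; dedicated terminator-prediction tools score stem stability instead.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp_rna(s: str) -> str:
    return "".join(PAIR[b] for b in reversed(s))


def find_intrinsic_terminators(rna, stem_len=6, loop_min=3, loop_max=8,
                               min_gc=0.6, min_u=4):
    """Return (stem_start, stem_end) for GC-rich hairpins followed by a U run.

    Assumes an RNA string over A, C, G, U. stem_end is the index just past the
    3' arm of the stem, i.e. where the U-rich tract begins.
    """
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - stem_len + 1):
        left = rna[i:i + stem_len]
        if sum(b in "GC" for b in left) / stem_len < min_gc:
            continue  # stem must be GC-rich
        target = revcomp_rna(left)
        for loop in range(loop_min, loop_max + 1):
            j = i + stem_len + loop
            right = rna[j:j + stem_len]
            if len(right) < stem_len:
                break  # ran off the end of the sequence
            tail = rna[j + stem_len:j + stem_len + min_u]
            if right == target and tail == "U" * min_u:
                hits.append((i, j + stem_len))
    return hits


# GC-rich stem (GCCGCG ... CGCGGC), 4-nt loop, then a U tract.
print(find_intrinsic_terminators("AAAAGCCGCGAAUACGCGGCUUUUUUU"))
```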
Chromosome Conformation Capture (3C)
The detection of long-range interactions: a technique to see all the loops forming in a particular cell type. 1. Crosslinking of interacting loci, artificially holding them together. 2. Fragmentation. 3. Ligation of the crosslinked fragment ends, joining pieces of DNA that were close together in 3D space. 4. DNA purification. 5. DNA analysis. With high-throughput sequencing (Hi-C-seq), this tells you which loci were in contact, and it can be done across the entire genome.
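After sequencing, each ligation junction reports two genomic positions that were close in 3D, and these are tallied into a binned contact matrix like the one behind the TAD graph. The sketch assumes read pairs have already been mapped to one chromosome as 0-based positions; real pipelines also filter artifacts and normalize the matrix, which is omitted here. The function name is hypothetical.

```python
def contact_matrix(read_pairs, chrom_length, bin_size):
    """Bin ligation-junction read pairs into a symmetric contact matrix."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for a, b in read_pairs:
        if not (0 <= a < chrom_length and 0 <= b < chrom_length):
            raise ValueError(f"position out of range: {(a, b)}")
        i, j = a // bin_size, b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # a contact has no direction, so mirror it
    return matrix


pairs = [(5, 15), (12, 3), (25, 27)]  # toy mapped junctions
for row in contact_matrix(pairs, chrom_length=30, bin_size=10):
    print(row)
```

On real data, TADs appear as square blocks of elevated counts along the diagonal: regions that contact each other more often than they contact their neighbors.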
What gives promoter specificity to RNA Polymerase?
The dissociable sigma subunit gives promoter specificity to prokaryotic RNA polymerase (RNAP) Core enzyme + sigma <--> holoenzyme Core enzyme: Reading genomic information and transcribing into RNA (mRNA, tRNA, rRNA, etc...) •Similar general mechanism and requirements as polymerases we've talked about so far •Template •NTPs •Mg2+ •Not sequence specific
The Eukaryotic transcription cycle: initiation centric view
The initiation process is repeated every time a new mRNA has to be made, and transcription initiation is a very critical step for regulating gene expression. At the end of the day, the cell has to respond to its needs at a particular time. Within an organism, cells have to define themselves as one tissue type or another during development, which in some cases involves very dramatic reprogramming of gene expression. There are many ways to control the amount of a particular protein in the cell, but arguably transcription of its mRNA is a major one, and transcription initiation, where the polymerase is recruited and starts mRNA synthesis, is key. It is very important for the cell to have a system with built-in flexibility that allows control at many different scales. Some genes in eukaryotic cells are expressed almost all the time; these are called housekeeping genes. Others are regulated in a graded fashion, fine-tuned to different degrees. Then there are genes that must be regulated very dramatically, called binary: either they are not produced at all or they are produced in very large amounts (these are particularly important for developmental transitions). RNA polymerase requires many factors just to generate basal transcription levels. The first task is just to start transcription, meaning to localize the polymerase to the right place: the polymerase doesn't know where the gene starts or ends, so machinery is needed to position it in the right place. The second is that the polymerase cannot by itself open the very stable DNA duplex. Energy has to be put in to separate the strands and allow the enzyme to start copying the transcribed strand.
For just these two things (positioning the polymerase, and opening the duplex and feeding the strand into the active site of the polymerase), a number of additional factors are required. They are called transcription factors, TFII because they work with polymerase II. You need factors D, A, B, F, E and H just to carry out these 2 functions. They come together, adding in a sequential fashion, to form the transcription pre-initiation complex (PIC). This is a very large complex. It initially engages the duplex DNA, and this DNA-engaged form is referred to as the closed complex. Then, through the action of TFIIH, which contains ATPase activity, the DNA is opened, generating a transcription bubble. This gives the polymerase active site access to the transcribed strand that it needs to copy into mRNA. A very short transcript, about 6 bases, is generated, but the polymerase cannot keep transcribing until it leaves the promoter (the region of the DNA where these factors assemble). The polymerase has to leave the promoter and the general transcription factors behind. One requirement for doing that is that TFIIH, in addition to its ATPase activity, has a kinase activity: TFIIH has to phosphorylate the CTD of the polymerase. To enter the PIC, the polymerase has to be hypophosphorylated (no phosphorylation in the CTD); to leave the core promoter and general factors, it has to be hyperphosphorylated. Then it goes on its way, which requires additional phosphorylation events. Other factors allow it to terminate, and then the polymerase needs to be recycled. Recycling requires dephosphorylation of the CTD by phosphatases so that the polymerase can re-enter. Key: only hypophosphorylated RNA Pol II enters the PIC.
The multisubunit complex TFIIH has both ATPase activity, to open the DNA, and kinase activity, to phosphorylate the CTD of the polymerase. At a certain point the polymerase has to let go of the general transcription factors; TFIIE and TFIIH are thought to be the last ones released. The polymerase then associates with elongation factors and enters the elongation process.
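The cycle in this card (hypophosphorylated Pol II enters the PIC, TFIIH's ATPase opens the DNA, TFIIH's kinase phosphorylates the CTD for promoter escape, and a phosphatase resets the CTD for recycling) can be sketched as a small state machine that refuses out-of-order steps. The stage names and methods are hypothetical labels for the steps in the text; the many CTD phosphorylation sites and elongation-phase kinases are collapsed into a single "hyper" state.

```python
from enum import Enum


class CTD(Enum):
    HYPO = "hypophosphorylated"
    HYPER = "hyperphosphorylated"


class PolII:
    def __init__(self):
        self.ctd = CTD.HYPO
        self.stage = "free"

    def _require(self, stage):
        if self.stage != stage:
            raise RuntimeError(f"expected stage {stage!r}, found {self.stage!r}")

    def join_pic(self):
        self._require("free")
        if self.ctd is not CTD.HYPO:
            raise RuntimeError("only hypophosphorylated Pol II can enter the PIC")
        self.stage = "closed complex"

    def tfiih_atpase(self):
        # TFIIH ATPase activity opens the duplex, forming the bubble.
        self._require("closed complex")
        self.stage = "open complex"

    def tfiih_kinase(self):
        # TFIIH kinase phosphorylates the CTD, allowing promoter escape.
        self._require("open complex")
        self.ctd = CTD.HYPER
        self.stage = "elongation"

    def terminate(self):
        self._require("elongation")
        self.stage = "terminated"

    def ctd_phosphatase(self):
        # Dephosphorylation recycles the polymerase for a new round.
        self._require("terminated")
        self.ctd = CTD.HYPO
        self.stage = "free"


pol = PolII()
for step in (pol.join_pic, pol.tfiih_atpase, pol.tfiih_kinase,
             pol.terminate, pol.ctd_phosphatase):
    step()
    print(f"{step.__name__}: {pol.stage}, CTD {pol.ctd.value}")
```

Making each step check the current stage encodes the key point of the card: a polymerase that still carries a hyperphosphorylated CTD cannot re-enter the PIC until the phosphatase has acted.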
Control elements that regulate gene expression in eukaryotes
Doing this over and over with many different cell types and genes gives a picture of what a control region looks like. There is a start site (where the core promoter is), plus other elements. Red lines: promoter-proximal elements, short fragments of sequence (some only 5-10 nucleotides long) that are important and sit near the promoter. Then there are enhancers that can sit downstream or upstream, even 50 kb away. This is a typical promoter; mapping one could take years, but fortunately it can now be done more quickly. There are all kinds of other elements, such as CpG islands (CpG-rich regions; CpG dinucleotides are places where DNA can be methylated). When you compare the control region of a gene in yeast (a eukaryote) to a typical mammalian gene, the yeast set of elements is much simpler, and this is the rule rather than the exception: it is very infrequent to find a yeast enhancer very far from the transcription start site. In yeast they are usually within a few hundred bp, whereas in mammals they can be tens of thousands of bp away. So even among eukaryotes, there is a whole range of complexity in cis-regulatory elements. (a) Genes of multicellular organisms contain both promoter-proximal elements and enhancers (collectively referred to as cis-acting control elements) in addition to core promoter element(s). cis-acting control elements are bound by trans-acting transcription factors. (b) Enhancers can function from a far distance. Long-distance interactions are achieved by forming looped DNA. Other elements: exons and introns. Introns are the regions of the transcript that have to be spliced out to form mature mRNA; exons are joined together and retained.
How come during evolution, we have lost self-splicing introns? Spliceosome-catalyzed splicing
During evolution, catalytic power has been transferred from the intron itself to other molecules such as the U snRNPs, which are specialized in carrying out splicing reactions. In higher eukaryotes, specialized machinery, the spliceosome (the five U snRNPs plus about 50 non-snRNP proteins), functions in an organized manner to remove all the introns. Because introns are no longer self-splicing, they can now vary in size, and this increases the capacity for regulation. Despite that, if you compare the sequences of U2 and U6 snRNA with group II self-splicing introns, you notice similarities: the folding and overall sequence are very similar. This suggests that the U2/U6 catalytic center of the spliceosome can very likely be traced back to group II self-splicing introns, which may be its ancestor.
Subunit composition of eukaryotic RNA polymerases
E. coli RNA polymerase is made up of two very large subunits (beta and beta'), two copies of the alpha subunit, and an omega subunit. Eukaryotic polymerases are more complicated and have more subunits, but the core of the polymerase is remarkably similar. The core of all three eukaryotic polymerases (Pol I, II, III) closely resembles the bacterial polymerase, which forms the functional core including the active site, where the template strand is read and the RNA is synthesized. There are equivalents of the bacterial subunits in all these systems. The two largest subunits of Pol II, RPB1 and RPB2, make up the core of the structure and are structurally highly related to the bacterial beta' and beta subunits. There are also subunits equivalent to the two alpha subunits and to omega. In addition, there are many others: depending on the particular polymerase, there are between 4 and 7 subunits unique to the eukaryotic system, several of which are shared by all three eukaryotic polymerases while others are polymerase-specific. The smallest of the three is Pol II, with a total of 12 subunits. Pol I and Pol III have additional subunits: these systems have stably incorporated subunits that resemble general transcription factors, which in the Pol II system come on and off as separate factors. So there are functional equivalents between these extra subunits of Pol I and Pol III and the general transcription factors that work together with Pol II. Very important for the function of Pol II, especially for cycling through initiation, elongation and termination, is the long C-terminal domain (CTD) of RPB1, which is the substrate for systematic phosphorylation and dephosphorylation that couples these modifications to each stage of the cycle.
What is the reason behind the selection? What is the mechanism that regulates splicing?
Exonic splicing enhancers (ESEs) and SR proteins contribute to exon definition and regulate alternative splicing. It boils down to how a sequence is defined as an exon. If the EIIIB exon can be properly defined as an exon, it will be kept as an exon; if not, it will be removed as part of the intron by default. So we have to look at exactly how an exon is defined. In addition to the key sequence elements surrounding the exon, exon definition relies on ESE sequences. The correct 5' GU and 3' AG splice sites are recognized by splicing factors on the basis of their proximity to exons. The exons contain exonic splicing enhancers (ESEs), which are binding sites for SR proteins. When bound to ESEs, the SR proteins interact with one another and promote the cooperative binding of the U1 snRNP to the 5' splice site of the downstream intron, and of the 65- and 35-kD subunits of U2AF to the pyrimidine-rich region and 3' AG splice site of the upstream intron. The bound U2AF also helps recruit the U2 snRNP to the branch point. The resulting RNA-protein cross-exon recognition complex spans the exon and activates the correct splice sites, so this exon is properly defined and retained in the final mature mRNA. If U1 snRNP and/or U2AF are not recruited to the splice sites on each side of an exon (no stable cross-exon recognition complex forms), the exon is not recognized and is instead excised as part of the intron.
Discovering the First Eukaryotic Gene Specific Transcription Factor
The goal was finding the all-important factors that recognize control elements and instruct the RNA Pol complex to go to the right place. RNA Pol can't find these important regions without the help of sequence-specific factors. Historically, it was easier to do these experiments using viruses. This was before routine DNA sequencing, so sequences were not available for many things. Experiments mentioned before, such as hypersensitive sites and footprinting, revealed the important elements, including the TATA box and upstream sequences that bind enhancer proteins. Once these were known, various tricks, like DNA footprinting and other biochemical tools, could be used to purify and identify the proteins responsible for recognizing these elements.
TFs and Their Target Binding Sites Play Key Roles in Diseases
Gene regulation intersects with disease at many different points. Strategies have now been developed to manipulate or intervene in transcription, or in gene expression at any step along the way, to try to mitigate various disease states. Genome analysis reveals that many disease-causing mutations occur in regulatory DNA sequences. Diseases ensue when too much, too little, or an altered version of a key gene is expressed. Because many of these mutations sit in molecular signposts (regulatory elements) rather than in coding sequence, they are hard to recognize.
Eva Nogales Lab work
We are interested in making sense of this biochemical data to understand in detail how these complexes assemble, and how they allow Pol II to find the transcription start site and TFIIH to open the duplex DNA. When people pursue structures of proteins, they typically crystallize them. The idea is to do X-ray diffraction on the crystals, and from there deduce the structure and the position of all the atoms, getting the chemical and physical shape of the protein. That requires crystallization trials, which require production of large amounts of protein. Typically, you would use a system like E. coli as a factory: introduce the gene you want and have it produce the protein in very large amounts. For a single protein this is easy; with more proteins it becomes more difficult. This works very well for things like ribosomes. We have studied this system in a simplified way. Instead of using TFIID, we use TBP, working on a TATA promoter. You need the rest of TFIID (the TAFs) to bind all the core promoter sequences, but TBP is sufficient to start initiation in the context of a TATA box. The core promoter DNA is bound to magnetic beads. This is a way of doing purification without a column, because we have very small amounts of material. In a test tube with a very small volume, we have the DNA, add the factors we want, let them bind to the DNA, and then wash the beads to get rid of anything that didn't bind. We then recover the proteins bound to the DNA by cutting the DNA with a restriction enzyme. We build the complex one piece at a time, seeing where each one goes, getting structures along the way and identifying where each factor binds and how it affects the DNA. We can get structures with duplex DNA and structures with a transcription bubble. For the bubble, we cheated by creating it with a mismatch, which gives a mimic of the structure in which the complex has engaged the opened duplex.
Normally, opening would be driven by the ATPase activity of TFIIH. If we compare these two structures and concentrate only on the DNA: the part bound by TBP (red) and TFIIB (blue) doesn't change, because it is bound very tightly. The other part of the DNA is what moves in and forms the bubble. We can also generate other states: adding an RNA complementary to the template strand in the bubble gives the initially transcribing complex (ITC), the state of the polymerase just before promoter escape. Because we were able to capture different states, we can see the motions that both the proteins and the DNA undergo during these initial steps of engaging the duplex and opening the DNA. As the polymerase walks into the DNA, it pushes on it, distorting the DNA and opening the strands. TFIID is highly mobile; it is able to move with respect to the rest of the complex.
Consensus sequences around 5' and 3' splice sites in vertebrate pre-mRNAs
Introns need to be removed to turn pre-mRNA into mature mRNA, and a cell will not allow a pre-mRNA to be exported to the cytoplasm until its introns are removed. The information the splicing machinery uses to recognize the boundaries between exons and introns is contained in the pre-mRNA sequence itself. There is a highly conserved GU at the 5' splice site and AG at the 3' splice site. In addition, other recognition sites tell the splicing machinery to remove the intron, such as the branch point A. This is not any random A: it must be in the context of other nucleotides and at the proper distance from the 3' splice site, about 20-50 bases. The sequence between the branch point and the 3' splice site has to be pyrimidine-rich (C or U). The central region of the intron, which may range from 40 bases to 500 kilobases in length, generally is unnecessary for splicing to occur, although some of these large introns contain important information. These conserved sequences are needed to define the intron-exon boundaries so that the splicing machinery removes the intron sequence correctly.
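A minimal Python sketch of the rules above, assuming the intron is given 5' to 3'. The function name `splice_signals`, the 0.6 pyrimidine cutoff, and treating any A 20-50 nt from the 3' end as a branch point candidate are illustrative simplifications, not how the spliceosome actually scores sites (the real branch point also depends on its surrounding nucleotides).

```python
from typing import Optional


def splice_signals(intron: str, min_dist: int = 20, max_dist: int = 50,
                   min_pyrimidine: float = 0.6) -> dict:
    """Check a candidate intron (5'->3') for the core splicing signals.

    Simplified sketch: any A whose distance to the 3' splice site (the end of
    the intron) is between min_dist and max_dist is a branch point candidate;
    the candidate with the most pyrimidine-rich tract before the 3' AG wins.
    """
    seq = intron.upper().replace("T", "U")  # accept DNA or RNA letters
    n = len(seq)
    best: Optional[tuple] = None  # (index of branch A, pyrimidine fraction)
    for i, base in enumerate(seq):
        if base != "A" or not (min_dist <= n - i <= max_dist):
            continue
        tract = seq[i + 1:n - 2]  # between the branch A and the terminal AG
        if not tract:
            continue
        frac = sum(b in "CU" for b in tract) / len(tract)
        if frac >= min_pyrimidine and (best is None or frac > best[1]):
            best = (i, frac)
    return {
        "five_prime_GU": seq.startswith("GU"),
        "three_prime_AG": seq.endswith("AG"),
        "branch_point": best[0] if best else None,
        "pyrimidine_fraction": best[1] if best else None,
    }


# Hypothetical intron: GU..., G-rich middle, branch A, pyrimidine tract, ...AG
example = "GUAAGU" + "G" * 60 + "A" + "UCUUCCUUUCUCUUUCCUUCUUUUCCCU" + "CAG"
print(splice_signals(example))
```

Replacing the pyrimidine tract with purines makes the branch point check fail, which mirrors why the tract matters.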
EMSA (electrophoretic mobility shift assay): Shift and Supershift for studying protein-DNA interaction in vitro
Say you have a test tube of liver extract. You have done a hypersensitive-site analysis and have a DNA fragment that is important for driving gene expression in liver cells, and now you want to know which unknown protein binds to that element. Take the DNA fragment containing the putative binding sequence, labeled with a radioactive or fluorescent tag. Mix the DNA with the crude total protein extract, let the mixture sit for a while, then separate everything on an electrophoretic gel. If a protein in the liver extract recognizes the sequence, it will bind to it in preference to other sequences. When this DNA-protein complex forms, it slows down the migration of the DNA: without the protein the DNA runs fast, and with the protein it shifts. You can do a couple of tricks. To test whether binding is specific, add unlabeled competitor DNA, either identical to the labeled DNA or a completely random sequence. Random competitor should have no effect on the shifted band, while the unlabeled specific sequence will block it (compete it away). In practice we also throw a large amount of nonspecific DNA into the reaction so that any shift we do see reflects sequence-specific binding. To be doubly sure which protein is responsible, add an antibody that recognizes the candidate protein. The antibody binds the protein, which binds the DNA; the bigger complex moves even slower (a supershift). This experiment tells you which fragment is bound, gives you specificity controls, and can reveal the identity of the protein.
Cas9 is an RNA-guided DNA endonuclease
Cas9 uses a long guide RNA bound to the nuclease. It is a single polypeptide that, together with the guide RNA, targets the chromosome, and it has active sites that cleave the DNA. The part of the guide that directs it to the chromosome matches a sequence in the target DNA called the protospacer. Next to the protospacer in the chromosome is a short DNA sequence called the protospacer adjacent motif (PAM); Cas9 has to interact with the PAM to gain enough binding energy to open up the duplex.
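A hypothetical sketch of target-site search, assuming the commonly used SpCas9, whose PAM is NGG immediately 3' of a 20-nt protospacer. Only the given strand is scanned and mismatches are not tolerated; `find_cas9_sites` is an illustrative name, not a real library function.

```python
def find_cas9_sites(dna: str, spacer: str) -> list:
    """Return 0-based start positions where `spacer` matches `dna` exactly and
    is immediately followed by an NGG PAM (SpCas9 assumption).

    Only the given strand is scanned; a real design tool would also scan the
    reverse complement and score mismatches.
    """
    dna, spacer = dna.upper(), spacer.upper()
    k = len(spacer)
    hits = []
    for i in range(len(dna) - k - 2):  # leave room for the 3-nt PAM
        protospacer = dna[i:i + k]
        pam = dna[i + k:i + k + 3]
        if protospacer == spacer and pam[1:] == "GG":  # N can be any base
            hits.append(i)
    return hits


spacer = "GACGTTACCGGATCAAGCTA"  # hypothetical 20-nt guide sequence
target = "TTT" + spacer + "AGG" + "CC" + spacer + "ATT"
print(find_cas9_sites(target, spacer))  # prints [3]: only the copy before AGG
```

The second copy of the protospacer is ignored because it lacks a PAM, which is the point of the PAM requirement: a matching sequence alone is not enough.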
RNA-seq analysis
Used to look at cell-type-specific genes: which RNAs are being made. 1. mRNA isolation 2. Illumina sequencing 3. Align sequences against the genome 4. Generate read counts for all genes in the genome
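Step 4 (counting) can be sketched in Python, assuming reads have already been aligned (step 3) and each read is reduced to its start coordinate on a single chromosome. Gene names and coordinates are hypothetical; real pipelines also handle strand, splicing and reads overlapping several genes.

```python
from collections import Counter


def count_reads_per_gene(genes: dict, read_starts: list) -> Counter:
    """Count aligned reads whose start falls inside each gene.

    genes: name -> (start, end), half-open coordinates on one chromosome
    read_starts: start coordinate of each aligned read
    """
    counts = Counter({name: 0 for name in genes})  # keep genes with 0 reads
    for pos in read_starts:
        for name, (start, end) in genes.items():
            if start <= pos < end:
                counts[name] += 1
    return counts


genes = {"geneA": (0, 100), "geneB": (200, 300)}  # hypothetical annotation
reads = [5, 50, 99, 100, 250]                      # hypothetical aligned reads
print(count_reads_per_gene(genes, reads))
```

The read at position 100 falls between the two genes and is not counted, since the intervals are half-open.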
Why is acetylation of the lysine-rich tails of histones capable of loosening things up?
Lysines are positively charged and interact with the negatively charged phosphates of DNA, so everything is held tightly. This represses transcription by blocking access of factors to the DNA template. Acetylating a lysine neutralizes its charge, so the interaction is much weaker and the structure can loosen up. With fewer interactions holding everything in place, transcription factors can bind.
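A rough Python sketch of the charge argument, assuming +1 for each K or R, -1 for each D or E, 0 for an acetylated K, and ignoring His and the peptide ends. The peptide below is a made-up lysine-rich example, not a real histone tail.

```python
def approx_tail_charge(seq: str, acetylated=frozenset()) -> int:
    """Approximate net charge of a peptide near neutral pH.

    Assumptions: K and R count +1, D and E count -1, an acetylated K counts 0
    (the acetyl group neutralizes the lysine's positive charge); His and the
    peptide termini are ignored. `acetylated` holds 0-based positions of K.
    """
    charge = 0
    for i, aa in enumerate(seq.upper()):
        if aa == "K":
            charge += 0 if i in acetylated else 1
        elif aa == "R":
            charge += 1
        elif aa in "DE":
            charge -= 1
    return charge


tail = "AKGKRKDK"  # made-up lysine-rich peptide, not a real histone sequence
lysines = {i for i, aa in enumerate(tail) if aa == "K"}
print(approx_tail_charge(tail))                      # unmodified: 4
print(approx_tail_charge(tail, acetylated=lysines))  # all K acetylated: 0
```

Dropping the net positive charge toward zero is what weakens the tail's grip on the negatively charged DNA backbone.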
RNA-seq
Method to detect all mRNAs in a given population of cells: 1. Input RNA 2. Fragmentation 3. Convert to cDNA and add sequencing adapters to form a DNA library
How do the right genes get expressed in the right cells and at the right times?
The most elusive class of proteins is the one doing the directing: the sequence-specific transcription factors. They are designed to read DNA and find small regions of control sequence, the so-called cis-regulatory elements (very short, 6-8 nucleotides). They have to find those sites and then signal other proteins, like remodeling complexes or RNA Pol, to come to that site and start the process. In some ways they are reminiscent of sigma, which finds the promoter. Hunting for these factors turned out to be a problem, because they are made in very small quantities. They are key: with roughly 25,000 genes that need to be expressed in the right place at the right time, there must be a fair number of these factors controlling gene expression. Back in the 1980s, there were only two ways to get hold of these proteins. 1. In vitro biochemistry: take cells of a particular type, grind them up, do a DNA footprinting assay on a gene involved in their function, and see whether you can find a footprint. If you can, try to purify that protein. It is in vitro because you kill the cells, grind everything up, and separate the components on the basis of their physical and chemical properties. 2. In vivo genetics: if a mutation causes the phenotype that the gene is now turned off, or turned on more than it should be, you know that the mutated gene's product is probably involved in controlling that gene. The best approach was to use both in combination.
Editing after the cut
Once the cut is made, you can edit at the cut site. You can make a site-specific knockout. Along with the recombinant Cas9 RNP, you can add a piece of donor DNA that is incorporated at the cut by homologous recombination to change the sequence. You can create large deletions by making two cuts with two different guide RNAs. You can insert sequences or precisely edit the gene.
A typical complex regulatory arrangement of looped distal control elements called enhancers
Another consequence of long-distance enhancers: how can these sequences, and the proteins bound to them, communicate with the transcription machinery when they are so far apart? Looping solves part of this problem. Here, three enhancers, a promoter and a gene have to form a protein-nucleic acid complex in which these elements are all brought into proximity to each other in the 3D space of the nucleus. It is still not known exactly how this happens or how long it lasts.
Identification of promoter-proximal cis-acting control elements upstream of a eukaryotic gene
Part 1: Say there are multiple promoters going in different directions, and you are trying to figure out which element regulates which. You can take a fragment of DNA, picked up by any of these techniques, remove everything in between, hook it up to whatever promoter you want to test, and ask whether it actually regulates that promoter. This is done by making a recombinant reporter construct. Since we already know where the start site is, you can hook the fragment up to a reporter gene (not the actual gene you are studying; you are only asking whether this fragment has promoter or enhancer activity). Put it into a plasmid (like a mini-chromosome) so you don't have to deal with the problem in the context of an entire, very large chromosome. Then start chopping the fragment away from either the 5' or the 3' end, called promoter or enhancer bashing, and hook each piece up to a plasmid with a reporter gene that gives a fluorescent protein or some other signal you can measure quickly. This generates a series of fragments, and you can ask which fragments activate the gene and which do not. •Make a reporter (in purple: GFP, luciferase, LacZ, selectable marker...) in a plasmid with no or little promoter activity •Clone your region of interest into the vector where a promoter should be •Measure reporter gene expression from the full-length sequence •Make truncations/deletions of the putative promoter and compare to the full length Part 2: The trick is to put the constructs back into the cell you care about, so you have to know which cell type you are studying and which gene is of interest.
Introduce the collection of plasmids carrying different fragments of the enhancer/promoter region into the appropriate cell type. The reporter gene usually encodes an enzyme that is easy to quantify, or a fluorescent protein whose light output you can measure; either way this measures the activity of the reporter. There are multiple ways to measure it: the amount of RNA being produced, or the amount of protein made from that RNA. Both are used. You will typically see that the longest construct is active and that, eventually, shorter ones are not, because the important region has been chopped away. Important control: repeat the experiment in a different cell type, where the fragment should not have that activity, unless the gene happens to be expressed in both cell types, in which case you will see activity in both. •Make a reporter (in purple: GFP, luciferase, LacZ, selectable marker...) in a plasmid with a known promoter sequence •Co-transfect with a plasmid encoding your transcription factor (or stimulate the endogenous transcription factor) •Measure reporter
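The logic of reading a 5' deletion series can be sketched in Python. Assumptions: each construct is described by its 5' endpoint relative to the start site (more negative means longer), the activities and the 50-unit threshold are hypothetical numbers, and there is a single required element. Under those assumptions the element should lie between the 5' end of the shortest active construct and that of the longest inactive one.

```python
from typing import List, Optional, Tuple


def map_required_element(constructs: List[Tuple[int, float]],
                         threshold: float) -> Optional[Tuple[int, int]]:
    """Locate a required element from a 5' deletion series.

    constructs: (5' endpoint relative to the start site, reporter activity);
    a more negative endpoint means a longer construct.
    Returns (from, to) bounding the element, or None if the data do not fit
    the single-element assumption.
    """
    active = [end for end, activity in constructs if activity >= threshold]
    inactive = [end for end, activity in constructs if activity < threshold]
    if not active or not inactive:
        return None  # no active-to-inactive transition to map
    shortest_active = max(active)     # active construct closest to the start
    longest_inactive = min(inactive)  # inactive construct farthest from it
    if longest_inactive <= shortest_active:
        return None  # pattern not explained by a single element
    return (shortest_active, longest_inactive)


# Hypothetical luciferase readings (arbitrary units) for 5' deletions
series = [(-500, 100.0), (-300, 95.0), (-150, 90.0), (-80, 8.0), (-40, 5.0)]
print(map_required_element(series, threshold=50.0))
```

With these made-up readings the element maps between -150 and -80; finer deletions or point mutations inside that window would be the next step.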
Transcriptional Control in Multiple Cell Types: All Contain the Same DNA but Express Different Genes and Establish Distinct Identities
The power of embryonic stem cells is that they can become any cell type in the entire organism. If we understood the coding information, we could start with a stem cell and create any cell we want, generating new tissues and new organs at will. But all of this is predicated on our ability to understand the coding information that specifies a particular type of cell, given how differently a fat cell, liver cell, motor neuron and skeletal muscle cell look and function. Stem cells can make many copies of themselves through a process called self-renewal, which relies on DNA replication and cell division. At some point, though, a stem cell has to differentiate, changing its network of gene expression to become a fat cell, liver cell, motor neuron, or muscle cell.
Phosphorylation states of Pol II during transcription cycle
The CTD has to be unphosphorylated for Pol II to join the preinitiation complex (PIC) at the core promoter, but TFIIH then phosphorylates the CTD at serine 5. The CTD is made up of repeats of a heptapeptide; the number of repeats depends on the organism. Serine 5 within the heptapeptide is phosphorylated by TFIIH, and the CTD needs to become hyperphosphorylated for Pol II to start transcription and leave the initiation site. Most polymerases need another phosphorylation step after they leave the promoter, carried out by the P-TEFb complex at another serine (serine 2 within the heptapeptide). This is the form of the polymerase that is able to leave the promoter and move on. These phosphorylation modifications generate a chemical environment that allows specific types of protein-protein interactions: think of the CTD as a tail that the polymerase extends, which can be modified to interact with different factors. A series of steps with different phosphorylation states gives rise to different chemical compositions of the CTD tail, allowing it to interact with different factors that promote escape from the promoter or release from pausing into full elongation. Following PIC assembly, Pol II is recruited, the DNA is opened, and Pol II has to clear the promoter, separating from the rest of the general transcription factors; it then pauses when the RNA is still short. A number of factors are recruited, among them P-TEFb, which is required for the transition to elongation by releasing the paused state and giving rise to productive elongation, in which most of the mRNA is produced. The pause and the many steps provide checkpoints for regulation. Finally there has to be a recycling process: after Pol II terminates and leaves the DNA, all of the phosphorylation marks have to be removed by phosphatases so the polymerase can be recycled into a new PIC.
Single Particle Cryo-EM
A technique in which you don't need to crystallize the sample. It doesn't matter how large the complexes are, and you can use very small amounts of sample. Because of that, we can study the fully assembled complex without having to split it into more manageable pieces. We can study it in near-physiological conditions because it is not crystallized, and we can study different functional states. We purify the complex, put a couple of microliters of the solution onto an EM grid, and blot it into a very thin layer where the buffer is just about covering the molecules. We then freeze it very quickly so that the water doesn't have time to crystallize; instead it is in a vitreous state, so it does not disrupt the normal structure of the protein. We put it into an electron microscope. We want a 3D structure, but what we get is a projection: transmission electron microscopy gives a 2D projection, which is neither a surface nor a slice but an integral, the summation of the density along the direction of the electron beam. The pictures are not very clear because they have to be taken with a very low dose of electrons, or else the electrons fry the protein. So we take many images of the object to get a clearer picture and combine them computationally: identify images that show the same view, align them, and add them up, which boosts the signal. We do this for many different views of the object, find their relative orientations, and go through a process called 3D reconstruction.
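Why averaging helps can be shown with a toy 1D simulation in Python: many noisy copies of the same "projection" are averaged, and the error drops roughly as 1/sqrt(N). This assumes the images are already perfectly aligned and show the same view, which in real cryo-EM is exactly the hard computational part; the signal shape, noise level and N are arbitrary choices.

```python
import math
import random


def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def noisy_copies(signal, n, sigma, rng):
    """Simulate n low-dose 'images': the same signal plus Gaussian noise."""
    return [[x + rng.gauss(0.0, sigma) for x in signal] for _ in range(n)]


def average(images):
    """Pixel-wise mean of already-aligned images."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


rng = random.Random(0)  # fixed seed so the run is reproducible
true_signal = [math.sin(2 * math.pi * k / 64) for k in range(64)]
images = noisy_copies(true_signal, n=100, sigma=2.0, rng=rng)

single_err = rms_error(images[0], true_signal)
avg_err = rms_error(average(images), true_signal)
print(f"one image: {single_err:.2f}, average of 100: {avg_err:.2f}")
```

With noise larger than the signal, a single image is mostly noise, while the average of 100 aligned copies recovers the underlying shape with roughly ten times less error.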
What is the attenuator?
The attenuator is a Rho-independent transcription terminator. Attenuation is mediated by tight coupling of transcription and translation. •The ribosome translating the trp leader mRNA follows closely behind the RNA polymerase that is transcribing the DNA template. •The leader mRNA can adopt alternative conformations. At high tryptophan concentration, the cell doesn't need to make more Trp, so it doesn't need much of the transcript. The ribosome moves quickly through the Trp codons and ends up covering region 2, allowing the 3:4 hairpin to form. With region 2 blocked, this 3:4 structure, together with the following U-rich (UUU) sequence, interacts with RNA polymerase and tells it to stop transcription, so you get a short RNA. At low Trp, when the cell needs the transcript to make more Trp, the ribosome stalls at the tandem Trp codons waiting for charged tryptophanyl-tRNA. This slows translation just enough for the 2:3 antiterminator hairpin to form. The 2:3 pair is not an attenuator and is more stable than the 3:4 pair; because it lacks the terminator structure, it cannot cause the polymerase to terminate.
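The decision logic can be written as a toy Python function. It only encodes the two outcomes described above (ribosome position determines which hairpin forms); it is a sketch, not a kinetic model, and `trp_attenuation` is an illustrative name.

```python
def trp_attenuation(trp_high: bool) -> tuple:
    """Toy model of trp leader attenuation; returns (hairpin, outcome)."""
    if trp_high:
        # Charged Trp-tRNA is plentiful: the ribosome passes the tandem Trp
        # codons in region 1 without pausing and ends up covering region 2,
        # so region 3 pairs with region 4 (the Rho-independent terminator).
        hairpin = "3:4"
    else:
        # Charged Trp-tRNA is scarce: the ribosome stalls at the Trp codons,
        # leaving region 2 free to pair with region 3 (the antiterminator).
        hairpin = "2:3"
    outcome = ("terminate: short leader RNA" if hairpin == "3:4"
               else "read through into the trp structural genes")
    return hairpin, outcome


for level in (True, False):
    print("high Trp" if level else "low Trp", trp_attenuation(level))
```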