proteins


Targeting to the lumen of the rough endoplasmic reticulum

The signal sequence for import into the ER binds to the signal recognition particle (SRP), which stalls translation and directs the ribosome to the rough ER. Signal sequences are usually N-terminal and generally share a common architecture:
- Short, positively charged N-terminal region (n-region)
- Central hydrophobic region (h-region)
- Slightly polar C-terminal region (c-region)
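To make the three-region architecture concrete, the Python sketch below splits a signal sequence at its longest uninterrupted run of hydrophobic residues, which it treats as the h-region. It also reports the net charge of the n-region. The residue set counted as hydrophobic, the longest-run rule and the function names are illustrative assumptions. Real signal-peptide predictors rely on trained statistical models rather than this heuristic. The test input is the E. coli OmpA signal sequence.

```python
# Residues treated as hydrophobic for this sketch (an assumption; G and A are borderline).
HYDROPHOBIC = set("ACFGILMVW")


def split_signal_peptide(signal: str) -> tuple[str, str, str]:
    """Split a signal sequence into (n-region, h-region, c-region).

    Hypothetical heuristic: the h-region is the longest uninterrupted run of
    HYDROPHOBIC residues; everything before it is the n-region and everything
    after it is the c-region.
    """
    best_start, best_len = 0, 0
    run_start = None
    # A trailing non-hydrophobic sentinel ("*") closes any run at the very end.
    for i, aa in enumerate(signal + "*"):
        if aa in HYDROPHOBIC:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    end = best_start + best_len
    return signal[:best_start], signal[best_start:end], signal[end:]


def net_charge(segment: str) -> int:
    """Count Lys/Arg as +1 and Asp/Glu as -1 (ignores the N-terminal amine and His)."""
    positive = sum(segment.count(aa) for aa in "KR")
    negative = sum(segment.count(aa) for aa in "DE")
    return positive - negative


if __name__ == "__main__":
    n, h, c = split_signal_peptide("MKKTAIAIAVALAGFATVAQA")
    print(n, h, c, net_charge(n))  # MKKT AIAIAVALAGFA TVAQA 2
```

On this input, the positively charged n-region (two lysines) and the short c-region come out as expected. Changing the hydrophobic set shifts the region boundaries, so the output should be read as a rough guide only.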

Protein folding and targeting

- Some essential concepts
- Protein folding in vitro
- Mechanisms of protein folding
- Molecular chaperones and problems of in vivo folding/assembly
- Protein folding and human disease
- Protein targeting, general concepts
- Protein targeting in E. coli

Sequence similarity

Percentage of positions (nucleotides, or amino acids for proteins) shared between two aligned sequences (a continuous trait)
• Align the sequences and score the number of matches
• Allow, but penalise, gaps

Sequence similarity searching, typically with BLAST (units 3.3, 3.4), is the most widely used and most reliable strategy for characterizing newly determined sequences. Sequence similarity searches can identify "homologous" proteins or genes by detecting excess similarity: statistically significant similarity that reflects common ancestry. This unit provides an overview of the inference of homology from significant similarity, and introduces other units in this chapter that give more detail on effective strategies for identifying homologs. Sequence similarity searching to identify homologous sequences is one of the first, and most informative, steps in any analysis of newly determined sequences. Modern protein sequence databases are so comprehensive that more than 80% of metagenomic sequence samples typically share significant similarity with proteins in sequence databases. Widely used similarity searching programs, such as BLAST, produce accurate statistical estimates, ensuring that protein sequences that share significant similarity also have similar structures. Similarity searching is effective and reliable because sequences that share significant similarity can be inferred to be homologous: they share a common ancestor. The units in this chapter present practical strategies for identifying homologous sequences in DNA and protein databases. Once homologs have been found, more accurate alignments can be built from multiple sequence alignments (unit 3.7), which can also form the basis for more sensitive searches, phenotype prediction, and evolutionary analysis.
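The definition above has two parts: score the matches, and allow gaps but penalise them. Both can be illustrated with a textbook Needleman-Wunsch global alignment followed by a percent-identity calculation. The scores used here (+1 match, -1 mismatch, -2 per gap position) are arbitrary assumptions. BLAST itself uses local alignment with substitution matrices and affine gap penalties, so this sketch shows the principle, not how BLAST scores hits.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative only.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align two residues
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x: str, y: str) -> float:
    """Identical (non-gap) columns as a percentage of the alignment length."""
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


if __name__ == "__main__":
    s, x, y = global_align("GATTACA", "GATACA")
    # Score is 4 (six matches, one gap); identity is 6/7, about 85.7%.
    print(s, x, y, round(percent_identity(x, y), 1))
```

Raising the gap penalty makes the alignment prefer mismatches over gaps. This is the "allow, but penalise" trade-off in numerical form.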
While similarity searching is an effective and reliable strategy for identifying homologs (sequences that share a common evolutionary ancestor), most similarity searches seek to answer a much more challenging question: "Is there a related sequence with a similar function?" The inference of functional similarity from homology is more difficult, both because functional similarity is more difficult to quantify, and because the relationship between homology (structure) and function is complex. This introduction first discusses how homology is inferred from significant similarity, and how those inferences can be confirmed, and then considers strategies that connect homology to more accurate functional prediction. We infer homology when two sequences or structures share more similarity than would be expected by chance; when excess similarity is observed, the simplest explanation for that excess is that the two sequences did not arise independently but from a common ancestor. Common ancestry explains excess similarity (other explanations require similar structures to arise independently); thus excess similarity implies common ancestry. However, homologous sequences do not always share significant sequence similarity; there are thousands of homologous protein alignments that are not significant, but are clearly homologous based on statistically significant structural similarity or strong sequence similarity to an intermediate sequence. Thus, when a similarity search finds a statistically significant match, we can confidently infer that the two sequences are homologous; but if no statistically significant match is found in a database, we cannot be certain that no homologs are present. Sequence similarity search tools like BLAST, FASTA, and HMMER minimize false positives (non-homologs with significant scores; Type I errors), but do not make claims about false negatives (homologs with non-significant scores; Type II errors).
It is often easier to detect distant homologs when searching a smaller database (roughly 100,000-500,000 entries or fewer) than when searching the most comprehensive sequence sets (more than 10,000,000 protein entries). Likewise, when domain annotation databases such as InterPro and Pfam annotate a domain on a protein, it is almost certainly there. However, these databases can fail to annotate a domain that is present if it is very distant from other known homologs.
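Database size matters because of how significance is calculated. BLAST statistics rest on the Karlin-Altschul relation E = K·m·n·e^(-λS), where m is the query length, n is the total database length and S is the raw alignment score. The expected number of chance hits therefore grows in proportion to the size of the database searched. The sketch below assumes illustrative values for λ and K, of the order reported for BLOSUM62 with gap costs 11/1. It also assumes an average protein length of 350 residues and ignores the edge-effect corrections that real programs apply.

```python
import math

# ASSUMED illustrative Karlin-Altschul parameters; real searches take these
# from the scoring matrix and gap costs actually in use.
LAMBDA = 0.267
K = 0.041


def expected_hits(raw_score: float, query_len: int, db_residues: float) -> float:
    """E-value E = K * m * n * exp(-lambda * S), without length corrections."""
    return K * query_len * db_residues * math.exp(-LAMBDA * raw_score)


if __name__ == "__main__":
    avg_len = 350  # assumed average protein length
    for label, entries in [("small", 100_000), ("large", 10_000_000)]:
        e = expected_hits(100, 300, entries * avg_len)
        print(label, f"{e:.2g}")  # small ~0.0011, large ~0.11
```

With these assumptions, the same alignment score is significant at E < 0.01 against the 100,000-entry database but not against the 10,000,000-entry one. This is one reason a distant homolog can drop out of the significant hits when the search set grows.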

The Chaperone Network of the Eukaryotic Cytosol

(1) Ribosome-associated complex (RAC) and nascent chain-associated complex (NAC) are both ribosome-bound chaperones.
(2) Some proteins are passed on to TRiC co- or post-translationally, in a reaction mediated by Hsp70 and a protein called prefoldin (PFD).
(3) PFD is known to recognise directly the nascent chains of some TRiC substrates, such as actin and tubulin.

The bacterial OMP assembly machinery

After travelling through the periplasm and reaching the outer membrane, OMPs have to fold and insert into this membrane. The first component of the OMP assembly machinery to be identified was a protein known as Omp85 in N. meningitidis. Homologues of Omp85 were identified in all available Gram-negative bacterial genome sequences (Voulhoux et al., 2003; Voulhoux & Tommassen, 2004), and previous attempts to inactivate the gene in Haemophilus ducreyi and Synechocystis sp. were reported to be unsuccessful (Reumann et al., 1999; Thomas et al., 2001), suggesting an important function for the protein. Furthermore, in many genome sequences the omp85 gene was found immediately upstream of the skp gene, which encodes the periplasmic OMP chaperone, suggesting that Omp85 might also be involved in OMP biogenesis. To assess the function of Omp85, the gene was cloned under an IPTG-inducible promoter (Voulhoux et al., 2003). In the absence of IPTG, the resulting mutants stopped growing, and all OMPs examined accumulated as unfolded proteins, as shown (amongst other characteristics) by their protease sensitivity and their lack of heat modifiability. These results demonstrated an essential role for Omp85 in OMP assembly. Non-denaturing SDS-PAGE (Voulhoux et al., 2003) and cross-linking experiments (Manning et al., 1998) indicated that Omp85 is part of a multi-subunit complex in N. meningitidis. These results were confirmed in E. coli, where the Omp85 homologue is now called BamA (Bam stands for β-barrel assembly machinery). BamA forms a complex with four lipoproteins, BamB-E (Fig. 3a) (Wu et al., 2005; Sklar et al., 2007a). Whereas Omp85/BamA homologues are present in all Gram-negative bacteria, the accessory lipoproteins are less well conserved. For example, the N. meningitidis Bam complex lacks the BamB component and contains an additional component, RmpM, an OMP with a peptidoglycan-binding motif (Fig. 3b) (Volokhina et al., 2009).
In Caulobacter crescentus, the BamC component is absent and a different protein with a peptidoglycan-binding motif, the lipoprotein Pal, is present as an additional component (Anwari et al., 2010). In some alphaproteobacteria, both BamB and BamC appear to be absent (Gatsos et al., 2008). The accessory lipoproteins are also less vital. In E. coli, BamD is the only essential lipoprotein component of the complex, whereas mutational loss of the other lipoproteins causes only mild OMP assembly defects (Malinverni et al., 2006; Sklar et al., 2007a). However, even in the closely related bacterium Salmonella enterica, BamD appears to be dispensable (Fardini et al., 2009). Similarly, a viable knockout mutant in the Neisseria gonorrhoeae bamD homologue, designated comL, has been described (Fussenegger et al., 1996), whereas the gene appears essential for viability and OMP assembly in N. meningitidis (Volokhina et al., 2009). Thus, the Bam complex in bacteria consists of one essential central component, Omp85/BamA, and a variable number of accessory components whose importance depends on the specific component and the bacterium studied.

General Rules for Protein Trafficking/Sorting via Secretory Pathway

(1) All (nuclear-encoded) proteins start life being made on free, soluble cytoplasmic ribosomes.
(2) If a protein has an appropriate N-terminal signal sequence, SRP recognises it as it emerges from the ribosome and targets the ribosome-nascent chain complex to the ER.
(3) Protein synthesis then resumes on the now membrane-bound ribosome, and the growing nascent chain is threaded through a translocon in the ER membrane into the ER lumen.
(4) The signal sequence is cleaved by signal peptidase on the inner (luminal) face of the ER membrane.
(5) The protein folds in the ER, assisted by the ER chaperones calnexin and calreticulin, and undergoes post-translational modifications.
Protein destinations reached by this route: the external space, plasma membrane, ER, Golgi body and lysosomes.

Chromophores that can be observed by CD

Amide bond • Two main absorption bands: n→π* (220 nm); π→π* (190 nm) • Different types of secondary structure in proteins give characteristic CD spectra in the far UV.

The Sec Translocase

Although ER and bacterial proteins share the same signal sequences, early studies suggested that their translocation mechanisms are very different. Bacterial translocation requires both adenosine 5′-triphosphate (ATP) and the membrane electrochemical potential, whereas ER translocation only needs nucleotides. Only short nascent preproteins can initiate translocation into canine ER (18). Signal recognition particle (SRP), a complex of 7S RNA and six polypeptides, coordinates translation and translocation (Fig. 1B) by binding to emerging signal sequences and slowing chain growth (19, 20). Translocation resumes when the complex of polysome, nascent chain, and SRP reaches its ER-bound receptor (21-23). In contrast, preprotein translocation across the bacterial plasma membrane is not coupled to translation, either in vivo (24, 25) or in vitro (26). Specific proteins provide the mechanistic basis for this dichotomy: some full-length preproteins that have left the ribosome bind to the SecB chaperone and then engage the SecA protein as their membrane receptor (Fig. 2) (27). SecA is activated to bind and hydrolyze ATP for powering translocation through its associations with the signal sequence and mature domain of a preprotein and with the SecYEG translocon, the membrane receptor for SecA (27, 28). By the mid-1980s, it appeared that ER and bacterial translocation were fundamentally different, although this distinction soon blurred. The membrane-embedded proteins needed for translocation were isolated by solubilizing membranes in detergent, fractionating the mixed micellar extracts, and assaying for proteins needed to reconstitute translocation-competent proteoliposomes upon detergent removal (29-31). In bacteria, three membrane-embedded proteins (SecY, SecE, and SecG) are tightly associated as a complex termed SecYEG (Fig. 2). Proteoliposomes bearing SecA:SecYEG will efficiently translocate pure preprotein, driven by ATP and a membrane potential (29).
About 20 aminoacyl residues are translocated for each ATP that is bound and hydrolyzed by SecA (32), in a cycle accompanied by substantial SecA conformational change (33). Each SecA translocates a preprotein through a single SecYEG (34, 35), and the translocation pathway appears, by crystallography (36) and cross-linking (37), to pass through the center of SecY. The structure of SecYEG (36) shows a narrow constriction through which a polypeptide chain may move, rather than a large opening that would leak small molecules, and has provided the first molecular model of how apolar domains may be laterally released into the lipid bilayer. Cocrystals of SecYEG with SecA and with preprotein substrate may lead to further molecular understanding of the translocation cycle. Fractionated detergent extracts of eukaryotic ER also yield a membrane-embedded heterotrimeric complex, termed Sec61αβγ, with marked similarity to SecYEG (30). Proteoliposomes bearing this Sec61 complex and the SRP receptor will translocate nascent chains initiated in the presence of SRP. The complex of polysome, nascent preprotein, and SRP binds to the SRP receptor. Upon the binding of guanosine 5′-triphosphate (GTP) to both the SRP and its receptor (38, 39), the polysome and nascent chain are transferred to the Sec61 complex, allowing translation and translocation to resume (Fig. 1B). Preproteins translocate through the Sec61 complex or SecYEG translocons in the N-to-C direction. When ~20 largely apolar aminoacyl residues enter the translocon, they are released sideways into the lipid bilayer. It is possible that the chaperone systems for bacterial and eukaryotic ER translocation are not really so different. SRP-mediated translational arrest is not required for translocation (40, 41); SRP and the ribosome (42) may be viewed as targeting chaperones.
Yeast prepro-α-factor and prepro-carboxypeptidase Y can translocate into the ER posttranslationally (43-45), and Saccharomyces cerevisiae can grow, albeit slowly, without SRP or its receptor (46, 47). Bacterial SRP and SRP receptor are required for the assembly of many hydrophobic proteins into the membrane (Fig. 2) (48) but generally not for export to the periplasm or outer membrane (49). After import across the two envelope membranes, chloroplasts use SRP and its conserved receptor to assemble proteins into the thylakoid membrane (Fig. 1D) (50). This SRP-mediated thylakoid insertion is clearly posttranslational, emphasizing the receptor and chaperone roles of SRP. Targeting and chaperone functions are at the core of both cotranslational and posttranslational translocation pathways. Irreversible folding or the aggregation of apolar membrane anchors can be prevented by engaging nascent polypeptides with the translocase as soon as their signal sequence emerges from the ribosome, whereas posttranslational translocation uses chaperones to target proteins and maintain translocation competence.
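The text notes that a stretch of about 20 largely apolar residues is released sideways into the bilayer. A classic way to spot such stretches in a sequence is a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle scale with the 19-residue window and mean-hydropathy threshold of about 1.6 that Kyte and Doolittle suggested for membrane-spanning segments. It is a sequence heuristic only, not a model of what the translocon actually senses. The function name and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6):
    """Return (start, mean hydropathy) for each window whose mean exceeds threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean > threshold:
            hits.append((start, round(mean, 2)))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: a 20-residue leucine stretch flanked by lysines.
    candidate = "KKKKK" + "L" * 20 + "KKKKK"
    print(hydrophobic_windows(candidate))
    print(hydrophobic_windows("K" * 30))  # [] - no apolar stretch
```

Windows that only partly overlap the apolar stretch can still pass the threshold. In practice, overlapping hits are therefore merged into a single candidate segment.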

Role of urea and β-mercaptoethanol

Denaturing agents such as urea disrupt the noncovalent bonds within the protein that stabilize its native tertiary and quaternary structure. β-Mercaptoethanol reduces disulfide bonds in proteins.

Anfinsen's experiment

Denature the protein using urea and β-mercaptoethanol, then determine the conditions required to refold it correctly.

How is the elaborate three-dimensional structure of proteins attained? The classic work of Christian Anfinsen in the 1950s on the enzyme ribonuclease revealed the relation between the amino acid sequence of a protein and its conformation. Ribonuclease is a single polypeptide chain consisting of 124 amino acid residues cross-linked by four disulfide bonds (Figure 2.51). Anfinsen's plan was to destroy the three-dimensional structure of the enzyme and then determine what conditions were required to restore it. Agents such as urea or guanidinium chloride effectively disrupt a protein's noncovalent bonds. Although the mechanism of action of these agents is not fully understood, computer simulations suggest that they replace water as the molecule solvating the protein and are then able to disrupt the van der Waals interactions stabilizing the protein structure. The disulfide bonds can be cleaved reversibly by reducing them with a reagent such as β-mercaptoethanol (Figure 2.52). In the presence of a large excess of β-mercaptoethanol, the disulfides (cystines) are fully converted into sulfhydryls (cysteines). Most polypeptide chains devoid of cross-links assume a random-coil conformation in 8 M urea or 6 M guanidinium chloride. When ribonuclease was treated with β-mercaptoethanol in 8 M urea, the product was a fully reduced, randomly coiled polypeptide chain devoid of enzymatic activity. When a protein is converted into a randomly coiled peptide without its normal activity, it is said to be denatured (Figure 2.53). Anfinsen then made the critical observation that the denatured ribonuclease, freed of urea and β-mercaptoethanol by dialysis (Section 3.1), slowly regained enzymatic activity.
He perceived the significance of this chance finding: the sulfhydryl groups of the denatured enzyme became oxidized by air, and the enzyme spontaneously refolded into a catalytically active form. Detailed studies then showed that nearly all the original enzymatic activity was regained if the sulfhydryl groups were oxidized under suitable conditions. All the measured physical and chemical properties of the refolded enzyme were virtually identical with those of the native enzyme. These experiments showed that the information needed to specify the catalytically active structure of ribonuclease is contained in its amino acid sequence. Subsequent studies have established the generality of this central principle of biochemistry: sequence specifies conformation. The dependence of conformation on sequence is especially significant because of the intimate connection between conformation and function. A quite different result was obtained when reduced ribonuclease was reoxidized while it was still in 8 M urea and the preparation was then dialyzed to remove the urea. Ribonuclease reoxidized in this way had only 1% of the enzymatic activity of the native protein. Why were the outcomes so different when reduced ribonuclease was reoxidized in the presence and absence of urea? The reason is that incorrect disulfide pairings formed in urea. There are 105 different ways of pairing eight cysteine residues to form four disulfides; only one of these combinations is enzymatically active. The 104 wrong pairings have been picturesquely termed "scrambled" ribonuclease. Anfinsen found that scrambled ribonuclease spontaneously converted into fully active, native ribonuclease when trace amounts of β-mercaptoethanol were added to an aqueous solution of the protein (Figure 2.54). The added β-mercaptoethanol catalyzed the rearrangement of disulfide pairings until the native structure was regained in about 10 hours.
This process was driven by the decrease in free energy as the scrambled conformations were converted into the stable, native conformation of the enzyme. The native disulfide pairings of ribonuclease thus contribute to the stabilization of the thermodynamically preferred structure. Similar refolding experiments have been performed on many other proteins. In many cases, the native structure can be generated under suitable conditions. For other proteins, however, refolding does not proceed efficiently. In these cases, the unfolded protein molecules usually become tangled up with one another to form aggregates. Inside cells, proteins called chaperones block such undesirable interactions. Additionally, it is now evident that some proteins do not assume a defined structure until they interact with molecular partners, as we will see shortly.
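The figure of 105 possible pairings follows from a simple count. The first cysteine can pair with any of the other seven. The next unpaired cysteine can then pair with any of the remaining five, and so on, giving 7 × 5 × 3 × 1 = 105. The Python sketch below checks this closed form, (n - 1)!!, against a brute-force enumeration. The cysteine labels are hypothetical placeholders, not ribonuclease residue numbers.

```python
from math import prod


def disulfide_pairings(cysteines):
    """Yield every way of pairing an even-length list of cysteines into disulfides."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for others in disulfide_pairings(remaining):
            yield [(first, partner)] + others


def count_pairings(n_cys: int) -> int:
    """Closed form (n-1)!! = (n-1)(n-3)...3*1 for an even number n of cysteines."""
    return prod(range(n_cys - 1, 0, -2))


if __name__ == "__main__":
    cys = [f"C{k}" for k in range(1, 9)]  # eight hypothetical labels
    print(len(list(disulfide_pairings(cys))), count_pairings(8))  # 105 105
```

Only one of these 105 pairings is the native one. This is why reoxidation in urea, where the chain cannot adopt its native fold first, gives mostly scrambled protein.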

Interaction of substrate OMPs with BamA/Omp85

Electrophysiological experiments demonstrated that purified BamA reconstituted into planar lipid bilayers forms narrow ion-conductive channels (Robert et al., 2006; Stegmeier & Andersen, 2006). The physiological significance of these channels is still unclear, but this property could be used to study the interaction of the protein with its substrate OMPs. Addition of denatured OMPs to BamA-containing planar lipid bilayers increased the conductivity of the pores, demonstrating a direct interaction between BamA and its substrates (Robert et al., 2006). Since addition of periplasmic proteins to the bilayers had no such effect, this interaction between BamA and its substrates was specific. The specificity of the interaction indicated the presence of a recognition signal within these substrates. Previously, a signature sequence had been recognized at the C terminus of the vast majority of bacterial OMPs (Struyvé et al., 1991). This signature consists of a phenylalanine (or occasionally tryptophan) at the C-terminal position, a tyrosine or a hydrophobic residue at position -3 relative to the C terminus, and hydrophobic residues at positions -5, -7 and -9 from the C terminus. Furthermore, the importance of the C-terminal Phe in vivo was demonstrated by its deletion or substitution in porin PhoE (Struyvé et al., 1991). Such mutations severely affected the assembly of the protein into the outer membrane. Of note, however, Phe was not absolutely essential: while a mutant protein lacking the C-terminal Phe accumulated in periplasmic inclusion bodies when it was highly expressed (Struyvé et al., 1991), it was still assembled into the outer membrane when expression levels were reduced (de Cock et al., 1997). This observation could be explained if the mutation decreases, but does not abrogate, recognition of the mutant protein by the assembly machinery, so that at high expression levels the protein aggregates in the periplasm.
Reduced expression therefore slows aggregation, increasing the time available for the assembly machinery to deal with the suboptimal mutant protein. The hypothesis that the C-terminal Phe is part of the recognition signal for BamA was confirmed in planar lipid bilayer experiments with reconstituted BamA (Robert et al., 2006). In contrast with wild-type PhoE, the mutant protein lacking the C-terminal Phe did not stimulate the conductivity of the BamA channels. However, at higher concentrations it blocked the BamA channels, indicating that it can still interact with BamA, but differently from the wild-type protein. This result indicates either that the recognition signal is not completely disrupted by the deletion or that the PhoE protein contains additional signals. This is consistent with the observation that a mutant protein lacking the C-terminal Phe can still be assembled in vivo if the expression level is low (de Cock et al., 1997). The existence of a C-terminal recognition signal in PhoE was further confirmed using synthetic peptides (Robert et al., 2006). Like full-length PhoE, a synthetic peptide comprising its last 12 amino acids stimulated the conductivity of the BamA channels, while control peptides did not. Omp85/BamA was predicted to consist of two domains: an N-terminal periplasmic domain and a C-terminal domain embedded as a β-barrel in the outer membrane (Fig. 3a and b) (Voulhoux et al., 2003). The periplasmic part was predicted to consist of five repeated domains, named polypeptide transport-associated (POTRA) domains.
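The C-terminal signature has four parts: Phe (occasionally Trp) at the C terminus, Tyr or a hydrophobic residue at -3, and hydrophobic residues at -5, -7 and -9. It can be expressed as a simple sequence check. The Python sketch below is illustrative only. The residue set counted as "hydrophobic" is an assumption. It also assumes that the C-terminal residue counts as position 1 from the end, so that position -3 maps to Python index -3. The test sequence is hypothetical and is not PhoE.

```python
# Residues treated as hydrophobic for this sketch (an assumption, not from the source).
HYDROPHOBIC = set("AVLIMFW")


def has_omp_signature(seq: str) -> bool:
    """Check for the C-terminal OMP signature described in the text.

    Assumes the C-terminal residue is position 1 from the end, so positions
    -3, -5, -7 and -9 map directly onto Python's negative indices.
    """
    if len(seq) < 9:
        return False
    return (
        seq[-1] in "FW"                                    # Phe, occasionally Trp
        and (seq[-3] == "Y" or seq[-3] in HYDROPHOBIC)     # Tyr or hydrophobic at -3
        and all(seq[k] in HYDROPHOBIC for k in (-5, -7, -9))
    )


if __name__ == "__main__":
    wild_type_like = "AAAAITVSLGYKF"     # hypothetical C-terminal segment
    phe_deleted = wild_type_like[:-1]    # mimics deletion of the C-terminal Phe
    print(has_omp_signature(wild_type_like), has_omp_signature(phe_deleted))  # True False
```

The alternating pattern (every second residue from the end) matches the idea of a β-strand whose residues alternately face the lipid. A yes/no check cannot capture the partial recognition seen for the Phe-deleted PhoE mutant.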

Circular Polarization of Light

Electromagnetic radiation consists of oscillating electric and magnetic fields perpendicular to each other and to the direction of propagation. Most light sources emit waves in which these fields oscillate in all directions perpendicular to the propagation vector. Linearly polarized light occurs when the electric field vector oscillates in only one plane. In circularly polarized light, the electric field vector rotates around the propagation axis while maintaining a constant magnitude. Viewed down the axis of propagation, the vector traces a circle over one period of the wave (one full rotation occurs over a distance equal to the wavelength). In linearly polarized light the direction of the vector stays constant and the magnitude oscillates; in circularly polarized light the magnitude stays constant while the direction rotates. As the radiation propagates, the electric field vector traces out a helix. The magnetic field vector is rotated a quarter turn (90°) from the electric field vector, so when traced together the two vectors form a double helix. Light can be circularly polarized in two directions: left and right. If the vector rotates counterclockwise when the observer looks down the axis of propagation, the light is left circularly polarized (LCP); if it rotates clockwise, it is right circularly polarized (RCP). If LCP and RCP light of the same amplitude are superimposed, the resulting wave is linearly polarized. The most widely used application of CD spectroscopy is identifying structural aspects of proteins and DNA. The peptide bonds in proteins are optically active, and the ellipticity they exhibit changes with the local conformation of the molecule. Secondary structures of proteins can be analyzed using the far-UV (190-250 nm) region of light. The ordered α-helices, β-sheets and β-turns, as well as random coil conformations, all have characteristic spectra.
These characteristic spectra form the basis for protein secondary structure analysis. Note that CD gives only the relative fractions of residues in each conformation, not where each structural feature lies in the molecule. When reporting CD data for large biomolecules, the data must be converted into a normalized value that is independent of molecular length: the molar ellipticity is divided by the number of residues or monomer units, giving the mean residue ellipticity. The real value of CD lies in its ability to reveal conformational changes in molecules. It can be used to determine how similar a mutant protein is to the wild type, or to show the extent of denaturation with a change in temperature or chemical environment. It can also provide information about structural changes upon ligand binding. To interpret any of this information, the spectrum of the native conformation must be determined. Some information about the tertiary structure of proteins can be obtained using near-UV CD. Absorptions between 250 and 300 nm arise from the dipole orientation and surrounding environment of the aromatic amino acids (phenylalanine, tyrosine and tryptophan) and of cysteine residues, which can form disulfide bonds. Near-UV CD can also provide structural information about the binding of prosthetic groups in proteins. Metal-containing proteins can be studied by visible CD spectroscopy, in which visible light excites the d-d transitions of metals in chiral environments. Free metal ions in solution give no CD signal, so the pH dependence and the stoichiometry of metal binding can be determined. Vibrational CD (VCD) spectroscopy uses IR light to determine the 3D structures of short peptides, nucleic acids, and carbohydrates. VCD has been used to show the shape and number of helices in A-, B-, and Z-DNA. VCD is still a relatively new technique and has the potential to be a very powerful tool.
Resolving the spectra requires extensive ab initio calculations as well as high sample concentrations, and measurements must be performed in water, which may force the molecule into a nonnative conformation. Circular dichroism (CD) is dichroism involving circularly polarized light, i.e., the differential absorption of left- and right-handed light.[1][2] Left-hand circular (LHC) and right-hand circular (RHC) polarized light represent two possible spin angular momentum states for a photon, so circular dichroism is also referred to as dichroism for spin angular momentum.[3] This phenomenon was discovered by Jean-Baptiste Biot, Augustin Fresnel, and Aimé Cotton in the first half of the 19th century.[4] Circular dichroism and circular birefringence are manifestations of optical activity. CD is exhibited in the absorption bands of optically active chiral molecules. CD spectroscopy has a wide range of applications in many different fields. Most notably, UV CD is used to investigate the secondary structure of proteins.[5] UV/Vis CD is used to investigate charge-transfer transitions.[6] Near-infrared CD is used to investigate geometric and electronic structure by probing metal d→d transitions.[2] Vibrational circular dichroism, which uses light from the infrared energy region, is used for structural studies of small organic molecules and, most recently, proteins and DNA.[5]
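Two statements in this card can be checked numerically. The Python sketch below first adds two circularly polarized fields of equal amplitude that rotate in opposite senses. The y-components cancel, leaving a field that oscillates along x only, that is, linearly polarized light. The sketch then applies the standard conversion to molar ellipticity, [θ] = 100·θ/(c·l), with θ in degrees, c in mol L⁻¹ and l in cm, giving units of deg cm² dmol⁻¹. Dividing by the residue count gives a mean residue ellipticity. The function names and sample values are hypothetical.

```python
import math


def superpose_circular(t: float, amplitude: float = 1.0, omega: float = 1.0):
    """Return (x, y) field components of two equal-amplitude circularly polarized
    waves rotating in opposite senses, plus their sum."""
    one_sense = (amplitude * math.cos(omega * t), amplitude * math.sin(omega * t))
    other_sense = (amplitude * math.cos(omega * t), -amplitude * math.sin(omega * t))
    total = (one_sense[0] + other_sense[0], one_sense[1] + other_sense[1])
    return one_sense, other_sense, total


def molar_ellipticity(theta_deg: float, conc_molar: float, path_cm: float) -> float:
    """[theta] = 100 * theta / (c * l), in deg cm^2 dmol^-1."""
    return 100.0 * theta_deg / (conc_molar * path_cm)


def mean_residue_ellipticity(theta_deg: float, conc_molar: float,
                             path_cm: float, n_residues: int) -> float:
    """Molar ellipticity divided by the number of residues."""
    return molar_ellipticity(theta_deg, conc_molar, path_cm) / n_residues


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        _, _, total = superpose_circular(t)
        print(f"t={t}: sum = ({total[0]:+.3f}, {total[1]:+.3f})")  # y stays 0
    # Made-up example: 10 mdeg signal, 10 uM of a 100-residue protein, 1 mm cell.
    print(f"{mean_residue_ellipticity(0.01, 1e-5, 0.1, 100):.3g} deg cm^2 dmol^-1")
```

The magnitude of each circular component stays fixed while its direction rotates. Only the sum oscillates in magnitude, which is the linear-polarization behaviour described above.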

secondary structure

Formation of secondary structure is the first step in the folding process by which a protein assumes its native structure. Characteristic secondary structures are the alpha helices and beta sheets, which fold rapidly because they are stabilized by intramolecular hydrogen bonds, as was first characterized by Linus Pauling. Formation of intramolecular hydrogen bonds provides another important contribution to protein stability.[12] α-helices are formed by hydrogen bonding of the backbone into a spiral shape (refer to figure on the right).[10] The β pleated sheet is a structure in which the backbone bends over itself to form the hydrogen bonds (as displayed in the figure to the left). The hydrogen bonds are between the amide hydrogen and carbonyl oxygen of the peptide bond. There are anti-parallel and parallel β pleated sheets; the hydrogen bonds are stronger in the anti-parallel β sheet because they form at the ideal 180 degree angle, compared with the slanted hydrogen bonds formed in parallel sheets.

Proteins fold by progressive stabilization of intermediates rather than by random search

How does a protein make the transition from an unfolded structure to a unique conformation in the native form? One possibility a priori would be that all possible conformations are sampled to find the energetically most favorable one. How long would such a random search take? Consider a small protein with 100 residues. Cyrus Levinthal calculated that, if each residue can assume three different conformations, the total number of structures would be $3^{100}$, which is equal to $5 \times 10^{47}$. If it takes $10^{-13}$ s to convert one structure into another, the total search time would be $5 \times 10^{47} \times 10^{-13}$ s, which is equal to $5 \times 10^{34}$ s, or $1.6 \times 10^{27}$ years. In reality, small proteins can fold in less than a second. Clearly, it would take much too long for even a small protein to fold properly by randomly trying out all possible conformations. The enormous difference between calculated and actual folding times is called Levinthal's paradox. This paradox clearly reveals that proteins do not fold by trying every possible conformation; instead, they must follow at least a partly defined folding pathway consisting of intermediates between the fully denatured protein and its native structure. The way out of this paradox is to recognize the power of cumulative selection. Richard Dawkins, in The Blind Watchmaker, asked how long it would take a monkey poking randomly at a typewriter to reproduce Hamlet's remark to Polonius, "Methinks it is like a weasel" (Figure 2.58). An astronomically large number of keystrokes, on the order of $10^{40}$, would be required. However, suppose that we preserved each correct character and allowed the monkey to retype only the wrong ones. In this case, only a few thousand keystrokes, on average, would be needed. The crucial difference between these cases is that the first employs a completely random search, whereas, in the second, partly correct intermediates are retained.
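The arithmetic behind Levinthal's paradox, and the contrast with cumulative selection, can be reproduced in a few lines of Python. This is a sketch under stated assumptions. The 27-character typewriter (capital letters plus space), the rule of one keystroke per retyped character and the fixed random seed are choices made here, not details from the text. Under this particular model, cumulative selection typically needs only several hundred keystrokes, compared with 27^28 (about 1.2 × 10^40) for a blind search. The exact count depends on the alphabet and counting rules assumed.

```python
import random
import string

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_years(n_residues: int = 100, states: int = 3,
                           seconds_per_step: float = 1e-13) -> float:
    """Years needed to visit every conformation once, one per step."""
    return states ** n_residues * seconds_per_step / SECONDS_PER_YEAR


TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # assumed 27-key typewriter


def cumulative_keystrokes(target: str = TARGET, seed: int = 0):
    """Type the target, keep correct characters and retype only the wrong ones."""
    rng = random.Random(seed)
    typed = [rng.choice(ALPHABET) for _ in target]
    keystrokes = len(target)
    while True:
        wrong = [i for i, (got, want) in enumerate(zip(typed, target)) if got != want]
        if not wrong:
            return keystrokes, "".join(typed)
        for i in wrong:
            typed[i] = rng.choice(ALPHABET)
            keystrokes += 1


if __name__ == "__main__":
    print(f"{levinthal_search_years():.1e} years")  # about 1.6e+27
    print(f"blind search: ~{len(ALPHABET) ** len(TARGET):.1e} keystrokes")
    print("cumulative selection:", cumulative_keystrokes()[0], "keystrokes")
```

Retaining each correct character turns one astronomically unlikely event into many independent, easy ones. This mirrors the retention of partly correct folding intermediates.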
The essence of protein folding is the tendency to retain partly correct intermediates. However, the protein-folding problem is much more difficult than the one presented to our simian Shakespeare. First, the criterion of correctness is not a residue-by-residue scrutiny of conformation by an omniscient observer but rather the total free energy of the transient species. Second, proteins are only marginally stable. The free-energy difference between the folded and the unfolded states of a typical 100-residue protein is 42 kJ mol⁻¹ (10 kcal mol⁻¹), and thus each residue contributes on average only 0.42 kJ mol⁻¹ (0.1 kcal mol⁻¹) of energy to maintain the folded state. This is less than the thermal energy, which is 2.5 kJ mol⁻¹ (0.6 kcal mol⁻¹) at room temperature. This meager stabilization energy means that correct intermediates, especially those formed early in folding, can be lost; in the analogy, the monkey would be somewhat free to undo its correct keystrokes. Nonetheless, the interactions that lead to cooperative folding can stabilize intermediates as structure builds up. Thus, local regions that have significant structural preference, though not necessarily stable on their own, will tend to adopt their favored structures and, as they form, can interact with one another, leading to increasing stabilization. This conceptual framework is often referred to as the nucleation-condensation model.
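The energy comparison above can be checked directly. The sketch below computes RT at room temperature (taken as 298.15 K) and the average stabilization per residue for the 42 kJ mol⁻¹, 100-residue example. It then adds one step not in the text: the ratio of unfolded to folded molecules under an assumed simple two-state model, [U]/[F] = exp(-ΔG/RT). Function names are hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def thermal_energy_kj(temp_k: float = 298.15) -> float:
    """RT in kJ mol^-1."""
    return R * temp_k / 1000.0


def per_residue_stability(dg_kj: float = 42.0, n_residues: int = 100) -> float:
    """Average folding free energy contributed per residue, kJ mol^-1."""
    return dg_kj / n_residues


def unfolded_to_folded_ratio(dg_kj: float = 42.0, temp_k: float = 298.15) -> float:
    """[U]/[F] = exp(-dG/RT), assuming a simple two-state equilibrium."""
    return math.exp(-dg_kj / thermal_energy_kj(temp_k))


if __name__ == "__main__":
    print(f"RT = {thermal_energy_kj():.2f} kJ/mol")                # about 2.48
    print(f"per residue = {per_residue_stability():.2f} kJ/mol")   # 0.42
    print(f"[U]/[F] = {unfolded_to_folded_ratio():.1e}")
```

The two numbers tell different stories. Each residue contributes much less than RT, so individual local contacts are easily lost. Summed over the whole chain, however, the folded state is still strongly favoured at equilibrium.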

Gram-negative bacteria

In gram-negative bacteria proteins may be incorporated into the plasma membrane, the outer membrane, the periplasm or secreted into the environment. Systems for secreting proteins across the bacterial outer membrane may be quite complex and play key roles in pathogenesis. These systems may be described as type I secretion, type II secretion, etc.

The Twin-Arginine Translocation Pathway

In the early 1990s, an alternative translocase was discovered in the thylakoid membrane of chloroplasts, which worked in parallel to the Sec pathway [21]. Initially this pathway was named the ΔpH-dependent pathway because, unusually, it required only a transmembrane proton gradient for translocation [22]. Three membrane proteins were soon identified in thylakoids as essential for translocation of fully folded proteins via the ΔpH-dependent pathway [23], namely Tha4 [24], Hcf106 [25] and cpTatC [26]. Subsequently, homologous proteins were identified in some bacteria, archaea and even mitochondria [27, 28]. In E. coli, the homologues of Tha4, Hcf106 and cpTatC were also shown to be required for export of proteins with twin-arginine signal peptides and, therefore, they were respectively named TatA, TatB and TatC [9, 29-31]. Combined studies on the thylakoidal and bacterial Tat pathways showed that their function is to transport a subset of complex fully folded proteins that require cofactor insertion or immediate oligomerisation [8, 32]. Today, Tat-translocated proteins have been shown to participate in many processes including energy metabolism, cell division, cell envelope biogenesis, quorum sensing, motility, symbiosis and pathogenesis [33-36]. Tat can even export complex heterologous proteins that are Sec-incompatible, like the tightly folded dihydrofolate reductase with bound methotrexate [37], the green fluorescent protein (GFP) [38], and several biopharmaceutically relevant human proteins [39]. Another intriguing attribute of the Tat pathway is that it can detect unfolded or mutated proteins and reject them for export [40, 41]. Based on the number of Tat components involved in protein translocation, essentially two types of 'translocases' can be distinguished. The prototype Tat translocase, active in thylakoids and E. coli, consists of the aforementioned TatABC components.
Further, the minimal Tat translocases, as typified in Bacillus species, consist of TatA and TatC components only.

CD as a structural technique

CD is an ideal technique for checking that an expressed protein is folded, or that a site-directed mutant has a similar structure to the wild-type protein. It is not generally sensitive to quaternary structure changes unless, for example, aromatic side chains are at subunit interfaces, as in the case of insulin.

Lipoprotein transport through the bacterial cell envelope

Lipoproteins are sorted and transported by LolABCDE

Diseases associated with protein misfolding

Many diseases are the result of defects in protein folding, e.g., the spongiform encephalopathies and Alzheimer disease. These diseases involve deposits of misfolded proteins (amyloid deposits), which result from aggregation of a specific protein, different for different diseases, that has misfolded and formed cross-beta structures; these assemble into very stable higher-order structures (protofibrils and fibrils/fibers). One hypothesis is that the protein quality-control machinery cannot keep up with disposal of the abnormally folded protein. Model of cross-beta structure in amyloid fibrils: extended parallel beta-sheet structures (deduced from solid-state NMR studies).

a hierarchic classification of protein domain structures

Protein evolution gives rise to families of structurally related proteins, within which sequence identities can be extremely low. As a result, structure-based classifications can be effective at identifying unanticipated relationships in known structures, and in optimal cases function can also be assigned. The ever-increasing number of known protein structures is too large to classify all proteins manually; therefore, automatic methods are needed for fast evaluation of protein structures.

Summary: Protein folding mechanisms

Protein folding occurs by cumulative selection. The essence of protein folding is the tendency to retain partially correctly folded intermediates with marginally increased stability. Multiple pathways may be available for protein folding, and folding may switch between alternative pathways upon a change in folding conditions. A folding intermediate is an ensemble of molecules with an ensemble-averaged structure.

Hydrophobic effect

Protein folding must be thermodynamically favorable within a cell in order for it to be a spontaneous reaction. Since protein folding is known to be spontaneous, it must have a negative Gibbs free energy change (ΔG). The Gibbs free energy of folding is directly related to enthalpy and entropy.[10] For ΔG to be negative and folding to be thermodynamically favorable, either the enthalpy term, the entropy term, or both must be favorable. Minimizing the number of hydrophobic side chains exposed to water is an important driving force behind the folding process.[19] The hydrophobic effect is the phenomenon in which the hydrophobic chains of a protein collapse into the core of the protein (away from the hydrophilic environment).[10] In an aqueous environment, the water molecules tend to aggregate around the hydrophobic regions or side chains of the protein, creating water shells of ordered water molecules.[20] Ordering water molecules around a hydrophobic region increases order in the system and therefore contributes a negative change in entropy (less entropy in the system). The water molecules are fixed in these water cages, and this unfavorable ordering drives the hydrophobic collapse, or the inward folding of the hydrophobic groups.
The hydrophobic collapse introduces entropy back into the system by breaking the water cages, which frees the ordered water molecules.[10] The multitude of hydrophobic groups interacting within the core of the globular folded protein contributes significantly to protein stability after folding, because of the vastly accumulated van der Waals forces (specifically London dispersion forces).[10] The hydrophobic effect acts as a thermodynamic driving force only in the presence of an aqueous medium and an amphiphilic molecule containing a large hydrophobic region.[21] The strength of hydrogen bonds depends on their environment; thus, H-bonds enveloped in a hydrophobic core contribute more to the stability of the native state than H-bonds exposed to the aqueous environment.[22] In proteins with globular folds, hydrophobic amino acids tend to be interspersed along the primary sequence, rather than randomly distributed or clustered together.[23][24] However, proteins that have recently arisen de novo, which tend to be intrinsically disordered,[25][26] show the opposite pattern of hydrophobic amino acid clustering along the primary sequence.
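The thermodynamic bookkeeping in this card can be written compactly. A minimal sketch, assuming the usual textbook split of the folding entropy into a polypeptide-chain term and a solvent (water) term; the card implies this split but does not state it:

```latex
\begin{aligned}
\Delta G_{\mathrm{fold}} &= \Delta H_{\mathrm{fold}} - T\,\Delta S_{\mathrm{fold}} < 0 \quad \text{(spontaneous folding)}\\
\Delta S_{\mathrm{fold}} &= \Delta S_{\mathrm{chain}} + \Delta S_{\mathrm{water}}, \qquad \Delta S_{\mathrm{chain}} < 0,\; \Delta S_{\mathrm{water}} > 0\\
\Rightarrow\; \Delta G_{\mathrm{fold}} &= \Delta H_{\mathrm{fold}} - T\,\Delta S_{\mathrm{chain}} - T\,\Delta S_{\mathrm{water}}
\end{aligned}
```

Burying hydrophobic side chains releases caged water, so ΔS_water > 0 and the −TΔS_water term pushes ΔG negative. This offsets the loss of chain conformational entropy. Core van der Waals contacts and buried hydrogen bonds contribute a favorable (negative) ΔH.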

Ramachandran plot

Shows favorable phi-psi angle combinations; φ and ψ together define the main-chain conformation of a residue in a polypeptide. There are 3 main "wells": α-helices, β-sheets, and left-handed α-helices. Shaded areas show energetically favourable conformations.
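The three wells can be illustrated by assigning a (φ, ψ) pair to the nearest canonical well on the periodic map. This is a crude sketch. The centre values are approximate textbook angles chosen as an assumption, not region boundaries taken from the plot. A real analysis would use the contoured allowed regions.

```python
import math

# Approximate (phi, psi) centres in degrees; illustrative assumption only
REGION_CENTRES = {
    "right-handed alpha helix": (-57.0, -47.0),
    "beta sheet": (-120.0, 130.0),
    "left-handed alpha helix": (57.0, 47.0),
}


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_region(phi, psi, centres=REGION_CENTRES):
    """Assign (phi, psi) to the closest well, treating both axes as periodic."""
    def distance(centre):
        return math.hypot(angular_difference(phi, centre[0]),
                          angular_difference(psi, centre[1]))
    return min(centres, key=lambda name: distance(centres[name]))


for phi, psi in [(-60, -45), (-135, 135), (60, 40), (-170, 175)]:
    print((phi, psi), "->", nearest_region(phi, psi))
```

The wrap-around distance matters. For example, φ = −170° lies close to the β region even though it sits at the edge of the plot.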

Integral outer membrane proteins

The OM bilayer, like other biological membranes, is relatively impermeable to hydrophilic solutes and therefore requires protein channels. Most biological membranes contain α-helical proteins, but bacterial OM proteins are β-barrel structures, also found in chloroplasts and mitochondria. OM transport proteins can be split into 3 categories: » Classical porins (non-specific), e.g. OmpF » Specific channels, e.g. LamB » TonB-dependent receptors, e.g. BtuB

integral outer membrane proteins

The cell envelope of Gram-negative bacteria consists of two membranes separated by the periplasm. In contrast with most integral membrane proteins, which span the membrane in the form of hydrophobic α-helices, integral outer-membrane proteins (OMPs) form β-barrels. Similar β-barrel proteins are found in the outer membranes of mitochondria and chloroplasts, probably reflecting the endosymbiont origin of these eukaryotic cell organelles. How these β-barrel proteins are assembled into the outer membrane remained enigmatic for a long time. In recent years, much progress has been made in this field through the identification of the components of the OMP assembly machinery. The central component of this machinery, called Omp85 or BamA, is an essential and highly conserved bacterial protein that recognizes a signature sequence at the C terminus of its substrate OMPs. A homologue of this protein is also found in mitochondria, where it is likewise required for the assembly of β-barrel proteins into the outer membrane. Although the accessory components of the machineries differ between bacteria and mitochondria, a mitochondrial β-barrel OMP can be assembled into the bacterial outer membrane and, vice versa, bacterial OMPs expressed in yeast are assembled into the mitochondrial outer membrane. These observations indicate that the basic mechanism of OMP assembly is evolutionarily highly conserved. The cell envelope of Gram-negative bacteria is composed of two membranes, the inner membrane and the outer membrane, which are separated by the periplasm containing the peptidoglycan layer. While the inner membrane is a phospholipid bilayer constituted of glycerophospholipids, the outer membrane is highly asymmetrical, containing glycerophospholipids in the inner leaflet and lipopolysaccharides (LPSs) exposed to the cell surface (Fig. 1).
The outer membrane functions as a permeability barrier protecting the bacteria against harmful compounds from the environment, such as antibiotics and bile salts. Most nutrients pass this barrier via a family of integral outer-membrane proteins (OMPs), collectively called porins (Fig. 1). These trimeric proteins form open, water-filled channels in the outer membrane, which allow for the passage of small hydrophilic solutes, such as amino acids and monosaccharides, via passive diffusion (Nikaido, 2003). Other OMPs have more specialized transport functions, such as the secretion of proteins and the extrusion of drugs, or function as enzymes or structural components of the outer membrane (Koebnik et al., 2000). Besides integral OMPs, the membrane also contains lipoproteins, which are attached to the membrane via an N-terminal lipid moiety. All constituents of the outer membrane are synthesized in the cytoplasm or at the inner leaflet of the inner membrane. How these components are transported and assembled into the outer membrane is an area of intense research. An obvious model organism for studying such fundamental questions is Escherichia coli, but Neisseria meningitidis has also proven to be a very suitable organism for addressing them. N. meningitidis normally resides as a commensal in the nasopharynx but occasionally causes sepsis and meningitis. Besides generally useful features, such as a relatively small genome size (~2200 genes) and natural competence and recombination proficiency, which facilitate the construction of mutants, the organism has several properties particularly useful for the study of outer membrane biogenesis. Firstly, in contrast with E. coli, N. meningitidis is viable without LPS (Steeghs et al., 1998). Such mutants defective in LPS biosynthesis still produce an outer membrane into which OMPs are assembled (Steeghs et al., 2001).
Since N. meningitidis is viable without LPS, the genes encoding the components of the LPS transport route can be knocked out and the properties of such mutants can be studied (Bos et al., 2004; Tefsen et al., 2005). Secondly, studies on OMP assembly in E. coli are thwarted by a stress response that is activated when unfolded OMPs accumulate in the periplasm. Activation of this stress response, which is dependent on the alternative σ factor σE, results in the increased production of periplasmic chaperones that aid in OMP assembly and of the protease DegP that degrades these unfolded OMPs (Ruiz & Silhavy, 2005). In addition, small regulatory RNAs are produced that inhibit the translation of the mRNAs for OMPs by stimulating their decay (Johansen et al., 2006; Papenfort et al., 2006). Thus, OMP synthesis is inhibited under these conditions until unfolded OMPs are cleared from the periplasm. Consequently, mutations resulting in OMP assembly defects do not normally result in the extensive accumulation of unfolded OMPs in the periplasm, but in decreased OMP levels (Chen & Henning, 1996; Sklar et al., 2007b). Since other signals, such as altered LPS structure (Tam & Missiakas, 2005) and even cytoplasmic signals (Costanzo & Ades, 2006), can also trigger the σE-dependent stress response, decreased OMP levels do not necessarily reflect an OMP assembly defect. Because this σE-dependent stress response is absent in N. meningitidis (Bos et al., 2007a), unfolded OMPs normally accumulate in the periplasm of assembly-defective N. meningitidis mutants, which facilitates these studies. This paper focuses on the current knowledge of OMP biogenesis in bacteria and on the evolutionary conservation of the OMP assembly machinery.

Energy landscape of protein folding

The configuration space of a protein during folding can be visualized as an energy landscape. According to Joseph Bryngelson and Peter Wolynes, proteins follow the principle of minimal frustration, meaning that naturally evolved proteins have optimized their folding energy landscapes,[64] and that nature has chosen amino acid sequences so that the folded state of the protein is sufficiently stable. In addition, the acquisition of the folded state had to become a sufficiently fast process. Even though nature has reduced the level of frustration in proteins, some degree of it remains, as can be observed in the presence of local minima in the energy landscape of proteins. A consequence of these evolutionarily selected sequences is that proteins are generally thought to have globally "funneled energy landscapes" (a term coined by José Onuchic)[65] that are largely directed toward the native state. This "folding funnel" landscape allows the protein to fold to the native state through any of a large number of pathways and intermediates, rather than being restricted to a single mechanism. The theory is supported by both computational simulations of model proteins and experimental studies,[64] and it has been used to improve methods for protein structure prediction and design.[64] The description of protein folding by the leveling free-energy landscape is also consistent with the second law of thermodynamics.[66] Physically, thinking of landscapes as visualizable potential or total energy surfaces with maxima, saddle points, minima, and funnels, rather like geographic landscapes, is perhaps a little misleading. The relevant description is really a high-dimensional phase space in which manifolds might take a variety of more complicated topological forms.[67] The unfolded polypeptide chain begins at the top of the funnel, where it may assume the largest number of unfolded variations and is in its highest energy state.
Energy landscapes such as these indicate that there are a large number of initial possibilities but only a single native state; however, they do not reveal the numerous folding pathways that are possible. Different molecules of the same protein may follow marginally different folding pathways, seeking different lower-energy intermediates, as long as the same native structure is reached.[68] Different pathways may have different frequencies of utilization depending on the thermodynamic favorability of each pathway: if one pathway is more thermodynamically favorable than another, it is likely to be used more frequently in the pursuit of the native structure.[68] As the protein begins to fold and assume its various conformations, it always seeks a more thermodynamically favorable structure than before and thus continues through the energy funnel. Formation of secondary structures is a strong indication of increased stability within the protein, and only one combination of secondary structures assumed by the polypeptide backbone will have the lowest energy and therefore be present in the native state of the protein.[68] Among the first structures to form once the polypeptide begins to fold are alpha helices and beta turns; alpha helices can form in as little as 100 nanoseconds and beta turns in 1 microsecond.[28] There is a saddle point in the energy funnel landscape where the transition state for a particular protein is found.[28] The transition state in the energy funnel diagram is the conformation that every molecule of that protein must assume in order to reach the native structure.
No protein may assume the native structure without first passing through the transition state.[28] The transition state can be regarded as a variant or premature form of the native state rather than just another intermediary step.[69] Passage through the transition state has been shown to be rate-determining, and even though the transition state lies at a higher energy than the native fold, it greatly resembles the native structure. Within the transition state there is a nucleus around which the protein is able to fold, formed by a process referred to as "nucleation condensation," in which the structure begins to collapse onto the nucleus.
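A one-dimensional caricature can show why a downhill bias helps even when the landscape has local traps. The state n counts steps away from the native state (n = 0). The energy has a constant downhill slope plus an alternating bump that makes every even state a shallow local minimum. The walker moves by Metropolis Monte Carlo. All parameters and names are assumptions for illustration. A 1-D chain cannot represent the many parallel pathways or the high-dimensional character stressed above, and a flat 1-D landscape needs roughly N² steps rather than a Levinthal-scale search.

```python
import math
import random


def funnel_energy(n, slope=1.0, roughness=1.5):
    """Toy energy in units of kT for a state n steps from the native state (n = 0).
    The slope tilts the landscape toward n = 0 (the funnel); the alternating bump
    makes every even n a shallow local minimum (residual frustration)."""
    return slope * n + roughness * (n % 2)


def steps_to_native(start=50, n_max=50, slope=1.0, roughness=1.5,
                    rng=None, max_steps=10**6):
    """Metropolis walk on states 0..n_max; returns the number of steps taken to
    first reach the native state, or None if it is not reached within max_steps."""
    rng = rng or random.Random()
    n = start
    for step in range(1, max_steps + 1):
        trial = n + rng.choice((-1, 1))
        if 0 <= trial <= n_max:                # moves off either end are rejected
            d_e = funnel_energy(trial, slope, roughness) - funnel_energy(n, slope, roughness)
            if d_e <= 0 or rng.random() < math.exp(-d_e):
                n = trial                      # Metropolis acceptance
        if n == 0:
            return step
    return None


rng = random.Random(0)
funnel = [steps_to_native(rng=rng) for _ in range(20)]
flat = [steps_to_native(slope=0.0, roughness=0.0, rng=rng) for _ in range(20)]
print("rugged funnel, mean steps:", sum(funnel) / len(funnel))
print("flat landscape, mean steps:", sum(flat) / len(flat))
```

The rugged funnel reaches n = 0 in far fewer steps than the unbiased walk, despite having a local minimum at every other state.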

How do proteins get into the ER?

The peptide moves through the translocation channel into the lumen of the ER while the signal peptide sequence remains attached to the membrane. The signal peptide is later cleaved off by a signal peptidase, leaving the protein free in the lumen of the ER.

distant evolutionary relationships can be detected through the use of substitution matrices

The scoring scheme heretofore described assigns points only to positions occupied by identical amino acids in the two sequences being compared. No credit is given for any pairing that is not an identity. However, as already discussed, two proteins related by evolution undergo amino acid substitutions as they diverge. A scoring system based solely on amino acid identity cannot account for these changes. To add greater sensitivity to the detection of evolutionary relationships, methods have been developed to compare two amino acids and assess their degree of similarity. Not all substitutions are equivalent. For example, amino acid changes can be classified as structurally conservative or nonconservative. A conservative substitution replaces one amino acid with another that is similar in size and chemical properties. Conservative substitutions may have only minor effects on protein structure and often can be tolerated without compromising protein function. In contrast, in a nonconservative substitution, an amino acid is replaced by one that is structurally dissimilar. Amino acid changes can also be classified by the fewest number of nucleotide changes necessary to achieve the corresponding amino acid change. Some substitutions arise from the replacement of only a single nucleotide in the gene sequence, whereas others require two or three replacements. Conservative and single-nucleotide substitutions are likely to be more common than are substitutions with more radical effects. How can we account for the type of substitution when comparing sequences? We can approach this problem by first examining the substitutions that have been observed in proteins known to be evolutionarily related. From an examination of appropriately aligned sequences, substitution matrices have been deduced. A substitution matrix describes a scoring system for the replacement of any amino acid with each of the other 19 amino acids.
In these matrices, a large positive score corresponds to a substitution that occurs relatively frequently, whereas a large negative score corresponds to a substitution that occurs only rarely. A commonly used substitution matrix, the Blosum-62 (for Blocks of amino acid substitution matrix), is illustrated in Figure 6.9. In this depiction, each column in this matrix represents one of the 20 amino acids, whereas the position of the single-letter codes within each column specifies the score for the corresponding substitution. Notice that scores corresponding to identity (the boxed codes at the top of each column) are not the same for each residue, owing to the fact that less frequently occurring amino acids such as cysteine (C) and tryptophan (W) will align by chance less often than the more common residues. Furthermore, structurally conservative substitutions such as lysine (K) for arginine (R) and isoleucine (I) for valine (V) have relatively high scores, whereas nonconservative substitutions such as lysine for tryptophan result in negative scores (Figure 6.10). When two sequences are compared, each pair of aligned residues is assigned a score based on the matrix. In addition, gap penalties are often assessed. For example, the introduction of a single-residue gap lowers the alignment score by 12 points and the extension of an existing gap costs 2 points per residue. With the use of this scoring system, the alignment shown in Figure 6.6 receives a score of 115. In many regions, most substitutions are conservative (defined as those substitutions with scores greater than 0) and relatively few are strongly disfavored (Figure 6.11). This scoring system detects homology between less obviously related sequences with greater sensitivity than would a comparison of identities only. Consider, for example, the protein leghemoglobin, an oxygen-binding protein found in the roots of some plants.
The amino acid sequence of leghemoglobin from the herb lupine can be aligned with that of human myoglobin and scored by using either the simple scoring scheme based on identities only or the Blosum-62 (Figure 6.9). Repeated shuffling and scoring provides a distribution of alignment scores (Figure 6.12). Scoring based solely on identities indicates that the probability of the alignment between myoglobin and leghemoglobin occurring by chance alone is 1 in 20. Thus, although the level of similarity suggests a relationship, there is a 5% chance that the similarity is accidental on the basis of this analysis. In contrast, users of the substitution matrix are able to incorporate the effects of conservative substitutions. From such an analysis, the odds of the alignment occurring by chance are calculated to be approximately 1 in 300. Thus, an analysis performed with the substitution matrix reaches a much firmer conclusion about the evolutionary relationship between these proteins (Figure 6.13). Experience with sequence analysis has led to the development of simpler rules of thumb. For sequences longer than 100 amino acids, sequence identities greater than 25% are almost certainly not the result of chance alone; such sequences are probably homologous. In contrast, if two sequences are less than 15% identical, their alignment alone is unlikely to indicate statistically significant similarity. For sequences that are between 15 and 25% identical, further analysis is necessary to determine the statistical significance of the alignment. It must be emphasized that the lack of a statistically significant degree of sequence similarity does not rule out homology. The sequences of many proteins that have descended from common ancestors have diverged to such an extent that the relationship between the proteins can no longer be detected from their sequences alone. As we will see, such homologous proteins can often be detected by examining three-dimensional structures.
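The scoring rules above (matrix scores for aligned pairs, 12 points for opening a gap, 2 per additional gap residue) can be sketched for an already-aligned pair of sequences. The substitution values are a small excerpt of BLOSUM62 covering only the pairs used here; check them against the published matrix before reuse. The toy alignment is hypothetical, not the one in Figure 6.6.

```python
# Excerpt of BLOSUM62 (symmetric lookup); only the pairs needed for the toy example
BLOSUM62_EXCERPT = {
    ("K", "K"): 5, ("R", "R"): 5, ("I", "I"): 4, ("V", "V"): 4,
    ("L", "L"): 4, ("W", "W"): 11, ("C", "C"): 9,
    ("K", "R"): 2, ("I", "V"): 3, ("I", "L"): 2, ("L", "V"): 1, ("K", "W"): -3,
}
GAP_OPEN = 12      # cost of the first residue of a gap (from the text)
GAP_EXTEND = 2     # cost of each further residue in the same gap (from the text)


def pair_score(a, b, matrix=BLOSUM62_EXCERPT):
    """Substitution score in either order; raises KeyError if the pair is not in the excerpt."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def alignment_score(top, bottom):
    """Score two equal-length gapped strings ('-' marks a gap) with affine gap penalties."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap_top = in_gap_bottom = False      # is the previous column part of a gap?
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gap_in_top = a == "-"
            continued = in_gap_top if gap_in_top else in_gap_bottom
            score -= GAP_EXTEND if continued else GAP_OPEN
            in_gap_top, in_gap_bottom = gap_in_top, not gap_in_top
        else:
            score += pair_score(a, b)
            in_gap_top = in_gap_bottom = False
    return score


def identity_count(top, bottom):
    """Number of aligned columns holding the same residue (gaps never count)."""
    return sum(a == b and a != "-" for a, b in zip(top, bottom))


print(alignment_score("KIVLAAW", "RVIL--W"))   # 9
print(identity_count("KIVLAAW", "RVIL--W"))    # 2
```

Identity scoring credits only the L and W matches. The matrix also rewards the conservative K/R and I/V swaps, which is the extra sensitivity the text describes.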

Role of Signal Recognition Particle

The signal recognition particle (SRP) is an abundant, cytosolic, universally conserved ribonucleoprotein (protein-RNA complex) that recognizes and targets specific proteins to the endoplasmic reticulum in eukaryotes and the plasma membrane in prokaryotes. In eukaryotes, SRP binds to the signal sequence of a newly synthesized peptide as it emerges from the ribosome. This binding leads to the slowing of protein synthesis known as "elongation arrest", a conserved function of SRP that facilitates the coupling of the protein translation and the protein translocation processes.[4] SRP then targets this entire complex (the ribosome-nascent chain complex) to the protein-conducting channel, also known as the translocon, in the ER membrane. This occurs via the interaction and docking of SRP with its cognate SRP receptor[5], which is located in close proximity to the translocon. In eukaryotes, there are three domains between SRP and its receptor that function in guanosine triphosphate (GTP) binding and hydrolysis. These are located in two related subunits of the SRP receptor (SRα and SRβ)[6] and in the SRP protein SRP54 (known as Ffh in bacteria).[7] The coordinated binding of GTP by SRP and the SRP receptor has been shown to be a prerequisite for the successful targeting of SRP to the SRP receptor.[8][9] Upon docking, the nascent peptide chain is inserted into the translocon channel, where it enters the ER. Protein synthesis resumes as SRP is released from the ribosome.[10][11] The SRP-SRP receptor complex dissociates via GTP hydrolysis, and the cycle of SRP-mediated protein translocation continues.[12] Once inside the ER, the signal sequence is cleaved from the core protein by signal peptidase. Signal sequences are therefore not part of mature proteins. Translocation into the ER: post- and cotranslational translocation both require four steps. 1) Targeting to the ER membrane, which can be SRP-dependent or SRP-independent.
Post-translationally translocated preproteins require molecular chaperones to maintain their solubility. 2) Insertion into the Sec61 translocon. 3) Energy-dependent import through the translocon. 4) Protein folding in the ER lumen, which may be chaperone-dependent.
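The order of events in the SRP cycle can be encoded as a small state machine that rejects out-of-order steps. This is a simplified sketch. The state and event names are hypothetical labels for the steps described above. Elongation arrest and the resumption of translation are folded into the states, and the SRP-independent, post-translational route is not modelled.

```python
from enum import Enum, auto


class SRPState(Enum):
    FREE = auto()               # SRP free in the cytosol
    ON_SIGNAL = auto()          # SRP bound to the emerging signal sequence; elongation arrested
    DOCKED = auto()             # SRP and SRP receptor both GTP-bound, docked next to the translocon
    CHAIN_TRANSFERRED = auto()  # nascent chain in the translocon; SRP released from the ribosome


# (current state, event) -> next state, in the order described in the card
TRANSITIONS = {
    (SRPState.FREE, "signal sequence emerges"): SRPState.ON_SIGNAL,
    (SRPState.ON_SIGNAL, "GTP-dependent docking with SRP receptor"): SRPState.DOCKED,
    (SRPState.DOCKED, "chain inserted into translocon"): SRPState.CHAIN_TRANSFERRED,
    (SRPState.CHAIN_TRANSFERRED, "GTP hydrolysis dissociates SRP-SR"): SRPState.FREE,
}


def advance(state, event):
    """Return the next state, refusing events that are out of order."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} cannot happen in state {state.name}") from None


def run_cycle(events, start=SRPState.FREE):
    """Apply events in order and return the list of visited states."""
    states = [start]
    for event in events:
        states.append(advance(states[-1], event))
    return states


CYCLE = [
    "signal sequence emerges",
    "GTP-dependent docking with SRP receptor",
    "chain inserted into translocon",
    "GTP hydrolysis dissociates SRP-SR",
]
print(" -> ".join(s.name for s in run_cycle(CYCLE)))
```

The cycle ends back at FREE, which reflects the text's point that GTP hydrolysis lets SRP be reused.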

procedure

The technique of single-crystal X-ray crystallography has three basic steps. The first, and often most difficult, step is to obtain an adequate crystal of the material under study. The crystal should be sufficiently large (typically larger than 0.1 mm in all dimensions), pure in composition and regular in structure, with no significant internal imperfections such as cracks or twinning. In the second step, the crystal is placed in an intense beam of X-rays, usually of a single wavelength (monochromatic X-rays), producing the regular pattern of reflections. The angles and intensities of diffracted X-rays are measured, with each compound having a unique diffraction pattern.[103] As the crystal is gradually rotated, previous reflections disappear and new ones appear; the intensity of every spot is recorded at every orientation of the crystal. Multiple data sets may have to be collected, with each set covering slightly more than half a full rotation of the crystal and typically containing tens of thousands of reflections. In the third step, these data are combined computationally with complementary chemical information to produce and refine a model of the arrangement of atoms within the crystal. The final, refined model of the atomic arrangement, now called a crystal structure, is usually stored in a public database. When a narrow beam of x-rays is directed at the protein crystal, most of the beam passes directly through the crystal while a small part is scattered in various directions. These scattered, or diffracted, x-rays can be detected by x-ray film or by a solid-state electronic detector. The scattering pattern provides abundant information about protein structure. The basic physical principles underlying the technique are: 1. Electrons scatter x-rays. The amplitude of the wave scattered by an atom is proportional to its number of electrons. Thus, a carbon atom scatters six times as strongly as a hydrogen atom does. 2. The scattered waves recombine.
Each diffracted beam comprises waves scattered by each atom in the crystal. The scattered waves reinforce one another at the film or detector if they are in phase (in step) there, and they cancel one another if they are out of phase. 3. The way in which the scattered waves recombine depends only on the atomic arrangement. The protein crystal is mounted and positioned in a precise orientation with respect to the x-ray beam and the film. The crystal is rotated so that the beam can strike the crystal from many directions. This rotational motion results in an x-ray photograph consisting of a regular array of spots called reflections. The x-ray photograph shown in Figure 3.39 is a two-dimensional section through a three-dimensional array of 72,000 reflections. The intensities and positions of these reflections are the basic experimental data of an x-ray crystallographic analysis. Each reflection is formed from a wave with an amplitude proportional to the square root of the observed intensity of the spot. Each wave also has a phase, that is, the timing of its crests and troughs relative to those of other waves. Additional experiments or calculations must be performed to determine the phases corresponding to each reflection. The next step is to reconstruct an image of the protein from the observed reflections. In light microscopy or electron microscopy, the diffracted beams are focused by lenses to directly form an image. However, appropriate lenses for focusing x-rays do not exist. Instead, the image is formed by applying a mathematical relation called a Fourier transform to the measured amplitudes and calculated phases of every observed reflection. The image obtained is referred to as the electron-density map. It is a three-dimensional graphic representation of where the electrons are most densely localized and is used to determine the positions of the atoms in the crystallized molecule (Figure 3.40).
Critical to the interpretation of the map is its resolution, which is determined by the number of scattered intensities used in the Fourier transform. The fidelity of the image depends on this resolution, as shown by the optical analogy in Figure 3.41. A resolution of 6 Å reveals the course of the polypeptide chain but few other structural details. The reason is that polypeptide chains pack together so that their centers are between 5 Å and 10 Å apart. Maps at higher resolution are needed to delineate groups of atoms, which lie between 2.8 Å and 4.0 Å apart, and individual atoms, which are between 1.0 Å and 1.5 Å apart (Figure 3.42). The ultimate resolution of an x-ray analysis is determined by the degree of perfection of the crystal. For proteins, this limiting resolution is often about 2 Å; however, in exceptional cases, resolutions of 1.0 Å have been obtained.
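The phase problem and Fourier synthesis can be seen in a one-dimensional toy crystal. The sketch computes structure factors from a few point "atoms" and rebuilds the electron density from reflections −hmax..hmax, once with the true phases and once with all phases set to zero (amplitudes only, as intensities alone would give). Several assumptions apply: a 1-D cell, point atoms, scattering power equal to the electron count with no atomic form factors or thermal motion, and hypothetical atom positions.

```python
import cmath
import math

# Hypothetical 1-D unit cell: (fractional position, electrons). Scattering power is
# taken as proportional to electron count, as in principle 1 (form factors ignored).
ATOMS = [(0.20, 6), (0.55, 8), (0.80, 1)]  # a "C", an "O" and an "H"


def structure_factor(h, atoms=ATOMS):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for reflection index h."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)


def electron_density(hmax, use_true_phases=True, n_grid=100, atoms=ATOMS):
    """Fourier synthesis of the density on a grid from reflections -hmax..hmax.
    Intensities only give |F| = sqrt(I); the phases must come from elsewhere."""
    refl = {h: structure_factor(h, atoms) for h in range(-hmax, hmax + 1)}
    rho = []
    for i in range(n_grid):
        x = i / n_grid
        total = 0.0
        for h, F in refl.items():
            amplitude = abs(F)
            phase = cmath.phase(F) if use_true_phases else 0.0
            total += (amplitude * cmath.exp(1j * phase)
                      * cmath.exp(-2j * math.pi * h * x)).real
        rho.append(total)
    return rho


def peak_index(values):
    """Grid index of the highest density value."""
    return max(range(len(values)), key=values.__getitem__)


with_phases = electron_density(hmax=20)
without_phases = electron_density(hmax=20, use_true_phases=False)
print("highest peak with true phases at x =", peak_index(with_phases) / 100)
print("highest peak with phases set to 0 at x =", peak_index(without_phases) / 100)
```

With the true phases, the tallest peak sits on the oxygen at x = 0.55. With the phases discarded, the map peaks at the origin and no longer shows where the atoms are. Lowering hmax (fewer reflections, so lower resolution) broadens every peak.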

Structure of bacterial OMPs

Whereas most integral membrane proteins, including bacterial inner-membrane proteins, span the membrane in the form of α-helices entirely composed of hydrophobic amino acids, bacterial OMPs present an entirely different structure (Fig. 1). These proteins form β-barrels composed of antiparallel amphipathic β-strands (Koebnik et al., 2000). The hydrophobic residues in these β-strands are exposed to the lipid environment of the membrane, whereas the hydrophilic residues point towards the interior of the protein, which is the aqueous channel in the case of porins. These β-barrel structures are very stable, usually withstanding incubation in 2% SDS (i.e. as present in standard sample buffer for SDS-PAGE) at ambient temperature. This property explains the heat-modifiable behaviour of many OMPs in SDS-PAGE analysis: the native form of these proteins migrates differently in the gel compared with the heat-denatured form (Dekker et al., 1995; Nakamura & Mizushima, 1976). Also, natively folded OMPs are usually highly resistant to proteases. Heat modifiability and protease resistance are therefore convenient parameters for probing whether OMPs have folded into their native configuration.
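The amphipathic pattern follows from the geometry of a β-strand: side chains alternate between the two faces, so residues i, i+2, i+4, ... share one face. The sketch splits a strand into its two faces and guesses the lipid-facing one as the more hydrophobic face. The hydrophobic residue set and the example strand are assumptions for illustration, not data from a real OMP. Real strand and orientation prediction must also handle loops, strand boundaries and aromatic girdles.

```python
# Hydrophobic residue set used for illustration (an assumption; scales differ)
HYDROPHOBIC = set("AVILMFWY")


def strand_faces(strand):
    """Split a beta-strand into its two faces: side chains alternate sides,
    so positions 0, 2, 4, ... point one way and 1, 3, 5, ... the other."""
    return strand[0::2], strand[1::2]


def hydrophobic_fraction(residues):
    """Fraction of residues in the hydrophobic set (0.0 for an empty face)."""
    return sum(r in HYDROPHOBIC for r in residues) / len(residues) if residues else 0.0


def lipid_facing_side(strand):
    """Guess which face contacts the lipid bilayer: the more hydrophobic one."""
    even, odd = strand_faces(strand)
    return "even" if hydrophobic_fraction(even) >= hydrophobic_fraction(odd) else "odd"


example = "VKLTYNVEAQ"  # hypothetical amphipathic strand
even, odd = strand_faces(example)
print(even, hydrophobic_fraction(even))
print(odd, hydrophobic_fraction(odd))
print("lipid-facing face:", lipid_facing_side(example))
```

For the example strand, one face is entirely hydrophobic (it would contact the lipid), while the other is entirely polar (it would line the aqueous interior).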

Domain classification

Why? Structural similarity often reveals functional/evolutionary relationships • Domains each adopt a fold: a 3-D arrangement or topology of regular secondary structure elements • Curated databases of classified structures, e.g. CATH, SCOP - Classification by class, fold/architecture, superfamily, ...

proteins

Proteins are synthesised on ribosomes in the cytosol. All proteins that function outside the cytosol must either insert into or pass across the cytoplasmic membrane to reach their final destination. Route for protein transport: the Sec translocase.

possible fates of an unfolded protein

An unfolded protein can go straight to a disordered aggregate (multiple polypeptide chains interacting), or it can pass through intermediate states (highly variable and prone to forming aggregates). These intermediates can lead either to disordered aggregates or to an amyloid precursor, which is a highly ordered aggregate. Amyloid precursors can then form stable amyloid, which consists of multiple polypeptide chains in a highly ordered conformation that is highly resistant to breakdown and refolding. What we want is for the unfolded protein to reach its native state. In a crowded environment such as the cell, where protein concentration is high, aggregation pathways are favoured. In particular, the formation of disordered aggregates is favoured over ordered aggregates, because disordered aggregates can form between many different proteins, whereas highly ordered aggregates all contain copies of the same polypeptide chain.

Scoring Alignments

• Common amino acid substitutions • Stand-out cases: - C (cysteine): positions of disulphides in extracellular proteins, and of cysteines in zinc finger proteins, are highly conserved - W (tryptophan): its properties are difficult to replace, so substitutions are rare

The 3D Structure of a Protein is Determined by its Amino Acid Sequence

• How do we know this? • From Anfinsen's work on Ribonuclease A

Detecting homology: how low can you go?

• Identity > 50%: almost identical structures, functions closely related • Identity > 25% (with higher sequence similarity) over 80+ residues: virtual certainty of structural similarity • 15% < identity < 25% (with higher sequence similarity): twilight zone • Identity < 15%: "undetectable" NB even an alignment of randomly chosen sequences will produce some level of identity, especially with gaps allowed
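The rules of thumb above can be written as a small lookup. This Python sketch is only an illustration. The function name is hypothetical, and the sketch ignores the "higher sequence similarity" qualifier. It also reports >25% identity over fewer than 80 residues as uncertain; that is an assumption, because the notes do not cover this case.

```python
def homology_zone(identity_pct: float, aligned_length: int) -> str:
    """Rough interpretation of percent identity, following the thresholds above.

    Hypothetical helper. The 'uncertain' result for >25% identity over
    fewer than 80 residues is an assumption; the notes do not cover it.
    """
    if identity_pct > 50:
        return "almost identical structures, closely related functions"
    if identity_pct > 25:
        if aligned_length >= 80:
            return "structural similarity virtually certain"
        return "uncertain: alignment too short"
    if identity_pct > 15:
        return "twilight zone"
    return "undetectable"

print(homology_zone(25.9, 147))
```

For example, the globin comparison elsewhere in these notes has 25.9% identity over an average length of 147 residues, which falls just inside the "virtually certain" band.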

Non-specific porins

• In E. coli: OmpF, OmpC and PhoE • Differ mostly in channel size and anion/cation specificity

Post translational modifications modulate protein function

• Intracellular - Phosphorylation - Acetylation - Methylation - Ubiquitinylation - Lipidation - ADP-ribosylation • Extracellular - Glycosylation

getting phase

• Isomorphous replacement - Heavy atoms at a few fixed positions per protein molecule with same crystal form • Multiple wavelength Anomalous Dispersion - Use scattering of X-rays by rare atoms (e.g. S, Se) to get phases • Molecular replacement - Use model based on (suspected) similar structure to estimate phases

Polypeptide

• Proteins are polypeptides: linear polymers of L-amino acids linked by peptide bonds. - L-amino acids: the isomers found in proteins in the body. They form short polymer chains (peptides) and longer chains (polypeptides/proteins)

Proteins can be synthesised on free or membrane bound ribosomes

• Proteins destined for the cytoplasm, mitochondrion, chloroplast, nucleus and peroxisome are synthesised on free cytosolic ribosomes • Proteins destined for secretion, plasma membrane or lysosome are synthesised on ribosomes bound to the rough endoplasmic reticulum and translocated first into the lumen of the ER

Quality of X-ray structures

• Resolution of diffraction pattern - High resolution = low figure in Å • R factor and free R factor - How well does the model represent the data? - Low is good; Rfree is the most important • Ramachandran plot quality - How closely does the model resemble what we expect a protein to look like? - A high percentage of residues in favoured regions is good
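The card only says "low is good" for R. As an illustration, this Python sketch computes the conventional crystallographic R factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|. It evaluates R separately on a working set and on a small "free" set held out from refinement. The 5% free fraction, the helper names and the amplitudes are all assumptions: the amplitudes are random numbers, not real data. In a real refinement, the model is fitted against the working set only, so Rfree is usually higher than Rwork. With the unrefined toy model below, the two values come out similar.

```python
import random

def r_factor(f_obs, f_calc, indices=None):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the chosen reflections."""
    if indices is None:
        indices = range(len(f_obs))
    num = sum(abs(abs(f_obs[i]) - abs(f_calc[i])) for i in indices)
    den = sum(abs(f_obs[i]) for i in indices)
    return num / den

def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Randomly set aside a 'free' test set that is never used in refinement."""
    idx = list(range(n_reflections))
    random.Random(seed).shuffle(idx)
    n_free = max(1, round(free_fraction * n_reflections))
    return idx[n_free:], idx[:n_free]  # (work, free)

# Made-up amplitudes, used only to exercise the functions.
rng = random.Random(1)
f_obs = [rng.uniform(50, 500) for _ in range(1000)]
f_calc = [f * rng.uniform(0.8, 1.2) for f in f_obs]
work, free = split_work_free(len(f_obs))
print(f"Rwork = {r_factor(f_obs, f_calc, work):.3f}, "
      f"Rfree = {r_factor(f_obs, f_calc, free):.3f}")
```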

Detecting homology: how low can you go?

• Sequence identity (%) • Sequence similarity (% - depends on the scoring matrix)
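The difference between the two measures can be shown in a short Python sketch. Identity counts identical aligned residues. Similarity also counts pairs judged "similar", which real tools decide with a scoring matrix such as BLOSUM62. The residue groups below are a hypothetical stand-in for such a matrix. Dividing by the full alignment length, including gap columns, is one convention among several and is an assumption here.

```python
# Hypothetical groups of "similar" residues, standing in for a scoring matrix.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]

def _similar(a: str, b: str) -> bool:
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)

def identity_and_similarity(aln1: str, aln2: str):
    """Percent identity and percent similarity of two aligned strings ('-' = gap)."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    ident = sim = 0
    for a, b in zip(aln1, aln2):
        if a == "-" or b == "-":
            continue  # gap columns count toward length but never match
        if a == b:
            ident += 1
        if _similar(a, b):
            sim += 1
    n = len(aln1)
    return 100 * ident / n, 100 * sim / n

print(identity_and_similarity("MKVLE", "MRILD"))
```

In the example, only two of five positions are identical, but every pair is either identical or a conservative substitution. The similarity therefore comes out much higher than the identity, which is why the rules of thumb above mention both.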

What can NMR do?

• Spectroscopy - Molecular structure in solution (or solids) - Enzyme function/mechanism - Molecular motion - Inter-molecular interactions - Metabolite analysis • Imaging

Evolution & fate of protein

• Theory: As species diverge, accumulated mutations change the sequences of genes, but largely in ways that do not disrupt the function of the proteins (gene products). One approach to studying structure evolution is to examine how proteins' structural similarity varies over a range of sequence identities. Such investigations proceed by aligning many pairs of proteins so that their sequence identities (or another measure of sequence similarity) and structural similarities can be assessed (5-9). The result is a cusped relationship between sequence and structure divergence: sequences reliably diverge up to 70% without significant protein structure evolution. Below 30% sequence identity, however, the structural similarity between proteins abruptly decreases, giving rise to a "twilight zone" where little can be said about the relationship between sequence identity and structural similarity without more advanced methods. This finding is the foundation of one of the most important methods in protein biophysics: structure homology modeling.

Drawing free amino acids

• α-amino acids • ionisable groups drawn as they are at neutral pH • depict approximate 3D geometry

Forces stabilising protein tertiary structure

• Hydrophobic interactions: the tendency of non-polar groups to cluster together to exclude water • Hydrogen bonding, both within secondary structure and elsewhere • Ionic/electrostatic interactions: attraction between unlike electric charges on ionised R groups • Disulfide bridges between cysteine residues

Anfinsen's Experiment

• Unfold RNase with a denaturing agent (8 M urea) and β-mercaptoethanol, which reduces the disulfides, so that the "unfolded" protein is entirely unfolded. • Loss of native structure inactivates RNase (no enzymatic activity). • Refold the protein by: - removal of urea (by dialysis) - reoxidation by exposure to O2, converting SH groups back to disulfides • The enzyme refolds and regains activity: proof that the right combination of S-S bonds has formed (i.e., the structure was correct).
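The refolding result is convincing because of how many wrong pairings are possible. RNase A has eight cysteines forming four disulfides (a fact not given on this card). The number of ways to pair 2n cysteines is (2n − 1)!! = (2n − 1) × (2n − 3) × ... × 1. A short Python sketch of that count is below. For eight cysteines it gives 105, so purely random reoxidation would regenerate the native set of disulfides in only about 1 in 105 molecules.

```python
def disulfide_pairings(n_cys: int) -> int:
    """Ways to pair n_cys cysteines into n_cys / 2 disulfides: (n_cys - 1)!!"""
    if n_cys % 2:
        raise ValueError("need an even number of cysteines")
    count = 1
    for k in range(n_cys - 1, 0, -2):  # (n-1) * (n-3) * ... * 1
        count *= k
    return count

print(disulfide_pairings(8))  # eight cysteines, as in RNase A
```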

antiparallel beta sheet

Adjacent β strands run in opposite directions, as indicated by the arrows. Hydrogen bonds between NH and CO groups connect each amino acid to a single amino acid on an adjacent strand, stabilizing the structure.

parallel beta sheet

Adjacent β strands run in the same direction, as indicated by the arrows. Hydrogen bonds connect each amino acid on one strand with two different amino acids on the adjacent strand.

Structural information: types of information

• Chemical shift - Hα, Cα, Cβ & CO "secondary" shifts report secondary structure • Nuclear Overhauser effect (NOE) - through-space interaction that reports what's close to what • Scalar (J) couplings - through-bond interactions that report dihedral angles • Residual dipolar couplings - report inter-atomic vectors relative to the overall molecular framework

Quality of NMR structures

• Coordinate root mean square deviation - How well do the models in the ensemble agree with each other? - Low is good (expect 0.5-1 Å over backbone atoms of regular secondary structure elements) - Don't confuse with resolution of X-ray structures • Violations of experimental restraints - How well do the models agree with the experimental data? - Low is good • Quality of Ramachandran plot - How well do the models agree with what we expect protein structures to look like? - High proportion of residues in most favoured regions
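The following Python sketch shows what the coordinate RMSD measures, using plain (x, y, z) tuples. It assumes the models are already superimposed. Real ensemble analysis first fits each model onto a reference, a step omitted here. Averaging over all model pairs is one convention; RMSD to the mean structure is another. The choice of pairwise averaging and the helper names are assumptions.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (Å) between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superimposed (no fitting is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

def ensemble_rmsd(models):
    """Mean pairwise RMSD over all pairs of models in an NMR ensemble."""
    if len(models) < 2:
        raise ValueError("need at least two models")
    pairs = [(i, j) for i in range(len(models)) for j in range(i + 1, len(models))]
    return sum(rmsd(models[i], models[j]) for i, j in pairs) / len(pairs)
```

Restricting the coordinate lists to backbone atoms of regular secondary structure matches the 0.5-1 Å guideline above.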

NMR: spectrometer

measures changes in energy of magnetic dipoles of protons

Sic1 interacting with E2/E3 ligase complex

The modification of proteins with ubiquitin chains can change their localization, activity and/or stability. Although ubiquitylation requires the concerted action of ubiquitin-activating enzymes (E1s), ubiquitin-conjugating enzymes (E2s) and ubiquitin ligases (E3s), it is the E2s that have recently emerged as key mediators of chain assembly. These enzymes are able to govern the switch from ubiquitin chain initiation to elongation, regulate the processivity of chain formation and establish the topology of assembled chains, thereby determining the consequences of ubiquitylation for the modified proteins.

statistical analysis of sequence alignments can detect homology

A significant sequence similarity between two molecules implies that they are likely to have the same evolutionary origin and, therefore, similar three-dimensional structures, functions, and mechanisms. Both nucleic acid and protein sequences can be compared to detect homology. However, the possibility exists that the observed agreement between any two sequences is solely a product of chance. Because nucleic acids are composed of fewer building blocks than proteins (4 bases versus 20 amino acids), the likelihood of random agreement between two DNA or RNA sequences is significantly greater than that for protein sequences. For this reason, detection of homology between protein sequences is typically far more effective. To illustrate sequence-comparison methods, let us consider a class of proteins called the globins. Myoglobin is a protein that binds oxygen in muscle, whereas hemoglobin is the oxygen-carrying protein in blood (Chapter 7). Both proteins cradle a heme group, an iron-containing organic molecule that binds the oxygen. Each human hemoglobin molecule is composed of four heme-containing polypeptide chains, two identical α chains and two identical β chains. Here, we shall consider only the α chain. To examine the similarity between the amino acid sequence of the human α chain and that of human myoglobin (Figure 6.4), we apply a method, referred to as a sequence alignment, in which the two sequences are systematically aligned with respect to each other to identify regions of significant overlap. How can we tell where to align the two sequences? In the course of evolution, the sequences of two proteins that have an ancestor in common will have diverged in a variety of ways. Insertions and deletions may have occurred at the ends of the proteins or within the functional domains themselves. Individual amino acids may have been mutated to other residues of varying degrees of similarity.
To understand how the methods of sequence alignment take these potential sequence variations into account, let us first consider the simplest approach, where we slide one sequence past the other, one amino acid at a time, and count the number of matched residues, or sequence identities (Figure 6.5). For α-hemoglobin and myoglobin, the best alignment reveals 23 sequence identities, spread throughout the central parts of the sequences. However, careful examination of all the possible alignments and their scores suggests that important information regarding the relationship between myoglobin and hemoglobin α has been lost with this method. In particular, we see that another alignment, featuring 22 identities, is nearly as good. This alignment is shifted by six residues relative to the preceding alignment and yields identities that are concentrated toward the amino-terminal end of the sequences. By introducing a gap into one of the sequences, the identities found in both alignments will be represented (Figure 6.6). Insertion of gaps allows the alignment method to compensate for the insertions or deletions of nucleotides that may have taken place in the gene for one molecule but not the other in the course of evolution. The use of gaps substantially increases the complexity of sequence alignment because a vast number of possible gaps, varying in both position and length, must be considered throughout each sequence. Moreover, the introduction of an excessive number of gaps can yield an artificially high number of identities. Nevertheless, methods have been developed for the insertion of gaps in the automatic alignment of sequences. These methods use scoring systems to compare different alignments, including penalties for gaps to prevent the insertion of an unreasonable number of them. For example, in one scoring system, each identity between aligned sequences is counted as +10 points, whereas each gap introduced, regardless of size, counts for −25 points.
For the alignment shown in Figure 6.6, there are 38 identities (38 × 10 = 380) and 1 gap (1 × −25 = −25), producing a score of 380 + (−25) = 355. Overall, there are 38 matched amino acids in an average length of 147 residues; thus, the sequences are 25.9% identical. Next, we must determine the significance of this score and level of identity.
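The scoring system described above can be sketched in Python. Identities score +10 and each gap scores −25 regardless of its length. Mismatched columns score nothing, since the text assigns them no points. The sketch treats a gap as a maximal run of '-' characters in either aligned sequence; that detail, and the function names, are assumptions for illustration.

```python
import re

def count_gaps(aligned: str) -> int:
    """Number of gaps = maximal runs of '-' (gap length does not matter)."""
    return len(re.findall(r"-+", aligned))

def alignment_score(aln1: str, aln2: str, identity: int = 10, gap: int = -25) -> int:
    """+10 per identical column, -25 per gap; mismatches score 0."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    identities = sum(1 for a, b in zip(aln1, aln2) if a == b and a != "-")
    gaps = count_gaps(aln1) + count_gaps(aln2)
    return identity * identities + gap * gaps

# The arithmetic from the globin example: 38 identities and 1 gap.
print(38 * 10 + 1 * -25)          # alignment score
print(f"{100 * 38 / 147:.1f}%")   # percent identity over an average length of 147
```

Because a gap costs the same whatever its length, one long gap is cheaper than several short ones. This discourages scattering many small gaps to inflate the identity count.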

alpha-helix

A spiral shape constituting one form of the secondary structure of proteins, arising from a specific hydrogen-bonding structure. Can a polypeptide chain fold into a regularly repeating structure? In 1951, Linus Pauling and Robert Corey proposed two periodic structures called the α helix (alpha helix) and the β pleated sheet (beta pleated sheet). Subsequently, other structures such as the turn and omega (Ω) loop were identified. Although not periodic, these common turn or loop structures are well defined and contribute with α helices and β sheets to form the final protein structure. Alpha helices, β strands, and turns are formed by a regular pattern of hydrogen bonds between the peptide N–H and C=O groups of amino acids that are near one another in the linear sequence. Such folded segments are called secondary structure. The α helix is a coiled structure stabilised by intrachain hydrogen bonds. In evaluating potential structures, Pauling and Corey considered which conformations of peptides were sterically allowed and which most fully exploited the hydrogen-bonding capacity of the backbone NH and CO groups. The first of their proposed structures, the α helix, is a rodlike structure (Figure 2.24). A tightly coiled backbone forms the inner part of the rod and the side chains extend outward in a helical array. The α helix is stabilized by hydrogen bonds between the NH and CO groups of the main chain. In particular, the CO group of each amino acid forms a hydrogen bond with the NH group of the amino acid that is situated four residues ahead in the sequence (Figure 2.25). Thus, except for amino acids near the ends of an α helix, all the main-chain CO and NH groups are hydrogen bonded. Each residue is related to the next one by a rise, also called translation, of 1.5 Å along the helix axis and a rotation of 100 degrees, which gives 3.6 amino acid residues per turn of helix.
Thus, amino acids spaced three and four apart in the sequence are spatially quite close to one another in an α helix. In contrast, amino acids spaced two apart in the sequence are situated on opposite sides of the helix and so are unlikely to make contact. The pitch of the α helix is the length of one complete turn along the helix axis and is equal to the product of the rise (1.5 Å) and the number of residues per turn (3.6), or 5.4 Å. The screw sense of an α helix can be right-handed (clockwise) or left-handed (counterclockwise). The Ramachandran plot reveals that both the right-handed and the left-handed helices are among allowed conformations (Figure 2.26). However, right-handed helices are energetically more favorable because there is less steric clash between the side chains and the backbone. Essentially all helices found in proteins are right-handed. In schematic representations of proteins, α helices are depicted as twisted ribbons or rods (Figure 2.27). Not all amino acids can be readily accommodated in an α helix. Branching at the β-carbon atom, as in valine, threonine, and isoleucine, tends to destabilize α helices because of steric clashes. Serine, aspartate, and asparagine also tend to disrupt α helices because their side chains contain hydrogen-bond donors or acceptors in close proximity to the main chain, where they compete for main-chain NH and CO groups. Proline also is a helix breaker because it lacks an NH group and because its ring structure prevents it from assuming the φ value needed to fit into an α helix. The α-helical content of proteins ranges widely, from none to almost 100%. For example, about 75% of the residues in ferritin, a protein that helps store iron, are in α helices (Figure 2.28). Indeed, about 25% of all soluble proteins are composed of α helices connected by loops and turns of the polypeptide chain. Single α helices are usually less than 45 Å long. Many proteins that span biological membranes also contain α helices.
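The helix geometry above can be checked with a few lines of arithmetic. This Python sketch uses the stated rise (1.5 Å) and rotation (100° per residue). It derives the residues per turn and the pitch, then shows how far apart residues i and i+k sit around the helix axis. The constant and function names are illustrative only.

```python
RISE_A = 1.5                                # Å per residue along the helix axis
ROTATION_DEG = 100.0                        # rotation per residue
RESIDUES_PER_TURN = 360.0 / ROTATION_DEG    # 3.6
PITCH_A = RISE_A * RESIDUES_PER_TURN        # 5.4 Å per complete turn

def angular_separation(i: int, j: int) -> float:
    """Angle (0-180°) between residues i and j, viewed down the helix axis."""
    delta = (abs(j - i) * ROTATION_DEG) % 360
    return min(delta, 360 - delta)

for k in (1, 2, 3, 4):
    print(f"i -> i+{k}: {angular_separation(0, k):.0f}° apart, "
          f"{k * RISE_A:.1f} Å further along the axis")
```

Residue i+2 lands 160° away, nearly on the opposite face. Residues i+3 and i+4 are only 60° and 40° away, which is why they are spatially close. By the same rise, a 45 Å helix corresponds to 45 / 1.5 = 30 residues.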

X-ray crystallography : atomic detail

A technique that depends on the diffraction of an X-ray beam by the individual atoms of a crystallized molecule to study the three-dimensional structure of the molecule. Constructive and destructive interference of X-rays scattered by the regularly spaced unit cells in the crystal means that scattering is only observed at specific angles. The intensity and phase of the scattered X-rays depend on the arrangement of atoms within the unit cell. X-ray crystallography (XRC) is the experimental science determining the atomic and molecular structure of a crystal, in which the crystalline structure causes a beam of incident X-rays to diffract into many specific directions. By measuring the angles and intensities of these diffracted beams, a crystallographer can produce a three-dimensional picture of the density of electrons within the crystal. From this electron density, the mean positions of the atoms in the crystal can be determined, as well as their chemical bonds, their crystallographic disorder, and various other information. Since many materials can form crystals (such as salts, metals, minerals, semiconductors, as well as various inorganic, organic, and biological molecules), X-ray crystallography has been fundamental in the development of many scientific fields. In its first decades of use, this method determined the size of atoms, the lengths and types of chemical bonds, and the atomic-scale differences among various materials, especially minerals and alloys. The method also revealed the structure and function of many biological molecules, including vitamins, drugs, proteins and nucleic acids such as DNA. X-ray crystallography is still the primary method for characterizing the atomic structure of new materials and in discerning materials that appear similar by other experiments.
X-ray crystal structures can also account for unusual electronic or elastic properties of a material, shed light on chemical interactions and processes, or serve as the basis for designing pharmaceuticals against diseases. In a single-crystal X-ray diffraction measurement, a crystal is mounted on a goniometer. The goniometer is used to position the crystal at selected orientations. The crystal is illuminated with a finely focused monochromatic beam of X-rays, producing a diffraction pattern of regularly spaced spots known as reflections. The two-dimensional images taken at different orientations are converted into a three-dimensional model of the density of electrons within the crystal using the mathematical method of Fourier transforms, combined with chemical data known for the sample. Poor resolution (fuzziness) or even errors may result if the crystals are too small, or not uniform enough in their internal makeup. X-ray crystallography is related to several other methods for determining atomic structures. Similar diffraction patterns can be produced by scattering electrons or neutrons, which are likewise interpreted by Fourier transformation. If single crystals of sufficient size cannot be obtained, various other X-ray methods can be applied to obtain less detailed information; such methods include fiber diffraction, powder diffraction and (if the sample is not crystallized) small-angle X-ray scattering (SAXS). If the material under investigation is only available in the form of nanocrystalline powders or suffers from poor crystallinity, the methods of electron crystallography can be applied for determining the atomic structure. For all above mentioned X-ray diffraction methods, the scattering is elastic; the scattered X-rays have the same wavelength as the incoming X-ray.
By contrast, inelastic X-ray scattering methods are useful in studying excitations of the sample such as plasmons, crystal-field and orbital excitations, magnons, and phonons, rather than the distribution of its atoms. The oldest and most precise method of X-ray crystallography is single-crystal X-ray diffraction, in which a beam of X-rays strikes a single crystal, producing scattered beams. When they land on a piece of film or other detector, these beams make a diffraction pattern of spots; the strengths and angles of these beams are recorded as the crystal is gradually rotated. Each spot is called a reflection, since it corresponds to the reflection of the X-rays from one set of evenly spaced planes within the crystal. For single crystals of sufficient purity and regularity, X-ray diffraction data can determine the mean chemical bond lengths and angles to within a few thousandths of an angstrom and to within a few tenths of a degree, respectively. The atoms in a crystal are not static, but oscillate about their mean positions, usually by less than a few tenths of an angstrom. X-ray crystallography allows measuring the size of these oscillations. X-ray crystallography enables us to visualise protein structures at the atomic level and enhances our understanding of protein function. X-ray crystallography can be considered a form of microscopy. The amount of detail, or the resolution, of any microscope is limited by the wavelength of the electromagnetic radiation used. With light microscopy, where the shortest wavelength is about 300 nm, one can see individual cells and sub-cellular organelles. With electron microscopy, where the wavelength may be below 10 nm, one can see detailed cellular architecture and the shapes of large protein molecules. In order to see proteins in atomic detail, we need to work with electromagnetic radiation with a wavelength of around 0.1 nm (or 1 Å), i.e. X-rays.
X-ray crystallography was the first method developed to determine protein structure in atomic detail. This technique provides the clearest visualization of the precise three-dimensional positions of most atoms within a protein. Of all forms of radiation, x-rays provide the best resolution for the determination of molecular structures because their wavelength approximately corresponds to the length of a covalent bond. The three components in an x-ray crystallographic analysis are a protein crystal, a source of x-rays, and a detector (Figure 3.38). X-ray crystallography first requires the preparation of a protein or protein complex in crystal form, in which all protein molecules are oriented in a fixed, repeated arrangement with respect to one another. Slowly adding ammonium sulfate or another salt to a concentrated solution of protein to reduce its solubility favors the formation of highly ordered crystals (the process of salting out discussed on page 68). For example, myoglobin crystallizes in 3 M ammonium sulfate. Protein crystallization can be quite challenging: a concentrated solution of highly pure material is required and it is often difficult to predict which experimental conditions will yield the most effective crystals. Methods for screening many different crystallization conditions using a small amount of protein sample have been developed. Typically, hundreds of conditions must be tested to obtain crystals fully suitable for crystallographic studies. Nevertheless, increasingly large and complex proteins have been crystallized. For example, poliovirus, an 8500-kDa assembly of 240 protein subunits surrounding an RNA core, has been crystallized and its structure solved by x-ray methods. Crucially, proteins frequently crystallize in their biologically active configuration. Enzyme crystals may display catalytic activity if the crystals are suffused with substrate.
After a suitably pure crystal of protein has been obtained, a source of x-rays is required. A beam of x-rays of wavelength 1.54 Å is produced by accelerating electrons against a copper target. Equipment suitable for generating x-rays in this manner is available in many laboratories. Alternatively, x-rays can be produced by synchrotron radiation, the acceleration of electrons in circular orbits at speeds close to the speed of light. Synchrotron-generated x-ray beams are much more intense than those generated by electrons hitting copper. The higher intensity enables the acquisition of high-quality data from smaller crystals over shorter exposure times. Several facilities throughout the world generate synchrotron radiation, such as the Advanced Photon Source at Argonne National Laboratory outside Chicago and the Photon Factory in Tsukuba City, Japan.

torsion angles

A measure of the rotation about a bond, usually taken to lie between −180 and +180 degrees. Torsion angles are sometimes called dihedral angles.
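A torsion angle is defined by four consecutive atoms: it is the rotation about the bond between the middle two. For the backbone, φ uses C(i−1), N, Cα, C and ψ uses N, Cα, C, N(i+1). This pure-Python sketch computes the angle from coordinates using a standard atan2-based formula. The result lies in −180..+180°, and the sign is positive when the near bond must be turned clockwise to eclipse the far bond. The helper names are illustrative, and a real analysis would typically use a structure library instead.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees (-180..+180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Central bond along z; the last atom is rotated to cis (0°), 90° and trans (180°).
for last in [(1, 0, 1), (0, 1, 1), (-1, 0, 1)]:
    print(round(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), last)))
```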

peptide bonds are planar

Examination of the geometry of the protein backbone reveals several important features. First, the peptide bond is essentially planar (Figure 2.18). Thus, for a pair of amino acids linked by a peptide bond, six atoms lie in the same plane: the α-carbon atom and CO group of the first amino acid and the NH group and α-carbon atom of the second amino acid. The nature of the chemical bonding within a peptide accounts for the bond's planarity. The bond resonates between a single bond and a double bond. Because of this partial double-bond character, rotation about this bond is prevented and thus the conformation of the peptide backbone is constrained.

Peptide bond

Formed by a "condensation" reaction between an amine and a carboxylate to form an amide. Proteins are linear polymers formed by linking the α-carboxyl group of one amino acid to the α-amino group of another amino acid. This type of linkage is called a peptide bond or an amide bond. The formation of a dipeptide from two amino acids is accompanied by the loss of a water molecule (Figure 2.13). The equilibrium of this reaction lies on the side of hydrolysis rather than synthesis under most conditions. Hence, the biosynthesis of peptide bonds requires an input of free energy. Nonetheless, peptide bonds are quite stable kinetically because the rate of hydrolysis is extremely slow; the lifetime of a peptide bond in aqueous solution in the absence of a catalyst approaches 1000 years. A series of amino acids joined by peptide bonds forms a polypeptide chain, and each amino acid unit in a polypeptide is called a residue. A polypeptide chain has directionality because its ends are different: an α-amino group is present at one end and an α-carboxyl group at the other.
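Because each peptide bond releases one water, a chain of n residues has lost n − 1 waters relative to its free amino acids. This small Python sketch applies that bookkeeping. The input masses are supplied by the caller. Free glycine (about 75.07 Da) and water (about 18.015 Da) are used only as example values, and the function name is hypothetical.

```python
WATER_DA = 18.015  # approximate mass of the water released per peptide bond (Da)

def polypeptide_mass(free_amino_acid_masses) -> float:
    """Mass of a linear chain from the masses of its free amino acids.

    n residues are joined by n - 1 peptide bonds, each releasing one water.
    """
    n = len(free_amino_acid_masses)
    if n == 0:
        return 0.0
    return sum(free_amino_acid_masses) - (n - 1) * WATER_DA

GLYCINE_DA = 75.07  # free glycine, used only as an example input
print(f"Gly-Gly: {polypeptide_mass([GLYCINE_DA, GLYCINE_DA]):.3f} Da")
```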

Glycine Ramachandran plot

In the diagram above, the white areas correspond to conformations where atoms in the polypeptide come closer than the sum of their van der Waals radii. These regions are sterically disallowed for all amino acids except glycine, which is unique in that it lacks a side chain. The red regions correspond to conformations where there are no steric clashes, i.e. these are the allowed regions, namely the alpha-helical and beta-sheet conformations. The yellow areas show the allowed regions if slightly shorter van der Waals radii are used in the calculation, i.e. the atoms are allowed to come a little closer together. This brings out an additional region which corresponds to the left-handed alpha-helix. L-amino acids cannot form extended regions of left-handed helix, but occasionally individual residues adopt this conformation. These residues are usually glycine but can also be asparagine or aspartate, where the side chain forms a hydrogen bond with the main chain and therefore stabilises this otherwise unfavourable conformation. The 3(10) helix occurs close to the upper right of the alpha-helical region and is on the edge of the allowed region, indicating lower stability. Disallowed regions generally involve steric hindrance between the side-chain C-beta methylene group and main-chain atoms. Glycine has no side chain and therefore can adopt phi and psi angles in all four quadrants of the Ramachandran plot. Hence it frequently occurs in turn regions of proteins where any other residue would be sterically hindered.
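The Ramachandran plot has φ on the horizontal axis and ψ on the vertical axis. The quadrant reasoning above can be sketched in Python as follows. The reference (φ, ψ) values are approximate textbook figures for regular conformations and are not taken from the diagram. Flagging positive φ as "usually glycine" is a simplification of the text's point that residues with side chains rarely sit there.

```python
# Approximate textbook (phi, psi) values, in degrees, for regular conformations.
REFERENCE_CONFORMATIONS = {
    "right-handed alpha helix": (-57, -47),
    "antiparallel beta sheet": (-139, 135),
    "left-handed alpha helix": (57, 47),
}

def rama_quadrant(phi: float, psi: float) -> str:
    """Name the Ramachandran quadrant (phi on the x axis, psi on the y axis)."""
    side = "left" if phi < 0 else "right"
    level = "upper" if psi >= 0 else "lower"
    return f"{level}-{side}"

for name, (phi, psi) in REFERENCE_CONFORMATIONS.items():
    # Positive phi is rarely adopted by residues other than glycine.
    note = " (usually Gly, occasionally Asn/Asp)" if phi > 0 else ""
    print(f"{name}: {rama_quadrant(phi, psi)}{note}")
```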

Why do some molecules absorb L or R better?

Many amino acids exist as chiral molecules (all the standard amino acids except glycine). Chiral molecules exist as pairs of mirror images, or enantiomers: there is no symmetry operation in 3D space that can be performed on one enantiomer to make it overlay the other. Enantiomerism is a special form of isomerism. The physical properties of enantiomers are identical in every way except two: how the molecules interact with polarized light and how they interact with other chiral molecules. Circular dichroism (CD), measured as a function of wavelength, is the difference in absorbance of left-handed circularly polarized light (L-CPL) and right-handed circularly polarized light (R-CPL). This difference can be detected when a chiral molecule contains one or more light-absorbing groups, so-called chiral chromophores. • A helical displacement of charge = an optically active transition

Domains and function

Some common types of domains perform similar functions in many of the proteins that contain them; for example, SH2 domains are used to bind phospho-Tyr-containing peptides on other proteins. A domain is a conserved part of a protein sequence and (tertiary) structure that can evolve, function and exist independently of the rest of the protein chain. Domains are independently folding units of a polypeptide chain that usually carry a specific function, and they vary from about 25 to 500 amino acids in length. Many proteins consist of different domains linked together; ~75% of eukaryotic proteins have multiple structural domains. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Because domains fold independently, it is 'easy' to move them between different proteins, either by evolution or by genetic engineering. In molecular evolution such domains may have been utilised as building blocks, and may have been recombined in different arrangements to modulate protein function. Genome sequencing information and structural studies (bioinformatics) have revolutionised our understanding of protein structure. Domains begin to fold as soon as they have been synthesised. For 'small' proteins, folding of the whole chain as one unit is feasible; a larger protein, e.g. 1000 amino acids, is unlikely ever to fold into its final structure as a single unit. Sub-dividing the structure into domains that fold autonomously means that each domain reaches its energy minimum (stable final conformation), sometimes while the rest of the protein is still being synthesised on the ribosome. 'Linker' regions between domains are often unstructured and act like flexible hinges. • While identifying structures like α-helix, β-sheet or domains (SH2 etc.) is straightforward, defining key residues within (for example) the active site of an enzyme can be difficult. This is particularly true in enzymes, because key residues at the active site may be widely spaced in the primary sequence but end up close together in the folded protein.

What's this phase thing?

Waves are sine/cosine functions, e.g. cos(0°) = 1, cos(90°) = 0, cos(180°) = −1. The phase angle describes whether the wave arriving at a spot on the detector is at a peak, at a trough, or somewhere in between. When waves that are in phase add together, we get constructive interference, which gives the spots. The phase cannot be detected in X-ray crystallography; only the intensity of each spot is measured. The arrangement of atoms is related to the diffraction pattern by a Fourier transform.
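The addition of waves can be sketched in Python by treating each scattered wave as a complex number f·e^(iφ), with amplitude f and phase φ. Waves in phase add up (constructive interference), and waves 180° apart cancel. The detector records only the intensity, |total|². The last two cases show that different phase combinations can give the same intensity, which is why the phase information is lost. The function name and the unit amplitudes are illustrative assumptions.

```python
import cmath
import math

def combine_waves(amplitudes, phases_deg):
    """Sum waves of equal wavelength as complex numbers f * exp(i * phase).

    Returns (amplitude, phase in degrees) of the resulting wave.
    """
    total = sum(f * cmath.exp(1j * math.radians(p))
                for f, p in zip(amplitudes, phases_deg))
    return abs(total), math.degrees(cmath.phase(total))

# Constructive (in phase) and destructive (180° apart) interference.
for phases in ([0, 0], [0, 180]):
    amp, _ = combine_waves([1.0, 1.0], phases)
    print(f"phases {phases}: intensity {amp ** 2:.2f}")

# Same measured intensity, different phase: the detector cannot tell them apart.
for phases in ([0, 90], [0, -90]):
    amp, phase = combine_waves([1.0, 1.0], phases)
    print(f"phases {phases}: intensity {amp ** 2:.2f}, resulting phase {phase:.0f}°")
```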

Simple Rule for Scoring Alignments

We give a score to each possible column, then add scores of an alignment's columns. Let a match (column with identical symbols) score 1 and each other column score −1. For example:
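As an illustration of this rule, here is a minimal Python sketch. The function name and the example strings are hypothetical. Each column with identical symbols scores +1 and every other column, including a column with a gap, scores −1. The column scores are then summed.

```python
def simple_score(aln1: str, aln2: str) -> int:
    """+1 for each column with identical symbols, -1 for every other column."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 if a == b else -1 for a, b in zip(aln1, aln2))

print(simple_score("GATTACA", "GCT-ACA"))  # 5 matching columns, 2 others
```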

NMR

We can garner even more information by examining how the spins on different protons affect their neighbors. By inducing a transient magnetization in a sample through the application of a radio-frequency pulse, we can alter the spin on one nucleus and examine the effect on the spin of a neighboring nucleus. Especially revealing is a two-dimensional spectrum obtained by nuclear Overhauser enhancement spectroscopy (NOESY), which graphically displays pairs of protons that are in close proximity, even if they are not close together in the primary structure. The basis for this technique is the nuclear Overhauser effect (NOE), an interaction between nuclei that is proportional to the inverse sixth power of the distance between them. Magnetization is transferred from an excited nucleus to an unexcited one if the two nuclei are less than about 5 Å apart (Figure 3.45A). In other words, the effect provides a means of detecting the location of atoms relative to one another in the three-dimensional structure of the protein. The peaks that lie along the diagonal of a NOESY spectrum (shown in white in Figure 3.45B) correspond to those present in a one-dimensional NMR experiment. The peaks apart from the diagonal (shown in red in Figure 3.45B), referred to as off-diagonal peaks or cross-peaks, provide crucial new information: they identify pairs of protons that are less than 5 Å apart. A two-dimensional NOESY spectrum for a protein comprising 55 amino acids is shown in Figure 3.46. The large number of off-diagonal peaks reveals short proton-proton distances. The three-dimensional structure of a protein can be reconstructed with the use of such proximity relations. Structures are calculated such that protons that must be separated by less than 5 Å on the basis of NOESY spectra are close to one another in the three-dimensional structure (Figure 3.47). If a sufficient number of distance constraints are applied, the three-dimensional structure can be determined nearly uniquely.
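To make the inverse-sixth-power dependence concrete, the sketch below compares NOE intensities relative to an assumed reference distance of 2.5 Å and filters a hypothetical set of proton-proton distances with the ~5 Å cutoff mentioned above. The proton labels and distances are invented for illustration, and a real NOESY analysis would also need calibration and relaxation corrections.

```python
def relative_noe(r_angstrom: float, r_ref_angstrom: float = 2.5) -> float:
    """NOE intensity at distance r relative to a reference distance, assuming I is proportional to r**-6."""
    return (r_ref_angstrom / r_angstrom) ** 6

def noe_contacts(distances: dict, cutoff_angstrom: float = 5.0) -> list:
    """Proton pairs closer than the cutoff, i.e. pairs expected to give NOESY cross-peaks."""
    return sorted(pair for pair, r in distances.items() if r < cutoff_angstrom)

# Hypothetical inter-proton distances in Å
distances = {("H1", "H7"): 2.8, ("H2", "H15"): 4.6, ("H3", "H9"): 6.2, ("H4", "H30"): 3.5}

for r in (2.5, 3.5, 5.0):
    print(f"r = {r} Å -> relative NOE {relative_noe(r):.4f}")
print(noe_contacts(distances))
```

Doubling the distance from 2.5 Å to 5 Å weakens the effect 2^6 = 64-fold, which is why cross-peaks effectively vanish beyond about 5 Å.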
In practice, a family of related structures is generated by NMR spectroscopy for three reasons (Figure 3.48). First, not enough constraints may be experimentally accessible to fully specify the structure. Second, the distances obtained from analysis of the NOESY spectrum are only approximate. Finally, the experimental observations are made not on single molecules but on a large number of molecules in solution that may have slightly different structures at any given moment. Thus, the family of structures generated from NMR structure analysis indicates the range of conformations for the protein in solution. At present, structural determination by NMR spectroscopy is generally limited to proteins less than 50 kDa, but its resolving power is certain to increase. The power of NMR has been greatly enhanced by the ability of recombinant DNA technology to produce proteins labeled uniformly or at specific sites with 13C, 15N, and 2H (Chapter 5). The structures of nearly 97,000 proteins had been elucidated by x-ray crystallography and NMR spectroscopy by the end of 2013 and several new structures are now determined each day. The coordinates are collected at the Protein Data Bank (www.pdb.org), and the structures can be accessed for visualization and analysis. Knowledge of the detailed molecular architecture of proteins has been a source of insight into how proteins recognize and bind other molecules, how they function as enzymes, how they fold, and how they evolved. This extraordinarily rich harvest is continuing at a rapid pace and is greatly influencing the entire field of biochemistry as well as other biological and physical sciences.

databases can be searched to identify homologous sequences

When the sequence of a protein is first determined, comparing it with all previously characterized sequences can be a source of tremendous insight into its evolutionary relatives and, hence, its structure and function. Indeed, an extensive sequence comparison is almost always the first analysis performed on a newly elucidated sequence. The sequence-alignment methods just described are used to compare an individual sequence with all members of a database of known sequences. Database searches for homologous sequences are most often accomplished by using resources available on the Internet at the National Center for Biotechnology Information (www.ncbi.nih.gov). The procedure used is referred to as a BLAST (Basic Local Alignment Search Tool) search. An amino acid sequence is typed or pasted into the Web browser, and a search is performed, most often against a nonredundant database of all known sequences. At the end of 2013, this database included more than 35 million sequences. A BLAST search yields a list of sequence alignments, each accompanied by an estimate giving the likelihood that the alignment occurred by chance (Figure 6.14). In 1995, investigators reported the first complete sequence of the genome of a free-living organism, the bacterium Haemophilus influenzae. With the sequences available, they performed a BLAST search with each deduced protein sequence. Of 1743 identified protein-coding regions, also called open reading frames (ORFs), 1007 (58%) could be linked to some protein of known function that had been previously characterized in another organism. An additional 347 ORFs could be linked to sequences in the database for which no function had yet been assigned ("hypothetical proteins"). The remaining 389 sequences did not match any sequence present in the database at that time. Thus, investigators were able to identify likely functions for more than half the proteins within this organism solely by sequence comparisons.
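The chance estimate that BLAST reports is an E-value. A common textbook form is the Karlin-Altschul formula E = K·m·n·e^(-λS), the expected number of chance local alignments scoring at least S when a query of length m is searched against a database of total length n. The sketch below uses assumed values of λ and K (roughly those reported for gapped BLOSUM62 searches) and a hypothetical database size. Real BLAST additionally applies effective-length corrections, so treat these numbers as illustrative only.

```python
import math

def karlin_altschul_evalue(raw_score: float, query_len: int, db_len: int,
                           lam: float, k: float) -> float:
    """Expected number of chance local alignments with score >= raw_score: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Assumed, illustrative parameters (not taken from the text above)
LAMBDA, K = 0.267, 0.041          # approximate gapped BLOSUM62 statistics
QUERY_LEN, DB_LEN = 300, 10**10   # hypothetical query length and database size in residues

for score in (40, 80, 120):
    e = karlin_altschul_evalue(score, QUERY_LEN, DB_LEN, LAMBDA, K)
    print(f"S = {score:3d}  E = {e:.2e}")
```

Because E grows in proportion to database size, the same raw score becomes less significant as databases expand. This is one reason the growth to tens of millions of sequences matters for interpreting hits.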

summary

X-ray crystallography and nuclear magnetic resonance spectroscopy have greatly enriched our understanding of how proteins fold, recognize other molecules, and catalyze chemical reactions. X-ray crystallography is possible because electrons scatter x-rays. The diffraction pattern produced can be analyzed to reveal the arrangement of atoms in a protein. The three-dimensional structures of tens of thousands of proteins are now known in atomic detail. Nuclear magnetic resonance spectroscopy reveals the structure and dynamics of proteins in solution. The chemical shift of nuclei depends on their local environment. Furthermore, the spins of neighboring nuclei interact with each other in ways that provide definitive structural information. This information can be used to determine complete three-dimensional structures of proteins.

The Phase Problem in X-ray Crystallography

X-ray crystallography can provide detailed information about the structure of biological molecules if the 'phase problem' can be solved for the molecule under study. The phase problem arises because only the amplitudes of the diffracted waves can be obtained, from the measured intensities of the diffraction spots; information on the phase of the diffracted radiation is lost. Techniques are available to reconstruct this information, for example molecular replacement, isomorphous replacement and anomalous dispersion methods.
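A one-dimensional toy model shows why the missing phases matter. In this sketch, a hypothetical "electron density" made of two Gaussian atoms is Fourier transformed into complex structure factors. Inverting with the correct phases recovers the density. Keeping the same amplitudes but setting every phase to zero gives a map that has identical amplitudes (so it would give the same measured intensities) yet is a different density. This is a teaching model only, not a crystallographic calculation.

```python
import numpy as np

n = 64
x = np.arange(n)
# Hypothetical 1D "electron density": two Gaussian atoms of different weight
density = np.exp(-0.5 * ((x - 20) / 2.0) ** 2) + 0.6 * np.exp(-0.5 * ((x - 41) / 2.0) ** 2)

F = np.fft.rfft(density)   # complex structure factors
amplitudes = np.abs(F)     # recoverable from measured intensities (I proportional to |F|^2)
phases = np.angle(F)       # not recorded by the detector

# Correct amplitudes + correct phases: the inverse transform returns the original density
rebuilt = np.fft.irfft(amplitudes * np.exp(1j * phases), n=n)

# Same amplitudes, all phases set to zero: same intensities, different map
zero_phase_map = np.fft.irfft(amplitudes, n=n)

print(np.allclose(rebuilt, density))
print(np.allclose(np.abs(np.fft.rfft(zero_phase_map)), amplitudes))
print(np.allclose(zero_phase_map, density))
```

Methods such as molecular replacement or isomorphous replacement supply phase estimates that play the role of `phases` here.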

targeting signals

Targeting signals are the pieces of information that enable the cellular transport machinery to correctly position a protein inside or outside the cell. This information is contained in the polypeptide chain or in the folded protein. The continuous stretch of amino acid residues in the chain that enables targeting is called a signal peptide or targeting peptide. There are two types of targeting peptides: presequences and internal targeting peptides. Presequences are often found as N-terminal extensions and are composed of between 6 and 136 basic and hydrophobic amino acids. For peroxisomal proteins, the targeting sequence is mostly at the C-terminus. Other signals, known as signal patches, are composed of parts that are separate in the primary sequence; they become functional when folding brings them together on the protein surface. In addition, protein modifications such as glycosylation can induce targeting.
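The basic and hydrophobic character of an N-terminal presequence can be explored with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle scale, a standard hydropathy index that the notes do not mention. It counts basic residues (K, R) near the N-terminus and locates the most hydrophobic window. The window length of 7 and the example sequence are illustrative assumptions, and this is not a validated signal-peptide predictor.

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 7) -> list[float]:
    """Mean Kyte-Doolittle value for every run of `window` consecutive residues."""
    seq = seq.upper()
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

def most_hydrophobic_window(seq: str, window: int = 7) -> tuple[int, str, float]:
    """Start index, residues and mean hydropathy of the most hydrophobic window."""
    profile = hydropathy_profile(seq, window)
    if not profile:
        raise ValueError("sequence is shorter than the window")
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, seq[start:start + window].upper(), profile[start]

def basic_residues(seq: str, first_n: int = 5) -> int:
    """Number of Lys/Arg residues among the first `first_n` positions."""
    return sum(aa in "KR" for aa in seq[:first_n].upper())

example = "MKKTAIAIAVALAGFATVAQA"  # illustrative N-terminal sequence
print(basic_residues(example), most_hydrophobic_window(example))
```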

Quaternary structure

Tertiary structure may give way to quaternary structure in some proteins. This usually involves the "assembly" or "coassembly" of subunits that have already folded; in other words, multiple polypeptide chains interact to form a fully functional quaternary protein.

Tertiary structure

Alpha helices and beta pleated sheets can be amphipathic; that is, they contain a hydrophilic portion and a hydrophobic portion. This property of secondary structures aids tertiary structure formation: the protein folds so that the hydrophilic sides face the surrounding aqueous environment and the hydrophobic sides face the hydrophobic core of the protein.[13] Secondary structure hierarchically gives way to tertiary structure formation. Once the protein's tertiary structure has formed and been stabilized by hydrophobic interactions, it may be further stabilized by covalent disulfide bridges between pairs of cysteine residues. Tertiary structure involves a single polypeptide chain; additional interactions between folded polypeptide chains give rise to quaternary structure.

Intracellular Protein Targeting

All cells contain proteins that carry out specialized functions within various subcellular membranes or aqueous spaces. Even in a simple bacterium such as E. coli, there are five possible destinations for a protein: the cytoplasm, inner membrane, periplasm, outer membrane and extracellular medium. In addition to the plasma membrane, eukaryotic cells contain membrane-bound organelles (nucleus, chloroplast, mitochondrion, peroxisome, lysosome, etc.). The vast majority of proteins are synthesised in the cytoplasm (a small number are made in mitochondria and chloroplasts). How are proteins targeted to their correct cellular compartments?

OM Lipoproteins

E. coli has more than 90 different lipoproteins, with diverse roles (signalling, OM integrity, antibiotic resistance, etc.). They share the consensus sequence Leu-(Ala/Ser)-(Gly/Ala)-Cys around the signal peptide cleavage site. Lgt (diacylglycerol transferase) transfers a diacylglycerol group from phosphatidylglycerol (PG) to the Cys residue of this motif, which becomes the N-terminal residue once the signal peptide is cleaved.
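The consensus Leu-(Ala/Ser)-(Gly/Ala)-Cys can be searched for with a regular expression. This sketch uses only the strict consensus given above (published definitions of this motif allow more variation). It reports the 0-based index of the Cys in matches near the N-terminus. The 40-residue limit and the example sequence are hypothetical.

```python
import re

# Strict consensus from the notes: Leu-(Ala/Ser)-(Gly/Ala)-Cys
LIPOBOX = re.compile(r"L[AS][GA]C")

def find_lipobox(seq: str, max_position: int = 40) -> list[tuple[int, str]]:
    """Return (0-based index of the Cys, matched motif) for consensus matches near the N-terminus."""
    hits = []
    for match in LIPOBOX.finditer(seq.upper()):
        cys_index = match.end() - 1  # the Cys is the last residue of the motif
        if cys_index < max_position:  # hypothetical cutoff for an N-terminal signal region
            hits.append((cys_index, match.group()))
    return hits

example = "MKKLLIALLLGSTLLAGCSSQETK"  # hypothetical prolipoprotein N-terminus
print(find_lipobox(example))
```

In the matched example, the Cys at index 17 is the residue that would receive the diacylglycerol group and become the new N-terminus after signal peptide cleavage.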

Models of OM protein assembly

Early models suggested that the unfolded substrate is delivered by SurA to the polypeptide transport-associated (POTRA) domains of BamA, followed by folding within the BamA barrel and release into the OM.

EchoBASE

EchoBASE (http://www.ecoli-york.org) is a relational database designed to contain and manipulate information from post-genomic experiments using the model bacterium Escherichia coli K-12. Its aim is to collate information from a wide range of sources to provide clues to the functions of the approximately 1500 gene products that have no confirmed cellular function. The database is built on an enhanced annotation of the updated genome sequence of strain MG1655 and the association of experimental data with the E. coli genes and their products. Experiments that can be held within EchoBASE include proteomics studies, microarray data, protein-protein interaction data, structural data and bioinformatics studies. EchoBASE also contains annotated information on 'orphan' enzyme activities from this microbe to aid characterization of the proteins that catalyse these elusive biochemical reactions.

Protein Folding by GroEL-GroES Chaperonins

(1) Substrate protein may be delivered to GroEL by DnaK-DnaJ in a non-aggregated but kinetically trapped state. Upon binding to GroEL, it undergoes local unfolding to an ensemble of expanded and more compact conformations.
(2) ATP-dependent movement of the apical GroEL domains results in stretching of tightly bound regions of the substrate and in release and partial compaction of less stably bound regions.
(3) Compaction is completed upon substrate encapsulation by GroES.
(4) Folding in the chaperonin cage.
(5) Substrate release upon GroES dissociation.
(6) Rebinding of incompletely folded states.

Cooperativity in protein folding: How a globally optimal state can be found without a global search

- Origin of cooperativity: the probability of forming contact C2 is much higher if C1 has already formed than in the absence of C1.
- Proteins fold by progressive stabilisation of intermediates.
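The question in the heading is usually framed with Levinthal's back-of-the-envelope estimate, which the notes do not spell out. The sketch below reproduces it under conventional, purely illustrative assumptions: 3 conformations per residue, a 100-residue chain and 10^13 conformations sampled per second. The result (on the order of 10^27 years) shows why an exhaustive global search is impossible. Cooperative, progressive stabilisation of intermediates is the way real proteins avoid such a search.

```python
def levinthal_search_years(n_residues: int = 100,
                           conformations_per_residue: int = 3,
                           conformations_per_second: float = 1e13) -> float:
    """Time (in years) to visit every backbone conformation once, under the stated toy assumptions."""
    total_conformations = conformations_per_residue ** n_residues  # exact integer count
    seconds = total_conformations / conformations_per_second
    return seconds / (365.25 * 24 * 3600)

print(f"{levinthal_search_years():.1e} years")
```

Real small proteins fold in microseconds to seconds, so folding cannot be a random exhaustive search.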

Single-particle electron microscopy

Electron microscopy (EM) in combination with image analysis is a powerful technique to study protein structures at low, medium, and high resolution. Since electron micrographs of biological objects are very noisy, improvement of the signal-to-noise ratio by image processing is an integral part of EM, and this is performed by averaging large numbers of individual projections. Averaging procedures can be divided into crystallographic and non-crystallographic methods. The crystallographic averaging method, based on two-dimensional (2D) crystals of (membrane) proteins, yielded atomic protein structures in the last century. More recently, single particle analysis has been extended to solve atomic structures as well. It is a suitable method for large proteins, viruses, and proteins that are difficult to crystallize. Because it is also a fast method for revealing low-to-medium resolution structures, the impact of its application is growing rapidly. Technical aspects, results, and possibilities are presented.
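Why averaging helps can be shown numerically. This sketch assumes the particle projections are already perfectly aligned (real single-particle analysis must first align and classify them) and that the noise is independent Gaussian noise. The 1D "projection" is a hypothetical sine profile buried in noise twice its peak height. Averaging N copies shrinks the residual noise roughly as σ/√N.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# Hypothetical noise-free 1D projection profile of a particle
signal = np.sin(np.linspace(0, 4 * np.pi, 200))
NOISE_SD = 2.0  # noise much larger than the signal, as in raw micrographs

def average_of_noisy_copies(n_particles: int) -> np.ndarray:
    """Average n aligned copies of the projection, each with independent Gaussian noise."""
    copies = signal + rng.normal(0.0, NOISE_SD, size=(n_particles, signal.size))
    return copies.mean(axis=0)

for n in (1, 16, 256, 4096):
    residual_sd = (average_of_noisy_copies(n) - signal).std()
    print(f"N = {n:5d}  residual noise SD = {residual_sd:.3f}  "
          f"(sigma/sqrt(N) = {NOISE_SD / np.sqrt(n):.3f})")
```

Because improvement goes only as √N, a 10-fold better signal-to-noise ratio needs about 100 times more particles. This is why large particle numbers are averaged.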

Post-translational translocation

Even though most secretory proteins are co-translationally translocated, some are translated in the cytosol and later transported to the ER/plasma membrane by a post-translational system. In prokaryotes, this requires cofactors such as SecA and SecB. In eukaryotes, the pathway is facilitated by Sec62 and Sec63, two membrane-bound proteins; the Sec63 complex is embedded in the ER membrane. The Sec63 complex stimulates ATP hydrolysis by a chaperone in the ER lumen (BiP), which allows it to bind to the exposed polypeptide chain and slide the polypeptide into the ER lumen. Once in the lumen, the polypeptide chain can fold properly. Only proteins that remain unfolded in the cytosol can use this pathway.[3] In addition, proteins targeted to other destinations, such as mitochondria, chloroplasts, or peroxisomes, use specialized post-translational pathways. Proteins targeted to the nucleus are also translocated post-translationally; they pass through the nuclear envelope via nuclear pores.

Transmissible Spongiform Encephalopathies (Prion diseases)

Examples: bovine spongiform encephalopathy (BSE, "mad cow disease"), scrapie (sheep), kuru (humans), Creutzfeldt-Jakob disease (CJD, humans), chronic wasting disease (elk, mule deer).
Fatal neurodegenerative diseases, with characteristic "holes" appearing in the brain ("sponge"-like appearance), accompanied by the formation of amyloid plaques.
In prion diseases, a normal cellular protein (function often unknown; different proteins are involved in different prion diseases) also occurs in an abnormal conformation. The infectious agent is an abnormal protein, a "prion" ("proteinaceous infectious particle") (Stanley Prusiner, Nobel Prize in Physiology or Medicine 1997).
1. Transmissible agent: aggregates of various sizes of a specific protein.
2. Aggregates are resistant to treatment by most protein-degrading enzymes.
3. The protein is derived from a cellular protein.

Homology Modelling

Experimental elucidation of a protein structure may often be delayed by difficulties in obtaining sufficient material (cloning, expression and purification of milligram quantities of the protein) and by difficulties associated with crystallisation. It is therefore not surprising that methods for predicting protein structure have gained much interest. Among these methods, homology modelling usually provides the most reliable result. It is based on the observation that two proteins belonging to the same family (and sharing similar amino acid sequences) will have similar three-dimensional structures. In fact, three-dimensional structure is much more highly conserved within a protein family than sequence is. Proteins undergo changes in their amino acid sequences over evolutionary time, as a result of the accumulation of nonsynonymous mutations in their encoding genes. Genes and proteins act as "molecular clocks", accumulating changes at a relatively constant rate, as mutations occur with a certain probability each time a nucleotide is replicated. From the very beginning of molecular evolution studies, it became apparent that different proteins evolve at very different rates, each according to its own "molecular clock". The subsequent accumulation of molecular data for other proteins revealed a huge diversity in proteins' rates of evolution. For instance, using DNA sequence data from 36 genes in different mammalian species

Driving forces of protein folding

Folding is a spontaneous process that is mainly guided by hydrophobic interactions, formation of intramolecular hydrogen bonds and van der Waals forces, and it is opposed by conformational entropy.[15] Folding often begins co-translationally, so that the N-terminus of the protein begins to fold while the C-terminal portion is still being synthesized by the ribosome; however, a protein molecule may fold spontaneously during or after biosynthesis.[16] While these macromolecules may be regarded as "folding themselves", the process also depends on the solvent (water or lipid bilayer),[17] the concentration of salts, the pH, the temperature, and the possible presence of cofactors and molecular chaperones. Folding is also limited by the restricted bending angles, or conformations, that are possible for the backbone. These allowable angles are described by a two-dimensional plot known as the Ramachandran plot, which shows the allowed combinations of the backbone dihedral angles phi (φ) and psi (ψ).
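A Ramachandran plot places each residue by its φ and ψ angles, which are dihedral (torsion) angles defined by four consecutive backbone atoms: φ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The sketch below computes a signed dihedral angle from four 3D points with the standard atan2 formula. The coordinates in the example are simple hypothetical geometries rather than atoms from a real structure.

```python
import math

Vec = tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle (degrees, -180..180) about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical geometries: first three points fixed, fourth point varied
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
print(dihedral(a, b, c, (1.0, 1.0, 0.0)))   # cis (eclipsed)
print(dihedral(a, b, c, (-1.0, 1.0, 0.0)))  # trans
print(dihedral(a, b, c, (0.0, 1.0, 1.0)))   # perpendicular
```

Applied to real backbone coordinates (e.g. from a PDB file), the φ and ψ values for each residue give its point on the Ramachandran plot.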

bacteria lack a membrane bound nucleus // composition of cell membrane

GRAM POSITIVE: many layers of peptidoglycan surround the cytoplasmic membrane.
GRAM NEGATIVE: more complex; cytoplasm - inner membrane - periplasm (peptidoglycan layer) - outer membrane (phospholipids/lipopolysaccharides).

Organisation and Catalytic Properties of GroES- GroEL Chaperonins

GroEL (Hsp60) consists of two homoheptameric rings forming a central barrel. Each subunit has an equatorial ATPase domain and an apical substrate-binding domain at the open end of the ring. On substrate binding, ATP and heptameric GroES (Hsp10) are bound by the same ring, displacing the unfolded substrate into an enclosed cavity. ATP hydrolysis powers protein refolding in this isolated environment. After ATP turnover, GroES and the substrate are released. Some polypeptides require several cycles of binding and release to reach their native states. The eukaryotic equivalent of GroEL is TRiC, a two-ring barrel whose rings are each made of 8 distinct subunits.
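The point that some polypeptides need several binding/release cycles can be explored with a toy probabilistic model. This sketch assumes, purely for illustration, that each GroEL-GroES cycle independently yields the native state with a fixed probability p. Under that assumption, the number of cycles follows a geometric distribution with mean 1/p. The probabilities used are hypothetical.

```python
import random

def cycles_until_native(p_native_per_cycle: float, rng: random.Random) -> int:
    """Count binding/release cycles until the substrate reaches its native state (toy model)."""
    if not 0.0 < p_native_per_cycle <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    cycles = 1
    while rng.random() >= p_native_per_cycle:  # failure: released unfolded and rebound
        cycles += 1
    return cycles

def mean_cycles(p_native_per_cycle: float, trials: int = 50_000, seed: int = 0) -> float:
    """Average number of cycles over many simulated substrate molecules."""
    rng = random.Random(seed)
    return sum(cycles_until_native(p_native_per_cycle, rng) for _ in range(trials)) / trials

for p in (0.9, 0.5, 0.2):
    print(p, round(mean_cycles(p), 2), "expected", round(1 / p, 2))
```

Real substrates need not fold with a constant per-cycle probability, so this only illustrates why poorly folding substrates go through many rounds.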

Protein Folding in vivo versus in vitro

In vitro: Anfinsen's work on RNase A (and work on numerous other proteins) indicates that all the information required for attaining the correct 3D fold is contained within the primary sequence. However, in vivo a large fraction of cellular proteins require molecular chaperones for efficient folding and to minimise aggregation. The same laws of physics apply in vitro and in vivo, so what is different? Whereas folding experiments in vitro are typically performed in dilute solution to minimise aggregation, in the cell folding occurs in the presence of 200-300 g/L of protein. Because translation is relatively slow (~15-75 s for a 300-amino-acid protein), nascent chains are exposed in partially folded, aggregation-sensitive states for prolonged periods of time. Chaperones act not by contributing steric information to the folding process but by optimizing the efficiency of folding.
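The translation figures above convert directly into elongation rates. This small calculation turns "15-75 s for a 300-amino-acid protein" into residues per second. It also estimates, for a hypothetical 1000-residue chain, how long the nascent chain would remain attached to the ribosome at those rates.

```python
def elongation_rate(chain_length_aa: int, seconds: float) -> float:
    """Average elongation rate in amino acids per second."""
    return chain_length_aa / seconds

def synthesis_time(chain_length_aa: int, rate_aa_per_s: float) -> float:
    """Seconds needed to synthesise a chain at a constant elongation rate."""
    return chain_length_aa / rate_aa_per_s

for t in (15, 75):
    rate = elongation_rate(300, t)
    print(f"{t} s for 300 aa -> {rate:.0f} aa/s; "
          f"a hypothetical 1000-aa chain would take ~{synthesis_time(1000, rate):.0f} s")
```

At 4-20 residues per second, an N-terminal domain can be exposed for tens of seconds to minutes before the C-terminus is made. That window is where chaperones act.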

Structure and Function of the HSP70 system

Hsp40 delivers substrate to ATP-bound Hsp70. ATP hydrolysis results in closing of the helical lid and tight binding of the substrate. A nucleotide exchange factor (NEF) mediates release of ADP, and ATP binding then induces lid opening and substrate release.
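The Hsp70 cycle described above behaves like a small state machine. This sketch encodes each step as an allowed (state, event) transition, so an out-of-order event raises an error. The state and event names are hypothetical labels chosen for this example. The nucleotide-free state between ADP release and ATP binding is a modelling assumption.

```python
from enum import Enum

class Hsp70State(Enum):
    ATP_OPEN_EMPTY = "ATP-bound, lid open, no substrate"
    ATP_OPEN_SUBSTRATE = "ATP-bound, lid open, substrate delivered by Hsp40"
    ADP_CLOSED_SUBSTRATE = "ADP-bound, lid closed, substrate tightly bound"
    EMPTY_CLOSED_SUBSTRATE = "nucleotide-free, lid closed, substrate bound"

# (state, event) -> next state, following the cycle in the notes
TRANSITIONS = {
    (Hsp70State.ATP_OPEN_EMPTY, "hsp40_delivers_substrate"): Hsp70State.ATP_OPEN_SUBSTRATE,
    (Hsp70State.ATP_OPEN_SUBSTRATE, "atp_hydrolysis"): Hsp70State.ADP_CLOSED_SUBSTRATE,
    (Hsp70State.ADP_CLOSED_SUBSTRATE, "nef_releases_adp"): Hsp70State.EMPTY_CLOSED_SUBSTRATE,
    # ATP binding opens the lid and releases the substrate
    (Hsp70State.EMPTY_CLOSED_SUBSTRATE, "atp_binds"): Hsp70State.ATP_OPEN_EMPTY,
}

def step(state: Hsp70State, event: str) -> Hsp70State:
    """Apply one event; reject events that are not allowed in the current state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None

def run_cycle(events: list[str], start: Hsp70State = Hsp70State.ATP_OPEN_EMPTY) -> list[Hsp70State]:
    """Return the sequence of states visited while applying the events in order."""
    history = [start]
    for event in events:
        history.append(step(history[-1], event))
    return history

cycle = ["hsp40_delivers_substrate", "atp_hydrolysis", "nef_releases_adp", "atp_binds"]
for state in run_cycle(cycle):
    print(state.value)
```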

Transport of OMPs through the periplasm

In E. coli, three chaperones have been reported to guide nascent OMPs during their intermediate periplasmic stage (Fig. 2): Skp, SurA and the protease DegP, which also has chaperone qualities (Spiess et al., 1999). Recent structural analysis showed that DegP in its activated state can form large oligomeric cage-like structures of 12 or 24 subunits that could harbour a folded OMP in its central cavity without degrading it (Krojer et al., 2008). None of these chaperones is essential in E. coli, but double mutants show synthetic, often lethal, phenotypes, suggesting redundancy in chaperone activities. Detailed analyses of single and double mutants suggested the existence of two parallel pathways of chaperone activity in the periplasm, a major SurA-dependent route and an alternative Skp- and DegP-dependent route that deals with substrates that fall off the SurA pathway (Rizzitello et al., 2001; Sklar et al., 2007b). However, skp and degP mutations have also been reported to show a synthetic phenotype (Schäfer et al., 1999), which is inconsistent with the idea that these chaperones operate within the same pathway. Furthermore, a recent proteomic analysis indicated that SurA has only a few substrates, including the OMP LptD, which is involved in LPS biogenesis, and that the reduced levels of many other OMPs in surA mutants may be solely a consequence of activation of the σE-dependent stress response (Vertommen et al., 2009). The study of Vertommen and colleagues argues against the hypothesis that the SurA pathway is the major periplasmic chaperone pathway for OMPs in the periplasm. An alternative explanation for the synthetic phenotypes of double chaperone mutants is that these proteins have different, but complementary functions (Bos et al., 2007a; Walther et al., 2009b). Skp selectively binds unfolded OMPs (Chen & Henning, 1996; de Cock et al., 1999), presumably while they are still engaged with the Sec translocon (Harms et al., 2001).
The crystal structure of this trimeric protein has been solved (Korndörfer et al., 2004; Walton & Sousa, 2004); it resembles a jellyfish that can hold nascent OMPs between its tentacles, thereby preventing their aggregation in the aqueous environment of the periplasm (Walton et al., 2009). SurA appears to play a role in the folding of OMPs into their native configuration (Lazar & Kolter, 1996; Rouvière & Gross, 1996). SurA is a peptidyl-prolyl cis/trans isomerase (PPIase) with two PPIase domains, which, however, appear to be dispensable for the chaperone qualities of the protein (Behrens et al., 2001). In this model, Skp is a 'holding chaperone' that prevents folding and aggregation of OMPs in the periplasm, whereas SurA acts as a 'folding chaperone' that assists in the folding of OMPs once they arrive at the assembly machinery in the outer membrane.

Protein folding and misfolding in the cell

In a cell, proteins are synthesized on ribosomes from the genetic information encoded in the cellular DNA. Folding in vivo is in some cases co-translational; that is, it is initiated before the completion of protein synthesis, while the nascent chain is still attached to the ribosome26. Other proteins, however, undergo the major part of their folding in the cytoplasm after release from the ribosome, whereas yet others fold in specific compartments, such as mitochondria or the endoplasmic reticulum (ER), after trafficking and translocation through membranes27,28. Many details of the folding process depend on the particular environment in which folding takes place, although the fundamental principles of folding, discussed above, are undoubtedly universal. But because incompletely folded proteins must inevitably expose to the solvent at least some regions of structure that are buried in the native state, they are prone to inappropriate interaction with other molecules within the crowded environment of a cell29. Living systems have therefore evolved a range of strategies to prevent such behaviour27,28,29. Of particular importance in this context are the many molecular chaperones that are present in all types of cells and cellular compartments. Some chaperones interact with nascent chains as they emerge from the ribosome, whereas others are involved in guiding later stages of the folding process27,28. Molecular chaperones often work in tandem to ensure that the various stages in the folding of such systems are all completed efficiently. Many of the details of the functions of molecular chaperones have been determined from studies of their effects on folding in vitro. The best characterized of the chaperones studied in this manner is the bacterial complex involving GroEL, a member of the family of 'chaperonins', and its 'co-chaperone' GroES. Many aspects of the sophisticated mechanism through which this coupled system functions are now well understood27,28.
Of particular interest is that GroEL, and other members of this class of molecular chaperone, contains a cavity in which incompletely folded polypeptide chains can enter and undergo the final steps in the formation of their native structures while sequestered and protected from the outside world. Molecular chaperones do not themselves increase the rate of individual steps in protein folding; rather, they increase the efficiency of the overall process by reducing the probability of competing reactions, particularly aggregation. However, there are several classes of folding catalyst that accelerate potentially slow steps in the folding process. The most important are peptidylprolyl isomerases, which increase the rate of cis-trans isomerization of peptide bonds involving proline residues, and protein disulphide isomerases, which enhance the rate of formation and reorganization of disulphide bonds30. Despite these factors, given the enormous complexity and the stochastic nature of the folding process, it would be remarkable if misfolding never occurred. Clear evidence that molecular chaperones are needed to prevent misfolding and its consequences comes from the fact that the concentrations of many of these species are substantially increased during cellular stress; indeed, the designation of many as heat shock proteins (Hsps) reflects this fact. It is also clear that some molecular chaperones are able not only to protect proteins as they fold but also to rescue misfolded and even aggregated proteins and enable them to have a second chance to fold correctly27,28. Active intervention in the folding process requires energy, and ATP is required for most of the molecular chaperones to function with full efficiency. In eukaryotic systems, many of the proteins that are synthesized in a cell are destined for secretion to the extracellular environment. These proteins are translocated into the ER, where folding takes place before secretion through the Golgi apparatus. 
The ER contains a wide range of molecular chaperones and folding catalysts, and in addition the proteins that fold here must satisfy a 'quality-control' check before being exported (Fig. 2)31,32. Such a process is particularly important because there seem to be few molecular chaperones outside the cell, although one (clusterin), at least, has recently been discovered33. This quality-control mechanism involves a remarkable series of glycosylation and deglycosylation reactions that enables correctly folded proteins to be distinguished from misfolded ones31. The importance of these regulatory systems is underlined by recent experiments that suggest that a large fraction of all polypeptide chains synthesized in a cell fail to pass this test and are targeted for degradation34. Like the 'heat shock response' in the cytoplasm, the 'unfolded protein response' in the ER is also stimulated (upregulated) during stress and, as we shall see below, is strongly linked to the avoidance of misfolding diseases.

Protein folding and misfolding. Christopher M. Dobson. Nature 426, 884-890 (2003). Published: 18 December 2003.

Abstract

The manner in which a newly synthesized chain of amino acids transforms itself into a perfectly folded protein depends both on the intrinsic properties of the amino-acid sequence and on multiple contributing influences from the crowded cellular milieu. Folding and unfolding are crucial ways of regulating biological activity and targeting proteins to different cellular locations. Aggregation of misfolded proteins that escape the cellular quality-control mechanisms is a common feature of a wide range of highly debilitating and increasingly prevalent diseases.
One of the defining characteristics of a living system is the ability of even the most intricate of its component molecular structures to self-assemble with precision and fidelity. Uncovering the mechanisms through which such processes take place is one of the grand challenges of modern science1. The folding of proteins into their compact three-dimensional structures is the most fundamental and universal example of biological self-assembly; understanding this complex process will therefore provide a unique insight into the way in which evolutionary selection has influenced the properties of a molecular system for functional advantage. The wide variety of highly specific structures that result from protein folding and that bring key functional groups into close proximity has enabled living systems to develop astonishing diversity and selectivity in their underlying chemical processes. In addition to generating biological activity, however, we now know that folding is coupled to many other biological processes, including the trafficking of molecules to specific cellular locations and the regulation of cellular growth and differentiation2. In addition, only correctly folded proteins have long-term stability in crowded biological environments and are able to interact selectively with their natural partners. It is therefore not surprising that the failure of proteins to fold correctly, or to remain correctly folded, is the origin of a wide variety of pathological conditions. In this article we explore the underlying mechanism of protein folding and of the nature and consequences of misfolding and its links with disease.

The fundamental mechanism of protein folding

The concept of an energy landscape

The mechanism by which a polypeptide chain folds to a specific three-dimensional protein structure has until recently been shrouded in mystery.
Native states of proteins almost always correspond to the structures that are most thermodynamically stable under physiological conditions3. Nevertheless, the total number of possible conformations of any polypeptide chain is so large that a systematic search for this particular structure would take an astronomical length of time. However, it is now clear that the folding process does not involve a series of mandatory steps between specific partly folded states, but rather a stochastic search of the many conformations accessible to a polypeptide chain3,4,5. The inherent fluctuations in the conformation of an unfolded or incompletely folded polypeptide chain enable even residues that are highly separated in the amino-acid sequence to come into contact with one another. Because, on average, native-like interactions between residues are more stable than non-native ones, they are more persistent and the polypeptide chain is able to find its lowest-energy structure by a process of trial and error. Moreover, if the energy surface or 'landscape' has the right shape (see Fig. 1) only a small number of all possible conformations needs to be sampled by any given protein molecule during its transition from a random coil to a native structure3,4,5,6. Because the landscape is encoded by the amino-acid sequence, natural selection has enabled proteins to evolve so that they are able to fold rapidly and efficiently.

Figure 1: A schematic energy landscape for protein folding. The surface is derived from a computer simulation of the folding of a highly simplified model of a small protein. The surface 'funnels' the multitude of denatured conformations to the unique native structure. The critical region on a simple surface such as this one is the saddle point corresponding to the transition state, the barrier that all molecules must cross if they are to fold to the native state.
Superimposed on this schematic surface are ensembles of structures corresponding to different stages of the folding process. The transition state ensemble was calculated by using computer simulations constrained by experimental data from mutational studies of acylphosphatase18. The yellow spheres in this ensemble represent the three 'key residues' in the structure; when these residues have formed their native-like contacts the overall topology of the native fold is established. The structure of the native state is shown at the bottom of the surface; at the top are indicated schematically some contributors to the distribution of unfolded species that represent the starting point for folding. Also indicated on the surface are highly simplified trajectories for the folding of individual molecules. Adapted from ref. 6.

Such a description, based more on the ideas of statistical mechanics and polymer physics than on those of classic chemical dynamics, is often referred to as the 'new view' of protein folding7. As well as providing a firm conceptual basis for folding, it has shown that many of the earlier phenomenological descriptions of the folding process are important limiting cases of a general mechanism. These ideas are stimulating the investigation of the most elementary steps in the folding process by both experimental and theoretical procedures. For instance, biophysical measurements and computer simulations have revealed that many of the local elements of protein structures can be generated very rapidly: individual α-helices are able to form in less than 100 ns, and β-turns in as little as 1 µs (refs 8, 9). Indeed, the folding in vitro of some of the simplest proteins, such as small helical bundles, is completed in less than 50 µs (refs 10, 11).
Intriguingly, some other small proteins, particularly those based on β-sheet structures, can take many orders of magnitude longer to fold, as we see below, but such rate differences can be understood to a significant extent in terms of the characteristics of the native structures12. A key question is how the correct fold emerges from such fundamental steps; that is, how the energy landscape unique to a specific protein is defined by its amino-acid sequence. The structural transitions taking place during folding in vitro can be investigated in detail by a variety of techniques, ranging from optical methods to NMR spectroscopy3, some of which can now even be used to follow the behaviour of single molecules13. The latter capability is of particular significance in the context of probing the stochastic nature of the folding process (see Fig. 1). Studies of a series of small proteins, typically with 60-100 residues, have been crucial for investigating the most basic steps in folding because these proteins convert from their unfolded states to their native states without the complication of highly populated intermediates. For these systems, monitoring the effects of specific mutations on the kinetics of folding and unfolding has proved to be a seminal technique, because of its ability to probe the role of individual residues in the folding process14. Particular insight has come from the use of this approach to analyse the transition states for folding, namely the critical regions of energy surfaces through which all molecules must pass to reach the native fold (see Fig. 1). The results of many studies of these species suggest that the fundamental mechanism of protein folding involves the interaction of a relatively small number of residues to form a folding nucleus, about which the remainder of the structure rapidly condenses15.
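The mutational approach described above (measuring how a point mutation changes folding and unfolding rates) is commonly quantified as a Φ-value. The sketch below is an illustration under stated assumptions rather than a method taken from this article: it assumes simple two-state kinetics, so that the equilibrium constant is K = kf/ku, and computes Φ as the mutation's effect on the folding activation free energy divided by its effect on the folding equilibrium free energy. The function names and rate constants are hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def ddg_kj(k_ref, k_other, temperature_k=298.15):
    """Free-energy difference RT * ln(k_ref / k_other), in kJ/mol."""
    return R * temperature_k * math.log(k_ref / k_other) / 1000.0

def phi_value(kf_wt, ku_wt, kf_mut, ku_mut, temperature_k=298.15):
    """Phi = ddG(transition state - unfolded) / ddG(native - unfolded).

    Assumes two-state folding, so the equilibrium constant is K = kf / ku.
    Phi near 1: the residue's native contacts are already formed in the
    transition state. Phi near 0: they form only after it.
    """
    ddg_ts = ddg_kj(kf_wt, kf_mut, temperature_k)
    ddg_eq = ddg_kj(kf_wt / ku_wt, kf_mut / ku_mut, temperature_k)
    if abs(ddg_eq) < 1e-9:
        raise ValueError("mutation does not change stability; Phi is undefined")
    return ddg_ts / ddg_eq

# Hypothetical rates (s^-1): the mutant folds 10x slower and unfolds 10x faster.
print(round(phi_value(kf_wt=100.0, ku_wt=0.1, kf_mut=10.0, ku_mut=1.0), 3))
```

Here the mutation costs half as much free energy in the transition state as in the native state, which is the kind of intermediate value expected for a residue that is only partly structured in the folding nucleus.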
More details of how such a mechanism is able to generate a unique fold have emerged from a range of theoretical studies, particularly involving computer simulation techniques16. Of particular significance are investigations that compare the simulation results with experimental observations6,17. One approach incorporates experimental measurements directly into the simulations as restraints limiting the regions of conformational space that are explored in each simulation; this strategy has enabled rather detailed structures to be generated for transition states18 (see Fig. 1). The results suggest that, despite a high degree of disorder, these structures have the same overall topology as the native fold. In essence, interactions involving the key residues force the chain to adopt a rudimentary native-like architecture. Although it is not yet clear exactly how the sequence encodes such characteristics, the essential elements of the fold are likely to be determined primarily by the pattern of hydrophobic and polar residues that favours preferential interactions of specific residues as the structure becomes increasingly compact. Once the correct topology has been achieved, the native structure will then almost invariably be generated during the final stages of folding18. Conversely, if these key interactions are not formed, the protein cannot fold to a stable globular structure; this mechanism therefore acts also as a 'quality-control' process by which misfolding can generally be avoided.

The determinants of protein folds

Secondary structure, the helices and sheets that are found in nearly every native protein structure, is stabilized primarily by hydrogen bonding between the amide and carbonyl groups of the main chain. The formation of such structure is an important element in the overall folding process, although it might not have as fundamental a role as the establishment of the overall chain topology19.
Perhaps the most dramatic evidence for such a conclusion is the observation of a remarkable correlation between the experimental folding rates of a wide range of small proteins and the complexity of their folds, measured by the contact order12. The latter is the average separation in the sequence between residues that are in contact with each other in the native structure. The existence of such a correlation can be rationalized by the argument that a stochastic search process will be more time consuming if the residues that form the nucleus are further away from each other in the sequence. This evidence strongly supports the conclusion that there are relatively simple underlying principles by which the sequence of a protein encodes its structure20. Not only will the establishment of such principles reveal in more depth how proteins are able to fold, but it should significantly advance our ability to predict protein folds directly from their sequences and to design sequences that encode novel folds. For proteins with more than about 100 residues, experiments generally reveal that one or more intermediates are significantly populated during the folding process. There has, however, been considerable discussion about the significance of such species: whether they assist the protein to find its correct structure or whether they are traps that inhibit the folding process21,22,23. Regardless of the outcome of this debate, the structural properties of intermediates provide important evidence about the folding of these larger proteins. In particular, they suggest that these proteins generally fold in modules; in other words, folding can take place largely independently in different segments or domains of the protein6,14. In such cases, interactions involving key residues are likely to establish the native-like fold within local regions or domains and also to ensure that the latter then interact appropriately to form the correct overall structure23,24.
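The contact order defined above (the average sequence separation between residues in contact in the native structure) is straightforward to compute once the native contacts are known. A minimal sketch, assuming the contacts are already given as residue-index pairs; in practice they would be derived from a structure with a distance cut-off that is not specified here. The optional division by chain length L follows the commonly used "relative contact order", RCO = (1/(L·N)) Σ|i − j| over N contacts. The contact lists are hypothetical.

```python
def contact_order(contacts, chain_length=None):
    """Average sequence separation |i - j| over native contacts.

    contacts: iterable of (i, j) residue-index pairs in contact in the native state.
    If chain_length is given, return the relative contact order (divided by L).
    """
    contacts = list(contacts)
    if not contacts:
        raise ValueError("need at least one contact")
    mean_sep = sum(abs(i - j) for i, j in contacts) / len(contacts)
    return mean_sep / chain_length if chain_length else mean_sep

# Hypothetical contacts for a 20-residue chain.
short_range = [(1, 4), (5, 8), (10, 13)]   # helix-like, local contacts
long_range = [(1, 18), (3, 16), (5, 14)]   # sheet-like, distant contacts
print(contact_order(short_range, 20), contact_order(long_range, 20))
```

Under the correlation described in the text, a protein whose native contacts resemble the second list (higher contact order) would be expected to fold more slowly than one resembling the first.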
The fully native structure is only acquired when all the native-like interactions have been formed both within and between the domains; this happens in a final cooperative folding step when all the side chains become locked in their unique close-packed arrangement and water is excluded from the protein core25. This modular mechanism is appealing because it suggests that highly complex structures might be assembled in manageable pieces. Moreover, such a principle can readily be extended to describe the assembly of other macromolecules, particularly nucleic acids, and even large 'molecular machines' such as the ribosome.

Protein folding and misfolding in the cell

In a cell, proteins are synthesized on ribosomes from the genetic information encoded in the cellular DNA. Folding in vivo is in some cases co-translational; that is, it is initiated before the completion of protein synthesis, while the nascent chain is still attached to the ribosome26. Other proteins, however, undergo the major part of their folding in the cytoplasm after release from the ribosome, whereas yet others fold in specific compartments, such as mitochondria or the endoplasmic reticulum (ER), after trafficking and translocation through membranes27,28. Many details of the folding process depend on the particular environment in which folding takes place, although the fundamental principles of folding, discussed above, are undoubtedly universal. But because incompletely folded proteins must inevitably expose to the solvent at least some regions of structure that are buried in the native state, they are prone to inappropriate interaction with other molecules within the crowded environment of a cell29. Living systems have therefore evolved a range of strategies to prevent such behaviour27,28,29. Of particular importance in this context are the many molecular chaperones that are present in all types of cells and cellular compartments.
Some chaperones interact with nascent chains as they emerge from the ribosome, whereas others are involved in guiding later stages of the folding process27,28. Molecular chaperones often work in tandem to ensure that the various stages in the folding of such systems are all completed efficiently. Many of the details of the functions of molecular chaperones have been determined from studies of their effects on folding in vitro. The best characterized of the chaperones studied in this manner is the bacterial complex involving GroEL, a member of the family of 'chaperonins', and its 'co-chaperone' GroES. Many aspects of the sophisticated mechanism through which this coupled system functions are now well understood27,28. Of particular interest is that GroEL, and other members of this class of molecular chaperone, contains a cavity in which incompletely folded polypeptide chains can enter and undergo the final steps in the formation of their native structures while sequestered and protected from the outside world. Molecular chaperones do not themselves increase the rate of individual steps in protein folding; rather, they increase the efficiency of the overall process by reducing the probability of competing reactions, particularly aggregation. However, there are several classes of folding catalyst that accelerate potentially slow steps in the folding process. The most important are peptidylprolyl isomerases, which increase the rate of cis-trans isomerization of peptide bonds involving proline residues, and protein disulphide isomerases, which enhance the rate of formation and reorganization of disulphide bonds30. Despite these factors, given the enormous complexity and the stochastic nature of the folding process, it would be remarkable if misfolding never occurred. 
Clear evidence that molecular chaperones are needed to prevent misfolding and its consequences comes from the fact that the concentrations of many of these species are substantially increased during cellular stress; indeed, the designation of many as heat shock proteins (Hsps) reflects this fact. It is also clear that some molecular chaperones are able not only to protect proteins as they fold but also to rescue misfolded and even aggregated proteins and enable them to have a second chance to fold correctly27,28. Active intervention in the folding process requires energy, and ATP is required for most of the molecular chaperones to function with full efficiency. In eukaryotic systems, many of the proteins that are synthesized in a cell are destined for secretion to the extracellular environment. These proteins are translocated into the ER, where folding takes place before secretion through the Golgi apparatus. The ER contains a wide range of molecular chaperones and folding catalysts, and in addition the proteins that fold here must satisfy a 'quality-control' check before being exported (Fig. 2)31,32. Such a process is particularly important because there seem to be few molecular chaperones outside the cell, although one (clusterin), at least, has recently been discovered33. This quality-control mechanism involves a remarkable series of glycosylation and deglycosylation reactions that enables correctly folded proteins to be distinguished from misfolded ones31. The importance of these regulatory systems is underlined by recent experiments that suggest that a large fraction of all polypeptide chains synthesized in a cell fail to pass this test and are targeted for degradation34. Like the 'heat shock response' in the cytoplasm, the 'unfolded protein response' in the ER is also stimulated (upregulated) during stress and, as we shall see below, is strongly linked to the avoidance of misfolding diseases35.

Figure 2: Regulation of protein folding in the ER.
Many newly synthesized proteins are translocated into the ER, where they fold into their three-dimensional structures with the help of a series of molecular chaperones and folding catalysts (not shown). Correctly folded proteins are then transported to the Golgi complex and then delivered to the extracellular environment. However, incorrectly folded proteins are detected by a quality-control mechanism and sent along another pathway (the unfolded protein response) in which they are ubiquitinated and then degraded in the cytoplasm by proteasomes. Adapted from ref. 32.

Folding and unfolding are the ultimate ways of generating and abolishing specific types of cellular activity. In addition, processes as apparently diverse as translocation across membranes, trafficking, secretion, the immune response and regulation of the cell cycle are directly dependent on folding and unfolding events2. Failure to fold correctly, or to remain correctly folded, will therefore give rise to the malfunctioning of living systems and hence to disease36,37,38. Some of these diseases (such as cystic fibrosis36 and some types of cancer39) result from proteins folding incorrectly and not being able to exercise their proper function; many such disorders are familial because the probability of misfolding is often greater in mutational variants. In other cases, proteins with a high propensity to misfold escape all the protective mechanisms and form intractable aggregates within cells or (more commonly) in extracellular space. An increasing number of disorders, including Alzheimer's and Parkinson's diseases, the spongiform encephalopathies and type II diabetes, are directly associated with the deposition of such aggregates in tissues, including the brain, heart and spleen37,38,40,41. In the next section we look at the formation of these species.

Gram-positive bacteria

In most Gram-positive bacteria, certain proteins are targeted for export across the plasma membrane and subsequent covalent attachment to the bacterial cell wall. A specialized enzyme, sortase, cleaves the target protein at a characteristic recognition site near the protein C-terminus, such as an LPXTG motif (where X can be any amino acid), then transfers the protein onto the cell wall. Several analogous systems are found that likewise feature a signature motif on the extracytoplasmic face, a C-terminal transmembrane domain, and a cluster of basic residues on the cytosolic face at the protein's extreme C-terminus. The PEP-CTERM/exosortase system, found in many Gram-negative bacteria, seems to be related to extracellular polymeric substance production. The PGF-CTERM/archaeosortase A system in archaea is related to S-layer production. The GlyGly-CTERM/rhombosortase system, found in Shewanella, Vibrio, and a few other genera, seems to be involved in the release of proteases, nucleases, and other enzymes.
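Locating a candidate sortase recognition site is a simple pattern search. A minimal sketch, assuming the strict LPXTG pentapeptide and a hypothetical cut-off of 50 residues for "near the C-terminus"; it does not check for the downstream transmembrane segment or basic tail, and real sorting signals can deviate from LPXTG. The cleavage position follows the usual description of sortase A (between the Thr and Gly of the motif). The function name and sequence are hypothetical.

```python
import re

# LPXTG, where X is any residue; the lookahead lets overlapping matches be reported.
LPXTG = re.compile(r"(?=(LP[A-Z]TG))")

def find_sorting_motifs(seq, max_dist_from_c=50):
    """Return (start, motif, cleavage_index) for LPXTG motifs near the C-terminus.

    cleavage_index is the 0-based index of the Gly after the Thr, so
    seq[:cleavage_index] ends with the Thr at which sortase cleaves.
    max_dist_from_c is an assumed cut-off, not a biological constant.
    """
    seq = seq.upper()
    hits = []
    for m in LPXTG.finditer(seq):
        start = m.start()
        if len(seq) - start <= max_dist_from_c:
            hits.append((start, m.group(1), start + 4))
    return hits

# Hypothetical C-terminal region: motif, hydrophobic segment, basic tail.
tail = "AAEKPSLPETGEEGLLLAVGLLVALAILLRRKKE"
print(find_sorting_motifs(tail))
```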

Protein aggregation and amyloid formation

In protein misfolding, a protein molecule is converted into a non-native state. These misfolded proteins are kinetically trapped in local energy minima. Misfolding generally occurs because of dominant-negative mutations, changes in environmental conditions (pH, temperature, protein concentration), errors in post-translational modifications (phosphorylation, advanced glycation, deamidation, etc.), an increase in the rate of degradation, errors in trafficking, loss of binding partners and oxidative damage[68]. These factors can act either independently of each other or simultaneously[8]. Misfolded proteins and partially folded intermediates have large patches of contiguous surface hydrophobicity and therefore aggregate more readily than either the native state, in which hydrophobic amino acids are buried in the interior core of the protein, or the unfolded state, in which they lie scattered along the polypeptide chain. These partially misfolded intermediates aggregate by interacting with complementary intermediates, giving rise to oligomers and subsequently to protofibrils and fibrils. These proteinaceous fibril seeds can therefore serve as self-propagating agents for the instigation and progression of disease. Alzheimer's disease and other cerebral proteopathies seem to arise from the de novo misfolding and sustained corruption of endogenous proteins, whereas prion diseases can also be infectious in origin[69]. Recently, several independent lines of study on different proteins have indicated that oligomers might be the most toxic species in the misfolding and aggregation pathway[70-72]. This is supported by findings that early aggregates of Aβ peptides, α-synuclein[73] and transthyretin[74] are implicated in AD, PD and ALS[75,76,73]. The lack of a direct correlation between fibrillar plaque density and the severity of clinical symptoms in patients with AD or PD further suggests that early aggregates are the more toxic entities[77].
Furthermore, when transgenic mouse models were exposed to early aggregates, disease-like phenotypes appeared in these mice[78]. Both amyloid oligomers and fibrils are formed via a variety of pathways, including reversible association of native monomers, aggregation of conformationally altered monomers, aggregation of chemically modified products, nucleation-elongation polymerization and surface-induced aggregation[79], giving rise to diverse fibril structures, or polymorphism[80]. Additional polymorphism arises when the same polypeptide chain occurs in a range of structurally different morphologies[79]. Among these fibrillation pathways, nucleation-elongation polymerization is the most generally accepted (Figure 3).
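The nucleation-elongation pathway can be illustrated with a simple rate law. A minimal sketch, assuming only primary nucleation (rate k_n·m^n_c, where m is free monomer) and elongation at both fibril ends (rate 2·k_plus·m·P, where P is the fibril number concentration); secondary processes such as fragmentation or surface-catalysed nucleation are ignored, and the equations are integrated with a simple forward-Euler step. All parameter values are arbitrary illustrative numbers, not fitted to any protein. The model reproduces the qualitative lag phase, rapid growth and plateau.

```python
def nucleated_polymerization(m_tot=1.0, k_n=1e-4, n_c=2, k_plus=1.0,
                             dt=0.01, t_end=300.0):
    """Forward-Euler integration of a primary-nucleation + elongation model.

    P: fibril number concentration; M: monomer mass incorporated in fibrils.
      dP/dt = k_n * m**n_c         (new nuclei formed from free monomer m)
      dM/dt = 2 * k_plus * m * P   (monomer addition at both fibril ends)
    Returns a list of (time, fraction of monomer in fibrils).
    """
    P, M, t = 0.0, 0.0, 0.0
    trace = [(t, 0.0)]
    for _ in range(int(round(t_end / dt))):
        m = m_tot - M  # free monomer remaining
        dP = k_n * m ** n_c * dt
        dM = 2.0 * k_plus * m * P * dt
        P, M, t = P + dP, M + dM, t + dt
        trace.append((t, M / m_tot))
    return trace

trace = nucleated_polymerization()
for t, f in trace[::5000]:
    print(f"t={t:6.1f}  fibril fraction={f:.3f}")
```

The printed fractions stay near zero at early times (few nuclei exist yet) and then rise steeply once enough fibril ends are available to recruit monomer.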

In vitro protein synthesis led to the discovery that the mRNAs for secreted proteins are attached to membranes, whereas cytosolic proteins are made on free polysomes

MEMBRANE PROTEINS

The key point is that the orientation of a protein in the membrane is established when it is first inserted into the membrane, and this orientation persists all the way to the protein's final destination. That is, the cytosolic side of the membrane remains on the cytosolic side throughout all later processes. As a membrane protein is translated, it is translocated (transferred) into the ER until a hydrophobic membrane-crossing domain is encountered. This serves as a 'stop transfer' signal and leaves the protein inserted in the ER membrane.

Import of a membrane protein: this figure illustrates the case of a protein being incorporated into the membrane of the endoplasmic reticulum, but import into organellar membranes works in much the same way. The blue sheath-like component shown in the figure is the transport complex that moves the protein through the membrane. This example is a single-pass membrane protein that contains a single membrane-crossing domain. The hydrophobic transmembrane domain holds the protein in the membrane because of the very strong hydrophobic interaction between this part of the protein and the hydrophobic core of the membrane.

Insertion of a double-pass membrane protein into the membrane: the signal sequence is not at the N-terminus and is not removed. Transfer continues until a stop signal is reached. There may be more than one pair of start and stop transfer signals, and transfer is reinitiated with each start transfer signal. This means that at each stop transfer signal (membrane-crossing domain) the ribosome becomes detached from the ER membrane. If a start transfer sequence is encountered later, it binds to a new SRP and forms a new association between the ribosome and the ER membrane, which leads to insertion of the start transfer sequence and the following amino acids up to and including either the C-terminal end or a stop transfer sequence, whichever is encountered first.
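The alternation of start and stop transfer signals described above fixes which side each soluble region ends up on. A minimal sketch of that bookkeeping under a deliberately simplified model: with a cleaved N-terminal signal the N-terminus is translocated into the lumen, and every subsequent membrane-crossing segment switches side. For proteins without a cleaved signal, the sketch assumes the first internal start transfer sequence leaves the N-terminus in the cytosol; that is an assumption, since real signal-anchor sequences can insert in either orientation. The function name is hypothetical.

```python
def topology(n_tm_segments, cleaved_n_signal=True):
    """Return (region, side) pairs for a protein with n membrane-crossing segments.

    Simplified start/stop-transfer model: each membrane-crossing segment
    (stop, start, stop, ...) moves the following region to the opposite side.
    Without a cleaved N-terminal signal, the N-terminus is ASSUMED cytosolic.
    """
    side = "lumen" if cleaved_n_signal else "cytosol"
    regions = [("N-terminus", side)]
    for k in range(1, n_tm_segments + 1):
        side = "cytosol" if side == "lumen" else "lumen"
        name = "C-terminus" if k == n_tm_segments else f"loop after TM{k}"
        regions.append((name, side))
    return regions

# Single-pass protein with a cleaved signal sequence plus one stop transfer signal.
print(topology(1))
# Double-pass protein: internal start transfer followed by a stop transfer signal.
print(topology(2, cleaved_n_signal=False))
```

The first case reproduces the orientation stated in the text for a single-pass protein (N-terminus in the ER lumen, C-terminus in the cytosol).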
Proteins with multiple membrane-crossing domains are inserted into the membrane through the action of multiple pairs of start transfer and stop transfer signals. There are two major categories of hydrophobic signals used in the insertion of membrane proteins, and all of them are membrane-crossing domains:
Start transfer sequences. These are of two types:
- N-terminal signal peptide sequence: a cluster of about 8 hydrophobic amino acids at the N-terminal end of a protein. This sequence sits in the membrane during transfer and is cleaved off the protein after transfer through the membrane.
- Internal start transfer sequence: similar to a signal sequence, but located internally (not at the N-terminal end of the protein). It also binds to the SRP and initiates transfer. Unlike the N-terminal signal sequence, it is not cleaved after transfer of the protein.
Stop transfer signal. This is also a stretch of hydrophobic amino acid residues, typically about 20 long (enough to span the bilayer as an α-helix). It follows either an N-terminal signal sequence or a start transfer sequence. The stop transfer signal is a membrane-crossing domain; it remains in the membrane and is not cleaved.
This process of membrane insertion has a very important result: it establishes the orientation of membrane proteins. Recall the earlier discussion of 'sidedness of membranes'; this is one of the chief ways that sidedness arises. Notice that, for a single-pass protein inserted this way, the C-terminal end of the protein is on the cytosolic side of the membrane and the N-terminal end is not in the cytosol but inside the ER or organelle.
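Because start and stop transfer signals are hydrophobic stretches, candidate membrane-crossing segments are often located with a sliding-window hydropathy scan. A minimal sketch using the Kyte-Doolittle hydropathy scale; the 19-residue window and the 1.6 threshold are commonly quoted defaults and should be treated as assumptions, and the test sequence is hypothetical. This is a crude screen, not a topology predictor.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982); only the
# 20 standard one-letter codes are accepted.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Return half-open (start, end) ranges whose windowed mean hydropathy exceeds threshold.

    Overlapping above-threshold windows are merged into one candidate segment.
    """
    seq = seq.upper()
    segments = []
    for i in range(len(seq) - window + 1):
        score = sum(KD[aa] for aa in seq[i:i + window]) / window
        if score > threshold:
            if segments and i <= segments[-1][1]:
                segments[-1] = (segments[-1][0], i + window)  # extend current segment
            else:
                segments.append((i, i + window))
    return segments

# Hypothetical sequence: polar stretch, 20 hydrophobic residues, polar stretch.
seq = "MSDKNEQRST" + "LLIVALLGAVLIFLAVLLAV" + "KRDESQNKTE"
print(hydrophobic_segments(seq))
```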

General Rules for Targeting Proteins Synthesised on Soluble Ribosomes

- Mitochondrial, nuclear and peroxisomal proteins (and chloroplast and glyoxysomal proteins in plants) are all made on soluble ribosomes and transported to the organelles across one or two membranes in a post-translational manner.
- Signal sequences can be N-terminal (mitochondria; cleavable or non-cleavable), C-terminal (peroxisomes) or in the middle of the protein (nucleus).
- Signal sequences on these proteins interact with specific receptors on the surfaces of the organelles.
- Nascent mitochondrial proteins in transit must be maintained in an 'unfolded' (translocation-competent) state.

How proteins fold: The Levinthal Paradox

Levinthal's paradox is a thought experiment, also constituting a self-reference in the theory of protein folding. In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations. An estimate of $3^{300}$ or $10^{143}$ was made in one of his papers[1] (often incorrectly cited as the 1968 paper[2]). For example, a polypeptide of 100 residues will have 99 peptide bonds, and therefore 198 different phi and psi backbone dihedral angles. If each of these angles can be in one of three stable conformations, the protein may misfold into a maximum of $3^{198}$ different conformations (including any possible folding redundancy). Therefore, if a protein were to attain its correctly folded configuration by sequentially sampling all the possible conformations, it would require a time longer than the age of the universe to arrive at its correct native conformation. This is true even if conformations are sampled at rapid (nanosecond or picosecond) rates. The "paradox" is that most small proteins fold spontaneously on a millisecond or even microsecond time scale. The solution to this paradox has been established by computational approaches to protein structure prediction.[3] Levinthal himself was aware that proteins fold spontaneously and on short timescales. He suggested that the paradox can be resolved if "protein folding is sped up and guided by the rapid formation of local interactions which then determine the further folding of the peptide; this suggests local amino acid sequences which form stable interactions and serve as nucleation points in the folding process".[4] Indeed, protein folding intermediates and partially folded transition states have been detected experimentally, which helps to explain fast protein folding.
This is also described as protein folding directed within funnel-like energy landscapes.[5][6][7] Some computational approaches to protein structure prediction have sought to identify and simulate the mechanism of protein folding.[8] Levinthal also suggested that the native structure might have a higher energy, if the lowest energy was not kinetically accessible. An analogy is a rock tumbling down a hillside that lodges in a gully rather than reaching the base.

Try all possible conformations?

If you assume that each amino acid in a small 100-residue protein can exist in 3 different conformations, then the total number of possible conformations is $3^{100} \approx 5 \times 10^{47}$. If you then assume that the time taken to convert between any two conformations is $10^{-13}$ s, then the total time required to sample all possible states randomly is $5 \times 10^{34}$ s, or about $1.6 \times 10^{27}$ years. The paradox shows that proteins cannot fold by randomly trying every possible conformation to find the most stable one.
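The arithmetic behind the estimate above can be reproduced in a few lines. This sketch uses the same assumptions as the text (100 residues, 3 conformations per residue, $10^{-13}$ s per conformational change) plus a 365.25-day year for the conversion; the function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

def exhaustive_search_time(n_residues=100, states_per_residue=3,
                           seconds_per_step=1e-13):
    """Number of conformations, and the time to visit each one once."""
    n_conf = states_per_residue ** n_residues  # exact integer
    seconds = n_conf * seconds_per_step
    return n_conf, seconds, seconds / SECONDS_PER_YEAR

n_conf, seconds, years = exhaustive_search_time()
print(f"{n_conf:.2e} conformations, {seconds:.2e} s, {years:.2e} years")
# The per-dihedral estimate (198 backbone angles for 100 residues):
print(f"3^198 is about 10^{198 * math.log10(3):.1f}")
```

Changing `seconds_per_step` to `1e-9` (nanosecond sampling) still gives an astronomically long search, which is the point of the paradox.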

The Tat system: How to get a folded protein across a membrane

The minimal components of the Tat system in E. coli are TatA, TatB and TatC. Substrates bind initially to the TatBC complex via their signal peptide, and this complex then recruits TatA to effect translocation. The big question is: how can a large folded protein be moved across the membrane while membrane integrity is maintained?

The Diversity of Transport Systems

Mitochondria, Gram-negative bacteria, and chloroplasts (56) have multiple, and branched, translocation pathways. Upon reaching the intermembrane space, mitochondrial preproteins enter divergent pathways (Fig. 1C): β-barrel proteins integrate into the outer membrane by means of a specialized outer membrane translocase (57, 58); some proteins remain in the intermembrane space; and proteins with matrix-targeting presequences use the TIM translocase, exploiting two energy sources, the membrane potential ΔΨ and the mHsp70 adenosine triphosphatase (ATPase). Apolar inner membrane proteins use a separate inner membrane translocase system that needs ΔΨ but not ATP (59). Several particularly apolar proteins, encoded by the mitochondrial genome, are synthesized in the matrix and inserted into the inner membrane with the help of an inner membrane protein, Oxa1p (60). Other multimembrane systems are just as complex. Bacteria have the Sec translocase, TAT translocase, and YidC [homologous to Oxa1p (61)] for proteins entering the plasma membrane or periplasm, five transport systems for crossing the outer membrane, and distinct transport systems for coordinated transport across the inner membrane, outer membrane, and a target-cell membrane (Fig. 2). Chloroplasts (Fig. 1D) import proteins across two membranes and into the stroma by a coupled outer membrane and inner membrane transit pathway (56). From the stroma, import pathways into the thylakoid membrane and lumen are varied but markedly similar to bacterial protein export. Not all protein translocation is dependent on an unfolded conformation. The TAT bacterial translocase exports proteins bearing a unique twin-arginine motif (62). Not only is this translocation uncoupled from ongoing protein synthesis, but it accommodates fully folded proteins that remain folded during membrane transit.
The proteins translocated by TAT can be oligomeric, with some subunit(s) providing the TAT recognition motif, whereas others are translocated "piggyback," solely by virtue of their association with the TAT-motif-tagged subunit (63). TAT translocase subunits can oligomerize, suggesting a means for providing large transport pores (64). Oligomeric folded proteins are also imported into the peroxisome. Translocation is not limited to a single membrane or two membranes; chloroplasts have three distinct membrane layers, with unique aqueous spaces between each, and the type III transport systems of pathogenic bacteria can inject proteins across both bacterial envelope bilayers as well as across the target-cell plasma membrane.
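Tat substrates are recognised through the twin-arginine motif in their signal peptide, so a first-pass screen is again a pattern search. A minimal sketch, assuming the frequently cited consensus (S/T)-R-R-x-F-L-K, loosened here to the nearly invariant RR, any residue, then a hydrophobic residue, with an optional preceding S/T; the 35-residue search window is a hypothetical choice. This is a heuristic flag, not a predictor of Tat export, and both sequences are hypothetical.

```python
import re

# Twin-arginine core: optional S/T, RR, any residue, then a hydrophobic residue.
TAT_CORE = re.compile(r"[ST]?RR.[FILMV]")

def tat_candidate(seq, search_len=35):
    """Return (position, matched text) of the first twin-arginine-like motif
    within the N-terminal search window, or None if there is none."""
    m = TAT_CORE.search(seq[:search_len].upper())
    return (m.start(), m.group()) if m else None

# Hypothetical N-terminal sequences.
with_motif = "MNNNDLFQASRRRFLAQLGGLTVAGMLGPSLLTPRRA"
without_motif = "MKKTAIAIAVALAGFATVAQAAPKDNTWYTGAKLG"
print(tat_candidate(with_motif), tat_candidate(without_motif))
```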

Mechanism of OM protein folding and insertion

Model of the final step in OMP biogenesis catalyzed by BamA. (a) The unfolded but stabilized OMP is aligned at the POTRA interface. (b) Strands (or strand pairs) subsequently enter the cavity provided by BamA. (c) Further incoming β-hairpin elements may lead to additional folding events and laterally force the β1/β16 fracture to open temporarily. While part of the protein is released into the membrane, additional parts of the OMP can access the cavity until the entire protein has passed through the folding chamber and can finally be released into the outer membrane.

Molecular Chaperones/Heat Shock Proteins

Molecular chaperones: bind and stabilise unfolded or partly folded proteins via exposed hydrophobic regions, thereby preventing them from aggregating and/or being degraded.
Chaperonins: form a small folding chamber into which individual protein molecules can be sequestered, providing time and a suitable environment for them to fold properly, usually in an ATP-dependent manner.

Co-translational translocation

Most proteins that are secretory, membrane-bound, or reside in the endoplasmic reticulum (ER), Golgi or endosomes use the co-translational translocation pathway. This process begins with the N-terminal signal peptide of the protein being recognized by a signal recognition particle (SRP) while the protein is still being synthesized on the ribosome. Synthesis pauses while the ribosome-protein complex is transferred to an SRP receptor on the ER in eukaryotes, or on the plasma membrane in prokaryotes. There, the nascent protein is inserted into the translocon, a membrane-bound protein-conducting channel composed of the Sec61 translocation complex in eukaryotes and the homologous SecYEG complex in prokaryotes. In secretory proteins and type I transmembrane proteins, the signal sequence is cleaved from the nascent polypeptide by signal peptidase as soon as it has been translocated into the membrane of the ER (eukaryotes) or plasma membrane (prokaryotes). The signal sequences of type II membrane proteins and some polytopic membrane proteins are not cleaved off and are therefore referred to as signal anchor sequences. Within the ER, the protein is first bound by a chaperone protein that protects it from the high concentration of other proteins in the ER, giving it time to fold correctly. Once folded, the protein is modified as needed (for example, by glycosylation) and is then either transported to the Golgi for further processing and delivery to its target organelle, or retained in the ER by various ER retention mechanisms. The amino acid chain of transmembrane proteins, which often are transmembrane receptors, passes through a membrane one or several times. These proteins are inserted into the membrane by translocation until the process is interrupted by a stop-transfer sequence, also called a membrane anchor or signal-anchor sequence. These complex membrane proteins are currently understood mostly through the same targeting model that was developed for secretory proteins.
However, many complex multi-transmembrane proteins have structural features that do not fit this model. Most seven-transmembrane G-protein-coupled receptors (which account for about 5% of human genes) lack an amino-terminal signal sequence. Instead, unlike in secretory proteins, the first transmembrane domain acts as the first signal sequence and targets the protein to the ER membrane. This also results in translocation of the protein's amino terminus into the ER lumen. This would seem to break the rule of "co-translational" translocation that has always held for mammalian proteins targeted to the ER. This has been demonstrated for opsin in in vitro experiments.[1][2] Much of the mechanics of transmembrane topology and folding remains to be elucidated.

Transport and folding of outer membrane proteins

Outer membrane (OM) proteins cross the inner membrane (IM) via the SecYEG secretion machinery, then traverse the periplasm aided by the chaperones SurA and Skp. Omp85/YaeT (BamA) is conserved in Gram-negative bacteria and forms part of the OM protein folding machinery.

Classification of Protein Structure

Primary structure: the unique linear sequence of amino acids of each protein.
Secondary structure: local regions of structure (α-helix, β-sheet and loops).
Tertiary structure: the unique overall 3D structure of each individual protein.
Quaternary structure: the arrangement of subunits in oligomeric proteins.
These levels are a classification scheme; they do not form one after the other during folding.

What is/are the folding pathway(s)?

Proposed folding pathway of chymotrypsin inhibitor. Local regions with sufficient structural preference tend to adopt their favored structures initially (1). These structures come together to form a nucleus with a native-like, but still mobile, structure (4). This structure then fully condenses to form the native, more rigid structure (5). Each of the intermediates shown represents an ensemble of similar structures, and thus a protein follows a general rather than a precise pathway in its transition from the unfolded to the native state. The energy surface for the overall process of protein folding can be visualized as a funnel (Figure 2.60). The wide rim of the funnel represents the wide range of structures accessible to the ensemble of denatured protein molecules. As the free energy of the population of protein molecules decreases, the proteins move down into narrower parts of the funnel and fewer conformations are accessible. At the bottom of the funnel is the folded state with its well-defined conformation. Many paths can lead to this same energy minimum.

protein localisation

Protein localization can be predicted reasonably well by bioinformatic analysis.
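As a minimal sketch of one ingredient of such predictions, the Python below computes a sliding-window Kyte-Doolittle hydropathy profile and flags sequences containing a window above a cutoff, the kind of signal used to spot the hydrophobic core of a signal sequence or a transmembrane segment. The window size (11) and cutoff (1.6) are illustrative assumptions, not the parameters of any particular prediction tool, and real predictors combine many more features.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 11) -> list[float]:
    """Mean hydropathy of each full window along the sequence.

    Raises KeyError for non-standard residue letters (e.g. 'X').
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

def has_hydrophobic_segment(seq: str, window: int = 11, cutoff: float = 1.6) -> bool:
    """True if any window's mean hydropathy reaches the (illustrative) cutoff."""
    return any(score >= cutoff for score in hydropathy_profile(seq, window))
```

For example, `has_hydrophobic_segment("MKK" + "L" * 12 + "SSS")` is true, whereas a short, highly charged sequence such as `"MKKDDEESSRRNNQQ"` gives no window above the cutoff.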

protein targeting

Protein targeting or protein sorting is the biological mechanism by which proteins are transported to their appropriate destinations in the cell or outside it. Proteins can be targeted to the inner space of an organelle, different intracellular membranes, the plasma membrane, or the exterior of the cell via secretion. This delivery process is carried out based on information contained in the protein itself. Correct sorting is crucial for the cell; errors can lead to disease.

Proteins Unfold in a Cooperative Process

Proteins can be denatured by any treatment that disrupts the weak bonds stabilizing tertiary structure, such as heating, or by chemical denaturants such as urea or guanidinium chloride. For many proteins, a comparison of the degree of unfolding as the concentration of denaturant increases reveals a sharp transition from the folded, or native, form to the unfolded, or denatured, form, suggesting that only these two conformational states are present to any significant extent (Figure 2.56). A similar sharp transition is observed if denaturants are removed from unfolded proteins, allowing the proteins to fold. The sharp transition seen in Figure 2.56 suggests that protein folding and unfolding is an "all or none" process that results from a cooperative transition. For example, suppose that a protein is placed in conditions under which some part of the protein structure is thermodynamically unstable. As this part of the folded structure is disrupted, the interactions between it and the remainder of the protein will be lost. The loss of these interactions, in turn, will destabilize the remainder of the structure. Thus, conditions that lead to the disruption of any part of a protein structure are likely to unravel the protein completely. The structural properties of proteins provide a clear rationale for the cooperative transition. The consequences of cooperative folding can be illustrated by considering the contents of a protein solution under conditions corresponding to the middle of the transition between the folded and the unfolded forms. Under these conditions, the protein is "half folded." Yet the solution will appear to have no partly folded molecules but, instead, look as if it is a 50/50 mixture of fully folded and fully unfolded molecules (Figure 2.57). Although the protein may appear to behave as if it exists in only two states, this simple two-state existence is an impossibility at a molecular level.
Even simple reactions go through reaction intermediates, and so a complex molecule such as a protein cannot simply switch from a completely unfolded state to the native state in one step. Unstable, transient intermediate structures must exist between the native and denatured state (p. 54). Determining the nature of these intermediate structures is an area of intense biochemical research.
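The cooperative two-state behaviour described above can be sketched numerically. The Python below assumes the commonly used linear extrapolation model, in which the free energy of unfolding falls linearly with denaturant concentration (dG = dG_water - m[D]), and assumes that only native (N) and unfolded (U) molecules are populated; the parameter values are hypothetical.

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1

def fraction_unfolded(denaturant: float, dg_water: float, m_value: float,
                      temp_k: float = 298.15) -> float:
    """Fraction of molecules unfolded in a two-state (N <-> U) model.

    denaturant: denaturant concentration (M)
    dg_water:   free energy of unfolding without denaturant (kJ/mol)
    m_value:    decrease in that free energy per molar denaturant (kJ mol^-1 M^-1)
    """
    dg = dg_water - m_value * denaturant   # linear extrapolation model (assumed)
    k = math.exp(-dg / (R * temp_k))       # equilibrium constant K = [U]/[N]
    return k / (1.0 + k)                   # f_U = K / (1 + K)

def transition_midpoint(dg_water: float, m_value: float) -> float:
    """Denaturant concentration at which dG = 0, i.e. half the molecules are unfolded."""
    return dg_water / m_value

# Hypothetical protein: dG_water = 20 kJ/mol, m = 5 kJ/mol/M, so the midpoint is 4 M
for conc in (0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0):
    print(f"{conc:.1f} M  f_U = {fraction_unfolded(conc, 20.0, 5.0):.3f}")
```

With these hypothetical values the fraction unfolded is exactly 0.5 at 4 M, but only about 0.12 at 3 M and about 0.88 at 5 M: a small change in denaturant swings the population from mostly native to mostly unfolded, with no partly folded species in the model at all.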

Conclusions from Anfinsen's Experiments

SPECIFIC CONCLUSIONS:
• Correct tertiary structure of RNase backbone had returned.
• The right SH groups must have been adjacent to each other prior to reoxidation as a result of the backbone refolding correctly, because disulfide bonds formed spontaneously with the right combinations of Cys residues.
MORE GENERAL CONCLUSIONS (Nobel Prize!):
• Native structure is the thermodynamically most stable (favoured) state for most proteins.
• Native tertiary structure is determined by the primary structure (amino acid sequence) of a protein.

Structure of Signal Recognition Particle

SRP is a ribonucleoprotein complex containing 6 proteins and 7S RNA. SRP binding occludes the elongation factor binding site and so causes a pause in translation.

Comparison of the Sec and Tat pathways in E. coli

Sec (secretory) and Tat (twin-arginine translocation) pathways transport unfolded and folded proteins, respectively. Key to this is the presence of a signal peptide, which directs the protein to these transport complexes. The Sec- and Tat-dependent protein transport pathways: The Sec pathway is the dominant pathway for protein export from the bacterial cytoplasm. It accepts and translocates cargo proteins across the plasma membrane in a loosely folded or unfolded state, here exemplified with the precursor of the outer membrane protein A of E. coli (OmpA). Targeting and folding control of the cargo protein is supported by cytoplasmic targeting factors, such as SecB. The Sec machinery itself is composed of the SecYEG channel and the translocation ATPase SecA, which converts chemical energy in the form of ATP into a driving force that pushes the cargo protein through the membrane. Additionally, translocation may be powered by the transmembrane proton gradient. At the trans-side of the membrane, the translocated protein folds into its active and protease-resistant final conformation. In contrast to the Sec pathway, the Tat pathway transports fully folded cofactor-containing proteins across the membrane, here exemplified with the precursor of the Tat cargo TorA. Cofactor insertion and folding may be aided by Redox Enzyme Maturation Proteins (REMPs), such as TorD in the case of TorA. The Tat translocase may consist of the three components TatA, TatB and TatC (E. coli), or of TatA and TatC components only (B. subtilis). Protein transport via Tat is powered by the transmembrane proton-motive force.

Structure of BamA suggests a mechanism of OM protein biogenesis

Structures of BamA show a 16-stranded β-barrel and highly mobile POTRA domains that may gate access to the barrel from the periplasmic side. Weak hydrogen bonding between strands β1 and β16 points towards a putative lateral opening of the β-barrel to release the substrate OMP.

Alzheimer's disease

Symptoms: memory loss, dementia, and impairment in other forms of cognition and behaviour. Not transmissible between individuals. Characterised by intracellular aggregates (neurofibrillary tangles) of a protein called tau, and by extracellular plaques containing aggregates of β-amyloid peptides (Aβ): 40-42-residue segments derived by proteolytic cleavage of a much larger protein, the amyloid precursor protein (APP). The gene for APP is on human chromosome 21. People with Down's syndrome have 3 copies of chromosome 21 instead of 2 and have a greatly increased risk of developing Alzheimer's disease.

1. Targeting of Proteins to the Endoplasmic Reticulum.

Synopsis. Synthesis of proteins entering the endoplasmic reticulum is initiated on free ribosomes. A targeting sequence of hydrophobic amino acids near the amino-terminal end of the growing polypeptide results in binding of the ribosome to the ER membrane and insertion of the polypeptide into the endoplasmic reticulum. Proteins of the secretory or lysosomal pathways enter the ER and do not return to the cytosol. Proteins entering either of these pathways may be of two types:
Proteins that are completely translocated into the endoplasmic reticulum. These proteins are soluble (not membrane proteins) and are destined for secretion or for transfer to lysosomes. In all of these cases the proteins are never part of membranes.
Proteins that are inserted into membranes, and hence are only partially translocated into the endoplasmic reticulum. These proteins may be destined for the ER, the membranes of another organelle (Golgi, lysosomes or endosomes), or the plasma membrane. In all of these cases the proteins stay within the membrane once they are inserted into the ER membrane (e.g. cellulose synthase).
Translation of all proteins begins on free ribosomes. Those ribosomes that produce proteins for export through the endoplasmic reticulum become attached to the endoplasmic reticulum as ribosomes of the rough ER. The signal for ER entry is a stretch of 8 or more hydrophobic amino acid residues (Table 14-3), which anchors the polypeptide to the ER membrane and is also involved in translocation. Whether or not a ribosome becomes attached to the endoplasmic reticulum depends on the mRNA being translated (that is, on the protein being made) and is not an intrinsic property of the ribosome itself. The ribosome and its attached nascent peptide become targeted to the endoplasmic reticulum through the signal peptide sequence (a sequence of at least eight hydrophobic amino acids near the amino-terminal end of the polypeptide).
The emerging signal sequence binds a signal recognition particle (SRP). This greatly reduces the rate of translation and allows the ribosome to attach to the endoplasmic reticulum by means of a special SRP receptor in the ER membrane. The ribosome then becomes attached to a ribosome receptor that also functions as the translocation channel for the newly synthesized polypeptide. As the ribosome becomes attached, the SRP is released and translation resumes. Targeting therefore requires two components:
1. A Signal Recognition Particle (SRP) in the cytosol. This binds to the ER signal sequence when it is exposed on the ribosome and slows protein synthesis long enough for the SRP to find the second component, the SRP receptor.
2. The Signal Recognition Particle Receptor (SRPR), which is embedded in the ER membrane.
Once the new polypeptide-synthesizing system is in place, protein synthesis speeds up again. The signal sequence appears to open the translocation channel. Experimental tests show that the ER targeting signal is both necessary and sufficient to bring about targeting.
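The "8 or more hydrophobic residues" rule of thumb stated above can be expressed as a short scan over the N-terminal region. This is a minimal sketch: the set of residues counted as hydrophobic and the 40-residue search window are assumptions made for illustration, and a strict run-length rule is a simplification of how real signal sequences are recognized.

```python
# Residues counted as hydrophobic here are an illustrative assumption;
# published hydrophobicity scales and classifications differ.
HYDROPHOBIC = set("AVLIMFWC")

def longest_hydrophobic_run(seq: str, search_len: int = 40) -> tuple[int, int]:
    """Return (start, length) of the longest run of consecutive hydrophobic
    residues in the first search_len residues (start is 0-based)."""
    region = seq[:search_len].upper()
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, aa in enumerate(region):
        if aa in HYDROPHOBIC:
            if run_len == 0:
                run_start = i          # a new run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0                # a polar or charged residue ends the run
    return best_start, best_len

def looks_like_er_signal(seq: str, min_run: int = 8) -> bool:
    """Rule of thumb: at least min_run hydrophobic residues near the N-terminus."""
    return longest_hydrophobic_run(seq)[1] >= min_run
```

For the hypothetical sequence `"MKTLLLAVLAVSSG"`, `longest_hydrophobic_run` returns `(3, 8)`, so `looks_like_er_signal` is true; breaking the run with a serine (`"MKTLLLSAVLG"`) leaves no run longer than three and the rule is not met.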

Protein targeting to the Endoplasmic Reticulum

Synthesis of all proteins begins in the cytosol. For proteins entering the secretory or lysosomal pathways, the first step is targeting to the endoplasmic reticulum. This targeting relies on a targeting signal encoded in the N-terminal portion of the protein. The targeting signal is recognized by a specific receptor, resulting in entry of the protein into the endoplasmic reticulum.

Structure and Function of Trigger Factor

Trigger factor (TF) is an abundant 50 kDa protein. Free TF exists in monomeric and dimeric forms. TF binds directly to the ribosome, which causes a conformational change that activates TF for interaction with the nascent polypeptide chain.

protein folding funnels

The folding funnel hypothesis is a specific version of the energy landscape theory of protein folding, which assumes that a protein's native state corresponds to its free energy minimum under the solution conditions usually encountered in cells. Although energy landscapes may be "rough", with many non-native local minima in which partially folded proteins can become trapped, the folding funnel hypothesis assumes that the native state is a deep free energy minimum with steep walls, corresponding to a single well-defined tertiary structure. The term was introduced by Ken A. Dill in a 1987 article discussing the stabilities of globular proteins.[1] The folding funnel hypothesis is closely related to the hydrophobic collapse hypothesis, under which the driving force for protein folding is the stabilization associated with the sequestration of hydrophobic amino acid side chains in the interior of the folded protein. This allows the water solvent to maximize its entropy, lowering the total free energy. On the side of the protein, free energy is further lowered by favorable energetic contacts: isolation of electrostatically charged side chains on the solvent-accessible protein surface and neutralization of salt bridges within the protein's core. The molten globule state predicted by the folding funnel theory as an ensemble of folding intermediates thus corresponds to a protein in which hydrophobic collapse has occurred but many native contacts, or close residue-residue interactions represented in the native state, have yet to form. In the canonical depiction of the folding funnel, the depth of the well represents the energetic stabilization of the native state versus the denatured state, and the width of the well represents the conformational entropy of the system. The surface outside the well is shown as relatively flat to represent the heterogeneity of the random coil state.
The theory's name derives from an analogy between the shape of the well and a physical funnel, in which dispersed liquid is concentrated into a single narrow area.

THE CHANNEL COMPLEX SECYEG

The heterotrimeric protein complex SecYEG is the central player in protein translocation and functions as the membrane channel where cytosolic binding partners dock and provide the energy to translocate unfolded polypeptides through its aqueous interior. Reconstitution studies with the purified SecYEG complex have demonstrated that the minimal translocase consists of the SecYE complex and the motor protein SecA [8]. The crystal structure of Methanocaldococcus jannaschii SecYEβ provided the first high-resolution insight into the organization and structure of the translocation channel [9] (figure 2a). The SecY protein constitutes the actual channel and is composed of ten α-helical TMSs, where TMSs 1-5 and TMSs 6-10 are pseudo-symmetrically aligned, resembling a bivalve shell. SecE enwraps the SecY channel in a V-shaped manner and contacts the two SecY 'shells' with a tilted helix and an amphipathic helix, respectively. These two helices are connected via a hinge region, providing flexibility to the structure. The Secβ subunit, which is presumably functionally homologous to the bacterial SecG, is more peripherally located in the structure. The SecG protein of Escherichia coli possesses two TMSs with the N- and C-termini in the periplasm. SecG is not essential for translocation or cell viability, but it increases the efficiency of translocation [10-12]. In vitro, SecG increases the efficiency of translocation at low temperature or in the absence of a proton motive force (PMF) [13]. In vivo, protein translocation in the absence of SecG is cold-sensitive, but the severity of the export defect was shown to be strain-dependent [14-16]. SecG interacts with SecY independently of SecE [17], and cross-linking studies show that it resides beside the N-terminal half of SecY, which was confirmed by the M. jannaschii crystal structure [18-20]. It was also found to contact the accessory protein complex SecDF, possibly contributing to the formation of the holotranslocon, i.e.
the SecYEGDF complex [21,22]. While the exact function of SecG is unknown, several studies imply that SecG inverts its topology to assist SecA cycling [23-26]. On the other hand, when SecG was topologically fixed, the translocation mechanism was still fully functional [27]. Various other studies also provide a link between SecG and SecA [28,29]. The structure of SecA bound to SecYEG in an ATP hydrolysis intermediate state [30] confirms the proximity of the SecG cytoplasmic loop to SecA. A photo-cross-linking study supports this interaction [31], which appears to involve an ionic pair in SecA and affects the coupling of the SecA ATPase activity with protein translocation [32]. SecG association would destabilize this ionic pair and promote conformational changes in SecA, thereby promoting SecA cycling. While SecG is located peripherally, the SecE-SecY interaction is much more extensive. The E. coli SecE subunit has three TMSs, where the two N-terminal TMSs are connected to the third via an amphipathic helix. Although SecE is essential for cell viability and protein translocation [33], cells remain viable when its two N-terminal TMSs plus a large part of the amphipathic helix are deleted [34,35]. Furthermore, most SecE homologues consist of only one TMS and the amphipathic helix, so the additional N-terminal portion of E. coli SecE might have a more specialized function [36]. The most conserved part of SecE is the hinge region bridging the two helices enveloping the SecY protein [37]. This region, together with the third TMS, is essential for the stability of the SecY-SecE complex [38]. If the complex dissociates, the SecY unit is rapidly degraded in vivo by the membrane protease FtsH [39]; hence the importance of SecE for cell viability. SecY is the central subunit and forms the actual protein-conducting channel. It features several domains important for proper functioning in protein translocation and insertion.
The hourglass-shaped central channel is constricted, with six hydrophobic residues in the middle of the membrane presumably forming a seal to prevent leakage of water and ions [40,41]. This constriction ring may form a hydrophobic gasket around the translocating polypeptide, maintaining the integrity of the membrane [42]. Below the constriction ring, a small α-helix forms a plug domain, contributing to the barrier in the middle of the channel. The two halves of SecY form a bivalve-like shell connected by a hinge on one side, with the other side proposed to be a lateral gate for the release of polypeptides into the membrane. Signal sequences bind in this lateral gate, formed by TMSs 2 and 7. By analogy, nascent TMSs may bind at the lateral gate as well, whereupon they are released into the lipid bilayer [43,44]. At the cytoplasmic face of the channel, several loops protrude from the membrane, where they bind to cytosolic binding partners. Structural data show that the loops between TMS 6/7 and TMS 8/9 form extensive contacts with the ribosome and SecA [30,45,46]. These observations are supported by cross-linking and mutagenesis studies in which residues in these loops were shown to be important in translocation and/or binding of the cytoplasmic partner [46-49]. The interaction of SecA and/or ribosomes with the SecYEG complex likely results in specific conformational changes of the channel, possibly even a partial opening, as suggested by the SecA-SecYEG crystal structure. Indeed, during the past decade, several crystal structures of SecYEG homologues have been reported that suggest specific ligand-induced structural changes of the channel.

NOESY spectrum

The nuclear Overhauser effect (NOE) is the transfer of nuclear spin polarization from one population of spin-active nuclei (e.g. ¹H, ¹³C, ¹⁵N etc.) to another via cross-relaxation. A phenomenological definition of the NOE in nuclear magnetic resonance spectroscopy (NMR) is the change in the integrated intensity (positive or negative) of one NMR resonance that occurs when another is saturated by irradiation with an RF field. The change in resonance intensity of a nucleus is a consequence of the nucleus being close in space to those directly affected by the RF perturbation. The NOE is particularly important in the assignment of NMR resonances, and in the elucidation and confirmation of the structures or configurations of organic and biological molecules. The two-dimensional NOE experiment (NOESY) is an important tool for determining the stereochemistry of proteins and other biomolecules in solution, whereas for crystalline samples X-ray diffraction is typically used instead. The transfer mechanism is the dipole-dipole interaction: if two nuclei are close in space, even if they are not directly bonded, dipole-dipole interactions allow magnetisation transfer between them.
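The statement that transfer requires nuclei to be close in space can be made quantitative. For an isolated pair of nuclei, the NOE build-up rate falls off as the inverse sixth power of the internuclear distance, so an unknown distance is often estimated by calibrating against a pair whose separation is known. The sketch below assumes that r^-6 relationship (the isolated spin-pair approximation) and ignores spin diffusion and internal motion; any reference distance passed to it is a user-supplied assumption.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Estimate an internuclear distance from an NOE cross-peak intensity.

    Assumes intensity is proportional to r**-6 (isolated spin-pair approximation)
    and calibrates against a reference pair of known separation ref_distance.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    # I / I_ref = (r_ref / r)**6  =>  r = r_ref * (I_ref / I)**(1/6)
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)
```

A cross-peak 64 times weaker than the reference corresponds to a pair only twice as far apart, since 64^(1/6) = 2; this steep falloff is why NOEs report on nuclei that are close in space.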

Primary structure

The primary structure of a protein, its linear amino-acid sequence, determines its native conformation.[9] The specific amino acid residues and their position in the polypeptide chain are the determining factors for which portions of the protein fold closely together and form its three-dimensional conformation. The amino acid composition is not as important as the sequence.[10] The essential fact of folding, however, remains that the amino acid sequence of each protein contains the information that specifies both the native structure and the pathway to attain that state. This is not to say that nearly identical amino acid sequences always fold similarly.[11] Conformations differ based on environmental factors as well; similar proteins fold differently based on where they are found.

the secretory pathway

The secretory pathway refers to the endoplasmic reticulum, Golgi apparatus and the vesicles that travel in between them as well as the cell membrane and lysosomes. It's named 'secretory' for being the pathway by which the cell secretes proteins into the extracellular environment. But as usual, etymology only tells a fraction of the story. This pathway also processes proteins that will be membrane-bound (whether in the cellular membrane or in the ER or Golgi membranes themselves), as well as lysosomal enzymes, and also any proteins that will live their lives in the secretory pathway itself. It also does some things other than process proteins. The cytosol and the 'lumen' (the liquid that fills the secretory pathway) are different chemical environments, and they normally never mix. The cytosol is reductive (when you're in the cytosol, you keep meeting molecules that want to offer you electrons), and the ER, Golgi and extracellular environment are oxidative (molecules keep coming up to you asking for electrons). This makes for different protein-folding conditions: for instance, disulfide bonds usually only form in oxidative conditions. Moreover, different proteins may live only in the secretory pathway or only in the cytosol. The secretory pathway provides a route for the cell to handle things that might not be good to have in the cytoplasm, and/or are most useful when kept concentrated in a specialized compartment with their desired interacting partners. Hepatocytes (in the liver) sequester drugs and toxins in the smooth ER and break them down for excretion from the body there. The secretory pathway is not contiguous, but every movement between its components is in little bubbled-off microcosms of its own chemical world, called vesicles. Many proteins that go through the secretory pathway never touch the cytosol - except the parts of membrane proteins that stick out on the cytosolic side.
Many of them need chaperones to help with folding, and/or a whole series of post-translational modifications in order to be ready for their native function, and the secretory pathway specializes in providing them all of that.

Tat System

The twin-arginine protein translocation (Tat) system has been characterized in bacteria, archaea and the chloroplast thylakoidal membrane. This system is distinct from other protein transport systems with respect to two key features. First, it accepts cargo proteins with an N-terminal signal peptide that carries the canonical twin-arginine motif, which is essential for transport. Second, the Tat system only accepts and translocates fully folded cargo proteins across the respective membrane. Here, we review the core essential features of folded protein transport via the bacterial Tat system, using the three-component TatABC system of Escherichia coli and the two-component TatAC systems of Bacillus subtilis as the main examples. In particular, we address features of twin-arginine signal peptides, the essential Tat components and how they assemble into different complexes, mechanistic features and energetics of Tat-dependent protein translocation, cytoplasmic chaperoning of Tat cargo proteins, and the remarkable proofreading capabilities of the Tat system. In doing so, we present the current state of our understanding of Tat-dependent protein translocation across biological membranes, which may serve as a lead for future investigations. To function correctly and efficiently, every cell needs to be highly organised, tightly regulated and compartmentalised. Proteins are essential macromolecules synthesised by ribosomes in the cytoplasm that often require localisation to a particular subcellular compartment before they can carry out their respective functions. Their proper formation, targeting and activity are imperative to the survival of the cell. This requirement for correct localisation particularly applies to proteins that take part in the acquisition of nutrients, energy transduction, cell-to-cell communication and cellular locomotion. On average, 20-30% of proteins synthesised in the bacterial cytoplasm are destined for extra-cytoplasmic locations [1].
They therefore have to pass a cell membrane composed of a tightly sealed lipid bilayer that keeps the cell structurally sound and impenetrable. Specialised transport systems have therefore evolved within the cell membrane to allow proteins to cross this barrier. Each system, made up of critical components, is as specialised as the protein cargo it transports. However, common features tie protein transport systems together and guarantee cell regulation and safety. These include a gated pore, an energy requirement to drive cargo proteins through the membrane, and the use of signal peptides that direct the cargo protein to the correct translocase and the correct location. Two major transport systems exist for protein translocation across the bacterial cytoplasmic membrane, namely the general secretory (Sec) pathway and the twin-arginine translocation (Tat) pathway (Fig. 1). The Sec pathway facilitates export of the majority of bacterial proteins, whereas the Tat pathway is quite restricted. For instance, it transports ~30 proteins in Escherichia coli and only four in Bacillus subtilis [2]. Further, each protein is fully folded in the cytoplasm prior to export via Tat, whereas Sec can only export unfolded proteins.

Transport of OMPs across the bacterial inner membrane

The unusual structure of bacterial OMPs is probably imposed by their biogenesis pathway. OMPs are synthesized in the cytoplasm as precursors with an N-terminal signal sequence, which marks them for transport across the inner membrane via the Sec system (Fig. 2). The protein-conducting channel of the Sec system, which is composed of the integral membrane proteins SecY, SecE and SecG (Driessen & Nouwen, 2008), releases OMPs and periplasmic proteins at the periplasmic side of the membrane. The SecYEG translocon is also implicated in the assembly of integral inner-membrane proteins. When large hydrophobic protein segments are inserted into the translocon, the channel opens laterally to allow for the insertion of these proteins into the inner membrane (Fig. 2; Driessen & Nouwen, 2008). Thus, the presence of similar hydrophobic segments in OMPs would prevent them from reaching their final destination, while the amphipathic β-strands that constitute the transmembrane segments of OMPs are compatible with transport via the SecYEG translocon to the periplasm. Indeed, the insertion of hydrophobic segments into the outer membrane porin PhoE of E. coli was shown to affect the biogenesis of the protein (Agterberg et al., 1990).
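The contrast drawn above, between uniformly hydrophobic segments that the SecYEG channel releases laterally into the inner membrane and amphipathic β-strands that pass through to the periplasm, can be illustrated with a toy calculation. The sketch below is an illustration, not a published predictor: it reports a segment's mean Kyte-Doolittle hydropathy and the difference in mean hydropathy between alternating residues, which approximate the two faces of a β-strand because consecutive side chains in a strand point to opposite sides. The example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def segment_profile(segment: str) -> tuple[float, float]:
    """Return (mean hydropathy, difference between alternating faces)."""
    values = [KD[aa] for aa in segment.upper()]
    if len(values) < 2:
        raise ValueError("segment must contain at least two residues")
    face_a = values[0::2]   # residues 1, 3, 5, ...: one face of a strand
    face_b = values[1::2]   # residues 2, 4, 6, ...: the opposite face
    mean = sum(values) / len(values)
    face_difference = abs(sum(face_a) / len(face_a) - sum(face_b) / len(face_b))
    return mean, face_difference
```

A poly-Leu stretch such as `"LLLLLLLLLL"` gives a mean of 3.8 and no face difference (hydrophobic all round, like a segment that would exit the channel laterally), while an alternating `"LSLSLSLSLS"` gives a mean of 1.5 and a face difference of 4.6 (one hydrophobic face and one polar face, as in an amphipathic strand).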

Protein Targeting Via the Twin-Arginine Signal Peptide

To ensure proteins are appropriately directed into the Sec or Tat pathways and to initiate the translocation process, specific signal peptides are present at the N-terminus of each protein. On the trans side of the membrane, the signal peptide is cleaved by a signal peptidase to liberate the mature protein [3-7]. The amino acid sequences of signal peptides differ substantially, but they are all composed of a positively charged N-terminal N-domain, a hydrophobic H-domain and a C-terminal C-domain with an Ala-x-Ala signal peptidase cleavage site [3, 8] (Fig. 2). Further, the N-regions of Tat signal peptides contain the canonical twin-arginine motif S-R-R-x-F-L-K (where x is a polar amino acid) [9]. The importance of additional conserved amino acids in the Tat motif depends on the cargo protein and varies between bacteria [10]. However, the RR residues are close to invariant and key to efficient protein export. In particular, the charge-conserving substitution of RR to KK blocks Tat export completely [11]. Yet a single Arg to Lys mutation only slows down the rate of translocation in most bacteria [12]. In chloroplast thylakoids, where the Tat pathway also exists, an RR to KR substitution is tolerated, while an RR to RK substitution precludes transport [12-14]. A single substitution of Arg to Glu has also been reported to be tolerated [15]. Of note, the TtrB subunit of the tetrathionate reductase in Enterobacteriaceae is the only known native Tat cargo to have a KR motif [16]. Aside from the RR motif, other residues within the larger twin-arginine signal peptide are also important. In particular, the Phe residue is present in 80% of Tat motifs, and substitutions showed that a highly hydrophobic residue is essential at this position [11]. Tat signal peptides comprise about 30 residues in most organisms. Hence they are longer than Sec signal peptides, which comprise about 17 to 24 residues [17].
Tat signal peptides are also overall less hydrophobic than Sec signal peptides, which serves to avoid protein targeting to the Sec pathway [18]. Additionally, the C-domain of Tat signal peptides may include basic residues N-terminal to the A-x-A motif, which contribute to Sec avoidance.
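The motif rules above lend themselves to a simple sequence scan. The sketch below scores each Arg-Arg pair near the N-terminus against the S-R-R-x-F-L-K consensus and lists candidate signal peptidase cleavage points after A-x-A motifs. It is a hypothetical illustration under stated assumptions: the 35- and 40-residue search windows are arbitrary, the polarity of x is not checked, and because conserved positions other than RR vary between cargoes and organisms, a partial match should not be read as a prediction of Tat export.

```python
from typing import Optional

CONSENSUS = "SRRxFLK"  # 'x' positions are not checked in this sketch

def best_twin_arginine_match(seq: str, search_len: int = 35) -> Optional[tuple[int, int]]:
    """Score every RR pair in the N-terminal region against S-R-R-x-F-L-K.

    Returns (start, matches) for the best pair, where start is the 0-based index
    aligned with the consensus Ser and matches counts agreeing fixed positions
    (at most 6), or None if the region contains no RR pair.
    """
    region = seq[:search_len].upper()
    best = None
    for i in range(len(region) - 1):
        if region[i:i + 2] != "RR":
            continue
        start = i - 1  # the consensus Ser precedes the first Arg
        matches = 0
        for offset, expected in enumerate(CONSENSUS):
            pos = start + offset
            if expected != "x" and 0 <= pos < len(region) and region[pos] == expected:
                matches += 1
        if best is None or matches > best[1]:
            best = (start, matches)
    return best

def axa_cleavage_sites(seq: str, search_len: int = 40) -> list[int]:
    """0-based index of the residue just after each A-x-A motif, i.e. the first
    residue of the mature protein if signal peptidase cut there."""
    region = seq[:search_len].upper()
    return [i + 3 for i in range(len(region) - 2)
            if region[i] == "A" and region[i + 2] == "A"]
```

For the hypothetical sequence `"MNSRRQFLKALLLGAVA"`, `best_twin_arginine_match` returns `(2, 6)` (a full consensus match), while the KK variant `"MNSKKQFLKALLLGAVA"` returns `None`, reflecting the strict requirement for the twin arginines in this scan.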

Native ribonuclease can be re-formed from scrambled ribonuclease in the presence of a trace of β-mercaptoethanol

A trace of β-mercaptoethanol (β-EtSH) permits disulphide exchange, and the correct S-S pairs form spontaneously. The native state of the protein is thermodynamically the most stable.
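A short calculation shows why disulphide exchange matters. Ribonuclease A has eight cysteines that form four disulphide bonds, and the number of ways to pair 2n cysteines into n disulphides is the double factorial (2n - 1)!! = 1 × 3 × 5 × ... × (2n - 1). The Python below computes this count. Assuming every pairing were equally likely, only one of the 105 possible pairings for RNase would be native (under 1%), whereas exchange lets molecules escape wrong pairings and settle into the thermodynamically favoured native one.

```python
def disulfide_pairings(n_cys: int) -> int:
    """Number of distinct ways to pair n_cys cysteines into n_cys / 2 disulphides,
    i.e. the double factorial (n_cys - 1)!!."""
    if n_cys < 0 or n_cys % 2:
        raise ValueError("need a non-negative, even number of cysteines")
    count = 1
    for k in range(n_cys - 1, 0, -2):  # multiply (n_cys - 1) * (n_cys - 3) * ... * 1
        count *= k
    return count

pairings = disulfide_pairings(8)   # RNase: 8 Cys, 4 disulphides
print(pairings, 1 / pairings)      # fraction native if pairing were random
```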

Type 3 secretion system (T3SS)

Type III secretion systems (T3SSs) are complex bacterial structures that provide gram-negative pathogens with a unique virulence mechanism enabling them to inject bacterial effector proteins directly into the host cell cytoplasm, bypassing the extracellular milieu. Although the effector proteins vary among different T3SS pathogens, common pathogenic mechanisms emerge, including interference with the host cell cytoskeleton to promote attachment and invasion, interference with cellular trafficking processes, cytotoxicity and barrier dysfunction, and immune system subversion. The activity of the T3SSs correlates closely with infection progression and outcome, both in animal models and in human infection. Therefore, to facilitate patient care and improve outcomes, it is important to understand the T3SS-mediated virulence processes and to target T3SSs in therapeutic and prophylactic development efforts. The T3SS is a complex structure composed of several subunits, which in turn are made up of approximately 20 bacterial proteins (Fig. 1). The proteins that make up the T3SS apparatus are termed structural proteins. Additional proteins called "translocators" serve the function of translocating another set of proteins into the host cell cytoplasm. The translocated proteins are termed "effectors," since they are the virulence factors that effect the changes in the host cells, allowing the invading pathogen to colonize, multiply, and in some cases chronically persist in the host. The structural components of the T3SS and the process of translocation are expertly reviewed by Ghosh (40). Briefly, the T3SS apparatus consists of two rings that provide a continuous path across the inner and outer membranes, including the peptidoglycan layer. The inner membrane ring is the larger of the two coaxial rings, and protein components that make up the inner ring have been identified for a number of bacteria.
The outer membrane ring is composed of the secretin protein family, which is also known to be involved in type 2 secretion and in the assembly of type IV bacterial pili. A needle-like structure associates with the outer membrane ring and projects from the bacterial surface. It varies in length among the different pathogens and, in the case of pathogenic Escherichia coli, is extended by the addition of filaments that are thought to facilitate attachment to the host cells through the thick glycocalyx layer. Effectors are thought to be transported through the hollow tube-like needle into the host cell through the pores formed in the host cell membrane by the translocator proteins. Translocators are usually conserved among the different pathogens possessing a T3SS and show functional complementarity for secretion and translocation, whereas the effectors are most often distinct, having unique functions suited to a particular pathogen's virulence strategy. However, effector homologues also exist among different T3SS-possessing bacteria. The T3SS is required for virulence in a range of Gram-negative pathogens and injects 'effector proteins' directly into eukaryotic cells.

Hydrophobic effect is the driving force for the folding of water soluble proteins

Water is a highly polar solvent, and water-water interactions are more favorable than the interaction of water with hydrophobic side chains (e.g. L, I, V, F). Hydrophobic groups therefore tend to cluster together in water to minimise their interactions with water; in proteins this is a major driving force for the formation of a hydrophobic core. The hydrophobic effect is the observed tendency of non-polar molecules to aggregate in water. It is an indirect effect resulting from a peculiarity of water structure: water molecules exchange hydrogen bonds with their neighbours at a rate of about 10^11 s^-1 (the "flickering cluster" model). At the interface between water and a non-H-bonding group such as CH3, water molecules have fewer opportunities for H-bond exchange (the forces are anisotropic). This leads to longer than usual H-bond lifetimes, an ice-like state at the interface, and a consequent decrease in entropy. Thus water at the interface is rotationally and translationally constrained. Any situation that minimizes the area of contact between H2O and the non-polar (hydrocarbon) regions of proteins results in an increase in entropy. This is achieved by clustering non-polar groups together.

Experimental setup

The X-ray beam is focused on the crystal. Most X-rays do not interact with the crystal, i.e. they pass straight through. However, some X-rays are scattered and give a characteristic pattern of spots. A single pattern is not enough: the crystal must be rotated and multiple patterns collected. 1) X-rays of a specific wavelength are generated at the X-ray source. 2) The scattered X-rays arrive at the detector part of the machine. 3) A robot holds the single crystal and moves it around. 4) Data are collected at low temperature (the crystal is frozen), which reduces the motion of atoms within the crystal (fixing their positions) and thus improves resolution. X-rays are very high-energy electromagnetic radiation; they make electrons jump out of atoms, causing radiation damage. This triggers random chemical reactions that destroy molecules, so they are no longer organised in a regular array. Keeping the crystal cold helps quench these chemical reactions.

Funnel-shaped Energy Landscape

Ken A. Dill and Hue Sun Chan (1997) illustrated a folding pathway design based on Levinthal's Paradox, named the "golf-course" landscape. In this landscape a random search for the native state would prove impossible because of the hypothetically "flat playing field": the protein "ball" would take a very long time to find and fall into the native "hole". However, a rugged pathway that deviates from the initial smooth golf course creates a directed tunnel through which the denatured protein passes to reach its native structure. There can be valleys (intermediate states) or hills (transition states) along the pathway to a protein's native state. Yet this proposed pathway creates a contrast between pathway dependence and pathway independence (the Levinthal dichotomy) and emphasizes a one-dimensional route through conformations. Another approach to protein folding eliminates the term "pathway" and replaces it with "funnels", which are concerned with parallel processes, ensembles and multiple dimensions instead of a sequence of structures a protein has to go through. Thus, an ideal funnel consists of a smooth multi-dimensional energy landscape where increasing intrachain contacts correlate with decreasing degrees of freedom and, ultimately, achievement of the native state.[6] (Figure: proposed funnel-shaped energy landscapes, from left to right: the idealized smooth funnel, the rugged funnel, the Moat funnel, and the Champagne Glass funnel.) Unlike an idealized smooth funnel, a rugged funnel has kinetic traps, energy barriers, and some narrow throughway paths to the native state. This also explains the accumulation of misfolded intermediates, where kinetic traps prevent protein intermediates from achieving their final conformation.
Chains stuck in such a trap must break favorable contacts that do not lead to the native state, going back toward their starting point, and then search for a different route downhill.[6] A Moat landscape, on the other hand, illustrates the idea of a variety of routes, including an obligatory kinetic-trap route that protein chains take to reach their native state. This energy landscape stems from a study by Christopher Dobson and his colleagues on hen egg white lysozyme. In that study, half of the population undergoes normal fast folding, while the other half first forms the α-helical domain quickly and then the β-sheet domain slowly.[6] It differs from the rugged landscape in that there are no accidental kinetic traps, only purposeful ones that portions of the protein must go through before reaching the final state. Both the rugged landscape and the Moat landscape nonetheless present the same concept: protein configurations might come across kinetic traps during folding. The Champagne Glass landscape, by contrast, involves free energy barriers due to conformational entropy. These partly resemble the random golf-course pathway, in which a protein chain configuration is lost and has to spend time searching for the path downhill. This situation can apply to a conformational search of polar residues that will eventually connect two hydrophobic clusters.
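The "flat golf course" problem behind Levinthal's Paradox is easiest to appreciate as arithmetic. This sketch uses a common textbook estimate, and all of its numbers are assumptions: 100 residues, 3 conformations per residue, and 10^-13 s to sample each conformation.

```python
residues = 100            # assumed chain length
states_per_residue = 3    # assumed conformations available to each residue
time_per_state = 1e-13    # assumed seconds needed to sample one conformation
seconds_per_year = 3.156e7

conformations = states_per_residue ** residues       # exact integer, 3**100
search_seconds = conformations * time_per_state      # exhaustive random search
search_years = search_seconds / seconds_per_year

print(f"{conformations:.2e} conformations")
print(f"{search_years:.1e} years for an unbiased random search")
```

The result is on the order of 10^27 years. Real proteins fold in milliseconds to seconds, which is why the search must be biased downhill, as in the funnel picture, rather than being random.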

Sec translocase system

The Sec system exports proteins and inserts integral membrane proteins into the membrane via the translocase, which consists of a protein-conducting channel, SecYEG, and an ATP-dependent motor protein, SecA. The ancillary SecDF membrane protein complex promotes the final stages of translocation. Recent years have seen a major advance in our understanding of the structural and biochemical basis of protein translocation, and this has led to a detailed model of the translocation mechanism. From their synthesis on the ribosome until their arrival at their functional location, proteins face a maturation path that is filled with obstacles. In prokaryotes, one such barrier is the inner membrane, where most proteins are directed either across or into the lipid bilayer. The majority of secretory proteins pass across the inner membrane via the Sec pathway, which comprises a set of cytosolic and membrane proteins that work together to facilitate protein translocation. This pathway also provides an entry for membrane proteins to be inserted into the inner membrane. Proteins are targeted to their final location, i.e. the inner membrane or the periplasm, by their respective hydrophobic transmembrane segments (TMSs) or signal sequences (for a review, see von Heijne [1]). At an early stage during translation, when the N-terminal signal sequence emerges from the ribosome, signal recognition particle (SRP) and the trigger factor (TF) compete for binding to the nascent chain [2,3]. Targeting sequences (stop-transfer sequences) from inner membrane proteins correspond to TMSs that exhibit high hydrophobicity and that are bound tightly by SRP. This association slows or temporarily halts elongation of the nascent chain, giving SRP time to interact with its membrane receptor FtsY [4,5].
After binding to FtsY, the ribosome nascent chain complex is transferred to the protein-conducting channel SecYEG, where continued translation provides the driving force for insertion of the membrane protein. The post-translational pathway for protein secretion (figure 1) involves less hydrophobic signal sequences of nascent secretory proteins that are bound by TF, but this reaction does not result in a slowdown of translation. Following elongation, the chaperone activity of TF is taken over by SecB, which keeps the preprotein in an unfolded conformation and directs it to the motor protein SecA [6,7]. Subsequent binding of SecA to SecYEG and binding of ATP to SecA initiate translocation of the preprotein across the inner membrane. SecA is a motor protein that uses ATP as an energy source and threads the unfolded polypeptide through the channel. The adjoining SecDF complex is involved in later stages of protein translocation and presumably pulls translocating proteins from the channel at the periplasmic side of the membrane.

General Scheme for Protein Folding/Misfolding

Misfolded proteins are degraded by cellular quality-control mechanisms such as the proteasome, but failure of this system results in protein aggregation (e.g. amyloid fibrils), leading to protein misfolding disorders. Proteins are synthesised on the ribosomes of cells. Some proteins fold while the nascent polypeptide chain is still attached to the ribosome, whereas other proteins fold after translation in the ER. Chaperones control the folding process. The side chains of the polypeptide chain determine the environmental conditions under which it can undergo aggregation. Proteins are evolved amino acid polymers whose sequences disfavour aggregation and favour folding into compact states. These compact states result from tertiary interactions among the side chains that shield the peptide backbone.

Transport systems in E. coli

Most prokaryotic membrane-bound and secretory proteins are targeted to the plasma membrane by either a co-translational pathway that uses bacterial SRP or a post-translational pathway that requires SecA and SecB. At the plasma membrane, these two pathways deliver proteins to the SecYEG translocon for translocation. Bacteria may have a single plasma membrane (Gram-positive bacteria), or an inner membrane plus an outer membrane separated by the periplasm (Gram-negative bacteria). Apart from the plasma membrane, the majority of prokaryotes lack the membrane-bound organelles found in eukaryotes, but they may assemble proteins onto various types of inclusions such as gas vesicles and storage granules.

Constraints on "acceptability" of mutations?

• Core residues more highly conserved - Changing them might disrupt structure • Surface residues change faster - But functional sites slower • Structure conserved more than sequence
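One common way to put a number on "core residues are more conserved" is the Shannon entropy of each column in a multiple sequence alignment. An invariant column scores 0, and more variable columns score higher. The following is a minimal sketch with hypothetical aligned fragments. Gap characters, if present, would simply be counted as another symbol, which is a simplifying assumption.

```python
import math
from collections import Counter

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of residue frequencies in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # p * log2(1/p) summed over residues; 0 for a fully conserved column
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def conservation_profile(alignment: list[str]) -> list[float]:
    """Entropy for each column of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy("".join(s[i] for s in alignment)) for i in range(length)]

# Hypothetical fragments: column 0 invariant (core-like), column 3 highly variable.
alignment = ["LVKE", "LVRD", "LIKS", "LVKT"]
print([round(h, 2) for h in conservation_profile(alignment)])  # [0.0, 0.81, 0.81, 2.0]
```

Note that this score ignores similarity between residues (V and I count as different), so it understates conservation in hydrophobic core positions where conservative substitutions are common.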

Pathways of protein folding

•Framework model. Precedence is given to the formation of secondary structural units. •Hydrophobic collapse model. Precedence is given to an initial chain collapse. •Nucleation-condensation model. An extended nucleus is formed early during folding, and the formation of secondary and tertiary structure is tightly coupled. •Molten globule-like intermediates accumulate during the folding of many proteins. For some proteins, particularly those following nucleation mechanisms, a molten globule intermediate does not usually accumulate. •Molten globules are partially folded intermediate states in which the main elements of secondary structure are formed. They have a more open, dynamic tertiary structure than the native state. Proteins appear to fold by diverse pathways, but variations of a simple mechanism, nucleation-condensation, describe the overall features of folding of most domains. In general, secondary structure is inherently unstable and its stability is enhanced by tertiary interactions. Consequently, an extensive interplay of secondary and tertiary interactions determines the transition state for folding. This transition state is structurally similar to the native state and is formed in a general collapse (condensation) around a diffuse nucleus. As the propensity for stable secondary structure increases, folding becomes more hierarchical and eventually follows a framework mechanism, where the transition state is assembled from pre-formed secondary structural elements.

Protein Folding and the Chaperone Network in Bacteria

•Nascent Polypeptides are bound by Trigger Factor as they exit the ribosome •They can fold spontaneously or be assisted by the DnaK (Hsp70)/ DnaJ (Hsp40) chaperone system •Alternatively they can be passed to GroEL(Hsp60)- GroES(Hsp10) chaperonins for final folding Nascent polypeptides emerging from the ribosome are assisted by a pool of molecular chaperones and targeting factors, which enable them to efficiently partition as cytosolic, integral membrane or exported proteins. Extensive genetic and biochemical analyses have significantly expanded our knowledge of chaperone tasking throughout this process. In bacteria, it is known that the folding of newly-synthesized cytosolic proteins is mainly orchestrated by three highly conserved molecular chaperones, namely Trigger Factor (TF), DnaK (HSP70) and GroEL (HSP60). Yet, it has been reported that these major chaperones are strongly involved in protein translocation pathways as well. This review describes such essential molecular chaperone functions, with emphasis on both the biogenesis of inner membrane proteins and the post-translational targeting of presecretory proteins to the Sec and the twin-arginine translocation (Tat) pathways. Critical interplay between TF, DnaK, GroEL and other molecular chaperones and targeting factors, including SecB, SecA, the signal recognition particle (SRP) and the redox enzyme maturation proteins (REMPs) is also discussed

NMR

- Used to study the chemical and physical properties of molecules - Used to identify unknown structures of proteins and other molecules - All atoms contain a nucleus, and the nucleus is surrounded by electrons. Hydrogen has 1 proton and 1 electron. The proton spins, so it behaves as a magnet, and like any magnet it has a north pole and a south pole. If we bring another hydrogen atom next to it, north repels north, so the second proton rotates until its south pole is next to the first one's north pole. After this rearrangement the pair reaches a stable state. Applying an external magnetic field (the magnet of the NMR machine) has the same aligning effect.
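The applied field does more than align the nuclear magnets. It also sets the frequency at which they absorb radiation, the Larmor frequency ν = (γ/2π)·B0. This minimal sketch assumes the approximate proton value γ/2π ≈ 42.577 MHz per tesla. It shows why spectrometers are usually described by their proton frequency. The field strengths used are illustrative.

```python
GAMMA_OVER_2PI_1H = 42.577  # MHz per tesla for 1H (approximate)

def larmor_mhz(field_tesla: float, gamma_over_2pi: float = GAMMA_OVER_2PI_1H) -> float:
    """Resonance (Larmor) frequency in MHz: nu = (gamma / 2pi) * B0."""
    return gamma_over_2pi * field_tesla

for b0 in (9.4, 14.1, 18.8):
    print(f"{b0:5.1f} T -> {larmor_mhz(b0):6.1f} MHz")
```

A stronger magnet spreads the resonances further apart in frequency, which improves resolution of crowded protein spectra.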

alpha-amino acid chirality

Amino acids (except for glycine) have a chiral carbon atom adjacent to the carboxyl group (CO2−). This chiral centre allows for stereoisomerism: the amino acids form two stereoisomers that are mirror images of each other. The structures are not superimposable on each other, much like your left and right hands. These mirror images are termed enantiomers. With the exception of glycine, all of the 19 other common amino acids have a distinct side chain on the central tetrahedral alpha carbon (Cα). The Cα is termed "chiral" to indicate that it carries four different substituents and is therefore asymmetric. Since the Cα is asymmetric, there exist two possible, non-superimposable mirror images of each amino acid. The D/L system takes its letters from the Latin words dexter (right) and laevus (left), reflecting the left- and right-handedness of the chemical structures. An amino acid with the D configuration is named with a D prefix, such as D-serine, and one with the L configuration with an L prefix, such as L-serine. D and L describe configuration, not the measured direction of optical rotation. That direction is written separately as (+) for dextrorotatory or (−) for levorotatory, and an L amino acid may be either (+) or (−). Here are the steps to determine whether an amino acid is the D or L enantiomer: Draw the molecule as a Fischer projection with the carboxylic acid group on top and the side chain on the bottom. (The amine group will not be at the top or bottom.) If the amine group is located on the right side of the carbon chain, the compound is D. If the amine group is on the left side, the molecule is L. If you wish to draw the enantiomer of a given amino acid, simply draw its mirror image. All amino acids found in proteins occur in the L-configuration about the chiral carbon atom. The exception is glycine, because it has two hydrogen atoms at the alpha carbon, which cannot be distinguished from each other except via isotope labeling.
D-amino acids are not naturally found in proteins and are not involved in the metabolic pathways of eukaryotic organisms, although they are important in the structure and metabolism of bacteria. For example, D-glutamic acid and D-alanine are structural components of certain bacterial cell walls. D-serine is thought to act as a neurotransmitter in the brain. Where D-amino acids occur in naturally made peptides, they are produced by post-translational modification. When assigning configuration, hydrogen has the lowest priority, so it is pointed away from the viewer. One hypothesis for the dominance of L amino acids is that a small difference in behaviour (for example, in solubility or crystallization) was amplified over time, so that the L form became dominant.

limitations

As the crystal's repeating unit, its unit cell, becomes larger and more complex, the atomic-level picture provided by X-ray crystallography becomes less well-resolved (more "fuzzy") for a given number of observed reflections. Two limiting cases of X-ray crystallography are often distinguished: "small-molecule" crystallography (which includes continuous inorganic solids) and "macromolecular" crystallography. Small-molecule crystallography typically involves crystals with fewer than 100 atoms in their asymmetric unit; such crystal structures are usually so well resolved that the atoms can be discerned as isolated "blobs" of electron density. By contrast, macromolecular crystallography often involves tens of thousands of atoms in the unit cell. Such crystal structures are generally less well-resolved (more "smeared out"); the atoms and chemical bonds appear as tubes of electron density, rather than as isolated atoms. In general, small molecules are also easier to crystallize than macromolecules; however, X-ray crystallography has proven possible even for viruses and proteins with hundreds of thousands of atoms, through improved crystallographic imaging and technology.[104] Although X-ray crystallography normally requires the sample to be in crystal form, new research has explored sampling non-crystalline forms of samples.

beta strand

In schematic representations, β strands are usually depicted by broad arrows pointing in the direction of the carboxyl-terminal end to indicate the type of β sheet formed, parallel or antiparallel. More structurally diverse than α helices, β sheets can be almost flat, but most adopt a somewhat twisted shape (Figure 2.34). The β sheet is an important structural element in many proteins. For example, fatty acid-binding proteins, important for lipid metabolism, are built almost entirely from β sheets (Figure 2.35).

Detecting homology: 3D structure

Protein function is specified by 3D structure Proteins with similar functions that are evolutionarily related (homologues) will have similar 3D structure • But for a protein of unknown structure, structure determination is relatively hard work - is there a simpler way?

Proteomics

Proteomics is used to investigate: when and where proteins are expressed; rates of protein production, degradation, and steady-state abundance; how proteins are modified (for example, post-translational modifications (PTMs) such as phosphorylation); the movement of proteins between subcellular compartments; the involvement of proteins in metabolic pathways; how proteins interact with one another.

The Fourier Transform

The Fourier Transform is an important image processing tool which is used to decompose an image into its sine and cosine components. The output of the transformation represents the image in the Fourier or frequency domain, while the input image is the spatial domain equivalent. In the Fourier domain image, each point represents a particular frequency contained in the spatial domain image. The Fourier Transform is used in a wide range of applications, such as image analysis, image filtering, image reconstruction and image compression. Mathematically, we can perform a Fourier transform (FT) on an image, e.g. a duck. The result is the molecular transform (MT): dark and light represent intensity, and colour represents phase. Applying the reverse transform (FT-1) to the MT regenerates the image, i.e. the duck. We can illustrate the relative contribution of the phase and the amplitude to the final image. Combining the amplitudes of one image (the duck) with the phases of another (a cat) and then performing FT-1 gives a slightly distorted pattern. The resultant image is of the cat, demonstrating that the phase is by far the most important component.
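The duck/cat phase-swap experiment can be reproduced in one dimension. This is a minimal sketch under several simplifying assumptions. Two random signals stand in for the images, and a naive discrete Fourier transform replaces a library FFT so that the example needs no dependencies. Pearson correlation is used as an assumed measure of "looks like".

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform, O(N^2); adequate for short signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT; keeps the real part because the inputs are real signals."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

random.seed(0)
n = 64
duck = [random.gauss(0, 1) for _ in range(n)]  # stand-in for the first image
cat = [random.gauss(0, 1) for _ in range(n)]   # stand-in for the second image

D, C = dft(duck), dft(cat)
# Amplitudes from the duck, phases from the cat, then transform back.
hybrid = idft([abs(d) * cmath.exp(1j * cmath.phase(c)) for d, c in zip(D, C)])

print("corr with amplitude source (duck):", round(correlation(hybrid, duck), 2))
print("corr with phase source (cat):     ", round(correlation(hybrid, cat), 2))
```

The hybrid correlates strongly with the signal that supplied the phases. This is the same conclusion as the duck/cat images, and it is why losing the phases in a diffraction experiment is such a serious problem.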

Structure calculation: methodology

The basic problem - Many restraints (100s-1000s) from experimental and empirical data all to be satisfied at the same time. The solution - Build a description of the molecule that we can sample different conformations and measure the potential energy of each so that we can find minima. The result - Several different structures will be equally compatible with the data so represent this as an ensemble
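A toy version of this workflow is shown below. Everything in it is a hypothetical assumption: five pseudo-atoms, made-up upper-bound "NOE" restraints plus a lower bound that stops atoms overlapping, and a flat-bottom penalty as the potential energy. Plain gradient descent from random starts stands in for the simulated annealing that real structure-calculation programs typically use. The point it illustrates is the one in the text: several different structures satisfy the same restraints, so the answer is reported as an ensemble.

```python
import math
import random

N_ATOMS = 5
# Hypothetical restraints (i, j, lower, upper) in arbitrary length units.
RESTRAINTS = [(i, i + 1, 1.0, 1.5) for i in range(N_ATOMS - 1)]  # chain links
RESTRAINTS.append((0, 4, 1.0, 3.0))                                # one long-range "NOE"
linked = {(i, j) for i, j, _, _ in RESTRAINTS}
# All remaining pairs only need to avoid overlap (lower bound, no upper bound).
RESTRAINTS += [(i, j, 1.0, math.inf)
               for i in range(N_ATOMS) for j in range(i + 1, N_ATOMS)
               if (i, j) not in linked]

def energy_and_gradient(coords):
    """Flat-bottom penalty: zero inside [lower, upper], quadratic outside."""
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, lo, hi in RESTRAINTS:
        diff = [a - b for a, b in zip(coords[i], coords[j])]
        d = math.sqrt(sum(c * c for c in diff)) or 1e-9
        if d > hi:
            viol = d - hi          # too far apart
        elif d < lo:
            viol = d - lo          # too close (negative violation)
        else:
            continue
        energy += viol ** 2
        for k in range(3):
            g = 2.0 * viol * diff[k] / d
            grad[i][k] += g
            grad[j][k] -= g
    return energy, grad

def minimise(coords, step=0.05, iterations=2000):
    """Plain gradient descent; a stand-in for simulated annealing."""
    for _ in range(iterations):
        _, grad = energy_and_gradient(coords)
        coords = [[x - step * g for x, g in zip(atom, ga)]
                  for atom, ga in zip(coords, grad)]
    return coords, energy_and_gradient(coords)[0]

random.seed(1)
ensemble = []
for _ in range(10):
    start = [[random.uniform(-3, 3) for _ in range(3)] for _ in range(N_ATOMS)]
    final, e = minimise(start)
    ensemble.append((e, final))

accepted = [c for e, c in ensemble if e < 1e-3]
print(f"{len(accepted)} of {len(ensemble)} structures satisfy the restraints")
print("d(0,2) across accepted models:", [round(math.dist(c[0], c[2]), 2) for c in accepted])
```

The unrestrained distance d(0,2) varies between accepted models. That variation is the spread an ensemble reports where the data do not pin the structure down.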

Diffraction pattern

The diffraction pattern has 3 properties of interest: 1) Geometry: the pattern is regular, and spots lie in specific positions. 2) Symmetry: the pattern can be symmetrical, reflecting the symmetry of the crystal. 3) Intensities: these vary widely, from very intense to too weak to be seen. But phase information is lost. Spots at the greatest distance from the centre (largest θ) contain information about the shortest distances (because nλ = 2d sin θ). Crystals produce intense peaks of reflected radiation only at certain specific wavelengths and incident angles. Bragg diffraction occurs when radiation with a wavelength comparable to atomic spacings is scattered in a specular fashion by the atoms of a crystalline system. When the scattered waves interfere constructively, they remain in phase, since the difference between the path lengths of the two waves (2d sin θ) is equal to an integer multiple of the wavelength. A low d value (higher resolution) is better.
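Solving Bragg's law for d makes the "outer spots mean shorter distances" rule concrete. This is a minimal sketch. It assumes a copper Kα source (λ ≈ 1.5418 Å) and first-order reflections, and the angles are arbitrary examples.

```python
import math

def bragg_spacing(wavelength, theta_deg, order=1):
    """Solve n * lambda = 2 * d * sin(theta) for d (same units as wavelength)."""
    return order * wavelength / (2 * math.sin(math.radians(theta_deg)))

wavelength = 1.5418  # angstroms, Cu K-alpha (assumed source)
for theta in (5, 15, 30):
    print(f"theta = {theta:2d} deg -> d = {bragg_spacing(wavelength, theta):.2f} A")
```

As θ increases, d shrinks. Spots far from the beam centre therefore carry the high-resolution (small d) information.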

Rotation about bonds ( phi & psi) in a polypeptide.

The structure of each amino acid in a polypeptide can be adjusted by rotation about two single bonds. (A) Phi (φ) is the angle of rotation about the bond between the nitrogen and the α-carbon atoms, whereas psi (ψ) is the angle of rotation about the bond between the α-carbon and the carbonyl carbon atoms. (B) A view down the bond between the nitrogen and the α-carbon atoms, showing how φ is measured. (C) A view down the bond between the α-carbon and the carbonyl carbon atoms, showing how ψ is measured. The angle of rotation about the bond between the nitrogen and the α-carbon atoms is called phi (φ). The angle of rotation about the bond between the α-carbon and the carbonyl carbon atoms is called psi (ψ). A clockwise rotation about either bond as viewed from the nitrogen atom toward the α-carbon atom or from the α-carbon atom toward the carbonyl group corresponds to a positive value. The φ and ψ angles determine the path of the polypeptide chain. Are all combinations of φ and ψ possible? Gopalasamudram Ramachandran recognized that many combinations are forbidden because of steric collisions between atoms. The allowed values can be visualized on a two-dimensional plot called a Ramachandran plot (Figure 2.23). Three-quarters of the possible (φ, ψ) combinations are excluded simply by local steric clashes. Steric exclusion, the fact that two atoms cannot be in the same place at the same time, can be a powerful organizing principle. The ability of biological polymers such as proteins to fold into well-defined structures is remarkable thermodynamically. An unfolded polymer exists as a random coil: each copy of an unfolded polymer will have a different conformation, yielding a mixture of many possible conformations. The favorable entropy associated with a mixture of many conformations opposes folding and must be overcome by interactions favoring the folded form. Thus, highly flexible polymers with a large number of possible conformations do not fold into unique structures.
The rigidity of the peptide unit and the restricted set of allowed φ and ψ angles limits the number of structures accessible to the unfolded form sufficiently to allow protein folding to take place. A Ramachandran plot showing the values of φ and ψ. Not all φ and ψ values are possible without collisions between atoms. The most favorable regions are shown in dark green; borderline regions are shown in light green. The structure on the right is disfavored because of steric clashes.
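Torsion angles such as φ and ψ are computed from the positions of four consecutive atoms. This minimal sketch in plain Python assumes Cartesian coordinates are already available, for example from a structure file. It follows the sign convention stated above: a clockwise rotation, viewed along the central bond, is positive. The test coordinates are made up. For φ of residue i the atoms are C(i-1), N, Cα, C. For ψ they are N, Cα, C, N(i+1). For ω they are Cα(i), C(i), N(i+1), Cα(i+1).

```python
import math

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond.

    Positive when, looking from p1 toward p2, the near bond (to p0) must be
    turned clockwise to eclipse the far bond (to p3).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)          # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]))   # 90.0
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1]))  # 180.0 (trans-like)
```

Computing (φ, ψ) for every residue of a model and plotting the pairs gives exactly the Ramachandran plot described above.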

CD to detect conformational changes

What might make the prion protein switch conformation to form amyloid? Theory: prion protein residues 106-126 change conformation during the formation of amyloid.

Multidimensional NMR experiments

• "Coupling" allows transfer of magnetisation from one nucleus to another. • "Labelling" the magnetization with the frequency of the first nucleus before transfer • Record the frequency of the destination nucleus during acquisition • Different methods of transfer. Multidimensional NMR spectroscopy (1) forms the basis of the determination of the three-dimensional structure of biomolecules (proteins, DNA, RNA) (2) by providing the resolution necessary to analyze their complex spectra. One-dimensional NMR spectra result from a Fourier transform (FT) of a directly detected time domain signal, the free induction decay (FID), that is recorded at the end of the pulse sequence during the acquisition time. Higher-dimensional spectra are created by indirectly detecting further time domains. This is accomplished by systematic and independent variation of one or several delays in the pulse sequence while repeatedly acquiring an FID. Between the variable delays that give rise to the indirectly detected dimensions are mixing sequences of varying complexity. They enable a transfer of magnetization via several mechanisms, resulting in correlations between signals from different nuclei. Depending on the type of interaction used to accomplish this transfer...

Quality of structures: R factors

- R factor and free R factor: how well does the model represent the data? - Low is good. Thousands of individual diffraction spots are used in the structure calculation. Many spots are duplicates due to symmetry, and each spot's intensity is influenced by all the atoms within the unit cell. The R factor is calculated for the data used in refinement. - Remember that we can fit the intensities because the phase dominates, so we need an independent check: the free R factor, calculated for 5% of the data left out of refinement.
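The two statistics can be written down directly. R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|, computed over the working set for R_work and over the held-out 5% for R_free. This is a minimal sketch under stated assumptions. The amplitudes and the "model" are hypothetical random numbers. The scale factor that real refinement programs apply to Fcalc is omitted.

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|); assumes Fcalc is already on the Fobs scale."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(fraction * n_reflections))
    free = sorted(indices[:n_free])
    free_set = set(free)
    work = [i for i in range(n_reflections) if i not in free_set]
    return work, free

# Hypothetical data: observed amplitudes and a noisy "model".
rng = random.Random(42)
f_obs = [rng.uniform(10, 100) for _ in range(1000)]
f_calc = [f * rng.gauss(1.0, 0.2) for f in f_obs]

work, free = split_free_set(len(f_obs))
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Here the "model" was never fitted to the working set, so R_work and R_free come out similar. In real refinement, an R_free well above R_work is a warning sign that the model is being overfitted to the data used in refinement.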

cis- vs trans-peptide bonds

Two configurations are possible for a planar peptide bond. In the trans configuration, the two α-carbon atoms are on opposite sides of the peptide bond. In the cis configuration, these groups are on the same side of the peptide bond. Almost all peptide bonds in proteins are trans. This preference for trans over cis can be explained by the fact that steric clashes between groups attached to the α-carbon atoms hinder the formation of the cis configuration but do not arise in the trans configuration (Figure 2.20). By far the most common cis peptide bonds are X-Pro linkages. Such bonds show less preference for the trans configuration because the nitrogen of proline is bonded to two tetrahedral carbon atoms, limiting the steric differences between the trans and cis forms. In contrast with the peptide bond, the bonds between the amino group and the α-carbon atom and between the α-carbon atom and the carbonyl group are pure single bonds. The two adjacent rigid peptide units can rotate about these bonds, taking on various orientations. This freedom of rotation about two bonds of each amino acid allows proteins to fold in many different ways.
The rotations about these bonds can be specified by torsion angles. • Peptide bonds are synthesised in the trans form on the ribosome. • The trans form is favoured because the groups on the α-carbons clash in the cis form: ω = 180° is trans, whereas ω = 0° is cis, which is unfavourable because these groups are placed on the same side and clash with one another. Both of the isomers have exactly the same atoms joined up in exactly the same order. In small symmetric molecules such as 1,2-dichloroethene, this means that the van der Waals dispersion forces between the molecules will be identical in both cases. There, the difference between the two is that the cis isomer is a polar molecule whereas the trans isomer is non-polar.

Detecting homology: sequence

3D structure is specified by amino acid sequence. Proteins with similar structures (and functions) will have similar amino acid sequences. • Sequence comparison of (predicted) amino acid sequences can reveal potential homology. Regardless of the method used for its determination, the amino acid sequence of a protein can provide the biochemist with a wealth of information as to the protein's structure, function, and history. 1. The sequence of a protein of interest can be compared with all other known sequences to ascertain whether significant similarities exist. A search for kinship between a newly sequenced protein and the millions of previously sequenced ones takes only a few seconds on a personal computer (Chapter 6). If the newly isolated protein is a member of an established class of protein, we can begin to infer information about the protein's structure and function. For instance, chymotrypsin and trypsin are members of the serine protease family, a clan of proteolytic enzymes that have a common catalytic mechanism based on a reactive serine residue (Chapter 9). If the sequence of the newly isolated protein shows sequence similarity with trypsin or chymotrypsin, the result suggests that it may be a serine protease. 2. Comparison of sequences of the same protein in different species yields a wealth of information about evolutionary pathways. Genealogical relationships between species can be inferred from sequence differences between their proteins. If we assume that the random mutation rate of proteins over time is constant, then careful sequence comparison of related proteins between two organisms can provide an estimate for when these two evolutionary lines diverged. For example, a comparison of serum albumins found in primates indicates that human beings and African apes diverged 5 million years ago, not 30 million years ago as was once thought. Sequence analyses have opened a new perspective on the fossil record and the pathway of human evolution.
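The constant-rate (molecular clock) argument in point 2 can be sketched as arithmetic. This is a simplified illustration, not the method used in the albumin study: it assumes two pre-aligned sequences, a Poisson correction for repeated substitutions at the same site, and a substitution rate supplied by the user (the value in the usage line is hypothetical). The factor of 2 appears because both lineages accumulate changes after they split.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites in two aligned, equal-length sequences (gap columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Substitutions per site, correcting for multiple hits: d = -ln(1 - p)."""
    return -math.log(1.0 - p)


def divergence_time(d: float, rate_per_site_per_year: float) -> float:
    """Time since divergence, assuming a constant rate along both lineages."""
    return d / (2.0 * rate_per_site_per_year)


# Hypothetical numbers for illustration only.
p = p_distance("MVLSPADKTNVKAAWG", "MVLSGEDKSNIKAAWG")
print(p, divergence_time(poisson_distance(p), rate_per_site_per_year=1e-9))
```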

proteins

Amino acids are the building blocks of proteins. An α-amino acid consists of a central carbon atom, called the α carbon, linked to an amino group, a carboxylic acid group, a hydrogen atom, and a distinctive R group. The R group is often referred to as the side chain. With four different groups connected to the tetrahedral α-carbon atom, α-amino acids are chiral: they may exist in one or the other of two mirror-image forms, called the L isomer and the D isomer. Only L amino acids are constituents of proteins. For almost all amino acids, the L isomer has S (rather than R) absolute configuration (Figure 2.5). What is the basis for the preference for L amino acids? The answer has been lost to evolutionary history. It is possible that the preference for L over D amino acids was a consequence of a chance selection. However, there is evidence that L amino acids are slightly more soluble than a racemic mixture of D and L amino acids, which tend to form crystals. This small solubility difference could have been amplified over time so that the L isomer became dominant in solution. Amino acids in solution at neutral pH exist predominantly as dipolar ions (also called zwitterions). In the dipolar form, the amino group is protonated (–NH₃⁺) and the carboxyl group is deprotonated (–COO⁻). The ionization state of an amino acid varies with pH (Figure 2.6). In acid solution (e.g., pH 1), the amino group is protonated (–NH₃⁺) and the carboxyl group is not dissociated (–COOH). As the pH is raised, the carboxylic acid is the first group to give up a proton, inasmuch as its pKa is near 2. The dipolar form persists until the pH approaches 9, when the protonated amino group loses a proton.
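The pH dependence described above can be made quantitative by treating a simple amino acid as a diprotic acid. This sketch uses the approximate pKa values from the text (about 2 for the carboxyl group, about 9 for the amino group) as defaults. It lumps the negligible uncharged H2N-CHR-COOH tautomer into the macroscopic species, and the function name is hypothetical.

```python
def species_fractions(ph: float, pka_carboxyl: float = 2.0, pka_amino: float = 9.0):
    """Fractions of cation (+H3N-CHR-COOH), zwitterion (+H3N-CHR-COO-) and anion (H2N-CHR-COO-)."""
    h = 10.0 ** -ph
    k1 = 10.0 ** -pka_carboxyl
    k2 = 10.0 ** -pka_amino
    cation = h * h        # both groups protonated
    zwitterion = h * k1   # carboxyl has lost its proton
    anion = k1 * k2       # amino group has also lost its proton
    total = cation + zwitterion + anion
    return cation / total, zwitterion / total, anion / total


for ph in (1.0, 7.0, 12.0):
    print(ph, [round(f, 3) for f in species_fractions(ph)])
```

At pH 7 the zwitterion dominates (about 99%), matching the statement that amino acids at neutral pH exist predominantly as dipolar ions.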

COSY

COSY is a technique for determining correlations through chemical bonds. In its most used form, it shows which proton resonances are mutually coupled. ... Peaks of interest, known as 'cross-peaks', appear away from the diagonal where two protons are coupled. For example, the presence of a cross-peak (a correlation off the diagonal) in a COSY dataset is a result of nuclei coupling through one or more bonds, whereas a NOESY dataset measures NOEs (Nuclear Overhauser Effects) through space, regardless of the number of bonds separating the nuclei. ¹H-¹H Correlation Spectroscopy (COSY) shows the correlation between hydrogens that are coupled to each other in the ¹H NMR spectrum. The ¹H spectrum is plotted on both 2D axes. Coupling is transmitted between nuclei that share bonding electrons, so transfer only occurs between nuclei within the same covalently bonded network: if you see a cross-peak, the two protons are connected through bonds. By combining information from these two types of spectroscopy (through-bond COSY and through-space NOESY, which relies on dipolar coupling), we can work out which atom is which and which atoms are close to each other.
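Because COSY cross-peaks only connect protons within the same covalently coupled network, grouping cross-peaks into connected networks ("spin systems") is a natural first step in assignment. This is a minimal sketch, assuming each cross-peak has already been converted into a pair of proton labels; the labels and the function name are hypothetical, and real assignment also uses chemical-shift ranges and NOESY data.

```python
from collections import defaultdict


def spin_systems(cross_peaks):
    """Group protons into coupled networks (connected components) from COSY cross-peak pairs."""
    graph = defaultdict(set)
    for a, b in cross_peaks:
        graph[a].add(b)
        graph[b].add(a)
    seen, systems = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk through coupled protons
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        systems.append(sorted(component))
    return systems


# Hypothetical labels: HN/HA/HB of residue 1 coupled in a chain, HN/HA of residue 2.
print(spin_systems([("HN1", "HA1"), ("HA1", "HB1"), ("HN2", "HA2")]))
```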

CATH

Experimentally-determined protein three-dimensional structures are obtained from the Protein Data Bank and split into their consecutive polypeptide chains, where applicable. Protein domains are identified within these chains using a mixture of automatic methods and manual curation. The domains are then classified within the CATH structural hierarchy: at the Class (C) level, domains are assigned according to their secondary structure content, i.e. all alpha, all beta, a mixture of alpha and beta, or little secondary structure; at the Architecture (A) level, information on the secondary structure arrangement in three-dimensional space is used for assignment; at the Topology/fold (T) level, information on how the secondary structure elements are connected and arranged is used; assignments are made to the Homologous superfamily (H) level if there is good evidence that the domains are related by evolution [2] i.e. they are homologous. Additional sequence data for domains with no experimentally determined structures are provided by CATH's sister resource, Gene3D; these sequences are used to populate the homologous superfamilies. Protein sequences from UniProtKB and Ensembl are scanned against CATH HMMs to predict domain sequence boundaries and make homologous superfamily assignments.
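CATH writes a domain's position in this hierarchy as a dotted code, Class.Architecture.Topology.Homologous-superfamily, so two domains share a level only if they agree at that level and every level above it. The sketch below compares two codes. The codes in the usage line are illustrative, and the helper names are hypothetical rather than part of any CATH software.

```python
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")
CLASS_NAMES = {1: "Mainly alpha", 2: "Mainly beta", 3: "Alpha beta", 4: "Few secondary structures"}


def parse_cath(code: str) -> tuple[int, int, int, int]:
    """Split a C.A.T.H code such as '3.40.50.300' into its four integer levels."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a C.A.T.H code, got {code!r}")
    return tuple(int(p) for p in parts)


def deepest_shared_level(code_a: str, code_b: str):
    """Deepest hierarchy level two domains share (None if they differ already at Class)."""
    shared = None
    for level, x, y in zip(LEVELS, parse_cath(code_a), parse_cath(code_b)):
        if x != y:
            break  # lower levels are nested inside this one, so stop here
        shared = level
    return shared


print(CLASS_NAMES[parse_cath("3.40.50.300")[0]], deepest_shared_level("3.40.50.300", "3.40.50.720"))
```

Sharing the Topology level but not the H level means the domains have the same fold, but CATH has not found good evidence that they are homologous.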

proline

Generally speaking, peptide bonds are in the trans conformation. However, cis forms can occur in peptide bonds that precede a proline residue. In such cases, the cis form is more stable than usual since the proline side chain offers less of a hindrance. Nonetheless, cis peptide bonds occur only in approximately 10% of instances of peptide bonds preceding proline residues. Steric hindrance between the functional groups attached to the Cα atoms will be greater in the cis configuration. However, for proline residues, the cyclic nature of the side chain means that the cis and trans configurations have more nearly equivalent energies. Thus proline is found in the cis configuration more frequently than other amino acids. The omega (ω) torsion angle of proline will be close to 0° for the cis configuration or, most often, 180° for the trans configuration. Proline contains a secondary amine group (historically called an imine) instead of a primary amine group, which is why it is called an imino acid. Since the 3-carbon R group of proline is fused to the α-nitrogen, proline has a rotationally constrained, rigid ring structure. In a polypeptide chain, the proline Cδ is held by a bond to the nitrogen atom, so in both the trans and the cis form a ring carbon sits close to the preceding residue; both forms carry a comparable steric cost. In an unconstrained polypeptide, X–Pro bonds equilibrate at roughly 9:1 trans (ω = 180°) to cis (ω = 0°), and isomerization is slow. Because of the cyclic side chain, the cis form (ω = 0°) is only slightly less energetically favourable than the trans state. In mature protein structures, about 10% of X–Pro peptide bonds are in the cis state. Peptide bonds leave the ribosome as trans after polypeptide synthesis. Proline cis−trans isomerization plays a key role in the rate-determining steps of protein folding.
The energetic origin of this isomerization process is summarized, and the folding and unfolding of disulfide-intact bovine pancreatic ribonuclease A is used as an example to illustrate the kinetics and structural features of conformational changes from the heterogeneous unfolded state (consisting of cis and trans isomers of X-Pro peptide groups) to the native structure in which only one set of proline isomers is present.
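The roughly 9:1 trans:cis equilibrium quoted above can be turned into a free-energy difference with ΔG = −RT ln K, where K = [cis]/[trans]. This is a back-of-envelope sketch assuming 25 °C and a simple two-state equilibrium; the function names are illustrative.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def delta_g_from_ratio(trans_fraction: float, cis_fraction: float, temp_k: float = 298.15) -> float:
    """G(cis) - G(trans) in kJ/mol from equilibrium populations (two-state assumption)."""
    k = cis_fraction / trans_fraction  # K = [cis]/[trans]
    return -R * temp_k * math.log(k) / 1000.0


def cis_fraction_from_delta_g(dg_kj: float, temp_k: float = 298.15) -> float:
    """Inverse calculation: equilibrium cis fraction for a given G(cis) - G(trans)."""
    k = math.exp(-dg_kj * 1000.0 / (R * temp_k))
    return k / (1.0 + k)


dg = delta_g_from_ratio(0.9, 0.1)
print(f"{dg:.2f} kJ/mol")
```

A 9:1 ratio corresponds to only about 5.4 kJ/mol (about 1.3 kcal/mol) in favour of trans, which is why the cis form of X–Pro bonds is populated at all.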

Homologues/orthologues

Homology forms the basis of organization for comparative biology. A homologous trait is often called a homolog (also spelled homologue). In genetics, the term "homolog" is used both to refer to a homologous protein and to the gene (DNA sequence) encoding it. As with anatomical structures, homology between protein or DNA sequences is defined in terms of shared ancestry. Two segments of DNA can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs). Homology among proteins or DNA is often incorrectly concluded on the basis of sequence similarity. The terms "percent homology" and "sequence similarity" are often used interchangeably. As with anatomical structures, high sequence similarity might occur because of convergent evolution or, as with shorter sequences, because of chance. Such sequences are similar, but not homologous. Sequence regions that are homologous are also called conserved. This is not to be confused with conservation in amino acid sequences, in which the amino acid at a specific position has been substituted with a different one with functionally equivalent physicochemical properties. One can, however, refer to partial homology, where a fraction of the sequences compared (are presumed to) share descent while the rest does not. For example, partial homology may result from a gene fusion event.

polypeptide chains can change directions by making reverse turns and loops

Most proteins have compact, globular shapes owing to reversals in the direction of their polypeptide chains. Many of these reversals are accomplished by a common structural element called the reverse turn (also known as the β turn or hairpin turn), illustrated in Figure 2.36. In many reverse turns, the CO group of residue i of a polypeptide is hydrogen bonded to the NH group of residue i + 3. This interaction stabilizes abrupt changes in direction of the polypeptide chain. In other cases, more-elaborate structures are responsible for chain reversals. These structures are called loops or sometimes Ω loops (omega loops) to suggest their overall shape. Unlike α helices and β strands, loops do not have regular, periodic structures. Nonetheless, loop structures are often rigid and well defined (Figure 2.37). Turns and loops invariably lie on the surfaces of proteins and thus often participate in interactions between proteins and other molecules. • β-turns: tight turns between two strands • loops: often longer, lacking regular secondary structure
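The i → i + 3 hydrogen bond described above gives a simple signature for spotting candidate reverse turns in a list of backbone hydrogen bonds. This is a sketch under the assumption that hydrogen bonds have already been detected and are given as (CO residue, NH residue) index pairs; the function name is hypothetical. Dedicated assignment programs such as DSSP also apply energy and geometry criteria, and consecutive i → i + 3 bonds can instead indicate a 3₁₀ helix.

```python
def candidate_reverse_turns(hbonds):
    """Residues i whose CO accepts a hydrogen bond from the NH of residue i + 3."""
    return sorted({co for co, nh in hbonds if nh - co == 3})


# Hypothetical hydrogen-bond list; (10, 14) is the i -> i + 4 pattern of an alpha helix.
print(candidate_reverse_turns([(5, 8), (10, 14), (20, 23)]))
```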

natural surfactant proteins

Naturally occurring foam constituent and surfactant proteins with intriguing structures and functions are now being identified from a variety of biological sources. The ranaspumins from tropical frog foam nests comprise a range of proteins with a mixture of surfactant, carbohydrate binding and antimicrobial activities that together provide a stable, biocompatible, protective foam environment for developing eggs and embryos. Ranasmurfin, a blue protein from a different species of frog, displays a novel structure with a unique chromophoric crosslink. Latherin, primarily from horse sweat, but with similarities to salivary, oral and upper respiratory tract proteins, illustrates several potential roles for surfactant proteins in mammalian systems. These proteins, together with the previously discovered hydrophobins of fungi, throw new light on biomolecular processes at air-water and other interfaces. This review provides a perspective on these recent findings, focussing on structure and biophysical properties.

why does cis vs trans matter?

Peptide bonds in proteins, including antibodies, are often planar due to their partial double-bond character, and the relative isomeric conformation of Cα atoms with respect to the C-N peptide bond is typically trans, yielding dihedral values close to 180°. The cis conformation, with dihedral angle close to 0°, is usually energetically unfavorable and thus less probable.5 In general, trans-cis conformers have an energetic difference of approximately 2-6 kcal/mol,6 while the trans-cis conformational switching activation energy to overcome is ~20 kcal/mol.6,7 For proline, due to its particular closed five-atom ring side chain, the energetic cost of trans-cis conformational switching is lower (~2 kcal/mol less),8 with interconversion kinetics ranging from seconds to minutes.7 This proline isomerization is in chemical equilibrium, thus the same molecular entity can form cis and trans conformers simultaneously, as observed in protein crystal structures.5 Different variables, such as temperature or pH, can shift the equilibrium from one population to another or enrich either conformation state.9 Another important factor that affects the cis/trans ratio is the type of amino acid preceding proline. Prolines preceded by aromatic residues such as tyrosine or phenylalanine, or another proline, are more likely to adopt the cis conformation.5,10 Proline isomerization forms the basis of some molecular allosteric switches10,11 and timers,12 and plays a role in different physiological (e.g., immune function,13 cell signaling14-17) and pathological conditions (e.g., cancer,18-21 Alzheimer's disease18,22). In a classical antibody structure, prolines can be found both in constant and variable regions.
In constant regions, they participate in domain folding (i.e., CH2,23 CH3,24 scFv24 folding pathways) after quaternary structure formation, by the enzymatic activity of prolyl isomerases.25 The prolyl isomerization in antibody complementarity-determining regions (CDRs) can occur upon antigen binding,7 but in general, it is a rare event. The impact of proline isomerization on the biological functions of proteins has been widely documented.12,19-21,33,34 Proline isomerization has also been documented as a driver for proper folding of antibody constant domains, facilitated by prolyl isomerases.23-25,35-37 Despite the bimodal behavior of proline, only one conformation is generally compatible with the native structure of antibodies.37 An exception has been reported by Shinoda et al., who have shown that antigen binding can lead to proline isomerization and generation of both conformers in a CDR. - Antigen binding can lead to proline isomerization. - Immunoglobulin proteins serve as cell-surface antigen receptors and, upon antigen stimulation, are secreted as antibodies that provide protection against infection. Both chains (heavy and light) vary in amino acid sequence, which gives antibody molecules their specificity; this is why cis/trans isomerism matters when making immunoglobulins.
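The link between the ~20 kcal/mol barrier and "seconds to minutes" can be checked with the Eyring equation, k = (k_B T / h) exp(−ΔG‡ / RT). This sketch assumes that the quoted activation energy can be treated as a free energy of activation and that the transmission coefficient is 1. It computes only the forward rate constant; the approach to equilibrium actually depends on the sum of the forward and reverse rates.

```python
import math

KB = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J/(mol K)
J_PER_KCAL = 4184.0


def eyring_rate(dg_act_kcal: float, temp_k: float = 298.15) -> float:
    """First-order rate constant (s^-1) for a given free energy of activation."""
    return (KB * temp_k / H) * math.exp(-dg_act_kcal * J_PER_KCAL / (R * temp_k))


def half_life(k: float) -> float:
    return math.log(2) / k


for barrier in (20.0, 18.0):  # generic barrier, and ~2 kcal/mol lower as quoted for proline
    k = eyring_rate(barrier)
    print(f"{barrier} kcal/mol: k = {k:.3g} s^-1, t1/2 = {half_life(k):.3g} s")
```

A 20 kcal/mol barrier gives a half-time of roughly a minute at 25 °C. Lowering the barrier by 2 kcal/mol speeds the reaction about 30-fold, to a few seconds, consistent with the seconds-to-minutes range quoted in the passage.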

Origin of CD effect

Plane-polarised light can be split into left- and right-circularly polarised components (L + R = plane polarised). When the 2 components pass through a solution of a chiral chromophore, they are absorbed unequally. The resulting beam is elliptically polarised, with ellipticity θ. $\Delta A = A_L - A_R$. Numerically, $\theta\,(\mathrm{deg}) = 32.98\,\Delta A$ • Hence 10 mdeg corresponds to $\Delta A = 3 \times 10^{-4}$. Electromagnetic radiation consists of an electric (E) and magnetic (B) field that oscillate perpendicular to one another and to the propagating direction,[7] a transverse wave. While linearly polarized light occurs when the electric field vector oscillates only in one plane, circularly polarized light occurs when the direction of the electric field vector rotates about its propagation direction while the vector retains constant magnitude. At a single point in space, the circularly polarized vector will trace out a circle over one period of the wave frequency, hence the name. The two diagrams below show the electric field vectors of linearly and circularly polarized light, at one moment of time, for a range of positions; the plot of the circularly polarized electric vector forms a helix along the direction of propagation (k). For left circularly polarized light (LCP) with propagation towards the observer, the electric vector rotates counterclockwise.[2] For right circularly polarized light (RCP), the electric vector rotates clockwise.
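The conversion θ (deg) = 32.98 ΔA above can be written as a pair of helper functions. This sketch only encodes that relation and the sign convention ΔA = A_L − A_R (positive CD when left-circularly polarised light is absorbed more). The function names are illustrative, and the absorbance values in the usage line are made up.

```python
DEG_PER_DELTA_A = 32.98  # theta (deg) = 32.98 * delta_A


def mdeg_to_delta_a(theta_mdeg: float) -> float:
    """Convert ellipticity in millidegrees to differential absorbance delta_A."""
    return theta_mdeg / 1000.0 / DEG_PER_DELTA_A


def delta_a_to_mdeg(delta_a: float) -> float:
    """Convert differential absorbance delta_A to ellipticity in millidegrees."""
    return delta_a * DEG_PER_DELTA_A * 1000.0


def ellipticity_mdeg(a_left: float, a_right: float) -> float:
    """CD signal in mdeg from the two absorbances; positive when A_L > A_R."""
    return delta_a_to_mdeg(a_left - a_right)


print(mdeg_to_delta_a(10.0))             # about 3e-4, as stated in the text
print(ellipticity_mdeg(0.5003, 0.5000))  # small positive signal
```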

Domains are autonomous folding units

Protein folding, the unsolved problem: Since the seminal work of Anfinsen in the early 1960s,[20] the goal of completely understanding the mechanism by which a polypeptide rapidly folds into its stable native conformation has remained elusive. Many experimental folding studies have contributed much to our understanding, but the principles that govern protein folding are still based on those discovered in the very first studies of folding. Anfinsen showed that the native state of a protein is thermodynamically stable, the conformation being at a global minimum of its free energy. Folding is a directed search of conformational space allowing the protein to fold on a biologically feasible time scale. The Levinthal paradox states that if an average-sized protein were to sample all possible conformations before finding the one with the lowest energy, the whole process would take billions of years.[52] Proteins typically fold within 0.1 to 1000 seconds. Therefore, the protein folding process must be directed in some way through a specific folding pathway. The forces that direct this search are likely to be a combination of local and global influences whose effects are felt at various stages of the reaction.[53] Advances in experimental and theoretical studies have shown that folding can be viewed in terms of energy landscapes,[54][55] where folding kinetics is considered as a progressive organisation of an ensemble of partially folded structures through which a protein passes on its way to the folded structure. This has been described in terms of a folding funnel, in which an unfolded protein has a large number of conformational states available and there are fewer states available to the folded protein. A funnel implies that for protein folding there is a decrease in energy and loss of entropy with increasing tertiary structure formation. The local roughness of the funnel reflects kinetic traps, corresponding to the accumulation of misfolded intermediates.
A folding chain progresses toward lower intra-chain free-energies by increasing its compactness. The chain's conformational options become increasingly narrowed ultimately toward one native structure.
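The Levinthal estimate can be reproduced with a few lines of arithmetic. The inputs are assumptions chosen only for illustration: 3 backbone states per residue, 10^13 conformations sampled per second (roughly a bond-vibration timescale) and a 100-residue chain. The point is the order of magnitude, not the exact value.

```python
SECONDS_PER_YEAR = 3.156e7


def levinthal_search_years(n_residues: int, states_per_residue: int = 3,
                           samples_per_second: float = 1e13) -> float:
    """Years needed to visit every conformation once by exhaustive random search."""
    conformations = states_per_residue ** n_residues
    return conformations / samples_per_second / SECONDS_PER_YEAR


print(f"{levinthal_search_years(100):.2e} years")
```

Under these assumptions, the search takes about 10^27 years, vastly longer than the 0.1 to 1000 seconds proteins actually need. This is why folding must follow a directed path down the funnel rather than an exhaustive search.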

protein folding

Proteins fold to the lowest free-energy conformation. Linear chains of amino acids have many possible conformations and therefore high entropy. As folding proceeds, the free energy decreases to a minimum and the conformational entropy of the chain decreases. It is thought that hydrophobic and hydrophilic interactions drive the early steps in folding. Proteins bury hydrophobic residues in an interior 'core'. Most proteins begin to fold during translation (more in a moment). Molten globule: a partially folded protein state that conserves a native-like secondary structure content but lacks the tightly packed protein interior. Van der Waals forces help the atoms in a protein pack together. Natural selection has strongly favoured protein sequences that have a single conformation that forms easily and seldom misfolds (think α-helix, β-sheet). Folding of a monomeric protein follows the structural hierarchy primary → secondary → tertiary. The formation of the secondary structures and structural motifs occurs early in the folding process, followed by the assembly of more complex domains, which then associate into the tertiary structure.

Sic1 degradation drives cell cycle progression

Sic1 is a CDK (cyclin-dependent kinase) inhibitor that prevents progression into S phase in the yeast cell cycle. Sic1 degradation leads to the G1→S cell-cycle transition. Sic1 is targeted by an SCF ubiquitin ligase via its adaptor protein Cdc4, which recognises CPDs (Cdc4 phosphodegrons, such as those found in cyclin E).

p27Kip1 inhibitor of cyclin A/CDK2

The crystal structure of the human p27Kip1 kinase inhibitory domain bound to the phosphorylated cyclin A-cyclin-dependent kinase 2 (Cdk2) complex has been determined at 2.3 Å. p27Kip1 binds the complex as an extended structure interacting with both cyclin A and Cdk2. On cyclin A, it binds in a groove formed by conserved cyclin box residues. On Cdk2, it binds and rearranges the amino-terminal lobe and also inserts into the catalytic cleft, mimicking ATP. Complexes of cyclins with cyclin-dependent kinases (CDKs) play a central role in the control of the eukaryotic cell cycle (ref. 1). The discovery of proteins that bind to and inhibit the catalytic activity of cyclin-CDK complexes has identified kinase inhibition as an intrinsic component of cell-cycle control (reviewed in refs 1-3). These inhibitors (CKIs) induce cell-cycle arrest in response to anti-proliferative signals, including contact inhibition and serum deprivation (ref. 4), TGF-β (ref. 5), myogenic (ref. 6), myeloid (ref. 7) and neuronal differentiation (ref. 8), and DNA-damage checkpoints (ref. 9). The inhibitors, which are present in proliferating cells as well, may also help to coordinate cell-cycle progression by their redistribution between different cyclin-CDK complexes (refs 10, 11). The inhibitors that have been identified so far can be grouped, according to sequence and functional similarities, into two families.
The Kip/Cip family of inhibitors, which include p27Kip1 (refs 4, 12), p21Cip1/WAF1 (refs 9, 13-15) and p57Kip2 (refs 16, 17), bind to and inhibit cyclin-CDK complexes with broad preference for the G1 and S phase kinase complexes over the mitotic ones. The Kip/Cip inhibitors can bind isolated cyclin and CDK subunits independently, but they have a higher affinity for the cyclin-CDK complexes (refs 12, 18-20). The INK4 family members, which include p15, p16, p18 and p19, are specific for Cdk4 and its close isoform Cdk6, and can bind to either the isolated CDK subunit or its complex with cyclin D (ref. 3). Members of the Kip/Cip family contain a 65-amino-acid region with homology (38-44% identity) at their N-terminal portions, which is necessary and sufficient to bind to and inhibit cyclin-CDK complexes (refs 4, 16, 18, 21). Their carboxy-terminal portions are variable in length and divergent in sequence and function (ref. 3). Here we report the crystal structure of the 69-amino-acid N-terminal inhibitory domain of p27Kip1 bound to the phosphorylated cyclin A-Cdk2 complex, determined at 2.3 Å resolution (Table 1 and Fig. 1). The structure reveals that p27 uses a three-stage approach to bind and inhibit the cyclin A-Cdk2 complex. It binds a peptide-binding groove on the conserved cyclin box of cyclin A, it binds the N-terminal lobe of Cdk2, and it also inserts deep inside the catalytic cleft, mimicking ATP. A comparison of Cdk2 in this complex with the structure of Cdk2 in the binary cyclin A-Cdk2 complex (refs 22, 23) reveals that p27 binding causes large conformational changes in and around the catalytic cleft of Cdk2.

Advantage of domains in protein folding

The organisation of large proteins by structural domains represents an advantage for protein folding, with each domain being able to individually fold, accelerating the folding process and reducing a potentially large combination of residue interactions. Furthermore, given the observed random distribution of hydrophobic residues in proteins,[56] domain formation appears to be the optimal solution for a large protein to bury its hydrophobic residues while keeping the hydrophilic residues at the surface.[57][58] However, the role of inter-domain interactions in protein folding and in the energetics of stabilisation of the native structure probably differs for each protein. In T4 lysozyme, the influence of one domain on the other is so strong that the entire molecule is resistant to proteolytic cleavage. In this case, folding is a sequential process where the C-terminal domain is required to fold independently in an early step, and the other domain requires the presence of the folded C-terminal domain for folding and stabilisation.[59] It has been found that the folding of an isolated domain can take place at the same rate or sometimes faster than that of the integrated domain,[60] suggesting that unfavourable interactions with the rest of the protein can occur during folding. Several arguments suggest that the slowest step in the folding of large proteins is the pairing of the folded domains.[30] This is either because the domains are not folded entirely correctly or because the small adjustments required for their interaction are energetically unfavourable,[61] such as the removal of water from the domain interface.

statistical significance of alignments can be estimated by shuffling

The similarities in sequence in Figure 6.5 appear striking, yet there remains the possibility that a grouping of sequence identities has occurred by chance alone. Because proteins are composed of the same set of 20 amino acid monomers, the alignment of any two unrelated proteins will yield some identities, particularly if we allow the introduction of gaps. Even if two proteins have identical amino acid composition, they may not be linked by evolution. It is the order of the residues within their sequences that implies a relationship between them. Hence, we can assess the significance of our alignment by "shuffling," or randomly rearranging, one of the sequences (Figure 6.7), repeating the sequence alignment, and determining a new alignment score. This process is repeated many times to yield a histogram showing, for each possible score, the number of shuffled sequences that received that score (Figure 6.8). If the original score is not appreciably different from the scores from the shuffled alignments, then we cannot exclude the possibility that the original alignment is merely a consequence of chance. When this procedure is applied to the sequences of myoglobin and α-hemoglobin, the authentic alignment (indicated by the red bar in Figure 6.8) clearly stands out. Its score is far above the mean for the alignment scores based on shuffled sequences. The probability that such a deviation occurred by chance alone is approximately 1 in 10^20. Thus, we can comfortably conclude that the two sequences are genuinely similar; the simplest explanation for this similarity is that these sequences are homologous, that is, the two molecules have descended from a common ancestor.
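The shuffling procedure can be sketched in Python. The sketch makes several simplifying assumptions. The alignment is a basic Needleman–Wunsch global alignment with made-up scores (match +2, mismatch -1, gap -2) rather than a real substitution matrix such as BLOSUM62. `shuffle_test` is a hypothetical helper name. The function returns the observed score, the mean shuffled score, a z-score, and the fraction of shuffled alignments that score at least as high as the real one.

```python
import random
import statistics

MATCH, MISMATCH, GAP = 2, -1, -2  # illustrative scores, not a real substitution matrix


def nw_score(a: str, b: str) -> int:
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr[j] = max(diag, prev[j] + GAP, curr[j - 1] + GAP)
        prev = curr
    return prev[-1]


def shuffle_test(a: str, b: str, n: int = 200, seed: int = 0):
    """Compare the real score with scores after shuffling b (same composition, random order)."""
    rng = random.Random(seed)
    observed = nw_score(a, b)
    scores = []
    for _ in range(n):
        letters = list(b)
        rng.shuffle(letters)
        scores.append(nw_score(a, "".join(letters)))
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    z = (observed - mean) / sd if sd > 0 else float("inf")
    p_empirical = sum(s >= observed for s in scores) / n
    return observed, mean, z, p_empirical


print(shuffle_test("MVLSPADKTNVKAAWGKVGAHAGEYGAEAL", "MVLSEGEWQLVLHVWAKVEADVAGHGQDIL", n=100))
```

The empirical fraction can only resolve probabilities down to about 1/n. Very small probabilities, such as the one quoted for myoglobin and α-hemoglobin, are instead estimated by fitting a distribution to the shuffled scores.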

Generating a CD spectrum

When chiral chromophores are present, one state of circularly polarized light will be absorbed to a greater or lesser extent than the other. Over corresponding wavelengths, a circular dichroism signal can, therefore, be positive or negative, depending on whether L-CPL is absorbed to a greater extent than R-CPL (CD signal positive) or to a lesser extent (CD signal negative). Chirascan circular dichroism spectrometers measure alternately the absorbance of L- and R-CPL and then calculate the CD signal.

Ionisable side chain properties

Twenty kinds of side chains varying in size, shape, charge, hydrogen-bonding capacity, hydrophobic character, and chemical reactivity are commonly found in proteins. Indeed, all proteins in all species (bacterial, archaeal, and eukaryotic) are constructed from the same set of 20 amino acids with only a few exceptions. This fundamental alphabet for the construction of proteins is several billion years old. The remarkable range of functions mediated by proteins results from the diversity and versatility of these 20 building blocks. Understanding how this alphabet is used to create the intricate three-dimensional structures that enable proteins to carry out so many biological processes is an exciting area of biochemistry and one that we will return to in Section 2.6. Although there are many ways to classify amino acids, we will sort these molecules into four groups, on the basis of the general chemical characteristics of their R groups: 1. Hydrophobic amino acids with nonpolar R groups 2. Polar amino acids with neutral R groups in which the charge is not evenly distributed 3. Positively charged amino acids with R groups that have a positive charge at physiological pH 4. Negatively charged amino acids with R groups that have a negative charge at physiological pH. Hydrophobic amino acids. The simplest amino acid is glycine, which has a single hydrogen atom as its side chain. With two hydrogen atoms bonded to the α-carbon atom, glycine is unique in being achiral. Alanine, the next simplest amino acid, has a methyl group (–CH3) as its side chain. - Asp and Glu are negatively charged; they bind positive charges such as metal ions - His, Cys: side-chain protonation is close to equilibrium at physiological pH, so they are involved in acid/base catalysis - Lys, Arg are positively charged and bind negative charges, such as the phosphate backbone of DNA - the protein environment can shift pKa values
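The notes on ionisable side chains follow from the Henderson–Hasselbalch relation. The fraction of a group that is protonated is 1 / (1 + 10^(pH − pKa)), so His, with a pKa near neutral, is neither fully charged nor fully neutral. This is a sketch under stated assumptions: the pKa values are approximate textbook values for free side chains, and the `pka_shift` argument is a hypothetical way to mimic the protein environment shifting a pKa.

```python
TYPICAL_PKA = {  # approximate free side-chain values; real values in proteins vary
    "Asp": 3.9, "Glu": 4.1, "His": 6.0, "Cys": 8.3, "Lys": 10.5, "Arg": 12.5,
}
ACIDIC = {"Asp", "Glu", "Cys"}  # neutral when protonated, negative when deprotonated


def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of the group carrying the proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def side_chain_charge(residue: str, ph: float, pka_shift: float = 0.0) -> float:
    """Average side-chain charge at a given pH, optionally with an environment-shifted pKa."""
    f = fraction_protonated(ph, TYPICAL_PKA[residue] + pka_shift)
    return -(1.0 - f) if residue in ACIDIC else f


for res in TYPICAL_PKA:
    print(res, round(side_chain_charge(res, 7.4), 3))
```

At pH 7.4, Asp, Glu, Lys and Arg come out essentially fully charged. His is mostly neutral but partly protonated, so a modest pKa shift from its surroundings can change its charge state substantially.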

Circular Dichroism

Uses UV light to measure 2° structure. Can be used to measure destabilization. Circular dichroism (CD) is an absorption spectroscopy method based on the differential absorption of left and right circularly polarized light. Optically active chiral molecules will preferentially absorb one direction of the circularly polarized light. Very economical in terms of time and sample. Can use a wide range of conditions. Structural changes can be studied on a rapid time scale (e.g. stopped flow), which is useful for protein folding. Main limitation is the low resolution of structural information. Secondary structure content from far-UV CD. Tertiary structure fingerprint from near-UV CD. The difference in absorption of the left and right circularly polarized light can be measured and quantified. UV CD is used to determine aspects of protein secondary structure. Vibrational CD (IR CD) is used to study the structure of small organic molecules, proteins and DNA. UV/Vis CD investigates charge transfer transitions in metal-protein complexes.

Alignment Scores

We need to differentiate good alignments from poor ones. We use a rule that assigns a numerical score to any alignment; the higher the score, the better the alignment. For any proposed rule for scoring an alignment, there are two questions: 1. Given any alignment, can we compute its score? 2. Given two sequences, can we automatically find a local alignment of highest possible score? For some rules, the second answer is "No".
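For additive scoring rules (a score per aligned pair plus a fixed penalty per gap position), the answer to question 2 is "Yes". The Smith–Waterman dynamic-programming algorithm finds the highest-scoring local alignment. The sketch below returns only that best score. It uses illustrative parameters (match +2, mismatch -1, linear gap -2) rather than a real substitution matrix, and omits the traceback that recovers the alignment itself.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment start afresh anywhere instead of going negative.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "TTACGTTT"))
```

The floor at zero is what distinguishes local from global alignment: poorly matching flanking regions are simply left out rather than dragging the score down.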

Determining protein 3D structure

X-ray crystallography, NMR spectroscopy. Elucidation of the three-dimensional structure of a protein is often the source of a tremendous amount of insight into its corresponding function, inasmuch as the specificity of active sites and binding sites is defined by the precise atomic arrangement within these regions. For example, knowledge of the structure of a protein enables the biochemist to predict its mechanism of action, the effects of mutations on its function, and the desired features of drugs that may inhibit or augment its activity. X-ray crystallography and nuclear magnetic resonance spectroscopy are the two most important techniques for elucidating the conformation of proteins.

beta sheets

β sheets are stabilised by hydrogen bonding between polypeptide strands. Pauling and Corey proposed another periodic structural motif, which they named the β pleated sheet (β because it was the second structure that they elucidated, the α helix having been the first). The β pleated sheet (or, more simply, the β sheet) differs markedly from the rodlike α helix. It is composed of two or more polypeptide chains called β strands. A β strand is almost fully extended rather than being tightly coiled as in the α helix. A range of extended structures are sterically allowed (Figure 2.29). The distance between adjacent amino acids along a β strand is approximately 3.5 Å, in contrast with a distance of 1.5 Å along an α helix. The side chains of adjacent amino acids point in opposite directions (Figure 2.30). A β sheet is formed by linking two or more β strands lying next to one another through hydrogen bonds. Adjacent strands in a β sheet can run in opposite directions (antiparallel β sheet) or in the same direction (parallel β sheet). In the antiparallel arrangement, the NH group and the CO group of each amino acid are respectively hydrogen bonded to the CO group and the NH group of a partner on the adjacent chain (Figure 2.31). In the parallel arrangement, the hydrogen-bonding scheme is slightly more complicated. For each amino acid, the NH group is hydrogen bonded to the CO group of one amino acid on the adjacent strand, whereas the CO group is hydrogen bonded to the NH group on the amino acid two residues farther along the chain (Figure 2.32). Many strands, typically 4 or 5 but as many as 10 or more, can come together in β sheets. Such β sheets can be purely antiparallel, purely parallel, or mixed (Figure 2.33).
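The per-residue distances above (about 3.5 Å along a β strand versus 1.5 Å along an α helix) make a quick comparison of how far the same number of residues reaches in each conformation. This rough sketch simply multiplies residues by the per-residue distance quoted in the text and ignores end effects. The dictionary keys and the function name are illustrative.

```python
RISE_PER_RESIDUE_ANGSTROM = {"beta_strand": 3.5, "alpha_helix": 1.5}  # values from the text


def segment_length(n_residues: int, structure: str) -> float:
    """Approximate length (Å) spanned by n residues in the given conformation."""
    return n_residues * RISE_PER_RESIDUE_ANGSTROM[structure]


print(segment_length(10, "beta_strand"), segment_length(10, "alpha_helix"))
```

Ten residues span roughly 35 Å as a β strand but only about 15 Å as an α helix, which is the sense in which a strand is "almost fully extended".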

SH2 domains bind to

phosphotyrosine • Structural studies of proteins in the late 1980s revealed an unexpected similarity in primary sequence between a proto-oncogene, Src kinase, and a number of other proteins with different functions. This similarity was observed in three domains: the kinase domain (more on this later) and two other domains called Src-homology-2 and Src-homology-3 domains (SH2 and SH3 domains). Bioinformatic analysis of the sequences then revealed SH2 (and SH3) domains in many other proteins: more than 110 human proteins contain an SH2 domain. These domains are very important in signal transduction (covered later in the course, in the Environmental Perception block). SH2 domains are structurally conserved and can fold autonomously; every known SH2 domain exhibits this arrangement of α helices and β sheet. • Domains, e.g. SH2 domain, kinase domain, bromodomain

polypeptide chains

Polypeptide chains are flexible yet conformationally restricted. Rotation about the peptide bond, which has partial double-bond character, is prevented, so the peptide backbone is constrained. The peptide bond is planar: for a pair of amino acids linked by a peptide bond, six atoms lie in the same plane (the α-carbon and CO group of the first amino acid and the NH group and α-carbon of the second amino acid). The bond resonates between single-bond and double-bond character; because of the double-bond character, rotation about this bond is prevented and thus the conformation of the peptide backbone is constrained. The double-bond character is also expressed in the length of the bond between the CO and NH groups. The peptide bond is uncharged, allowing polymers of amino acids linked by peptide bonds to form tightly packed globular structures. Examination of the geometry of the protein backbone reveals several important features. First, the peptide bond is essentially planar (Figure 2.18). Thus, for a pair of amino acids linked by a peptide bond, six atoms lie in the same plane: the α-carbon atom and CO group of the first amino acid and the NH group and α-carbon atom of the second amino acid. The nature of the chemical bonding within a peptide accounts for the bond's planarity. The bond resonates between a single bond and a double bond. Because of this partial double-bond character, rotation about this bond is prevented and thus the conformation of the peptide backbone is constrained. The partial double-bond character is also expressed in the length of the bond between the CO and the NH groups. As shown in Figure 2.19, the C-N distance in a peptide bond is typically 1.32 Å, which is between the values expected for a C-N single bond (1.49 Å) and a C=N double bond (1.27 Å). Finally, the peptide bond is uncharged, allowing polymers of amino acids linked by peptide bonds to form tightly packed globular structures.
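One way to see where 1.32 Å sits between the two reference lengths is a simple linear interpolation. This is only an illustrative assumption (bond length is not strictly linear in bond order), and `fraction_toward_double` is a hypothetical helper name.

```python
# Bond lengths quoted in the text (ångströms).
C_N_SINGLE = 1.49
C_N_DOUBLE = 1.27
C_N_PEPTIDE = 1.32


def fraction_toward_double(observed: float, single: float, double: float) -> float:
    """Position of `observed` on a line from single (0.0) to double (1.0) bond length.

    A crude linear interpolation for illustration, not a rigorous bond order.
    """
    return (single - observed) / (single - double)


f = fraction_toward_double(C_N_PEPTIDE, C_N_SINGLE, C_N_DOUBLE)
print(f"peptide C-N lies {f:.0%} of the way from single to double bond length")
```

On this rough scale the peptide C-N bond is much closer to a double bond than to a single bond, consistent with its planarity and restricted rotation.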

ionisation state as a function of pH

The process by which an atom acquires a + or - charge by gaining or losing electrons - The ionisation state of amino acids is altered by a change in pH. - The zwitterionic (dipolar) form predominates near physiological pH. - In acid solution (e.g. pH 1), the amino group is protonated (-NH3+) and the carboxyl group is not dissociated (-COOH). - As the pH is raised, the carboxylic acid is the first group to give up a proton. The dipolar form persists until the pH approaches 9, when the protonated amino group loses a proton.
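The sequence of protonation states can be sketched numerically for glycine, treated as a diprotic acid. The pKa values used (about 2.3 for the carboxyl group and 9.6 for the amino group) are approximate textbook values assumed for illustration, and `glycine_species` is a hypothetical helper.

```python
PKA_COOH = 2.3  # approximate pKa of glycine's carboxyl group (assumed value)
PKA_NH3 = 9.6   # approximate pKa of glycine's amino group (assumed value)


def glycine_species(pH, pka1=PKA_COOH, pka2=PKA_NH3):
    """Return fractions (cation, zwitterion, anion) of glycine at a given pH.

    cation     = +H3N-CH2-COOH  (both groups protonated)
    zwitterion = +H3N-CH2-COO-  (dipolar form)
    anion      =  H2N-CH2-COO-  (both groups deprotonated)
    """
    h = 10.0 ** -pH
    ka1, ka2 = 10.0 ** -pka1, 10.0 ** -pka2
    terms = (h * h, h * ka1, ka1 * ka2)  # relative amounts of the three forms
    total = sum(terms)
    return tuple(t / total for t in terms)


for pH in (1.0, 7.0, 11.0):
    cat, zw, an = glycine_species(pH)
    print(f"pH {pH:4.1f}: cation {cat:.3f}, zwitterion {zw:.3f}, anion {an:.3f}")
```

At pH 1 the fully protonated cation dominates, near neutral pH almost everything is zwitterionic, and well above the amino-group pKa the anion dominates.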

NMR: multidimensional spectra

Projections of spectra with 2 frequency dimensions. A combination of 1-dimensional and 2-dimensional NMR experiments is necessary for complete confidence in chemical structure.

figure

The bars represent pulses of radio waves, and the spaces between them are the delay times (pre-induction delays). The oscillating signal that is recorded can be Fourier transformed to give a 1D spectrum, which extracts the frequency at which it oscillates. By varying the time between pulses, we can modulate the amplitude. When we Fourier transform each of these FIDs, we get a different 1D spectrum: the signal lies at the same position on the frequency axis but with a different intensity. As we vary the delay between pulses, we modulate the signal of one specific frequency by another frequency.
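The pulse-delay idea can be simulated to show how stepping the delay modulates the intensity of a peak at one frequency with a second frequency, and how a second Fourier transform recovers that second frequency. A toy sketch, assuming ideal noise-free signals, complex (quadrature) modulation in the delay dimension, and hypothetical frequencies and acquisition parameters; the plain `dft` stands in for an FFT.

```python
import cmath
import math


def dft(signal):
    """Plain discrete Fourier transform (numpy.fft.fft computes the same thing faster)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]


def peak_bin(spectrum):
    """Index of the largest-magnitude point in a spectrum."""
    return max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))


# Hypothetical parameters, chosen so that peaks fall exactly on bins (1 Hz per bin).
N2, DT2 = 64, 1 / 64   # directly detected dimension: 64 points per FID
N1, DT1 = 32, 1 / 32   # delay dimension: 32 increments of the delay t1
F2, F1 = 10.0, 5.0     # the two frequencies (Hz) to recover
T2_DECAY = 0.5         # relaxation time constant (s) of each FID

rows = []
for k in range(N1):                                   # one FID per delay value
    t1 = k * DT1
    modulation = cmath.exp(2j * math.pi * F1 * t1)    # set by the length of the delay
    fid = [modulation * cmath.exp(2j * math.pi * F2 * j * DT2) * math.exp(-j * DT2 / T2_DECAY)
           for j in range(N2)]
    rows.append(dft(fid))                             # a 1D spectrum for this delay

# Every row peaks at the same f2 position; only the peak's phase/intensity changes.
f2_bin = peak_bin(rows[0])
interferogram = [row[f2_bin] for row in rows]
f1_bin = peak_bin(dft(interferogram))                 # second transform recovers F1
print(f"f2 = {f2_bin / (N2 * DT2):.0f} Hz, f1 = {f1_bin / (N1 * DT1):.0f} Hz")
```

Real experiments use specific quadrature schemes in the indirect dimension; the single complex exponential here is an idealization that keeps the example short.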

The proteome

The entire set of proteins expressed by a given cell or group of cells ("proteins expressed from the genome"). • ~23,000 genes in the human genome • ~10,000 genes expressed in any one cell at any one time • >100,000 different proteins, produced by alternative splicing and post-translational modification • Copy number from 1 to ~500,000 per cell • Differs with cell type, over time, and in response to signals. A proteome is the complete set of proteins expressed by an organism. The term can also be used to describe the assortment of proteins produced at a specific time in a particular cell or tissue type. The proteome is an expression of an organism's genome. However, in contrast with the genome, which is characterized by its stability, the proteome actively changes in response to various factors, including the organism's developmental stage and both internal and external conditions. The study of the proteome is called proteomics, and it involves understanding how proteins function and interact with one another. For instance, many proteins fold into elaborate three-dimensional structures, and some form complexes with each other to perform their functions. In addition, proteins undergo modifications, which may occur during or after translation. The proteome can be studied using a variety of techniques. For example, two-dimensional gel electrophoresis can be used to separate proteins by their sizes and by their charges. The proteome can also be studied using another laboratory technique called mass spectrometry, which identifies specific proteins within complex samples.

1D NMR experiment

When we measure NMR spectra we frequently record decaying radio waves (an oscillating radio-frequency signal as a function of time); this is a one-dimensional experiment, with one frequency dimension. The signal is recorded (digitised) in the time domain and Fourier transformed into the frequency domain. Measurement of a regular 1D NMR experiment is carried out in three stages, and it is convenient to explain the process by reference to the vector model: An rf signal is transmitted (according to the pulse sequence in fig. 1) with sufficient power (~50 W) and for a sufficient period of time (a few microseconds) to move the magnetization vector from the z-axis into the x,y-plane (a pulse of up to 90°). The signal that evolves due to precession of the magnetization vector is measured after the pulse. At the end of the process, the vector returns to equilibrium on the z-axis. The process is called free induction decay (FID) of the magnetization. The measured FID is the variation of magnetization with time. Because the information in this signal is incomprehensible as it stands, and we are interested in a spectrum of intensity versus frequency, we need to transform from the time domain to the frequency domain using a mathematical procedure on the FID called a Fourier transform. The FID contains repetitive, frequency-dependent components. A Fourier transform (fig. 2) reveals the frequency components in the FID and results in the desired spectrum: intensity versus frequency. The resolution of the spectrum can be improved slightly by adding zeros to the end of the FID, a process known as zero filling (fig. 3). The disadvantage is that this uses extra computing resources. The sensitivity or the resolution of the spectrum (but not both at once to any great extent) can be improved by multiplying the FID by a window function (apodization) before Fourier transformation. An exponential decay function is used to increase sensitivity (fig. 4).
The strength of the decay is set by the line-broadening factor in Hz (LB on Bruker and Varian; width on JEOL and Mestre-C). For greatest effect, LB should be set to the line width at half height. However, the increase in sensitivity comes with a loss in resolution. The resolution can be enhanced by using a Gaussian function (fig. 5), for which there are two parameters: a line-narrowing parameter that should be set to the line width (on Bruker it is a negative LB; on JEOL and Mestre-C it is a positive width), and a Gaussian broadening parameter between zero and 0.5 (it can be set higher, but this is pointless) that should be set as high as possible without losing the signal into the noise (on Bruker, GB is a fraction of the acquisition time; on JEOL it is the shift, the point on the time axis in seconds where the function is maximum; on Mestre-C it is a percent shift). Gaussian resolution enhancement reduces sensitivity. When the acquisition data have only a few points and the FID is truncated (usually the case for 2D NMR), it is convenient to use one function for both resolution and sensitivity enhancement, choosing between them by changing one parameter. For this, a sine-bell or sine-bell-squared function is used. On Bruker the SSB parameter is set to 2 for sensitivity enhancement and 1 for resolution enhancement.
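The processing chain described above (FID, window function, zero filling, Fourier transform) can be sketched in a few lines of Python. A minimal sketch, assuming one noise-free resonance and hypothetical acquisition parameters. The exponential window uses the common exp(-π·LB·t) form; the sine-bell `shift` parameter is a generic stand-in, not the exact SSB definition used by any vendor's software.

```python
import cmath
import math


def dft(signal):
    """Plain discrete Fourier transform (numpy.fft.fft computes the same thing faster)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]


def zero_fill(fid, total_points):
    """Append zeros to the end of the FID (zero filling)."""
    return list(fid) + [0j] * (total_points - len(fid))


def exponential_window(n, dwell, lb_hz):
    """Exponential multiplication: broadens each line by about lb_hz, raising sensitivity."""
    return [math.exp(-math.pi * lb_hz * j * dwell) for j in range(n)]


def sine_bell(n, shift):
    """Sine bell running from sin(shift) at the first point to zero at the last point.

    shift = 0 gives a pure sine bell (resolution enhancement);
    shift = pi/2 gives a cosine-shaped bell (sensitivity enhancement).
    """
    return [math.sin(shift + (math.pi - shift) * j / (n - 1)) for j in range(n)]


# Hypothetical acquisition: one resonance at 25 Hz, 128 points, 1/256 s dwell time.
N, DWELL, FREQ, T2 = 128, 1 / 256, 25.0, 0.1
fid = [cmath.exp(2j * math.pi * FREQ * j * DWELL) * math.exp(-j * DWELL / T2)
       for j in range(N)]

weighted = [x * w for x, w in zip(fid, exponential_window(N, DWELL, lb_hz=2.0))]
spectrum = dft(zero_fill(weighted, 2 * N))   # time domain -> frequency domain
hz_per_point = 1 / (2 * N * DWELL)           # zero filling halves the point spacing
peak = max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))
print(f"{len(spectrum)} points, {hz_per_point:.1f} Hz/point, peak at {peak * hz_per_point:.1f} Hz")
```

Swapping `exponential_window(...)` for `sine_bell(N, 0.0)` gives the resolution-enhancing pure sine bell, while `sine_bell(N, math.pi / 2)` starts at 1 and decays smoothly, like the sensitivity-enhancing variant.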

protein structure

• Detecting secondary structure content - Circular dichroism • The organisation of tertiary structure - Domains and folds • Determining protein 3D structure - X-ray crystallography, NMR spectroscopy and electron microscopy

Proteins are built of domains

• Discrete units of 3D protein structure • Often "autonomously folding" • ~30-300 residues long, median ~120 residues. A protein domain is a conserved part of a given protein sequence and tertiary structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks, and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from about 50 to 250 amino acids.[1] The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.

Sequence similarity: checking significance

• Generate randomly scrambled sequences • Align & score • Does the true sequence score significantly better than the scrambled sequences? If yes, the similarity is significant.
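The scrambling test can be sketched with a simple Smith-Waterman local alignment score. This assumes an arbitrary scoring scheme (match +2, mismatch -1, gap -2) and made-up peptide sequences; the z-score summary is one common way to express "significantly better", not the statistics that BLAST itself reports.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 lets a local alignment restart anywhere; gaps come from above or left.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffle_test(a, b, n_shuffles=200, seed=0):
    """Score a vs b, then a vs shuffled copies of b; return (real score, z-score)."""
    rng = random.Random(seed)
    real = local_alignment_score(a, b)
    letters = list(b)
    scrambled = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same amino acid composition, order destroyed
        scrambled.append(local_alignment_score(a, "".join(letters)))
    mean, sd = statistics.mean(scrambled), statistics.stdev(scrambled)
    z = (real - mean) / sd if sd > 0 else float("inf")
    return real, z


# Hypothetical peptides differing at a single position.
SEQ_A = "MKTAYIAKQRQISFVKSHFSRQ"
SEQ_B = "MKTAYIAKQRQLSFVKSHFSRQ"
real, z = shuffle_test(SEQ_A, SEQ_B)
print(f"real score {real}, z-score vs scrambled {z:.1f}")
```

Shuffling keeps the amino acid composition fixed, so a high z-score reflects the order of residues rather than merely similar composition.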

Scoring alignments

• Going beyond just checking identity • Mutations can be good, bad, or indifferent • Not all mutations are equal. An alignment of two sequences (frequently called a local alignment) can be obtained as follows: 1. extract a segment from each sequence; 2. add dashes (gap symbols) to each segment to create equal-length sequences; 3. place one padded segment over the other.
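The three steps above, followed by a simple column-by-column score, look like this in Python. The scoring values (+1 match, -1 mismatch, -2 gap) and the segments are hypothetical, chosen only to illustrate the procedure.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score two equal-length padded segments placed one over the other.

    '-' is the gap symbol; the scoring values are arbitrary illustrative choices.
    """
    if len(top) != len(bottom):
        raise ValueError("padded segments must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be gap over gap")
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


# Step 1: segments extracted from two (hypothetical) sequences.
seg1 = "HEAGAWGHE"
seg2 = "PAWHE"
# Steps 2-3: pad with dashes to equal length and place one over the other.
top    = "HEAGAWGHE"
bottom = "--P-AW-HE"
print(score_alignment(top, bottom))
```

Different paddings of the same segments give different scores; alignment algorithms search for the padding that maximizes the score.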

Natively disordered proteins

• Many proteins do not adopt a defined globular structure. • Extended structures may allow them to recognise targets that are large complexes, or to recognise alternative, competing targets. • Extended structures can have extensive binding interfaces (good specificity, favourable ΔH) but have to become ordered on binding (unfavourable -TΔS), resulting in high specificity with low affinity (modest ΔG). Such interactions are short-lived and easily reversed.
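The trade-off can be made concrete with ΔG = ΔH - TΔS and Kd = exp(ΔG/RT). The numbers below are hypothetical, chosen only to show a large favourable ΔH being largely cancelled by an unfavourable -TΔS term.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def binding_free_energy(delta_h_kj, minus_t_delta_s_kj):
    """Delta G = Delta H - T Delta S, in kJ/mol; the -T Delta S term is passed directly."""
    return delta_h_kj + minus_t_delta_s_kj


def dissociation_constant(delta_g_kj, temp_k=298.0):
    """K_d (M) from the binding free energy, using Delta G = RT ln K_d."""
    return math.exp(delta_g_kj * 1000 / (R * temp_k))


# Hypothetical disordered partner: large favourable enthalpy from an extensive
# interface, offset by a large entropic cost of becoming ordered on binding.
dg = binding_free_energy(delta_h_kj=-80.0, minus_t_delta_s_kj=55.0)
kd = dissociation_constant(dg)
print(f"Delta G = {dg:.0f} kJ/mol, K_d = {kd:.1e} M")
```

With these illustrative values the dissociation constant lands in the tens of micromolar, i.e. specific but weak binding that is easily reversed.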

Scoring: substitution matrices

• Not all mutations are equal • Score by:- Relationship to genetic code - Physicochemical properties of amino acids - Sequence conservation in closely related proteins
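A toy substitution score based on physicochemical properties shows why "not all mutations are equal". The grouping and score values are hypothetical; real matrices such as BLOSUM62 or PAM250 are derived from substitution frequencies observed in alignments of related proteins.

```python
# Hypothetical grouping by physicochemical property (illustrative, not BLOSUM or PAM).
GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "GP", "C"]
GROUP_OF = {aa: i for i, members in enumerate(GROUPS) for aa in members}


def substitution_score(x, y, identical=4, similar=1, dissimilar=-2):
    """Score replacing amino acid x by y: identical > same class > different class."""
    if x == y:
        return identical
    if GROUP_OF[x] == GROUP_OF[y]:
        return similar
    return dissimilar


def score_ungapped(a, b):
    """Sum substitution scores position by position for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(substitution_score(x, y) for x, y in zip(a, b))


print(substitution_score("I", "L"))     # conservative: both aliphatic
print(substitution_score("I", "D"))     # hydrophobic -> acidic
print(score_ungapped("KIVE", "RLVE"))   # two conservative changes, two identities
```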

What is NMR? (proteins in solution)

• Nuclear magnetic resonance - nuclei have a quantum mechanical property called "spin angular momentum". - For nuclei with non-zero spin angular momentum, e.g. ¹H, ¹³C, ¹⁵N, ³¹P (²H), each spin state also has a magnetic moment. - Magnetic moments interact with external magnetic fields, giving each spin state a different potential energy. X-ray crystallography is the most powerful method for determining protein structures. However, some proteins do not readily crystallize. Furthermore, although structures of crystallized proteins very closely represent those of proteins free of the constraints imposed by the crystalline environment, structures in solution can be sources of additional insights. Nuclear magnetic resonance (NMR) spectroscopy is unique in being able to reveal the atomic structure of macromolecules in solution, provided that highly concentrated solutions (>1 mM, or 15 mg ml⁻¹ for a 15-kDa protein) can be obtained. This technique depends on the fact that certain atomic nuclei are intrinsically magnetic. Only a limited number of isotopes display this property, called spin, and those most important to biochemistry are listed in Table 3.4. The simplest example is the hydrogen nucleus (¹H), which is a proton. The spinning of a proton generates a magnetic moment. This moment can take either of two orientations, or spin states (called α and β), when an external magnetic field is applied (Figure 3.43). The energy difference between these states is proportional to the strength of the imposed magnetic field. The α state has a slightly lower energy because it is aligned with this applied field. Hence, in a given population of nuclei, slightly more will occupy the α state (by a factor of the order of 1.00001 in a typical experiment). A spinning proton in an α state can be raised to an excited state (β state) by applying a pulse of electromagnetic radiation (a radio-frequency, or RF, pulse), provided that the frequency corresponds to the energy difference between the α and the β states.
In these circumstances, the spin will change from α to β; in other words, resonance will be obtained. These properties can be used to examine the chemical surroundings of the hydrogen nucleus. The flow of electrons around a magnetic nucleus generates a small local magnetic field that opposes the applied field. The degree of such shielding depends on the surrounding electron density. Consequently, nuclei in different environments will change states, or resonate, at slightly different field strengths or radiation frequencies. A resonance spectrum for a molecule is obtained by keeping the magnetic field constant and varying the frequency of the electromagnetic radiation. The nuclei of the perturbed sample absorb electromagnetic radiation at a frequency that can be measured. The different frequencies, termed chemical shifts, are expressed in fractional units δ (parts per million, or ppm) relative to the shifts of a standard compound, such as a water-soluble derivative of tetramethylsilane, that is added with the sample. For example, a -CH3 proton typically exhibits a chemical shift (δ) of 1 ppm, compared with a chemical shift of 7 ppm for an aromatic proton. The chemical shifts of most protons in protein molecules fall between 0 and 9 ppm (Figure 3.44). Most protons in many proteins can be resolved by using this technique of one-dimensional NMR. With this information, we can then deduce changes to a particular chemical group under different conditions, such as the conformational change of a protein from a disordered structure to an α helix in response to a change in pH.
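Chemical shifts in ppm are independent of field strength, whereas the underlying frequency offsets in Hz scale with the spectrometer frequency. A minimal sketch, assuming the common approximation of dividing the offset from the reference signal by the spectrometer frequency; the 600 MHz and 900 MHz instruments are hypothetical examples.

```python
def chemical_shift_ppm(offset_hz, spectrometer_mhz):
    """Chemical shift (ppm) from a frequency offset relative to the reference compound.

    delta = (nu_sample - nu_reference) / nu_spectrometer * 1e6; with the spectrometer
    frequency given in MHz, the factor of 1e6 cancels, leaving Hz / MHz.
    """
    return offset_hz / spectrometer_mhz


def offset_hz(delta_ppm, spectrometer_mhz):
    """Inverse: frequency offset (Hz) from the reference for a resonance at delta_ppm."""
    return delta_ppm * spectrometer_mhz


# The -CH3 (~1 ppm) and aromatic (~7 ppm) protons from the text, on a
# hypothetical 600 MHz (1H) spectrometer.
for label, delta in (("-CH3", 1.0), ("aromatic", 7.0)):
    print(f"{label}: {offset_hz(delta, 600.0):.0f} Hz from the reference")

# The same shift in ppm corresponds to more Hz on a higher-field instrument.
print(offset_hz(1.0, 900.0))
```

Reporting shifts in ppm is what allows spectra recorded on different instruments to be compared directly.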

proteins and evolution

• Relationships between proteins and genes • Detecting related proteins - Sequence based - Structure based • Mechanisms for generating diversity from proteins - Duplication & divergence - Gene fusion - Exon shuffling

pKa

• pKa is the pH at which an acid (or base) is 50% deprotonated (protonated) - pKa = −log10 Ka - The lower the pKa, the stronger the acid (the more fully it dissociates in water); pKa is a way of indicating the strength of an acid - pH below the pKa of a functional group: the group is mostly protonated - pH above the pKa: mostly deprotonated - pH = pKa: 50% protonated and 50% deprotonated. The lower the pKa value, the stronger the acid. For example, the pKa of acetic acid is 4.8, while the pKa of lactic acid is 3.8. Using the pKa values, one can see that lactic acid is a stronger acid than acetic acid. pKa is used because it describes acid dissociation using small decimal numbers. The same type of information may be obtained from Ka values, but they are typically extremely small numbers given in scientific notation that are hard for most people to understand. The pKa value is one method used to indicate the strength of an acid. pKa is the negative log of the acid dissociation constant, or Ka value. A lower pKa value indicates a stronger acid; that is, the lower value indicates that the acid more fully dissociates in water.
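The pKa rules above follow from the Henderson-Hasselbalch equation, pH = pKa + log10([A-]/[HA]). A minimal sketch using the acetic and lactic acid pKa values quoted in the text; the function names are illustrative.

```python
import math


def pka_from_ka(ka):
    """pKa = -log10(Ka)."""
    return -math.log10(ka)


def fraction_deprotonated(pH, pka):
    """Fraction of an acid in its deprotonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - pH))


# pKa values quoted in the text.
ACETIC, LACTIC = 4.8, 3.8

for name, pka in (("acetic", ACETIC), ("lactic", LACTIC)):
    print(f"{name} acid at pH 4.8: {fraction_deprotonated(4.8, pka):.2f} deprotonated")

# The corresponding Ka is a small number in scientific notation, which is why pKa is handier.
print(f"Ka of acetic acid ~ {10 ** -ACETIC:.1e}")
```

At pH 4.8 acetic acid is exactly half dissociated (pH = pKa), while lactic acid, one pKa unit lower, is about 91% dissociated.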

Electron density at high resolution

• At resolution better than ~2 Å you can see the electron density for individual atoms. • In this example the final R factors are Rwork 0.157 and Rfree 0.178. Once the model is built and refined, you can see distinct regions of high electron density where the atoms are.
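The R factors quoted measure how well amplitudes calculated from the model (F_calc) agree with the observed amplitudes (F_obs): R = Σ| |F_obs| - |F_calc| | / Σ|F_obs|. Rwork uses the reflections included in refinement; Rfree uses a small test set of reflections left out of refinement. A minimal sketch with hypothetical amplitudes, assuming F_obs and F_calc are already on the same scale.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over a set of reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need matching, non-empty lists of amplitudes")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


# Hypothetical amplitudes: a "work" set used in refinement and a held-out "free" set.
work_obs, work_calc = [100.0, 80.0, 60.0, 40.0], [90.0, 84.0, 57.0, 41.0]
free_obs, free_calc = [120.0, 50.0], [105.0, 55.0]

r_work = r_factor(work_obs, work_calc)
r_free = r_factor(free_obs, free_calc)
print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because the free set never influences refinement, Rfree is normally somewhat higher than Rwork, as in the 0.157 versus 0.178 example; a large gap can indicate over-fitting.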
