Biochemistry Practical 1
.... need to go look at picture
For now, we will focus on the large white rectangle shown on this page; this contains a graphical representation of the genomic features (e.g. protein coding genes, percent GC) of chr3L mapped against the DNA sequence, which is embedded in the top line of the white box. The different types of features (also known as "tracks" or "evidence tracks") are separated by a title and are often shown in different colors. For example, we can examine the region under the blue title labeled "FlyBase Protein-Coding Genes" to estimate the number of protein-coding genes on chr3L. In this track each gene is represented by a set of blue boxes connected by thin blue lines. There are clearly fewer blue boxes at the right side of the image compared to the left, which suggests that genes are not uniformly distributed along the chromosome.
pH scale
Below 7: acidic (protonated). 7: neutral. Above 7, up to 14: basic (deprotonated).
Solvent
17 ethyl acetate: 2 methanol: 1 concentrated ammonium hydroxide
How do you know that the leptin gene is on the + strand of human chromosome 7?
5' → 3': this is the plus strand. It is easier to read, and the nucleotides are numbered from low to high (1 → #). 3' → 5': this is the minus strand. If the gene were on the minus strand, the numbers would be written from high to low (# → 1). We know that the gene is on the plus strand because the nucleotide numbers go from low to high, indicating that the strand runs in the 5' to 3' direction.
What is a buffer?
A buffer is a mixture of a weak acid and its conjugate base. At pH = pKa, there is a 50:50 mixture of the acid and anion forms of the compound. The buffering capacity of an acid/anion system is greatest at pH = pKa. Buffering capacity is lost when the pH differs from the pKa by more than 1 pH unit.
What is BLAST?
A program that reports regions of similarity (at the nucleotide or protein level) between a query (your input) sequence and sequences within a database. BLAST uses a robust statistical framework that determines if the alignment between two sequences is statistically significant (i.e. has a low probability of the reported alignment being produced by chance alone). The ability to detect sequence similarity allows scientists to determine if a gene or a protein is related to other known genes or proteins in the same species or between species.
What is an acid and what is a base?
Acid: proton donor. Base: proton acceptor.
What is the E-value for BLASTs?
Alignments with smaller E-values are more statistically significant and are less likely to have arisen by chance. Specifically, the E-value denotes the number of times we expect to see an alignment with scores equal to or greater than S (alignment score) when we align random sequences against each other
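The relationship between alignment score and E-value can be sketched with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length and n is the database length. This is only a rough illustration: the values of K and lambda below are assumed placeholders (they depend on the scoring system), the database size is made up, and real BLAST also applies length corrections that are ignored here.

```python
import math

def expected_hits(score, query_len, db_len, K=0.041, lam=0.267):
    """Karlin-Altschul estimate of the E-value: E = K * m * n * exp(-lambda * S).

    K and lam are illustrative assumptions, not values taken from these notes.
    """
    return K * query_len * db_len * math.exp(-lam * score)

# Higher scores give exponentially smaller E-values.
# The database length of 1,000,000 residues is a hypothetical example.
for s in (30, 60, 120):
    print(s, expected_hits(s, query_len=167, db_len=1_000_000))
```

The point to take away is the shape of the relationship: each increase in score multiplies the E-value by a constant factor less than 1, and a larger database raises the E-value for the same score.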
Q10. Does frame +2 have an ORF in this region? What about frame +1 and frame +3?
All three reading frames have an open reading frame at this point. None of them contain stop codons in the open reading frame of exon 3.
What does BLAST stand for?
Basic Local Alignment Search Tool
Take a little time to explore some of the other evidence tracks on the browser. While looking at contig1 (size 11,000 bp), put the "GC Percent" track on full. What sort of pattern do you see, relative to the map of the genes? What can you conclude about gene structure?
GC percent when looking at introns and exons: the percent GC is higher in exons than in introns.
Q2. What are the names of these features?
CG32165-RC, CG32165-RB, CG32165-RA, Spd-2-RA, tra-RB, tra-RA
Q3. Which feature has the largest span (i.e. the largest distance between the start and end of the feature)?
CG32165-RC
H2O --> H+ + OH-
The concentrations of the participating species in an equilibrium process are not independent but are related via the equilibrium constant: Kw = [H+][OH-] = 10^-14 at 25 °C. In pure water, [H+] = [OH-] = 10^-7 M.
Q1. How many distinct features are there in contig1?
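Contig (contiguous sequence): a small part of chr3L. There are six features in contig1. Each feature is a transcript (isoform), so the six features belong to three genes: CG32165, Spd-2 and tra.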
The picture depicts the mRNA (you know it's the mRNA since the identifier starts with NM_) with 4 boxes, 2 light green in color (1 on each end) and 2 dark green. What do you think each of those boxes refers to?
Each box is part of an exon of the gene. The initial transcript contains both exons and introns; the introns, which are usually bigger than the exons, are spliced out before the mRNA becomes fully functional. The light green boxes at the ends are the untranslated regions, which are important for the ribosome to bind to, and the dark green boxes are the parts of the exons that actually get translated.
Maintenance of intracellular pH is vital to all cells.
Enzyme-catalyzed reactions have optimal pH. Solubility of polar molecules depends on H-bond donors and acceptors. Equilibrium between CO2 gas and dissolved HCO3- depends on pH.
General DNA info
Genes encode information that our cells use to carry out their functions. In particular, protein-coding genes provide the cell with the information to make messenger RNAs (mRNA), which are then used to make proteins.
What are the 6 strong acids?
HCl, HBr, HI, HNO3, HClO4, H2SO4. Most other inorganic acids are weak. Organic acids are weak as well.
Purpose of introns and exons coding information
Hence the thick rectangles denote the parts of the exon that carry information about the protein sequence, while the thin blocks indicate regions that do not carry protein sequence information (Figure 14).
What is the relationship between V and M that warranted a + designation (between the chimp and human sequence)?
Human: DFIPGLHPILTLSKMDQTLAVYQQILTSMPSRNvIQISNDLENLRDLLHVLAFSKSCHLP 120
Match line: DFIPGLHPILTLSKMDQTLAVYQQILTSMPSRN+IQISNDLENLRDLLHVLAFSKSCHLP
Chimp: DFIPGLHPILTLSKMDQTLAVYQQILTSMPSRNmIQISNDLENLRDLLHVLAFSKSCHLP
In the match line, a letter marks a position where the two sequences are identical, and a plus marks a position where the amino acids differ but have similar chemical properties. Since the V/M position is designated by a + in the match line, these two amino acids are chemically similar: valine and methionine both have nonpolar, hydrophobic side chains.
What is the percent identity and percent positives (aka percent similarity) of the Ochotona dauurica bedfordi leptin sequence compared to the human sequence?
Identities are the positions in the match line where the amino acid is the same in both organisms. Positives are the identities plus the positions where the amino acids differ but are chemically similar and therefore likely to function in similar ways. The percent identity is 121/167 (72%) and the percent positives is 139/167 (83%), both much lower than the 99% identity of the chimp sequence when compared to the human sequence.
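The percentages above are simple ratios over the aligned length, and the arithmetic can be checked with a short sketch. The function names are made up for illustration; the counts (121, 139, 167) are the ones reported in the BLAST result above.

```python
def percent(count, aligned_length):
    """Return count as a percentage of the aligned length."""
    return 100 * count / aligned_length

def count_identities(seq_a, seq_b):
    """Count positions where two equal-length aligned sequences match exactly."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b))

# Counts reported for the Ochotona leptin alignment against human leptin.
identities, positives, length = 121, 139, 167
print(f"identity:  {percent(identities, length):.0f}%")
print(f"positives: {percent(positives, length):.0f}%")
```

Because positives include the identities, the percent positives can never be lower than the percent identity.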
What is an E-value (explained)?
The E-value is the number of alignments with a score at least this good that you would expect to find by chance in a database of this size. If I BLAST a sequence against a database and it hits many unrelated places in a genome, those hits are not statistically significant. The lower the E-value, the less likely the alignment is to have come up randomly. A very low E-value tells you that finding an alignment this good by chance would be extremely rare, so the match is probably real.
On which chromosome is the leptin gene located?
In humans, the leptin gene is located on chromosome 7. It is on the positive strand relative to the orientation of human chromosome 7.
What is a Genome Browser?
In this module, we will use a web-based visualization tool called a Genome Browser to explore the structure of a eukaryotic gene, and obtain a basic understanding of how this information is stored and used. In subsequent modules, you will learn more about the details of these biological processes, and use the Genome Browser to examine the experimental data that provide evidence for a detailed gene structure. The protein-coding genes in eukaryotes (higher organisms, with a cell nucleus) are much more complex than the protein-coding genes in prokaryotes (bacteria, organisms without a nucleus).
How does a BLAST classify sequences as homologous or not?
Mutations in genes with an important biological function have a higher probability of being harmful to the organism and are less likely to become fixed in a population. Such sequences are said to be under negative selection, which causes them to be conserved against change over time. Therefore, it is expected that two homologous copies of a functional sequence will show a higher degree of sequence conservation (observed as base-by-base similarity at the nucleotide level) than either two unrelated sequences or two sequences that are not under strong negative selection. This similarity is the "signal" detected by a BLAST search.
Visualization with Chemical Solutions
Ninhydrin: dextromethorphan should appear as a purple/tan spot, pseudoephedrine should appear as a bright pink spot, and guaifenesin should appear as a white spot that is not always obvious. After adding diphenylcarbazone, pseudoephedrine turns darker and guaifenesin turns pink. Iodoplatinate: pseudoephedrine, dextromethorphan and quinine will appear brown. Caffeine may appear as a greenish-white spot but can only be identified by process of elimination. Iodoplatinate (gel R-I): DHEA should appear as a white spot near the solvent front.
chemical solutions
Ninhydrin, diphenylcarbazone, iodoplatinate
Combined standard
No urine. caffeine, dextromethorphan, DHEA, guaifenesin, pseudoephedrine, quinine
Q8. Why do you think it takes three lines to display the amino acid information? Hint: remember that a codon is specified by three bases, e.g. CCG = Proline (circled in Figure 13)
Nucleotides are read in groups of three called codons. The codons are nonoverlapping, so once a nucleotide is read it cannot be read again. It takes three lines to display the amino acid information because the codons differ depending on the reading frame of the sequence. Each line denotes a different starting point in the sequence, so the codons that follow are made up of different nucleotides in each line. The differences in the reading frame increase the number of gene products able to be made from that sequence.
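The idea that one strand can be read in three different frames can be sketched in a few lines. This is an illustrative example only: the input sequence is made up, and the codon table is the standard genetic code built from its usual compact listing.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_frames(dna):
    """Translate the three forward reading frames (+1, +2, +3) of a DNA string."""
    dna = dna.upper()
    frames = {}
    for offset in range(3):
        # Step through complete, non-overlapping codons starting at this offset.
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames[offset + 1] = "".join(CODON_TABLE[c] for c in codons)
    return frames

# Hypothetical sequence: ATG (M), CCG (P), TAA (stop, shown as *) in frame +1.
print(translate_frames("ATGCCGTAA"))
```

Shifting the start by one or two bases regroups the same nucleotides into different codons, which is why the browser needs one row per frame.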
Ionization of water
O-H bonds are polar and can dissociate heterolytically. The products are a proton and a hydroxide ion. Dissociation of water is a rapid, reversible process. Most water molecules remain un-ionized, so pure water has a very low electrical conductivity. The extent of dissociation depends on the temperature.
How does leptin control feeding?
Obesity is an excess of body fat that frequently results in a significant impairment of health. Obesity is a known risk factor for chronic diseases including heart disease, diabetes and stroke. It has more than one cause: genetic, environmental, psychological and other factors may play a part. The hormone leptin is produced by fat cells (adipocytes), and its gene is found on chromosome 7. It acts as a lipostat: as the amount of stored fat rises, leptin is released into the blood and signals the brain that the body is full. Whole networks of signals contribute to weight homeostasis. Obese people tend to have high levels of leptin, which indicates that there are other factors in this process.
How many hits came out of this search? Are all of them significant matches? (Note: in general, we consider matches with E-values less than 1 x 10^-5 as statistically significant). From the results of this BLAST search, can you conclude that chimps have a homolog of human leptin?
Only two hits came out of this search. Not all of them are significant: the "Predicted: pantothenate kinase 4 isoform x1" hit has an E-value of 9.4, while the leptin precursor (Pan troglodytes) hit has an E-value of 5e-121, so it is statistically significant. Yes, you can conclude that chimps have a homolog of human leptin, because the sequences are 99% identical and the alignment has a very low E-value.
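The significance rule used in this answer is just a threshold comparison, which can be written out as a minimal sketch. The two hits and their E-values are the ones reported above; the variable names are made up for illustration.

```python
# Threshold from the question: E-values below 1e-5 count as significant.
THRESHOLD = 1e-5

# The two hits reported by the search, with their E-values.
hits = {
    "leptin precursor (Pan troglodytes)": 5e-121,
    "Predicted: pantothenate kinase 4 isoform x1": 9.4,
}

significant = {name: e for name, e in hits.items() if e < THRESHOLD}
for name, e in hits.items():
    label = "significant" if name in significant else "not significant"
    print(f"{name}: E = {e:g} ({label})")
```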
Organic acids and bases
Organic acids usually contain C, H and O in their molecular formula (such as the carboxylic acid functional group, COOH). Ammonia (NH3) and organic bases are weak bases. Organic bases usually contain C, N and H in their molecular formula (usually amines).
Importance of choosing a solvent and stationary phase.
Polarity, bonding, solubility, and intermolecular forces between the solvents, the stationary phase and the molecules being separated all affect the separation and identification process.
Translation
Proteins are made up of amino acids, and the mRNA provides the information for the amino acid sequence. This information is read by the cell in groups of three bases, with each three-base group (i.e. codon) specifying an amino acid. The Genome Browser uses single-letter abbreviations to represent each amino acid.
What is RefSeq status?
RefSeq is a comprehensive, curated database of non-redundant sequences that are identified with an accession number.version with the following prefixes:
•chromosomes: NC_
•genomic regions: NG_
•mRNA: NM_
•proteins: NP_
what level are sequences most conserved at?
Sequences are most conserved between species at the amino acid level and this is what we will use in our BLAST searches. There are two choices for proteins: the consensus coding sequence (CCDS - another database that integrates and links to information from several different databases) or the RefSeq sequence (NP_).
What are the extra sequences at the ends of mRNAs?
The Genome Browser therefore shows that a part of the mRNA extends beyond the end of the protein-coding region. This is a general property of mRNAs: they contain extra sequences both before and after the protein-coding sequence.
Introns and Exons / transcription
The black blocks are the exons (expressed regions of the gene; Figure 11). To use the information stored in a gene, a cell uses the DNA sequence as a template to produce a molecule called a messenger RNA (mRNA). This process is called transcription. You will see in module 2 that while the initial transcript (product of transcription) is continuous, copying all of the DNA, only exon sequences are retained in the processed mRNAs. The lines connecting the blocks are the introns (intervening regions of the gene). These sequences will be removed during the production of mature mRNAs. The arrows on the lines denote the direction of transcription (or orientation) of the gene.
Similarity in leptin sequence between humans and chimpanzee (Pan troglodytes)?
The chimp match is from a sequence in the RefSeq database with a total length of 167 amino acids. With an E-value of 7 x 10^-117 and a single amino acid substitution (from V to M), this high degree of sequence similarity between our query human Leptin precursor and the predicted chimp Leptin is unlikely to be caused by chance alone.
Reading frames
The combination of the directionality (with two alternative directions) and the three rows in the "Base Position" track means that there are six different ways to translate a genomic region, i.e. to determine the sequence of amino acids from a DNA sequence. These different ways to translate a genomic region are known as reading frames. Examination of the "Base Position" track at the beginning of the contig shows that the three positive reading frames are numbered relative to the start of the contig1 sequence. Similarly, the three reading frames on the bottom strand are numbered relative to the end of the contig1 sequence
Q5. Why do you think the bases are displayed in this way in the Genome Browser?
The default position is 5' to 3' because that is the standard direction in which DNA is read. You assume you are reading the top strand unless otherwise stated.
Q13. What is the orientation of this gene relative to contig1? How do you know? Where are the start codon and the stop codon (give the base position numbers of the codon)?
The gene is read on the plus strand because the start codon is at the beginning of the first exon. When looking at the gene directly, the browser orders the exons 1, 2, 3 from left to right rather than 3, 2, 1. This is how we know it is on the plus strand.
Q9. Based on the screenshot shown in Figure 23, which reading frame contains the amino acid sequence for the second coding exon of tra-RA?
The second coding exon is in the +2 reading frame because it doesn't have any stop codons interrupting the exon, unlike the other two lines. For instance, exon 2 has an extra 2 nucleotides and exon 1 has an extra nucleotide, so when they are joined together they will make a complete codon of three nucleotides. When the introns are spliced out and the exons are joined together, exon 1 is in the +1 reading frame, exon 2 is in the +2 reading frame and exon 3 is in the +3 reading frame, so that after joining, the codons are read in the correct groups of three nucleotides.
Theory of evolution and ancestors
The theory of evolution is based on all organisms descending from common ancestors by speciation. At the molecular level, an ancestral DNA sequence diverges over time (through accumulation of point mutations, duplications, deletions, transpositions, recombination events, etc.) to produce diverse sequences in the genomes of living organisms. Such sequences are classified as homologs if they come from the same ancestral gene.
When performing a BLAST search.....
There are many different options for a BLAST search and the option you choose depends on the sequence you have and the degree of conservation you are trying to detect.
Acid-Base conjugate pairs
They resist changes in pH acting as a buffer. A buffer is most effective at a pH near its pKa.
Chromatography
This technique takes advantage of the solubility of one substance within another. The separation procedures are determined by the molecular structures of the molecules. Polar molecules are attracted to other polar molecules, making them attracted to polar solvents or polar solid phases. Meanwhile, nonpolar molecules are attracted to nonpolar solvents or nonpolar solid phases. These types of molecular forces and attractions are the basis of separating molecules by chromatography.
Gel R instructions
View under long-wave UV light (purple spots are quinine). Then mist with ninhydrin solution: spots of amines will develop against a pink background. Amphetamines, AA, dextromethorphan, pseudoephedrine and phenylpropanolamine will develop as pink spots. UV light will enhance the color of the spots. Then use diphenylcarbazone, and the amphetamines will intensify in color. Then use iodoplatinate, and spots will appear immediately in a variety of colors from yellows to greens to reddish-browns.
After a BLAST search is complete....
When your search is complete, you will get a graphical output of the database entries (subjects) that align with your input leptin sequence (query) on a color scale. Features with highly statistically significant alignments will appear in red in the graphical output while features with the lowest statistical significance will appear in black.
pH = pKa + log([A-]/[HA])
Where HA is the weak acid and A- is its conjugate base.
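The equation above (the Henderson-Hasselbalch equation) can be tried out numerically with a minimal sketch. The pKa of 4.76 is an assumed example value (roughly that of acetic acid) and the concentrations are made up; the function names are hypothetical.

```python
import math

def buffer_ph(pka, conc_base, conc_acid):
    """pH = pKa + log10([A-]/[HA])."""
    return pka + math.log10(conc_base / conc_acid)

def fraction_deprotonated(ph, pka):
    """Fraction of the compound present as A- at a given pH."""
    return 1 / (1 + 10 ** (pka - ph))

pka = 4.76  # assumed example value
print(buffer_ph(pka, conc_base=0.1, conc_acid=0.1))  # equal amounts: pH equals pKa
for ph in (pka - 1, pka, pka + 1):
    print(ph, fraction_deprotonated(ph, pka))
```

One pH unit away from the pKa, the ratio of A- to HA is already 10:1 or 1:10, which is why buffering capacity falls off outside that range.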
Did you obtain matches from this search? Are they significant matches? How do you know? Does this data support your hypothesis about the presence of a leptin homolog in chimps? (Ran a BLAST of the chimp leptin gene and found a Homo sapiens match.)
Yes, we obtained two matches and both were statistically significant. The E-value for the Homo sapiens leptin precursor is 7e-121, which is less than 1e-5. Yes, it supports the hypothesis of the presence of a leptin homolog in chimps.
What is a BLAST tool?
A BLAST search has a query and a subject: you take a sequence (the query) and search it against a database that contains many different sequences (the subjects). You can search a nucleotide query against nucleotide subjects (blastn). Alternatively, you can have a nucleotide query translated into protein and searched against a database of protein sequences (blastx).
Why do you think you want the protein sequence as opposed to the nucleotide sequence?
You want to use the protein sequence as opposed to the nucleotide sequence because sequences are most conserved between species at the amino acid level and so there will be more similarities. It is important because different nucleotides (codons) could code for the same amino acid - the code is degenerate.
Q11. Given that 3 of the 64 possible codons are stop codons, what is the chance of having a stop codon at any given position, assuming that the sequence is random? Are you surprised that all three reading frames have an ORF here?
You would expect a stop codon at about 1 out of every 21 codons, i.e. there is about a 4.7% chance (3/64) that the codon at any given position is a stop codon. Yes, I am surprised that all three reading frames have an ORF here, because the probability of a stop codon is about 1 out of every 21 codons and there are more codons than that in the reading frame.
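The arithmetic behind this answer can be checked with a short sketch. It assumes a random sequence in which all 64 codons are equally likely, as the question states; the ORF lengths in the loop are made-up examples.

```python
# Chance that a random codon is one of the 3 stop codons out of 64.
P_STOP = 3 / 64

def prob_no_stop(num_codons):
    """Probability that a random stretch of codons contains no stop codon."""
    return (1 - P_STOP) ** num_codons

print(f"chance of a stop at one position: {P_STOP:.1%}")
print(f"about 1 stop every {1 / P_STOP:.1f} codons")

# Longer open stretches become rapidly less likely by chance.
for n in (21, 50, 100):
    print(n, prob_no_stop(n))
```

This is why a long ORF in all three frames at once is unexpected for random sequence.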
What does the leptin gene do?
A gene that has been found to contain mutations associated with severe obesity and the development of type 2 diabetes. It encodes a hormone that works in weight control.
Urine Lab
First the urine samples will be run over a diatomaceous earth column. Then the drugs will be isolated by eluting them from the columns. The drugs will be further separated from the mixtures using chromatography. Then the drugs will be identified using specific visualization techniques.
Visualization with UV light
Long-wave UV light, or black light, will cause certain molecules to fluoresce. Quinine will fluoresce and appear purple, which is important because quinine, dextromethorphan and guaifenesin have similar Rf values. Short-wave UV light will cause the molecules to appear as dark spots and will reveal all the drug samples. However, this does not allow for definite identification of the samples unless pure standards are prepared.
What is pH (equation)?
pH = -log[H+]; pH + pOH = 14
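A quick numerical check of these two relations, as a minimal sketch. It assumes 25 °C (so that pH + pOH = 14); the function names and concentrations are made up for illustration.

```python
import math

def ph_from_h(h_conc):
    """pH = -log10([H+]), with [H+] in mol/L."""
    return -math.log10(h_conc)

def poh_from_ph(ph, pkw=14.0):
    """pOH from pH, assuming pH + pOH = 14 (25 degrees C)."""
    return pkw - ph

# Example concentrations: acidic, neutral and basic solutions.
for h in (1e-3, 1e-7, 1e-11):
    ph = ph_from_h(h)
    print(h, ph, poh_from_ph(ph))
```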
What is pKa?
pKa is the pH at which the acid is half dissociated.
pKa measures acidity
pKa= -logKa
polar vs. non polar
Polar molecules have distinct areas of positive and negative charge. Water is polar and attracts other polar molecules when used as a chromatography solvent. Water is a good solvent for compounds containing O, N or ionizable groups. Nonpolar molecules do not have distinct areas of + and - charge, but they attract other nonpolar molecules as a solvent. Molecules that contain large numbers of carbon and hydrogen atoms are nonpolar and serve as good solvents for isolating nonpolar molecules.
proton hydration
Protons do not exist free in solution. They form hydronium ions: a hydronium ion is a water molecule with a proton associated with one of the nonbonding electron pairs. The covalent and hydrogen bonds are interchangeable. This allows for extremely fast mobility of protons in water via "proton hopping".
Variety in solvents and stationary phases
Some solvents are mixtures of a number of different chemical solutions, and many different stationary phases may be used, including paper, silica gel, alumina, and cellulose.
Gel R-I instructions
Spray with iodoplatinate only. The background will be orange. Some spots will be white.
When designing a BLAST search, there are three basic decisions we must make:
the BLAST program we want to use, the query sequence we want to annotate, and the database we want to search. In addition, there are several optional parameters (such as the Expect threshold and low complexity filters) we can change in order to modify the behavior of BLAST.
What is NCBI
The National Center for Biotechnology Information (NCBI) houses public databases of molecular biology information, including sequences from thousands of different species, from mammals to fungi.
Dissociation of an acid
The degree of dissociation for an acid is determined by the strength of the acid. For a strong acid, dissociation is complete. Weak acids do not dissociate to a great extent. Weak acids are not shown as dissociated in the net ionic equation.
Q4. What is the relationship between the bases displayed when the arrow is pointed to the left versus when it is pointed to the right?
When the arrow is pointing right, it shows the bases of one strand read in the 5' to 3' direction. When you change the arrow's orientation, the display switches to the complementary strand, also read 5' to 3', so you see the reverse complement of the original sequence.
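The relationship between the two displays is the reverse complement, which can be sketched as follows. The example sequence and the function name are made up for illustration.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

top = "ATGCCG"  # hypothetical top-strand sequence, 5' to 3'
bottom = reverse_complement(top)
print(top, bottom)

# Flipping the arrow twice brings back the original sequence.
print(reverse_complement(bottom) == top)
```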
What is pH (info)?
The measure of the H+ concentration of a solution. Maintaining pH is important because changes in pH can drastically affect the internal electrostatic environment of an organism, which can alter the weak bonds that maintain the structure of biomolecules. Altered structure means loss of function.
Thin Layer Chromatography: choosing a stationary phase
The molecular forces determine how far the molecules will move within the chromatogram (capillary action). The difference in polarity between the stationary phase and the solvent is important. The Rf values of the molecules being separated depend on the specific characteristics of the solvents and stationary phases selected for a specific chromatography model.
acid-base reaction
the products commonly produced by an acid base reaction are a salt and water. The cation of the salt always comes from the base and the anion of the salt always comes from the acid.
What is a net ionic equation?
they are useful in that they show only those chemical species participating in a chemical reaction.
Diatomaceous earth column?
this is used to prepare biological samples such as plasma, serum, whole blood and urine for analysis. Its function is to remove residual acid compounds from liquids and is effectively used in the forensic analysis of drugs in urine.
Gel F instructions
viewed under long wave UV. Any purple spots are quinine. Also viewed under the short wave UV light. With this all drugs should be visible in the F gel.
Dissociation of Weak electrolytes: principle
Weak electrolytes dissociate only partially in water. The extent of dissociation is determined by the acid dissociation constant Ka.
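How Ka sets the extent of dissociation can be illustrated with a small calculation. For a weak acid HA at total concentration C, Ka = x^2 / (C - x), where x = [H+] = [A-]; solving the quadratic gives x. This sketch ignores the ionization of water, and the Ka of 1.8e-5 (a commonly quoted value for acetic acid) and the 0.1 M concentration are assumed example values.

```python
import math

def h_concentration(ka, total_conc):
    """Solve Ka = x^2 / (C - x) for x = [H+], taking the positive root."""
    return (-ka + math.sqrt(ka * ka + 4 * ka * total_conc)) / 2

ka, conc = 1.8e-5, 0.1  # assumed example values
x = h_concentration(ka, conc)
print("[H+]:", x)
print("pH:", -math.log10(x))
print("fraction dissociated:", x / conc)
```

With these numbers only a small fraction of the acid is dissociated, which is what "weak" means here.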
Thin Layer Chromatography: choosing a solvent
You want a solvent that gives an Rf value greater than 0 but less than 1, meaning that the molecules you are trying to separate do not stick to the stationary phase and move with the solvent, allowing for separation. Having a mixture with polar and nonpolar molecules allows for successful separation.
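For reference, the Rf value is the distance traveled by the spot divided by the distance traveled by the solvent front, both measured from the origin line. A minimal sketch of the calculation follows; the measurements are hypothetical and the function name is made up.

```python
def rf_value(spot_distance, solvent_front_distance):
    """Rf = distance moved by the spot / distance moved by the solvent front."""
    if solvent_front_distance <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= spot_distance <= solvent_front_distance:
        raise ValueError("spot cannot travel farther than the solvent front")
    return spot_distance / solvent_front_distance

# Hypothetical measurements in cm from the origin line.
print(rf_value(3.0, 6.0))  # spot moved halfway up the plate
print(rf_value(0.0, 6.0))  # spot stuck to the stationary phase at the origin
```

An Rf of 0 means the compound never left the origin and an Rf of 1 means it ran with the solvent front, so neither extreme separates anything.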
Conclusion of module:
•Genes provide the information to make proteins. This information is captured by transcribing the DNA to make RNA, and is carried on the mRNA in the form of three-base groups called codons.
•Genes are composed of exons and introns. Exons are regions retained in the processed mRNA, and are represented by black blocks in the browser, while introns are the regions that are removed during the process of creating the final mRNA, and are represented by lines connecting the blocks.
•The codon ATG in DNA (AUG in mRNA) specifies the amino acid M (Methionine) and is highlighted in green on the "Base Position" track of the Genome Browser. The first Methionine provides the starting signal for protein synthesis.
•The codons TAA, TAG, and TGA in DNA (UAA, UAG, and UGA in mRNA) encode the stop codon (*) and are highlighted in red on the "Base Position" track of the Genome Browser. The stop codons provide the ending signal for protein synthesis.
•Genes may be read either from left to right (top strand of the DNA), or from right to left (bottom strand of the DNA). Arrows on a gene indicate its directionality.
•Each row in the "Base Position" track (set on full) corresponds to a different reading frame. Different coding exons for a transcript can be in different reading frames.