Genetics Ch 14

Comparative genomics of chimpanzees and humans: - Chimps and humans shared a common ancestor ab 5-6 mil years ago and since then genetic differences have accumulated from mutations in each lineage → genome sequencing has shown there are ab .......... ranging in length from single nucleotide to more than 15kb = 90Mb of divergent DNA sequence (3% overall genome) → most insertions and deletions lie ...... - Proteins encoded by human and chimp genomes are very similar: 29% of all ........ are identical in sequence → Most proteins that differ do so by only ab 2 amino acid replacements and some differences in sets of functional genes → ab 80 genes functional in common ancestors are no longer functional in humans owing to their ...... - ............ contribute to genome divergence: over 170 genes in human genome and over 90 genes in chimp genome are present in large duplicated segments which are responsible for a greater amount of the total genome divergence than all single-nucleotide mutations combined - All genetic differences between species originate as variations within species → sequencing of human genome and using faster/less expensive high-throughput sequencing allows for ......

- 35 mil single-nucleotide differences between chimps and humans and ab 5 mil insertions and deletions; outside of coding regions - orthologous proteins - deletion or accumulation of mutations → some of these changes contribute to differences in physiology - Duplications of chromosome segments in single lineage - detailed analysis of human genetic variation

Comparative genomics of nonpathogenic and pathogenic E. coli: - E. coli is found in the mouth and intestinal tract and is a ....., it was the 1st bacterial genome sequenced - 1982 a multistate outbreak of human disease was traced to consumption of undercooked ground beef → E. coli strain O157:H7 identified as the cause → most people survive but some develop hemolytic uremic syndrome which is a ...... - Genome of the pathogenic E. coli strain has been sequenced to better understand the genetic basis of pathogenicity: - Among the 1387 genes specific to the pathogenic strain many of them are suspected to encode virulence factors including ........, and other activities that may confer the ability to survive in different hosts - Most of these genes were ...... before sequencing - Most new genes in E. coli strains are thought to have been introduced by ...... - Differences can come from .....

- benign symbiont - life-threatening kidney disease - there are similarities in many proteins between different strains of E. coli but the genomes and proteomes are VERY different - toxins, cell-invasion proteins, adherence proteins, and secretion systems for toxins, and metabolic genes that may be required for nutrient transport, antibiotic resistance - unknown - horizontal transfer from genomes of viruses and bacteria - gene deletion

The Genomics Revolution: - After development of recombinant DNA technology, research labs undertook ....... and then only after 1st finding something interesting about that gene from a classic mutational analysis → the steps from classical genetic map of a locus to isolating DNA encoding a gene to determining its sequence were .....

- cloning and sequencing of one gene at a time - numerous and time consuming

Traditional WGS: - begins with construction of genomic libraries: ......→ the short DNA segments in libraries are ....... → accessory chromosomes carrying DNA inserts are called .....

- collections of these short segments of DNA representing the entire genome - inserted into one of a number of types of accessory chromosomes (nonessential elements such as plasmids, modified bacterial viruses, or artificial chromosomes) and are propagated in microbes (usually bacteria or yeast) - vectors

Studying the protein-DNA interactome using chromatin immunoprecipitation assay (ChIP): - Sequence-specific binding of proteins to DNA is important for ...... - Eukaryotes:

- correct gene expression - chromosomes are organized into chromatin where the fundamental unit (nucleosome) contains DNA wrapped around the histones, post-translational modification of histones decides which proteins bind and where → many technologies have been developed to allow researchers to isolate specific regions of chromatin so DNA and its associated proteins can be analyzed together

Alignment of cDNAs with their corresponding genomic sequence clearly delineates the ..... and reveals .... - In an assembled cDNA sequence, the ORF should be continuous from initiation codon through stop codon → cDNA sequences can assist in .... - Full-length cDNA evidence is taken as gold-standard proof that one has identified .....

- exons and introns and reveals the regions falling between the exons - identifying correct reading frame and the initiation and stop codons - the sequence of a transcription unit including its exons and location in the genome

Noncoding functional elements in the genome: - Less than 3% of the human genome encodes ...... → more than 98% of genome ..... - Introns and 5' and 3' untranslated sequences are readily annotated by analysis of gene transcripts while gene promoters are usually identified by ...... → other regulatory sequences such as enhancers are not identifiable by inspection of DNA sequences and other sequences that encode different kinds of RNA transcripts (miRNAs, siRNA, lncRNA) require ....

- exons of mRNA and fewer than half of these exon sequences encode protein sequences; DOES NOT encode proteins - promoters: their proximity to transcription units and their DNA sequences - RNA: require detection and annotation of their transcripts

Obtaining the Sequence of a Genome: - Geneticists use many kinds of maps to ..... - Highest-resolution map:

- explore genomes - is complete DNA sequence of the genome (complete sequence of nucleotides A, T, C, G of each double helix in the genome)

To make genomic library: - first ....... → some enzymes cut DNA at many places and others at fewer → researcher can control whether DNA is cut into longer or shorter pieces → resulting fragments have ..... and each fragment is joined to ...... which also has been cut with restriction enzyme and has ends that are complementary to those of the genomic fragments → for entire genome to be represented multiple copies of the genomic DNA are ..... → thousands to millions of different ...... are made

- first use restriction enzymes to cleave DNA at specific sequences to cut up purified genome DNA - short single strands of DNA at both ends - DNA molecules of accessory chromosomes - cut into fragments - fragment-vector recombinant molecules

The Comparative Genomics of Humans with Other Species: - Comparative genomics: - Species evolve and traits change through changes in DNA sequence → genome contains record of ....... → comparisons among species' genomes can reveal events unique to particular lineages that can contribute to ......

- has the potential to reveal how species diverge and allow us to find info ab human genome from model organisms - evolutionary history of species; differences in physiology, behavior, or anatomy

Differences between genomes of mice and humans: - In genes of color vision (opsins) humans have one more paralog → - Old world primates (chimps, gorillas, etc) possess this gene but all nonprimate mammals lack it → additional opsin gene evolved in ancestry of ....... - Mouse genome has more functional copies of some genes that reflect its lifestyle:

- humans have trichromatic vision so we can perceive colors across entire spectrum of visible light but mice cannot - Old World primates (including humans) - mice have ab 1400 genes for olfaction and humans have far fewer

Two-hybrid system separates the 2 domains of the activator encoded by GAL4 making activation of a reporter gene impossible → each domain is connected to a different protein → if the 2 proteins interact they will ...... → the activator will become ..... - The GAL4 gene is divided between 2 plasmids so that one plasmid contains ........ and the other plasmid contains ...... → on one plasmid, a gene for one protein is spliced next to the DNA-binding domain and this fusion protein acts as "bait" → on the other plasmid, a gene for another protein is spliced next to the activation domain and this fusion protein is the "target" → both hybrid plasmids are then introduced into the same yeast cell (by mating haploid cells containing bait and target plasmids) → final step is to look for activation of transcription by a GAL4-regulated reporter gene construct which would be proof that bait and target bind to each other → two-hybrid system can be automated to make it possible to .......

- if the 2 proteins interact they will join the 2 domains together - activator will become active and start transcription of the reporter gene - one plasmid contains the part encoding the DNA-binding domain - the other plasmid contains the part encoding the activating domain - hunt for protein interactions throughout the proteome

Using the two-hybrid test to study protein-protein interactome: - there is a large number of proteins in any cell → there are ways to study all the ....... - Most common way to study the interactome uses an engineered system in yeast cells called ........ - two-hybrid test: - basis for the test is the ......

- interactions of individual proteins in a cell - two-hybrid test - detects physical interactions between 2 proteins - transcriptional activator encoded by the yeast GAL4 gene

The nature of the information content of DNA: DNA contains info but how is it encoded? - The info is thought of as the sum of all gene products, both protein and RNAs but the info content of genome is more complex than that → it also contains ....... → sequence and relative positions of those sites allow genes to ..... - Information in genome can be thought of as the sum of .....

- it also contains binding sites for different proteins and RNAs, many proteins bind to sites located in DNA and other proteins & RNAs bind to sites located in mRNA - allows genes to be transcribed, spliced, and translated properly at the right time and in the right tissue - all the sequences that encode proteins and RNAs plus the binding sites that govern the time and place of their actions

Higher-throughput sequencing techs have a greater frequency of errors than older methods → error rate may range from ..... → to ensure accuracy genome projects do what? → many-fold coverage ensures that chance errors in the reads don't have a .......

- less than 1% to 10% - conventionally obtain many independent sequence reads of each base pair in a genome - false reconstruction of the consensus sequence
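The many-fold coverage idea can be sketched in a few lines of Python. This is only an illustration, assuming the reads are already aligned and of equal length; the function name `consensus` and the toy reads are hypothetical, not taken from the chapter. Each position is called by majority vote, so a chance error in one read is outvoted by the other reads covering that base.

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority-vote consensus over equal-length, pre-aligned reads."""
    columns = zip(*aligned_reads)  # one tuple of bases per genome position
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)

# Five-fold coverage of the same 8-base region (toy data).
reads = [
    "ACGTTGCA",
    "ACGTTGCA",
    "ACCTTGCA",  # chance error at position 2
    "ACGTTGCA",
    "ACGTAGCA",  # chance error at position 4
]

print(consensus(reads))  # ACGTTGCA
```

With only one or two reads per base the same errors could end up in the consensus, which is why projects sequence each base pair many times independently.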

Of mice and humans: sequence of the mouse genome is informative for understanding human genome due to the mouse's long-standing role as a model genetic species, knowledge of its classical genetics, and the mouse's evolutionary relationship to humans - Mouse and human lineages diverged ab 75 mil years ago = enough time for ...... → sequences common to mouse and human genomes are likely to indicate ...... - Homologs are identified bc they have similar DNA sequences → analysis of mouse genome shows that the # of protein-coding genes it contains is similar to that of human genome → comparison of mouse genes shows that at least 99% of all mouse genes have ........ & at least 99% of all human genes have ........ → the types of proteins encoded in each genome are the same → ab 80% of all mouse and human genes are ...... - Similarities between the genomes extend beyond protein-coding genes to the overall genome organization → over 90% of mouse & human genomes can be partitioned into corresponding regions of conserved synteny: ........ → synteny is helpful to relate the ....... → mouse and human genomes contain ........

- mutations to cause their genomes to differ at about one of every 2 nucleotides; common functions - some homolog in the human genome; some homolog in mouse genome - orthologs - order of genes within variously sized blocks is the same as their order in the most recent common ancestor of the 2 species - the maps of the 2 genomes - similar sets of genes often arranged in similar order

Resulting pool of recombinant DNA molecules is propagated by introducing the molecules into bacterial cells → each cell takes up ....... → resulting library of clones is called a ..... bc sequence reads are obtained from clones randomly selected from the whole-genome library without any info on where these clones map in the genome

- one recombinant molecule, which is replicated in the normal growth and division of its host so many identical copies of the inserted fragment are produced for use in analyzing the fragment's DNA sequence (each cell is a clone) - shotgun library

Comparative Genomics and Human Medicine: - Homo sapiens originated in Africa and migrated across the world, populating 5 additional continents → migration led to exposure to different climates, adoption of different diets, & combat with different pathogens in different parts of the world → recent evolutionary history of our species is recorded in ....... - Any 2 unrelated human genomes are ...... - Sequencing of first human genome → more rapid and low cost analysis of other individuals because using known genome as reference it is easier to ..... - Comparing individual human genomes: humans differ at one base in a thousand but also in the number of copies of parts of individual genes, entire genes, or sets of genes = - copy number variations (CNVs): - Between any 2 unrelated individuals there can be 1000+ segments of DNA greater than 500bp length that ..... - Some CNVs can be large and span over .....

- our genomes and so are the genetic differences that make individuals/populations more or less susceptible to disease - 99.9 percent identical and that 0.1% difference corresponds to ab 3 million bases - align the raw sequences of additional individuals and design approaches to studying and comparing parts of the genome - copy number variations (CNVs) - include repeats and duplications that increase copy number and deletions that reduce copy number - differ in copy number - 1 million base pairs
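The "0.1% corresponds to ab 3 million bases" figure can be checked with quick arithmetic. The sketch below assumes a genome size of about 3200 Mb (the total-sequence figure quoted in the exome card of these notes); the variable names are illustrative only.

```python
# Assumed genome size: ~3200 megabases of total sequence.
GENOME_SIZE_BP = 3_200_000_000

# "Humans differ at one base in a thousand" = 0.1% difference.
differing_bases = GENOME_SIZE_BP // 1000

print(f"{differing_bases:,} differing bases")  # 3,200,000 differing bases
```

So two unrelated genomes that are 99.9% identical still differ at roughly 3 million positions, before counting copy number variations.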

Next, the genome fragments in clones from shotgun library are partially sequenced → sequencing reaction starts from ..... → primers are used to ....... → after sequencing the output there is a large collection of random short sequences (some overlapping) → sequence reads are assembled into a consensus sequence covering the whole genome by ..... → Sequences of overlapping reads are assembled into .....

- primer of known sequence & primers are based on the sequence of adjacent vector DNA bc the cloned insert sequence is unknown - guide the sequencing reaction into the insert (short regions at one or both ends of the genomic inserts can be sequenced) - matching homologous sequences shared by reads from overlapping clones - units called sequence contigs (contiguous or touching sequences)

Deducing the protein-encoding genes from genomic sequence: - Proteins in a cell determine its form and physical properties so in genome analysis and annotation you have to determine an inventory of all the polypeptides encoded by an organism's genome = ...... - Sequence of each mRNA encoded by the genome must be deduced which is challenging in eukaryotes bc of ...... → alternatively processed mRNAs can encode polypeptides having much (not all) of their amino acid sequences in common but we can't identify 5' and 3' splice sites just from DNA sequence with a high degree of accuracy → we cannot be certain which sequences are introns → .....

- proteome - intron splicing; errors in finding the total polypeptide parts list in higher eukaryotes

Bioinformatics: meaning from genomic sequence: - Genomic sequence is a highly encrypted code containing .... - bioinformatics:

- raw info for building and operation of organisms - Study of information content of genomes

The exome and personalized genomics: - Advances in sequencing have ..... - Since many disease-causing mutations occur in coding sequences, we sequence all the ....... instead of sequencing introns as well = - Exome sequencing: - Exome sequencing is about identifying de novo mutations in individuals, which are ...... → these mutations are responsible for many spontaneously appearing ....... - Exome sequencing can be used to identify .....

- reduced cost of sequencing individual genomes from ab $300 mil to $1 mil to $5000 - exons (exome); less costly - make a library of genomic DNA that is enriched for exon sequences → DNA prepared by 1) shearing genomic DNA into short, single-stranded pieces, 2) hybridizing the single-stranded pieces to biotin-labeled probes complementary to exonic regions and purifying the biotin-labeled duplexes, 3) amplifying the exon-rich duplexes, and 4) sequencing the exon-rich duplexes → 30-60 megabases of human genome are targeted for sequencing instead of 3200 megabases of total sequence - de novo mutations: mutations not present in either parent; genetic diseases - genetic differences between individuals and to identify differences between normal and abnormal cells like cancer cells

What are the goals of sequencing a genome? - Want to produce a consensus sequence that is a true and accurate representation of the genome starting with one individual organism or standard strain from which the DNA was obtained → this sequence will be a ...... for the species → there are many differences in ....... → no one genome truly represents genome of entire species but it is a reference

- reference sequence - DNA sequence between different individuals within species and between maternally and paternally contributed genome within single diploid individual

The Structure of the Human Genome: - Large amount of human genome (45%) is ......... → much of the human genome is composed of genetic "hitchhikers" - Only small part of human genome encodes ....... (less than 3% encodes exons of mRNA), exons are small (150 bases) and introns are large (over 1,000-100,000 bases) - Transcripts are composed of ab 10 exons → introns can be spliced out of the same gene in locations that vary which generates ...... - cDNA and EST (expressed sequence tag) data: at least 60% of human protein-coding genes are likely to have 2+ splice variants →

- repetitive DNA composed of copies of transposable elements which may be descended from ancient transposable elements that are now immobile and have accumulated random mutations causing them to diverge in sequence from the ancestral transposable elements - polypeptides - more diversity in mRNA and polypeptide sequence - number of distinct proteins encoded by the human genome is several-fold greater than the number of recognized genes

Conserved elements can be tested in same manner as transcriptional cis-acting regulatory elements examined in earlier chapters using ...... → place candidate regulatory regions adjacent to a promoter and reporter gene and introduce the reporter gene into host species → the reporter protein will be expressed in an expected location → expression pattern corresponds to part of the expression pattern of the native conserved element/gene → the expression pattern suggests ......

- reporter genes: encode proteins that are easy to detect and assay - expression pattern suggests that the conserved element is a regulatory region for the specific gene in each species

Predictions based on codon bias: - The triplet code for amino acids is degenerate bc most amino acids are encoded by 2 or more codons → multiple codons for a single amino acid are called ...... - Not all synonymous codons for an amino acid are used with equal frequencies in different species → - Codon biases are thought to be due to the relative abundance of the tRNAs complementary to these various codons in a given species → if codon usage of predicted ORF matches that species' known pattern of codon usage, then .....

- synonymous codons - certain codons are present much more frequently in mRNAs & in the DNA that encodes them - this match is supporting evidence that the proposed ORF is genuine
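As a rough illustration of codon bias, the Python sketch below tallies how often each synonymous codon for one amino acid (leucine, which has six codons) is used in a candidate ORF. The function names and the toy ORF are hypothetical, and no real species usage table is included; comparing the observed frequencies with a species' known pattern is the step a gene-prediction program would add.

```python
from collections import Counter

# The six synonymous codons for leucine, written as DNA.
LEUCINE_CODONS = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]

def codon_counts(orf):
    """Count codons in an ORF, ignoring any trailing partial codon."""
    usable = len(orf) - len(orf) % 3
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))

def synonymous_usage(orf, synonymous_codons):
    """Fraction of each synonymous codon among all uses of that amino acid."""
    counts = codon_counts(orf)
    total = sum(counts[c] for c in synonymous_codons)
    if total == 0:
        return {}
    return {c: counts[c] / total for c in synonymous_codons}

# Toy ORF: ATG CTG CTG TTA CTG CTC TAA
toy_orf = "ATGCTGCTGTTACTGCTCTAA"
usage = synonymous_usage(toy_orf, LEUCINE_CODONS)
for codon, fraction in usage.items():
    print(codon, round(fraction, 2))
```

In this toy ORF, CTG accounts for three of the five leucine codons, so its usage is strongly skewed toward one synonymous codon.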

Comparing human, rat, and mouse genomes → ultraconserved elements: ........ → found that over 5000 sequences of more than 100bp & 481 sequences of more than 200bp are absolutely conserved → many of these elements are found in ...... but they are most richly concentrated near regulatory genes important for development → majority of highly conserved noncoding elements take part in regulating .....

- ultra conserved elements: sequences that are perfectly conserved among the 3 species - gene-poor regions - regulating expression of genetic toolkit for development of mammals and vertebrates

Using DNA microarrays to study the transcriptome: - Active genes are transcribed into RNA so the set of RNA transcripts present in the cell can tell us ...... - DNA chips: ......... → set of DNAs displayed is called a ...... - Typical type of microarray contains .....

- what genes are active → using DNA chips to assay RNA transcripts - DNA chips: samples of DNA laid out as a series of microscopic spots bound to a glass "chip" the size of microscope cover slip - DNAs displayed is called a microarray - short synthetic oligonucleotides representing most or all the genes in a genome

traditional WGS

-Cut up DNA and make genomic libraries -Use restriction enzymes -Clone in vectors (In Vivo) -Both ends are cloned

Protein has 2 domains: - domains must be ...... for transcriptional activation to occur

1) a DNA-binding domain that binds to DNA upstream of the transcriptional start site (Gal4 BD) 2) an activation domain that will activate transcription but cannot itself bind to DNA (Gal4 AD) - close to each other

Turning Sequence reads into an assembled sequence: 1) break DNA molecules of a genome into ..... 2) read the sequence of ...... 3) computationally find the ..... 4) continue overlapping every large piece until ...... → sequence of genome is assembled

1) thousands to millions of random, overlapping small segments 2) each small segment 3) overlap among the small segments where their sequences are identical 4) all the small segments are linked
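The four steps above can be sketched as a toy greedy assembler in Python. This is a simplified illustration, not how production assemblers work: it assumes error-free reads from one strand with no repeats, and the names `overlap`, `greedy_assemble`, and `min_len` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining pieces are separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Three overlapping reads from the toy "genome" ATGGCGTACGTTAGC.
toy_reads = ["CGTACGTT", "ATGGCGTA", "CGTTAGC"]
print(greedy_assemble(toy_reads))  # ['ATGGCGTACGTTAGC']
```

If two reads share no overlap, the function returns more than one string, which corresponds to separate contigs with a gap between them.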

steps of ChIP (chromatin immunoprecipitation):

1. cross-link proteins to DNA 2. break the chromatin into small pieces 3. add antibody to target protein and purify 4. reverse cross-links to separate DNA and protein

paired-end reads can be produced by circularization: steps

1. prepare circularization-ready fragments. Genomic DNA is sheared into 20kb, 8kb, or 3kb fragments and circularization adapters containing linker sequences are added to the ends of each fragment 2. DNA is circularized 3. the circularized DNA is fragmented and fragments containing linker sequences are isolated. Additional adapters (A and B) are added to both ends to help amplification and sequencing 4. the resulting library contains paired-end reads with the two end tags. The paired-end reads average 150bp and are separated by 20kb, 8kb, or 3kb

pyrosequencing reactions:

1. single DNA strands are immobilized on individual beads 2. these molecules are amplified by PCR 3. each bead is deposited into tiny wells

Individual sequencing reactions provide letter strings that range from ...... → these lengths are tiny compared with the DNA of ....

- 100-5000 bases long - a single chromosome

Example using ChIP: isolate a gene from yeast and suspect it encodes a protein that binds to DNA when yeast is grown at high temps and you want to know whether this protein binds to DNA and if so, which yeast sequence? - 1st: treat the yeast cells that have been growing at high temps with chemical that will ...... - 2nd: break the chromatin into small pieces, to separate the fragment containing your protein-DNA complex from others you need to use an ....... → add the antibody to the mixture so it forms an ........ that can be purified → DNA bound in the immune complex can be analyzed after cross-linking is reversed → - Regulatory proteins often activate transcription of many genes simultaneously by .....

1st: cross-link proteins to the DNA so that proteins bound to the DNA at the time of chromatin isolation will remain bound through subsequent treatments - antibody that reacts with the encoded protein; immune complex; DNA bound by the protein may be sequenced directly or amplified into many copies by PCR to prepare for DNA sequencing - binding to several promoter regions

Pair-wise comparison between an organism and other species does not distinguish between their differences so you must first ........ → 2nd: ........ how to decide between the alternatives?????? - When studying infrequent events evolutionary biologists rely on parsimony: ....... → the preferred explanation for the pattern of a traits evolution in a species is that ......

1st: infer whether a trait was likely to be present in the last common ancestor of any of the organisms being studied 2nd: consider the relationship of the organism to the species - favor the simplest explanation involving the smallest number of evolutionary changes - some protein and corresponding gene were present in some organisms ancestor and were retained in the organism and lost from other organisms of the species

Either method involves complementary DNA sequences that are extremely valuable in 2 ways:

1st: they are direct evidence that given segment of genome is expressed and can encode a gene 2nd: cDNA is complementary to the mature mRNA so the introns of the primary transcript have been removed which greatly facilitates the identification of the exons and introns of a gene

Next-generation whole-genome shotgun sequencing: - 3 strategies that increased throughput: 1. DNA molecules are prepared for ..... 2. Millions of individual DNA fragments are ..... 3. Advanced fluid-handling technologies, cameras, and software make it possible to ......

Goal is to obtain large number of overlapping sequence reads that can be assembled into contigs 1. sequencing in cell-free reactions, without cloning in microbial hosts 2. isolated and sequenced in parallel during each machine run 3. detect the products of sequencing reactions in extremely small reaction volumes

CNVs (Copy Number Variations)

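Differences between individuals in the number of copies of a DNA segment (parts of genes, entire genes, or sets of genes); arise from repeats and duplications that increase copy number and deletions that reduce copy number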

contigs

Contiguous sequences assembled from overlapping sequence reads that share the same sequence where they overlap.

Whole-genome-sequence assembly: - Genomes of bacteria are easy to assemble: bacterial DNA is essentially single-copy DNA with no repeating sequences → any DNA sequence read from a bacterial genome will come from ...... → contigs in bacterial genomes can be assembled into ..... - Eukaryotic genome assembly is difficult: a sequencing read of repetitive DNA fits into many places in the draft of the genome → when a tandem repetitive sequence is longer than the maximum sequence read there is no way to ...... → dispersed repetitive elements can cause reads from different chromosomes or different parts of the same chromosome to ....

Process of assembling contigs into entire genome sequence - one unique place in that genome - larger contigs representing most or all the genome sequence in a straightforward manner - bridge the gap between adjacent unique sequences - to be mistakenly aligned together

propagate

Reproduce, spread, increase

Genome projects - the rationale for these projects was that cloning and sequencing an entire genome would make the clones and sequences ..... - Researchers who become interested in a gene of a species whose genome has been sequenced only need to find out where the gene is located on the map of the genome to zero in on its sequence and function → .....

Research and technology development efforts aimed at mapping and sequencing some or all of the genome of human beings and other organisms - publicly available resources - the gene could be characterized more quickly than by cloning and sequencing it from scratch

Annotation (Gene Annotation, Genome Annotation)

The process of attaching biological functions to DNA sequences - Genome annotation is the process of identifying the location of genes and other functional sequences within the genome sequence; gene annotation defines the biochemical, cellular, and biological function of each gene product the genome encodes - identification of all the functional elements of the genome

"'Omics": includes:

The transcriptome, the proteome, and the interactome

Forward genetics:

Traditional approach to the study of gene function that begins with a phenotype (a mutant organism) and proceeds to the gene responsible for the phenotype: screening mutants, then identifying the gene and the function of its DNA, RNA, and protein sequences from the phenotype

Useful predictions are possible even without a ..... - A binding-site-prediction program can propose .....

a cDNA sequence or evidence of protein similarities - a hypothetical ORF and proper codon bias would be supporting evidence

Reverse genetics:

a form of genetic analysis that manipulates DNA to disrupt or affect the product of a gene to analyze the gene's function

Whole-genome sequencing: - Whole-genome shotgun sequencing: - 2 approaches to this are responsible for most genome sequences obtained, The differences between the approaches are in how ......

a process that determines the nucleotide sequence of an entire genome - based on determining the sequence of many segments of genomic DNA that have been generated by breaking the long chromosomes of DNA into many short segments - how the short segments of DNA are contained and prepared for sequencing and the sequencing chemistry used

open reading frames (ORFs): - ORF detection: - main approach to producing polypeptide list using computational analysis of the genome sequence to predict mRNA and polypeptide sequences → these sequences would be gene-size and composed of sense codons after possible introns had been removed → the appropriate 5' and 3' end sequences would be present (start and stop codons) → sequences with these characteristics typical of genes are called ....... → computer programs scan DNA sequence on both strands in each reading frame to find .....

a sequence of DNA or RNA that could be translated to give a polypeptide - main approach to producing polypeptide list using computational analysis of the genome sequence to predict mRNA and polypeptide sequences - open reading frames (ORFs); candidate ORFs (there are 3 possible reading frames per strand = 6 possible reading frames in all)
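A minimal Python sketch of six-frame ORF scanning follows. It assumes the simplest possible definition of a candidate ORF (an ATG followed in the same frame by a stop codon, with no introns), so it is closer to what works for bacterial sequence than for eukaryotic genes; the function names and the `min_codons` cutoff are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_frame(seq, frame, min_codons):
    """Yield ATG...stop stretches found in one reading frame of one strand."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            if (i + 3 - start) // 3 >= min_codons:
                yield seq[start:i + 3]
            start = None

def candidate_orfs(seq, min_codons=3):
    """Scan 3 frames on each strand (6 frames in all)."""
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for orf in orfs_in_frame(s, frame, min_codons):
                found.append((strand, frame, orf))
    return found

toy_seq = "CCATGAAACCCTAGGG"
print(candidate_orfs(toy_seq))  # [('+', 2, 'ATGAAACCCTAG')]
```

Real gene finders raise `min_codons` to gene-size lengths and add evidence such as codon bias, cDNA alignment, and protein similarity before accepting a candidate.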

Exome:

all exons in genome

The transcriptome:

all the RNA molecules transcribed from a genome; sequence and expression patterns of all RNA transcripts (which kinds, where in tissues, when, and how much)

The proteome:

an organism's complete set of proteins; sequence and expression patterns of all proteins (where, when, and how much)

Bioinformatics:

analysis of info content of entire genomes which includes the numbers and types of genes and gene products as well as location, number, and types of binding sites on DNA and RNA that allow functional products to be produced at the correct time and place

Functional genomics:

approach to the study of function, expression, and interaction of gene products

DNA microarrays have powered molecular genetics by allowing the .......... HOW DOES THIS WORK??? - Microarrays are exposed to cDNA probes (ex: one set of probes used as a control and one set of probes representing a specific condition, set used as a control might be made from total set of RNA molecules extracted from particular cell type grown under typical conditions, second set of probes might be made from RNA extracted from cells grown under experimental condition) → fluorescent labels are attached to the probes and the probes are hybridized to the microarray → relative binding of probe molecules to the microarray is monitored automatically using ........ → which genes are identified from this? → the sets of genes that may respond to similar regulatory inputs can be identified → gene-expression profiles can show the differences between ....... → by identifying genes whose expression is altered by mutations, in cancer cells, or by a pathogen, we can make ......

assay of RNA transcripts for all genes simultaneously in a single experiment - laser-beam-illuminated microscope - genes whose levels of expression are increased or decreased under a given experimental condition are identified, and genes that are active in a given cell type or at a given stage of development are identified - normal and diseased cells - new therapeutic strategies
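Comparing the control and experimental probe sets comes down to a ratio of fluorescence signals for each gene. The sketch below assumes made-up intensity values and an illustrative two-fold cutoff (log2 ratio of 1); the gene names and the `threshold` parameter are hypothetical, not taken from the chapter.

```python
import math


def differential_genes(control, experiment, threshold=1.0):
    """Sort genes into up/down/unchanged by log2(experiment / control).

    control, experiment: dicts mapping gene name -> fluorescence intensity.
    threshold: absolute log2 ratio needed to call a change (1.0 = two-fold).
    """
    result = {"up": [], "down": [], "unchanged": []}
    for gene in sorted(control):
        log_ratio = math.log2(experiment[gene] / control[gene])
        if log_ratio >= threshold:
            result["up"].append(gene)
        elif log_ratio <= -threshold:
            result["down"].append(gene)
        else:
            result["unchanged"].append(gene)
    return result


# Hypothetical intensities for three genes.
control = {"geneA": 100, "geneB": 200, "geneC": 50}
experiment = {"geneA": 400, "geneB": 50, "geneC": 55}
print(differential_genes(control, experiment))
```

Genes that land in the same category across many conditions are candidates for sharing regulatory inputs, which is the kind of grouping the card describes.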

3 major aspects of genomic analysis:

bioinformatics, comparative genomics, and functional genomics

Direct evidence from cDNA sequences: identifying ORFs and exons through analysis of mRNA expression which can be done in 2 ways (both methods include .....) - Longest Established method: - NGS (next generation sequencing) technologies allow for .....

both methods include synthesis of libraries of DNA molecules that are complementary to mRNA sequences called cDNA:open reading frame (ORF) - cloning and amplification of cDNA molecules in a vector - direct sequencing of short cDNA molecules without cloning = RNA sequencing

Problem with genome project is sequence assembly: - consensus sequence:

building up all of the individual reads into a consensus sequence - a sequence for which there is consensus (agreement) that it is an authentic representation of the sequence for each of the DNA molecules in that genome
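One simple way to picture a consensus sequence is a majority vote at each position across reads that have already been aligned. The sketch below assumes the alignment is given, with "-" marking positions a read does not cover; real assemblers also weigh base-quality scores, which are left out here.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote base per column of pre-aligned reads.

    '-' means the read has no base at that position.
    Columns with no coverage at all are reported as 'N'.
    """
    length = max(len(read) for read in aligned_reads)
    bases = []
    for i in range(length):
        column = [r[i] for r in aligned_reads if i < len(r) and r[i] != "-"]
        bases.append(Counter(column).most_common(1)[0][0] if column else "N")
    return "".join(bases)


reads = [
    "ACGTAC--",
    "-CGTACGT",
    "ACTTACG-",  # third base is a sequencing error, outvoted by the other reads
]
print(consensus(reads))
```

The example shows why multiple overlapping reads matter: a single-read error is outvoted once the same position has been sequenced several times.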

Annotation of the human genome progressed as the sequences of each chromosome were finished one by one → the sequences became a searching ground for ...... → these predictions are the best inferences of .....

candidate genes - the protein-coding genes present in the sequenced species

next-generation WGS

cell-free; millions of fragments are isolated and sequenced in parallel; uses advanced fluidics technology; high throughput

2nd method: next-generation WGS

cell-free method that uses new sequencing techniques and is designed for very high throughput (referring to the number of reads per machine per unit time)

The interactome:

complete set of physical interactions between proteins and DNA segments, between proteins and RNA segments, and between proteins

Comparative genomics:

considers genomes of closely and distantly related species for evolutionary insight

Illumina system:

detects the incorporation of individual fluorescently labeled dNTPs → produces a larger number of shorter reads than the 454 system

Pacific Bioscience process:

detects bases being incorporated into a single, immobilized DNA molecule → provides advantage of much longer individual reads than any other system but with higher error rate

Reverse Genetics: - To establish the function of a gene it's best to ..... - Start from available gene sequences and use methods to disrupt the function of a specific gene = reverse genetics: starts with known molecules (DNA sequence, mRNA, or a protein) and attempts to disrupt this molecule to assess ......

discovering gene function from a genetic sequence - disrupt its function and to understand the phenotypes in native conditions - assess the role of the normal gene product in the biology of the organism

Genomics shows how genomes and organisms have .....

diverged and adapted over time

Genome sequences can range from draft quality (......) to finished quality (.......) ,or truly complete (........)

draft quality: general outline with some errors - finished quality: very low rate of typographical errors, some missing sections but everything that is currently possible has been done to fill in these sections - truly complete: no typographical errors, every base pair is absolutely correct from telomere to telomere

Whole-genome shotgun sequencing is good at producing ..... - Ex: genome of fruit fly initially sequenced by traditional WGS method: sequencing libraries of genomic clones of different sizes → sequence reads obtained from both ends of genomic-clone inserts and aligned by a logic identical to that used for bacterial WGS sequencing → sequence overlaps were identified and clones were placed in order, producing sequence ....... → but the contigs ran into a repetitive DNA segment that prevented unambiguous assembly of the contigs into a whole genome → the sequence contigs had an average size of ab 150kb - how do you glue these sequence contigs together in the correct order? →

draft-quality sequences of complex genomes with many repetitive sequences - contigs (consensus sequences for these single-copy stretches of the genome) - make use of the pairs of sequence reads from opposite ends of the genomic inserts in the same clone, called paired-end reads → find paired-end reads that span the gaps between 2 sequence contigs (if one end of an insert is part of one contig and the other end is part of a second contig, then this insert must span the gap between the 2 contigs and the 2 contigs are near each other) → the size of each clone was known (from a library containing genomic inserts of uniform size) so the distance between the end reads was known → aligning the sequences of the 2 contigs using paired-end reads determines the relative orientation of the 2 contigs → single-copy contigs can be joined together with gaps where the repetitive elements reside → these gapped collections of joined-together sequence contigs are called scaffolds (supercontigs) → produced a correctly assembled draft sequence of the single-copy DNA
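The scaffold logic above (a paired-end link tells you which contigs are neighbors, and the known insert size tells you how big the gap is) can be sketched as follows. The tuple layout, the contig names, and the numbers are all hypothetical, and the sketch assumes the links form one simple chain with contig orientation already worked out.

```python
def scaffold(links):
    """Order contigs and estimate gap sizes from paired-end links.

    links: list of (left_contig, right_contig, insert_size, left_tail, right_head)
      insert_size: known clone insert length in bp
      left_tail:   bp from the start of the first end read to the end of left_contig
      right_head:  bp from the start of right_contig to the end of the mate read
    Returns a list alternating contig names and estimated gap sizes.
    Assumes each contig has at most one right-hand neighbor and no cycles.
    """
    nxt = {left: (right, insert - tail - head)
           for left, right, insert, tail, head in links}
    right_side = {right for right, _ in nxt.values()}
    current = next(c for c in nxt if c not in right_side)  # leftmost contig
    order = [current]
    while current in nxt:
        right, gap = nxt[current]
        order += [gap, right]
        current = right
    return order


links = [
    ("contig2", "contig3", 2000, 1200, 500),
    ("contig1", "contig2", 2000, 700, 800),
]
print(scaffold(links))
```

The gaps stay unsequenced; the scaffold only records that a stretch of roughly that length (often repetitive DNA) sits between two contigs.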

next generation sequencing

entire genomes sequenced using multiple parallel reactions to analyze short segments of DNA and compare the results to known sequences; a group of automated techniques used for rapid DNA sequencing

History of gene families reveals a lot about ......

evolutionary history

BLAST searches have the goal of ..... what is BLAST search?

finding out more about some identified sequence of interest - comparison of sequence data to all known sequences

Orthologs

genes in different species that evolved from a common ancestral gene by speciation; homologous genes at the same genetic locus in different species, which would have been inherited from a common ancestor

The logic of obtaining a genome sequence

genome → cut many genome copies into random fragments → sequence each fragment → overlap the sequence reads → overlapping reads form contigs that build up the complete sequence
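The overlap step can be illustrated with a toy greedy assembler: repeatedly merge the two reads with the longest suffix-to-prefix overlap. This is a sketch under strong assumptions (error-free reads, one strand, no repeats); it is not how production assemblers work, and the `min_len` cutoff is an illustrative choice.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain.

    Returns the list of contigs that are left (one contig if everything joins).
    """
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more: the remaining reads stay separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]))
```

Repetitive DNA breaks this approach because a repeat overlaps equally well with reads from several places in the genome, which is why paired-end reads and scaffolds are needed.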

Encyclopedia of DNA Elements (ENCODE) project: - Detect sequences potentially involved in the control of gene transcription and all transcribed regions → these sequences are expected to be active in ...... so researchers studied 147 human cell types and searched for regions that were associated with the binding of transcription factors → ENCODE project estimated that ...... - Evolutionary conservation of sequences is a good indicator of biological function, sequences will not be preserved over evolutionary time unless ........: locate potentially functional noncoding elements by looking for ..... - Can search for highly conserved sequences of ...... or for less perfectly conserved sequences of ......

goal to identify all functional elements within the human genome - expected to be active in only individual cell types or subsets of cell types - there are ab 500,000 potential enhancers associated with known genes, and transcripts were detected emanating from 80% of the human genome - mutations that alter them are weeded out by natural selection - conserved sequences which have not changed much over millions of years of evolution - modest length among few species; greater length among larger number of species

Paralogs

homologous genes in the same species; some homologs belong to families that have expanded over evolution and have genes at different genetic loci in the same organism; they arose when genes within the genome were duplicated (genes related by gene-duplication events in a genome)

Variation of the ChIP procedure called ChIP-chip is used to .... - proteins that bind to many genomic regions are ....... and then after cross-linking is reversed, the DNA fragments are labeled and used to ......

identify multiple binding sites in a sequenced genome - immunoprecipitated; probe microarray chips that contain the entire genomic sequence of the species being studied

X-linked inhibitor of apoptosis (XIAP): the mutation changes one amino acid at position 203 of the protein (an amino acid that was invariant among the mammal, fish, and even fruit fly counterparts of the XIAP gene); Nicholas had his genome sequenced to see what was causing such horrible conditions → XIAP gene has a role in ...... and mutations in the gene are associated with very rare but fatal immune disorders → ........ - This shows .....

inflammatory response - doctors gave the boy an infusion of umbilical cord blood to boost his immune system → he got better! - advances in technology and the impact of genomics

- Number of genes in human genome is hard to determine: initial draft of human genome stated there were ...... → but the complex architecture of these genes and the genome make annotation difficult and there are more than 19,000 pseudogenes: ......... → processed pseudogenes: DNA sequences that have been reverse-transcribed from RNA and randomly inserted into the genome (90% of human pseudogenes are this type) and ab 900 pseudogenes are ....... → recent estimate of genes in human genome is .....

initial draft: 30,000-40,000 protein-coding genes - pseudogenes: ORFs or partial ORFs that may at first appear to be genes but are either nonfunctional or inactive due to their origin or mutations - 900 pseudogenes are conventional genes that have acquired one or more ORF-disrupting mutations over evolution - recent estimate: 21,000 protein-coding genes

1 approach to reverse genetics:

introduce random mutations into the genome and then focus on the gene of interest by molecular identification of mutations in the gene

Large-scale sequencing projects analyze ...... → choosing a method by balancing speed, cost, and accuracy is important

large individual genomes or genomes of many different individuals/species

Process requires automation: automated sequencing is the current state of the art in DNA sequencing technology which employs variety of chemistries and optical-detection methods → methods now available vary in .....

length of DNA sequence obtained, bases determined per second, and raw accuracy

Noncoding functional elements of the genome are much more difficult to identify than ....... and require .....

more difficult to identify than coding sequences and require a combination of comparative & experimental evidence to validate

Using polypeptide and DNA similarity: genes will likely have relatives among the genes isolated and sequenced in .... - Candidate genes predicted by the preceding techniques can be verified by comparing them with all the other gene sequences that have ever been found → candidate sequence is submitted as a "query sequence" to ....... → the sequence can be submitted as a nucleotide sequence = ....... or as a translated amino acid sequence = ....... → computer scans the database and returns a list of full or partial "hits" starting with the closest matches → if the candidate sequence closely resembles genes previously identified from another organism then this is a strong indication that ......

other organisms, especially in closely related organisms - public databases containing a record of all known gene sequences; this is called a BLAST search (basic local alignment search tool) - nucleotide sequence = BLASTn search - translated amino acid sequence = BLASTp search - the candidate gene is a real gene → less-close matches are still useful
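The idea of submitting a query and getting back ranked "hits" can be illustrated with a small local-alignment scorer. This is not the BLAST algorithm (BLAST uses fast word-seeding heuristics); it is a plain Smith-Waterman score used here only to show how close and less-close matches get ranked. The scoring values and the three database entries are made up for the example.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two sequences."""
    rows, cols = len(query) + 1, len(subject) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if query[i - 1] == subject[j - 1] else mismatch
            score[i][j] = max(
                0,                         # start a new local alignment
                score[i - 1][j - 1] + step,  # align the two bases
                score[i - 1][j] + gap,       # gap in subject
                score[i][j - 1] + gap,       # gap in query
            )
            best = max(best, score[i][j])
    return best


def rank_hits(query, database):
    """Return (name, score) pairs, closest match first."""
    hits = [(name, local_alignment_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


database = {
    "seqA": "TTACGTACGTTT",  # contains the query exactly
    "seqB": "TTACGAACGTTT",  # one base differs
    "seqC": "GGGGGGGG",      # unrelated
}
print(rank_hits("ACGTACGT", database))
```

A real search also reports how likely each hit is to arise by chance, which is what lets a less-close match still be treated as meaningful evidence.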

Human genome project: - what has helped the identification of disease-causing genes? - what has helped understand disease processes and find new therapies for those diseases? - Individual machines can produce .... - Genomics has encouraged researchers to develop ways of experimenting on .... - Genomes also showed the value of collecting large-scale data sets in advance so ......

project whose goal is to map, sequence, and identify all of the genes in the human genome - availability of human genome sequences and the ability to sequence the genomes of patients and their relatives - the ability to determine gene sequences in normal and diseased tissues - genome sequences very quickly - a genome as a whole rather than one gene at a time - they can be used later to address specific research problems

oligonucleotides

segments of nucleic acid that are 50 nucleotides or less in length

Microarray

shows which genes are being actively transcribed in a sample from a cell

Predictions of binding sites: gene consists of segment of DNA that encodes a transcript and regulatory signals that determine when, where, and how much of that transcript is made → the transcript in turn has ..... - There are "gene-finding" computer programs that search for ......... → these predictions are based on consensus motifs but are not perfect

signals to determine its splicing into mRNA & translation of that mRNA into a polypeptide - predicted sequences of various binding sites used for promoters, for transcription start sites, for 3' and 5' splice sites, and for translation initiation codons within genomic DNA

3 stages of first next-generation system: Stage 1: ........ is constructed Stage 2: the DNA molecules in the template library are amplified into many copies, not by growing colonies as for traditional genomic libraries but by using ...... - first, single molecules are immobilized on individual beads → molecules are then amplified by PCR such that single-stranded DNA molecules remain attached to the beads → each bead contains many identical DNA fragments and then is ...... Stage 3: the sequencing of each bead is performed using "sequencing-by-synthesis" chemistry called ...... → DNA polymerase and a primer are added to the wells to prime the synthesis of a complementary DNA strand → each of the 4 deoxyribonucleotides (dATP, dGTP, dTTP, dCTP) is made to ....... → when a nucleotide is added that is complementary to the next base in the template strand in a given well, it is incorporated and the reaction releases a ...... → 2 enzymes ...... convert the pyrophosphate signal to a visible-light signal → the light is detected by a special camera → growing DNA strands that have A as the first base after the primer will yield a signal only when dATP is made to flow through the well → reaction is repeated for at least 100 cycles & the signals from each well over all the cycles are .......

stage 1: A DNA template library of single-stranded DNA molecules stage 2: the polymerase chain reaction (PCR) - deposited individually into wells of very small volume in a device that hosts the sequencing reactions Stage 3: pyrosequencing - flow through all the wells one at a time in specific order - pyrophosphate molecule - sulfurylase and luciferase - integrated to generate the sequence reads from each well
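The stage 3 logic (flow one nucleotide at a time, record a light signal whenever it is incorporated, then integrate the signals into a read) can be simulated for a single well. The flow order "TACG" and the function names are assumptions for illustration, and the template is written in the order the polymerase copies it; signal strength is modeled simply as the number of bases incorporated in that flow.

```python
FLOW_ORDER = "TACG"  # assumed order in which the four dNTPs are flowed
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_flows(template, cycles):
    """Return [(nucleotide, signal), ...] for one well.

    signal = how many bases were incorporated during that flow
    (a run of identical template bases gives a proportionally larger signal).
    """
    pos = 0
    signals = []
    for _ in range(cycles):
        for nt in FLOW_ORDER:
            count = 0
            # Incorporate while the flowed nucleotide pairs with the next template base.
            while pos < len(template) and COMPLEMENT[template[pos]] == nt:
                count += 1
                pos += 1
            signals.append((nt, count))
    return signals


def decode(signals):
    """Integrate the per-flow signals into the synthesized-strand read."""
    return "".join(nt * count for nt, count in signals)


signals = simulate_flows("TTGCA", cycles=2)
print(signals)
print(decode(signals))
```

The simulation also hints at a known weak point of this chemistry: long runs of the same base must be inferred from signal strength rather than counted one flow at a time.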

DNA sequence of genome will help understand the .....

structure, function, and evolution of the genome and its components

genomics

study of genomes in their entirety

paired-end reads

the DNA sequences corresponding to both ends of a genomic DNA insert in a recombinant clone

Functional genomics:

the study of the relationship between genes and their function use of an expanding variety of methods including reverse genetics to understand gene and protein function in biological processes

ChIP (chromatin immunoprecipitation)

the use of antibodies to isolate specific regions of chromatin and to identify the regions of DNA to which regulatory proteins are bound

automated sequencing

the use of fluorescently labeled dideoxyribonucleotides and a fluorescence detector to sequence DNA

Next-generation WGS does not circumvent the problem of repetitive sequences and gaps, but paired-end reads from traditional genomic libraries could bridge them, so next-generation WGS had to find a way to bridge these gaps without building genomic libraries in vectors → how? - Some gaps usually remain in both traditional & next-generation WGS, so procedures to fill in the missing sequence data have to be carried out → if gaps are short, missing fragments can be generated using ...... → if gaps are longer you can try to .....

they built a library of circularized genomic DNA fragments of desired sizes; circularization allows short segments of previously distant sequences located at the ends of each fragment to be juxtaposed on either side of a linker sequence → shearing of these circular molecules, then amplification and sequencing of linker-containing fragments, produced paired-end reads equivalent to those obtained from sequencing traditional genomic-library inserts - short gaps: use known sequences at the ends of the assemblies as primers to amplify and analyze the genomic sequence in between - long gaps: isolate the missing sequences as parts of larger inserts that have been cloned into a vector and then sequence the inserts

1st method: traditional WGS

used to sequence the first human genome - relies on cloning of DNA in microbial cells and uses the Sanger dideoxy sequencing technique

Phylogenetic Inference: - 1st: decide which species to compare by knowing the ..... - Phylogeny: - 2nd: identify the most closely related genes called ......, genes that are homologs can be recognized by ......

uses phylogenies to understand evolutionary history and processes - 1st: evolutionary relationships among species - phylogeny: evolutionary history of a group, useful for inferring how species' genomes have changed over time - 2nd: homologs; similarities in their DNA sequences and in the amino acid sequences of the proteins they encode

