Lecture 25 - The Human Genome Project

Illumina HiSeq 2000

Highest output, fastest data rate, highest number of reads

The idea of a coordinated effort to sequence the human genome

First raised at a meeting at the University of California, Santa Cruz, in 1985

BRACAnalysis® is a genetic test that confirms the presence of a BRCA1 or BRCA2 gene mutation

BRCA mutations are responsible for the majority of hereditary breast and ovarian cancers. People with a mutation in either the BRCA1 or BRCA2 gene have a risk of up to 87% of developing breast cancer and up to 44% of developing ovarian cancer by age 70. Mutation carriers previously diagnosed with cancer also have a significantly increased risk of developing a second primary cancer. Genetic testing, specifically the BRACAnalysis test, identifies patients who carry these mutations. Myriad Genetics, the company that sells BRACAnalysis, held patents on the two genes. Genes occur naturally in every human, and beyond the moral questions this raises, patenting them would obstruct biomedical research worldwide. Moreover, the discovery of the genes' relevance to breast cancer was publicly funded. Myriad sold its breast cancer diagnostic test for a price many described as "outrageous": $4000, about the price of whole-genome sequencing (roughly 20,000 genes analyzed), even though the test looked at only two genes. Many universities and hospitals had offered the test at a much lower cost, but Myriad forced them to stop, sharply increasing the cost for patients.

Bioinformatics

Because next-gen sequencers work in a massively parallel fashion, they produce huge amounts of data quickly, requiring terabytes of storage. Each run produces 1.5 TB of data.

Illumina/Solexa

Solexa was bought by Illumina in 2007 ($615 million). Also uses sequencing by synthesis.

Celera Genomics vs UCSC

Celera also promised to publish its findings by releasing new data annually (the HGP released its new data daily), although, unlike the publicly funded project, it would not permit free redistribution or scientific use of the data. For this reason, the publicly funded competitor, UC Santa Cruz, was compelled to publish the first draft of the human genome before Celera (Jim Kent). On July 7, 2000, the UCSC Genome Bioinformatics Group released a first working draft on the web. In the first 24 hours of free and unrestricted access to the first-ever assembled blueprint of the human species, the scientific community downloaded half a trillion bytes of information from the UCSC genome server.

Views of public effort vs. Celera

Celera's view of the International Consortium (IC): -Unfair competition: the IC delivers the same goods but with state funding.
The International Consortium's view of Celera: -Unfair competition: Celera delivers the same goods and can use IC data, while the IC cannot use Celera data.

ENCODE

ENCyclopedia Of DNA Elements

EST Projects

EST = Expressed Sequence Tag: short, single-pass reads from bits of mRNA. In practice, these are random reads from cDNA libraries (polyA-primed or random-primed). Sometimes the libraries are tissue-specific.

Advantages of Sanger sequencing

Each individual reaction is fairly cheap (~$1-$25) Each reaction sequences ~500bp very, VERY accurately. This is perfect for small applications that require high accuracy. Each reaction requires a single pair of primers spanning that ~500bp region. In practice this requires a lot of groundwork before any sequencing can be done.

Cost and funding

The project was estimated to cost $3 billion over 15 years and was funded by the Department of Energy and the National Institutes of Health. The first draft was announced in 2000, and a more complete version was released in 2003 (2 years ahead of schedule).

Sanger Method

Fred Sanger (1918-2013) -Was originally a protein chemist -Made his first mark in sequencing proteins -Made his second mark in sequencing RNA
Sanger sequencing: -Partial copies of DNA fragments are made with DNA polymerase -This yields a collection of DNA fragments that terminate with A, C, G, or T, using ddNTPs -The fragments are separated by gel electrophoresis -The DNA sequence is read from the gel
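The chain-termination logic can be illustrated with a toy simulation of the classic four-reaction setup. This is a sketch under simplifying assumptions, not a model of the real chemistry: the template is a plain string, strand orientation (5'/3') is ignored, and each ddNTP reaction is assumed to contain fragments ending at every possible termination site. All function names are hypothetical.

```python
"""Toy simulation of Sanger chain-termination sequencing (four separate reactions)."""

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesized_strand(template: str) -> str:
    """The strand DNA polymerase builds: the complement of the template.

    Simplification (assumption): written in the same left-to-right order as
    the template, ignoring 5'/3' orientation.
    """
    return "".join(COMPLEMENT[base] for base in template)


def terminated_fragments(template: str, ddntp: str) -> list[int]:
    """Lengths of the fragments in the reaction containing this ddNTP.

    Assumes every position where the new strand needs that base is
    represented by at least one terminated fragment.
    """
    new_strand = synthesized_strand(template)
    return [i + 1 for i, base in enumerate(new_strand) if base == ddntp]


def read_gel(template: str) -> str:
    """Run the ddA, ddC, ddG and ddT reactions and read bands by size."""
    bands = []
    for ddntp in "ACGT":
        for length in terminated_fragments(template, ddntp):
            bands.append((length, ddntp))
    # Electrophoresis separates by size; reading from the shortest
    # fragment upward gives the synthesized sequence.
    bands.sort()
    return "".join(base for _, base in bands)
```

For the template `TACGGA`, the synthesized strand is its complement, so reading the bands from shortest to longest gives `ATGCCT`.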

Maps

Genetic map -determined from recombination frequencies
Physical map -based on physical distances -the physical location of a particular cloned sequence of DNA -BAC shotgun sequencing
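Turning recombination frequencies into genetic map distances is simple arithmetic. The sketch below uses the standard convention that 1% recombination corresponds to 1 centiMorgan (cM); the function names and example counts are hypothetical. This linear conversion is only reasonable for closely linked loci, because double crossovers make it underestimate larger distances.

```python
def recombination_frequency(recombinant: int, total: int) -> float:
    """Fraction of offspring that are recombinant between two loci."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinant <= total:
        raise ValueError("recombinant must be between 0 and total")
    return recombinant / total


def map_distance_cm(recombinant: int, total: int) -> float:
    """Genetic map distance in cM, using 1% recombination = 1 cM."""
    return 100 * recombination_frequency(recombinant, total)
```

For example, 12 recombinant offspring out of 400 gives a recombination frequency of 0.03, i.e. a map distance of about 3 cM.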

What Goals Were Established for the Human Genome Project When it Began?

Identify all of the genes in human DNA. Determine the sequence of the 3 billion chemical nucleotide bases that make up human DNA. Store this information in databases. Develop faster, more efficient sequencing technologies. Develop tools for data analysis. Address the ethical, legal, and social issues (ELSI) that arise from the project.

Celera Genomics

In 1998, Craig Venter founded Celera Genomics. The $300M Celera effort was intended to proceed at a faster pace and at a fraction of the cost of the roughly $3 billion publicly funded project. Celera used a technique called whole-genome shotgun sequencing, employing pairwise end sequencing. Celera initially announced that it would seek patent protection on "only 200-300 genes", but later amended this to seeking "intellectual property protection" on "fully-characterized important structures" amounting to 100-300 targets. The firm eventually filed preliminary ("place-holder") patent applications on 6,500 whole or partial genes.

Celera Genomics - presidential orders

In March 2000, President Clinton announced that the genome sequence could not be patented, and should be made freely available to all researchers. The statement sent Celera's stock plummeting and dragged down the biotechnology-heavy Nasdaq. The biotechnology sector lost about $50 billion in market capitalization in two days. But the public release of the data ensured its fair use and availability to all mankind. The competition proved to be very good for the project, spurring the public groups to modify their strategy in order to accelerate progress. UC Santa Cruz and Celera initially agreed to pool their data, but the agreement fell apart when Celera refused to deposit its data in the unrestricted public database GenBank. Celera had incorporated the public data into their genome, but forbade the public effort to use Celera data.

BAC Sequencing

At that time, it was far too expensive to think of sequencing patients' whole genomes. The genome was broken into smaller pieces of approximately 150,000 base pairs. -These pieces were then ligated into a type of vector known as a "bacterial artificial chromosome", or BAC. -The vectors containing the DNA fragments can be inserted into bacteria, where they are copied by the bacterial DNA replication machinery. -Each of these pieces was then sequenced separately as a small "shotgun" project and then assembled.
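Assembling each BAC's shotgun reads comes down to finding reads whose ends overlap and merging them. The following is a minimal greedy overlap assembler, assuming error-free reads and no repeats; it is far simpler than the assembly software actually used in the HGP, and all names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the pair of contigs with the largest overlap.

    Returns the longest contig once no overlaps of at least min_len remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Reads lying entirely inside another read add no ordering information.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return max(contigs, key=len)
```

The three overlapping reads `ATGGCGT`, `GCGTGCA` and `TGCAATC` are merged into `ATGGCGTGCAATC`. Greedy merging can mis-join repeated sequence, which is one reason working with ~150 kb pieces at a time kept each assembly problem manageable.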

Next Generation Sequencers

Next-generation (second-generation) sequencers came onto the scene in the mid-2000s. General characteristics include:
-Amplification of genetic material by PCR
-Ligation of amplified material to a solid surface
-Determination of the target sequence by sequencing by synthesis (detected with labelled nucleotides or pyrosequencing) or by sequencing by ligation
-Sequencing done in a massively parallel fashion, with sequence information captured by a computer

23andMe

On November 22, 2013, the FDA ordered 23andMe to stop marketing its Saliva Collection Kit and Personal Genome Service (PGS), as 23andMe had not demonstrated that it had "analytically or clinically validated the PGS for its intended uses", and "the FDA is concerned about the public health consequences of inaccurate results from the PGS device."

Roche / 454 : GS FLX - Parallel Sequencing

Owned by Roche ($115 million). Shipping machines since ~2006. Many publications (Neanderthal, James Watson re-sequencing). Uses sequencing by synthesis.

ESTs pros and cons

Pros:
-Represent the part of the genome (most) people care about
-Do not require a sequenced genome
-Find genes
-Find SNPs
-Find splice isoforms
Cons:
-Libraries are highly biased
-Can be hard to know when two ESTs are derived from the same gene
-(Generally) high error rates
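One common way to guess whether two ESTs come from the same gene is to cluster them by shared subsequences. The sketch below groups ESTs that share at least one exact k-mer (single-linkage clustering via union-find); the value of k, the threshold and the function names are assumptions. Real EST clustering must also handle reverse-complement reads, sequencing errors and repeats shared between different genes, which this sketch ignores.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: dict[str, str], k: int = 8, min_shared: int = 1) -> list[set[str]]:
    """Put ESTs sharing >= min_shared k-mers (directly or via a chain) in one cluster."""
    names = list(ests)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    kmer_sets = {n: kmers(ests[n], k) for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(kmer_sets[a] & kmer_sets[b]) >= min_shared:
                union(a, b)

    clusters: dict[str, set[str]] = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())
```

Single linkage means two ESTs with no overlap can still land in one cluster if a third EST bridges them, which is the desired behaviour for fragments of one transcript but also how a shared repeat can wrongly merge two genes.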

Public effort and Celera strategies

Public: BAC shotgun sequencing. Celera: whole-genome shotgun sequencing, employing pairwise end sequencing.
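Pairwise end sequencing reads both ends of a fragment of roughly known size, so each pair tells the assembler how far apart, and in which orientation, two reads lie in the genome. This minimal sketch simulates error-free paired-end reads from a random genome; the fixed insert size, the read length and the function names are hypothetical choices for illustration.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(genome: str, n_pairs: int, insert_size: int,
                     read_len: int, seed: int = 0) -> list[tuple[int, str, str]]:
    """Simulate error-free read pairs from random fragments of one fixed size.

    Returns (start, left_read, right_read) tuples. The true start is kept
    only so the example can be checked; an assembler would not know it.
    """
    if not read_len <= insert_size <= len(genome):
        raise ValueError("need read_len <= insert_size <= len(genome)")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(0, len(genome) - insert_size + 1)
        fragment = genome[start:start + insert_size]
        left = fragment[:read_len]                        # forward strand
        right = reverse_complement(fragment[-read_len:])  # opposite strand
        pairs.append((start, left, right))
    return pairs
```

Because the right-hand read comes from the opposite strand, its reverse complement sits exactly `insert_size - read_len` bases downstream of the left read. An assembler can use that distance constraint to order and orient contigs separated by repeats or gaps.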

Single Clone Molecule Array

Random array of clusters: ~1,000 molecules per ~1 µm cluster, ~40M clusters per flow cell
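The cluster count translates directly into throughput, since each cluster yields one read (two if paired-end). The sketch below estimates bases per flow cell and average genome coverage; the 100-base read length and paired-end mode are assumptions for illustration, not figures from the lecture, and the 3 billion base genome size comes from the HGP goals.

```python
def flowcell_yield_bases(clusters: int, read_length: int, paired_end: bool = True) -> int:
    """Total bases sequenced: one read per cluster, or two if paired-end."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters * read_length * reads_per_cluster


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


# ~40M clusters per flow cell (from the lecture); read length is an assumption.
bases = flowcell_yield_bases(clusters=40_000_000, read_length=100)
print(bases, round(fold_coverage(bases, 3_000_000_000), 1))  # 8000000000 2.7
```

Under these assumptions, one flow cell yields 8 billion bases, about 2.7-fold coverage of a 3 billion base genome.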

Solexa Chemistry

Sequencing by synthesis:
-Add four-color reversible terminators
-Image the fluorophores
-Remove the 3' block and fluorophore
-Add the next set of bases
Takes 48-72 h per run, plus 8 h of analysis.
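The four-step cycle can be mimicked with a toy simulation of a single cluster. It assumes perfect chemistry (every molecule in the cluster extends by exactly one base per cycle) and a hypothetical colour for each base; real instruments must also correct for imperfect cycles, which this sketch ignores.

```python
# Hypothetical dye assignment; real instruments use their own labelling scheme.
DYE_FOR_BASE = {"A": "red", "C": "blue", "G": "green", "T": "yellow"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Read one cluster base by base with reversible terminators."""
    read = []
    for cycle in range(min(cycles, len(template))):
        # Step 1: add all four labelled reversible terminators. The 3' block
        # lets the polymerase add exactly one base, complementary to the template.
        incorporated = COMPLEMENT[template[cycle]]
        # Step 2: image the flow cell; the cluster's colour identifies the base.
        colour = DYE_FOR_BASE[incorporated]
        read.append(BASE_FOR_DYE[colour])
        # Step 3: cleave the 3' block and fluorophore (no state to change in
        # this toy model); the next loop iteration is step 4, the next cycle.
    return "".join(read)
```

Because only one base is added per cycle, the read length equals the number of cycles run, which is why a run takes tens of hours.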

Two Different Groups Worked to Obtain the DNA Sequence of the Human Genome

The HGP is a multinational consortium established by government research agencies and funded publicly. Celera Genomics is a private company whose former CEO, J. Craig Venter, ran an independent sequencing project. Differences arose regarding who should receive the credit for this scientific milestone. On June 26, 2000, the HGP and Celera Genomics held a joint press conference to announce that together they had completed ~97% of the human genome.

Your Genome is Published

The International Human Genome Sequencing Consortium published their results in Nature, 409 (6822): 860-921, 2001. "Initial Sequencing and Analysis of the Human Genome" Celera Genomics published their results in Science, Vol 291(5507): 1304-1351, 2001. "The Sequence of the Human Genome"

Capillary Sequencing

Trace files (dye signals) are analyzed and bases called to create chromatograms. Chromatograms from opposite strands are reconciled with software to create double-stranded sequence data.
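Base calling and strand reconciliation can be sketched as follows. This is a simplification, not how production base callers work: peak positions are assumed to be known already, each base is called from whichever of the four dye channels is strongest at that peak, and the two strands are assumed to be aligned over exactly the same region. Disagreements are marked `N`. The function names and the trace layout (a dict of per-channel intensity lists) are hypothetical.

```python
def call_bases(trace: dict[str, list[float]], peak_positions: list[int]) -> str:
    """Call one base per peak: the dye channel with the strongest signal wins."""
    return "".join(
        max("ACGT", key=lambda base: trace[base][pos]) for pos in peak_positions
    )


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def reconcile(forward: str, reverse_read: str) -> str:
    """Combine calls from opposite strands covering the same region.

    Assumes the reads are already aligned end to end; positions where the
    strands disagree are marked 'N' for manual review.
    """
    other = reverse_complement(reverse_read)
    if len(other) != len(forward):
        raise ValueError("reads must cover the same region")
    return "".join(f if f == r else "N" for f, r in zip(forward, other))
```

Reading both strands means an error in one chromatogram shows up as a disagreement instead of silently entering the final sequence.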

What are some major concerns about sequencing?

Who will fund it? What impact will it have on biology? Whose DNA should be sequenced?

Sanger Method - Greater detail

In vitro DNA synthesis using "terminators": dideoxynucleotides that do not permit chain elongation once they are incorporated. -DNA synthesis using a mix of deoxy- and dideoxynucleotides results in termination of synthesis at specific nucleotides -Requires a primer, DNA polymerase, a template, a mixture of nucleotides, and a detection system -Incorporation of dideoxynucleotides into the growing strand terminates synthesis -Synthesized strand sizes are determined for each dideoxynucleotide by gel electrophoresis
*It is more efficient to run all four reactions in one lane, labelling each dideoxynucleotide with a different fluorescent dye.
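Because dideoxynucleotides are mixed with ordinary deoxynucleotides, each growing strand stops at a random position, so a large population of molecules produces fragments ending at every position. The Monte Carlo sketch below illustrates this for a single-lane, four-dye run; modelling the ddNTP:dNTP ratio as one per-position termination probability `p_dd` is an assumption, and all names are hypothetical.

```python
import random
from typing import Optional, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_once(template: str, p_dd: float,
                rng: random.Random) -> Optional[Tuple[int, str]]:
    """Copy one template molecule until a ddNTP is incorporated.

    At each position the polymerase adds the dideoxy form of the required
    nucleotide with probability p_dd. Returns (fragment length, terminating
    base), or None if synthesis ran off the end without terminating.
    """
    for i, template_base in enumerate(template):
        if rng.random() < p_dd:
            return i + 1, COMPLEMENT[template_base]
    return None


def single_lane_run(template: str, n_molecules: int = 20_000,
                    p_dd: float = 0.05, seed: int = 1) -> str:
    """Simulate one lane with four dye-labelled ddNTPs and read it."""
    rng = random.Random(seed)
    dye_at_length = {}
    for _ in range(n_molecules):
        result = extend_once(template, p_dd, rng)
        if result is not None:
            length, base = result
            # Each ddNTP carries its own dye, so the band colour names the base.
            dye_at_length[length] = base
    # Read bands from shortest (fastest-migrating) to longest.
    return "".join(dye_at_length[n] for n in sorted(dye_at_length))
```

Lowering `p_dd` makes fragments longer on average, which is why the ddNTP:dNTP ratio controls how far a read can extend.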

