Lecture 4/18: Genomics and personalized medicine

Data analysis and workflow for analyzing RNA sequencing results: an RNA-seq workflow for ________ of differentially expressed genes:

identification

New sequencing = ____ speed (whole genome in days) + reduced cost

increased

Variants: SVs (structural variants: large ______ and deletions)

insertions

Variants: Indels (small _______ or deletions) slightly _____ mutations than SNVs

insertions, bigger
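
The size-based distinction between SNVs, indels, and SVs on these cards can be sketched as a small classifier. This is only an illustration: the function name is made up, and the 50 bp boundary between indels and structural variants is a common convention that is assumed here, not something stated in the lecture.

```python
def classify_variant(ref, alt, sv_min_bp=50):
    """Classify a variant by comparing reference and alternate alleles.

    sv_min_bp is an assumed cutoff (a common convention, not from the lecture).
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"  # a single base is swapped for another
    size = abs(len(alt) - len(ref))
    if size == 0:
        return "other"  # same length but not a single-base change
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    scale = "SV" if size >= sv_min_bp else "indel"
    return f"{scale} ({kind}, {size} bp)"


# Hypothetical example alleles
snv = classify_variant("A", "G")
small_insertion = classify_variant("A", "ATG")
large_deletion = classify_variant("A" + "C" * 60, "A")
print(snv, small_insertion, large_deletion, sep="\n")
```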

All systems require a _____, that may involve ligation and custom adaptors

library

6. Amplifying is important: sequencing works best if you have ___ of copies of the DNA, so errors can be filtered out and you get error-free results

lots

16. Massively _____ sequencing: sequencing all groups of DNA that have been _____ multiple times:

parallel, amplified. Amplification gives accurate results and filters out errors that could occur during sequencing

They found that resistant tumors display

a transcriptional signature in which certain genes are upregulated (certain genes are differentially regulated)

Statistical differences: usually every sequencing study is done at least in

triplicate. Statistics programs such as R and Bioconductor are used (check picture)

Some types of NGS sequencing: WGS: sequences the _____ genome

entire (WGS = whole genome sequencing)

3. To each tube individually add either ____, _____, _____, or _____ each fluorescently labeled

ddTTP, ddATP, ddGTP, or ddCTP

Gene sets can be

-Pathways (based on databases such as Reactome, KEGG, WikiPathways, Ingenuity Pathway Analysis (IPA))
-Genomic location
-Transcription factor targets
Software is available for many of these tests.

7. _ strand is washed off

1

Human Genome: 22 pairs are autosomes, _ pair is the sex chromosomes

1

Nucleotide in DNA: A:T, C:G Humans differ from each other by _% of the sequence

1

Technology behind the human genome project: The fragments were broken up further, into fragments about _______ long

1000 bp long

21. Reads obtained from the sequencing machine: these DNA sequences are not arranged in any particular order. A list of ___ sequences, not in _____; you don't know what genes they refer to. Assembly can be done _____ a reference _____ by looking for overlapping sequences

150-nucleotide, order, without, genome
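
Assembling unordered reads without a reference genome by looking for overlapping sequence can be sketched as follows. This is a minimal greedy illustration under toy assumptions, not how production assemblers work; the function names and the short example reads are made up for the sketch.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left; keep the remaining contigs separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


# Hypothetical toy reads, listed out of order, all taken from ATGGCGTACGTTAGC
toy_reads = ["CGTACGTT", "ATGGCGTA", "ACGTTAGC"]
contigs = greedy_assemble(toy_reads)
print(contigs)
```

Real reads are about 150 nucleotides long and contain errors and repeats, so real assemblers need far more careful overlap handling than this exact-match version.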

Era of human genome: don't memorize dates

1866 Mendel's discovery of genes
1871 Nucleic acids discovered
1951 First protein sequence
1953 Structure of DNA
1960s Elucidation of genetic code
1977 Advent of DNA sequencing
1975-1979 First human gene is isolated
1986 DNA sequencing is automated
1992 EST sequencing (expressed sequence tag)
1995 First whole genome sequenced (Haemophilus influenzae)
1999 First human chromosome sequenced (22)
2000 Draft of human genome completed
2001 Publication of draft sequences of the human genome
2003 Completion of human genome sequence
2005 Elucidation of genes on X chromosome

RNA sequencing: compare gene expressions between _ conditions (normal cell vs. tumor cell; tumor cell v tumor cell being treated)

2

Basic structure of DNA: _ sugar phosphate backbones, _____ helix, base pairing

2, double

Basic structure of DNA: _ ______ base pairs in human genome

3 billion

The human genome project: project goals: determine the sequences of over ____________ that make up the human DNA

3 billion base pairs

Steps of Sanger sequencing method 1. take DNA sequence and divide into _ tubes

4

4. In total , >_____ genes were found to be differentially expressed between 2 cell lines

5000; these included the ER and PR genes and associated genes

Summary of some studies that have recently been carried out in cancer genomics, using both DNA and RNA sequencing: Anti-PD-1 therapy in metastatic melanoma

Anti-PD-1 antibody provides clinical benefits for many melanoma patients, but other patients are resistant; it is not clear why.

Tumors from responding patients are enriched in mutations in the DNA repair gene _______

BRCA2

13-15. ______ amplification

Bridge

ChIP:

Chromatin Immunoprecipitation

4. Test for differential expression

Cuffdiff

5. Statistical analysis: R (the R project for statistical computing); R packages

CummeRbund: visualization and analysis. How do you decide which results are significant and which are not? (Usually too much information is given.)
Statistical tests:
-Interpretation of the results from an RNA-seq experiment is complicated.
-How do you decide if differences in expression are significant?
-How do you decide which genes are relevant to your system?

10. Another round of ____ synthesis

DNA

Bowtie is fine for ___ sequencing

DNA

FastQC is used to analyze the quality of

DNA or RNA sequencing results

6. DNA amplification occurs, in the presence of ____ _________

DNA polymerase

1. Assess data quantity and quality

FastQC

Summary of analysis methods we have discussed so far A workflow from reads to differentially expressed genes: 1. Assess data quantity and quality

FastQC

The sequence you get from the machine is in _____ format

FASTQ
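
A FASTQ record is four lines: an identifier starting with "@", the sequence, a "+" separator, and one quality character per base. The sketch below parses a single record and converts the quality characters to Phred scores. It assumes the Phred+33 encoding used by current Illumina output; the function name and the example read are made up for illustration.

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (name, sequence, Phred scores)."""
    header, seq, plus, qual = (line.strip() for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # Assumption: Phred+33 encoding (ASCII code minus 33 gives the score)
    scores = [ord(char) - 33 for char in qual]
    return header[1:], seq, scores


def error_probability(phred_score):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred_score / 10)


# Hypothetical example record
record = ["@read1", "GATTACA", "+", "IIIIH5#"]
name, seq, scores = parse_fastq_record(record)
mean_quality = sum(scores) / len(scores)
print(name, seq, scores, round(mean_quality, 1))
```

Tools such as FastQC summarize these per-base scores across all reads, which is what the quality plots show.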

18. _____________ labeled nucleotides incorporated

Fluorescently

Gene sets:

Gene set enrichment analysis of pathways. Reactome looks at different pathways and finds genes that fit into certain pathways. It takes a big set of genes and finds ~10 that fit into a certain pathway.

Gene ontology

Genes are categorized by function, or by association with a specific term. Ex: all transcription factors, all transcription factor targets, or all protein kinases

3. ____ flow cell

Glass

differential expression analysis of RNA sequencing can tell you:

How many copies of RNA are in one cell type compared with another

This signature is referred to as

IPRES (innate anti-PD-1 resistance)

Several platforms for NGS

Illumina MiSeq, Illumina HiSeq, 454 Sequencer, SOLiD system, Ion Proton system

Technology behind the human genome project: Each fragment was inserted into a ________ ________ ________

bacterial artificial chromosome

Expression analysis: count the number of fragments overlapping with all ______ _____ of a gene

annotated exons

RNA-sequencing:
1. Isolate ____ from cells
2. ______ transcribe to cDNA
3. _______ the DNA
4. Add ________
5. Carry out sequencing
6. Illumina sequencing method
7. Get sequencing output

mRNA, Reverse, Fragment, adapters (also look at quantity)

Chromosomes in condensed state during _ phase

M

2. Map reads to a reference genome ---> Tophat or Bowtie

Mapping reads to a reference genome. The issue of introns and exons arises when sequencing RNA rather than DNA: introns are spliced out of the RNA, whereas genomic DNA has introns + exons

4. The ddNTPs have no 3' __ group and thus are considered _______ nucleotides (no further nucleotides will be incorporated once they are added)

OH, termination

MAPK targeted therapy has a similar signature

Possible future directions: Determine whether attenuating the biological processes that underlie IPRES, would improve the PD-1 response.

5. Statistical analysis-

R project- analyze differences, focus on only important differences

NGS can sequence both DNA and ____

RNA

RNA sequencing: comparing conditions with ____ sequencing

RNA

Types of NGS sequencing: RNA-Seq: sequencing of

RNA

Which sequencing is the best way to assess differential expression of genes?

RNA sequencing

Check picture

green area: good quality; indicates shorter sequence has better quality (160-169)

Human genome project used _____ sequencing method

Sanger. Time-consuming and cost $1 per base, totaling 3.8 B

Human genome project: Started in 1990, the U.S. Human Genome Project resulted in competition and a race to finish the sequencing of

Started in 1990, the U.S. Human Genome Project was a multi-center effort coordinated by the U.S. Department of Energy and the National Institutes of Health. A parallel project was carried out by Celera Corporation, a private company. This resulted in competition and a race to finish the sequencing of the entire genome. The project was originally planned to last 15 years, but rapid technological advances accelerated its completion to 2003, ahead of schedule.

T:F/Sequencing facilities are found all over the world

T

Variants: Chromosomal rearrangements

bigger mutations

2. Map reads to reference genome

Tophat (RNA) or Bowtie (DNA)

Software used in RNA sequence analysis

Tuxedo software suite

Fastq format: sequence _ quality

+ (each FASTQ record stores a read's sequence plus its quality scores; FastQC is the software that assesses sequence quality)

8. Hybridization to the other primer (________), _ shaped form

adapter, U

2. Short sequences (______) are attached to the ___ of all small sequences

adapters, ends. The sequence of the adapters is already known

Technology behind the human genome project: The bacteria are grown and _________

amplified

1. FastQC ( _____ of sequence quality)

analysis

FastQC

analysis of sequence quality: when you get sequences back, make sure the quality is good

17. Sequencing primer is ______, massively parallel sequence

annealed

Sanger Sequencing Method (___-termination method)

chain. Less efficient and costly; only used for smaller sequencing jobs

Next generation sequencing is faster and _____

cheaper

Strict p value for cutoff, fold change cutoff:

In this volcano plot (check picture), every dot represents one gene on the list. The log fold change is plotted on the horizontal axis; the p value after testing is on the vertical axis. Here a p value cutoff of 5% (red line) is assigned. A problem with this arbitrary cutoff is that important genes might be missed.
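
Applying a p value cutoff and a fold change cutoff to a gene list can be sketched as below. The gene names, numbers, and the fold change cutoff of 1 (on the log2 scale) are hypothetical; only the 5% p value cutoff comes from the card. Volcano plots commonly show -log10(p) on the vertical axis so that the most significant genes sit at the top, which the helper reproduces.

```python
import math


def volcano_point(log2_fold_change, p_value):
    """Coordinates of one gene on a volcano plot: (log2 FC, -log10 p)."""
    return log2_fold_change, -math.log10(p_value)


def call_gene(log2_fold_change, p_value, p_cutoff=0.05, fc_cutoff=1.0):
    """Label a gene using strict p value and fold change cutoffs."""
    if p_value >= p_cutoff or abs(log2_fold_change) < fc_cutoff:
        return "not significant"
    return "up" if log2_fold_change > 0 else "down"


# Hypothetical results: gene -> (log2 fold change, p value)
toy_results = {
    "geneA": (2.5, 0.001),
    "geneB": (-1.8, 0.01),
    "geneC": (3.0, 0.06),    # large change, but just misses the p cutoff
    "geneD": (0.2, 0.0001),  # very significant, but tiny change
}
calls = {gene: call_gene(fc, p) for gene, (fc, p) in toy_results.items()}
print(calls)
```

In this toy list geneC is dropped even though its fold change is the largest, which is the kind of miss the card warns about.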

groups are called

clusters

Types of NGS sequencing: WES: Sequences only the _____ regions of the genome

coding

R: The R project for statistical _______; R packages such as

computing; CummeRbund (visualization and analysis)

Sequences are amplified on a solid surface with _______ attached linkers that hybridize the _______ adapters, producing clusters of DNA

covalently, library

3. Transcript reconstruction and count the number of reads per gene

Cufflinks

Technology behind the human genome project: In the human genome project the DNA was ___ into overlapping fragments of _____ bp long

cut, 150K

Steps of RNA seq workflow: 1. Assess ____ quantity and quality of reads

data

The human genome project: goals: develop tools for _____

data analysis

Sanger sequencing relies on ______ nucleotides (termination nucleotides); from the banding pattern you can determine the _______ DNA sequence

dideoxy, original
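
How the banding pattern gives back the sequence can be sketched with a toy simulation. In the tube containing ddATP, synthesis can stop at every position where an A is incorporated, and likewise for the other three tubes; sorting all fragments by size, as the gel does, then spells out the synthesized strand one base at a time. The function names and the example strand are made up for illustration, and the sketch ignores everything about real chemistry except fragment lengths.

```python
def sanger_fragments(new_strand):
    """For each ddNTP tube, list the fragment lengths where synthesis can stop."""
    tubes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].append(length)
    return tubes


def read_gel(tubes):
    """Read the gel from shortest to longest fragment to recover the strand."""
    base_at_length = {
        length: base for base, lengths in tubes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


# Hypothetical strand being synthesized from the primer
tubes = sanger_fragments("GATTACA")
recovered = read_gel(tubes)
print(tubes)
print(recovered)
```

The recovered strand is the newly synthesized one, so the original template is its complement.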

breast cancer has _________ subcategories

different. Subcategories are + or -; know the differences (check picture)

Cuffdiff

differential expression analysis

4. Test for differential expression --> Cuffdiff

differential expression analysis. Output: gene ID, gene name, location, and differential expression results for the 2 cell lines

How is RNA sequencing useful: Using RNA sequencing one can analyze _______ _____ _______ when comparing 2 different conditions

differential gene expression

Gene set enrichment analysis

do genes fall into specific categories or sets
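
One simple way to ask whether differentially expressed genes fall into a set more often than chance is an over-representation test based on the hypergeometric distribution. This is a simpler stand-in, not the full gene set enrichment analysis implemented in dedicated software, and the function name and all counts below are hypothetical.

```python
from math import comb


def overrepresentation_p(total_genes, set_size, de_genes, overlap):
    """P(at least `overlap` of the DE genes fall in the gene set by chance).

    total_genes: number of genes measured
    set_size:    number of those genes that belong to the gene set
    de_genes:    number of differentially expressed genes
    overlap:     number of DE genes that belong to the gene set
    """
    upper = min(set_size, de_genes)
    favorable = sum(
        comb(set_size, k) * comb(total_genes - set_size, de_genes - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(total_genes, de_genes)


# Hypothetical small example: 10 genes, 4 in the pathway, 3 DE, 2 of them in the pathway
p_small = overrepresentation_p(10, 4, 3, 2)
print(p_small)

# Hypothetical genome-scale example
p_large = overrepresentation_p(20000, 100, 500, 10)
print(p_large)
```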

The human genome project: goals: address _____, ______, and _____ issues

ethical, legal, and social

4. Test for differential _____

expression

2. Bowtie

fast short-read alignment

20. Everything on the _____ cell gets sequenced. Shows _____ color: a pattern of fluorescently colored nucleotides. Massively parallel sequencing

flow, fluorescent

The human genome project: project goals: identify all ____ in the human DNA

genes

Upregulation genes:

mesenchymal transition, cell adhesion, ECM remodeling, angiogenesis, wound healing

Methyl-Seq: sequencing of ______ DNA

methylated

3. Once the alignment is done you will still need to assign gene _____

names

New sequencing methods are called ____ ________ _________ (NGS)

next generation sequencing

2. In each tube: DNA, a primer , all of the ________ (dTTP, dATP, dGTP, dCTP), DNA polymerase

nucleotides

Expression analysis: add up _ of fragments

number; often expressed as FPKM (fragments per kilobase per million fragments mapped)
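
The FPKM definition can be written out as a short calculation: divide the fragment count by the exon length in kilobases and by the total mapped fragments in millions. The function name and the example numbers are hypothetical.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped."""
    kilobases = exon_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


# Hypothetical gene: 500 fragments on 2,000 bp of annotated exons,
# in a sample with 25 million mapped fragments
value = fpkm(500, 2_000, 25_000_000)
print(value)
```

Normalizing by length and by sequencing depth is what lets expression be compared between genes of different sizes and between samples sequenced to different depths.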

This gel can separate sequences that differ by __ __ nucleotide

only one

9. Hybridization to ______ primer

other

5. DNA binds at the matching complementary ______

primer

4. ________ on the flow cell are _______ to the adapters added to the DNA fragments

primers, complementary (on the glass flow cell)

Cufflinks: transcript reconstruction and _____: assign gene names and quantitate transcripts per gene

quantitation. Uses an annotation file; for 2 different conditions, compare the amount of each gene

3. Transcript _______ and count the number of reads per gene

reconstruction

2. map reads to a _____ genome

reference

TopHat eliminates the problem of spliced out sites in

reference genome

Technology behind the human genome project: the sequences were sequenced by the ______ method

Sanger

Technology behind the human genome project: The cloned fragments were then _______ in labs around the world

sequenced

They all involve ______ machines that produce raw data at the end of the sequencing run

sequencing

NGS allows __________ of thousands to millions of ___ molecules simultaneously Ex: compare tumor cells to normal cells

sequencing, DNA

Clustering: group of genes or samples that contain

similar sequences or have similar expression profiles. The groups are called clusters. Many different clustering algorithms exist. There are pros and cons to this method: the assignment of clusters can be a problem and is sometimes quite arbitrary
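
Grouping genes with similar expression profiles can be sketched with a very simple correlation-based scheme. This is one made-up illustrative algorithm among the many that exist; the gene names, expression values, and the 0.9 threshold are all hypothetical, and changing the threshold changes the clusters, which is the arbitrariness the card mentions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def cluster_by_correlation(profiles, threshold=0.9):
    """Put each gene in the first cluster whose first member it correlates with."""
    clusters = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            if pearson(profile, profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])  # no similar cluster found; start a new one
    return clusters


# Hypothetical expression values across four samples
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 3.9, 2.0],
}
clusters = cluster_by_correlation(profiles)
print(clusters)
```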

5. Run each tube on a gel which separates DNA by _____

size

Steps involved in NGS sequencing: 21 steps (check picture). 1. Break DNA into ____ pieces (50-150 nt); sequence ____ pieces, unlike the Sanger method

small

What to look for in DNA sequencing: variants SNVs

single nucleotide variants (small mutations)

TopHat vs Bowtie: for RNA sequencing we want to use a splice-aware aligner. TopHat is a _____-aware aligner, best for

splice, RNA-seq
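
Why RNA reads need a splice-aware aligner can be shown with a toy genome. A read that spans an exon-exon junction in the mRNA has no contiguous match in genomic DNA, because the intron sits in between; splitting the read in two finds it. This is only a sketch of the idea, not how TopHat works internally. The sequences and function names are made up, and the sketch accepts only introns that start with GT and end with AG, the canonical splice signals.

```python
def find_all(text, pattern, start=0):
    """All start positions of pattern in text at or after start."""
    positions = []
    i = text.find(pattern, start)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


def spliced_hit(read, genome, min_anchor=3):
    """Try to place a read as two pieces separated by a GT...AG intron.

    Returns (read start, intron start, intron end) in genome coordinates,
    or None if no split placement is found.
    """
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        for left_start in find_all(genome, left):
            intron_start = left_start + len(left)
            for right_start in find_all(genome, right, intron_start + 4):
                starts_gt = genome[intron_start:intron_start + 2] == "GT"
                ends_ag = genome[right_start - 2:right_start] == "AG"
                if starts_gt and ends_ag:
                    return left_start, intron_start, right_start
    return None


# Hypothetical toy gene: two exons separated by one intron
exon1, intron, exon2 = "ATGGCCAAG", "GTAAGTTTTTCTCAG", "GAGCTGTAA"
genome = exon1 + intron + exon2
mrna = exon1 + exon2  # the intron is spliced out of the RNA
read = mrna[5:13]     # a read spanning the exon-exon junction

unspliced = genome.find(read)  # what a non-splice-aware search would do
spliced = spliced_hit(read, genome)
print(unspliced, spliced)
```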

2. or TopHat

spliced short-read alignment

The Human genome project: project goals: ____ information in databases

store

Further analysis of RNA sequencing results: Gene by gene analysis

to determine which differentially expressed genes are most relevant. This requires extensive literature research and can be very time-consuming

3. Cufflinks

transcript reconstruction from alignments

3. Transcript reconstruction and count the number of reads per gene --> Cufflinks

transcript reconstruction from alignments and quantitation

Types of NGS sequencing: ChIP-Seq: sequencing that identifies ______ _____ binding

transcription factor

