DNA Subway BIO 140

sequence viewer

-allows you to view your sequence in a format called a trace file, or as base calls (first row)

sequence trimmer

-a lot of Ns at the beginning and end of your DNA sequence -trimming bad calls helps clean up the DNA sequence so it is composed of only accurate base pairs
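A minimal sketch of the trimming idea, assuming the simplest possible rule (strip every N from both ends and leave internal bases alone). The function name `trim_ns` is hypothetical, and this is not necessarily the rule DNA Subway's trimmer applies.

```python
def trim_ns(seq: str) -> str:
    """Remove uncalled bases (N) from both ends of a sequence.

    Assumption: only leading and trailing Ns are removed; an N in the
    middle of the read is kept.
    """
    return seq.upper().strip("N")


print(trim_ns("NNNNACGTNACGTTNNN"))  # ACGTNACGTT
```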

why are visual representations important

-analyzing and interpreting our data -add depth to our analyses by showing how closely your sequence is related to other organisms from your BLAST query, based on differences in their DNA barcodes

what are the 3 main stops on the blue line

-assembling sequences -add sequences -align sequences

query

-BLAST search -same as a Google search but searches for DNA sequences -identifies hits to the DNA sequence by comparing it to sequences maintained in a national database that contains millions of sequences collected from organisms all over the world

what is the goal of DNA Subway

-determine whether or not the DNA sequences are accurate -whether the DNA sequences are similar to those of other organisms -how similar the DNA sequences are to other organisms' DNA sequences -whether we can use the DNA sequences to help determine your specimen's taxonomic identity

bottom row of sequence viewer

-series of colored humps -represent the color readout from the gel electrophoresis used to sequence your DNA sample

mismatches

-how many base pairs don't match up between your query sequence and each hit from BLAST -gives a rough idea of how closely the two sequences match and therefore how closely related they are likely to be -more mismatches = more sequence differences = greater evolutionary distance = less closely related
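A rough sketch of counting mismatches, assuming the query and the hit have already been aligned to the same length. The function name `count_mismatches` is hypothetical, and BLAST itself reports this number for you.

```python
def count_mismatches(query: str, hit: str) -> int:
    """Count positions where two equal-length, aligned sequences differ."""
    if len(query) != len(hit):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(query.upper(), hit.upper()) if a != b)


# Positions 5 and 8 differ (A vs T, T vs A), so there are 2 mismatches.
print(count_mismatches("ACGTACGT", "ACGTTCGA"))  # 2
```

More mismatches over the same alignment length suggest the two sequences are less closely related.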

how do you know if your sequence is related

-it will share a single node with those species -if it is identical to another sequence, the two will emerge directly from the vertical line of the node without any horizontal branching -if your sequence is distantly related to all of the species in your tree, your sequence will sit on a branch by itself

MUSCLE

-multiple sequence comparison by log expectation -lines up all the DNA sequences you've selected in the previous step into nicely organized columns -allows you to more easily visualize the similarities and differences between your selected sequences -each row represents the DNA sequence of an individual hit from your BLAST search -the numbers above the alignment represent the base pair position in the sequence -top line represents the consensus sequence

2nd row of sequence viewer

-series of purple bars -tell you the level of confidence you can have that each base call is correct -if the purple bars pass the horizontal blue threshold line that runs through them, you can trust the base call -if the purple bars are below the threshold line, you will be less confident that the base is called correctly

a good sequence

-sufficiently long (500 bp or more) -most of the 500 base pairs in the section of DNA that you sequenced have been called accurately by the sequencer

pair builder

-the 2 strands come from the same part of the sequence, just opposite ends, so they contain the same information -2 reads act as a way to double check that the sequence information is correct -links the sequences together in the right form

Phylip ML

-the branch tips are the DNA sequences of individual taxa that you analyzed -two or more branches are connected to each other by a node, which represents the common ancestor of those taxa -the length of each horizontal branch is a measure of the evolutionary distance from the ancestral sequence at the node -taxa with short horizontal branches from a node are closely related -taxa with longer horizontal branches are more distantly related

blasts

-uses an algorithm for comparing sequence information such as the amino acid sequences of different proteins or the nucleotides of DNA sequences

when is a DNA barcode completed

-when it's a trimmed, edited consensus DNA sequence

can an unknown species be identified by a BLAST search

-yes

accession number

2nd column -unique identifier given to each sequence submitted to the database

how many substops are blocked before you start your project

3

colors of the nucleotides

A=green T=red C=blue G=black

how are columns matched or mismatched

at a single nucleotide position across all sequences -dashes are gaps in the sequence where nucleotides in one sequence are not represented in other sequences
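One way to picture the column-by-column comparison is the sketch below. It assumes the rows are already aligned to equal length, and it labels any column containing a dash as a gap. The function name `column_status` and the three labels are hypothetical and are not taken from MUSCLE or DNA Subway.

```python
def column_status(rows: list[str]) -> list[str]:
    """Label each alignment column as 'match', 'mismatch', or 'gap'."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same aligned length")
    statuses = []
    for column in zip(*rows):  # one tuple per nucleotide position
        if "-" in column:
            statuses.append("gap")
        elif len(set(column)) == 1:
            statuses.append("match")
        else:
            statuses.append("mismatch")
    return statuses


alignment = ["ACG-T", "ACGAT", "ACC-T"]
print(column_status(alignment))
# ['match', 'match', 'mismatch', 'gap', 'match']
```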

bit score

calculated using a formula that takes into account the length of the query sequence used and the number of mismatches between your query sequence and each hit from your BLAST search -the higher the bit score the better the alignment

PHYLIP NJ

consensus of 100 different computer simulations that attempt to optimize parsimony -the numbers on each node represent the % of iterations that create a split at that particular node -the higher the number at each node, the more likely the split is correct -helps identify what taxonomic level you can identify your unknown to

reference data

data that will let you compare your barcode sequence to barcodes of other more common species that are closely related to your organism -helpful in creating an obvious outgroup for your phylogenetic analyses -helps place your organism into a larger taxonomic context

assembling sequences

goal is to ensure that the DNA sequences from your specimen are good

polymorphisms

grey areas in MUSCLE graph -locations where the base pairs are different from that of the consensus sequence -each color represents a specific nucleotide

alignment length

how many base pairs of your sequence were used to create an alignment where most base pairs matched -the longer the alignment length the better, because a greater number of base pairs is being used to make the match between your sequence and your hits

details column

includes taxonomic information on the hit and a description of the sequence information that was used, such as the name of the gene that was sequenced

bioinformatics

interdisciplinary field of science that combines computer science, statistics, math, and engineering to analyze and interpret complex biological data like DNA sequences

e value

likelihood that the best match BLAST made between your query sequence and your hit could occur by chance -the lower the e value the higher the probability that the hit is truly related to your sequence -a value of 0 means there is effectively a 0% chance that a particular match is just by chance
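To see why a higher bit score goes with a lower e value, the sketch below uses the relation from BLAST's published statistics, E = m × n × 2^(−S′), where S′ is the bit score, m is the query length, and n is the database length. The function name `expect_value` is hypothetical. Real BLAST uses adjusted "effective" lengths, so treat the numbers as illustrative only.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate the number of chance hits expected at this bit score.

    Assumption: raw lengths are used in place of BLAST's effective lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Same query and database; only the bit score changes.
weak = expect_value(bit_score=20, query_len=500, db_len=1_000_000)
strong = expect_value(bit_score=200, query_len=500, db_len=1_000_000)
print(strong < weak)  # True
```

Each extra bit of score halves the expected number of chance hits, which is why strong matches are reported with e values at or near 0.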

N in base calls

means that the base at that point in your sequence could not be determined

outgroup

outgroup is the least similar taxon from your MUSCLE alignment

consensus sequence

represents the best agreement between the forward and reverse sequencing reads, which enables you to generate a longer DNA barcoding sequence than a single forward or reverse read alone -don't trim your consensus sequence
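A simplified sketch of how a forward and a reverse read can be reconciled. The reverse read is first reverse-complemented so it runs in the same direction as the forward read, and then the two are compared position by position. It assumes both reads cover exactly the same stretch and have equal length, which real reads usually do not. The names `reverse_complement` and `simple_consensus` are hypothetical, and this is not DNA Subway's actual consensus algorithm.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def simple_consensus(forward: str, reverse_rc: str) -> str:
    """Merge two equal-length reads that run in the same direction.

    Agreeing bases are kept, an N is filled in from the other read,
    and a true disagreement is recorded as N.
    """
    if len(forward) != len(reverse_rc):
        raise ValueError("reads must cover the same positions")
    merged = []
    for f, r in zip(forward.upper(), reverse_rc.upper()):
        if f == r:
            merged.append(f)
        elif f == "N":
            merged.append(r)
        elif r == "N":
            merged.append(f)
        else:
            merged.append("N")
    return "".join(merged)


forward = "ACGTNACG"
reverse = "CGTAACGT"  # reverse read as it comes off the sequencer
print(simple_consensus(forward, reverse_complement(reverse)))  # ACGTTACG
```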

visual representations at stop 3

sequence alignment and phylogenetic trees

sequence similarity

similarity between all samples -the higher the similarity value the more closely related the two sequences are

add sequences

blasts

DNA Subway

website that has combined the tools of bioinformatics into a simple workflow for novices -using the blue line