DADA2 ITS Package Tutorial

OTU (operational taxonomic unit)

A neutral way to refer to samples in a phylogeny; an operational definition used to classify groups of closely related individuals. Can also be defined as a collection of 16S rRNA sequences that fall within a certain percentage of sequence divergence of one another.

adonis2()

Analysis of variance using distance matrices: partitions distance matrices among sources of variation and fits linear models (e.g., factors, polynomial regression) to distance matrices; uses a permutation test with pseudo-F ratios.

cast()

Cast a molten data frame into an array or data frame.

dir.create()

Creates a directory at the given path; a relative path is resolved against the current working directory.

getUniques()

Get the uniques-vector from the input object - This function extracts the uniques-vector from several different data objects, including dada-class and derep-class objects, as well as data.frame objects that have both sequence and abundance columns. The return value is an integer vector named by sequence and valued by abundance. If the input is already in uniques-vector format, the same vector is returned.
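As a rough illustration of the uniques-vector format, here is a Python sketch of the idea (not the dada2 implementation; `to_uniques` and the reads are made up for this example). Each distinct sequence becomes a name and its read count becomes the value:

```python
from collections import Counter

def to_uniques(reads):
    """Collapse reads into {sequence: abundance}, most abundant first."""
    counts = Counter(reads)
    return dict(counts.most_common())

# Hypothetical reads from one sample
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
uniques = to_uniques(reads)
print(uniques)
```

The keys of `uniques` correspond to what getSequences() would return, and the values to the abundances.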

getSequences()

Get vector of sequences from input object - This function extracts the sequences from several different data objects, including dada-class and derep-class objects, as well as data.frame objects that have both sequence and abundance columns. This function wraps the getUniques function, but returns only the names (i.e. the sequences). It can also be given the file path to a fasta or fastq file, a taxonomy table, or a DNAStringSet object. Sequences are coerced to upper-case characters.

ITS region

(Internal transcribed spacer) The most widely sequenced DNA region in the molecular ecology of fungi; it has been recommended as the universal fungal barcode sequence.

rarefy_even_depth

Resample an OTU table such that all samples have the same library size
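The resampling idea can be sketched in Python as follows. This is a simplified illustration, not phyloseq's code: the function name, the table layout (a dict of per-sample counts), the choice to draw without replacement, and the rule of dropping samples shallower than `depth` are all assumptions made for the example.

```python
import random

def rarefy_to_depth(table, depth, seed=0):
    """Subsample each sample's reads (without replacement) down to `depth`."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    rarefied = {}
    for sample, counts in table.items():
        if sum(counts.values()) < depth:
            continue  # too few reads: this sketch drops the sample
        # Expand counts into one entry per read, then draw `depth` of them
        pool = [taxon for taxon, n in counts.items() for _ in range(n)]
        drawn = rng.sample(pool, depth)
        rarefied[sample] = {taxon: drawn.count(taxon) for taxon in counts}
    return rarefied

# Hypothetical count table: sample -> taxon -> read count
table = {
    "S1": {"ASV1": 50, "ASV2": 30, "ASV3": 20},
    "S2": {"ASV1": 10, "ASV2": 40, "ASV3": 0},
    "S3": {"ASV1": 5, "ASV2": 0, "ASV3": 3},
}
rarefied = rarefy_to_depth(table, depth=50)
for sample, counts in rarefied.items():
    print(sample, sum(counts.values()))
```

After the call, every retained sample has the same library size.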

colnames()

Row and column names - get or set the row or column names of a matrix-like object

labs()

Modify axis, legend, and plot labels

estimate_richness

Performs a number of standard alpha diversity estimates, and returns the results as a data.frame. Alpha diversity: diversity on a local scale, describing the species diversity (richness) within a functional community.
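Two common alpha diversity measures, observed richness and the Shannon index, can be sketched in Python like this. The function names and the counts are invented for the illustration; the natural log is used for the Shannon index here.

```python
import math

def observed_richness(counts):
    """Number of taxa with at least one read in the sample."""
    return sum(1 for n in counts if n > 0)

def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa that are present."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

# Hypothetical read counts for the taxa in one sample
sample_counts = [10, 0, 5, 5]
print(observed_richness(sample_counts), shannon(sample_counts))
```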

scale_colour_brewer

Sequential, diverging and qualitative colour scales from colorbrewer.org

tax_glom()

This method merges species that have the same taxonomy at a certain taxonomic rank. Its approach is analogous to tip_glom, but uses categorical data instead of a tree. In principle, other categorical data known for all taxa could also be used in place of taxonomy, but for the moment, this must be stored in the taxonomyTable of the data. Also, columns/ranks to the right of the rank chosen for agglomeration will be replaced with NA, because they should be meaningless following agglomeration.
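The core of the agglomeration, summing the counts of taxa that share a label at the chosen rank, can be sketched in Python. `glom_by_rank`, the taxonomy dict, and the counts are hypothetical, and the sketch ignores details such as the handling of missing ranks.

```python
from collections import defaultdict

def glom_by_rank(counts, taxonomy, rank):
    """Sum the counts of all taxa that share the same label at `rank`."""
    merged = defaultdict(int)
    for taxon, n in counts.items():
        merged[taxonomy[taxon][rank]] += n
    return dict(merged)

# Hypothetical counts for one sample and a matching taxonomy table
counts = {"ASV1": 12, "ASV2": 8, "ASV3": 5}
taxonomy = {
    "ASV1": {"Genus": "Aspergillus", "Species": "niger"},
    "ASV2": {"Genus": "Aspergillus", "Species": "flavus"},
    "ASV3": {"Genus": "Penicillium", "Species": "rubens"},
}
by_genus = glom_by_rank(counts, taxonomy, "Genus")
print(by_genus)
```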

facet_wrap()

Wrap a 1d ribbon of panels into 2d

Illumina paired-end sequencing fastq files

Files that hold the result of sequencing both ends of a fragment. This allows generation of high-quality, alignable sequence data. Paired-end sequencing facilitates detection of genomic rearrangements and repetitive sequence elements, as well as gene fusions and novel transcripts.

Amplicon sequence variant (ASV) table

a higher-resolution analogue of the traditional OTU table, which records the number of times each exact amplicon sequence variant was observed in each sample

Anova()

a statistical test for estimating how a quantitative dependent variable changes according to the levels of one or more categorical independent variables.

IdTaxa

DECIPHER function whose algorithm can quickly and accurately classify nucleotide or amino acid sequences into a taxonomy of organisms or functions

sapply

apply a function over a list-like or vector-like object - lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. sapply is a user-friendly version and wrapper of lapply that by default returns a vector or matrix where possible.

Amplicons (PCR product)

are DNA products of a polymerase chain reaction

rowSums()

Form row and column sums and means - rowSums calculates the total for each row of a matrix

rbind

combine objects by rows or columns - rbind and cbind take one or more objects and combine them by rows or columns, respectively

vegdist

computes dissimilarity indices that are useful for or popular with community ecologists
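One dissimilarity index widely used in community ecology is Bray-Curtis. A minimal Python sketch of that single index follows (the function name and count vectors are made up; this is not the vegan implementation):

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator

# Hypothetical taxon counts for two samples (same taxon order in both)
site1 = [6, 4]
site2 = [2, 8]
print(bray_curtis(site1, site2))
```

The value is 0 for identical communities and 1 for communities that share no taxa.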

do.call()

constructs and executes a function call from a name or a function and a list of arguments to be passed to it

as.character()

converts a numeric object to a string data type or a character object. If the collection is passed to it as an object, it converts all the elements of the collection to a character or string type.

as.data.frame()

converts objects into a data frame

matrix

creates a matrix from the given set of values.

data.frame( )

creates data frames, tightly coupled collections of variables which share many of the properties of matrices and of lists, used as the fundamental data structure by most of R's modeling software.

maxN

(Default 0.) After truncation, sequences with more than maxN Ns will be discarded. Note that dada does not allow Ns.
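The filtering rule itself is simple enough to sketch in Python. `passes_max_n` and the reads are hypothetical names for this illustration, not part of dada2:

```python
def passes_max_n(seq, max_n=0):
    """Keep a read only if it contains at most `max_n` ambiguous bases (N)."""
    return seq.upper().count("N") <= max_n

# Hypothetical reads after truncation
reads = ["ACGTACGT", "ACGNACGT", "NNGTACGT"]
kept = [r for r in reads if passes_max_n(r)]
print(kept)
```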

aes()

often used within other graphing elements to specify the desired aesthetics

saveRDS and readRDS

provide the means to save a single R object to a connection (typically a file) and to restore the object

rc(sq)

reverse complement DNA sequences - this function reverse complements the DNA sequences provided. It is nothing more than a concisely named convenience wrapper for reverseComplement that handles the character-vector DNA sequences generated in the dada2 package

seq_along(along.with)

sequence generation - generate regular sequences.

reverseComplement

sequence reversing and complementing - use these functions for reversing sequences and/or complementing DNA or RNA sequences
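What reverse complementing does to a DNA string can be shown with a short Python sketch. The function name and the example sequence are made up; characters other than A, C, G, T (such as N) are left unchanged in this version.

```python
# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGCCN"))
```

Applying the function twice returns the original sequence, which is a quick sanity check.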

Demultiplexed

split into individual per sample fastq files

vcountPattern

string searching functions - A set of functions for finding all the occurrences (aka "matches" or "hits") of a given pattern (typically short) in a (typically long) reference sequence or set of reference sequences (aka the subject)
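Counting the hits of a short pattern in each of several subject sequences can be sketched in Python as below. This covers exact matching only, with overlapping hits counted; the function names and sequences are hypothetical, and the Biostrings functions offer more options than this sketch.

```python
def count_pattern(pattern, subject):
    """Count exact occurrences of `pattern` in `subject`, overlaps included."""
    last_start = len(subject) - len(pattern) + 1
    return sum(subject.startswith(pattern, i) for i in range(last_start))

def vcount_pattern(pattern, subjects):
    """Vectorised form: one count per subject sequence."""
    return [count_pattern(pattern, s) for s in subjects]

# Hypothetical primer fragment and reads
hits = vcount_pattern("ACG", ["ACGACG", "TTTT", "AC"])
print(hits)
```

A count above zero for a primer (or its reverse complement) suggests that the primer is still present in that read.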

merge_phyloseq

takes a comma-separated list of phyloseq objects as arguments, and returns the most-comprehensive single phyloseq object possible.

strsplit()

Split the Elements of a Character Vector - Split the elements of a character vector x into substrings according to the matches to substring split within them

ddply()

Split data frame, apply function, and return results in a data frame.

cbind

Combine objects by rows or columns - rbind and cbind take one or more objects and combine them by rows or columns, respectively

cutadapt

finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequences from high-throughput sequencing reads

ITS1 and ITS2

forward and reverse primers respectively.

Chimerism

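in genetics, the presence of cells of different origin in an individual, whether by mutation, transplant, or some other process; named from the chimera, a hybrid monster depicted as an amalgam of a lion, goat, and serpent. In amplicon sequencing, a chimera is an artifact sequence formed during PCR from two or more parent sequences.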

theme()

is used to control the non-data parts of the graph

lm()

is used to fit linear models

DADA2

package infers exact amplicon sequence variants from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The end product is an ASV table, like "seqtab.nochim_ITS.rds", and assignment of taxonomy to the output sequences, like "taxid_ITS.rds"

Rcpp

package that provides R functions as well as C++ classes which offer a seamless integration of R and C++

Requirements before beginning dada2

- Samples must be demultiplexed
- Non-biological nucleotides have been removed, e.g. primers, adapters, linkers, etc.
- If paired-end sequencing data, the forward and reverse fastq files contain reads in matched order

geom_bar()

Bar charts

ShortRead

Bioconductor package for working with short-read sequencing data. Its ShortRead class provides a way to store and manipulate, in a coordinated fashion, uniform-length short reads and their identifiers.

assignTaxonomy

Classifies sequences against reference training dataset

aggregate()

Compute Summary Statistics of Data Subsets

DNAString()

DNAString objects - A DNAString object allows efficient storage and manipulation of a long DNA sequence

dim(x)

Dimension of an object - retrieve or set the dimension of an object

dada()

High resolution sample inference from amplicon data - The dada function takes as input dereplicated amplicon sequencing reads and returns the inferred composition of the sample (or samples). Put another way, dada removes all sequencing errors to reveal the members of the sequenced community.

psmelt()

Melt phyloseq data object into large data.frame

derepFastq()

Read in and dereplicate a fastq file - A custom interface to FastqStreamer for dereplicating amplicon sequences from fastq or compressed fastq files, while also controlling peak memory requirement to support large files

removeBimeraDenovo()

Remove bimeras from collections of unique sequences - This function is a convenience interface for chimera removal. Two methods to identify chimeras are supported: identification from pooled sequences and identification by consensus across samples. Sequence variants identified as bimeric are removed, and a bimera-free collection of unique sequences is returned.
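To make the term concrete: a bimera is a sequence whose left part comes from one parent and whose right part comes from another. The Python sketch below checks for that in the simplest possible case, assuming equal-length sequences and exact matches. It is a toy illustration with made-up names, not the method dada2 uses, which works from alignments and treats more abundant sequences as candidate parents.

```python
def is_exact_bimera(seq, parents):
    """True if seq is a prefix of one parent joined to a suffix of another."""
    if seq in parents:
        return False  # an exact copy of a parent is not a chimera
    for left in parents:
        for right in parents:
            if left == right or len(left) != len(seq) or len(right) != len(seq):
                continue
            # Try every breakpoint between the first and last base
            for cut in range(1, len(seq)):
                if seq[:cut] == left[:cut] and seq[cut:] == right[cut:]:
                    return True
    return False

# Hypothetical parent sequences
parents = ["AAAAAAAA", "CCCCCCCC"]
print(is_exact_bimera("AAAACCCC", parents))
print(is_exact_bimera("AAAAGGGG", parents))
```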

transform_sample_counts()

This function transforms the sample counts of a taxa abundance matrix according to a user-provided function. The counts of each sample will be transformed individually. No sample-sample interaction/comparison is possible by this method.
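The per-sample behaviour can be sketched in Python: the user-supplied function sees one sample's counts at a time, so it cannot compare samples. The names and the table below are invented for the example; converting to relative abundance is one common choice of function.

```python
def transform_counts(table, fun):
    """Apply `fun` to each sample's counts independently."""
    return {sample: fun(counts) for sample, counts in table.items()}

def relative_abundance(counts):
    """Convert one sample's counts to proportions of its library size."""
    total = sum(counts.values())
    return {taxon: (n / total if total else 0.0) for taxon, n in counts.items()}

# Hypothetical count table: sample -> taxon -> read count
table = {
    "S1": {"ASV1": 30, "ASV2": 10},
    "S2": {"ASV1": 5, "ASV2": 15},
}
proportions = transform_counts(table, relative_abundance)
print(proportions)
```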

DECIPHER

Tools for curating, analyzing, and manipulating biological sequences - a software toolset that can be used for deciphering and managing biological sequences efficiently using the R statistical programming language. The program is designed to be used with non-destructive workflows for importing, maintaining, analyzing, manipulating, and exporting a massive amount of sequences

filterAndTrim

filters and trims an input fastq file (can be compressed) based on several user-definable criteria, and outputs fastq files (compressed by default) containing those trimmed reads which passed the filters. Corresponding forward and reverse fastq files can be provided as input, in which case filtering is performed on the forward and reverse reads independently, and both reads must pass for the read pair to be output.

