BCHM421-R midterm

Given the code and result below, what is the value for my_expression (replace BLANK)?

"-|, | \\(|\\)"

Which special characters must be double escaped to be detected correctly as a pattern? Don't guess! Test the patterns very carefully with R! Is the pattern detected when present? Is the pattern not detected when absent?

( . ?
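
One way to run the present/absent test this card asks for is sketched below. It uses base R grepl() and made-up test strings, which are assumptions for illustration and not part of the original quiz.

```r
# An unescaped "." is a wildcard, so it "matches" even when no dot is present.
dot_unescaped_absent <- grepl(".", "gene AR")      # TRUE (a false positive)
dot_escaped_absent   <- grepl("\\.", "gene AR")    # FALSE
dot_escaped_present  <- grepl("\\.", "v1.0")       # TRUE

# Double-escaped parenthesis: detected when present, not when absent.
paren_present <- grepl("\\(", "gene (AR)")         # TRUE
paren_absent  <- grepl("\\(", "gene AR")           # FALSE

# Double-escaped question mark: same check.
qmark_present <- grepl("\\?", "why?")              # TRUE
qmark_absent  <- grepl("\\?", "why")               # FALSE
```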

aesthetic description: 1) color 2) shape 3) fill 4) size

1) Color of points, lines and outlines. 2) The point type or shape specified as a number. 3) Color for interior of bars, boxes and certain point types. 4) The size for points.

1) str_replace 2) str_replace_all 3) str_trim 4) str_to_title

1) Replace the first occurrence of a pattern with a replacement pattern. 2) Replace all occurrences of a pattern with a replacement pattern. 3) Remove leading and/or trailing whitespace from a character string. 4) Capitalize the first letter of each word in a string.
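
A short sketch of the four functions side by side, assuming the stringr package is installed. The input strings are invented for illustration.

```r
library(stringr)

first_only <- str_replace("a-b-c", "-", "_")       # only the first "-" changes
every_one  <- str_replace_all("a-b-c", "-", "_")   # every "-" changes
trimmed    <- str_trim("  padded  ")               # whitespace removed from both ends
titled     <- str_to_title("androgen receptor")    # first letter of each word capitalized
```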

logical order for ggplot

1) ggplot 2) aes 3) geom_boxplot 4) ggsave

geom plot type: 1) geom_point 2) geom_bar 3) geom_col 4) geom_boxplot

1) x-y scatterplot 2) barplot for count or proportion data 3) barplot for summary statistics of numeric data such as means 4) boxplot of numeric data
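
The ggplot, aes, geom, ggsave order can be sketched as follows, assuming ggplot2 is installed. The built-in mtcars data set and the temporary output file are stand-ins chosen for this example, not part of the course material.

```r
library(ggplot2)

# 1) ggplot() starts the plot, 2) aes() maps variables, 3) the geom sets the plot type.
p <- ggplot(data = mtcars, mapping = aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# 4) ggsave() writes the finished plot to disk (a temporary PDF here).
out_file <- file.path(tempdir(), "mpg_by_cyl.pdf")
ggsave(filename = out_file, plot = p, width = 4, height = 3)
```

Swapping geom_boxplot() for geom_point(), geom_bar(), or geom_col() changes the plot type, although each geom expects suitable x and y mappings.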

1. Average 2. Euclidean 3. Manhattan 4. Complete

1. Linkage 2. Distance 3. Distance 4. Linkage

Assign the correct step number to the following R expressions so that you would get a meaningful result and not an error.

1. my_data %>% 2. select(var1, var2) %>% 3. group_by(var1) %>% 4. summarize(mean_var2 = mean(var2)) %>% 5. arrange(desc(mean_var2))
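
To see why this order works, the five steps can be run on a small made-up tibble (assuming dplyr is installed). The values, and the extra column var3, are hypothetical.

```r
library(dplyr)

# Hypothetical data; var3 is only there to show that select() drops it.
my_data <- tibble(
  var1 = c("a", "a", "b", "b"),
  var2 = c(1, 3, 10, 20),
  var3 = 1:4
)

result <- my_data %>%
  select(var1, var2) %>%                  # keep the two variables of interest
  group_by(var1) %>%                      # one group per value of var1
  summarize(mean_var2 = mean(var2)) %>%   # one row per group
  arrange(desc(mean_var2))                # largest mean first
```

Here arrange() has to come last because mean_var2 does not exist until summarize() has created it.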

my_list <- list(element01=1:4, element02=c("A", "B", "C"), element03=c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)) What value(s) result from the following expression? lapply(my_list, length)

4, 3, 6
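
The answer can be checked directly in base R. The sapply() line is an addition for comparison and was not part of the question.

```r
my_list <- list(
  element01 = 1:4,
  element02 = c("A", "B", "C"),
  element03 = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# lapply() applies length() to each element and returns a named list.
lengths_list <- lapply(my_list, length)

# sapply() does the same but simplifies the result to a named vector.
lengths_vec <- sapply(my_list, length)
```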

Which of the following statements are TRUE about Gene Ontology Annotations?

A gene may have many annotations to the same GO Term, even with the same evidence code, but the source of the annotation and supporting data may differ. The Gene Ontology is hierarchical, and a single direct annotation to a specific GO Term can up-propagate to Parent Terms.

The proportion of True DEGs and TRUE non-DEGs relative to all genes.

Accuracy

Which of the following are Aspects of the Gene Ontology?

Cellular component, biological process, molecular function

my_list <- list(element01=1:4, element02=c("A", "B", "C"), element03=c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)) What is the result of the following expression? my_list[[3]][c(3, 2)]

FALSE, TRUE

A GFF file should always include the column names in the file.

False

Data visualization is an advanced technique that is only useful after you have analyzed your data.

False

GFF files use a 1-based system for the start and end coordinates of the features that are listed. You can calculate the length for any feature using this formula: length = abs(start - end + 1)

False
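
A quick numeric check shows why the formula on this card fails. The coordinates are hypothetical, and the inclusive 1-based length is end - start + 1.

```r
# Hypothetical feature spanning bases 101 through 200 (1-based, inclusive).
feature_start <- 101
feature_end   <- 200

correct_length <- feature_end - feature_start + 1        # counts both end points
card_formula   <- abs(feature_start - feature_end + 1)   # formula from the card
```

The feature covers 100 bases, but the card's formula gives 98 because the +1 is applied before the sign is removed.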

Given the tibble, ar_exp, the following expression will return the result 5655.598. filter(ar_exp, Value == 0) %>% pull(Value) %>% max()

False

In an amino acid substitution matrix, e.g. BLOSUM 62, an amino acid that is conserved at the same position in an alignment is always given a score of zero.

False

Missing data values are not important. You can simply assign zero for these values.

False

The following expression, str_replace("string.", ".", ""), will return "string".

False
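
The reason is that an unescaped . matches any character, so the first character is removed instead of the dot. A minimal sketch, assuming stringr is installed:

```r
library(stringr)

unescaped <- str_replace("string.", ".", "")          # removes the "s", not the dot
escaped   <- str_replace("string.", "\\.", "")        # double escape targets the literal dot
literal   <- str_replace("string.", fixed("."), "")   # fixed() turns off regex matching
```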

The functions str_replace and str_replace_all are interchangeable.

False

There is no reason to convert a character variable to a factor because R will do this automatically.

False

Tidy data is arranged so that variables are in rows and observations are in columns.

False

To convert a genomic region or feature from a 0-based to a 1-based coordinate system, you must add 1 to both the start and end coordinates.

False
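
A small sketch of the conversion, assuming the 0-based system is half-open (start included, end excluded, as in BED files). Under that assumption only the start changes. The coordinates are made up.

```r
# Hypothetical 0-based, half-open interval covering the first 100 bases.
zero_based_start <- 0
zero_based_end   <- 100

# Convert to 1-based, inclusive coordinates: shift the start only.
one_based_start <- zero_based_start + 1
one_based_end   <- zero_based_end

# The feature length is the same in both systems.
length_zero_based <- zero_based_end - zero_based_start
length_one_based  <- one_based_end - one_based_start + 1
```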

When you download protein sequences from NCBI, the sequence is always full length.

False

With list.files, use the argument recursive=TRUE to list hidden files ("dot" files).

False
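
The argument that shows hidden files is all.files, not recursive. A sketch in a throwaway temporary directory; the file names are invented for the example.

```r
# Create a scratch directory with one visible file and one "dot" file.
demo_dir <- tempfile("list_files_demo")
dir.create(demo_dir)
created <- file.create(file.path(demo_dir, c("visible.txt", ".hidden")))

default_listing   <- list.files(demo_dir)                     # hidden file omitted
recursive_listing <- list.files(demo_dir, recursive = TRUE)   # still omitted
all_listing       <- list.files(demo_dir, all.files = TRUE,   # hidden file included;
                                no.. = TRUE)                  # no.. drops "." and ".."
```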

You can use the function pull to extract the strand from a GRanges object.

False

You can use the function source to run Rmd files.

False

You have performed an RNA-seq experiment in human cells. To perform a Gene Ontology Enrichment analysis, you should use all human genes as your background gene list.

False

You need special software to open a FASTA file.

False

You should never scale your variables before you perform hierarchical clustering.

False

In the attributes column of a GFF file, tag=value data is separated by a colon, :.

False. The tag=value pairs are separated by a semicolon, ;.

In a protein alignment, the gap opening penalty and gap extension penalty are always set to 10 and 1 respectively.

False. It is 8 and 4, respectively.

How can you increase the power (the ability to detect DEGs) of an RNA-seq experiment? Select all that are likely to make a difference.

Increase your sequencing depth (get more reads). Increase the number of replicates.

Given two tibbles, tibble01 (200 rows x 4 columns with one variable named "GeneID") and tibble02 (100 rows x 3 columns with one variable named "Symbol"). In addition, these tibbles do not have any variables with the same name. Which of the following do you know to be correct for tibble03, given that it is different from either starting tibble and not empty? tibble03 <- full_join(x=tibble01, y=tibble02, by=c("GeneID" = "Symbol"))

It will have at least 200 rows. It will have 6 columns. GeneID and Symbol have values in common.
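
The row and column counts can be seen on a scaled-down example, assuming dplyr is installed. The tibbles below are hypothetical: 3 and 2 rows instead of 200 and 100, but with the same 4 and 3 columns.

```r
library(dplyr)

tibble01 <- tibble(GeneID = c("AR", "NRIP1", "NCOR1"),
                   a = 1:3, b = 4:6, c = 7:9)
tibble02 <- tibble(Symbol = c("AR", "EP300"),
                   d = c(TRUE, FALSE), e = c("x", "y"))

# full_join() keeps every row from both tibbles; the two key columns merge into GeneID.
tibble03 <- full_join(x = tibble01, y = tibble02, by = c("GeneID" = "Symbol"))
joined_dim <- dim(tibble03)
```

Every row of tibble01 survives, so the result has at least as many rows as tibble01, and the column count is 4 + 3 - 1 = 6 because the key appears once.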

Relative Path and Absolute Path

Look at quiz 2

What is the major problem with the following function? my_function <- function(x, y){my_result <- str_c(x, y, sep="_")}

No result is returned.
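
Strictly speaking, R returns the value of a final assignment invisibly, so nothing is printed when the function is called on its own. The sketch below, assuming stringr is installed, shows that behavior next to a clearer version. The name my_function2 and the inputs are made up for illustration.

```r
library(stringr)

# The function from the card: its last expression is an assignment.
my_function <- function(x, y){my_result <- str_c(x, y, sep="_")}

my_function("AR", "exp")               # prints nothing at the console
captured <- my_function("AR", "exp")   # the invisible value can still be captured

# A clearer version ends with the value itself (or an explicit return()).
my_function2 <- function(x, y) {
  my_result <- str_c(x, y, sep = "_")
  my_result
}
shown <- my_function2("AR", "exp")
```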

The proportion of True DEGs called vs all True DEGs

Power

The proportion of True DEGs called vs all DEGs called

Precision

The proportion of True non-DEGs called vs all True non-DEGs

Specificity
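
The four definitions (accuracy, power, precision, specificity) can be written out from the counts of a DEG-calling result. The counts below are hypothetical numbers chosen only to make the arithmetic easy to follow.

```r
# Hypothetical counts for 1000 genes.
true_pos  <- 80    # true DEGs that were called
false_neg <- 20    # true DEGs that were missed
false_pos <- 10    # non-DEGs wrongly called as DEGs
true_neg  <- 890   # non-DEGs correctly not called

power       <- true_pos / (true_pos + false_neg)   # true DEGs called vs all true DEGs
precision   <- true_pos / (true_pos + false_pos)   # true DEGs called vs all DEGs called
specificity <- true_neg / (true_neg + false_pos)   # true non-DEGs called vs all true non-DEGs
accuracy    <- (true_pos + true_neg) /
  (true_pos + false_neg + false_pos + true_neg)    # correct calls vs all genes
```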

If you used col_types="cccc" with read_delim when you imported your data, which of the following do you know is correct?

The data set has four character variables.

Given the BSGenome object hs_bs, what is the result of the following expression? hs_bs[[2]][1:2]

The first two DNA bases of the second chromosome or sequence as a DNA String.

A boxplot shows which of the following for a variable? Choose all correct answers.

The median. Possible outliers. The range of the variable. The interquartile range.

What is a potential problem with the following code? my_motif <- DNAString("GCCNNAAT") my_match <- matchPattern(pattern=my_motif, subject=ath_bs[["Chr4"]], fixed=FALSE)

The result will include matches that are simply runs of Ns.

Which of the following statements are correct about a GRanges object?

They contain variables that describe the position of genomic features. The coordinates are stored as IRanges, not as integer vectors. The seqnames and strand are Rle vectors.

A statistical graphic is a mapping of data variables to aesthetic attributes of geometric objects.

True

A tibble is suitable for rectangular data sets.

True

Alignment complexity and coverage are characteristics of good RNA-seq data.

True

An R function should always return a value.

True

Camel Case can be used for names in R.

True

Elements of an R list can be nearly any class of object including vectors, tibbles, and functions.

True

For factors, the allowed values are called levels.

True

With ggplot, you can use + to add layers and build up a plot.

True

Hierarchical clustering is an unsupervised learning method that can uncover patterns in data.

True

If you get a "could not find function" error, you might have forgotten to load a package.

True

If you use BLOSUM62 to score a protein alignment, pairs of proteins with many conserved residues and few gaps will have higher scores than proteins with frequent substitutions and more gaps.

True

If your current directory is ~/BCHM421/Data, then list.files("../..") will list all files in your home directory.

True

In some special cases (writing functions), you may need to use !! sym to convert a character string to a naked column name to use with dplyr functions such as select.

True

It is reasonable to eliminate genes that encode components of the ribosome from lists of tissue-specific genes because ribosomes are likely to be expressed in all tissues.

True

Rmd files are plain text files.

True

Some Gene Ontology annotations have no biological data to support them

True

The first step in data visualization is to prepare the data.

True

The following expressions will produce the same result. str_replace_all("My Bad Name", " ", "") str_remove_all("My Bad Name", " ")

True

The function aes is used to map variables from your data set to visualization aesthetics or properties.

True

The geom functions, e.g. geom_point, determine the plot type.

True

When you perform hierarchical clustering, the similarity between observations or samples in your data set is usually expressed as a distance.

True

With data sets organized as rectangular text files, a delimiter is the character that is used to separate the columns of data.

True

With factors, you can specify the order of the levels to control how they will be plotted with ggplot2.

True
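
A base R sketch of setting the level order explicitly. The dose values are hypothetical.

```r
dose <- c("low", "high", "medium", "low")

# By default the levels are sorted alphabetically.
default_levels <- levels(factor(dose))

# Supplying levels= fixes the order, which ggplot2 then follows on a discrete axis.
ordered_levels <- levels(factor(dose, levels = c("low", "medium", "high")))
```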

You can often vectorize a function by using lists

True

You can pull a variable from a tibble to create a vector.

True

You can use the "or" operator (vertical bar, |) to create regular expressions.

True

You can use the function source to run plain R scripts (.R files).

True

You should use rename to fix any column names in your tibble so that you can easily manipulate your variables using naked column names.

True

Which of the following methods can you use to get files from a lab computer to Scholar?

Upload the files with RStudio Server on Scholar. Map your Scholar home directory to the lab computer. Use Remote Desktop of Firefox to download files directly to Scholar.

When you knit or render an Rmd file, the output document format is controlled by which of the following?

YAML header

Given the tibble named ar_exp, which pipe will produce a new variable named Max_Value in ar_exp that shows the maximum AR expression value for each cancer study, retaining all the original variables and rows?

ar_exp <- ar_exp %>% group_by(Study) %>% mutate(Max_Value = max(Value))
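
Running the pipe on a small stand-in tibble (assuming dplyr is installed) shows that mutate() after group_by() keeps every row, unlike summarize(). The Study and Value entries are made up.

```r
library(dplyr)

# Hypothetical stand-in for ar_exp.
ar_exp <- tibble(Study = c("A", "A", "B"),
                 Value = c(1, 5, 3))

ar_exp <- ar_exp %>%
  group_by(Study) %>%
  mutate(Max_Value = max(Value))   # group maximum repeated on each row of the group
```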

The distance between the centers of clusters

average (linkage method)

Good factors are...

character vectors that can be grouped

In a tibble, variables are stored as which of the following?

columns

The maximum distance between two clusters

complete (linkage method)

Which R function can you use to extract groups from the results of a hierarchical clustering? Choose the best answer.

cutree
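
A minimal base R sketch of the clustering workflow these cards describe: choose a distance, choose a linkage, then cut the tree. The four samples and their values are hypothetical.

```r
# Hypothetical data: four samples measured on two variables.
expr <- rbind(s1 = c(0, 0), s2 = c(0, 1), s3 = c(10, 10), s4 = c(10, 11))

sample_dist <- dist(scale(expr), method = "euclidean")   # distance (or "manhattan")
tree        <- hclust(sample_dist, method = "average")   # linkage (or "complete", "single")
groups      <- cutree(tree, k = 2)                       # group membership per sample
```

Scaling first puts the variables on a comparable footing so that no single variable dominates the distance.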

which of the following can you use to extract elements from an R list?

double square brackets, [[]] and $
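
A short base R comparison of the extraction operators. The single-bracket line is an addition for contrast, and the list is a shortened version of the one used in the earlier cards.

```r
my_list <- list(element01 = 1:4, element02 = c("A", "B", "C"))

by_position  <- my_list[[2]]             # the element itself, by position
by_name      <- my_list[["element02"]]   # the element itself, by name
by_dollar    <- my_list$element02        # same result using $
still_a_list <- my_list[2]               # single brackets return a list of length 1
```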

if-else statement

executes a segment of code when a given condition is true, and a different segment of code when it is false

Which of the following functions can be used to extract rows of data from a tibble?

filter, slice

Which function can be used to save a plot made with ggplot2 as a pdf file?

ggsave

GRange, lapply, GFF

https://purdue.brightspace.com/d2l/le/content/204901/viewContent/4986410/View

Clustering and correlation

https://purdue.brightspace.com/d2l/le/content/204901/viewContent/5791368/View

Protein Alignment

https://purdue.brightspace.com/d2l/le/content/204901/viewContent/5831734/View

Given the following sets: list1 <-c("AR", "NRIP1", "NCOR1", "NCOR2") list2 <- c("EP300", "PXN", "NCOA2", "AR", "NRIP1") Which function would return the genes that are common to both lists?

intersect
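
Checked in base R with the two vectors from the question. The union() and setdiff() lines are additions for comparison.

```r
list1 <- c("AR", "NRIP1", "NCOR1", "NCOR2")
list2 <- c("EP300", "PXN", "NCOA2", "AR", "NRIP1")

common     <- intersect(list1, list2)   # genes present in both vectors
either     <- union(list1, list2)       # genes present in at least one vector
only_list1 <- setdiff(list1, list2)     # genes in list1 but not in list2
```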

BLOSUM (e.g. BLOSUM62)

An amino acid substitution matrix used to score similarity between aligned protein sequences

Given a DNAStringSet named my_seqs with 100 sequences and the code below: my_af <- alphabetFrequency(my_seqs) How could you extract the A content of the 99th sequence?

my_af[99, "A"]

Which of the following are parts of a valid R function?

name, arguments, returned value or result, body with expressions

For the RNAseq dataset, which pipe could you use to create the Status variable in the tibble rnaseq?

rnaseq <- rnaseq %>% mutate(Status = "unchanged") %>% mutate(Status = replace(Status, logFC > 0 & FDR <= 0.05, "up")) %>% mutate(Status = replace(Status, logFC < 0 & FDR <= 0.05, "down"))
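
A runnable version of this pipe on made-up data, with every mutate() call closed, assuming dplyr is installed. The logFC and FDR values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the rnaseq tibble.
rnaseq <- tibble(logFC = c(2, -2, 2, 0.1),
                 FDR   = c(0.01, 0.01, 0.5, 0.9))

rnaseq <- rnaseq %>%
  mutate(Status = "unchanged") %>%                                    # default label
  mutate(Status = replace(Status, logFC > 0 & FDR <= 0.05, "up")) %>% # significant increase
  mutate(Status = replace(Status, logFC < 0 & FDR <= 0.05, "down"))   # significant decrease
```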

Heatmap: clustering based on gene or sample

row=gene column=sample

Which function can you use to get the values from a run length encoded vector (Rle)?

runValue

Which functions could you use to convert a Views object to a GRanges object?

sapply, GRanges, Rle

Given a BSGenome object named hs_bs, which of the following functions would you use to get the seqnames, lengths and topology of the sequences?

seqinfo

The minimum distance between two clusters

single (linkage method)

Which function can you use to extract the start coordinates for a GRanges object?

start

Given two GRanges objects with the same seqinfo, what expression would you use to extract the ranges in GRange1 that are entirely within GRange2?

subsetByOverlaps(GRange1, GRange2, type="within")

