BIOL 359 Final Quiz

Regularization imposes a penalty on the complexity of the model formulation to minimize the risk of overfitting

True

Regularization introduces a tradeoff in parameter optimization: less accurate model fitness (larger residuals) is accepted so long as the norm of the optimized parameter vector decreases

True

Supervised learning algorithms make use of training data to determine the separation boundary/line

True

Supervised machine learning algorithms use test data to evaluate their performance

True

The determinant of a matrix is equal to the product of its eigenvalues

True

A p-value is the probability of acquiring a specific value, x, or a more extreme value strictly by chance

True

How many data points are required to solve for the solution of the following model analytically, exactly, and uniquely: y = a + b*x + c*x^2

3
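A minimal sketch of why three points are enough, assuming three made-up points with distinct x values: each point gives one linear equation in a, b, and c, and the resulting 3x3 system can be solved exactly with Cramer's rule. The helper names `det3` and `fit_quadratic_exact` are illustrative, not from the course.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic_exact(points):
    """Solve y = a + b*x + c*x^2 exactly from three (x, y) points via Cramer's rule."""
    A = [[Fraction(1), Fraction(x), Fraction(x) ** 2] for x, _ in points]
    y = [Fraction(v) for _, v in points]
    d = det3(A)
    if d == 0:
        raise ValueError("points do not determine a unique quadratic (repeated x values)")
    coeffs = []
    for col in range(3):
        # Replace one column of A with y and take the ratio of determinants
        Ai = [row[:col] + [y[i]] + row[col + 1:] for i, row in enumerate(A)]
        coeffs.append(det3(Ai) / d)
    return tuple(coeffs)

a, b, c = fit_quadratic_exact([(0, 1), (1, 3), (2, 7)])
print(a, b, c)  # 1 1 1
```

With fewer than three points the system is underdetermined; with more, it is overdetermined and needs least squares instead of an exact solve.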

To identify a unique solution for the following equation, I would need [??] data points: y = a + bx^2 + cx^4 + d*sin(x)

4, one point for each parameter

If two random variables are uncorrelated, their Pearson's coefficient is equal to

0

How many data points are required to solve for the solution of the following model analytically, exactly, and uniquely: y = sin(x)

0, since it has no parameters

How many data points are required to solve for the solution of the following model analytically, exactly, and uniquely: y = m*x

1

If a dataset comprises 100 features and 1000 observations, each principal component would be length

100

If a dataset comprises 100 features and 1000 observations and we reduce it to 2 dimensions, the reduced dataset would comprise

1000 observations

If a dataset comprises 100 features and 1000 observations, its covariance matrix is size

100x100

What is the length of the vector [0 2]

2

SST

= SSE + SSR; measures how much the data points y vary around their mean
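A small worked check of the decomposition, assuming a least-squares line that includes an intercept (the identity SST = SSE + SSR relies on that) and made-up data; `sum_of_squares_decomposition` is a hypothetical helper.

```python
def sum_of_squares_decomposition(x, y):
    """Fit y = a + b*x by least squares and return (SST, SSE, SSR)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    a = y_mean - b * x_mean
    y_hat = [a + b * xi for xi in x]
    sst = sum((yi - y_mean) ** 2 for yi in y)               # total variation around the mean
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained (residual) variation
    ssr = sum((yh - y_mean) ** 2 for yh in y_hat)           # variation explained by the fit
    return sst, sse, ssr

sst, sse, ssr = sum_of_squares_decomposition([1, 2, 3, 4], [2, 4, 5, 4])
print(round(sst, 4), round(sse, 4), round(ssr, 4))  # 4.75 2.3 2.45
```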

The power of a statistic is the probability of correctly accepting a hypothesis

False, power is the probability of correctly rejecting a false null hypothesis, not of accepting any hypothesis

Correlation is equivalent to causation

False

Eigenvectors of full rank matrices are always orthogonal

False

If a MLR model has almost no error (assume it's perfect!), the relationship between the explanatory and response variables can be considered causal

False

Stochasticity derives from sources of variation solely attributed to instrument calibration

False

When separating data into training and test sets, it is recommended to divide them 50/50, otherwise the algorithm will not train properly

False

A type I error occurs when the null hypothesis is false

False, a type I error is a false positive: the null hypothesis is actually true but is rejected

PCA reduces data using a nonlinear transformation

False, PCA reduces data using linear transformation

If two random variables are independent of one another, they will have a Pearson's correlation coefficient equal to 1 or -1

False, a correlation coefficient of 0 occurs when two variables are independent.

A simple linear regression model always has two explanatory variables

False, a simple linear regression model always has one explanatory variable

A simple linear regression model would be an appropriate machine learning strategy to predict y from x when y and x are not correlated

False, a simple linear regression model would be an appropriate machine learning strategy to predict y from x when y and x ARE correlated

A normal distribution always has a mean of 0 and a std dev of 1

False, a standardized normal distribution has these parameters

If two random variables are associated, they must be correlated

False, association only means that one variable gives information about the other; a nonlinear association can still have zero correlation

The following is an example of a linear regression model, where a, b, and c represent regression coefficients: y = sin(a) + b*x

False, because the model is no longer linear in a

The following is an example of a linear regression model, where a, b, and c represent regression coefficients: y = a + b^x + cx^2

False, because the model is no longer linear in b

The covariance between two random variables is always between -1 and 1

False, because covariance is not standardized and can range from negative to positive infinity; correlation, however, is the scaled version of covariance and lies between -1 and 1
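A quick illustration with made-up numbers: rescaling one variable (for example, a change of units) rescales the covariance by the same factor, while Pearson's correlation, which divides by both standard deviations, is unchanged. The helpers below are illustrative sketches using the sample (n - 1) definitions.

```python
import math

def covariance(x, y):
    """Sample covariance (n - 1 denominator)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9]
x_rescaled = [v * 1000 for v in x]  # same measurement in different units

print(covariance(x, y), covariance(x_rescaled, y))  # covariance grows 1000-fold
print(pearson(x, y), pearson(x_rescaled, y))        # correlation is unchanged
```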

If two random variables are uncorrelated, they must be independent

False, because they may still be associated through a nonlinear relationship
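A standard counterexample, sketched with made-up values: with x symmetric around zero and y = x^2, y is completely determined by x, yet Pearson's coefficient is exactly 0 because there is no linear trend. The `pearson` helper is illustrative.

```python
import math

def pearson(x, y):
    """Pearson's r from centred sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # y is completely determined by x
print(pearson(x, y))     # 0.0: no linear trend despite perfect dependence
```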

Linear regression identifies causal relationships between dependent and independent variables

False, correlation does not imply causation

Principal component analysis is often used to prove hypotheses

False, hypotheses can't be proven, and PCA identifies interesting structure and patterns to inform new hypotheses that can be tested through carefully considered experiments

If the P-value of an observation is greater than the alpha error, we accept our null hypothesis

False, if the p-value is greater than the alpha error, we fail to reject the null hypothesis

One objective of LDA is to maximize the covariance within classes in a given dataset

False, it aims to minimize covariance within classes

A scree plot displays eigenvectors in order of significance

False, it displays eigenvectors in terms of the amount of information contained which is indicated by the magnitude of the eigenvalues

When separating data into training and test sets it is best to have the training data represent one class and have the test data represent another class

False, it is best to have an 80/20 split, with data distributed throughout the two

The dot product is a measure of how a vector projects onto a given matrix space

False, it is a measure of how a vector projects onto a given vector

Principal component analysis is most useful when applied to 2D data

False, it is most useful when applied to higher dimension data

One objective of LDA is to identify the direction of maximal covariance across an entire dataset

False, it is to identify the direction of maximal variance between classes

LDA reduces the dimensionality of data to 3 dimensions in order to separate classes

False, it reduces the dimensionality to one less than the number of classes

Model complexity can be defined by the magnitude of the training data points

False, model complexity can be defined by the magnitude of the model parameters as well as the number of parameters

If two random variables are both associated and correlated, they must be causal

False, neither association nor correlation implies causation

PCA and LDA are both used to classify data

False, only LDA is

One risk of overfitting data is that the model tends to explain more of the trend and less of the noise

False, overfitting tends to explain more of the noise and less of the trend

A Pearson's correlation coefficient of 1 implies a tighter, stronger relationship among random variables than a Pearson's correlation coefficient of -1

False, the sign only indicates whether the trend is increasing or decreasing

In general the sum of square errors (or the cost of the optimization function) increases as the complexity of the model structure increases

False, the SSE increases as the complexity of the model structure decreases

The following system describes a possible linear regression model: y = a^2*x + x*sin(b)

False, the model is not linear in its parameters (a enters as a^2 and b as sin(b))

In most model formulations the feature that has the greatest influence on the response variable is the feature with the highest power

False, the feature that has the greatest influence on the response variable is the feature with the greatest magnitude model parameter

One goal of PCA is to reduce the dimensionality using a subset of principal components that minimize the variance explained

False, the goal is to maximize the variance explained

As the difference/distance between the means of H0 and HA increases, the type II error also increases

False, the type II error, defined by beta, decreases
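A sketch of this effect, assuming a one-sided z-test with known sigma, H0: mu = 0, and HA: mu = delta. The function `type_ii_error` and its default sigma, n, and alpha are illustrative choices, not course values.

```python
from statistics import NormalDist

def type_ii_error(delta, sigma=1.0, n=10, alpha=0.05):
    """Beta for a one-sided z-test of H0: mu = 0 versus HA: mu = delta.

    Assumes known sigma and a normally distributed sample mean.
    """
    se = sigma / n ** 0.5
    # Critical value of the sample mean under H0 (reject H0 above this)
    crit = NormalDist(0.0, se).inv_cdf(1 - alpha)
    # Beta = probability the sample mean falls below the cutoff when HA is true
    return NormalDist(delta, se).cdf(crit)

for delta in (0.2, 0.5, 1.0):
    beta = type_ii_error(delta)
    print(f"delta={delta}: beta={beta:.3f}, power={1 - beta:.3f}")
```

Because the cutoff is fixed by alpha under H0, moving the alternative mean further away leaves less of its distribution below the cutoff, so beta falls and power rises.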

Stochastic systems often have unique solutions

False, they are not deterministic

Unsupervised machine learning algorithms are not used for clustering data

False, they are; k-means clustering is one example

Principal components are the eigenvalues of the covariance matrix

False, they are the eigenvectors

Underdetermined systems always have one unique solution

False, they have an infinite number of solutions

The difference between LASSO, Ridge, and Elastic Net optimization is the way in which they account for and penalize the magnitude of explanatory variables

False, they penalize the magnitude of model parameters

As the sensitivity of a model increases, so does its selectivity

False, they trade off, so one is usually prioritized based on the considerations of the system

To determine a p-value you must always define an alternative distribution

False, a p-value only requires knowledge of the null distribution; however, finding beta in order to find the power does require defining the alternative distribution

It is good practice to continue replicating an experiment until the p-value is deemed significant

False, this is known as p-hacking and is an unethical way to improve the statistical support of a finding that may not merit support

The linear discriminant points in the same direction as the second PC

False, though this may occur by chance in simple examples, such as when only two dimensions are included

Linear regression always provides unique solutions

False, unique solutions can only be obtained in a deterministic system

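Flipping a defined area by matrix operation is the same as rotating that area 180 degrees

False, a flip (reflection) reverses orientation and has a negative determinant, whereas a 180-degree rotation preserves orientation and has a determinant of +1; the two transformations also differ in which eigenvectors exist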

In LASSO the optimization function is defined by argmin(E^T*E + a*||B||1). The term a defines

How strongly the optimization should account for model complexity
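A sketch of how the weight a trades off fit against complexity, using two hypothetical candidate fits (the residuals and coefficients are invented for illustration). At a = 0 only the SSE matters; as a grows, the L1 penalty dominates and the fit with smaller coefficients wins. A ridge variant with the squared L2 norm is included for comparison.

```python
def sse(residuals):
    """E^T E: the sum of squared residuals."""
    return sum(e * e for e in residuals)

def lasso_cost(residuals, coefs, a):
    """E^T E + a * ||B||_1 (L1 penalty on the coefficients)."""
    return sse(residuals) + a * sum(abs(b) for b in coefs)

def ridge_cost(residuals, coefs, a):
    """E^T E + a * ||B||_2^2 (squared L2 penalty on the coefficients)."""
    return sse(residuals) + a * sum(b * b for b in coefs)

# Hypothetical fits: a complex one with small residuals, a simpler one with larger residuals
complex_fit = {"residuals": [0.1, -0.1, 0.1], "coefs": [4.0, -3.0, 2.5]}
simple_fit = {"residuals": [0.5, -0.4, 0.6], "coefs": [1.0, 0.0, 0.0]}

for a in (0.0, 0.1, 1.0):
    c = lasso_cost(complex_fit["residuals"], complex_fit["coefs"], a)
    s = lasso_cost(simple_fit["residuals"], simple_fit["coefs"], a)
    print(f"a={a}: complex={c:.2f}, simple={s:.2f}, preferred={'complex' if c < s else 'simple'}")
```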

You fit the following regression model to mean-centered (not standardized) data: y = a*x1 + b*x2 + c*x3, where a = 0.001, b = 5, and c = -11. Which explanatory variable has the greatest impact on the response variable?

Impossible to know when the data are not standardized; the data need to be standardized to allow for comparisons among variables

You fit the following regression model to raw (not normalized) data: y = a*x1 + b*x2 + c*x3, where a = 0.001, b = 5, and c = -11. Which explanatory variable has the greatest impact on the response variable?

Impossible to know when using raw training data; the data need to be standardized to allow for comparisons among variables

You fit three different model formulations to the same training data. Model A resulted in an SSE = 100, Model B resulted in an SSE = 50, and Model C resulted in an SSE = 2. Which model is most likely to accurately predict future data?

Impossible to know without validating/testing the models on different data, because overfitting may have occurred

Variance in experimental observations/data derives from many sources. Select all possible sources of variation

Instrument error, the time of day the experiment takes place, inherent biological noise, batch effects

If matrix A transforms vectors from 2D to 1D

It must have a determinant of 0, and it must have an eigenvalue of 0, and it is not full rank

Examples of unsupervised ML algorithms

K-means clustering and PCA (eigendecomposition is used in PCA but is not itself an example of the algorithm)

Examples of Supervised ML algorithms

LDA, decision trees, and linear regression (data reduction can be accomplished with these, but it is not itself an example of the algorithm)

Partial least squares

Combines MLR with PCA: identifies the direction of greatest covariance while simultaneously applying regression, which allows for decreased model complexity

Model complexity accounts for (select all possible answers):

Magnitude of model parameters, number of model parameters

LDA aims to

Maximize the between-class variance and minimize the within-class variance
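A minimal two-class sketch of Fisher's linear discriminant in 2D, assuming made-up points and an invertible within-class scatter matrix. The direction w = Sw^-1 (m1 - m0) maximizes the ratio of between-class separation to within-class spread; all helper names are illustrative.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    """Within-class scatter matrix: sum of outer products of (p - m)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    """w proportional to Sw^-1 (m1 - m0) for two classes in 2D."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def fisher_ratio(w, class0, class1):
    """Between-class separation over within-class spread after projecting onto w."""
    def project(points):
        return [w[0] * p[0] + w[1] * p[1] for p in points]
    z0, z1 = project(class0), project(class1)
    mu0, mu1 = sum(z0) / len(z0), sum(z1) / len(z1)
    within = sum((z - mu0) ** 2 for z in z0) + sum((z - mu1) ** 2 for z in z1)
    return (mu1 - mu0) ** 2 / within

class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(4.0, 5.0), (5.0, 6.0), (6.0, 5.0), (5.0, 4.0)]
w = fisher_direction(class0, class1)
print(w, fisher_ratio(w, class0, class1))
```

The returned w is only defined up to scale. Comparing `fisher_ratio` for w against the coordinate axes shows that it separates the classes at least as well as either original feature.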

Sequence of operations that take place to conduct PCA

Mean center data, solve for the covariance matrix, perform an eigen decomposition, determine the number of principal components to preserve, project the data onto reduced principal component space
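A stdlib-only sketch of these steps for two-feature data, assuming the sample (n - 1) covariance and keeping one component. The 2x2 eigendecomposition uses the closed-form formula for symmetric matrices rather than a library routine; the data and the `pca_2d` name are illustrative.

```python
import math

def pca_2d(data):
    """PCA for two-feature data; returns eigenvalues, eigenvectors, and PC1 scores."""
    n = len(data)
    # Step 1: mean-centre each feature
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    centred = [(p[0] - mx, p[1] - my) for p in data]
    # Step 2: sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # Step 3: eigendecomposition via the closed form for a symmetric 2x2 matrix
    half_trace = (sxx + syy) / 2
    r = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigvals = [half_trace + r, half_trace - r]  # sorted largest first
    if abs(sxy) < 1e-12:
        # Diagonal covariance: the eigenvectors are the original feature axes
        eigvecs = [(1.0, 0.0), (0.0, 1.0)] if sxx >= syy else [(0.0, 1.0), (1.0, 0.0)]
    else:
        eigvecs = []
        for lam in eigvals:
            v = (lam - syy, sxy)  # solves (C - lam*I) v = 0
            norm = math.hypot(v[0], v[1])
            eigvecs.append((v[0] / norm, v[1] / norm))
    # Step 4: keep k = 1 component; Step 5: project the centred data onto it
    pc1 = eigvecs[0]
    scores = [x * pc1[0] + y * pc1[1] for x, y in centred]
    return eigvals, eigvecs, scores

data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
eigvals, eigvecs, scores = pca_2d(data)
print(eigvals)      # variance captured by PC1 and PC2
print(eigvecs[0])   # PC1: a linear combination of the two original features
print(len(scores))  # 10: one score per observation
```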

R2 vs Q2

R2 for a training set may be close to 1, but if Q2 is negative, this suggests that the model is not very accurate when predicting new data

Solve the numerical least squares optimization problem of a linear regression model by putting the following steps in order

Define the sum of squared errors as a function of the coefficients, take the derivative of the function, set it equal to zero, and solve for the coefficients that minimize the sum of squared errors

Which of the following experimental design choices can impact the chance of overfitting a regression model to your training data

The number of samples observed, the number of explanatory variables, and the fraction of data reserved for validation/test

How does increasing the number of samples affect the distribution

The variance of the sample mean decreases and its distribution looks more normal

PCA reduces data by dropping/omitting entire feature dimensions that carry the least amount of information

False, each principal component is a linear combination of the original features of the data; rather than dropping features, PCA reduces data by dropping/omitting the principal components that carry the least amount of information

The covariance of a given feature can be found in the covariance matrix

True, the diagonal elements of the matrix contain the variances of the variables, and the off-diagonal elements contain the covariances between all possible pairs of variables

A scree plot helps identify the number of PCs necessary to reduce data while preserving information content

True

A selective model favors false negatives

True

A sensitive model favors false positives

True

Biological data is inherently stochastic

True

Collinearity describes the situation in which two or more model parameters are linearly correlated or proportional such that an infinite number of solutions minimize the optimization problem

True

Deterministic systems have unique solutions

True

Each principal component is a linear combination of the original features of the data

True

Eigenvectors of covariance matrices are always orthogonal

True

If the standard deviation of a random variables is known the variance is also known

True

In general all models are wrong but many can still be useful

True

In linear regression optimal models will have error terms aka residuals that are normally distributed

True

In linear regression, optimal models will have error terms aka residuals that form a distribution with mean equal to zero

True

In most model formulations, the explanatory variable with the greatest influence on the response variable is the one with the greatest magnitude parameter coefficient

True

In regression the coefficient corresponding to the intercept should be equal to zero when the data is mean centered

True

LDA is a supervised ML algorithm that reduces dimensionality of data to most accurately classify data

True

One objective of LDA is to identify a linear boundary/line that separates two or more classes of data

True

One objective of LDA is to maximize the covariance between classes within a dataset

True

Overdetermined systems do not have a unique solution and rely on optimization to identify solutions that minimize a defined error function

True

Overly complex models run the risk of predicting new outcomes based on variance (or noise) in data rather than the underlying biological signal

True

PCA and LDA are both used to reduce the dimensionality of data

True

Principal component analysis is an unsupervised machine learning method that transforms high dimensional data to lower dimensions

True

The explanatory variable x and the response variable y must be correlated to ensure an appropriate simple linear regression model

True

The linear regression problem is solved by solving for the regression coefficients that minimize the sum of squared errors

True

The objective of linear regression is to accurately predict responses using a linear model that minimizes a defined error

True

The regularization term in an optimization problem is a user-defined value that can (and should!) vary from model to model

True

Eigenvalues are ordered from greatest magnitude to least magnitude for a covariance matrix

True, this helps identify which principal components are the most informative

The threshold for whether an observation is significant is defined by the alpha error

True, alpha is also the probability of rejecting the null hypothesis when the null is true

The p-value is a continuous random variable

True; as a result, we should avoid arbitrary alpha thresholds and not assume that a repeat experiment would return a similarly low p-value

The field of machine learning focuses on identifying solutions to overdetermined systems, where the number of observations significantly exceeds the number of features

True, because in this case a unique solution cannot be achieved, only an optimal (least-squares) one can be

The following is an example of a linear regression model, where a and b represent regression coefficients: y = a + b*log(x^4)

True, because the coefficients are still linear

Correlation is proportional to covariance

True, because correlation is the scaled version of covariance

The variance of a random variable is always greater than or equal to zero

True, because it is the average of (x - μX)^2, so it cannot be negative

In general, the accuracy of model fitness to training data increases as the complexity of the model increases

True, but this is not necessarily a good thing

If two random variables are correlated, they must be associated

True, correlation is a specific form of association where the variables display increasing or decreasing trends together

PCA and LDA are both used to visualize data

True, especially considering that they decrease the number of pairwise comparisons needed

Analytical solutions to the least squares linear regression problem do not always exist

True, if the system is underdetermined, i.e., when a singularity occurs because the number of features is greater than the number of observations

Linear regression always provides optimal solutions

True, it does this by minimizing the sum of squared residuals

Ridge regression imposes a penalty on model complexity by including the sum of squared coefficients in the cost function

True, ridge tends to penalize the magnitude of the parameters
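A one-parameter sketch of ridge shrinkage, assuming mean-centred x and y so that no intercept is needed. Minimizing sum((y - b*x)^2) + lam*b^2 and setting the derivative to zero gives b = Sxy / (Sxx + lam), so a larger penalty pulls the slope toward zero. `ridge_slope` and the data are hypothetical.

```python
def ridge_slope(x, y, lam):
    """Ridge estimate of b in y = b*x for mean-centred x and y.

    Minimising sum((y - b*x)^2) + lam * b^2 gives b = Sxy / (Sxx + lam).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]  # already mean-centred
y = [-3.9, -2.1, 0.2, 1.8, 4.0]  # already mean-centred
for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, round(ridge_slope(x, y, lam), 3))  # slope shrinks toward 0 as lam grows
```

With lam = 0 this reduces to the ordinary least-squares slope.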

Unsupervised machine learning algorithms are used when data do not have assigned labels classes groups

True, supervised methods are used when the data do have labels

The sum of square error is also defined by the square of the L2 norm

True, the L2 norm is defined as the square root of the sum of the squares of the values

The following system describes a possible linear regression model: y = a*x^2 + b*sin(x)

True, the model is linear in its parameters

Overdetermined systems always have more data points than they have parameters to fit

True, therefore only an optimal solution can be reached

The dot product between two 2D vectors or two 10D vectors results in a single scalar number

True, the dot product measures how much one vector projects onto the other

The most common cost or loss function in linear regression is the sum of squared errors

True, this is also the squared L2 norm of the residual vector

It is good practice to report the confidence interval of resulting data as an additional metric for significance

True, this is an alternative way of reporting statistical support for a finding

To determine a p-value you must always define a null distribution

True, the p-value is computed from the null distribution alone
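A sketch assuming a one-sided z-test with known sigma and made-up numbers: the p-value is computed entirely from the null distribution, with no alternative distribution required. `one_sided_p_value` is a hypothetical helper.

```python
from statistics import NormalDist

def one_sided_p_value(sample_mean, mu0, sigma, n):
    """P(sample mean >= observed | H0) for a z-test with known sigma.

    Only the null distribution N(mu0, sigma / sqrt(n)) is needed.
    """
    se = sigma / n ** 0.5
    null = NormalDist(mu0, se)
    return 1 - null.cdf(sample_mean)

p = one_sided_p_value(sample_mean=10.6, mu0=10.0, sigma=1.5, n=25)
print(round(p, 4))  # 0.0228 (the observation sits z = 2 standard errors above mu0)
```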

Unless the statistical power of a test is very high, >90% the p-value is not a reliable measure of the strength of evidence against H0.

True, this is because the p-value can exhibit wide sample-to-sample variability

A type II error occurs when you incorrectly fail to reject the null hypothesis

True, this is defined by the Beta parameter, in this case the null is actually false

The dot product is a useful matrix operation that enables efficient summing of squared vector component terms

True, the dot product of a vector with itself is the sum of its squared components, i.e., the squared L2 norm

The regularization function in regression models imposes a penalty on the norm of the parameter coefficients

True, this may be the L1 norm, the L2 norm, or a combination of them

In a linear regression, the best models have errors (or residuals) that follow a Gaussian distribution with a mean of 0

True, this indicates that the model has captured the systematic trend and the remaining error is random noise

The matrix defined by [10 0] [0 100] will

stretch vectors along the y-axis more than along the x-axis and transform the unit circle into an ellipse

Regularization can minimize the impact of collinearity

Yes, because there will be a single solution where the combined optimum of the SSE and the regularization function is reached

The covariance matrix always has square dimensions

Yes, it has dimensions of pxp where p is the number of features

The following 2D matrix is used to transform the unit x-vector located at (1,0). Where does the vector land upon transformation by the 2D matrix [a b] [c d]?

[a c]
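A quick check with arbitrary example entries: multiplying a matrix by (1, 0) picks out its first column (a, c), and multiplying by (0, 1) picks out its second column (b, d). The `apply` helper is illustrative.

```python
def apply(m, v):
    """Multiply a 2x2 matrix, given as rows [[a, b], [c, d]], by a 2D vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

a, b, c, d = 2, -1, 5, 3  # arbitrary example entries
M = [[a, b], [c, d]]
print(apply(M, (1, 0)))  # (2, 5): the first column (a, c)
print(apply(M, (0, 1)))  # (-1, 3): the second column (b, d)
```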

Q2

a way to measure the accuracy of prediction; provides a way to compare different types of models; a negative Q2 indicates a model that is not very predictive

coefficient of determination

allows to assess fitness/error of model, provides a number to compare amongst models

R2

as the total variance (SST) approaches zero, R2 approaches negative infinity; as SSE approaches 0, R2 approaches 1

LDA aims to predict a response such as

benign v malignant tumor type

You fit the following regression model to standardized data: y = a*x1 + b*x2 + c*x3, where a = 0.001, b = 5, and c = -11. Which explanatory variable has the greatest impact on the response variable?

c because it has the greatest magnitude

A matrix...

can represent a system of linear equations, can represent a data set, can operate on vectors, can comprise a set of vectors

A full rank 5D matrix

can transform a 5d vector into a new 5d space

a good experimental design has a

cue that is tuned, signal intermediates that need to be uncovered, and responses that are specific

Increased sample size of an experiment results in

decreased variance of HA, decreased variance of H0, beta decreases, and alpha decreases

leave-one-out cross-validation

define one experimental condition as the validation data set and use the remaining conditions (25 in the course example) as training data; fit to the training data, then predict the outcome of the validation condition; repeat for all other conditions, then evaluate Q2

deterministic vs stochastic

deterministic is certain, stochastic has some probabilistic mechanism

Matrix A defined as [x 0] [0 Y] is

diagonal, symmetric, and full rank (assuming x and y are nonzero)

The eigenvector of a matrix

does not point in the direction of greatest scaling, does not always exist, does scale by a scaling factor known as the eigenvalue, if real does not rotate upon transformation by A

Rotation matrices all have

eigenvalues of magnitude 1, determinants of 1, square dimensions, and (except for rotations by 0 or 180 degrees) eigenvectors with imaginary components
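A sketch for a 60-degree rotation that computes the eigenvalues from the characteristic polynomial lambda^2 - trace*lambda + det = 0. The pair comes out complex with magnitude 1; rotations by 0 or 180 degrees are the exceptions, with real eigenvalues of 1 or -1. The helper names are illustrative.

```python
import cmath
import math

def rotation(theta):
    """2D rotation matrix for an angle theta in radians."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def eigenvalues_2x2(m):
    """Roots of lambda^2 - trace*lambda + det = 0."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

R = rotation(math.pi / 3)  # 60 degrees
l1, l2 = eigenvalues_2x2(R)
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
print(l1, l2)            # complex pair cos(60°) ± i sin(60°)
print(abs(l1), abs(l2))  # both have magnitude 1
print(det)               # 1 (up to floating-point rounding)
```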

For simple linear regression, coefficient of determination is

equivalent to the square of Pearson's correlation coefficient, which tells how tightly associated two variables are; in SLR it effectively tells how well the single variable is able to predict a given outcome

A 2d matrix A has a determinant of -10, so it

expands the unit circle by a factor of 10, and flips the unit circle
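A sketch using an example diagonal matrix with determinant -10 (any matrix with that determinant would behave the same way): the images of the unit vectors span a parallelogram whose signed area equals the determinant, so areas are scaled by 10 and the negative sign marks the flip in orientation.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def signed_area(u, v):
    """Signed area of the parallelogram spanned by u and v (positive if v is counter-clockwise from u)."""
    return u[0] * v[1] - u[1] * v[0]

A = [[2, 0], [0, -5]]          # example matrix with determinant -10
e1_image = (A[0][0], A[1][0])  # image of (1, 0) is the first column
e2_image = (A[0][1], A[1][1])  # image of (0, 1) is the second column
print(det2(A))                          # -10
print(signed_area((1, 0), (0, 1)))      # 1: the unit square, counter-clockwise
print(signed_area(e1_image, e2_image))  # -10: ten times the area, orientation reversed
```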

A 3D matrix with a determinant of -500 will

flip and expand a given volume upon transformation

If matrix has at least one negative eigenvalue, it

flips at least some of the vectors it operates on

Assuming the data has g features and h samples, the dimensions of the covariance matrix are

g x g

PCA identifies the direction of ... in a given dataset

greatest covariance

If a matrix has a determinant of 0, it must also

have at least one eigenvalue equal to 0 and decrease the dimensionality of a vector upon transformation

Matrix A [0 0] [0 0] will

have at least one eigenvalue equal to 0, have a determinant equal to 0, would transform a unit vector and unit circle to the point at the origin

PCA

identify the direction of maximum covariance in the data

biological systems are

inherently nonlinear, are very complex, variable, and stochastic

The matrix operation for approximating the covariance matrix

is taking the product of the transposed mean-centered data with the mean-centered data (X^T X) and dividing by n - 1; each entry is a dot product between two mean-centered feature columns
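A stdlib-only sketch of that operation, assuming rows are observations and columns are features and using the sample (n - 1) normalization; the data and the `covariance_matrix` name are made up for illustration. The result is p x p regardless of n.

```python
def covariance_matrix(data):
    """Sample covariance matrix of an n x p data table (rows = observations).

    Computed as Xc^T Xc / (n - 1), where Xc is the mean-centred data.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in data]
    # Entry (i, j) is the dot product of centred columns i and j, over n - 1
    return [[sum(xc[k][i] * xc[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]

data = [[1.0, 2.0, 0.5],
        [2.0, 1.0, 0.7],
        [3.0, 4.0, 0.2],
        [4.0, 3.0, 0.9]]  # 4 observations, 3 features
C = covariance_matrix(data)
print(len(C), len(C[0]))  # 3 3 -> p x p, regardless of n
```

The diagonal holds each feature's variance and the matrix is symmetric, which is why its eigenvectors are orthogonal.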

If a full rank matrix had real valued eigenvectors and eigenvalues

it would not rotate vectors pointed in the direction of eigenvectors

top-down approach

machine learning uses higher-throughput, higher-level data to infer lower-level interactions and make network inferences

If a dataset comprises p features and n observations and we reduce it to 2D using LDA, the reduced data would comprise

n data points

SLR

only one variable is used to describe the output; develop a correlation plot to see if a single variable associates best

If a dataset comprises p features and n observations each PC would be length

p, length is not the same as magnitude

z-score

standardized value; in a parity plot, the horizontal axis is predicted Y, the vertical axis is observed Y, and the parity line is where prediction equals observation

bottom-up approach

such as physics, to simulate behaviors forward

Pearson's correlation

tests how linear the relationship between x and y is

Spearman's correlation

tests monotonicity, so just whether the variables increase or decrease together
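A sketch contrasting the two, assuming data without ties so that ranks can be assigned simply: y = exp(x) is perfectly monotonic, so Spearman's rho is 1, but it is curved, so Pearson's r is below 1. The helpers are illustrative.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def ranks(values):
    """Rank positions 1..n (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]    # monotonic but strongly nonlinear
print(round(pearson(x, y), 3))  # below 1: the relationship is not linear
print(spearman(x, y))           # 1.0: perfectly monotonic
```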

The principal components of a data set are

the eigenvectors of the covariance matrix of the data

To avoid singularities it is best to collect data such that

the number of observations is significantly greater than the number of features

Matrix [1 1] [1 1] will

transform all vectors onto a diagonal line
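A quick check of this card: the determinant is 0, and every output has equal x and y components, so all vectors land on the line y = x. The vectors (1, -1) and (1, 1) are eigenvectors with eigenvalues 0 and 2. The `apply` helper is illustrative.

```python
def apply(m, v):
    """Multiply a 2x2 matrix (list of rows) by a 2D vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

A = [[1, 1], [1, 1]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(det)  # 0: the matrix is not full rank
for v in [(1, 0), (0, 1), (2, -5), (3, 3), (1, -1)]:
    print(v, "->", apply(A, v))  # every output has equal components: it lies on y = x
```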

A full rank 2x2 matrix

transforms data from 2D to 2D, does not operate on vectors in 3D, has nonzero eigenvalues

response

the variable we are trying to predict; the signal is x and the cue is the perturbation, e.g., finding combinations of cues that prevent cell growth

A matrix can

zero out, shrink or expand, transform into a different dimensional space, flip, or rotate a vector

