BIOL 359 Final Quiz
Regularization imposes a penalty on the complexity of the model formulation to minimize the risk of overfitting
True
Regularization introduces a tradeoff in parameter optimization. This tradeoff accepts less accurate model fitness (larger residuals) so long as the norm of the optimized parameter vector decreases
True
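A minimal sketch of this tradeoff using scikit-learn's Ridge on synthetic data (the variable names and alpha values are illustrative): as the penalty weight grows, the parameter norm shrinks while the training SSE rises.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.5, -1.0]) + rng.normal(scale=0.5, size=50)

for alpha in [0.01, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    sse = np.sum((y - model.predict(X)) ** 2)   # model fitness (residuals)
    norm = np.linalg.norm(model.coef_)          # norm of the parameter vector
    print(f"alpha={alpha:6.2f}  SSE={sse:8.2f}  ||beta||={norm:.3f}")
```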
Supervised learning algorithms make use of training data to determine the separation boundary/line
True
Supervised machine learning algorithms use test data to evaluate their performance
True
The determinant of a matrix is equal to the product of its eigenvalues
True
A p-value is the probability of acquiring a specific value, x, or a more extreme one, strictly by chance
this is true
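A minimal sketch, assuming a standard normal null distribution (the observed statistic is made up): the p-value is the probability of a value at least this extreme arising by chance alone.

```python
from scipy.stats import norm

x = 2.1                   # observed test statistic (illustrative)
p = 2 * norm.sf(abs(x))   # P(|X| >= 2.1) under the null, strictly by chance
print(p)                  # ~0.036
```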
How many data points are required to solve for the solution of the following model analytically, exactly, and uniquely: y = a + b*x + c*x^2
3
To identify a unique solution for the following equation, I would need [??] data points: y = a + b*x^2 + c*x^4 + d*sin(x)
4, one point for each parameter
If two random variables are uncorrelated, their Pearson's coefficient is equal to
0
How many data points are required to solve for the solution of the following model analytically, exactly, and uniquely: y = sin(x)
0 since it has no parameters
How many data points are required to solve for the solution of the following model analytically, exactly, and uniquely: y = m*x
1
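A minimal NumPy sketch of the counting rule above: three points pin down the three parameters of y = a + b*x + c*x^2 exactly, because the system is square and (for distinct x values) full rank.

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])
ys = 1 + 2 * xs + 3 * xs**2                    # data generated with a=1, b=2, c=3
A = np.column_stack([np.ones(3), xs, xs**2])   # one equation per data point
a, b, c = np.linalg.solve(A, ys)               # unique solution recovers 1, 2, 3
print(a, b, c)
```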
If a dataset comprises 100 features and 1000 observations, each principal component would be of length
100
If a dataset comprises 100 features and 1000 observations and we reduce it to 2 dimensions, the reduced dataset would comprise
1000 observations, each described by 2 features
If a dataset comprises 100 features and 1000 observations, its covariance matrix is size
100x100
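A minimal NumPy check of these shape facts on synthetic data.

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(1000, 100))  # 1000 observations x 100 features
C = np.cov(X, rowvar=False)      # features as columns
print(C.shape)                   # (100, 100): covariance matrix is features x features
vals, vecs = np.linalg.eigh(C)
print(vecs[:, 0].shape)          # (100,): each principal component has length 100
```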
What is the length of the vector [0 2]
2
SST
= SSE + SSR; measures how much the data points y vary around their mean
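A minimal NumPy sketch of the SST = SSE + SSR identity for a least squares fit with an intercept (synthetic data).

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 2 * x + 1 + rng.normal(scale=1.0, size=30)
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

sst = np.sum((y - y.mean()) ** 2)      # total variation of y around its mean
sse = np.sum((y - y_hat) ** 2)         # unexplained (residual) variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the regression
print(sst, sse + ssr)                  # equal up to floating point error
```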
The power of a statistic is the probability of correctly accepting a hypothesis
False, power is the probability of correctly rejecting a false null hypothesis, not just any hypothesis
Correlation is equivalent to causation
False
Eigenvectors of full rank matrices are always orthogonal
False
If a MLR model has almost no error (assume it's perfect!), the relationship between the explanatory and response variables can be considered causal
False
Stochasticity derives from sources of variation solely attributed to instrument calibration
False
When separating data into training and test sets, it is recommended to divide them 50/50, otherwise the algorithm will not train properly
False
A type I error occurs when the null hypothesis is false
False, a type I error occurs when there is a false positive, so the null hypothesis is actually true in this case
PCA reduces data using a nonlinear transformation
False, PCA reduces data using a linear transformation
If two random variables are independent of one another, they will have a Pearson's correlation coefficient equal to 1 or -1
False, a correlation coefficient of 0 occurs when two variables are independent.
A simple linear regression model always has two explanatory variables
False, a simple linear regression model always has one explanatory variable
A simple linear regression model would be an appropriate machine learning strategy to predict y from x when y and x are not correlated
False, a simple linear regression model would be an appropriate machine learning strategy to predict y from x when y and x ARE correlated
A normal distribution always has a mean of 0 and a std dev of 1
False, only a standardized normal distribution has these parameters
If two random variables are associated, they must be correlated
False, association only means that one variable gives information about the other; associated variables need not be correlated
The following is an example of a linear regression model, where a and b represent regression coefficients: y = sin(a) + b*x
False, because the model is no longer linear in a
The following is an example of a linear regression model, where a, b, and c represent regression coefficients: y = a + b^x + c*x^2
False, because the model is no longer linear in b
The covariance between two random variables is always between -1 and 1
False, because covariance is not standardized and can range from negative to positive infinity; correlation, however, is the scaled version of covariance
If two random variables are uncorrelated, they must be independent
False, because they may still be associated
Linear regression identifies causal relationships between dependent and independent variables
False, correlation does not imply causation
Principal component analysis is often used to prove hypotheses
False, hypotheses can't be proven, and PCA identifies interesting structure and patterns to inform new hypotheses that can be tested through carefully considered experiments
If the P-value of an observation is greater than the alpha error, we accept our null hypothesis
False, if the p-value is greater than the alpha error, we fail to reject the null hypothesis
One objective of LDA is to maximize the covariance within classes in a given dataset
False, it aims to minimize covariance within classes
A scree plot displays eigenvectors in order of significance
False, it displays eigenvectors in order of the amount of information they contain, which is indicated by the magnitude of their eigenvalues
When separating data into training and test sets it is best to have the training data represent one class and have the test data represent another class
False, it is best to have an 80/20 split, with data distributed throughout the two
The dot product is a measure of how a vector projects onto a given matrix space
False, it is a measure of how a vector projects onto another given vector
Principal component analysis is most useful when applied to 2D data
False, it is most useful when applied to higher dimension data
One objective of LDA is to identify the direction of maximal covariance across an entire dataset
False, it is to identify the direction of maximal variance between classes
LDA reduces the dimensionality of data to 3 dimensions in order to separate classes
False, it reduces the dimensionality to one less than the number of classes
Model complexity can be defined by the magnitude of the training data points
False, model complexity can be defined by the magnitude of the model parameters as well as the number of parameters
If two random variables are both associated and correlated, they must be causal
False, neither association nor correlation implies causation
PCA and LDA are both used to classify data
False, only LDA is
One risk of overfitting data is that the model tends to explain more of the trend and less of the noise
False, overfitting tends to explain more of the noise and less of the trend
A Pearson's correlation coefficient of 1 implies a tighter, stronger relationship among random variables than a Pearson's correlation coefficient of -1
False, the sign only indicates whether the trend is increasing or decreasing
In general, the sum of squared errors (or the cost of the optimization function) increases as the complexity of the model structure increases
False, the SSE increases as the complexity of the model structure decreases
The following system describes a possible linear regression model y = a^2*x + x*sin(b)
False, the a and b parameters both enter the model nonlinearly
In most model formulations the feature that has the greatest influence on the response variable is the feature with the highest power
False, the feature that has the greatest influence on the response variable is the feature with the greatest magnitude model parameter
One goal of PCA is to reduce the dimensionality using a subset of principal components that minimize the variance explained
False, the goal is to maximize the variance explained
As the difference/distance between the means of H0 and HA increases, the type II error also increases
False, the type II error, which is defined by Beta, will decrease
Stochastic systems often have unique solutions
False, they are not deterministic, so their solutions are not unique
Unsupervised machine learning algorithms are not used for clustering data
False, they are; k-means clustering is one example
Principal components are the eigenvalues of the covariance matrix
False, they are the eigenvectors
Underdetermined systems always have one unique solution
False, they have an infinite number of solutions
The difference between LASSO, Ridge, and Elastic Net optimization is the way in which they account for and penalize the magnitude of explanatory variables
False, they penalize the magnitude of model parameters
As the sensitivity of a model increases, so does its selectivity
False, they trade off, so usually one is prioritized based on considerations of the system
To determine a p-value you must always define an alternative distribution
False, this is based on the alpha error, which only requires knowledge of the null distribution; however, finding Beta in order to find the power does require defining the alternative distribution
It is good practice to continue replicating an experiment until the p-value is deemed significant
False, this is known as p-hacking and is an unethical way to improve the statistical support of a finding that may not merit support
The linear discriminant points in the same direction as the second PC
False, though this may occur by chance in simple examples, such as when only two dimensions are included
Linear regression always provides unique solutions
False, unique solutions can only be obtained in a deterministic system
Flipping a defined area by matrix operation is the same as rotating that area 180 degrees
False; a flip (reflection, determinant -1) is not the same as a 180-degree rotation (determinant +1), and the distinction also affects whether real eigenvectors exist
In LASSO the optimization function is defined by argmin(E^T*E + a*||B||_1). The term a defines
How strongly the optimization should account for model complexity
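A minimal sketch with scikit-learn's Lasso on synthetic data (the alphas are illustrative): a larger weighting term pushes more coefficients to exactly zero, trading training fit for a simpler model.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
y = X @ np.array([4.0, 0.0, -3.0, 0.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 0.1, 1.0]:       # the weighting term on the L1 penalty
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(coef, 2))  # larger alpha -> more coefficients at 0
```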
You fit the following regression model to mean-centered (not standardized) data: y = a*x1 + b*x2 + c*x3, where a = 0.001, b = 5, and c = -11. Which explanatory variable has the greatest impact on the response variable
Impossible to know; the data needs to be standardized to allow for comparisons among variables
You fit the following regression model to raw (not normalized) data: y = a*x1 + b*x2 + c*x3, where a = 0.001, b = 5, and c = -11. Which explanatory variable has the greatest impact on the response variable
Impossible to know when using raw training data; the data needs to be standardized to allow for comparisons among variables
You fit three different model formulations to the same training data. Model A resulted in an SSE = 100, Model B resulted in an SSE = 50, and Model C resulted in an SSE = 2. Which model is most likely to accurately predict future data.
Impossible to know without validating/testing the models on different data, because overfitting may have occurred
Variance in experimental observations/data derives from many sources. Select all possible sources of variation
Instrument error, the time of day the experiment takes place, inherent biological noise, batch effects
If matrix A transforms vectors from 2D to 1D
It must have a determinant of 0, and it must have an eigenvalue of 0, and it is not full rank
Examples of unsupervised ML algorithms
K-means clustering and PCA (eigendecomposition is used within PCA but is not itself an ML algorithm)
Examples of Supervised ML algorithms
LDA, decision trees, and linear regression (data reduction may be accomplished along the way, but it is not itself an algorithm)
Partial least squares
MLR meets PCA: identifies the direction of greatest covariance while simultaneously applying regression, which allows for decreased model complexity
Model complexity accounts for (select all possible answers):
Magnitude of model parameters, number of model parameters
LDA aims to
Maximize the between-class variance and minimize the within-class variance
Sequence of operations that take place to conduct PCA
Mean center the data, solve for the covariance matrix, perform an eigendecomposition, determine the number of principal components to preserve, project the data onto the reduced principal component space
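A minimal NumPy sketch of that exact sequence on synthetic data.

```python
import numpy as np

X = np.random.default_rng(3).normal(size=(200, 5))  # 200 observations x 5 features
Xc = X - X.mean(axis=0)                             # 1. mean center the data
C = np.cov(Xc, rowvar=False)                        # 2. covariance matrix (5 x 5)
vals, vecs = np.linalg.eigh(C)                      # 3. eigendecomposition
order = np.argsort(vals)[::-1]                      #    sort PCs by eigenvalue
vals, vecs = vals[order], vecs[:, order]
k = 2                                               # 4. number of PCs to preserve
scores = Xc @ vecs[:, :k]                           # 5. project onto reduced space
print(scores.shape)                                 # (200, 2)
```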
R2 vs Q2
R2 for a training set may be close to 1, but if Q2 is negative, this suggests that the model is not very accurate when predicting new data
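A minimal sketch of the contrast, approximating Q2 with scikit-learn's cross-validated R2 on pure-noise data (everything here is illustrative).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 10))   # few samples, many features: overfit-prone
y = rng.normal(size=20)         # pure noise: there is nothing real to learn

model = LinearRegression()
r2 = model.fit(X, y).score(X, y)                              # R2 on training data
q2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()  # Q2-like estimate
print(f"R2 = {r2:.2f}, Q2 = {q2:.2f}")  # R2 inflated by noise; Q2 typically negative
```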
Solve the numerical least squares optimization problem of a linear regression model by putting the following steps in order
Write out the sum of squared errors, take the derivative of the function with respect to the parameters, set the derivative to zero and solve for the parameters
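A minimal NumPy sketch of where those steps lead analytically: setting the derivative of the SSE to zero gives the normal equations (X^T X) b = X^T y.

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # intercept + 2 features
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=50)

# argmin ||y - X b||^2: derivative set to zero  =>  (X^T X) b = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(beta, 2))   # close to [1, 2, -3]
```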
Which of the following experimental design choices can impact the chance of overfitting a regression model to your training data
The number of samples observed, the number of explanatory variables, and the fraction of data reserved for validation/test
How does increasing the number of samples affect the distribution
The sample variance decreases and the distribution looks more normal
PCA reduces data by dropping/omitting entire feature dimensions that carry the least amount of information
This is false; each principal component is a linear combination of all the original features. PCA instead reduces data by dropping/omitting the principal components that carry the least amount of information.
The covariance of a given feature can be found in the covariance matrix
This is true; the diagonal elements of the matrix contain the variances of the variables, and the off-diagonal elements contain the covariances between all possible pairs of variables.
A scree plot helps identify the number of PCs necessary to reduce data while preserving information content
True
A selective model favors false negatives
True
A sensitive model favors false positives
True
Biological data is inherently stochastic
True
Collinearity describes the situation in which two or more model parameters are linearly correlated or proportional, such that an infinite number of solutions minimize the optimization problem
True
Deterministic systems have unique solutions
True
Each principal component is a linear combination of the original features of the data
True
Eigenvectors of covariance matrices are always orthogonal
True
If the standard deviation of a random variable is known, the variance is also known
True
In general all models are wrong but many can still be useful
True
In linear regression optimal models will have error terms aka residuals that are normally distributed
True
In linear regression, optimal models will have error terms aka residuals that form a distribution with mean equal to zero
True
In most model formulations, the explanatory variable with the greatest influence on the response variable is the one with the greatest magnitude parameter coefficient
True
In regression the coefficient corresponding to the intercept should be equal to zero when the data is mean centered
True
LDA is a supervised ML algorithm that reduces dimensionality of data to most accurately classify data
True
One objective of LDA is to identify a linear boundary/line that separates two or more classes of data
True
One objective of LDA is to maximize the covariance between classes within a dataset
True
Overdetermined systems do not have a unique solution and rely on optimization to identify solutions that minimize a defined error function
True
Overly complex models run the risk of predicting new outcomes based on variance (or noise) in data rather than the underlying biological signal
True
PCA and LDA are both used to reduce the dimensionality of data
True
Principal component analysis is an unsupervised machine learning method that transforms high dimensional data to lower dimensions
True
The explanatory variable x and the response variable y must be correlated to ensure an appropriate simple linear regression model
True
The linear regression problem is solved by solving for the regression coefficients that minimize the sum of squared error
True
The objective of linear regression is to accurately predict responses using a linear model that minimizes a defined error
True
The regularization term in an optimization problem is a user-defined value that can (and should!) vary from model to model
True
Eigenvalues are ordered from greatest magnitude to least magnitude for a covariance matrix
True this helps identify which are the most informative principal components
The threshold for whether an observation is significant is defined by the alpha error
True, alpha is also the probability of rejecting the null hypothesis when the null is true
The p-value is a continuous random variable
True; as a result we should avoid arbitrary alpha thresholds and not assume that a repeat experiment would return a similarly low p-value
The field of machine learning focuses on identifying solutions to overdetermined systems, where the number of observations significantly exceeds the number of features
True; because in this case a unique solution cannot be achieved, only an optimal one identified through optimization
The following is an example of a linear regression model where a and b represent regression coefficients: y = a + b*log(x^4)
True, because the coefficients are still linear
Correlation is proportional to covariance
True, because correlation is the scaled version of covariance
The variance of a random variable is always greater than or equal to zero
True, because it is the average of (x - μX)^2, so it must be non-negative
In general, the accuracy of model fitness to training data increases as the complexity of the model increases
True, but this is not necessarily a good thing
If two random variables are correlated, they must be associated
True, correlation is a specific form of association where the variables display increasing or decreasing trends together
PCA and LDA are both used to visualize data
True, especially considering that they decrease the number of pairwise comparisons needed
Analytical solutions to the least squares linear regression problem do not always exist
True; if the system is underdetermined, a singularity occurs because the number of features is greater than the number of observations
Linear regression always provides optimal solutions
True, it does this by minimizing the residuals
Ridge regression imposes a penalty on model complexity by including the sum of squared coefficients in the cost function
True, ridge tends to penalize the magnitude of the parameters
Unsupervised machine learning algorithms are used when data do not have assigned labels classes groups
True, supervised methods are used when data does
The sum of square error is also defined by the square of the L2 norm
True, the L2 norm is defined as the square root of the sum of the squares of the values
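A minimal NumPy check of this identity.

```python
import numpy as np

e = np.array([1.0, -2.0, 3.0])        # a residual (error) vector
sse = np.sum(e ** 2)                  # sum of squared errors
l2_squared = np.linalg.norm(e) ** 2   # square of the L2 norm
print(sse, l2_squared)                # both 14.0
```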
The following system describes a possible linear regression model: y = a*x^2 + b*sin(x)
True, the parameters enter linearly
Overdetermined systems always have more data points than they have parameters to fit
True, therefore only an optimal solution can be reached
The dot product between two 2D vectors or two 10D vectors results in a single scalar number
True, this dot product determines the magnitude one vector is projected on the other
The most common cost or loss function in linear regression is the sum of squared error
True, this is also the square of the L2 norm
It is good practice to report the confidence interval of resulting data as an additional metric for significance
True, this is an alternative way of reporting statistical support for a finding
To determine a p-value you must always define a null distribution
True, this is based on the alpha error, which only requires knowledge of the null distribution
Unless the statistical power of a test is very high, >90% the p-value is not a reliable measure of the strength of evidence against H0.
True, this is because the p-value can exhibit wide sample-to-sample variability
A type II error occurs when you incorrectly fail to reject the null hypothesis
True, this is defined by the Beta parameter, in this case the null is actually false
The dot product is a useful matrix operation that enables efficient summing of squared vector component terms
True; this is effectively taking the inner product, and the dot product of a vector with itself is the square of its L2 norm
The regularization function in regression models imposes a penalty on the norm of the parameter coefficients
True, this may be the L1 norm, the L2 norm, or a combination of them
In a linear regression, the best models have errors (or residuals) that follow a Gaussian distribution with a mean of 0
True; residuals that form a zero-mean Gaussian indicate the model is unbiased and the remaining error is random noise
The matrix defined by [10 0] [0 100] will
Stretch vectors along the y-axis more than along the x-axis and transform the unit circle into an ellipse
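A minimal NumPy sketch of the stretch (the sample points on the unit circle are illustrative).

```python
import numpy as np

A = np.array([[10.0, 0.0],
              [0.0, 100.0]])
theta = np.linspace(0, 2 * np.pi, 5)
circle = np.stack([np.cos(theta), np.sin(theta)])  # points on the unit circle
ellipse = A @ circle        # x stretched 10x, y stretched 100x -> an ellipse
print(np.round(ellipse.T, 1))
```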
Regularization can minimize the impact of collinearity
Yes, because there will be a single solution where the optima of the SSE and the regularization function are reached
The covariance matrix always has square dimensions
Yes, it has dimensions of pxp where p is the number of features
The following 2D matrix is used to transform the unit x-vector located at (1,0). Where does the vector land upon transformation by the 2D matrix [a b] [c d]
[a c]
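A minimal NumPy check with made-up values for a, b, c, and d: the unit x-vector lands on the matrix's first column.

```python
import numpy as np

a, b, c, d = 2.0, 3.0, 5.0, 7.0    # illustrative values
M = np.array([[a, b],
              [c, d]])
print(M @ np.array([1.0, 0.0]))    # [2. 5.], i.e. [a c]
```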
Q2
a way to measure the accuracy of prediction; provides a way to compare different types of models; a negative Q2 indicates a model that is not very predictive
coefficient of determination
allows one to assess the fitness/error of a model, provides a number to compare amongst models
R2
as variance approaches zero R2 approaches negative infinity; as SSE approaches 0, R2 approaches 1
LDA aims to predict a response such as
benign v malignant tumor type
You fit the following regression model to standardized data: y = a*x1 + b*x2 + c*x3, where a = 0.001, b = 5, and c = -11. Which explanatory variable has the greatest impact on the response variable
c because it has the greatest magnitude
A matrix...
can represent a system of linear equations, can represent a data set, can operate on vectors, can comprise a set of vectors
A full rank 5D matrix
can transform a 5D vector into a new 5D space
a good experimental design has a
cue that is tuned, signal intermediates that need to be uncovered, and responses that are specific
Increased sample size of an experiment results in
decreased variance of HA, decreased variance of H0, beta decreases, and alpha decreases
leave-one-out cross validation
define one experimental condition as the validation data set while the remaining 24 of 25 support the training data, fit to the training data then predict the outcome of the validation condition, evaluate Q2, then repeat for all other conditions
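A minimal sketch of leave-one-out cross validation with scikit-learn on 25 synthetic observations, echoing the 24-train/1-validate loop above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(6)
X = rng.normal(size=(25, 2))
y = X @ np.array([1.5, -0.5]) + rng.normal(scale=0.2, size=25)

preds = np.empty(25)
for train_idx, val_idx in LeaveOneOut().split(X):   # 24 train, 1 validate, repeated
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds[val_idx] = model.predict(X[val_idx])

q2 = 1 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"Q2 = {q2:.3f}")
```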
deterministic vs stochastic
deterministic is certain, stochastic has some probabilistic mechanism
Matrix A defined as [x 0] [0 y] is
diagonal, symmetric, and full rank
The eigenvector of a matrix
does not necessarily point in the direction of greatest scaling, does not always exist (it may be complex), scales by a factor known as the eigenvalue, and if real does not rotate upon transformation by A
Rotation matrices all have
eigenvalues of magnitude 1, determinants of 1, square dimensions, and (in general) eigenvectors with imaginary components
For simple linear regression, coefficient of determination is
equivalent to the square of Pearson's correlation coefficient, which tells how tightly associated two variables are; SLR effectively tells how well the single variable is able to predict a given outcome
A 2D matrix A has a determinant of -10, so it
expands the area of the unit circle by a factor of 10, and flips the unit circle
A 3D matrix with a determinant of -500 will
flip and expand a given volume upon transformation
If a matrix has at least one negative eigenvalue, it
flips at least some of the vectors it operates on
Assuming the data has g features and h samples, the dimensions of the covariance matrix are
g x g
PCA identifies the direction of ... in a given dataset
greatest covariance
If a matrix has a determinant of 0, it must also
have at least one eigenvalue equal to 0 and decrease the dimensionality of a vector upon transformation
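A minimal NumPy check with an illustrative rank-1 matrix.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])        # rows are proportional, so rank 1
print(np.linalg.det(A))           # 0.0: determinant is zero
print(np.linalg.eigvals(A))       # one eigenvalue is 0 (the other is 5)
# Any 2D vector is collapsed onto the line spanned by [1, 2]:
print(A @ np.array([3.0, -1.0]))  # [1. 2.], a multiple of [1, 2]
```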
Matrix A [0 0] [0 0] will
have at least one eigenvalue equal to 0, have a determinant equal to 0, would transform a unit vector and unit circle to the point at the origin
PCA
identifies the direction of maximum covariance in the data
biological systems are
inherently nonlinear, very complex, variable, and stochastic
The matrix operation for approximating the covariance matrix
is taking the inner or outer product of the mean-centered data with itself; this is effectively a dot product between mean-centered features
If a full rank matrix has real-valued eigenvectors and eigenvalues
it would not rotate vectors pointed in the direction of its eigenvectors
top-down approach
machine learning uses higher-throughput, higher-level data to infer lower-level interactions and make network inferences
If a dataset comprises p features and n observations and we reduce it to 2D using LDA, the reduced data would comprise
n data points
SLR
only one variable is used to describe the output; develop a correlation plot to see if a single variable associates best
If a dataset comprises p features and n observations each PC would be length
p, length is not the same as magnitude
z-score
a standardized value; in a parity plot the horizontal axis is the predicted value, the vertical axis (Y) is the observed value, and the parity line is where prediction equals observed
bottom-up approach
uses first principles, such as physics, to simulate behaviors forward
Pearson's correlation
tests how linear the relationship between x and y is
Spearman's correlation
tests monotonicity, i.e., whether the variables simply increase or decrease together
The principal components of a data set are
the eigenvectors of the covariance matrix of the data
To avoid singularities it is best to collect data such that
the number of observations is significantly greater than the number of features
Matrix [1 1] [1 1] will
transform all vectors onto a diagonal line
A full rank 2x2 matrix
transforms data from 2D to 2D, does not operate on vectors in 3D, has nonzero eigenvalues
response
the variable you are trying to predict; the signal is x, the cue is the perturbation, e.g., finding combinations of cues that prevent cell growth
A matrix can
zero out, shrink or expand, transform into a different dimensional space, flip, or rotate a vector