CS 345 Final


Which of the following offsets do we use in linear regression's least squares line fit? Suppose the horizontal axis is the independent variable and the vertical axis is the dependent variable.

Vertical offsets

A hyperplane in the n-dimensional vector space Rn is defined to be the set of n-dimensional vectors satisfying a linear equation of the form a1x1+a2x2+⋯+anxn=b. What is the dimensionality of the hyperplane a1x1+a2x2+⋯+anxn=0, where a1, a2, ... are fixed real numbers, not all of which are zero?

n-1

It is not possible to do numerical operations, e.g. addition, on NumPy arrays of different shapes.

False
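
A minimal sketch (assuming NumPy) of why this is false: arrays with different but broadcast-compatible shapes can still be added.

import numpy as np

# A (2, 3) matrix and a (3,) vector have different shapes,
# but broadcasting adds the vector to every row of the matrix.
a = np.arange(6).reshape(2, 3)   # shape (2, 3)
b = np.array([10, 20, 30])       # shape (3,)

print(a + b)
# [[10 21 32]
#  [13 24 35]]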

Jupyter notebooks in the same folder share the same namespace.

False

A larger learning rate (𝜂 in the update equation 𝐰′=𝐰+𝑦𝜂𝐱) will lead to quicker convergence of the perceptron.

False

It is usually not a good idea to replace the missing values in a feature vector with 0.

True

NumPy arrays x and y both have shape (2,3,4,5). What is the shape of numpy.hstack((x,y))?

(2,6,4,5)

NumPy array x has shape (2,3,4). What is the shape of numpy.argmax(x, axis=0)?

(3,4)

NumPy arrays x and y both have shape (2,3,4,5). What is the shape of numpy.vstack((x,y))?

(4,3,4,5)
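
A quick check of all three shape questions above (a sketch assuming NumPy):

import numpy as np

x = np.zeros((2, 3, 4, 5))
y = np.zeros((2, 3, 4, 5))

print(np.hstack((x, y)).shape)     # (2, 6, 4, 5) - concatenates along axis 1
print(np.vstack((x, y)).shape)     # (4, 3, 4, 5) - concatenates along axis 0

z = np.zeros((2, 3, 4))
print(np.argmax(z, axis=0).shape)  # (3, 4)       - axis 0 is reduced away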

Suppose you find that your multivariate linear regression model is underfitting the data. In such a situation, which of the following options would you consider? Please select all that apply.

- Add more independent variables
- Map x into polynomial basis feature space of higher degrees

The nearest neighbor classifier is suited for lower dimensional data. When handling high dimensional data, which of the following options can be helpful? Please select all that apply.

- Dimensionality reduction
- Feature selection
- Both A and B

Which of the following distance metrics can be used in a nearest neighbor classifier? Please select all that apply.

- Euclidean distance
- Manhattan distance
- Cosine similarity
- Correlation distance
- All can be used
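
A minimal sketch (assuming SciPy is installed) computing these distances between two small vectors:

import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 0.0, 4.0])

print(distance.euclidean(u, v))    # L2 distance
print(distance.cityblock(u, v))    # Manhattan (L1) distance
print(distance.cosine(u, v))       # 1 - cosine similarity
print(distance.correlation(u, v))  # 1 - Pearson correlation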

The image below is the first diagonal plot of the seaborn pairplot of the Iris dataset shown in class. 1) The y axis is the sepal length. 2) The x axis is the sepal length.

1) False
2) True

f is a real-valued function defined on a domain X, X ⊆ R3. There are some points at which the first derivative of f must equal 0. These points could be ... Please select all that apply.

- Local maximum
- Local minimum
- Global maximum
- Global minimum
- Saddle point

In computer vision, image normalization is a common data preprocessing step that changes the range of pixel values. One example of image normalization is used in the cat/dog example discussed in class, as shown in the two lines of code below.

dogPixN = [float(x / 255.0) for x in dogPix]
catPixN = [float(x / 255.0) for x in catPix]

Now imagine the vector that will transform the cat image into the dog image in high dimensional space. After image normalization as specified by the code above: 1) will the transform vector's magnitude change? 2) will the transform vector's direction change?

1) True
2) False
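
A minimal sketch (with made-up pixel values, not the actual cat/dog images from class) of why: dividing both images by 255 scales the difference (transform) vector by the same factor, shrinking its magnitude but leaving its direction unchanged.

import numpy as np

# hypothetical 4-pixel "images"
cat = np.array([10.0, 200.0, 50.0, 120.0])
dog = np.array([40.0, 180.0, 90.0, 30.0])

t_before = dog - cat                  # transform vector before normalization
t_after = dog / 255.0 - cat / 255.0   # transform vector after normalization

print(np.linalg.norm(t_before), np.linalg.norm(t_after))  # magnitude shrinks by a factor of 255
print(np.allclose(t_before / np.linalg.norm(t_before),
                  t_after / np.linalg.norm(t_after)))     # True - direction is unchanged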

A matplotlib.pyplot figure contains 4 subplots that are arranged according to fig, ax = plt.subplots(2,2). If you want to put a subplot at the lower left corner of the figure, how will you index ax? Please select all that apply.

- ax[1,0]
- ax[1][0]
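
A quick sketch (assuming Matplotlib) showing that both indexing forms refer to the same Axes object, the lower left subplot:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 2)

# ax is a 2x2 NumPy array of Axes; row 1, column 0 is the lower left subplot
print(ax[1, 0] is ax[1][0])   # True
ax[1, 0].set_title("lower left")
plt.show()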

Select below all the immutable data types in Python.

- int
- float
- str
- tuple
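
A small sketch illustrating immutability: attempting to modify a tuple or a string in place raises a TypeError, whereas operations on ints, floats and strs create new objects rather than changing the originals.

t = (1, 2, 3)
s = "abc"

try:
    t[0] = 9        # tuples do not support item assignment
except TypeError as e:
    print("tuple:", e)

try:
    s[0] = "z"      # strings do not support item assignment either
except TypeError as e:
    print("str:", e)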

Select all answers that apply to Jupyter Notebook.

- is a server-client application that can be run through a web browser.
- is an interactive data science environment across many programming languages including Python.
- is a great tool for research-based coding, visualization, and sharing analyses.

Which of the following are true for the k-nearest neighbor (k-NN) algorithm?

- k-NN can be used for both classification and regression.
- As k increases, the bias usually increases.
- k-NN makes no assumptions about the functional form of the problem being solved.

1) If you want to visualize time series data, what plotting strategy would you choose? 2) If you want to visualize clustering of multiple groups of data, what plotting strategy would you choose? 3) If you want to visualize the distribution of a metric on a population, what plotting strategy would you choose?

1) line plot
2) scatter plot
3) histogram

An image can be represented as a vector in high dimensional space. When an image is brightened, that is, the values of all the pixels in the image are increased by a certain amount, which properties of the image vector will change? Please select all that apply.

- magnitude
- direction

Which of the following evaluation metrics can be used to evaluate a machine learning model that predicts a continuous output variable?

- mean squared error
- mean absolute error
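
A minimal sketch (assuming scikit-learn is installed) computing both regression metrics on some made-up predictions:

from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

print(mean_squared_error(y_true, y_pred))   # 0.375
print(mean_absolute_error(y_true, y_pred))  # 0.5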

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
fig = plt.figure()
plt.plot(x, np.sin(x), '-')
plt.plot(x, np.cos(x), '--')

The code snippet above was discussed in class; it will display the sine and cosine plots in one figure as shown here. However, on top of the figure it will also display a string output [<matplotlib.lines.Line2D at 0x7f62e3f51c50>]. If you want to get rid of this string output, what can you do? Select all that apply.

- put a ; at the end of the last line of code from the snippet
- add plt.show() at the end of the cell
- add pass at the end of the cell

Select all the answers that apply to pandas DataFrame.

- two dimensional
- size is mutable
- data items can be heterogeneous

Correlation and causation are two distinct concepts. Please analyze the following scenarios and select the ones that have a causal relation between the two events i.e. x causes y.

- x: Physical activity; y: Reduced mortality risk
- x: take Covid19 vaccine shots; y: increased immunity against Covid19 virus
- x: gravity; y: object falls

Vector w and point x are as illustrated in the picture above; w is of unit length and x is on the line that has the number 0.3 on it. All the points on this line, including x, will satisfy 𝑤1𝑥1+𝑤2𝑥2+𝑏=0, so what is the value of b?

-0.3

The pictures below are 2 eigenfaces (https://en.wikipedia.org/wiki/Eigenface). What is the dot product of these 2 eigenfaces?

0

foo = [None, 0, 1e-1000, '', []]
bar = [i for i in foo if i]
What is the length of bar?

0

We generate a challenging machine learning task by first sampling points from a multivariate Gaussian and then randomly assigning each sample a label. What is the expected accuracy using 5-fold stratified cross validation and a 3-nearest-neighbor classifier? Oh, and this is important: there are four possible labels (0, 1, 2, 3) and they are all equally likely in the dataset. Indeed, in the dataset of size 400,000 each label appears 100,000 times.

0.25
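
A minimal sketch (assuming scikit-learn, and using a much smaller dataset than the 400,000 samples in the question) simulating this setup; since the labels carry no information about the features, the classifier cannot beat chance and accuracy hovers around 0.25.

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(4000, 5))    # points sampled from a multivariate Gaussian
y = np.tile(np.arange(4), 1000)   # four equally likely labels, independent of X

clf = KNeighborsClassifier(n_neighbors=3)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(clf, X, y, cv=cv).mean())   # approximately 0.25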

The Iris dataset has been discussed in module01_02_labeled_data.ipynb; there are three classes in the Iris dataset: setosa, versicolor and virginica. Please deduce your conclusions based only on the diagonal plots of the pair plot as shown below. 1) Is setosa linearly separable from versicolor? 2) Is versicolor linearly separable from virginica? 3) Is virginica linearly separable from setosa?

1- Yes 2- No 3- Yes

Gradient descent is used to fit a multivariate linear regression function (𝑦̂ = 𝐰⊤𝐱 + 𝑏) to a dataset. The loss function used for optimization is the mean squared error, Loss = (1/N) Σᵢ₌₁ᴺ (yᵢ − ŷᵢ)². During one step in training, for an input x₁ the predicted output ŷ₁ is 1 and the true output y₁ is 0. What is the gradient with respect to the bias b (∂Loss/∂b) at this step for input x₁? Hint: enter your answer as an integer. Please refer to module03_04_linear_regression_gradient_descent.ipynb for the derivation of the gradient.

2
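
A quick numerical check (a sketch, not the course notebook's derivation): the per-sample loss as a function of a shift in b is (y₁ − (ŷ₁ + shift))², and a finite-difference estimate of its derivative at shift = 0 recovers the analytic result −2(y₁ − ŷ₁) = 2.

y1, y_hat1 = 0.0, 1.0
eps = 1e-6

def loss(shift):
    # shifting b by `shift` shifts the prediction by the same amount
    return (y1 - (y_hat1 + shift)) ** 2

grad_b = (loss(eps) - loss(0.0)) / eps
print(round(grad_b))   # 2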

The nearest neighbor classifier performs more computation at training time than at testing time. (Hint: examine the fit() and predict() methods in the class nearest_neighbor.)

False
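
A minimal sketch of a nearest neighbor classifier (an illustration of the idea, not the course's nearest_neighbor class): fit() merely stores the training data, while predict() does all the distance computations, so the bulk of the work happens at test time.

import numpy as np

class NearestNeighbor:
    def fit(self, X, y):
        # "training" is just memorizing the data - essentially free
        self.X_train = np.asarray(X)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X):
            # the real computation happens here, at prediction time
            dists = np.linalg.norm(self.X_train - x, axis=1)
            preds.append(self.y_train[np.argmin(dists)])
        return np.array(preds)

clf = NearestNeighbor().fit([[0, 0], [1, 1]], [0, 1])
print(clf.predict([[0.1, 0.2], [0.9, 0.8]]))   # [0 1]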

Learning rate manipulation is one plausible way to prevent the loss function of a machine learning algorithm from getting stuck in a local minimum. The figure above shows the plots of loss vs. epoch when training a machine learning model using 4 different learning rates A, B, C, D, respectively. Which learning rate is most likely to have helped the machine learning model get out of local minima and reach the global minimum?

D (it ends with the lowest loss after the full number of epochs)

After running a cell containing only the expression x in a Jupyter notebook, what output will show?

Depends on what value x has been assigned before

What best describes how the topic of artificial neural networks relates to CS 345?

Artificial neural networks are covered after establishing a foundation

The three figures (A, B, C) below are scatter plots of a set of points (x-axis is the independent variable, y-axis is the dependent variable); the dashed line in each figure is the fitted curve using polynomial basis regression. Which figure has the largest degree?

C (its curve follows each point most closely, with lots of spikes in the line)

The figure above is the scatter plot of a set of points. Which of the 4 labeled vectors should be the first principal component when principal component analysis is applied to this dataset?

D (in the graph it is the only vector pointing along the direction in which the data is correlated)

Here are two slightly different versions of a nearest neighbor classifier. Version 1 - take the data as provided and use Euclidean distance to measure how close samples are to each other. Version 2 - take the data and normalize each dimension so it has zero sample mean and sample standard deviation one, then use Euclidean distance. Now match versions to descriptions. Version 1

Dimensions with large excursions naturally receive greater importance.
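
A minimal sketch (assuming NumPy) of the normalization used by Version 2: z-score each dimension so that no single feature dominates the Euclidean distance.

import numpy as np

X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])

# Version 2: zero sample mean, unit sample standard deviation per dimension (column)
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm.mean(axis=0))  # approximately [0, 0]
print(X_norm.std(axis=0))   # [1, 1]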

A dataset is subject to regression analysis. It was revealed through marginal analysis that the coefficients for every input feature are all positive; therefore, for multivariate regression analysis on this dataset, the coefficients for every input feature will also all be positive.

False

Data in the scatter plot above (different symbols represent different classes) is linearly separable.

False

In Python, a variable needs to be declared ( e.g. like in Java, int foo; ) before usage.

False

In Python 3, the return value of the expression range(4) == [0,1,2,3] is True.

False

In machine learning, each iteration over one sample in the training dataset is called an epoch.

False

In the image above, the dot product of vector B and vector C is larger than the dot product of vector B and vector A.

False

Note: this question was rephrased on October 18, 2021. Suppose the Pearson correlation between V1 and V2 is zero. Are the following two statements both True or both False? 1) The angle between V1 and V2 is unknown. 2) Given a value from V1 it is possible to predict the corresponding value in V2 with better than chance odds of guessing correctly.

False

A NumPy array, regardless of its dimension, is stored in a contiguous and fixed block of memory that contains references to its data items in a fixed order.

False

Principal component analysis (PCA) is a supervised learning technique.

False

Principal component analysis (PCA) is invariant to affine transformation (https://en.wikipedia.org/wiki/Affine_transformation) when applied to image data.

False

The mean and variance of the angle between any two randomly selected vectors are smaller in higher dimensional space than in lower dimensional space.

False

The more complex a model is, the more likely it is that a good empirical result is not just due to the peculiarities of the sample, a.k.a. better generalization.

False

A vector is a quantity that has magnitude, direction and location.

False

We can get multiple local optimum solutions if we solve a linear regression problem by minimizing the sum of squared errors using gradient descent.

False

We can only compute the parameters of linear regression using the iterative gradient descent approach.

False
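
The statement is false because linear regression also has a closed-form least squares solution. A minimal sketch (assuming NumPy) using np.linalg.lstsq instead of gradient descent:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 3.0 + rng.normal(scale=0.01, size=100)

# append a column of ones so the bias is estimated along with the weights
X_b = np.hstack([X, np.ones((100, 1))])
params, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(params)   # approximately [2.0, -1.0, 0.5, 3.0]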

kNN (k Nearest Neighbor) classifier's decision boundary/surface is continuous.

False

A labeled data set is typically presented in the form of a data matrix X and label vector y. What best describes the contents of y?

Integers

There are 10000 samples and 10 different classes (class 1, class 2, ..., class 10) in a dataset. After a stratified 5-fold train/test data split (with shuffle), how many samples of class 1 are in fold 5?

It can't be determined, because class distribution of the dataset is unknown.

If 512 different faces are used to create a set of PCA vectors expressing facial appearance, i.e. eigenfaces, then how many new people's faces are we able to generate - with modest error - as linear combinations of these PCA vectors?

Many (more than 512)

Which linear regression error measure is less vulnerable/sensitive to outliers?

Mean absolute error

The figure above shows the scatter plot of two features (X1 and X2) with the class information (red: top cluster, blue: bottom cluster). The direction of the first principal component from principal component analysis (PCA) is shown at the lower right corner. Is this data more easily classified if it is first projected onto the first principal component?

No (the first principal component points in the same direction as both clusters, so projecting onto it does not help separate the classes)

Which programming language will be used in CS345? Hint: this programming language has also been one of the primary data science/machine learning languages in recent years.

Python

The gradient of MAE (mean absolute error) is not continuous when y_pred = y_true; there is no defined derivative at that point. Therefore, MSE (mean squared error) is more commonly used as the error measure for gradient descent in linear regression model optimization.

True

The weight vector w of a perceptron can be initialized to all 0s or any random numbers.

True

What is the output after running the following code cell in Jupyter Notebook?
empty = [None, [], '', [[], [[]]]]
empty[-4] == None

True

Which conclusion can be drawn from the scatterplot below?

There is a positive correlation between ice cream sales and sunglasses sold.

What is so special about the random seed 42?

Towards the end of Douglas Adams's popular 1979 science-fiction novel The Hitchhiker's Guide to the Galaxy, the supercomputer Deep Thought reveals that the answer to the great question of "life, the universe and everything" is 42.

Here are two slightly different versions of a nearest neighbor classifier. Version 1 - take the data as provided and use Euclidean distance to measure how close samples are to each other. Version 2 - take the data and normalize each dimension so it has zero sample mean and sample standard deviation one, then use Euclidean distance. Now match versions to descriptions. Version 2

Treats movements in all directions (dimensions) as equally important.

A single-layer perceptron with a properly selected learning rate will eventually find a hyperplane that can optimally separate a linearly separable dataset that has binary classes.

True

Causation as a term has caused, and still causes, a lot of drama and confusion. Consider the following situation: you observe two Boolean variables associated with something observable in the world, e.g. cats, people, cars. More often than not, when variable A is observed to be true for an instance, variable B is observed to be true as well. However, an instance (perhaps a few) has been found where A is true yet B is false. Under these conditions, where do you stand on the true/false status of the following declarative statement: A (variable A being true) causes B (variable B being true). And if you are not comfortable with the right answer designated for this quiz question - then perhaps think a bit about the limits of a world where all statements are forced into the straitjacket of being true/false (also, for the sake of your grade, please play along and give the expected answer).

True

For the most part, Python is an interpreted language and not a compiled one.

True

If a model performs well on the test set, should you assume, in general, that the same model will do comparably on future data roughly like that in the test set?

True

In engineering, a machine learning algorithm can sometimes achieve a better design than the best effort of human designers.

True

Is the following statement, in your best judgement, true or false: The same exact point in feature space may appear twice in the data matrix X with conflicting label values in y.

True

It is generally a good practice to store figures generated from a Jupyter notebook in a vector image format such as SVG or PDF instead of a raster image format such as PNG or JPG. This is because vector-based graphics are more malleable than raster images. For example, vector images can be quickly and perfectly scaled; there is no upper or lower limit for resizing vector images.

True

kNN will do poorly if the test data is sampled from the same distribution as the training data but lies outside the training data's range, i.e. the test data's values are either larger than the maximum value or smaller than the minimum value of the training dataset for every dimension, whilst linear regression may perform better in this scenario.

True

When using nearest neighbor classifier, is it applicable to unroll data (e.g. a 2D image) into a 1D vector?

Yes

Is it possible to make linearly inseparable data linearly separable in higher dimensions? E.g., a set of points cannot be linearly separated in 2D, but may be linearly separable through introducing more dimensions.

Yes, it is possible
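
A minimal sketch (with made-up 1D data) of the idea: these points cannot be separated by a single threshold in 1D, but adding a squared feature makes them separable by a line in 2D.

import numpy as np

x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1, 0, 0, 1])       # class 1 sits on both sides of class 0: not separable in 1D

# lift into 2D by adding a squared feature
X2 = np.column_stack([x, x ** 2])

# in (x, x^2) space the horizontal line x^2 = 2.5 separates the classes
print((X2[:, 1] > 2.5).astype(int))   # [1 0 0 1] - matches y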

foo = map(str, [1, 'a', 0.5])
bar = [s*2 for s in enumerate(foo)]
What is the value of bar?

[(0, '1', 0, '1'), (1, 'a', 1, 'a'), (2, '0.5', 2, '0.5')]

x = [1]
def foo(x):
    x = x.append(2)
    return x
y = foo(x)
After running the code cell above, what is the value of x? What is the value of y?

x is [1, 2]; y is None

After running the code cell below,
import numpy as np
x = np.array([1,2,3])
x[0] = 2.5
What is the value of x?

[2, 2, 3]

import numpy as np
x = np.arange(12)
idx = np.array([[3,7],[4,5]])
What is the value of x[idx]?

array([[3, 7],
       [4, 5]])

import numpy as np
y = np.arange(30)
Please match the reshape code on the left to the return value of the reshape method.
y.reshape(2,-1,5)

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])

import numpy as np
y = np.arange(30)
Please match the reshape code on the left to the return value of the reshape method.
y.reshape(2,5,-1)

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],

       [[15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])

import numpy as np
y = np.arange(30)
Please match the reshape code on the left to the return value of the reshape method.
y.reshape(-1,2,3)

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]],

       [[12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23]],

       [[24, 25, 26],
        [27, 28, 29]]])

In Jupyter Notebook, after running the code cell
bar = 5 // 2
What is the type of bar? What is the value of bar?

type: int, value: 2

As shown in the plot above, a blue point (covered in pink shade) of the negative class was initially misclassified by the previous hyperplane (yellow); the perceptron used the information from this blue point, made an update and formed a new hyperplane (brown), and now all points are correctly classified. What is the sign of the cosine similarity (cosine of the angle between the vectors) between the blue point (the one misclassified) and the new (brown) normal to the hyperplane?

negative

What is a zero dimensional tensor?

scalar

foo = map(str, [1, 'a', 0.5])
What is the type of list(foo)[1]?

str

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
fig = plt.figure()
plt.plot(x, np.sin(x), '-')
fig = plt.figure()
plt.plot(x, np.cos(x), '--')
fig.savefig('my_figure.png')
Run the code snippet above in a Jupyter notebook cell. What image will be saved?

the cosine plot in dashed line style

Which version of the programming language will be used in CS345?

version 3.x

a, x, y and z are unit length vectors in space. The dot product of x and a is positive, the dot product of y and a is negative, and the dot product of z and a is 0. Which vector is most similar to vector a?

x

import numpy as np
x = np.ones((4,4))
Select all the numpy array slicing statements below that will change x's value into
array([[1., 1., 1., 1.],
       [1., 0., 0., 1.],
       [1., 0., 0., 1.],
       [1., 1., 1., 1.]])

x[1:3, 1:3] = 0
x[-3:-1, -3:-1] = 0

