Machine Learnin'
A NumPy array has shape (10,8) and is then reshaped with .reshape(-1,5). What is the shape of the reshaped array?
(16,5)
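A quick NumPy check of the arithmetic behind this answer. The array contents (from np.arange) are arbitrary; only the element count matters.

```python
import numpy as np

# A 10 x 8 array holds 10 * 8 = 80 elements.
a = np.arange(80).reshape(10, 8)

# -1 asks NumPy to infer that dimension: 80 elements / 5 columns = 16 rows.
b = a.reshape(-1, 5)
print(a.shape, b.shape)  # (10, 8) (16, 5)
```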
Why is it important to examine your dataset as a first step in applying machine learning? (Select all that apply):
- See what type of cleaning or preprocessing still needs to be done
- Gain insight into what machine learning model might be appropriate, if any
- Get a sense of how difficult the problem might be
- Notice any missing data
The number of dimensions of a NumPy array is called its
Rank
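A small sketch showing where the rank lives in NumPy: the .ndim attribute. Note that this "rank" is the number of axes, not the linear-algebra matrix rank (which NumPy computes separately with np.linalg.matrix_rank).

```python
import numpy as np

v = np.array([1, 2, 3])          # 1-D
m = np.array([[1, 2], [3, 4]])   # 2-D
t = np.zeros((2, 3, 4))          # 3-D

# .ndim reports the number of dimensions; .shape gives the size along each one.
print(v.ndim, m.ndim, t.ndim)  # 1 2 3
print(t.shape)                 # (2, 3, 4)
```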
Training a model using categorically labelled data to predict labels for new data is known as __________.
Classification
Please select the example that falls into a classification machine learning problem:
Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
Please select the example that falls into a regression machine learning problem:
Given a picture of a person, we have to predict their age on the basis of the given picture.
Please select the real-world problems to which machine learning has already been applied (select all that apply):
- Data security (e.g., identifying malware)
- Financial trading (e.g., helping to predict stock markets)
- Smart cars
- Online search
- Healthcare (e.g., predicting diseases)
- Marketing personalization (e.g., targeted ads)
A NumPy array can hold values of different types.
False
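A minimal illustration of why the answer is False: when given mixed inputs, NumPy converts everything to one common dtype. (An array created with dtype=object can store arbitrary Python objects, but it still has a single dtype.)

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts everything to a common float dtype.
a = np.array([1, 2.5, 3])
print(a.dtype)  # float64

# Mixing numbers and strings: every element becomes a string.
b = np.array([1, "two", 3.0])
print(b.dtype.kind)  # U (Unicode string)
print(b[0])          # 1, now stored as the string '1'
```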
A data instance refers to a column in a two-dimensional data table.
False
According to the course syllabus, one can work with others on the same Jupyter notebook and submit the same notebook for credit.
False
According to the course syllabus, you can work with others on quizzes and exams.
False
In machine learning terminology, a 'label' represents the actual word(s) of a column.
False
In pandas, a DataFrame is a two-dimensional data structure, and all of its columns must have the same type.
False
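A short sketch showing that each DataFrame column carries its own dtype. The column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob"],   # strings
    "age": [34, 29],          # integers
    "score": [88.5, 92.0],    # floats
})

# Each column has its own dtype; they do not have to match.
print(df.dtypes)
```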
In Python, indexing starts at 1.
False
In Python, you can change the value of an element in a tuple, for example: p=(1,2,3,4,5); p[1]=78
False
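Running the code from the question confirms the answer: tuples are immutable, so item assignment raises a TypeError. Building a new tuple from slices is one common workaround (a list is another).

```python
p = (1, 2, 3, 4, 5)

raised = False
try:
    p[1] = 78  # tuples are immutable, so item assignment is rejected
except TypeError as err:
    raised = True
    print("TypeError:", err)  # 'tuple' object does not support item assignment

# To "change" an element, build a new tuple from slices instead.
q = p[:1] + (78,) + p[2:]
print(q)  # (1, 78, 3, 4, 5)
```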
If x and y are NumPy arrays, the expression x*y computes the matrix multiplication of x and y.
False
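A small comparison on two made-up 2 x 2 arrays: * multiplies matching entries, while @ (equivalently np.matmul) performs matrix multiplication.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

elementwise = x * y     # entry by entry: [[1*5, 2*6], [3*7, 4*8]]
matrix_product = x @ y  # rows of x times columns of y

print(elementwise)     # values [[5, 12], [21, 32]]
print(matrix_product)  # values [[19, 22], [43, 50]]
```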
k-nearest neighbors often performs well on sparse data.
False
Regularization refers to standardizing the input data.
False
The accuracy score on training data is 67%, and the accuracy score on test data is 64%. This indicates overfitting.
False
The accuracy score on training data is 98%, and the accuracy score on test data is 97.5%. This indicates overfitting.
False
Please select the examples that are supervised learning problems (select all that apply):
- Given stock prices of last month, we need to predict the stock price for tomorrow.
- Given the attitudes of American people toward Donald Trump and Hillary Clinton expressed in their tweets during the election campaign, we need to predict who will become president.
- Given a picture of a person, we have to predict their age on the basis of the given picture.
Training a model using labelled data where the labels are continuous quantities to predict labels for new data is known as __________.
Regression
Generalization ability refers to an algorithm's ability to accurately predict for new, previously unseen data.
True
In Python, the extent of a code block is defined by its indentation level (usually four spaces, or a tab).
True
In pandas, dataframes can be sorted by either index or columns.
True
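A minimal sketch of both kinds of sorting on a toy DataFrame: sort_index orders rows by their index labels, and sort_values orders rows by the values in a column. (df.sort_index(axis=1) would instead sort the column labels.)

```python
import pandas as pd

df = pd.DataFrame({"score": [95, 70, 82]}, index=["c", "a", "b"])

by_index = df.sort_index()                               # rows ordered a, b, c
by_column = df.sort_values(by="score", ascending=False)  # rows ordered 95, 82, 70

print(by_index)
print(by_column)
```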
In pandas, a Series is a one-dimensional data structure.
True
In pandas, the difference between .loc[] and .iloc[] is that .loc[] selects by index label while .iloc[] selects by absolute integer position starting from 0.
True
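A small sketch that makes the difference visible by using integer index labels that do not match the positions (the values are made up):

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[2, 0, 1])

print(s.loc[0])   # 20: the element whose index label is 0
print(s.iloc[0])  # 10: the element at position 0 (the first one)
```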
According to the course syllabus, the weekly discussion assignment requires each student to create a new post sharing Python code with an example, and to reply to another student's post to correct errors or add a new example.
True
Python is a dynamically typed language; we do not need to specify the type of a variable when we create one.
True
Ridge regression and Lasso regression differ from ordinary least squares regression in that they restrict model complexity.
True
The regularization parameter for ridge and lasso regression is alpha; with larger alpha, the models are more restricted, that is, less complex.
True
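To see the effect of alpha, here is a sketch using the closed-form ridge solution w = (XᵀX + αI)⁻¹Xᵀy in plain NumPy. This is not scikit-learn's Ridge (which also fits an intercept by default), and the data are synthetic; lasso has no closed form and is not shown. As alpha grows, the weight vector shrinks toward zero, i.e., the model becomes more restricted.

```python
import numpy as np

# Synthetic regression data (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([3.0, -2.0, 1.0])
y = X @ true_w + rng.normal(scale=0.1, size=50)

def ridge_weights(X, y, alpha):
    """Closed-form ridge solution without intercept: solve (X^T X + alpha*I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

for alpha in [0.01, 1.0, 100.0, 10000.0]:
    w = ridge_weights(X, y, alpha)
    print(alpha, np.round(w, 3), "norm:", round(float(np.linalg.norm(w)), 3))
```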
To fit your data with a scikit-learn supervised learning classifier, the right steps are: 1. import the classifier; 2. create the classifier object; 3. fit your data with the fit() method.
True
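The three steps can be sketched with scikit-learn's KNeighborsClassifier; the choice of classifier and the tiny one-feature dataset are assumptions for illustration, and any other scikit-learn classifier follows the same pattern.

```python
# 1. Import the classifier.
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (made up): one feature, two classes.
X_train = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y_train = [0, 0, 0, 1, 1, 1]

# 2. Create the classifier object (hyperparameters are set here).
clf = KNeighborsClassifier(n_neighbors=3)

# 3. Fit the model to the training data.
clf.fit(X_train, y_train)

# The fitted model can now predict labels for new data.
pred = clf.predict([[1.5], [8.5]])
print(pred)  # [0 1]
```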
To save a pandas DataFrame to a CSV file, we can use .to_csv() and pass the file path (directory and file name) as a quoted string inside the parentheses.
True
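A minimal sketch of saving and reloading a DataFrame. The file name weather.csv is hypothetical, and a temporary directory stands in for your own directory so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"city": ["Austin", "Boston"], "temp": [31, 18]})

# The path (directory + file name) is passed as a quoted string.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "weather.csv")

df.to_csv(path, index=False)  # index=False omits the row index column

round_trip = pd.read_csv(path)
print(round_trip)
```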
With matplotlib, you can control every element in a figure.
True
With matplotlib, you can create 3D graphics.
True
With matplotlib, you can create a figure with multiple lines.
True
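A short sketch of a single figure holding two lines. The Agg backend is selected only so the code runs without a display; in a notebook you would normally skip that line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")  # first line
ax.plot(x, np.cos(x), label="cos(x)")  # second line on the same axes
ax.set_title("Two lines in one figure")
ax.set_xlabel("x")
ax.legend()

print(len(ax.get_lines()))  # 2
```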
Modeling the features of an unlabeled dataset to find hidden structure is known as ____________.
Unsupervised Learning
Training a model using labeled data and using this model to predict the labels for new data is known as ____________.
Supervised Learning
The key purpose of splitting the dataset into training and test sets is:
To evaluate how well the learned model will generalize to new data
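A sketch of the split itself using scikit-learn's train_test_split on made-up data. The held-out test rows are never used for fitting, so a score computed on them estimates how well the model generalizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features (toy data)
y = np.array([0, 1] * 5)          # 10 labels

# Hold out 30% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```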
%matplotlib inline configures matplotlib to show figures embedded in the Jupyter notebook.
True
A NumPy array can have any number of dimensions, such as 2, 3, 10, etc.
True
According to the syllabus, there is required reading for each week.
True
What is the output of the following Python code: p=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]; x=p[1:10:2]; print(x)
[2, 4, 6, 8, 10]
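Tracing the slice step by step: p[start:stop:step] begins at index 1, stops before index 10, and takes every second element.

```python
p = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

# Indices visited: 1, 3, 5, 7, 9 (index 10 is excluded)
x = p[1:10:2]
print(x)  # [2, 4, 6, 8, 10]
```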
If df is a pandas DataFrame, what does df.mean(axis=1) do?
compute the mean of each row
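A small example contrasting axis=1 (across columns, one value per row) with the default axis=0 (down rows, one value per column); the numbers are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4], "b": [3, 8]})

row_means = df.mean(axis=1)  # one mean per row
col_means = df.mean(axis=0)  # default: one mean per column

print(row_means.tolist())  # [2.0, 6.0]
print(col_means.tolist())  # [2.5, 5.5]
```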
If df is a pandas DataFrame, what does df.apply(numpy.cos) do?
applies numpy.cos to each column, returning a new DataFrame in which every value is replaced by its cosine
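A quick check with angles whose cosines are easy to verify. apply() passes each column (a Series) to np.cos, which works element by element, and the original df is left unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0.0, np.pi], "y": [np.pi / 2, 2 * np.pi]})

result = df.apply(np.cos)  # new DataFrame of cosines
print(result.round(6))
```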
What is the output of the following Python code: s = "Hello world"; print(s[1:7])
ello w
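The same rules as list slicing apply to strings: indexing starts at 0 and the stop index is excluded.

```python
s = "Hello world"

print(s[0])  # H (indexing starts at 0)

# s[1:7] takes the characters at indices 1 through 6: 'e', 'l', 'l', 'o', ' ', 'w'
sub = s[1:7]
print(sub)  # ello w
```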
With matplotlib, to create a 900x600-pixel figure at 300 dots per inch, we can do:
fig = plt.figure(figsize=(3,2), dpi=300)
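The arithmetic behind this answer: figsize is given in inches, so the pixel size is inches times dpi, i.e., (3 × 300, 2 × 300) = (900, 600). The Agg backend is used here only so the check runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the example runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(3, 2), dpi=300)

# Pixel size = size in inches * dots per inch.
width_px, height_px = fig.get_size_inches() * fig.dpi
print(width_px, height_px)  # 900.0 600.0
```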
To use a module in python, we need to use the _______ statement.
import
Check the names that are invalid as Python variable names, i.e., names you cannot define yourself (multiple answers):
lambda, global, assert, break
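These names are reserved keywords, which the standard-library keyword module can confirm. The extra names total and my_var are hypothetical examples of ordinary, usable identifiers.

```python
import keyword

candidates = ["lambda", "global", "assert", "break", "total", "my_var"]

for name in candidates:
    # Reserved keywords cannot be used as variable names.
    print(name, "-> keyword" if keyword.iskeyword(name) else "-> usable")

reserved = [name for name in candidates if keyword.iskeyword(name)]
print(reserved)  # ['lambda', 'global', 'assert', 'break']
```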