AI Exam 1
not equal to
!=
less than
<
Within a for loop, which line of code would you use to increase the number within a variable?
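None of these answers are correct. Not: add(count, '1'); number = count + '1'; count + '1' + '1' count. (The working form is count = count + 1, or count += 1; adding the string '1' to a number raises a TypeError.)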
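A minimal sketch of the working form, assuming a hypothetical list called items to loop over (the names count and items are illustrative, not from the exam):

```python
# Hypothetical list to loop over
items = ["a", "b", "c"]

count = 0
for item in items:
    # Add the integer 1 (not the string '1') and store the result back in count
    count = count + 1  # equivalent shorthand: count += 1

print(count)  # prints 3
```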
Suppose you wanted to escape an otherwise infinitely repeating loop or function. How would you do that?
Stop the kernel
Imagine we have a data frame, df. What would be the purpose of running code like the below? df.loc[1]
To look up and retrieve a value from df (df.loc[1] returns the row whose index label is 1)
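A small sketch of what .loc does, using a hypothetical data frame (the column names and values are made up for illustration):

```python
import pandas as pd

# Small hypothetical data frame; the column names are made up for illustration
df = pd.DataFrame({"name": ["Ann", "Ben", "Cal"], "age": [31, 25, 40]})

row = df.loc[1]         # the whole row whose index label is 1 (returned as a Series)
age = df.loc[1, "age"]  # a single value: row label 1, column "age"

print(row["name"], age)  # prints: Ben 25
```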
For pandas to work, data must be formatted as lists before it is imported
false
pandas can be used to join two data frames together
true
pandas has functionality to work with complicated dates
true
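One example of that date functionality, sketched with hypothetical date strings: pd.to_datetime parses text into real datetime values, which can then be pulled apart or subtracted.

```python
import pandas as pd

# Hypothetical date strings stored as text
s = pd.Series(["2022-09-03", "2022-09-10", "2022-11-26"])

dates = pd.to_datetime(s)                # parse the strings into datetime64 values
months = dates.dt.month.tolist()         # [9, 9, 11]
span = (dates.max() - dates.min()).days  # 84 days between first and last date
print(months, span)
```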
First create a simple data frame using the below code.... Question: if you ran df = df.dropna() followed by df, how many rows would remain in the data set?
3
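The exam's own data frame is not reproduced here, so this sketch uses a hypothetical 5-row data frame to show how dropna() removes every row containing at least one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical 5-row data frame (not the one from the exam)
df = pd.DataFrame({
    "x": [1, 2, np.nan, 4, 5],
    "y": ["a", None, "c", "d", "e"],
})

df = df.dropna()  # drop every row that has at least one missing value
print(len(df))    # 3 rows remain (rows 1 and 2 each had a missing value)
```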
When nesting if statements, how many spaces do you need to ensure proper indentation? Please type in the number of spaces you need below.
4 (the PEP 8 convention; Python itself only requires that indentation be consistent)
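A short sketch of nested if statements indented 4 spaces per level (the variable x and its value are illustrative):

```python
x = 15

if x > 0:
    # first level: indented 4 spaces
    if x > 10:
        # nested level: indented another 4 spaces (8 in total)
        label = "greater than 10"
    else:
        label = "between 1 and 10"
else:
    label = "0 or less"

print(label)  # prints: greater than 10
```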
Imagine we have a pandas data frame named 'df' with 2 columns: "col1" is 30 values long and is a random mix of the letters 'a', 'b', and 'c'; "num1" is also 30 values long and is a random set of numerical data (all integers). Which of the following would give you the mean of the numerical (num1) column, grouped by the values from column "col1"?
df.groupby('col1').mean()
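A sketch with hypothetical random data shaped like the question (the seed and value range are assumptions). Because num1 is the only other column, groupby('col1').mean() and the explicit groupby('col1')['num1'].mean() give the same numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question: 30 rows, letters a/b/c and integers
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "col1": rng.choice(["a", "b", "c"], size=30),
    "num1": rng.integers(0, 100, size=30),
})

by_all = df.groupby("col1").mean()          # mean of every remaining column per group
by_col = df.groupby("col1")["num1"].mean()  # same numbers, selecting num1 explicitly
print(by_col)
```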
Assume all packages needed to make the code run successfully are loaded, and that the test data ('test.csv') is loaded into your environment and named 'df', with something like df = pd.read_csv("../data/test.csv"). What would produce the result below? [Picture: output showing rows 0-5.] Tip: pick df.head() with one more than the last row number shown, because the row index starts at 0.
df.head(6)
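A quick check of why 6 is the right argument, using a hypothetical 10-row data frame: head(n) returns the first n rows, so rows 0 through 5 need n = 6.

```python
import pandas as pd

# Hypothetical data frame with 10 rows (index labels 0-9)
df = pd.DataFrame({"value": range(10)})

first_six = df.head(6)           # first 6 rows: index labels 0, 1, 2, 3, 4, 5
print(first_six.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(len(df.head()))            # head() with no argument returns 5 rows
```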
Assume we have all the packages we need in place, and that all spaces etc. are correct (Canvas sometimes shows strange gaps). Let's download the "homes.csv" file located in the Canvas file folder under the data tab. What code do we need to import the data frame? Use the code we have been using in the notebooks. There are many ways to do this, but the demonstrated approach is the most popular convention, so make sure you type that in. homes = __________
pd.read_csv("homes.csv")
Coach Napier is trying to count his total wins for the 2022 season. Which for loop will help him do so and produce the following output?
Coach Napier's Record at Florida
1 Loss
2 Win
3 Win
4 Win
5 Loss
Games played = 5
Wins to date = 3
print("Coach Napier's Record at Florida")
games_played = 0
games_won = ["Loss", "Win", "Win", "Win", "Loss"]
for N in games_won:
    games_played = games_played + 1
    print(games_played, N)
print("Games played = ", games_played)
print("Wins to date = ", games_won.count("Win"))
Pandas can be imported as import pandas as pd
true
Usually a programmer will use conventional names when importing packages, but it is not strictly necessary. NumPy, for example, can be imported as: import numpy as humpty_dumpty
true
When importing data from a local drive, the relative path is the path FROM your current working directory (where your code is running) TO where your data is
true
Datasets to be joined generally need something in common, like a customer ID. The relationship doesn't need to be one-to-one.
true
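A sketch of a join on a shared key, using two hypothetical tables (customers and orders are made-up names). One customer has two orders, so the relationship is many-to-one rather than one-to-one; the validate argument of merge checks that assumption.

```python
import pandas as pd

# Hypothetical tables sharing a customer_id key
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Ben", "Cal"],
})
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "customer_id": [1, 1, 2, 3],  # customer 1 appears twice
    "amount": [20.0, 35.5, 12.0, 8.25],
})

# Many orders per customer: raises an error if customers had duplicate IDs
joined = orders.merge(customers, on="customer_id", how="left", validate="many_to_one")
print(joined)
```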
NumPy allows us to do more complicated math on lists and other data structures, and it is used in most of the more advanced modules we will use (such as pandas)
true
What is the output of the following code?
y = -0
if y >= 0:
    print('0 or more')
else:
    print('less than 0')
0 or more
Imagine we create a pandas series using the code below. What is one simple way to retrieve the value 0.5 from the series?
import pandas as pd
df = pd.Series([0.75, 0.25, 0.50, 1.0], index=['a', 'b', 'c', 'd'])
df
df['c']
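The same series from the question, showing three equivalent lookups: by label with brackets, by label with .loc, and by position with .iloc.

```python
import pandas as pd

df = pd.Series([0.75, 0.25, 0.50, 1.0], index=['a', 'b', 'c', 'd'])

by_label = df['c']     # 0.5, looked up by index label
by_loc = df.loc['c']   # 0.5, explicit label-based lookup
by_pos = df.iloc[2]    # 0.5, looked up by position (positions start at 0)
print(by_label, by_loc, by_pos)
```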
What package have we been using to import our data, and what abbreviation (as ...) have we been using? import _____ as _______
pandas; pd
Pandas allows us to use multiple different data types (like objects and numbers) in a single table.
true
Look at the code below carefully. It is not at all uncommon to see errors of omission in code chunks like this. How can you fix it so that it produces the output 'array([50, 50, 100])'?
import numpy as np
list1 = [5, 5, 5]
list_2 = [10, 10, 20]
np_list1 = np.array(list1)
np_list2 = np.array(list2)
np_list1*np_list2
Rename list_2 to list2, so that np.array(list2) refers to a variable that exists (otherwise Python raises a NameError)
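The corrected version of the code, with the variable renamed so both lines use list2:

```python
import numpy as np

list1 = [5, 5, 5]
list2 = [10, 10, 20]  # renamed from list_2 so the name matches below

np_list1 = np.array(list1)
np_list2 = np.array(list2)
result = np_list1 * np_list2  # element-wise: [5*10, 5*10, 5*20]
print(result.tolist())        # [50, 50, 100]
```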
Imagine you have a data frame called 'tickets' with 4 columns: ('name', 'address', 'parking_spot', 'number_of_tickets'). If you wanted to subset out 2 columns, what code could you use?
tickets[['name', 'number_of_tickets']] or tickets.loc[:, ['name', 'number_of_tickets']]
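A sketch of both answers on a hypothetical 'tickets' data frame (the row values are made up); the two forms return the same subset.

```python
import pandas as pd

# Hypothetical rows for the 'tickets' data frame described in the question
tickets = pd.DataFrame({
    "name": ["Ann", "Ben"],
    "address": ["1 Main St", "2 Oak Ave"],
    "parking_spot": ["A1", "B7"],
    "number_of_tickets": [3, 0],
})

subset1 = tickets[["name", "number_of_tickets"]]         # double brackets: a list of columns
subset2 = tickets.loc[:, ["name", "number_of_tickets"]]  # all rows (:), listed columns
print(subset1.equals(subset2))  # True
```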
What is the shape of the following NumPy array?
import numpy as np
np.random.seed(1955)
x = np.random.randn(2, 2, 2, 2)
print(x.shape)
x
(2, 2, 2, 2)
Missing values can be imputed (replaced with other values). If my dataset has 1000 rows and 200 missing values for the column age, what could I impute for age? (This question is not asking which values you SHOULD use, just which you COULD use.)
- impute the median
- impute the most common value
- impute the mean
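A sketch of all three options with fillna, on a hypothetical age column (the values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two missing values
df = pd.DataFrame({"age": [22, 35, np.nan, 35, 58, np.nan, 41]})

age_median = df["age"].fillna(df["age"].median())  # impute the median (35)
age_mode = df["age"].fillna(df["age"].mode()[0])   # impute the most common value (35)
age_mean = df["age"].fillna(df["age"].mean())      # impute the mean (38.2)
```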
In the Titanic dataset we used in the videos, I discussed the cabin fare and how some values were really, really big. I mentioned that this is not necessarily a mistake: the fare could in fact be that high, and fares can be widely distributed. This is different from outliers that can't exist (like negative 100 for age). Nevertheless, if we WANTED to fix fare, we could do one of 2 things, demonstrated in the video:
- use some code to replace the outlier with the mean of the values
- use some code to replace the outlier with the median of the values
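A sketch of both fixes on hypothetical fare values. Two assumptions here are not from the video: outliers are flagged with the common "above Q3 + 1.5 × IQR" rule of thumb, and the replacement mean or median is computed from the non-outlier values so the outlier does not pull it upward.

```python
import pandas as pd

# Hypothetical fares with one very large value
fare = pd.Series([7.25, 8.05, 13.0, 26.0, 30.0, 512.33])

# Assumption: flag values above Q3 + 1.5 * IQR as outliers (one common rule of thumb)
q1, q3 = fare.quantile(0.25), fare.quantile(0.75)
upper = q3 + 1.5 * (q3 - q1)
is_outlier = fare > upper

# Replace flagged values with the mean or median of the remaining values
fare_mean_fixed = fare.mask(is_outlier, fare[~is_outlier].mean())
fare_median_fixed = fare.mask(is_outlier, fare[~is_outlier].median())
```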
Let's take a look at the homes dataset. What is the mean Acreage of the homes? How many records (just the count of rows) does the dataset have? What is the mean TotalHeatedSqFt in this dataset? What is the mean TotalBedrooms? What is the mean Total Bathrooms?
Mean Acreage: 0.267
Records (rows): 7478
Mean TotalHeatedSqFt: 2708
Mean TotalBedrooms: 3.98
Mean Total Bathrooms: 2.9
What is the mean of column "A" in the data frame generated below?
import numpy as np
import pandas as pd
rng = np.random.default_rng(7685623456987365)
# create a dataframe using these random values!
df = pd.DataFrame(rng.integers(0, 100, size=(15, 4)), columns=list('ABCD'))
50.33
equal to
==
greater than
>
greater than or equal to
>=
Sometimes when working with (struggling with!) missing values, you find that they are not missing at all! Sometimes someone has been helpful (!) and entered a placeholder value like "THISISMISSING". When that happens, what could you do? (Choose the best answer; it won't be the ONLY possible answer, just the best one here.)
Replace "THISISMISSING" with a missing value (np.nan)
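A sketch on a hypothetical age column where missing entries were typed in as "THISISMISSING". Once the placeholder is a real missing value, the column can be converted to numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical column where missing ages were typed in as a placeholder string
df = pd.DataFrame({"age": ["34", "THISISMISSING", "51", "THISISMISSING"]})

df["age"] = df["age"].replace("THISISMISSING", np.nan)  # placeholder -> real missing value
df["age"] = pd.to_numeric(df["age"])                    # the column can now become numeric
print(df["age"].isna().sum())  # 2
```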
What was the purpose of np.array in the code below?
import numpy as np
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
np_a = np.array(a)
np_b = np.array(b)
Convert the lists 'a' and 'b' to NumPy arrays
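Why the conversion matters, using the same a and b: with plain lists + joins them end to end, while NumPy arrays do element-wise math.

```python
import numpy as np

a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
np_a = np.array(a)
np_b = np.array(b)

list_sum = a + b            # plain lists: + concatenates into a 10-item list
array_sum = np_a + np_b     # arrays: element-wise sum -> [3, 5, 7, 9, 11]
array_prod = np_a * np_b    # element-wise product -> [2, 6, 12, 20, 30]
print(list_sum, array_sum.tolist(), array_prod.tolist())
```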
University of Florida President Kent Fuchs wanted to define a function to count the student enrollment of the top three colleges: Liberal Arts, Engineering, and Business. What two parts are MISSING in his code? His code looks like this:
MISSING college_count(liberal_arts, engineering, business):
    enrollment = liberal_arts + engineering + business
    MISSING enrollment
Note that the function doesn't do anything yet... there are no values for the 3 variables.
def and return
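The completed function with def and return filled in. The enrollment numbers in the call are hypothetical, only there to show the function being used:

```python
def college_count(liberal_arts, engineering, business):
    enrollment = liberal_arts + engineering + business
    return enrollment

# Hypothetical enrollment numbers, only to show a call
total = college_count(100, 200, 300)
print(total)  # 600
```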
Outliers are common in some types of variables; an example discussed was the income variable in an online survey. Imagine you have conducted a survey on shopping habits and received 1000 responses. One of your variables is a question on income. The vast majority of people responded with an income of 50k-200k per year, but 5 individuals responded with an income in the billions. What should you do?
impute/override/fix that value using a mean or median
Now that we have the homes dataset loaded, let's explore a little bit. What are 3 of the ways we have explored a dataset in the course videos? I don't run each of these every time, but you have seen me run each of them many times.
.head(), .info(), .describe()
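A sketch of the three exploration calls. Since homes.csv is not available here, a tiny hypothetical stand-in is read from text; only the column names Acreage and TotalBedrooms come from the questions.

```python
import io
import pandas as pd

# Hypothetical stand-in for homes.csv (three made-up rows)
csv_text = """Acreage,TotalBedrooms
0.25,3
0.50,4
0.10,2
"""
homes = pd.read_csv(io.StringIO(csv_text))

print(homes.head())      # first rows of the data
homes.info()             # column names, dtypes, non-null counts (prints directly)
print(homes.describe())  # count, mean, std, min, quartiles, max for numeric columns
```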
less than or equal to
<=
Price is the variable we are interested in building a model on (later, once we've learned that stuff), which makes missing values and outliers in it particularly important to address. If price has an outlier value that is really, really extreme, what should we do with it?
Delete those rows
We are learning a lot from our exploration of the homes dataset. Let's take a look at price, which is listed as "LastSalePrice". What strikes you as weird about the price variable in this dataset? (This is a non-technical way of asking you what the outlier is.) (Choose the best answer.)
Many homes appear to have been sold for 100
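A sketch of deleting those rows with a boolean filter. The rows are hypothetical, and the cutoff (treating any LastSalePrice of 100 or less as not a real sale) is an assumption for illustration:

```python
import pandas as pd

# Hypothetical rows; only the column name LastSalePrice comes from the questions
homes = pd.DataFrame({"LastSalePrice": [250000, 100, 318500, 100, 189900]})

# Assumption: a sale price of 100 or less is not a real market sale, so drop those rows
homes = homes[homes["LastSalePrice"] > 100]
print(len(homes))  # 3
```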
Steve Jobs wrote some Python code to determine if Apple met its quota for the Apple Newton. What does his code below output?
quota = 10000000
newton = 500000
print("Welcome to Mac")
if newton < quota:
    print("Apple did not meet its quota of", quota, "Newtons.")
print("Thank you for using Mac")
Welcome to Mac
Apple did not meet its quota of 10000000 Newtons.
Thank you for using Mac
What is the output of the following code?
import numpy as np
list1 = [5, 5, 5]
list2 = [10, 10, 10]
np_list1 = np.array(list1)
np_list2 = np.array(list2)
np_list1/np_list2
array([0.5, 0.5, 0.5])
What would be returned by the following code? Assume this is the only code in the workbook; nothing else is loaded or present.
import pandas as pd
today = datetime.datetime.now()
print(now)
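An error: datetime was never imported, so the second line raises a NameError (and print(now) would fail too, since now is never defined)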
A while loop is most closely related to which other type of function or statement?
for loop
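A sketch of the same loop written both ways, to show the relationship: a for loop steps through a sequence, while a while loop repeats until its condition becomes False.

```python
# A for loop and an equivalent while loop, both summing 1 through 5
for_total = 0
for i in range(1, 6):
    for_total += i

while_total = 0
i = 1
while i <= 5:       # keeps repeating as long as the condition is True
    while_total += i
    i += 1          # without this line the condition never becomes False: an infinite loop

print(for_total, while_total)  # 15 15
```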