ALL business analytics Q&A

What is the shape of the following NumPy array? np.random.seed(1955) x = np.random.randn(2, 2, 2, 2) print(x.shape) x # Hint: I have not loaded the necessary packages here, but you should (load pandas, import numpy)

(2, 2, 2, 2)
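
A minimal sketch for checking this answer, assuming NumPy is installed and imported as np. Each argument passed to randn becomes the length of one axis, so the seed affects the values but not the shape.

```python
import numpy as np

np.random.seed(1955)              # the seed fixes the values, not the shape
x = np.random.randn(2, 2, 2, 2)   # four arguments -> four axes of length 2

shape = x.shape   # tuple with one entry per axis
ndim = x.ndim     # number of axes
size = x.size     # total number of elements (2 * 2 * 2 * 2)
print(shape, ndim, size)
```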

Now that we have the homes dataset loaded, let's explore a little bit. What are 3 of the ways we have explored a dataset in the course videos? I don't do each of these every time, but each of these you have seen me run many times.

.info(), .describe(), .head()

What is the output of the following code? y = -0 if y >= 0: print('0 or more') else: print('less than 0')

0 or more
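
A runnable version of the snippet, laid out with the indentation it presumably had before it was flattened onto one line. The variable names is_zero and message are added here for illustration only. The integer -0 is just 0, so the first branch runs.

```python
y = -0                 # for integers, -0 is simply 0
is_zero = (y == 0)     # True

if y >= 0:
    message = '0 or more'
else:
    message = 'less than 0'
print(message)
```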

First create a simple dataframe using the below code. import pandas as pd # create a list of lists data = [['A1', 2, 4, 8], ['A2', 3, 7, 17], ['A3', 1, None, 7], ['A4', 989, 186, 3698], ['A5', 0, 0, None]] # Create the pandas DataFrame df = pd.DataFrame(data, columns=['ID', 'Value 1', 'Value 2', 'Value 3']) df If you ran: df = df.dropna() df How many rows would remain in the dataframe?

3
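
One way to confirm the count, sketched with the same data. The helper names missing_per_row, cleaned, and remaining_ids are not from the original question. By default dropna() drops every row that has at least one missing cell, which here removes A3 and A5.

```python
import pandas as pd

data = [['A1', 2, 4, 8], ['A2', 3, 7, 17], ['A3', 1, None, 7],
        ['A4', 989, 186, 3698], ['A5', 0, 0, None]]
df = pd.DataFrame(data, columns=['ID', 'Value 1', 'Value 2', 'Value 3'])

missing_per_row = df.isna().sum(axis=1)  # number of missing cells in each row
cleaned = df.dropna()                    # keep only rows with no missing cells
remaining_ids = cleaned['ID'].tolist()
print(missing_per_row.tolist(), remaining_ids)
```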

What is the mean of the column "A" in this DataFrame generated below? (choose the closest value) (you may need to import additional packages to run the below code!) import numpy as np rng = np.random.default_rng(768561456987365) #create a dataframe using those random values! df = pd.DataFrame(rng.integers(0,100,size=(15, 4)), columns=list('ABCD'))

58.13

greater than or equal to, less than, not equal to, greater than, equal to, less than or equal to

>=, <, !=, >, ==, <=

What is the purpose of np.array in the below code? a = [6.1, 5.8, 5.97, 5.43, 7.34, 8.67, 6.55, 3.66, 2.31, 6.84] b = [2.5, 3.19, 2.26, 3.17, 8.17, 2.76, 5.22, 9.82, 3.95, 8.38] np_a = np.array(a) np_b = np.array(b)

Convert the lists 'a' and 'b' to NumPy arrays

Price is a variable we are interested in building a model on (later, once we've learned that stuff), which makes missing values and outliers particularly important to address. If price has an outlier value that is really, really extreme, what should we do with it? (The choices I am offering you below are very narrow. There is obviously more we could do, but given what you see in the dataset, and what I have said before about this issue, what would you do?)

Delete those rows

Matplotlib is built on top of seaborn (uses seaborn code)

FALSE

for pandas to work, data must be formatted as lists before it is imported

FALSE

Missing values can be imputed/replaced with other values. If my dataset has 1000 rows and 200 missing values for the category age, what could I impute for age? (This question is not asking which of these values you SHOULD use, just which you COULD use.)

Impute the mean; impute the most common value; impute the median
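
The three options can be sketched with pandas as follows. The age values are made up for illustration, and this shows what is possible, not which option is the right choice for a given dataset.

```python
import pandas as pd

# Hypothetical ages; None marks a missing response.
age = pd.Series([22, 30, 30, None, 58])

mean_filled = age.fillna(age.mean())       # mean of the observed values
median_filled = age.fillna(age.median())   # middle observed value
mode_filled = age.fillna(age.mode()[0])    # most common observed value

print(mean_filled[3], median_filled[3], mode_filled[3])
```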

Outliers are common in some types of variables, an example discussed was the income variable in an online survey. Imagine you have conducted a survey on shopping habits, and receive 1,000 responses. One of your variables is a question on income. The vast majority of people respond with an income of 50k-200k per year. 5 individuals respond with an income in the billions. What should you do?

Impute/override/fix that value using a mean or median

Within a for loop, which line of code would you use to increase the number within a variable?

NONE OF THESE (the options offered were: count + '1'; + '1' count; number = count + '1'; add(count, '1'))
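
None of the listed options works because each one adds the string '1' instead of the integer 1, or never assigns the result back to the variable. A minimal sketch of a line that does work (the loop over range(3) is just an example):

```python
count = 0
for _ in range(3):
    count = count + 1   # add the integer 1 and store the result back in count
    # count += 1 is the equivalent shorthand

print(count)
```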

Sometimes when working with (struggling with!) missing values, you find that the value is not missing at all! Sometimes someone has been helpful (!) and entered a placeholder value like "THISISMISSING". When that happens, what could you do? (Choose the best answer. It won't be the ONLY possible answer, just the best one here.)

Replace "THISISMISSING" with a missing value (np.nan)
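
A small sketch of that replacement, assuming pandas and NumPy are installed. The DataFrame and its age column are hypothetical. Once the placeholder is a real missing value, the column can be converted to numbers and handled by the usual missing-value tools.

```python
import numpy as np
import pandas as pd

# Hypothetical column in which someone typed a placeholder string.
raw = pd.DataFrame({'age': ['34', 'THISISMISSING', '51']})

cleaned = raw.replace('THISISMISSING', np.nan)   # placeholder -> real missing value
cleaned['age'] = pd.to_numeric(cleaned['age'])   # the column can now be numeric
n_missing = int(cleaned['age'].isna().sum())
print(n_missing, cleaned['age'].mean())
```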

How many outputs will this following code have? mosquito = 1 while mosquito > 0: print (mosquito) mosquito = mosquito + 1 print (mosquito)

There are an infinite number of mosquitos (the loop condition never becomes false, so the loop prints forever)
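
For contrast, here is a sketch of the same loop with a stopping condition. The upper bound of 3 and the outputs list are assumptions added for illustration, not part of the original question.

```python
mosquito = 1
outputs = []

# The "<= 3" bound is an added assumption so that the loop can end.
while mosquito > 0 and mosquito <= 3:
    outputs.append(mosquito)   # stands in for print(mosquito)
    mosquito = mosquito + 1

print(outputs, mosquito)
```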

Imagine we have a dataframe, df. What would be the purpose for running code like the one below? (why would we run it?) df.loc[1]

To look up and retrieve values from df (here, the row whose index label is 1)
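
A short sketch of the difference between label-based and position-based lookup. The DataFrame and its index labels are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose index labels start at 1 instead of 0.
df = pd.DataFrame({'value': [10, 20, 30]}, index=[1, 2, 3])

by_label = df.loc[1]       # the row whose index LABEL is 1
by_position = df.iloc[1]   # the row at POSITION 1 (the second row)

print(by_label['value'], by_position['value'])
```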

Datasets to be joined generally need something in common, like a customer ID. The relationship does not need to be 1 to 1. (E.g., Customer ID 75883 may occur once in the first dataset and many times in the second dataset.)

True
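
A one-to-many join can be sketched with pandas merge. The two tables, their column names, and their values are hypothetical. Customer 75883 appears once in the first table and three times in the second.

```python
import pandas as pd

# Hypothetical tables sharing a customer_id column.
customers = pd.DataFrame({'customer_id': [75883, 10001],
                          'name': ['Ana', 'Ben']})
orders = pd.DataFrame({'customer_id': [75883, 75883, 75883, 10001],
                       'amount': [20, 35, 15, 50]})

# Each customer row is repeated once per matching order row.
joined = customers.merge(orders, on='customer_id', how='inner')
rows_for_75883 = int((joined['customer_id'] == 75883).sum())
print(len(joined), rows_for_75883)
```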

Heatmaps can be used to quickly understand correlated variables in a dataset

True

Pandas can be used to join two data frames together

True

When creating a chart using seaborn, it is possible to make formatting changes to the chart using matplotlib code.

True

When importing data from a local drive, the relative path was defined as the path FROM where your code in your current working directory is, TO where your data is.

True

When using matplotlib, if the color is the only part of the format string, you can use any matplotlib.colors spec (e.g., full names like "red" or hex strings)

True

In the titanic dataset we used in the videos: I discussed the cabin fare for the titanic, and how some values were really, really big. I mentioned that this is not necessarily a mistake; the fare could in fact be that high and be distributed this widely. This is different than if you see outliers that can't exist (like negative 100 for age). Nevertheless, if we WANTED to fix fare and remove the outlier fare, we could do one of two things, demonstrated in the video.

Use some code to replace the outlier with the mean of the values OR Use some code to replace the outlier with the median of the values

What would be returned by the following code? Assume this is the only code in the workbook, nothing else is loaded or present. import pandas as pd today = datetime.datetime.now() print(now)

an error
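
The snippet fails for two reasons: the datetime module is never imported, and it prints now although the variable is named today. A corrected sketch:

```python
import datetime   # the module the original snippet never imported

today = datetime.datetime.now()   # current local date and time
print(today)                      # print the name that was actually assigned
```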

What is the output of the following code? import numpy as np list1 = [5, 5, 5] list2 = [10, 10, 10] np_list1 = np.array(list1) np_list2 = np.array(list2) np_list1/np_list2

array([0.5, 0.5, 0.5])

Look at the below code carefully. It is not at all uncommon to see errors of omission in code chunks like this. How can you fix the below so that it produces the output 'array([50, 50, 100])'? import numpy as np list1 = [5,5,5] list2 = [10,10,20] np_list1 = np.array(list1) np_list2 = np.array(list2) np_list1 is_and np_list2

change "np_list1 is_and np_list2" to "np_list1*np_list2"
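
The corrected chunk, as a runnable sketch (the name product is added here for illustration). The * operator multiplies NumPy arrays element by element, which plain Python lists do not support.

```python
import numpy as np

list1 = [5, 5, 5]
list2 = [10, 10, 20]
np_list1 = np.array(list1)
np_list2 = np.array(list2)

product = np_list1 * np_list2   # elementwise: 5*10, 5*10, 5*20
print(product)
```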

University of Florida President Kent Fuchs wanted to define a function to count the student enrollment of the top three colleges: Liberal Arts, Engineering, and Business. What two parts are MISSING in his code? His code looks like this: MISSING college_count(liberal_arts, engineering, business): enrollment = liberal_arts + engineering + business MISSING enrollment

def and return
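
With both keywords filled in, the function looks like this. The enrollment figures in the call are made up for illustration.

```python
def college_count(liberal_arts, engineering, business):
    # def starts the function definition
    enrollment = liberal_arts + engineering + business
    return enrollment   # return hands the result back to the caller

# Hypothetical enrollment numbers.
total = college_count(12000, 9000, 7000)
print(total)
```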

Imagine we have a pandas dataframe we have named 'df'. The dataframe consists of 2 columns. "col1" is 30 values long, and is a random mix of the letters 'a', 'b', and 'c'. "num1" is also 30 values long, and is a random set of numerical data (all integers). Which of the following would give you the mean of the numerical (num1) column, grouped by the values from column "col1"?

df.groupby('col1').mean()
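
A small sketch of the grouped mean, using a six-row stand-in for the 30-row dataframe described in the question. Selecting the column explicitly, as in the second call, is an optional variation that returns only num1.

```python
import pandas as pd

# Six made-up rows standing in for the 30-row dataframe.
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'c', 'b', 'c'],
                   'num1': [1, 2, 3, 4, 6, 8]})

all_means = df.groupby('col1').mean()            # mean of every numeric column
num1_means = df.groupby('col1')['num1'].mean()   # mean of num1 only
print(num1_means)
```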

Assume all packages are loaded that need to be to make the code run successfully. Assume the test data ('test.csv') is loaded into your environment and named 'df'. So something like df = pd.read_csv("../data/test.csv") What would produce the below result? (First 6 rows, starting from 0-5)

df.head(6) (Remember: row numbering starts at zero, so the first 6 rows are rows 0-5.)

Imagine we create a pandas series using the below code. What is one simple way to retrieve the value 0.5 from the series? import pandas as pd df = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd']) df

df['b']
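
The bracket lookup is the simple answer. The sketch below also shows two equivalent alternatives, .loc (by label) and .iloc (by position). The series is named s here instead of df to avoid confusion with a dataframe.

```python
import pandas as pd

s = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

by_brackets = s['b']      # label lookup with square brackets
by_loc = s.loc['b']       # explicit label-based lookup
by_position = s.iloc[1]   # second element by position

print(by_brackets, by_loc, by_position)
```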

Assume we have all packages in place we need. Assume all spaces etc are correct. Canvas sometimes shows strange gaps etc. Let's download the "homes.csv" file located in the Canvas file folder under the data tab. What code do we need to import that dataframe? Use the code we have been using from the notebooks. There are many ways to do this, but the demonstrated approach is the most popular convention. So make sure you type that in. Assume the data file is in the same folder and location as your code. In other words you do not need to create a relative path or any path for that matter. You just need the name of the file. We will not use a path variable. Just the name of the file. Note that in week 3 we did this with the "segments.csv" dataset, so you could refer to that as an example.

homes = pd.read_csv("homes.csv")

Company is a list containing 4 strings, sales is a list with 4 integers. Which of these code snippets would create a bar chart?

import matplotlib.pyplot as plt plt.bar(company, sales, color='grey')

What package have we been using to import our data, and what is the abbreviation (as..) we have been using? This would look something like the below in a line of code. Note I am not asking what would work... but rather what has been demonstrated in the class notebooks. import PACKAGE as ABBREVIATION

import pandas as pd

What is the purpose of the code below? %matplotlib inline

make the plots show up inline

Will the following nested if statement run? If not, why? y = -0 if y > 0 if y > 5 print('higher') if y <= 5 print('lower') if y <=0 print ('0 or less')

No, syntax error (the if statements are missing their colons)
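
A version that does run, with the colons added. The indentation is an assumption, since the original was flattened onto one line: the two inner checks are taken to be nested under y > 0. The messages list stands in for the print calls.

```python
y = -0
messages = []

if y > 0:                       # every if line needs a trailing colon
    if y > 5:
        messages.append('higher')
    if y <= 5:
        messages.append('lower')
if y <= 0:
    messages.append('0 or less')

print(messages)
```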

Coach Napier is trying to count his total wins for the 2022 season. Which for loop function will help him do so and produce the following output: Coach Napier's Record at Florida 1 Loss 2 Win 3 Win 4 Win 5 Loss Games played = 5 Wins to date = 3

print("Coach Napier's Record at Florida") games_played = 0 games_won = ["Loss", "Win", "Win", "Win", "Loss"] for N in games_won: games_played = games_played + 1 print(games_played, N) print("Games played =", games_played) print("Wins to date =", games_won.count("Win"))

When working with a pandas dataframe, what is one advantage seaborn has over using native matplotlib to visualize two of the columns? Note the question does not ask about using matplotlib in pandas.

seaborn can use columns from pandas. Matplotlib requires additional formatting of data.

Assuming this is a complete code chunk, and we expect to see output printed after running this, why is the below code incorrect? (Choose the best answer; it may not be a great answer!) if tom brady == the goat: [TAB] print("The Bucs just won another Super Bowl")

the variables are not defined

Imagine you have a dataframe, called 'tickets', with 4 columns: ('name', 'address', 'parking_spot', 'number_of_tickets'). If you wanted to subset out 2 columns, what code could you use? (Choose all that apply.) (By subset, I mean show just 2 of the 4 columns, not the entire dataframe.)

tickets.loc[:,['name', 'number_of_tickets']] OR tickets[['name', 'number_of_tickets']]
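
Both forms return the same two-column dataframe. A sketch with made-up rows:

```python
import pandas as pd

# Hypothetical rows for the four columns named in the question.
tickets = pd.DataFrame({
    'name': ['Ana', 'Ben'],
    'address': ['1 Main St', '2 Oak Ave'],
    'parking_spot': ['A1', 'B7'],
    'number_of_tickets': [0, 3],
})

by_loc = tickets.loc[:, ['name', 'number_of_tickets']]   # all rows, two columns
by_brackets = tickets[['name', 'number_of_tickets']]     # list of column names

print(by_loc.equals(by_brackets))
```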

Usually a programmer will use conventional names when importing packages. But it is not strictly necessary. numpy for example can be imported as: import numpy as humpty_dumpty

true

NumPy allows us to do more complicated math on lists and other data structures, and is used in most of the more advanced modules we will use (such as pandas)

true

pandas allows us to use multiple different data types (like objects and numbers) in a single table.

true

pandas can be imported as import pandas as pd

true

pandas has functionality to work with complicated dates.

true

Which is the correct IF statement to determine if you're accelerating, decelerating, or staying at constant velocity?

x= -0.4 if x == 0: [TAB]print( "you're cruising") if x > 0: [TAB]print("you feel the need, the need for speed!") if x < 0: [TAB]print("you're losing speed!")
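
Because the three cases are mutually exclusive, the same logic can also be written as an if/elif/else chain. This is an alternative sketch, not the answer from the quiz, and describe_motion is a hypothetical helper name.

```python
def describe_motion(x):
    """Return a message for acceleration x (hypothetical helper)."""
    if x == 0:
        return "you're cruising"
    elif x > 0:
        return "you feel the need, the need for speed!"
    else:
        return "you're losing speed!"

message = describe_motion(-0.4)
print(message)
```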

