01:198:439 Python and Pandas
When was Python released?
1991
Consider the function: def foo(n): return lambda a: a**n. What is the output of (foo(5))(2)?
2**5 = 32
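A minimal sketch of this card as runnable Python, using a colon instead of braces. The name `foo` comes from the card; `power_of_five` and `result` are hypothetical names added for illustration.

```python
def foo(n):
    # Return a function that raises its argument to the power n.
    # The lambda "remembers" n from the enclosing call (a closure).
    return lambda a: a ** n


power_of_five = foo(5)        # hypothetical name, for illustration only
result = power_of_five(2)     # same as (foo(5))(2)
print(result)                 # 32
```

Note that `**` is exponentiation in Python, while `^` is bitwise XOR.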
What is the output of the following python code? for x in range(3,10): print(x);
3 4 5 6 7 8 9
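How many bytes does each value in an Int32 pandas column use?
4 bytes
How many bytes does each value in a Float64 pandas column use?
8 bytes
How many bytes does each value in an Object pandas column use?
8 bytes (a pointer to the Python object on a 64-bit build; the object itself is stored separately)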
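One way to check these per-value sizes is to read `dtype.itemsize`. This is a sketch that assumes the NumPy-backed `int32`, `float64`, and `object` dtypes and a 64-bit Python build; the Series contents are made up.

```python
import pandas as pd

# Hypothetical sample columns, one per dtype discussed above.
ints = pd.Series([1, 2, 3], dtype="int32")
floats = pd.Series([1.0, 2.0, 3.0], dtype="float64")
objs = pd.Series(["a", "bb", "ccc"], dtype=object)

int_bytes = ints.dtype.itemsize      # bytes per int32 value
float_bytes = floats.dtype.itemsize  # bytes per float64 value
obj_bytes = objs.dtype.itemsize      # bytes per pointer, not per string

print(int_bytes, float_bytes, obj_bytes)
```

For object columns, `memory_usage(deep=True)` also counts the Python objects the pointers refer to, so it reports more than 8 bytes per row.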
True or false. All elements in a Series must be the same type.
False. A Series can contain strings, integers, and other types at the same time; its dtype is then object.
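A small sketch showing a Series that mixes element types. The values are made up for illustration.

```python
import pandas as pd

# A string, an integer, and a float in the same Series.
mixed = pd.Series(["text", 42, 3.14])

# pandas falls back to the generic object dtype for mixed contents.
print(mixed.dtype)

# Each element keeps its own Python type.
element_types = [type(v).__name__ for v in mixed]
print(element_types)
```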
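True or false. Python is slower than Java.
True, in general. Python is dynamically typed and interpreted, so types are checked at run time and every value is a full object; Python programs therefore tend to run slower and use more memory.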
Which of the following statements is/are true about Python? Choose all that apply. a) Python is an interpreted language b) Python data variables have types such as int c) Python is easy to learn d) Python is a preferred language for web application builders e) Python interpreter is written in C
a) Python is an interpreted language c) Python is easy to learn d) Python is a preferred language for web application builders e) Python interpreter is written in C
Consider the following DataFrame df that contains election results: data = {'Candidate': ['Obama', 'McCain', 'Obama', 'Romney', 'Clinton', 'Trump'], 'Party': ['D', 'R', 'D', 'R', 'D', 'R'], '%': [52.9, 45.7, 51.1, 47.2, 48.2, 46.1], 'Year': [2008, 2008, 2012, 2012, 2016, 2016], 'Result': ['W', 'L', 'W', 'L', 'L', 'W']} df = pd.DataFrame.from_dict(data) What is the output of the following code? Choose all that apply. df.loc[(df['Result'] == 'W') & (df['%'] < 50), 'Candidate':'%'] a) a DataFrame that contains the presidents who won with less than 50% of the vote b) a pandas Series that contains the presidents who won with less than 50% of the vote c) a list of presidents who won with less than 50% of the vote d) a DataFrame that contains 3 columns
a) a DataFrame that contains the presidents who won with less than 50% vote d) A data frame that contains 3 columns
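A runnable sketch of this card, assuming the condition compares against 'W', which is the value actually stored in the Result column. The name `winners_under_50` is hypothetical.

```python
import pandas as pd

data = {
    "Candidate": ["Obama", "McCain", "Obama", "Romney", "Clinton", "Trump"],
    "Party": ["D", "R", "D", "R", "D", "R"],
    "%": [52.9, 45.7, 51.1, 47.2, 48.2, 46.1],
    "Year": [2008, 2008, 2012, 2012, 2016, 2016],
    "Result": ["W", "L", "W", "L", "L", "W"],
}
df = pd.DataFrame.from_dict(data)

# Row selector: a boolean mask. Column selector: a label slice.
# Label slices in .loc include both endpoints, so this keeps
# Candidate, Party, and % (3 columns).
winners_under_50 = df.loc[
    (df["Result"] == "W") & (df["%"] < 50), "Candidate":"%"
]
print(winners_under_50)
```

Because the column selector is a slice rather than a single label, the result is a DataFrame, not a Series.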
Which of the following is/are true about pandas? Choose all that apply. a) pandas is an open source library b) pandas is used for data analysis c) pandas is great for data munging d) using pandas is less efficient than using python
a) pandas is an open source library b) pandas is used for data analysis c) pandas is great for data munging
When would Java/C be used for data science?
building big data systems
Consider a DataFrame df that has columns labeled [quizzes, midterm, finals]. You would like to create a new DataFrame that contains only the rows where the finals score is > 50. Which of the following expressions can be used to find the answer? a) df ['finals'] >15 b) df.loc['finals] > 50] c) df.loc[(df ['finals'] > 50)]
c) df.loc[(df ['finals'] > 50)]
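A minimal sketch of boolean-mask filtering with `.loc`. The scores are made-up sample data, and `mask` and `passed` are hypothetical names.

```python
import pandas as pd

# Hypothetical scores for three students.
df = pd.DataFrame(
    {
        "quizzes": [80, 65, 90],
        "midterm": [70, 40, 85],
        "finals": [55, 45, 95],
    }
)

# The comparison alone gives a boolean Series, one entry per row.
mask = df["finals"] > 50

# Passing the mask to .loc keeps only the rows where it is True.
passed = df.loc[mask]
print(passed)
```

This is why a bare comparison such as option a) is not enough: it returns True/False values rather than the filtered rows.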
Assume that DataFrame df has 100000 rows and 10 columns with the following types of blocks/columns: 5 Int32 blocks, 3 Float64 blocks, and 2 Object blocks (assume that each object value takes 8 bytes). What is the most likely memory utilization (in bytes) of this pandas DataFrame? (Note that this is only an approximation based on the data types being used.) a) ~ 5 MB b) ~ 2 MB c) ~ 6 MB d) ~ 4 MB
c) ~ 6 MB
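The estimate can be worked out as plain arithmetic. This sketch uses the per-value sizes given in the cards above and ignores the index and the objects that the object columns point to.

```python
rows = 100_000

# Bytes used per row: 5 int32 + 3 float64 + 2 object pointers.
bytes_per_row = 5 * 4 + 3 * 8 + 2 * 8   # 20 + 24 + 16 = 60

total_bytes = rows * bytes_per_row
print(total_bytes)                # 6000000
print(total_bytes / 1_000_000)    # 6.0 (MB, using 1 MB = 10**6 bytes)
```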
When would MATLAB be used for data science?
fast and efficient matrix operations
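What does it mean to say that Python is a type-free language?
Python is dynamically typed: variables are not declared with a type, and the type belongs to the value and is checked at run time, so you do not have to specify types when writing code.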
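A small sketch of dynamic typing: the same name can be rebound to values of different types, and the type travels with the value. The variable names are hypothetical.

```python
x = 5
first_type = type(x).__name__    # the value 5 is an int

x = "five"                       # rebinding the same name is allowed
second_type = type(x).__name__   # the value "five" is a str

print(first_type, second_type)
```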
Does range(n) include n?
no
What's the difference between data structures in numpy vs pandas?
numpy: low-level data structure (np.ndarray) for multi-dimensional arrays/matrices. pandas: high-level data structures (Series, DataFrame) for labeled, tabular data.
Who uses Python for data science?
programmers
Who uses R for data science?
statisticians
What does range(17,100,2) return?
a range object that yields all odd numbers from 17 through 99 (the stop value 100 is excluded)
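A quick sketch to check the `range` behaviour from these cards. `odds` is a hypothetical name; `list()` is used only to make the values visible.

```python
# start=17, stop=100 (excluded), step=2
odds = list(range(17, 100, 2))

print(odds[0], odds[-1])   # first and last values produced
print(len(odds))           # how many values there are

# The stop value is never included, with or without a step.
stop_included = 100 in range(17, 100, 2) or 3 in range(3)
print(stop_included)
```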
What are lambda functions?
anonymous inline functions defined with the lambda keyword; they are limited to a single expression and do not need a name
Is memory management manual or automatic in Python?
automatic
How do you identify bugs in your code? Choose all that apply. a) I ask a friend b) I write some assert statements c) I use PyChecker d) I run the code with thousands of data samples
b) I write some assert statements c) I use PyChecker
Consider the following figure, where the DataFrame df on the left has two columns, keys and values. df: A | 3 B | 1 C | 4 A | 1 B | 5 C | 9 A | 2 D | 5 B | 6 -> A | 6 B | 12 C | 13 D | 5 What code might produce the result on the right, with 4 rows and two columns? Choose all that apply. a) all of these b) df.groupby('key').agg(sum) c) df.groupby('key').sum() d) df.sum().groupby('key')
b) df.groupby('key').agg(sum) c) df.groupby('key').sum()
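A runnable sketch of the figure. The column name `key` comes from the answer options; the column name `value` is an assumption, since the figure does not name it. The string form `agg("sum")` is used here as an equivalent of `agg(sum)`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key": ["A", "B", "C", "A", "B", "C", "A", "D", "B"],
        "value": [3, 1, 4, 1, 5, 9, 2, 5, 6],   # assumed column name
    }
)

# Split rows by key, then sum the values inside each group.
totals = df.groupby("key").sum()
totals_agg = df.groupby("key").agg("sum")

print(totals)
```

In the result, `key` becomes the index; calling `.reset_index()` turns it back into an ordinary column if two real columns are wanted.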
Consider the following code, where people is a DataFrame and name and Color are two of its columns. Which of the following must be true about this code? Choose all that apply. grps = people.groupby('Color')['name'] a) grps is a groupby DataFrame b) grps is a Groupby Series c) creates groups with Color as a key d) creates groups with Color as a value e) the total number of groups is equal to number of unique colors in the DataFrame
b) grps is a Groupby Series c) creates groups with Color as a key e) the total number of groups is equal to number of unique colors in the DataFrame
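A short sketch that checks these answers on made-up data. The rows in `people` are hypothetical; only the column names come from the card.

```python
import pandas as pd

# Hypothetical sample data.
people = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee"],
        "Color": ["red", "blue", "red", "green"],
    }
)

grps = people.groupby("Color")["name"]

# Selecting a single column from a groupby gives a SeriesGroupBy object.
type_name = type(grps).__name__
print(type_name)

# One group per distinct Color value.
n_groups = grps.ngroups
n_colors = people["Color"].nunique()
print(n_groups, n_colors)

# The group keys are the Color values.
group_keys = sorted(grps.groups.keys())
print(group_keys)
```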