01:198:439 Python and Pandas


When was Python released?

1991

Consider the function: def foo(n): return lambda a: a**n. What is the output of (foo(5))(2)?

32 (2**5; note that ^ is bitwise XOR in Python, not exponentiation)
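
A minimal sketch of the closure in this card, written with valid Python syntax. The name power_of_five is only an illustrative label and is not part of the original question.

```python
def foo(n):
    # Return a function that raises its argument to the power n.
    # The lambda "remembers" n from the enclosing call (a closure).
    return lambda a: a ** n

power_of_five = foo(5)       # equivalent to the inner lambda a: a ** 5
result = power_of_five(2)    # same as (foo(5))(2)
print(result)                # 32
```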

What is the output of the following Python code? for x in range(3,10): print(x)

3 4 5 6 7 8 9 (each number printed on its own line)

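How many bytes does each value in an Int32 pandas column take?

4 bytes

How many bytes does each value in a Float64 pandas column take?

8 bytes

How many bytes does each value in an Object pandas column take?

8 bytes (a pointer to a Python object; the object itself is stored separately and takes additional memory)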
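
The per-value sizes in the three cards above can be checked with a small sketch like the following. The column names a, b, and c and their contents are made up for illustration, and the 8-byte figure for object columns assumes a 64-bit Python build.

```python
import pandas as pd

# Hypothetical three-column frame, one column per dtype from the cards above.
df = pd.DataFrame({
    "a": pd.Series([1, 2, 3], dtype="int32"),
    "b": pd.Series([1.0, 2.0, 3.0], dtype="float64"),
    "c": pd.Series(["x", "y", "z"], dtype=object),
})

# itemsize is the number of bytes used per stored value.
sizes = {name: df[name].dtype.itemsize for name in df.columns}
print(sizes)
```

For the object column, itemsize only counts the pointer stored in the column. Passing deep=True to df.memory_usage() also counts the Python objects those pointers refer to.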

True or false. All elements in a Series must be of the same type.

False. A Series can hold strings, integers, floats, and so on at the same time; pandas then stores it with the generic object dtype.
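
A short sketch of a mixed-type Series, assuming nothing beyond a standard pandas install. The values are arbitrary examples.

```python
import pandas as pd

# One string, one integer, one float in the same Series.
mixed = pd.Series(["a", 1, 2.5])

print(mixed.dtype)                          # object
types = [type(v).__name__ for v in mixed]   # each value keeps its own Python type
print(types)
```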

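True or false. Python is slower than Java.

True, in general. Python is dynamically typed and interpreted: types are checked at run time and every value is a full object, so Python programs tend to run slower and use more memory than equivalent Java programs.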

Which of the following statements is/are true about Python? Choose all that apply. a) Python is an interpreted language b) Python data variables have types such as int c) Python is easy to learn d) Python is a preferred language for web application builders e) Python interpreter is written in C

a) Python is an interpreted language c) Python is easy to learn d) Python is a preferred language for web application builders e) Python interpreter is written in C

Consider the following DataFrame df that contains election results:
data = {'Candidate': ['Obama', 'McCain', 'Obama', 'Romney', 'Clinton', 'Trump'],
        'Party': ['D', 'R', 'D', 'R', 'D', 'R'],
        '%': [52.9, 45.7, 51.1, 47.2, 48.2, 46.1],
        'Year': [2008, 2008, 2012, 2012, 2016, 2016],
        'Result': ['W', 'L', 'W', 'L', 'L', 'W']}
df = pd.DataFrame.from_dict(data)
What is the output of the following code? Choose all that apply.
df.loc[(df['Result'] == 'W') & (df['%'] < 50), 'Candidate':'%']
a) a DataFrame that contains the presidents who won with less than 50% vote
b) a pandas Series that contains presidents who won with less than 50% of vote
c) a list of presidents who won with less than 50% vote
d) A data frame that contains 3 columns

a) a DataFrame that contains the presidents who won with less than 50% vote d) A data frame that contains 3 columns
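
A runnable sketch of this card. It assumes the filter compares against 'W', the value actually stored in the Result column; comparing against the string 'win' matches no rows and yields an empty DataFrame that still has the same three columns.

```python
import pandas as pd

data = {
    "Candidate": ["Obama", "McCain", "Obama", "Romney", "Clinton", "Trump"],
    "Party": ["D", "R", "D", "R", "D", "R"],
    "%": [52.9, 45.7, 51.1, 47.2, 48.2, 46.1],
    "Year": [2008, 2008, 2012, 2012, 2016, 2016],
    "Result": ["W", "L", "W", "L", "L", "W"],
}
df = pd.DataFrame.from_dict(data)

# Row selector: boolean mask. Column selector: label slice, inclusive on both ends.
subset = df.loc[(df["Result"] == "W") & (df["%"] < 50), "Candidate":"%"]
print(type(subset).__name__)     # DataFrame
print(list(subset.columns))      # ['Candidate', 'Party', '%']
print(subset)

# Same query with a value that never occurs in the Result column.
empty = df.loc[(df["Result"] == "win") & (df["%"] < 50), "Candidate":"%"]
print(len(empty), list(empty.columns))
```

The label slice 'Candidate':'%' includes both endpoints, which is why Party appears between them and the result has three columns.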

Which of the following is/are true about pandas? Choose all that apply. a) pandas is an open source library b) pandas is used for data analysis c) pandas is great for data munging d) using pandas is less efficient than using python

a) pandas is an open source library b) pandas is used for data analysis c) pandas is great for data munging

When would Java/C be used for data science?

building big data systems

Consider a DataFrame df that has columns labeled [quizzes, midterm, finals]. You would like to create a new DataFrame that contains only the rows where the finals score is > 50. Which of the following expressions can be used to find the answer? a) df['finals'] > 15 b) df.loc['finals] > 50] c) df.loc[(df['finals'] > 50)]

c) df.loc[(df['finals'] > 50)]
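
A small sketch of the boolean-mask filter in option c). The scores below are invented purely for illustration.

```python
import pandas as pd

# Hypothetical scores; only the column names come from the card.
df = pd.DataFrame({
    "quizzes": [8, 9, 7],
    "midterm": [70, 55, 40],
    "finals": [45, 80, 51],
})

mask = df["finals"] > 50    # boolean Series, one True/False per row
passed = df.loc[mask]       # keeps only the rows where the mask is True
print(passed)
```

Option a) on its own only produces the boolean Series (and uses the wrong threshold); it is the df.loc[...] wrapper that turns the mask into a filtered DataFrame.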

Assume that DataFrame df has 100000 rows and 10 columns with the following types of blocks/columns: 5 Int32 blocks, 3 float64 blocks and 2 object blocks (assume that each value in an object block takes 8 bytes). What is the most likely memory utilization (in bytes) of this pandas DataFrame? (Note: this is only an approximation based on what data is being stored.) a) ~ 5 MB b) ~ 2 MB c) ~ 6 MB d) ~ 4 MB

c) ~ 6 MB
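
The estimate can be worked out with plain arithmetic. This sketch uses the per-value sizes stated in the question and ignores the index and any per-object overhead, so it is an approximation only.

```python
rows = 100_000

# Bytes per stored value, as given in the question.
bytes_per_value = {"int32": 4, "float64": 8, "object": 8}
# Number of columns of each type.
column_counts = {"int32": 5, "float64": 3, "object": 2}

# 5*4 + 3*8 + 2*8 = 60 bytes per row
bytes_per_row = sum(bytes_per_value[k] * column_counts[k] for k in column_counts)
total_bytes = rows * bytes_per_row
print(bytes_per_row, total_bytes)   # 60 6000000, i.e. about 6 MB
```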

When would MATLAB be used for data science?

fast and efficient matrix operations

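What does it mean when Python is called a type-free language?

It is dynamically typed: variables are not declared with a type, a name can be rebound to a value of any type, and types are checked at run time rather than before the program runs. Values themselves still have types.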
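
A brief sketch of what dynamic typing looks like in practice. The variable names are arbitrary.

```python
x = 42
first = type(x).__name__     # 'int'

x = "forty-two"              # the same name can be rebound to a str
second = type(x).__name__    # 'str'

# Values still have types, and mismatches are caught at run time.
try:
    x + 1
    error = None
except TypeError as exc:
    error = type(exc).__name__

print(first, second, error)  # int str TypeError
```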

Does range(n) include n?

no

What's the difference between data structures in numpy vs pandas?

numpy: low-level data structure (np.array) for large multidimensional arrays/matrices
pandas: high-level data structures (DataFrame) for tabular data

Who uses Python for data science?

programmers

Who uses R for data science?

statisticians

What does range(17,100,2) return?

a range object that yields all odd numbers from 17 through 99
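
The range behavior covered by the cards in this set can be checked with a short sketch like this one.

```python
# Start at 17, stop before 100, step by 2.
odds = list(range(17, 100, 2))
print(odds[0], odds[-1], len(odds))   # 17 99 42

# The stop value is never included.
print(list(range(3, 10)))             # [3, 4, 5, 6, 7, 8, 9]
print(5 in range(5))                  # False
```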

What are lambda functions?

an anonymous function defined inline as a single expression (it needs no name, although it can be assigned to one)
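
Two common ways a lambda is used, as a minimal sketch. The names square, words, and by_length are illustrative only.

```python
# A lambda can be bound to a name like any other value...
square = lambda a: a ** 2
print(square(4))                                  # 16

# ...but is more often passed inline, for example as a sort key.
words = ["pandas", "numpy", "py"]
by_length = sorted(words, key=lambda w: len(w))
print(by_length)                                  # ['py', 'numpy', 'pandas']
```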

Is memory management manual or automatic in Python?

automatic

How do you identify bugs in your code? Choose all that apply. a) I ask a friend b) I write some assert statements c) I use PyChecker d) I run the code with thousands of data samples

b) I write some assert statements c) I use PyChecker
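
As an illustration of option b), here is a sketch of an assert statement guarding a function's input. The mean function is a hypothetical example, not something from the course material.

```python
def mean(values):
    # Fail early with a clear message instead of a confusing error later.
    assert len(values) > 0, "mean() needs at least one value"
    return sum(values) / len(values)

average = mean([2, 4, 6])
print(average)   # 4.0

try:
    mean([])
    caught = False
except AssertionError:
    caught = True
print(caught)
```

Assertions are skipped when Python runs with the -O flag, so they suit debugging checks rather than validation that must always run.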

Consider the following figure, where the DataFrame df on the left has two columns holding keys and values, and the result is shown on the right. What code might be producing the result, with 4 rows and two columns? Choose all that apply.
df:
A | 3
B | 1
C | 4
A | 1
B | 5
C | 9
A | 2
D | 5
B | 6
Result:
A | 6
B | 12
C | 13
D | 5
a) all of these b) df.groupby('key').agg(sum) c) df.groupby('key').sum() d) df.sum().groupby('key')

b) df.groupby('key').agg(sum) c) df.groupby('key').sum()
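
A sketch that rebuilds the figure's data and reproduces the grouped totals. The column name key comes from the answer options; the column name value is an assumption, since the card does not name the second column.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C", "A", "D", "B"],
    "value": [3, 1, 4, 1, 5, 9, 2, 5, 6],   # "value" is an assumed column name
})

# Split rows by key, then add up the values within each group.
totals = df.groupby("key").sum()
print(totals)

# The same aggregation requested by name through agg.
totals_agg = df.groupby("key").agg("sum")
```

Option d) does not work because df.sum() collapses the frame to a single Series before any grouping happens, so there is no key column left to group by.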

Consider the following code, where people is a DataFrame and name and Color are two of its columns. Which of the following must be true about this code? Choose all that apply. grps = people.groupby('Color')['name'] a) grps is a groupby DataFrame b) grps is a Groupby Series c) creates groups with Color as a key d) creates groups with Color as a value e) the total number of groups is equal to number of unique colors in the DataFrame

b) grps is a Groupby Series c) creates groups with Color as a key e) the total number of groups is equal to number of unique colors in the DataFrame
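
A sketch that checks the three correct options on a tiny made-up people frame. The names and colors are invented for illustration.

```python
import pandas as pd

# Hypothetical data; only the column names come from the card.
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "Color": ["red", "blue", "red", "green"],
})

grps = people.groupby("Color")["name"]

print(type(grps).__name__)           # SeriesGroupBy
print(sorted(grps.groups))           # group keys are the Color values
print(grps.ngroups, people["Color"].nunique())
```

By default groupby drops rows whose key is missing, so the group count matches the number of distinct non-missing colors.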

