Introduction to Data Analysis

What will the following code print out?
import pandas as pd
import numpy as np
s = pd.Series([np.nan, 1, 2, np.nan, 3])
s = s.fillna(method='ffill')
print(s)

0    NaN
1    1.0
2    2.0
3    2.0
4    3.0
dtype: float64
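Recent pandas releases deprecate the method= argument of fillna, so the same forward fill may be written with Series.ffill(). A minimal sketch, assuming a reasonably current pandas install; the variable name filled is only for illustration. The leading NaN survives because there is no earlier value to copy forward.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, 2, np.nan, 3])

# Copy the last valid value forward into each gap.
filled = s.ffill()
print(filled)
```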

About how much memory does the integer 5 consume in plain Python?

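About 20 bytes (the quiz's expected answer). On a typical 64-bit CPython 3 build, sys.getsizeof(5) reports 28 bytes.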
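The exact figure depends on the interpreter build, so it is easiest to measure. A minimal sketch using sys.getsizeof from the standard library (int_size is an illustrative name):

```python
import sys

# Size of the int object itself, in bytes; varies by Python version and platform.
int_size = sys.getsizeof(5)
print(int_size)
```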

What will the following code print out?
import pandas as pd
import numpy as np
s = pd.Series(['a', 3, np.nan, 1, np.nan])
print(s.notnull().sum())

3

Given a file named certificates.csv with these contents:
Name$Certificates$Time (in months)
Tom$8$16
Kris$2$5
Ahmad$5$9
Beau$6$12
Fill in the blanks for the missing arguments below:
import csv
with open(__A__, 'r') as fp:
    reader = csv.reader(fp, delimiter=__B__)
    next(reader)
    for index, values in enumerate(reader):
        name, certs_num, months_num = values
        print(f"{name} earned {__C__} certificates in {months_num} mon

A: 'certificates.csv'
B: '$'
C: certs_num
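The filled-in reader can be tried without creating a file. This is a sketch that assumes an in-memory io.StringIO stand-in for certificates.csv; the names data and rows are illustrative, and the print format is simplified rather than copied from the card.

```python
import csv
import io

# In-memory stand-in for certificates.csv, with the same "$"-separated contents.
data = io.StringIO(
    "Name$Certificates$Time (in months)\n"
    "Tom$8$16\n"
    "Kris$2$5\n"
    "Ahmad$5$9\n"
    "Beau$6$12\n"
)

reader = csv.reader(data, delimiter="$")
next(reader)  # skip the header row

rows = [tuple(values) for values in reader]
for name, certs_num, months_num in rows:
    print(name, certs_num, months_num)
```

Note that csv.reader yields every field as a string, so certs_num and months_num would need int() before any arithmetic.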

What does the loc method allow you to do?

Access a group of rows and columns by supplying label(s) arguments.
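A short sketch of label-based selection with loc, reusing the certificates data from the other cards (one_value and subset are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {"Certificates": [8, 2, 5, 6], "Time (in months)": [16, 5, 9, 12]},
    index=["Tom", "Kris", "Ahmad", "Beau"],
)

# One row label and one column label give a single value.
one_value = df.loc["Ahmad", "Certificates"]

# Lists of labels give a smaller DataFrame.
subset = df.loc[["Tom", "Beau"], ["Certificates"]]

print(one_value)
print(subset)
```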

What kind of data can you import and work with in a Jupyter Notebook?
-Excel files.
-CSV files.
-XML files.
-Data from an API.
-All of the above.

All of the above

What is not allowed in a Jupyter Notebook's cell?

An Excel sheet

What will the following code print out?
import pandas as pd
certificates_earned = pd.DataFrame({
    'Certificates': [8, 2, 5, 6],
    'Time (in months)': [16, 5, 9, 12]
})
certificates_earned.index = ['Tom', 'Kris', 'Ahmad', 'Beau']
print(certificates_earned.iloc[2])

Certificates         5
Time (in months)     9
Name: Ahmad, dtype: int64

What will the following code print out?
import pandas as pd
certificates_earned = pd.DataFrame({
    'Certificates': [8, 2, 5, 6],
    'Time (in months)': [16, 5, 9, 12]
})
names = ['Tom', 'Kris', 'Ahmad', 'Beau']
certificates_earned.index = names
longest_streak = pd.Series([13, 11, 9, 7], index=names)
certificates_earned['Longest streak'] = longest_streak
print(certificates_earned)

       Certificates  Time (in months)  Longest streak
Tom               8                16              13
Kris              2                 5              11
Ahmad             5                 9               9
Beau              6                12               7

What are the three main types of Jupyter Notebook Cell?

Code, Markdown, and Raw

When using Matplotlib's global API, what does the order of numbers mean here? plt.subplot(1, 2, 1)

My figure will have one row, two columns, and I am going to start drawing in the first (left) plot.

Why is NumPy an important, but unpopular Python library?

Often you won't work directly with NumPy; libraries such as pandas use it under the hood.

What Python library has the .read_html() method we can use for parsing HTML documents and extracting tables?

Pandas

Which of the following is not part of Data Analysis?
-Building statistical models and data visualizations.
-Picking a desired conclusion for the analysis.
-Fixing incorrect values and removing invalid data.
-Transforming data into an appropriate data structure.

Picking a desired conclusion for the analysis.

What is the relationship between size of objects (such as lists and datatypes) in memory in Python's standard library and the NumPy library? Knowing this, what are the implications for performance?

Standard Python objects take up more memory than NumPy objects; operations on NumPy objects complete very quickly compared to comparable objects in standard Python.
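The memory difference can be measured directly. A rough sketch, assuming CPython and a 64-bit integer dtype; the names py_list, np_array, list_bytes, and array_bytes are illustrative, and the list total is an estimate that counts each int object once.

```python
import sys

import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list holds pointers to separate int objects, so count the list and its items.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# nbytes is the raw data buffer only: n items * 8 bytes each.
array_bytes = np_array.nbytes

print(list_bytes, array_bytes)
```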

What method does a Cursor instance have and what does it allow?

The Cursor instance has an .execute() method, which takes an SQL statement (plus optional parameters) and runs it against the database.
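The card does not name a database driver, so the sketch below assumes the standard-library sqlite3 module and a throwaway in-memory database; the certs table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives only in memory
cur = conn.cursor()

cur.execute("CREATE TABLE certs (name TEXT, earned INTEGER)")
# The ? placeholders bind the tuple's values as parameters.
cur.execute("INSERT INTO certs VALUES (?, ?)", ("Tom", 8))
cur.execute("SELECT name, earned FROM certs")

rows = cur.fetchall()
print(rows)
conn.close()
```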

What will the following code print out?
import pandas as pd
certificates_earned = pd.Series(
    [8, 2, 5, 6],
    index=['Tom', 'Kris', 'Ahmad', 'Handsome']
)
print(certificates_earned[certificates_earned > 5])

Tom         8
Handsome    6
dtype: int64
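The filter works in two steps: the comparison builds a boolean Series, and indexing with it keeps the rows marked True. A small sketch using the same Series as the card (mask and selected are illustrative names):

```python
import pandas as pd

certificates_earned = pd.Series(
    [8, 2, 5, 6],
    index=["Tom", "Kris", "Ahmad", "Handsome"],
)

# Step 1: element-wise comparison gives a boolean Series on the same index.
mask = certificates_earned > 5

# Step 2: indexing with the mask keeps only the True rows.
selected = certificates_earned[mask]

print(mask.tolist())
print(selected.index.tolist())
```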

What will the following code print out?
import pandas as pd
certificates_earned = pd.Series(
    [8, 2, 5, 6],
    index=['Tom', 'Kris', 'Ahmad', 'Handsome']
)
print(certificates_earned)

Tom         8
Kris        2
Ahmad       5
Handsome    6
dtype: int64

What is the main difference between lists and tuples in Python?

Tuples are immutable.
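A quick way to see the difference is to try item assignment on both types. A minimal sketch (all names are illustrative):

```python
items_list = [1, 2, 3]
items_list[0] = 99  # lists can be changed in place

items_tuple = (1, 2, 3)
try:
    items_tuple[0] = 99  # tuples reject item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(items_list, items_tuple, tuple_changed)
```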

How do we define blocks of code in the body of functions in Python?

We use indentation, conventionally 4 spaces per level.

What is the value of a after you run the following code?
a = np.arange(5)
a + 20

[0, 1, 2, 3, 4]
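The array is unchanged because a + 20 builds a new array and the result is never assigned. A short sketch contrasting that with in-place addition (b is an illustrative name):

```python
import numpy as np

a = np.arange(5)

b = a + 20  # new array; a still holds 0..4
print(a)
print(b)

a += 20     # in-place addition does modify a
print(a)
```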

What will the following code print out?
a = np.arange(5)
print(a <= 3)

[ True  True  True  True False]

What will the following code print out?
A = np.array([
    ['a', 'b', 'c'],
    ['d', 'e', 'f'],
    ['g', 'h', 'i']
])
print(A[:, :2])

[['a' 'b']
 ['d' 'e']
 ['g' 'h']]
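In A[:, :2] the part before the comma selects rows (all of them) and the part after selects columns (the first two). A small sketch with one extra slice for comparison (first_two_cols and last_row are illustrative names):

```python
import numpy as np

A = np.array([
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"],
])

first_two_cols = A[:, :2]  # every row, columns 0 and 1
last_row = A[-1, :]        # last row, every column

print(first_two_cols.shape)
print(last_row)
```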

What code would add a "Certificates per month" column to the certificates_earned DataFrame like the one below?
       Certificates  Time (in months)  Certificates per month
Tom               8                16                    0.50
Kris              2                 5                    0.40
Ahmad             5                 9                    0.56
Beau              6                12                    0.50

certificates_earned['Certificates per month'] = round(
    certificates_earned['Certificates'] /
    certificates_earned['Time (in months)'],
    2
)
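For a self-contained check, the sketch below rebuilds the DataFrame and computes the column with Series.round, which should behave the same as the built-in round() used in the answer (per_month is an illustrative name):

```python
import pandas as pd

certificates_earned = pd.DataFrame(
    {"Certificates": [8, 2, 5, 6], "Time (in months)": [16, 5, 9, 12]},
    index=["Tom", "Kris", "Ahmad", "Beau"],
)

# Element-wise division of two columns, rounded to 2 decimal places.
per_month = (
    certificates_earned["Certificates"] / certificates_earned["Time (in months)"]
).round(2)
certificates_earned["Certificates per month"] = per_month

print(certificates_earned)
```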

The pandas method .duplicated() returns a boolean Series for your DataFrame. True is the return value for rows that:

contain a duplicate, where the value for the row is at least the second occurrence of that value.
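A small sketch of that behaviour on made-up data (the name and certs columns are hypothetical). With the default keep='first', the first occurrence is False and later repeats are True; keep='last' flips which copy is spared.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Tom", "Kris", "Tom", "Tom"],
    "certs": [8, 2, 8, 8],
})

dupes = df.duplicated()                   # default keep='first'
dupes_last = df.duplicated(keep="last")   # spare the last occurrence instead

print(dupes.tolist())
print(dupes_last.tolist())
```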

How would you iterate over and print the keys and values of a dictionary named user?

for key, value in user.items():
    print(key, value)

How would you import the CSV file data.csv and store it in a DataFrame using the Pandas module?

import pandas as pd
df = pd.read_csv("data.csv")
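pd.read_csv also accepts file-like objects, which makes it easy to try without a data.csv on disk. A sketch assuming made-up CSV text in an io.StringIO buffer (csv_text and its columns are hypothetical):

```python
import io

import pandas as pd

# Made-up CSV contents standing in for data.csv.
csv_text = "name,certs\nTom,8\nKris,2\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df.columns.tolist())
```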

