Coding Fundamentals with Python Final

parent or base class

the previously defined class from which the new class inherits attributes and methods

kind = 'box'

to create a box plot

sharey = True

to ensure subplots share the same scale for the y-axis

.keys()

to list all the keys of a dictionary

.values()

to list all the values of a dictionary

.items()

to list both the keys and values of a dictionary
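
As a quick illustration of the three dictionary methods above, here is a minimal sketch. The `fruit_price` contents are hypothetical values chosen only for the example.

```python
# Hypothetical dictionary used only for illustration.
fruit_price = {"apple": 1.20, "banana": 0.50, "cherry": 3.00}

keys = list(fruit_price.keys())      # the keys only
values = list(fruit_price.values())  # the values only
pairs = list(fruit_price.items())    # (key, value) tuples

# .items() is handy in a for loop because each pair can be unpacked.
for name, price in fruit_price.items():
    print(f"{name}: {price}")
```

Wrapping each result in `list()` is only for display; the methods themselves return view objects that can be looped over directly.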

You are given two DataFrames, alpha and beta. You run the following piece of code to join them: omega = pd.merge(alpha, beta, on = 'theta', how = 'right'). The omega DataFrame will contain all rows in beta and only those rows in alpha that have matching key values.

true

by default, the .describe() method of a dataframe only returns summary statistics for the numeric columns in a dataframe

true

the pandas plot() method is an abstraction of some of the functions and methods of the matplotlib package

true

the while loop is very similar to a conditional statement because it is made up of a condition and a response

true

when collecting data, the absence of existing data on certain subpopulations can lead to bias in ground truth data

true

try-except (example)

try:
    1/0
    print(a)
except NameError:
    print('The variable is not defined!')
except ZeroDivisionError:
    print("You can't divide by 0!")

(parentheses)

tuple

if we try to create a NumPy array from a list with elements of different data types, python will convert all the elements to a single data type. what is this process called?

upcasting
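
A small sketch of upcasting, assuming NumPy is installed. The array contents are made up for the example.

```python
import numpy as np

mixed_numbers = np.array([1, 2, 3.5])     # integers and a float
mixed_text = np.array([1, 2.5, "three"])  # numbers and a string

# The integers are upcast to floats so every element shares one type.
print(mixed_numbers.dtype)

# With a string present, every element becomes a Unicode string (kind 'U').
print(mixed_text.dtype.kind)
```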

combining strings

use addition (+) operator

extracting n-th character in a string

use square brackets ([ ])
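
The two string operations above can be sketched as follows. The variable names and values are illustrative only.

```python
first = "Notre"
second = "Dame"

combined = first + " " + second  # the + operator concatenates strings
third_char = combined[2]         # indexing starts at 0, so index 2 is the third character
last_char = combined[-1]         # negative indices count from the end
```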

.agg()

used to apply multiple aggregation functions to one or more columns in a dataframe or to apply different aggregations to different columns at once

assignment operator (=)

used to create a variable

del()

used to delete a variable

.format()

used to display a message that includes information stored in a single variable or several variables

try-except statement

used to handle exceptions that occur during code execution

for loop

used to iterate over the items of a sequence or container

operators

used to perform calculations in python

.rename(columns = {"" : ""})

used to rename columns

xlim and ylim

used to zoom in to a certain part of the plot in order to get a closer look at the data

triple quotes

used when strings go across multiple lines

class attributes

used when we want to define attributes with values that are shared by all objects created from a class

grouped bar chart

useful when we want to compare values across two or more categories

def function_name(arguments):
    """docstring"""
    <code>
    return output

user-defined functions

python supports three main types of function: built-in functions, ___, and ___

user-defined functions, anonymous functions

which of these is not one of the five key considerations we must keep in mind when collecting data for the analytics process?

value

global variables

variables defined outside of a function, can be used inside and outside of a function
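
A minimal sketch of the difference between a global and a local variable. The names `tax_rate` and `price_with_tax` are hypothetical.

```python
tax_rate = 0.07  # global variable: defined outside any function

def price_with_tax(price):
    subtotal = price  # local variable: exists only inside the function
    return round(subtotal * (1 + tax_rate), 2)  # the global can be read here

total = price_with_tax(100)
print(total)
print(tax_rate)  # still accessible outside the function
```

Trying to print `subtotal` outside the function would raise a NameError, because local variables disappear when the function returns.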

Create a scatterplot that shows the relationship between city miles per gallon (on the x-axis) and CO2 emissions (on the y-axis) for all vehicles in the vehicles dataset.

vehicles.plot(kind = 'scatter', x = 'citympg', y = 'co2emissions')

Create two overlapping histograms from the vehicles dataset. The first histogram should show the distribution of city miles per gallon, while the second should show the distribution of highway miles per gallon. Set the opacity of the histograms to 0.4 and the make the plot 10 inches wide by 6 inches high. Label the x-axis "Miles Per Gallon" and the y-axis "Number of Vehicles".

import matplotlib.pyplot as plt
vehicles[["citympg", "highwaympg"]].plot(kind = "hist", figsize = (10, 6), alpha = 0.4)
plt.xlabel("Miles Per Gallon")
plt.ylabel("Number of Vehicles")

resolving duplicate columns example

vehicles_concat_col = vehicles_concat_col.loc[:, ~vehicles_concat_col.columns.duplicated()]

Create a new DataFrame called washer_config from the washers dataset that lists the minimum, median, mean and maximum Energy Usage and Water Usage values for each type of washer configuration (i.e top load or front load). Output the washer_config DataFrame.

washer_config = pd.DataFrame(washers.groupby(["Configuration"])[["EnergyUse", "WaterUse"]].agg(["min", "median", "mean", "max"]))
washer_config

Create a new DataFrame called washers by importing the CSV file located at https://coding-fundamentals.s3.amazonaws.com/residentialwashers.csv. Preview the first 5 rows of the DataFrame.

washers = pd.read_csv("https://coding-fundamentals.s3.amazonaws.com/residentialwashers.csv")
washers.head(5)

Create a new DataFrame called washers by importing the CSV file located at https://coding-fundamentals.s3.amazonaws.com/residentialwashers.csv. Return a concise summary of the rows and columns in the washers DataFrame. Hint: The summary must include the number of columns, number of rows, column names, data type of each column, number of non-missing values in each column and how much memory is used to store the DataFrame.

washers = pd.read_csv("https://coding-fundamentals.s3.amazonaws.com/residentialwashers.csv")
washers.info()

Create a new DataFrame called washers by importing the Residential Washers CSV file located at https://coding-fundamentals.s3.amazonaws.com/residentialwashers.csv. Set the ID column as the row index (either after the import or during the import) and preview the last 10 rows of the washers DataFrame.

washers = pd.read_csv("https://coding-fundamentals.s3.amazonaws.com/residentialwashers.csv", index_col = "ID")
washers.tail(10)

Sort the washers DataFrame in descending order, by DateAvailable and DateCertified. Note: Perform an in place sort.

washers.sort_values(by = ['DateAvailable', 'DateCertified'], inplace = True, ascending = False)
washers

Output a count of each unique value in the BrandName column of the washers DataFrame.

washers["BrandName"].value_counts()

Based on the data in the BrandName column of the washers DataFrame, output the percentage of washers that belong to each brand.

washers["BrandName"].value_counts(normalize = True)

Convert the DateAvailable and DateCertified columns in the washers DataFrame to datetime. Use a DataFrame attribute to output the data type of both columns after the conversion is done. Hint: The display() function allows us to output more than one result at the same time.

washers['DateAvailable'] = pd.to_datetime(washers['DateAvailable'])
washers['DateCertified'] = pd.to_datetime(washers['DateCertified'])
display(washers["DateAvailable"].dtype, washers["DateCertified"].dtype)

Output the number of non-missing values, average, standard deviation, minimum, maximum, 25th percentile, 50th percentile and 75th percentile for the Volume, IMEF, EnergyUse, IWF and WaterUse columns in the washers DataFrame.

washers[["Volume", "IMEF", "EnergyUse", "IWF", "WaterUse"]].describe()

Output the 20th, 40th, 60th and 80th percentile values for the Volume, IMEF, EnergyUse, IWF and WaterUse columns in the washers DataFrame.

washers[["Volume", "IMEF", "EnergyUse", "IWF", "WaterUse"]].quantile([0.20, 0.40, 0.60, 0.80])

Create a new DataFrame called water_config_brand from the washers dataset that lists the maximum water usage for each configuration and brand. Output the water_config_brand DataFrame.

water_config_brand = pd.DataFrame(washers.groupby(["Configuration", "BrandName"])["WaterUse"].max())
water_config_brand

instance attributes

we specify their value whenever we create a new instance of the object
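
To contrast class attributes with instance attributes, here is a short sketch. The `Dog` attributes shown are hypothetical and chosen only for illustration.

```python
class Dog:
    species = "Canis familiaris"  # class attribute: shared by every Dog object

    def __init__(self, name, age):
        self.name = name  # instance attributes: supplied when each object is created
        self.age = age

rex = Dog("Rex", 4)
bella = Dog("Bella", 2)

print(rex.species, bella.species)  # same value for both objects
print(rex.name, bella.name)        # different value per object
```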

descriptive analytics

what happened, what is happening; using summary statistics and visualizations to describe historical data

predictive analytics

what is likely to happen; uses statistical models and machine learning to estimate the likelihood of future outcomes

prescriptive analytics

what should we do; least used and most complex; considers the implications of several possible decisions and makes recommendations on which actions to take in order to maximize a stated objective

integer

whole number

diagnostic analytics

why did it happen; root cause analysis, identify outliers, isolate patterns, and discover hidden relationships in data

verify_integrity = True

raises a ValueError if it encounters duplicate indices

import pandas as pd
trackwomen = pd.read_html("https://und.com/sports/track/roster/season/2021-22/")[1]
trackwomen.head()
trackwomen.groupby('______')['______'].value_counts().unstack().plot(kind = '______', stacked = ______, figsize = (10,6))

1) POSITION 2) Class 3) barh 4) True

import pandas as pd
mbball = pd.read_html("https://und.com/sports/mbball/roster/season/2021-22/")[0]
mbball[['Number','Name','POSITION','Hometown','High School','Class']]
Complete the following piece of code so that I get a summary similar to the one below.
mbball['_______'].________()

1) Class 2) describe

array

a fixed-type data structure that allows us to store multidimensional data in an efficient way

arguments or parameters

a function can accept data (objects) as input

read_csv()

a function from the pandas library used to import CSV files

given a pivot table called centuri. what type of chart will the following piece of code create? centuri.plot(kind = "bar", stacked = False)

a grouped bar chart

DataFrame

a heterogeneous two-dimensional data structure with labeled axes (rows and columns)

pandas series

a homogeneous one-dimensional array-like data structure with a labeled axis (rows)

given a pandas series object called epsilon. what type of chart will the following piece of code create? epsilon.plot(kind = "barh")

a horizontal bar chart

variable

a label or name that is assigned to an object (such as a number)

list

a mutable, heterogeneous, ordered, multi-element container

Before we can generate group-level aggregations, we first need to group data using the groupby() method of a Series or DataFrame. In the following code snippet, what would the data type of florida_county be? florida_county = votes[votes['state']=='FL'].groupby('county')

a pandas GroupBy object

function

a reusable piece of code that performs a certain task

while loop

a type of loop that runs as long as a logical condition is true and stops running when the logical condition becomes false

5 aspects of data collections

accuracy, relevance, quantity, variability, ethics (privacy, security, informed consent, bias)

.append()

add an element to the end of a list

title =

adds a title to a plot

np.append()

adds an element to a NumPy array

descriptive statistics or summary statistics

aggregations or statistical measures are used to describe the general and specific characteristics of our data

if-elif-else statement

allows us to chain multiple conditional statements together in order to execute different blocks of code depending on which condition is met

joins

allows us to combine two or more datasets based on the values in related columns from each dataset

inheritance

allows us to define a class that inherits the attributes and methods of a previously defined class

if-else statement

allows us to execute a separate block of code if the condition in the if statement is not met

Complete the function below so that when you pass an unspecified number of numbers to it, it returns the largest.
def max_number(*alpha):
    beta = alpha[0]
    for gamma in ________ :
        if gamma > beta:
            beta = gamma
    return beta

alpha
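
With the blank filled in, the completed function can be sketched as below. The docstring, comments, and the sample call are additions for illustration.

```python
def max_number(*alpha):
    """Return the largest of the numbers passed in (at least one is required)."""
    beta = alpha[0]      # start with the first value as the current maximum
    for gamma in alpha:  # *alpha collects the arguments into a tuple
        if gamma > beta:
            beta = gamma
    return beta

largest = max_number(4, 17, 9, -3)
print(largest)
```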

a package is a collection of code used to perform a specific type of task. In order to use the code provided by a package in python, we have to first import the package in the following way: import pandas as pd what do we call the pd in the line of code above?

an alias

What will the following piece of code return?
mylist = ['My true love sent to me', 1, 'Partridge', 1, 'Pear Tree']
mylist.sort()
mylist

an error
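
The error in question is a TypeError, because Python 3 cannot compare strings with integers. A minimal sketch that catches it (the `outcome` variable is only there to record what happened):

```python
mylist = ['My true love sent to me', 1, 'Partridge', 1, 'Pear Tree']

try:
    mylist.sort()
    outcome = "sorted"
except TypeError as err:
    # Sorting needs "<" comparisons, which are undefined between str and int.
    outcome = "TypeError"
    print(err)
```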

each cycle through a loop is called

an iteration

another name for a NumPy array is

an nd-array

string (str)

an object that holds a block of text

boolean (bool)

an object that holds the dichotomous values True or False

dictionary (dict)

an unordered and mutable data structure that stores information as key-value pairs

&

and

avg = 72
std = 4.3
tom = 75
print(tom >= (avg - std) ____ tom <= (avg + std))

and

isinstance('variable', 'data type')

another way to check the data type of a variable

class attributes and instance attributes

are both mutable

required arguments

arguments that must be passed to the function when we use it in our code (ex. if function is specified with 3 arguments, you must give 3 arguments)

referencing an element in a 2-d array

arrayname[row_index, column_index]

evaluation

assess how well the chosen analytics approach works

how to create a boolean

assign True or False to a variable, or assign the result of a comparison, logical, or membership operation to a variable (ex. my_boolean = 5 > 4)

Create a line plot from the vehicles DataFrame that shows the change in the average city miles per gallon by year.

avg_citympg = vehicles.groupby('year')[['citympg']].mean()
avg_citympg.plot(kind = "line")

Create a line chart from the vehicles DataFrame that shows the change in both the average city and highway miles per gallon by year.

avg_mpg = vehicles.groupby('year')[['citympg', 'highwaympg']].mean()
avg_mpg.plot(kind = "line")

Create two separate line plots in the same figure from the vehicles DataFrame that show the change in the average city and highway miles per gallon by year. The city miles per gallon plot should be on top of the highway miles per gallon plot.

avg_mpg = vehicles.groupby('year')[['citympg', 'highwaympg']].mean()
avg_mpg.plot(kind = 'line', y = ['citympg', 'highwaympg'], subplots = True)

Create two separate line plots in the same figure from the vehicles DataFrame that show the change in the average city and highway miles per gallon by year. This time, the city miles per gallon plot should be to the left of the highway miles per gallon plot. Title the plot "Average City MPG versus Average Highway MPG" and make it 12 inches wide by 4 inches high.

avg_mpg = vehicles.groupby('year')[['citympg', 'highwaympg']].mean()
avg_mpg.plot(kind = 'line', y = ['citympg', 'highwaympg'], title = 'Average City MPG versus Average Highway MPG', figsize = (12, 4), subplots = True, layout = (1, 2))

keyword (or named) arguments

be explicit in specifying which value goes with which argument when calling a function

how to create a pandas series

brics = pd.Series(["Brazil", "Russia", "India", "China", "South Africa"])

escape character

can be used to represent whitespace characters or characters that are typically not allowed in strings

lists are heterogeneous

can contain elements of different data types

optional (or default) arguments

can specify a default value for some or all of our arguments when defining a function
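
Required, optional, and keyword arguments can be seen together in one sketch. It reuses the triangle_area idea from this set, but the parameter names `base` and `height` are assumptions made for readability.

```python
def triangle_area(base, height=10):
    """Return the area of a triangle; height is optional and defaults to 10."""
    return 0.5 * base * height

area_default = triangle_area(4)                 # required argument only, height falls back to 10
area_positional = triangle_area(4, 5)           # both values given by position
area_keyword = triangle_area(height=5, base=4)  # keyword arguments can appear in any order
```

Calling `triangle_area()` with no arguments would raise a TypeError, since `base` is required.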

.capitalize()

capitalizes only the first character of the string and lowercases the rest

.title()

capitalizes the first letter of each word
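
A quick comparison of what the two methods return for the same input. The sample text is made up.

```python
text = "hello WORLD. python is fun"

capitalized = text.capitalize()  # first character upper-cased, everything else lower-cased
titled = text.title()            # first letter of every word upper-cased

print(capitalized)
print(titled)
```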

bins =

changes the number of bins in a histogram

np.reshape()

changes the shape of an array

in (membership operator)

checks if a substring exists within a string (ex. 'Python' in my_string)

.find()

checks if a substring exists within the string (returns the starting index position)

modeling

choosing and applying the right analytics approach that works well with the data we have and solves the problem we intend to solve

Instantiate an object from the ParttimeEmployee class named chris for an employee called "Chris Clark", who works thirty hours per week and has been at the company for ten years. Call the intro() method for chris.

chris = ParttimeEmployee('Chris Clark', 10, 30)
chris.intro()

Define a Car class that has two instance attributes - color and capacity.

class Car:
    def __init__(self, color, capacity):
        self.color = color
        self.capacity = capacity

how to define a child class

class ChildClassName(ParentClassName): <code>

defining a class example

class Dog: pass

Define an Employee class that has one class attribute called salaried and three instance attributes called first_name, last_name, and work_years. The class attribute should have a default value of True.

class Employee:
    salaried = True
    def __init__(self, first_name, last_name, work_years):
        self.first_name = first_name
        self.last_name = last_name
        self.work_years = work_years

Define a child class called ParttimeEmployee from the Employee parent class. The ParttimeEmployee class should have an additional instance attribute called weekly_hours.

class Employee:
    def __init__(self, name, work_years):
        self.name = name
        self.work_years = work_years
    def intro(self):
        print("Hi, my name is {}. I've worked here for {} years.".format(self.name, self.work_years))
class ParttimeEmployee(Employee):
    def __init__(self, name, work_years, weekly_hours):
        self.name = name
        self.work_years = work_years
        self.weekly_hours = weekly_hours

Redefine a child class called ParttimeEmployee from the Employee parent class. The ParttimeEmployee class should have an additional instance attribute called weekly_hours. The ParttimeEmployee class should have its own intro() method, which prints a message that reads "Hi, my name is {name}. I've worked here for {work_years} years and I work {weekly_hours} hours per week.".

class ParttimeEmployee(Employee):
    def __init__(self, name, work_years, weekly_hours):
        self.name = name
        self.work_years = work_years
        self.weekly_hours = weekly_hours
    def intro(self):
        print("Hi, my name is {}. I've worked here for {} years and I work {} hours per week.".format(self.name, self.work_years, self.weekly_hours))

Redefine the ParttimeEmployee class by adding another method called health_benefit. The method should respond according to the following rules: if an employee works thirty or more hours per week, the method prints a message that reads "I am eligible for employer-provided health benefits." if the employee works less than thirty hours per week, the method prints a message that reads "I am not eligible for employer-provided health benefits."

class ParttimeEmployee(Employee):
    def __init__(self, name, work_years, weekly_hours):
        self.name = name
        self.work_years = work_years
        self.weekly_hours = weekly_hours
    def intro(self):
        print("Hi, my name is {}. I've worked here for {} years and I work {} hours per week.".format(self.name, self.work_years, self.weekly_hours))
    def health_benefit(self):
        if int(self.weekly_hours) >= 30:
            print('I am eligible for employer-provided health benefits.')
        else:
            print('I am not eligible for employer-provided health benefits.')

key

commonly used name for related columns

a conditional statement is a combination of one or more ____ and ____.

conditions, responses

.plot()

creates a plot from a pandas data structure

subplots = True

creates multiple plots within a figure

figsize =

customizes the size of a plot or figure

statically-typed programming language

data type of a variable has to be explicitly defined in advance before being assigned a value

Define a function called calculator that accepts 3 arguments called operation, x and y. The allowed values for operation are 'add', 'subtract', 'multiply', and 'floor'. If a user enters a value for operation that is not one of these, return a message that reads "Invalid Operation!". Otherwise, depending on the value of the operation argument, the function should return one of the following: x plus y, x minus y, x times y, or the floor division of x by y.

def calculator(operation, x, y):
    if operation == 'add':
        result = x + y
    elif operation == 'subtract':
        result = x - y
    elif operation == 'multiply':
        result = x * y
    elif operation == 'floor':
        result = x // y
    else:
        result = 'Invalid Operation!'
    return result

Define a function called round_mean that accepts an unspecified number of numeric values as arguments and returns the mean of the numbers (rounded to two decimal places). Add a docstring to your function that explains what the function does, how many arguments it accepts, and what it returns.

def round_mean(*args):
    '''
    This function accepts a variable number of numeric values
    and returns the mean of these numbers, rounded to two decimal places.
    '''
    result = round(sum(args)/len(args), 2)
    return result

Define a function called to_fahrenheit that accepts a temperature value in celsius as an argument, and returns the temperature in fahrenheit rounded to no decimal places.

def to_fahrenheit(number1):
    result = round(9/5*(number1)+32, 0)
    return result

Modify the function you defined in Problem 2 so that the height argument becomes an optional argument with a default value of 10.

def triangle_area(number1, number2 = 10):
    result = (1/2 * number1 * number2)
    return result

Define a function called triangle_area that accepts arguments for the base and height of a triangle and returns the area of the triangle

def triangle_area(number1, number2):
    result = (1/2 * number1 * number2)
    return result

Define a function called vowel_count that returns the number of English vowels in a variable passed to it.

def vowel_count(arg):
    count = 0
    vowel = set("aeiouy")
    for alphabet in arg:
        if alphabet in vowel:
            count = count + 1
    return (count)

sparsity and density

degree to which data exists in a dataset

np.delete()

deletes a specific element from an array

{curly brackets}

dictionary

which of these should I run if I want to get a list of all methods supported by the tuple data structure?

dir(tuple)

Create a DataFrame called dive_women by selecting the Name, Class and Hometown columns from the swim_women DataFrame for those swimmers who are members of the dive team. Output the dive_women DataFrame sorted in ascending order of Name.

dive_women = swim_women.sort_values(by=["Name", "Class", "Hometown"], ascending = [True, False, False])[["Name", "Class", "Hometown"]]
dive_women

[2:]

end is the last element of the list

Create a new DataFrame called energy_config_brand from the washers dataset that lists the minimum and maximum energy usage for each configuration and brand. The min column should be called min_energy_use and the max column should be called max_energy_use. Output the energy_config_brand DataFrame.

energy_config_brand = pd.DataFrame(washers.groupby(["Configuration", "BrandName"])["EnergyUse"].agg(["min", "max"]))
energy_config_brand.rename(columns = {'min':'min_energy_use', 'max':'max_energy_use'}, inplace = True)
energy_config_brand

what does the following piece of code do? balance == 45

evaluates whether balance is equal to 45

rounding to even (banker's rounding)

even number is returned (ex. 1147.5 is rounded to 1148)
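
Python's built-in round() follows this rule for values exactly halfway between two integers. A small sketch, where the list of sample values is chosen for illustration:

```python
halves = [0.5, 1.5, 2.5, 3.5, 1147.5]

# Each value is exactly halfway, so round() picks the even neighbour.
rounded = [round(x) for x in halves]

for value, result in zip(halves, rounded):
    print(value, "->", result)
```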

dict()

ex. university_info = dict(name = 'University of Notre Dame', mascot = 'Leprechaun', city = 'Notre Dame', state = 'Indiana')

change any elements in a list by using index notation

ex. color_list[3] = 'orange'

list()

ex. list('Python is my friend') returns a list in which each character is a separate element

how to create a dictionary

ex: university_info = {'name' : 'University of Notre Dame', 'mascot' : 'Leprechaun', 'city' : 'Notre Dame', 'state' : 'Indiana'}

if statement

executes a block of code if one or more logical conditions are met

listname[index]

extract an individual element in a list

Given two pandas DataFrames called shake and bake with the same number of columns, rows, and index values, the following code will combine the columns of the two DataFrames pd.concat([shake, bake])

false

Python supports three types of conditional statements, the if statement, the try-if-else statement, and the try-except-finally statement.

false

a programming language in which the data type of a variable has to be explicitly defined is known as a dynamically typed language

false

one of the benefits of using dictionaries is that the keys in a dictionary are mutable

false

the actions an object can take or the functions it can perform are known as its attributes

false

the characteristics of an object are known as its methods

false

the string data type is used to represent the dichotomous values of True and False

false

when using slice notation, the stop index value is inclusive

false

Given the list nums = [10, 20, 30, 40, 50, 60, 70, 80, 90], which type of loop is most appropriate if my goal is to calculate the square of every element in the list?

for loop

Use a loop to iteratively call the calculator function using each item in op_list as the value of the operation argument for the numbers 135 (as x) and 75 (as y). Print the returned value in each iteration of the loop. For example, the first output should be "add: 210".

for operation in op_list:
    print(f"{operation}: {calculator(operation, 135, 75)}")

.groupby()

for single columns, pass name of column we intend to group by

for loop to iterate through values of dictionaries

for value in fruit_price.values():
    discount_value = round(value * 0.8, 2)
    print(discount_value)

descriptive statistics

frequency distributions, measures of central location, measures of spread

class gamma:
    def __init__(self, x):
        self.x = x
    def calc(self):
        print(self.x ** 2)
class theta(gamma):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def calc(self):
        print(self.x ** self.y ** self.x)
Based on the class definitions above, we know that __ is the parent class and __ is the child class

gamma; theta

.value_counts()

gives a count of each unique value in a series or in a single column of a DataFrame

.count()

gives a count of the number of occurrences of a particular value

dir()

gives a list of methods supported for a particular type of object in python

.value_counts(normalize = True)

gives a percentage rather than a count

.describe(include='all')

gives descriptive statistics for all columns

.index

gives information about the index or row labels of a DataFrame

.info()

gives quick overview of structure of data including number of columns, number of rows, column names, data type of each column, number of non-missing values, and how much memory is used

.mean()

gives the average of the values within a series or the column of a dataframe

docstring

gives the description of what the function does (optional)

.ndim

gives the number of dimensions of a NumPy array

.itemsize

gives the number of bytes used to store each element of a NumPy array

.nbytes

gives the number of bytes used to store the entire array

.shape

gives the number of elements in each dimension of a NumPy array

.quantile()

gives the percentiles of the values within a series or the column of a dataframe

.sum()

gives the sum of the values within a series or the column of a dataframe

.size

gives the total number of elements in a NumPy array

.values

gives the values in the cells of the DataFrame

Use the list approach to create a pandas DataFrame called grades from the data presented in the following table. Set the Name column as the index (in place) and output the grades DataFrame.

grades = pd.DataFrame([["John", "Physics", 74, 82, 67, "B"],
                       ["Carol", "Math", 76.5, 86, 82.5, "A"],
                       ["Jim", "Economics", 71, 77.5, 62.5, "C"],
                       ["Laura", "Engineering", 84.5, 92, 87.5, "A"],
                       ["Tom", "Biology", 79, 80.5, 77, "B"],
                       ["Chris", "Theology", 70.5, 73.5, 71.5, "C"]],
                      columns = ["Name", "Major", "Exam1", "Exam2", "Midterm", "Final"])
grades.set_index("Name", inplace = True)
grades

Use the dictionary approach to create a pandas DataFrame called grades from the data presented in the following table. Output the grades DataFrame.

grades_dict = {"Name": ["John", "Carol", "Jim", "Laura", "Tom", "Chris"],
               "Major": ["Physics", "Math", "Economics", "Engineering", "Biology", "Theology"],
               "Exam1": [74, 76.5, 71, 84.5, 79, 70.5],
               "Exam2": [82, 86, 77.5, 92, 80.5, 73.5],
               "Midterm": [67, 82.5, 62.5, 87.5, 77, 71.5],
               "Final": ["B", "A", "C", "A", "B", "C"]}
grades = pd.DataFrame(grades_dict)
grades

box plot

great for visualizing distribution of values for a variable (min, 1st quartile, median, 3rd quartile, max)

Create a new DataFrame called hometown by importing the 'Hometown' sheet in the Excel file located at https://coding-fundamentals.s3.amazonaws.com/students.xlsx. Combine the students (you created in the previous problem) and hometown DataFrames by using an inner join on the ID column. Call the new DataFrame students_hometown_inner and display it.

hometown = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Hometown")
students_hometown_inner = pd.merge(students, hometown, on = "ID", how = "inner")
students_hometown_inner

measures of spread

how similar or varied values of feature are

data collection

identify and gather the data we need for the analytics process

actionable insight

identifying potential course of action or a series of actions based on the results of the model

if-elif-else (example)

if score >= 90:
    print('The grade is A.')
elif score >= 80:
    print('The grade is B.')
elif score >= 70:
    print('The grade is C.')
elif score >= 60:
    print('The grade is D.')
else:
    print('The grade is F.')

if-else (example)

if score >= 90:
    print('The grade is A.')
else:
    print('The grade is not A.')

method overriding (or polymorphism)

if we define a method in a child class with the same name as a method defined in the parent class, the child method overrides the parent method
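
Method overriding can be sketched with the Employee and ParttimeEmployee classes used elsewhere in this set. Two details are assumptions made for the example: the methods return their message instead of printing it, and the child calls `super().__init__()` rather than repeating the parent's assignments.

```python
class Employee:
    def __init__(self, name, work_years):
        self.name = name
        self.work_years = work_years

    def intro(self):
        return f"Hi, my name is {self.name}. I've worked here for {self.work_years} years."

class ParttimeEmployee(Employee):
    def __init__(self, name, work_years, weekly_hours):
        super().__init__(name, work_years)  # reuse the parent's initializer
        self.weekly_hours = weekly_hours

    def intro(self):  # same name as the parent method, so this version is used
        return (f"Hi, my name is {self.name}. I've worked here for {self.work_years} "
                f"years and I work {self.weekly_hours} hours per week.")

pat = Employee("Pat Lee", 3)             # hypothetical employee
chris = ParttimeEmployee("Chris Clark", 10, 30)

parent_message = pat.intro()
child_message = chris.intro()
```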

ignore_index = True

ignores original index values and assigns new ones

relationship visualization

illustrate correlation between two or more variables

comparison visualizations

illustrate difference between two or more items

Define a function called math_facts that only makes use of functions in the math module to return the square, square root, natural log (base e), and factorial of any whole number passed to it. Add a docstring to your function that explains what the function does, how many and what type of arguments it accepts, and what it returns.

import math as m
def math_facts(arg):
    '''
    This function accepts one whole number and returns the square,
    square root, natural log (base e), and factorial of that whole number.
    '''
    return (m.pow(arg, 2), m.sqrt(arg), m.log(arg), m.factorial(arg))

Use a function from the math module to get the greatest common divisor between the numbers 30 and 76.

import math as m
m.gcd(30, 76)

Use a function from the math module to get the result of 3^7

import math as m
m.pow(3, 7)

add or override the labels for x and y axis

import matplotlib.pyplot as plt
plt.xlabel("label for the x-axis")
plt.ylabel("label for the y-axis")

from (module_name) import (function_name)

imports only the specified function ex. from math import factorial

f-string

include an 'f' at the beginning of the string (ex. f'{name} is Number {rank}.')
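
For comparison, the same message built with .format() and with an f-string. The values of `name` and `rank` are made up.

```python
name = "Python"
rank = 1

with_format = "{} is Number {}.".format(name, rank)  # placeholders filled in order
with_fstring = f"{name} is Number {rank}."           # variables written inside the braces

print(with_format)
print(with_fstring)
```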

full outer join

includes all rows from both left and right datasets regardless of whether the key values match

left join

includes all the rows from the left dataset and only the rows from the right dataset with matching key values

right join

includes all the rows from the right dataset and only the rows from the left dataset with matching key values

inner join

includes only the rows from both datasets where the key values match
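
The four join types can be compared by counting the rows each one keeps. This is a sketch that assumes pandas is installed; the two small DataFrames are hypothetical, and only IDs 2 and 3 appear in both.

```python
import pandas as pd

# Hypothetical data: ID 1 exists only on the left, ID 4 only on the right.
students = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Ann", "Ben", "Cy"]})
hometown = pd.DataFrame({"ID": [2, 3, 4], "City": ["Gary", "Niles", "Elkhart"]})

# Merge once per join type and record how many rows survive.
row_counts = {
    how: len(pd.merge(students, hometown, on="ID", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)
```

In pandas, a full outer join is requested with `how="outer"`; unmatched cells are filled with NaN.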

By default, the pandas concat() function combines the columns of two DataFrames by matching the ____ of both Data Frames

index labels

start

index value we start at (inclusive)

stop

index value we stop at (exclusive)

ground truth data

information that is known to be real or true

try-except-else (example)

input_number = input('Enter a number: ')
try:
    reciprocal_number = 1 / float(input_number)
except ZeroDivisionError:
    print('Zero does not have a reciprocal.')
except:
    print('Invalid input.')
else:
    print('The reciprocal of {} is {}.'.format(input_number, round(reciprocal_number, 2)))

np.insert()

inserts an element to a particular position in an array

extracting elements from dictionary

instead of indexing, use key

continue

instead of terminating the loop early, we skip the current iteration and move on to the next

given the following code snippet, what will the data type of x be: x = 55 // 6

int
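
A short check of how the division operators affect the result type. The variables `y` and `z` are added for contrast.

```python
x = 55 // 6    # floor division of two ints gives an int
y = 55 / 6     # true division always gives a float
z = 55.0 // 6  # floor division with a float operand gives a float

print(type(x), type(y), type(z))
```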

int64 data type in pandas

int

int

integer variable

.describe(exclude= )

specifies the data types of the columns to leave out of the output

[square brackets]

list

inplace = True

makes it so the set_index change persists

data preparation

making sure data is suitable for the analytics approach that we intend to use; resolving data quality issues and modifying/transforming structure of data to make it easier to work with

negative value for step

means we step from right to left

.update()

merge the contents of one dictionary with that of another (replaces value if key exists in both dicts)
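A short sketch of .update() in action. The dictionary names and prices are invented for the example.

```python
# Hypothetical price lists.
fruit_price = {"apple": 1.00, "pear": 2.00}
new_prices = {"pear": 2.50, "kiwi": 3.00}

# Keys found in both dictionaries take the value from new_prices;
# keys found only in new_prices are added.
fruit_price.update(new_prices)

print(fruit_price)
```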

the process of calling several methods on an object without having to create temporary variables is known as

method chaining
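Method chaining can be illustrated with plain string methods. In this sketch the sample string is made up; each call returns a new string, so the next method can be attached directly.

```python
# Hypothetical input string with stray spaces.
raw_title = "  Coding Fundamentals  "

# Without chaining: one temporary variable per step.
step1 = raw_title.strip()
step2 = step1.lower()
unchained = step2.replace(" ", "_")

# With chaining: the same three calls in a single expression.
chained = raw_title.strip().lower().replace(" ", "_")

print(chained)
```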

For example, if we wanted to compare the SAT average scores by type of college amongst colleges in Michigan and Indiana, we group the michiana_colleges DataFrame by both state and institutional_owner then get the mean of the sat_average column:

michiana_colleges.groupby(["state", "institutional_owner"])["sat_average"].mean()

import math as m
m.sqrt(567)

modules are sometimes imported with aliases so we don't have to type their long names

rules for naming variable

must not begin with a number, must not contain punctuation (the underscore is allowed), must not contain a space, must not be one of python's reserved words

while loop (example)

my_list = [ ]
while len(my_list) < 5:
    x = input('Enter anything: ')
    my_list.append(x)
    print(x)

break (example)

my_list = [ ]
while len(my_list) < 5:
    x = input('Enter anything: ')
    if x == "!":
        break
    my_list.append(x)
    print(x)

continue (example)

my_list = [ ]
while len(my_list) < 5:
    x = input('Enter anything: ')
    my_list.append(x)
    if x == "*":
        continue
    print(x)

for loop (example)

my_list = [45, 57.5, 231.4, -56, 99.3, 132, 89.5]
sum_value = 0
for item in my_list:
    sum_value = (sum_value + item)
print(sum_value)

np.object

non-numeric columns (the np.object alias was removed in NumPy 1.24; use object instead)

two dimensional NumPy array

np.array([[2, 3, 4],[4, 5, 6]])

Create a bar chart that shows the number of vehicles in the vehicles dataset by model year. Make the plot 10 inches wide by 6 inches high.

number = vehicles.groupby(["year"])["make"].count()
number.plot(kind = 'bar', figsize = (10, 6))

Convert the bar chart from the previous problem into a stacked bar chart that shows the number of vehicles in the vehicles dataset by model year, broken out by drive type (i.e. '2-Wheel Drive', 'Rear-Wheel Drive', etc). Make the plot 10 inches wide by 6 inches high.

number_drive = vehicles.groupby(["year"])["drive"].value_counts()
number_drive = number_drive.unstack()
number_drive.plot(kind = 'bar', stacked = True, figsize = (10, 6))

exception

occurs when a line of grammatically correct code fails during execution (ex. ZeroDivisionError)

syntax error

occurs when a line of code does not abide by the rules of the language

|

or

child class's __init__()

overrides that of its parent class; attributes set in the parent's __init__() are not created for the child unless the child calls super().__init__()

b, h = 12, 5 ____ = ((b**2) + (h**2))**.5 print(p)

p

adding a legend

plt.legend(loc = (location))

when we define a method in a child class with the same name as a method in its parent class, the child class method overrides that of the parent. this is known as ___

polymorphism
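The sketch below shows a child class overriding both __init__() and a regular method. The Employee and ParttimeEmployee classes here are simplified stand-ins written for this example (the course's own class definitions are not shown), and the argument order is an assumption.

```python
class Employee:
    def __init__(self, name, work_years):
        self.name = name
        self.work_years = work_years

    def intro(self):
        return f"Hi, my name is {self.name}."


class ParttimeEmployee(Employee):
    def __init__(self, name, work_years, weekly_hours):
        # Run the parent's __init__ so name and work_years are still set.
        super().__init__(name, work_years)
        self.weekly_hours = weekly_hours

    def intro(self):
        # Same method name as in the parent: this version is the one that runs.
        return f"Hi, my name is {self.name}. I work {self.weekly_hours} hours per week."


shelly = ParttimeEmployee("Shelly Smith", 2, 30)
print(shelly.intro())
print(isinstance(shelly, Employee))
```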

Call the round_mean function and pass the numbers 243, 435, 563, 412, 369 and 679 to it. Print the returned values.

print(round_mean(243, 435, 563, 412, 369, 679))

Call the to_fahrenheit function, pass 17 degrees celsius to it, and print the returned value.

print(to_fahrenheit(17))

Call the modified triangle_area function, pass 12 as the base, and print the returned value.

print(triangle_area(number1 = 12))

Call the triangle_area function, pass 6 and 15 as the height and base, respectively, and print the returned value.

print(triangle_area(number1 = 15, number2 = 6))

data exploration

process of describing, visualizing, and analyzing data in order to better understand it

data analytics

process of extracting value or insight from data through series of iterative and methodical processes

how to create a string

quotes: single ('), double ("), or triple (''' or """)

.read_excel()

read an excel file into python

read_json()

reads JSON files into python

read_html()

reads an html table into python

negative index notation

refers to elements based on how far away they are from the end of the list (starts at -1 not 0)

.pop()

removes and returns the last element of a list (or the element at a given index, ex. .pop(0))

.replace()

replace a substring within a string (ex. my_new_string = my_string.replace('$', 's'))

measures of central location

represents typical value for feature

.reset_index()

resets the index

.drop_duplicates()

resolves duplicate rows in a dataframe

listname[start: stop: step]

retrieves multiple elements from a list
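A few slices of one list, as a sketch of how start, stop, step, and negative indexes behave together. The list nums is made up for the example.

```python
nums = [10, 20, 30, 40, 50, 60]

first_two = nums[:2]       # start defaults to 0, stop (index 2) is excluded
middle = nums[1:4]         # indexes 1, 2 and 3
every_second = nums[0::2]  # step of 2 from the first element
near_end = nums[-3:-1]     # negative indexes count back from the end
backwards = nums[::-1]     # negative step walks right to left

print(first_two, middle, every_second, near_end, backwards)
```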

Call the letter_count function, pass the word variable to it and print the returned value.

return_value = letter_count(word)
print(return_value)

Call the math_facts function, pass num to it, and assign the returned values to variables called var1, var2, var3, and var4. Print a message that reads "The square, square root, natural log, and factorial of {num} is {var1}, {var2}, {var3}, and {var4}.".

var1, var2, var3, var4 = math_facts(num)
print('The square, square root, natural log, and factorial of {} is {}, {}, {}, and {}.'.format(num, var1, var2, var3, var4))

Call the vowel_count function, pass the word variable to it and print the returned value.

return_value = vowel_count(word)
print(return_value)

.describe()

returns a statistical summary for each of the columns in a dataframe: count, mean, std, min, 25th percentile, 50th percentile, 75th percentile, and max for numeric columns; count, unique, top, and freq for non-numeric columns

.columns

returns the column labels

floor division

returns the integer portion of the division operation (whole number)

modulus

returns the remainder of a division operation
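Floor division and modulus are two halves of the same division, which this short sketch checks using the 55 and 6 from the card on x = 55 // 6.

```python
quotient = 55 // 6   # floor division: the whole-number part
remainder = 55 % 6   # modulus: what is left over

# Two ints give an int; a float operand gives a float result.
float_quotient = 55.0 // 6

print(quotient, remainder, type(quotient).__name__, float_quotient)
```

Multiplying the quotient by 6 and adding the remainder gives back 55.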

how to use .sort() in descending order

set the reverse argument to True (ex. nums.sort(reverse=True))

.reverse()

reverse the order of a list

round()

rounds a number; when the fractional part is exactly halfway between two whole numbers, the even one is returned (ex. round(2.5) returns 2)
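The halfway rule for round() is easy to check with a handful of values ending in .5. The list halves is just sample input.

```python
halves = [0.5, 1.5, 2.5, 3.5]

# Each value sits exactly between two whole numbers, so round() picks the even one.
rounded = [round(value) for value in halves]

# A second argument sets the number of decimal places to keep.
two_places = round(3.14159, 2)

print(rounded, two_places)
```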

example of filtering dataframe

rows = brics["literacy"] >= .95
cols = ["country", "gdp", "population"]
brics[rows][cols]

Create four new DataFrames called seniors, juniors, sophomores and freshmen by importing the 'Seniors', 'Juniors', 'Sophomores' and 'Freshmen' sheets in the Excel file located at https://coding-fundamentals.s3.amazonaws.com/students.xlsx. Combine all four DataFrames vertically into a new DataFrame called students and display it. Hint: Make sure that there are no duplicate index values in the students DataFrame and that the index values go from 0 to 19 (see the previous tutorial if you need a refresher).

seniors = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Seniors")
juniors = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Juniors")
sophomores = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Sophomores")
freshmen = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Freshmen")
students = pd.concat([seniors, juniors, sophomores, freshmen]).reset_index()
students = students.drop("index", axis = "columns")
students

Create four new DataFrames called seniors, juniors, sophomores and freshmen by importing the 'Seniors', 'Juniors', 'Sophomores' and 'Freshmen' sheets in the Excel file located at https://coding-fundamentals.s3.amazonaws.com/students.xlsx. Make the ID column the index label for each DataFrame and display all four of them.

seniors = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Seniors", index_col = "ID")
juniors = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Juniors", index_col = "ID")
sophomores = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Sophomores", index_col = "ID")
freshmen = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Freshmen", index_col = "ID")
display(seniors, juniors, sophomores, freshmen)

alpha =

sets opacity of a line within a line plot (value between 0 and 1)

style =

sets the style of a line within a line plot ( - solid, -- dashed, -. dash-dot, : dotted)

color =

sets the color of a line within a line plot

Instantiate an object from the ParttimeEmployee class named shelly, for an employee called "Shelly Smith", who works thirty hours per week and has been at the company for two years. Call the intro() and health_benefit() methods for shelly.

shelly = ParttimeEmployee('Shelly Smith', 2, 30)
shelly.intro()
shelly.health_benefit()

histograms

show the frequency distribution of values within a dataset

np.array()

simplest way to create a NumPy array (ex. np.array([0, 1, 2, 3, 4, 5]))

.sort()

sort a list (with elements of a single data type)

.sort_values(by = "")

sort the data by one or more columns

ascending = False

sorts a dataframe in descending order

step

specifies how far the index moves at each step (ex. a step of 2 takes every second element)

else clause

specifies the block of code that should be executed if the try clause does not raise an exception

finally clause

specifies the block of code that would be executed regardless of whether an exception was raised or not
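The else and finally clauses can be seen together in one function. This is a sketch modeled on the reciprocal example in this set; the function name describe_reciprocal is made up, and it collects its messages in a list instead of printing them so the order of execution is easy to inspect.

```python
def describe_reciprocal(text):
    """Return the messages produced while trying to compute 1 / float(text)."""
    messages = []
    try:
        reciprocal = 1 / float(text)
    except ZeroDivisionError:
        messages.append("Zero does not have a reciprocal.")
    except ValueError:
        messages.append("Invalid input.")
    else:
        # Runs only when the try block raised no exception.
        messages.append(f"The reciprocal is {round(reciprocal, 2)}.")
    finally:
        # Runs in every case, exception or not.
        messages.append("Thank you!")
    return messages


print(describe_reciprocal("4"))
print(describe_reciprocal("0"))
print(describe_reciprocal("abc"))
```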

variable-length arguments

specify a single variable name preceded by an asterisk (*) ex. def total_sum(*args)
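A minimal sketch of a variable-length argument list, reusing the total_sum name from the card above. The function body is an assumption, since the card only gives the def line.

```python
def total_sum(*args):
    # Inside the function, args is a tuple holding every positional argument.
    return sum(args)


three_values = total_sum(1, 2, 3)
six_values = total_sum(243, 435, 563, 412, 369, 679)
no_values = total_sum()

print(three_values, six_values, no_values)
```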

.split()

splits a string into individual words

measures of ___ describe how similar or varied the set of observed values are for a particular feature

spread

def

stands for definition and indicates that a function definition follows

[:2]

start is the first element of the list

object data type in pandas

str

Remove the duplicate columns in the students_demo_major DataFrame that you created in the previous problem. Sort the DataFrame by the FirstName and LastName columns and display it.

students_demo_major = students_demo_major.loc[:, ~students_demo_major.columns.duplicated()]
students_demo_major.sort_values(by = ["FirstName", "LastName"])

Create a new DataFrame called student_demographics by importing the 'Demographics' sheet in the Excel file located at https://coding-fundamentals.s3.amazonaws.com/students.xlsx. Make the ID column the index label. Combine the students (from problem 2) and student_demographics DataFrames horizontally. Call the new DataFrame students_demo and display it.

student_demographics = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Demographics", index_col = "ID")
students_demo = pd.concat([students, student_demographics], axis = "columns")
students_demo

Create a new DataFrame called student_hometown_left by combining the students and hometown DataFrames using a left join on the ID column. Display the student_hometown_left DataFrame.

student_hometown_left = pd.merge(students, hometown, on = "ID", how = "left")
student_hometown_left

Create a new DataFrame called student_hometown_right by combining the students and hometown DataFrames using a right join on the ID column. Display the student_hometown_right DataFrame.

student_hometown_right = pd.merge(students, hometown, on = "ID", how = "right")
student_hometown_right

Create a new DataFrame called student_major by importing the 'Major' sheet in the Excel file located at https://coding-fundamentals.s3.amazonaws.com/students.xlsx. Make the ID column the index label. Combine the students_demo (from Problem 3) and student_major DataFrames horizontally. Call the new DataFrame students_demo_major and display it.

student_major = pd.read_excel("https://coding-fundamentals.s3.amazonaws.com/students.xlsx", sheet_name = "Major", index_col = "ID")
students_demo_major = pd.concat([students_demo, student_major], axis = "columns")
students_demo_major

input()

student_name = input('Enter your first name: ')

Create a new DataFrame called students by combining the seniors, juniors, sophomores and freshmen DataFrames vertically. Sort the students DataFrame by its index and display it.

students = pd.concat([seniors, juniors, sophomores, freshmen])
students.sort_index()

Set the ID column as the index for the students and hometown DataFrames. Create a new DataFrame called student_hometown_outer by combining the students and hometown DataFrames using an outer join on the index. Display the student_hometown_outer DataFrame.

students = students.set_index("ID")
hometown = hometown.set_index("ID")
student_hometown_outer = pd.merge(students, hometown, on = "ID", how = "outer")
student_hometown_outer

Create a new DataFrame called swim_women by importing the Women's roster from the Notre Dame Swimming and Diving home page located at https://und.com/sports/swim/roster/. Preview the first 10 rows of the swim_women DataFrame. Hint: Go to the webpage to identify the HTML tables on the page and what kind of data is stored in them first.

swim_women = pd.read_html("https://und.com/sports/swim/roster/")[1]
swim_women.head(10)

object_name = ClassName()

syntax used to instantiate a new object

.dtypes

tells us the data type of each column in the DataFrame

break

terminates the loop even if the conditional statement is still TRUE

methods

the actions that an object can take

dynamically-typed language

the data type of a variable is based on the data type of the object that it holds or represents

zero-indexed language

the first character is [0]

child or derived class

the new class in inheritance

not equal to operator

!=

complete the following piece of code to copy the elements 44 and 55 from epsilon into a new tuple called delta: epsilon = (11, 22, 33, 44, 55, 66) delta = epsilon[____:-1:] print(delta)

-3

finally (example)

... finally: print('Thank you!')

def beta(a = 2): b = a + 4 b = b ** 2 return b If I call the beta() function without passing an argument to it and assign the result to a variable called c, what will the value of c be?

36

alpha = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]) given the code snippet above, in terms of dimensions, alpha is a __ by __ array

4, 3

the sequence of numbers returned by the range(6, 21, 3) function will contain ____ elements

5

which of these variable names is not allowed in python?

7_catinthehat

def beta(): a = 2 ** 2 b = 3 ** 2 return b b += 4 If I assign the result of calling the beta() function to a variable, then print the variable, what would my output be?

9

equality operator

==

JSON

JavaScript Object Notation; used to store semi-structured data in human-readable form online

missing values are represented as NaN or np.nan in pandas data structures. NaN stands for ___

Not a Number

instantiation

the process of creating an object from a class; the class is simply a blueprint, and once it is defined we can create objects based on it

In code we sometimes have to ask questions in order to decide what to do. this sort of question is also known as

a condition

data structure

a container that holds a sequence of objects

tuple

a core data structure that is very similar to a list, except that it is immutable

.insert()

add an element at a specific index

.lower()

all letters are lower case

.upper()

all letters are upper case

method chaining

allows us to call multiple methods on an object all at once without having to create temporary variables

tuple()

also used to create a tuple

built-in functions

always available for use in order to perform different types of tasks

what does the following piece of code do? balance = 45

assigns the value 45 to a variable called balance

data type of input

automatically string

beta = np.array([['red', 'orange'],['yellow', 'green'],['indigo', 'violet']]) which of these should I run to return 'green'?

beta[1, 1]

Instantiate two objects based on the Car class you defined in Problem 1. The first car should be "white" and have a seating capacity of 6. The second car should be "blue" and have a seating capacity of 6. Call the first car, car_one, and the second car, car_two. Print a message that reads "The first car is {color} with a seating capacity of {capacity}, while the second car is {color} with a seating capacity of {capacity}."

car_one = Car('white', 6)
car_two = Car('blue', 6)
print("The first car is {} with a seating capacity of {}, while the second car is {} with a seating capacity of {}.".format(car_one.color, car_one.capacity, car_two.color, car_two.capacity))

class

categorical

feature

categorical (discrete form) or continuous (integer)

int()

changes a decimal number to a whole number

layout = (row, column)

changes how subplots are displayed

Modify the Employee class you defined in Problem 3 to include a method called intro(). When called, the intro()method should print a message that reads "Hi, my name is {first_name} {last_name}. I've worked here for {work_years} years."

class Employee:
    salaried = True
    def __init__(self, first_name, last_name, work_years):
        self.first_name = first_name
        self.last_name = last_name
        self.work_years = work_years
    def intro(self):
        print("Hi, my name is {} {}. I've worked here for {} years.".format(self.first_name, self.last_name, self.work_years))

axis = 1, axis = 'columns'

combines data horizontally

.concat()

combines multiple series or dataframe objects (vertically by default)

CSV file

comma-separated values file; one of the most common ways to save data in tabular format

let's assume that the variable numbers is a list of whole numbers between 1 and 50. complete the code snippet below so the loop prints all of the even numbers in the numbers variable: for n in numbers: if n % 2 != 0: _________ print(n)

continue

response

continuous

float()

convert a whole number to a decimal number

str()

converts the variable to a string

floating point

decimal number

Define a function called letter_count that returns the number of letters in a variable passed to it.

def letter_count(arg):
    return len(arg)

a dataset that is 80% dense is also 80% sparse

false

.head()

first five rows of a dataframe

float64 data type in pandas

float

float

floating point variable

for loop to iterate through keys of dictionaries

for key in fruit_price.keys():
    print(key.capitalize())

for loop to iterate through all items of dictionaries

for key, value in fruit_price.items():
    key = key.capitalize()
    value = round(value * 0.8, 2)
    sale_fruit_price[key] = value

return

functions immediately exit when they encounter a return statement
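A small sketch of the early-exit behavior of return. The function first_negative and its sample list are invented for this example.

```python
def first_negative(numbers):
    for n in numbers:
        if n < 0:
            # The function stops here; the rest of the list is never examined.
            return n
    # Reached only if the loop finishes without finding a negative number.
    return None


found = first_negative([45, 57.5, -56, 99.3, -1])
not_found = first_negative([1, 2, 3])

print(found, not_found)
```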

range(start, stop, step)

generates a sequence of numbers
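Wrapping range() in list() makes the generated numbers visible. This sketch uses the range(6, 21, 3) call from the practice question in this set plus two extra calls added for contrast.

```python
by_threes = list(range(6, 21, 3))    # start 6, stop before 21, step 3
default_start = list(range(5))       # start defaults to 0, step to 1
counting_down = list(range(10, 0, -3))  # a negative step counts down

print(by_threes, len(by_threes))
print(default_start)
print(counting_down)
```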

.dtype

gives the data type

Write code to determine if chris is an instance of the Employee class.

isinstance(chris, Employee)

Reinstantiate the same two Employee objects as you did in Problem 4 and call the intro() method for both of them.

jack = Employee('Jack', 'Turner', 6)
kate = Employee('Kate', 'Brown', 8)
jack.intro()
kate.intro()

Instantiate two objects from the Employee class you defined in Problem 3. The first employee is Jack Turner. He has worked at the company for 6 years and is a salaried employee. The second employee is Kate Brown. She is an hourly employee and has worked at the company for 8 years. Use the first name of each employee (in lower case) as the name for the object you instantiate. Print the first_name, last_name, work_years, and salaried attributes for both Employee objects.

jack = Employee('Jack', 'Turner', 6)
kate = Employee('Kate', 'Brown', 8)
print('{} {} {} {}'.format(jack.first_name, jack.last_name, jack.work_years, jack.salaried))
kate.salaried = False
print('{} {} {} {}'.format(kate.first_name, kate.last_name, kate.work_years, kate.salaried))

.tail()

last five rows of a dataframe

Instantiate an object from the new ParttimeEmployee class named laura, for an employee called "Laura Walker", who works ten hours per week and has been at the company for eight years. Call the intro() method for laura.

laura = ParttimeEmployee('Laura Walker', 8, 10)
laura.intro()

~

not

second number refers to

number of columns

dimensionality

number of features in dataset

frequency distributions

number of occurrences of each value within a feature

first number refers to

number of rows

np.number

numeric columns

extract every 2nd element in the list

nums[0::2]

operators are special symbols that tell python to take a discrete action. These actions are known as

operations

Given the two DataFrames above called alpha and beta, respectively, which line of code would produce the following DataFrame?

pd.merge(alpha, beta, on = ["make", "model"], how = "left")

Rachel built a model that helps her predict whether a particular patient is at risk for preterm birth. she used existing and historical patient health data to build the model. what type of data analytics did she use

predictive analytics

.remove()

remove a specific value from a list
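The list methods .insert(), .pop(), and .remove() are easy to mix up, so here is a sketch that applies them in turn to one made-up list.

```python
nums = [3, 1, 4, 1, 5]

nums.insert(0, 9)       # add 9 at index 0 -> [9, 3, 1, 4, 1, 5]
last = nums.pop()       # remove and return the last element (5)
first = nums.pop(0)     # remove and return the element at index 0 (9)
nums.remove(1)          # remove the first occurrence of the value 1

print(nums, last, first)
```

.pop() works by position and hands the element back, while .remove() works by value and returns nothing.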

reset_index(drop = True)

resets index values

how to create a list

separate a list of values by comma surrounded by square brackets [ ]

.set_index()

sets one of the existing columns of a DataFrame as the row index

index_col

sets one of the columns in the data as the index

composition visualizations

show component make up of data

distribution visualizations

show frequency distribution of values of feature

arguments

the objects or values we pass to the function as input (optional)

attributes

the properties of an object

and

the result is FALSE if at least one expression is false

not

the result is TRUE if the expression is FALSE and vice versa

or

the result is TRUE if at least one expression is true
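The three logical operators can be compared on the same pair of values. The variable names below are invented for this sketch.

```python
# Hypothetical conditions.
is_member = True
has_coupon = False

both = is_member and has_coupon    # False: one of the expressions is false
either = is_member or has_coupon   # True: at least one expression is true
negated = not is_member            # False: not flips True to False

print(both, either, negated)
```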

kind = "hist"

to create a histogram

kind = 'barh'

to create a horizontal bar plot

kind = 'line'

to create a line plot

kind = 'scatter'

to create a scatter plot

kind = 'bar' stacked = True

to create a stacked bar graph

kind = 'bar'

to create a vertical bar plot

In a conditional statement, a response is the block of code that executes in response to a question

true

len()

used to get the length of the string

lists are mutable. this means that

we can modify the contents of a list
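Mutability is clearest when a list and a tuple are put through the same change. In this sketch the values are made up; the list accepts the change, while the tuple raises a TypeError.

```python
scores_list = [90, 85]
scores_tuple = (90, 85)

# Lists are mutable: elements can be replaced and added in place.
scores_list[0] = 95
scores_list.append(70)

# Tuples are immutable: item assignment raises a TypeError.
try:
    scores_tuple[0] = 95
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(scores_list, scores_tuple, tuple_changed)
```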

how to create a tuple

wrap comma-separated elements in parentheses (), or assign comma-separated elements to a variable without parentheses

if (example)

x = 'Statements'
if x.startswith('S'):
    print('Starts with S')
if x.find('t') != -1:
    print('Contains t')

assigning multiple variables to same value

x = y = z = 10

assigning multiple values at the same time

x, y, z = 10, 10.5, 1148.57

