
Intermediate Python


import numpy as np
all_walks = []
for i in range(10) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        random_walk.append(step)
    all_walks.append(random_walk)
print(all_walks)

# Numpy is imported; seed is set
# Initialize all_walks (don't change this line)
all_walks = []
# Simulate random walk 10 times
for i in ___ :
    # Code from before
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        random_walk.append(step)
    # Append random_walk to all_walks
    ___
# Print all_walks
___
Fill in the specification of the for loop so that the random walk is simulated 10 times. After the random_walk list is entirely populated, append it to the all_walks list. Finally, after the top-level for loop, print out all_walks.

4 6 2

How many data points are in the first histogram bin? The second bin? The third bin?

objects same

Make sure that you make comparisons between ob____________ of the sam_____________ type

import numpy as np
step = 50
dice = np.random.randint(1,7)
if dice <= 2 :
    step = step - 1
elif dice <= 5 :
    step = step + 1
else :
    step = step + np.random.randint(1,7)
print(dice)
print(step)

Start with: step = 50. Use randint() to create the variable dice; it should start with np.random and take (1,7). Finish the if, elif and else: If dice is 1 or 2, you go one step down. Example: if dice <= 2: step = step - 1. If dice is 3, 4 or 5, you go one step up. Example: elif dice <= 5: step = step + 1. Otherwise (dice is 6), you throw the dice again and go up by the number of eyes: else: step = step + np.random.randint(1,7). Print out dice and step. Given the value of dice, was step updated correctly?
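The three branches can be checked without any randomness by passing the dice values in by hand. This is a minimal sketch: next_step and its bonus parameter are hypothetical names introduced only for illustration, with bonus standing in for the second throw that the else branch makes.

```python
def next_step(step, dice, bonus=0):
    """Return the new step after one throw (hypothetical helper).

    dice 1-2: one step down
    dice 3-5: one step up
    dice 6:   go up by `bonus`, the result of a second throw
    """
    if dice <= 2:
        return step - 1
    elif dice <= 5:
        return step + 1
    else:
        return step + bonus


print(next_step(50, 2))     # 49
print(next_step(50, 4))     # 51
print(next_step(50, 6, 3))  # 53
```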

if else elif

What are examples of conditional statements? 1. i____ 2. els______ 3. el_______

and or not

What are the three common Boolean Operators? 1. an_________ 2. o__________ 3. no_________
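A small sketch of the three operators on plain Python values; x = 12 is an arbitrary value chosen only for illustration.

```python
x = 12  # hypothetical value

both = x > 5 and x < 15    # and: True only if both sides are True
either = x < 5 or x > 10   # or: True if at least one side is True
negated = not x > 5        # not applies to the whole comparison: not (x > 5)

print(both, either, negated)  # True True False
```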

Comma Separated Values

What does CSV mean?

import numpy as np
import matplotlib.pyplot as plt
all_walks = []
for i in range(250) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        if np.random.rand() <= 0.001:
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)
np_aw_t = np.transpose(np.array(all_walks))
plt.plot(np_aw_t)
plt.show()

# numpy and matplotlib imported, seed set
# Simulate random walk 250 times
all_walks = []
for i in range(10) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        # Implement clumsiness
        if ___ :
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)
# Create and plot np_aw_t
np_aw_t = np.transpose(np.array(all_walks))
plt.plot(np_aw_t)
plt.show()
Change the range() function so that the simulation is performed 250 times. Finish the if condition so that step is set to 0 if a random float is less than or equal to 0.001. Use np.random.rand().

import numpy as np
import matplotlib.pyplot as plt
all_walks = []
for i in range(500) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        if np.random.rand() <= 0.001 :
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)
np_aw_t = np.transpose(np.array(all_walks))
ends = np_aw_t[-1,:]
plt.hist(ends)
plt.show()

# numpy and matplotlib imported, seed set
# Simulate random walk 500 times
all_walks = []
for i in range(500) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        if np.random.rand() <= 0.001 :
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)
# Create and plot np_aw_t
np_aw_t = np.transpose(np.array(all_walks))
# Select last row from np_aw_t: ends
____ = ____[____,____]
# Plot histogram of ends, display plot
____
____
To make sure we've got enough simulations, go crazy: simulate the random walk 500 times. From np_aw_t, select the last row. This contains the endpoint of all 500 random walks you've simulated. Store this Numpy array as ends. Use plt.hist() to build a histogram of ends. Don't forget plt.show() to display the plot.
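To read bin counts off as numbers instead of a picture, np.histogram() returns the count per bin together with the bin edges (plt.hist() also returns its counts as the first element of its result). This is a sketch on six hypothetical end points, using 3 bins so the output stays small.

```python
import numpy as np

ends = np.array([3, 0, 9, 4, 8, 1])  # hypothetical end points of six walks

counts, bin_edges = np.histogram(ends, bins=3)
print(bin_edges)  # [0. 3. 6. 9.]
print(counts)     # [2 2 2]  -> 0 and 1 | 3 and 4 | 8 and 9
```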

import numpy as np
import matplotlib.pyplot as plt
all_walks = []
for i in range(10) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        random_walk.append(step)
    all_walks.append(random_walk)
np_aw = np.array(all_walks)
plt.plot(np_aw)
plt.show()
plt.clf()
np_aw_t = np.transpose(np_aw)
plt.plot(np_aw_t)
plt.show()

# numpy and matplotlib imported, seed set.
# initialize and populate all_walks
all_walks = []
for i in range(10) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        random_walk.append(step)
    all_walks.append(random_walk)
# Convert all_walks to Numpy array: np_aw
# Plot np_aw and show
# Clear the figure
plt.clf()
# Transpose np_aw: np_aw_t
# Plot np_aw_t and show
Use np.array() to convert all_walks to a Numpy array, np_aw. Try to use plt.plot() on np_aw. Also include plt.show(). Does it work out of the box? Transpose np_aw by calling np.transpose() on np_aw. Call the result np_aw_t. Now every row in np_aw_t holds the positions of the 10 random walks after a given number of throws (the row at index 1 is the position after 1 throw, and so on). Use plt.plot() to plot np_aw_t; also include a plt.show(). Does it look better this time?
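Why transposing helps: for a 2D array, plt.plot() draws one line per column. In np_aw each row is one walk, so the columns are throws; after np.transpose() each column is one walk. The sketch below checks the shapes on three hypothetical walks of four positions each (start plus 3 throws).

```python
import numpy as np

# Hypothetical toy data: 3 walks, 4 positions each (start + 3 throws)
all_walks = [[0, 1, 2, 3], [0, 0, 1, 2], [0, 2, 3, 9]]

np_aw = np.array(all_walks)
np_aw_t = np.transpose(np_aw)

print(np_aw.shape)    # (3, 4): one row per walk
print(np_aw_t.shape)  # (4, 3): one row per throw, one column per walk
print(np_aw_t[1])     # [1 0 2]: every walk's position after 1 throw
print(np_aw_t[-1])    # [3 2 9]: every walk's end point
```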

rows columns

2D numpy arrays: my_array[ro____, col______]
For pandas: loc (label-based) and iloc (integer position-based)
data.loc["RU"] is a Pandas Series
data.loc[["RU"]] is a Pandas DataFrame
More than one row: data.loc[["RU", "IN", "CH"]]
To show only selected columns for those rows: data.loc[["RU", "IN", "CH"], ["country", "capital"]]
To select all rows in specified columns: data.loc[:, ["country", "capital"]]
To use integer positions instead of labels, use iloc: data.iloc[[1]] selects the RU (Russia) row
To show only selected columns with iloc: data.iloc[[1, 2, 3], [0, 1]]
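A minimal sketch of the loc/iloc rules above on a hypothetical three-row version of the brics table (only the BR, RU and IN rows and two columns are assumed here). It shows that one label gives a Series, a list of labels gives a DataFrame, and that label-based and position-based selection can pick out the same sub-table.

```python
import pandas as pd

# Hypothetical mini version of the brics table
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "capital": ["Brasilia", "Moscow", "New Delhi"]},
    index=["BR", "RU", "IN"],
)

row_series = brics.loc["RU"]                               # one label -> Series
row_frame = brics.loc[["RU"]]                              # list of labels -> DataFrame
by_label = brics.loc[["RU", "IN"], ["country", "capital"]]
by_position = brics.iloc[[1, 2], [0, 1]]                   # same rows/columns by position

print(type(row_series).__name__)  # Series
print(type(row_frame).__name__)   # DataFrame
print(by_label.equals(by_position))  # True
```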

plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)

A dictionary that maps continents onto colors has been constructed:
dict = {
    'Asia':'red',
    'Europe':'green',
    'Africa':'blue',
    'Americas':'yellow',
    'Oceania':'black'
}
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2)
Add c = col to the arguments of the plt.scatter() function. Change the opacity of the bubbles by setting the alpha argument to 0.8 inside plt.scatter(). Alpha can be set from zero to one, where zero is totally transparent and one is not at all transparent.

observation variable

A row in a tabular data set is an obse___________________ A column in a tabular data set is a vari_____________________

for index, area in enumerate(areas) :
    print("room " + str(index + 1) + ": " + str(area))

Adapt the print() function in the for loop below so that the first printout becomes "room 1: 11.25", the second one "room 2: 18.0" and so on.
for index, area in enumerate(areas) :
    print("room " + str(index) + ": " + str(area))
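An alternative to adding 1 by hand: enumerate() accepts a start argument, so the counter itself can begin at 1. A minimal sketch that collects the lines so they can be checked:

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

lines = []
for index, area in enumerate(areas, start=1):  # counter starts at 1 instead of 0
    lines.append("room " + str(index) + ": " + str(area))

for line in lines:
    print(line)  # room 1: 11.25 ... room 5: 9.5
```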

import matplotlib.pyplot as plt
plt.plot(random_walk)
plt.show()

Add some lines of code after the for loop: Import matplotlib.pyplot as plt. Use plt.plot() to plot random_walk. Finish off with plt.show() to actually display the plot.

import pandas as pd
my_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
OR
my_dict = {
    "country": ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
    "drives_right": [True, False, False, False, True, True, True],
    "cars_per_cap": [809, 731, 588, 18, 200, 70, 45]
}
cars = pd.DataFrame(my_dict)
print(cars)

Add the following lists:
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
Import pandas as pd. Use the pre-defined lists to create a dictionary called my_dict. There should be three key-value pairs: key 'country' and value names; key 'drives_right' and value dr; key 'cars_per_cap' and value cpc. Example: "country":names. Use pd.DataFrame() to turn your dict into a DataFrame called cars. Print out cars and see how beautiful it is.

if area > 15 : print("big place!")

Add these values:
room = "kit"
area = 14.0
if room == "kit" :
    print("looking around in the kitchen.")
Examine the if statement that prints out "looking around in the kitchen." if room equals "kit". Write another if statement that prints out "big place!" if area is greater than 15.

import numpy as np
np.random.seed(123)
print(np.random.randint(1,7))
print(np.random.randint(1,7))

As Hugo explained in the video, you can just as well use randint(), also a function of the random package, to generate integers randomly. The following call generates the integer 4, 5, 6 or 7 randomly; 8 is not included:
import numpy as np
np.random.randint(4, 8)
Input:
import numpy as np
np.random.seed(123)
Use randint() with the appropriate arguments to randomly generate the integer 1, 2, 3, 4, 5 or 6. This simulates a die. Print it out. Repeat the call to see if the second throw is different. Again, print out the result.
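To convince yourself that the upper bound is excluded, you can throw the dice many times in one call: randint() takes a size argument that returns an array of draws. A minimal sketch, reusing the seed 123 from the exercise:

```python
import numpy as np

np.random.seed(123)
throws = np.random.randint(1, 7, size=1000)  # 1000 dice throws in one call

# The lower bound 1 can appear; the upper bound 7 never does
print(throws.min(), throws.max())
print(sorted(set(throws.tolist())))
```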

items

Calling the method ite__________() on a dictionary generates a key and a value on each iteration. For a 2D Numpy array, use np.nditer() to visit every element.
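A side-by-side sketch of both loops: items() yields (key, value) pairs from a dictionary, and np.nditer() yields single elements of a 2D array, whereas a plain for loop over a 2D array yields whole rows. The 2 x 3 array is hypothetical.

```python
import numpy as np

world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}
for key, value in world.items():  # one (key, value) pair per iteration
    print(key + " -- " + str(value))

np_2d = np.array([[1, 2, 3], [4, 5, 6]])  # hypothetical 2 x 3 array

rows = [row for row in np_2d]                  # plain loop: one 1D row per iteration
elements = [int(x) for x in np.nditer(np_2d)]  # nditer: every single element

print(len(rows))  # 2
print(elements)   # [1, 2, 3, 4, 5, 6]
```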

plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
plt.show()

Create a scatter plot with gdp_cap and life_exp. Make sure to add plt.xscale('log'). Create the following strings: xlab = 'GDP per Capita [in USD]', ylab = 'Life Expectancy [in years]', title = 'World Development in 2007'. Add the axis labels. Add the title. Show the plot.

offset = 8
while offset != 0 :
    print("correcting...")
    offset = offset - 1
    print(offset)

Create the variable offset with an initial value of 8. Code a while loop that keeps running as long as offset is not equal to 0. Start with *while offset* Inside the while loop: Print out the sentence "correcting...". Next, decrease the value of offset by 1. You can do this with offset = offset - 1. Finally, still within your loop, print out offset so you can see how it changes.
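To see exactly which values the loop prints, this sketch also records each new offset in a list (history is a name added only for checking). Because the loop stops as soon as offset reaches 0, the last printed value is 0. Note that starting from a negative offset and still subtracting 1 would never reach 0, so that loop would never end.

```python
offset = 8
history = []  # added only to record the printed values

while offset != 0:
    print("correcting...")
    offset = offset - 1
    print(offset)
    history.append(offset)

print(history)  # [7, 6, 5, 4, 3, 2, 1, 0]
```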

countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
ind_ger = countries.index('germany')
print(capitals[ind_ger])

Dictionary formula example: world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21}. Create a list called countries with 'spain', 'france', 'germany', 'norway'. Create a list called capitals with the cities 'madrid', 'paris', 'berlin', 'oslo'. Use the index() method on countries to find the index of 'germany'. Store this index as ind_ger. Example: countries.index(). Use ind_ger to access the capital of Germany from the capitals list. Print it out. Example: capitals[index]
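The two-list lookup only works as long as both lists stay in the same order. A dictionary stores each pair together; one way to build it from the two lists is zip(), which pairs the elements position by position. A minimal sketch comparing both lookups:

```python
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']

# List approach: find the position, then use it in the second list
ind_ger = countries.index('germany')
print(capitals[ind_ger])  # berlin

# Dictionary approach: zip() pairs the lists element by element
europe = dict(zip(countries, capitals))
print(europe['germany'])  # berlin
```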

w = dataset["Weight"]
between = np.logical_and(w > 100, w < 200)
medium = dataset[between]

Example: cpc = cars['cars_per_cap'], between = np.logical_and(cpc > 10, cpc < 80), medium = cars[between]. Setup: import numpy as np, import pandas as pd, dataset = pd.read_csv('baseball.csv', index_col = 0). Select the Weight column from dataset and store it as w. Create the variable between, which is True where w is greater than 100 and less than 200 (use np.logical_and). Create the variable medium by subsetting dataset with between.

dataframe

Filter observations from a Da_______ Fra________
Examples:
import numpy as np
np.logical_and(dataset["Height"] > 60, dataset["Height"] < 80)
dataset[np.logical_and(dataset["Height"] > 60, dataset["Height"] < 80)]
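A runnable sketch of the same filter on a hypothetical four-row DataFrame standing in for baseball.csv (names and heights are made up). It also shows the pandas-only form with &, where each comparison must be wrapped in parentheses because & binds more tightly than > and <.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for baseball.csv
dataset = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                        "Height": [58, 65, 72, 81]})

between = np.logical_and(dataset["Height"] > 60, dataset["Height"] < 80)
medium = dataset[between]
print(medium)  # rows B and C

# Equivalent pandas-only filter; the parentheses are required
medium2 = dataset[(dataset["Height"] > 60) & (dataset["Height"] < 80)]
print(medium.equals(medium2))  # True
```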

columns rows label integer

For iloc, it's like this. loc and iloc are pretty similar, the only difference is how you refer to colu______________ and ro______________. With loc and iloc you can do practically any data selection operation on DataFrames you can think of. loc is lab____________-based, which means that you have to specify rows and columns based on their row and column lab_____________. iloc is inte_____________r index based, so you have to specify rows and columns by their inte_______________ index like you did in the previous exercise.

plt.yticks([0, 2, 4, 6, 8, 10])

To set the tick marks on the y-axis, for example a sequence that starts at 0, use: plt.yti______([number sequence])

plt.xlabel('name')
plt.ylabel('name')
plt.title('name')

To add labels to the x- and y-axis and a title to the plot, use this code: plt.___label('name') plt.___label('name') plt.ti______('name')

import numpy as np
for xcom in np_height :
    print(str(xcom) + " inches")
for xwing in np.nditer(np_baseball) :
    print(xwing)

If you're dealing with a 1D Numpy array, looping over all elements can be as simple as: for x in my_array : ... If you're dealing with a 2D Numpy array, it's more complicated. A 2D array is built up of multiple 1D arrays. To explicitly iterate over all separate elements of a multi-dimensional array, you'll need this syntax: for x in np.nditer(my_array) : ... Import the numpy package under the local alias np. Write a for loop that iterates over all elements in np_height and prints out "x inches" for each element, where x is the value in the array. Write a for loop that visits every element of the np_baseball array and prints it out.

import numpy as np
import matplotlib.pyplot as plt
np_pop = np.array(pop)
np_pop = np_pop * 2
plt.scatter(gdp_cap, life_exp, s = np_pop)
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])
plt.show()

Import the numpy package as np. Use np.array() to create a numpy array from the list pop. Call this Numpy array np_pop. Double the values in np_pop by setting the value of np_pop equal to np_pop * 2. Because np_pop is a Numpy array, each array element will be doubled. Change the s argument inside plt.scatter() to be np_pop instead of pop. Add the following customization: plt.xscale('log') plt.xlabel('GDP per Capita [in USD]') plt.ylabel('Life Expectancy [in years]') plt.title('World Development in 2007') plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])

for country, capital in europe.items() :
    print("the capital of " + country + " is " + capital)

In Python 3, you need the items() method to loop over a dictionary: Example: world = { "afghanistan":30.55, "albania":2.77, "algeria":39.21 } for key, value in world.items() : print(key + " -- " + str(value)) Insert this: europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' } Write a for loop that goes through each key:value pair of europe. On each iteration, "the capital of x is y" should be printed out, where x is the key and y is the value of the pair.

for xcom, ycom in data.iterrows():
    data.loc[xcom, "TEAM"] = ycom["Team"].upper()

In the video, Hugo showed you how to add the length of the country names of the brics DataFrame in a new column: for lab, row in brics.iterrows() : brics.loc[lab, "name_length"] = len(row["country"]). You can do similar things on the baseball data. First load the baseball.csv table. Then create a new column called "TEAM" that takes the values from the Team column and converts them to capitals by calling upper() on each value.
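The same column can be built without an explicit loop. This sketch shows two loop-free alternatives on a hypothetical three-row stand-in for baseball.csv (team codes and row labels are made up): apply() calls a function on every value, and the .str accessor offers string methods for a whole column.

```python
import pandas as pd

# Hypothetical stand-in for baseball.csv with only a "Team" column
data = pd.DataFrame({"Team": ["bal", "nyy", "bos"]}, index=["p1", "p2", "p3"])

data["TEAM"] = data["Team"].apply(str.upper)  # call str.upper on every value
data["TEAM_STR"] = data["Team"].str.upper()   # pandas string accessor

print(data)
```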

print(my_house >= 18)
print(my_house < your_house)

Input these arrays: import numpy as np my_house = np.array([18.0, 20.0, 10.75, 9.50]) your_house = np.array([14.0, 24.0, 14.25, 9.0]) Which areas in my_house are greater than or equal to 18? You can also compare two Numpy arrays element-wise. Which areas in my_house are smaller than the ones in your_house? Make sure to wrap both commands in a print() statement so that you can inspect the output!

my_kitchen = 18.0
your_kitchen = 14.0
print(my_kitchen > 10 and my_kitchen < 18)
print(my_kitchen < 14 or my_kitchen > 17)
print(my_kitchen * 2 < your_kitchen * 3)

Insert these variables: my_kitchen = 18.0 your_kitchen = 14.0 Write Python expressions, wrapped in a print() function, to check whether: my_kitchen is bigger than 10 and smaller than 18. my_kitchen is smaller than 14 or bigger than 17. double the area of my_kitchen is smaller than triple the area of your_kitchen.

for xcom, row in dataset.iterrows():
    print(xcom)
    print(row)

Iterating over a Pandas DataFrame is typically done with the iterrows() method. Used in a for loop, every observation is iterated over, and on every iteration the row label and actual row contents are available: for lab, row in brics.iterrows() : print(lab) print(row). Setup: import pandas as pd, data = pd.read_csv('baseball.csv', index_col = 0). Write a for loop that iterates over the rows and on each iteration performs two print() calls: one to print out the row label and one to print out all of the row's contents.

immutable

Keys have to be "imm_______________" objects

lists dictionaries lists dictionaries

Lis________ are sequences of values indexed by a range of numbers. Dicti________________ are indexed by unique keys. If you have a collection of values where order matters and you need to select entire subsets, use ___________. If you need a lookup table with unique keys, use ___________.

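random_walk = [0]
for x in range(100) :
    step = random_walk[-1]
    dice = np.random.randint(1,7)
    if dice <= 2:
        step = max(0, step - 1)
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1,7)
    random_walk.append(step)
print(random_walk)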

Make a list random_walk that contains the first step, which is the integer 0. Finish the for loop: the loop should run 100 times. On each iteration, set step equal to the last element in the random_walk list. You can use the index -1 for this. Next, let the if-elif-else construct update step for you. The code that appends step to random_walk is already coded. Print out random_walk. Then modify the step so that it cannot go below 0: use max(), replacing step = step - 1 with step = max(0, step - 1). Follow:
# Set seed to 123
np.random.seed(123)
# Initialize random_walk
# Complete the ___
for x in ___(___) :
    # Set step: last element in random_walk
    ___
    # Roll the dice
    dice = np.random.randint(1,7)
    # Determine next step
    if dice <= 2:
        step = step - 1
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1,7)
    # append next_step to random_walk
    random_walk.append(step)
# Print random_walk

import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])
print(np.logical_or(my_house > 18.5, my_house < 10))
print(np.logical_and(my_house < 11, your_house < 11))

Numpy examples: logical_and(), logical_or(), logical_not()
np.logical_and(bmi > 21, bmi < 22)
bmi[np.logical_and(bmi > 21, bmi < 22)]
Before, the comparison operators like < and >= worked with Numpy arrays out of the box. Unfortunately, this is not true for the boolean operators and, or, and not. To use these operators with Numpy, you will need np.logical_and(), np.logical_or() and np.logical_not(). Here's an example on the my_house and your_house arrays from before to give you an idea: np.logical_and(my_house > 13, your_house < 15)
Add these arrays into Python:
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])
Generate boolean arrays that answer the following questions: Which areas in my_house are greater than 18.5 or smaller than 10? Which areas are smaller than 11 in both my_house and your_house? Make sure to wrap both commands in a print() statement, so that you can inspect the output.
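What goes wrong with the plain keyword: Python's and needs a single True or False, so applying it to multi-element arrays raises a ValueError. The element-wise function works, and its boolean result can also be used to select elements. A minimal sketch on the arrays from the exercise:

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

raised = False
try:
    my_house > 13 and your_house < 15  # truth value of an array is ambiguous
except ValueError as err:
    raised = True
    print("ValueError:", err)

mask = np.logical_and(my_house > 13, your_house < 15)
print(mask)                           # [ True False False False]
print(my_house[mask])                 # [18.]
print(np.logical_not(my_house > 15))  # [False False  True  True]
```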

series

Pandas Se_______ with comparison operators. Example: is_huge = dataset["Height"] > 75, then dataset[is_huge], or in one step: dataset[dataset["Height"] > 75]

high manipulation tool Wes McKinney Numpy tabular dataframe

Pandas is a hi_______________ level data manip____________ tool developed by We__________ McKin_________, built on the Num_____________ package. Compared to Numpy, it's more high level, making it very interesting for data scientists all over the world. In pandas, we store the tabu_____________ data like the brics table here in an object called a DataFr____________.

print(cars.iloc[1])
print(cars.iloc[[4,6]])

Print the row at integer position 1 as a Pandas Series. Hint: print(cars.iloc[index]). Print the rows at positions 4 and 6 as a DataFrame.

range append

Random Walk: We can initialize an empty list "outcomes". Next, we build a for loop that should run ten times. We can do this with the ran________() function, which generates a sequence of numbers that you can iterate over. You can use app________________() to add a value to the list on each iteration.

import numpy as np
np.random.seed(123)
print(np.random.rand())

Random float Randomness has many uses in science, art, statistics, cryptography, gaming, gambling, and other fields. You're going to use randomness to simulate a game. All the functionality you need is contained in the random package, a sub-package of numpy. In this exercise, you'll be using two functions from this package: seed(): sets the random seed, so that your results are reproducible between simulations. As an argument, it takes an integer of your choosing. If you call the function, no output will be generated. rand(): if you don't specify any arguments, it generates a random float between zero and one. Import numpy as np. Use seed() to set the seed; as an argument, pass 123. Generate your first random float with rand() and print it out. Print np.random.rand()
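What "reproducible" means in practice: resetting the seed to the same integer makes rand() return the same float again. A minimal sketch using the seed 123 from the exercise:

```python
import numpy as np

np.random.seed(123)
first = np.random.rand()   # a float between 0 and 1

np.random.seed(123)        # reset to the same seed ...
again = np.random.rand()   # ... and the same float comes out

print(first == again)  # True
```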

molecules financial

Random walk examples: 1. Path of mole______________ 2. Gambler's finan____________ status

string alphabetical

Remember that for str__________ comparison, Python determines the relationship based on alp___________________ order.
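A short sketch of both rules about comparisons. Strings are compared character by character using Unicode code points, which matches alphabetical order only within the same case (uppercase letters come before lowercase ones). Comparing objects of different types, such as an int and a str, with < raises a TypeError in Python 3.

```python
print("carl" < "chris")   # True: 'a' comes before 'h'
print("Zebra" < "apple")  # True: uppercase letters come before lowercase ones

type_error = False
try:
    3 < "chris"           # comparing an int with a str
except TypeError as err:
    type_error = True
    print("TypeError:", err)
```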

print(cars[0:3])
print(cars[3:6])

Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame. The following call selects the first five rows from the cars DataFrame: cars[0:5]. Pay attention: you can only select rows using square brackets if you specify a slice, like 0:4. Also, you're using the integer indexes of the rows here, not the row labels! Select the first 3 observations from cars and print them out. * Create a slice for cars using square brackets and experiment until you get the first 3 rows. Select the fourth, fifth and sixth observation, corresponding to row indexes 3, 4 and 5, and print them out.

tabular label rows columns dictionary

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tab______________ data where you can lab__________ the ro______________ and the colu_______________. One way to build a DataFrame is from a dicti__________________.

import pandas as pd
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
cars = pd.DataFrame(cars_dict)
print(cars)
row_labels = [1, 2, 3, 4, 5, 6, 7]
cars.index = row_labels
print(cars)

The Python code that solves the previous exercise is included below. Have you noticed that the row labels (i.e. the labels for the different observations) were automatically set to integers from 0 up to 6? To solve this, a list row_labels has been created. You can use it to specify the row labels of the cars DataFrame. You do this by setting the index attribute of cars, which you can access as cars.index. Hit Run Code to see that, indeed, the row labels are not correctly set. Add the following into Python: names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'] dr = [True, False, False, False, True, True, True] cpc = [809, 731, 588, 18, 200, 70, 45] cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc } cars = pd.DataFrame(cars_dict) print(cars). Create a row label list with the values 1, 2, 3, 4, 5, 6, 7. Example: row_labels = [ ]. Specify the row labels by setting cars.index equal to row_labels. Print out cars again and check if the row labels are correct this time.
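Instead of overwriting cars.index after the fact, the labels can also be passed when the DataFrame is created, through the index parameter of pd.DataFrame(). A minimal sketch with the same lists:

```python
import pandas as pd

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
row_labels = [1, 2, 3, 4, 5, 6, 7]

cars_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}
cars = pd.DataFrame(cars_dict, index=row_labels)  # labels set at creation time
print(cars)
```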

for xcom, ycom in data.iterrows():
    print(str(xcom) + ": " + str(ycom["Weight"]))

The row data that's generated by iterrows() on every run is a Pandas Series. This format is not very convenient to print out. Luckily, you can easily select variables from the Pandas Series using square brackets: for lab, row in brics.iterrows() : print(row['country']). Load baseball.csv. Print the row label and the weight of each athlete in the baseball data. Remember to put the column name in square brackets.

The while loop is like a repeated if statement. The code is executed over and over again, as long as the condition is True. Can you tell how many printouts the following while loop will do?
x = 1
while x < 4 :
    print(x)
    x = x + 1
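One way to check the answer is to run the loop and record what it prints (printouts is a list added only for counting). The condition x < 4 is checked before every pass, so the body runs for x = 1, 2 and 3 and stops once x becomes 4.

```python
x = 1
printouts = []  # added only to count the printouts

while x < 4:
    print(x)
    printouts.append(x)
    x = x + 1

print(len(printouts))  # 3
```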

europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
europe['italy'] = 'rome'
print('italy' in europe)
europe['poland'] = 'warsaw'
print(europe)

To add to a dictionary, follow this example: world["sealand"] = 0.000027, then check with "sealand" in world. You can also remove a key, for example: del(world["sealand"]). Add this: europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }. Add the key 'italy' with the value 'rome' to europe. To assert that 'italy' is now a key in europe, print out 'italy' in europe. Example: print('shanghai' in asia). Add another key:value pair to europe: 'poland' is the key, 'warsaw' is the corresponding value. Print out europe.
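A minimal sketch of adding and removing keys. del works with or without the parentheses; dict.pop() is another built-in option that removes a key and returns its value at the same time.

```python
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}

europe['italy'] = 'rome'        # add a new key:value pair
print('italy' in europe)        # True

del europe['italy']             # same effect as del(europe['italy'])
print('italy' in europe)        # False

removed = europe.pop('norway')  # remove the key and get its value back
print(removed)                  # oslo
print(europe)
```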

medium

To experiment with if and else a bit, have a look at this code sample:
area = 10.0
if(area < 9) :
    print("small")
elif(area < 12) :
    print("medium")
else :
    print("large")
What will the output be if you run this piece of code in the IPython Shell? small, medium or large?

condition : expression

While loop: while cond______________ : expr___________________
Example:
error = 50.0
while error > 1 :
    error = error / 4
    print(error)
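Tracing the example: the loop keeps dividing by 4 while error is still above 1, so it prints three values and stops at the first value below 1. The values list is added only so the sequence can be checked.

```python
error = 50.0
values = []  # added only to record the printed values

while error > 1:
    error = error / 4
    print(error)
    values.append(error)

print(values)  # [12.5, 3.125, 0.78125]
```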

for area in areas: print(area)

Write a for loop that iterates over all elements of the areas list and prints out every element separately areas = [11.25, 18.0, 20.0, 10.75, 9.50]

You can check if two values are equal by using the _____________________ sign

Scatter plot

You're a professor in Data Analytics with Python, and you want to visually assess if longer answers on exam questions lead to higher grades. Which plot do you use? Line plot Scatter plot Histogram

Histogram

You're a professor teaching Data Science with Python, and you want to visually assess if the grades on your exam follow a particular distribution. Which plot do you use? Line plot Scatter plot Histogram

for index, area in enumerate(areas) :
    print("room " + str(index) + ": " + str(area))

areas = [11.25, 18.0, 20.0, 10.75, 9.50] for a in areas : print(a) Adapt the for loop in the sample code to use enumerate() and use two iterator variables. Update the print() statement so that on each run, a line of the form "room x: y" should be printed, where x is the index of the list element and y is the actual list element, i.e. the area. Make sure to print out this exact string, with the correct spacing. Follow Example: fam = [1.73, 1.68, 1.71, 1.89] for index, height in enumerate(fam) : print("person " + str(index) + ": " + str(height))

countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
europe = { 'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }

countries = ['spain', 'france', 'germany', 'norway'] capitals = ['madrid', 'paris', 'berlin', 'oslo'] Use the above information again. my_dict = { "key1":"value1", "key2":"value2", } In this recipe, both the keys and the values are strings. With the strings in countries and capitals, create a dictionary called europe with 4 key:value pairs. Beware of capitalization! Make sure you use lowercase characters everywhere. Print out europe to see if the result is what you expected. example: europe = { 'spain':'madrid'......

series dataframe

data["country"] -> shows the column as a 1-dimensional labeled array, a se_____________. In a simplified sense, you can think of the Series as a 1-dimensional array that can be labeled, just like the DataFr_________. Put differently, if you paste together a bunch of Series, you can create a DataFr____________. Double brackets: data[["country"]] -> shows the column and keeps it as a DataFrame, a sub-DataFrame that contains only that column. Slices can also be used to extract rows. Example: data[1:4]

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'bonn', 'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'australia':'vienna' }
europe['germany'] = 'berlin'
del(europe['australia'])
print(europe)

europe = {'spain':'madrid', 'france':'paris', 'germany':'bonn', 'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'australia':'vienna' } The capital of Germany is not 'bonn'; it's 'berlin'. Update its value. Australia is not in Europe, Austria is! Remove the key 'australia' from europe. Example: del(world["sealand"]). Print out europe to see if your cleaning work paid off.

europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
print(europe.keys())
print(europe['norway'])

europe['france']: here, 'france' is the key, and the value 'paris' is returned. Add the following dictionary: europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }. Check out which keys are in europe by calling the keys() method on europe. Print out the result. Example: europe.keys(). Print out the value that belongs to the key 'norway'. Example: europe[ ]

europe = {
    'spain': { 'capital':'madrid', 'population':46.77 },
    'france': { 'capital':'paris', 'population':66.03 },
    'germany': { 'capital':'berlin', 'population':80.62 },
    'norway': { 'capital':'oslo', 'population':5.084 }
}
print(europe['france']['capital'])
data = {'capital':'rome', 'population':59.83}
europe['italy'] = data
print(europe)

europe['spain']['population']: Remember lists? They could contain anything, even other lists. Well, for dictionaries the same holds. Dictionaries can contain key:value pairs where the values are again dictionaries. Add the following dictionary: europe = { 'spain': { 'capital':'madrid', 'population':46.77 }, 'france': { 'capital':'paris', 'population':66.03 }, 'germany': { 'capital':'berlin', 'population':80.62 }, 'norway': { 'capital':'oslo', 'population':5.084 } }. Use chained square brackets to select and print out the capital of France. Example: print(asia['china']['capital']). Create a dictionary, named data, with the keys 'capital' and 'population'. Set them to 'rome' and 59.83, respectively. Example: { 'capital':'oslo', 'population':5.084 }. Add a new key-value pair to europe; the key is 'italy' and the value is data, the dictionary you just built. Example: asia["china"] = data. Print europe.
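Chained square brackets also work inside a loop: items() hands you each inner dictionary, which you can index again. A minimal sketch with the data from the exercise (the printed population is the raw number; the notes don't state its unit):

```python
europe = {
    'spain': {'capital': 'madrid', 'population': 46.77},
    'france': {'capital': 'paris', 'population': 66.03},
    'germany': {'capital': 'berlin', 'population': 80.62},
    'norway': {'capital': 'oslo', 'population': 5.084},
}

print(europe['france']['capital'])  # paris

europe['italy'] = {'capital': 'rome', 'population': 59.83}

for country, info in europe.items():  # info is the inner dictionary
    print(country + ": " + info['capital'] + ", " + str(info['population']))
```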

enumerate

for loop:
for var in seq :
    expression
Example:
fam = [1.73, 1.68, 1.71, 1.89]
for height in fam :
    print(height)
Displaying the index: enum_______________
Example:
fam = [1.73, 1.68, 1.71, 1.89]
for index, height in enumerate(fam) :
    print("index " + str(index) + ": " + str(height))

for room in house :
    print("the " + room[0] + " is " + str(room[1]) + " sqm")

house = [["hallway", 11.25], ["kitchen", 18.0], ["living room", 20.0], ["bedroom", 10.75], ["bathroom", 9.50]] Write a for loop that goes through each sublist of house and prints out "the x is y sqm", where x is the name of the room and y is the area of the room. Example: "the " + room[0] + ..., where room[0] is the first element of the sublist (the name) and room[1] is the second element (the area). Use + " " to add spaces.
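Because every sublist has exactly two elements, the loop can unpack them straight into two names instead of indexing with room[0] and room[1]. A minimal sketch (lines is added only so the output can be checked):

```python
house = [["hallway", 11.25], ["kitchen", 18.0], ["living room", 20.0],
         ["bedroom", 10.75], ["bathroom", 9.50]]

lines = []
for name, area in house:  # unpack each two-element sublist
    lines.append("the " + name + " is " + str(area) + " sqm")

for line in lines:
    print(line)
```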

condition expression

if cond________________: expre________________ Example: z = 5 if z % 2 == 0 : print("z is even") else : print("z is odd")

import pandas as pd
data = pd.read_csv('baseball.csv', index_col = 0)
print(data["Height"])
print(data[["Height"]])
print(data[['Height', 'Name']])

Start with: import pandas as pd and data = pd.read_csv('baseball.csv', index_col = 0). Use single square brackets to print out the Height column of data as a Pandas Series. Use double square brackets to print out the Height column of data as a Pandas DataFrame. Use double square brackets to print out a DataFrame with both the Height and Name columns of data, in this order. Remember to wrap each selection in print() and to put the column names in quotation marks.
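Since baseball.csv isn't shown, this sketch builds a small hypothetical stand-in DataFrame with made-up values. It checks which type each bracket style returns.

```python
import pandas as pd

# Hypothetical stand-in for baseball.csv: the real file's contents are not shown
data = pd.DataFrame(
    {"Name": ["Player A", "Player B", "Player C"],
     "Height": [74, 72, 75],
     "Weight": [180, 215, 210]},
    index=[1, 2, 3],
)

height_series = data["Height"]        # single brackets -> Series
height_frame = data[["Height"]]       # double brackets -> one-column DataFrame
two_cols = data[["Height", "Name"]]   # columns come back in the order listed

print(type(height_series).__name__)
print(type(height_frame).__name__)
print(two_cols)
```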

# Weight is numeric, so compare it to get a boolean Series
lbs = dataset["Weight"] > 50
sel = dataset[lbs]
print(sel)

Start with: import pandas as pd and dataset = pd.read_csv('baseball.csv', index_col = 0). The Weight column holds numbers, not True/False values, so build a boolean Series from it with a comparison (for example dataset["Weight"] > 50) and store it as lbs. Use lbs to subset the dataset DataFrame and store the resulting selection in sel. Print sel. NEXT: Remove lbs and do it in one line with sel = dataset[dataset["Weight"] > 50]. Print sel.
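A sketch of boolean subsetting on a hypothetical DataFrame, since the real baseball.csv isn't shown. The threshold of 50 mirrors the next card. The point is that a comparison produces the boolean Series, and indexing with it keeps only the True rows.

```python
import pandas as pd

# Hypothetical data; the real baseball.csv is not shown
dataset = pd.DataFrame(
    {"Name": ["Player A", "Player B", "Player C", "Player D"],
     "Weight": [180, 215, 45, 210]},
)

# A comparison on a numeric column produces a boolean Series (True/False per row)
lbs = dataset["Weight"] > 50

# Indexing a DataFrame with a boolean Series keeps only the rows marked True
sel = dataset[lbs]

# Same result in one line, without the intermediate variable
sel_inline = dataset[dataset["Weight"] > 50]

print(lbs.dtype)
print(sel)
```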

w = dataset["Weight"]
normal = w > 50
fatso = w[normal]
print(fatso)

Start with: import pandas as pd and dataset = pd.read_csv('baseball.csv', index_col = 0). Select the Weight column from dataset and store it as w. Store the boolean Series w > 50 in normal. Use normal to subset w and store the result in fatso. Print fatso.
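Extending the card to two conditions at once, here is a sketch on a hypothetical weight Series. The plain and/or/not keywords from the Boolean operators card don't work element-wise on a Series; pandas raises a ValueError about an ambiguous truth value. Use &, | and ~ (with parentheses), or numpy's logical_and, instead.

```python
import numpy as np
import pandas as pd

# Hypothetical weights; the real baseball.csv column is not shown
w = pd.Series([180, 215, 45, 210, 250], name="Weight")

normal = w > 50
fatso = w[normal]          # Series subset: keeps only values where normal is True

# Combine conditions with & (and), | (or), ~ (not); parentheses are required
# because & binds more tightly than the comparison operators
mid_range = w[(w > 50) & (w < 212)]

# numpy offers the same element-wise logic as a function
mid_range_np = w[np.logical_and(w > 50, w < 212)]

print(fatso.tolist())
print(mid_range.tolist())
```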

print(cars["country"].iloc[3])
print(cars.iloc[[2, 3], [1, 2]])

loc and iloc also let you select both rows and columns from a DataFrame. Print out the country value at position 3 of cars. Print out a sub-DataFrame (using cars.iloc) with the rows at positions 2 and 3 and the columns at positions 1 and 2. Example of the list syntax: [[1,5],[4,6]]
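A sketch contrasting positional iloc with label-based loc on a small illustrative cars table. The row labels, columns and values are placeholders, not the course dataset.

```python
import pandas as pd

# Illustrative cars table; values are placeholders, not the course dataset
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 18],
     "country": ["United States", "Australia", "Japan", "India"],
     "drives_right": [True, False, False, False]},
    index=["US", "AUS", "JPN", "IN"],
)

# iloc is positional: row 3, column 1 (both counted from 0)
by_position = cars.iloc[3, 1]
# loc is label-based: row label "IN", column label "country"
by_label = cars.loc["IN", "country"]

# Lists of positions select a sub-DataFrame: rows 2-3, columns 1-2
sub = cars.iloc[[2, 3], [1, 2]]
# The same selection expressed with labels
sub_by_label = cars.loc[["JPN", "IN"], ["country", "drives_right"]]

print(by_position, by_label)
print(sub)
```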

offset = -6
while offset != 0:
    print("correcting")
    offset = offset + 1
    print(offset)

Start with offset = -6. Write a while loop that runs while offset != 0. Inside the loop, increase offset by 1 on each pass. If the code is correct, the loop stops once offset reaches 0.
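The loop in this card only ever adds 1, so it would run forever if offset started positive. Increasing a positive number never reaches 0. This sketch uses a hypothetical helper, correct(), that steps toward zero from either side and records each value.

```python
def correct(offset):
    """Step offset toward 0 one unit at a time; return the values seen."""
    history = []
    while offset != 0:
        print("correcting")
        # Move toward zero from either side, so the loop always terminates
        if offset > 0:
            offset = offset - 1
        else:
            offset = offset + 1
        history.append(offset)
    return history

print(correct(-6))
```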

import matplotlib.pyplot as plt
plt.scatter(gdp_cap, life_exp)
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
tick_val = [1000, 10000, 100000]
tick_lab = ['1k', '10k', '100k']
plt.xticks(tick_val, tick_lab)
plt.show()

plt.yticks([0,1,2], ["one","two","three"]) In this example, the ticks at 0, 1 and 2 are replaced by the labels one, two and three, respectively. Do the same for the x-axis of your world development chart with the xticks() function: the tick values 1000, 10000 and 100000 should be replaced by 1k, 10k and 100k. Two lists have already been created for this: tick_val and tick_lab. Pass tick_val and tick_lab to xticks() to make the plot more readable, and, as usual, display the plot with plt.show() after adding the customizations. Steps: create a scatter plot of gdp_cap and life_exp; apply these settings: plt.xscale('log'), plt.xlabel('GDP per Capita [in USD]'), plt.ylabel('Life Expectancy [in years]'), plt.title('World Development in 2007'); then pass these lists to plt.xticks(): tick_val = [1000, 10000, 100000] and tick_lab = ['1k', '10k', '100k'].
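xticks() pairs each value with the label at the same position, so the two lists must line up. This is a sketch of generating tick_lab from tick_val instead of typing it by hand. k_label is a hypothetical helper, and it assumes every tick value is a multiple of 1000.

```python
def k_label(value):
    """Hypothetical helper: format a multiple of 1000 as e.g. '10k'."""
    # Integer division by 1000 drops the three trailing zeros: 10000 -> 10
    return str(value // 1000) + "k"

tick_val = [1000, 10000, 100000]
tick_lab = [k_label(v) for v in tick_val]

# Each label pairs with the value at the same position in tick_val,
# which is the pairing plt.xticks(tick_val, tick_lab) relies on
pairs = list(zip(tick_val, tick_lab))
print(pairs)
```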

plt.hist(pop, 5)
plt.show()
plt.clf()

Start with pop = [100,200,300]. If you call plt.hist() without a bins argument, matplotlib uses 10 bins by default. The number of bins matters: too few bins oversimplify the data and hide the details, while too many bins overcomplicate it and hide the bigger picture. Build a histogram of pop with 5 bins. Can you tell which bin contains the most observations? Show the plot, then clean it up with plt.clf().
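To answer "which bin has the most observations?" without reading it off the chart, np.histogram computes the counts and bin edges using equal-width binning. plt.hist also returns the counts as the first item of its return value. This is a sketch using numpy only.

```python
import numpy as np

pop = [100, 200, 300]

# np.histogram returns the count per bin and the bin edges (one more edge than bins)
counts, edges = np.histogram(pop, bins=5)

# Equal-width bins from min(pop) to max(pop); the last bin includes its right edge
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:.0f} to {right:.0f}: {count}")

# argmax returns the first bin with the highest count
busiest = int(np.argmax(counts))
print("first bin with the most observations:", busiest)
```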

elif area > 10:
    print("medium size, nice!")

room = "bed"
area = 14.0
if room == "kit":
    print("looking around in the kitchen.")
elif room == "bed":
    print("looking around in the bedroom.")
else:
    print("looking around elsewhere.")
if area > 15:
    print("big place!")
else:
    print("pretty small.")
Add an elif to the second control structure so that "medium size, nice!" is printed out if area is greater than 10.
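A sketch of the finished second control structure as a function (the name describe_area is hypothetical), so several areas can be checked. Conditions are tested top to bottom and the first True branch wins. If elif area > 10 were placed above area > 15, an area of 20.0 would print "medium size, nice!" instead of "big place!".

```python
def describe_area(area):
    """Return the message from the second control structure for a given area."""
    # Conditions are checked top to bottom; the first True branch wins
    if area > 15:
        return "big place!"
    elif area > 10:
        return "medium size, nice!"
    else:
        return "pretty small."

for area in [14.0, 20.0, 9.5]:
    print(area, describe_area(area))
```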

else:
    print("pretty small.")

room = "kit"
area = 14.0
if room == "kit":
    print("looking around in the kitchen.")
else:
    print("looking around elsewhere.")
if area > 15:
    print("big place!")
# ADD ELSE HERE
Add an else statement to the second control structure so that "pretty small." is printed out if area > 15 evaluates to False.

import matplotlib.pyplot as plt
plt.plot(year, pop)
plt.show()

Start with year = [2000,2001,2002] and pop = [100,200,300]. Use plt.plot() to build a line plot of pop against year.

import matplotlib.pyplot as plt
plt.scatter(year, pop)
plt.xscale('log')
plt.show()
plt.clf()
plt.hist(pop)
plt.show()

Start with year = [2000,2001,2002] and pop = [100,200,300]. Use these variables to create a scatter plot of pop against year, then create a histogram of the population (pop).
