DataCamp Intermediate
When you make a list into a dictionary, what do you use for the strings?
Either ' ' or " " works; Python treats single and double quotes the same, as long as each string opens and closes with the same kind. I used ' ' because the two lists I was combining into a dictionary already used them.
Numpy Arrays handle how many data types?
1
.loc
No i in it so it is not techy. .loc selects by label: you write out the names of the rows or columns, not the index numbers. Double brackets, as in DataFrame_name.loc[["UN"]], return a DataFrame. Single brackets, as in DataFrame_name.loc["UN"], also work but return that one row as a pandas Series.
What are the 3 most common boolean operators?
and or not
How do you get the value/definition out of a dictionary? For example, if europe is your dictionary and you want to know the capital of France, what do you do?
europe['france']
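A minimal sketch of the lookup above. The dictionary contents are illustrative and not taken from the lesson data.

```python
# Illustrative dictionary: keys are countries, values are capitals.
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin'}

# Square brackets with the key return the matching value.
capital = europe['france']
print(capital)
```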
What is the difference between a line chart and a scatter plot?
A line chart connects the data points with a line; a scatter plot just shows the individual data points.
How do you filter a pandas DataFrame?
Step 1, access the column: brics["areas"] (in general, DataFrame_name["column_name"]). Step 2, compare and store the result under a name: is_huge = brics["areas"] > 8. Step 3, subset the DataFrame: brics[is_huge]
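The three filtering steps can be sketched as follows. The table is an illustrative stand-in for the lesson's brics data, with rounded areas in millions of square kilometres; the row labels are assumptions too.

```python
import pandas as pd

# Illustrative stand-in for the brics table (rounded values).
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
     "areas": [8.5, 17.1, 3.3, 9.6, 1.2]},
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Step 1: select the column (a pandas Series).
areas = brics["areas"]
# Step 2: compare it to get a Series of True/False values.
is_huge = areas > 8
# Step 3: use that boolean Series to keep only the matching rows.
huge = brics[is_huge]
print(huge)
```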
How can you remember what a key is?
By remembering that a key is the label half of a key: value pair, and you look a value up by its key. Keys belong to dictionaries; lists use index numbers instead.
Mutable
Can change the contents after they are created.
Immutable
Cannot be changed
CSV stands for
Comma-separated values.
How do you label rows while also removing the automated row labels 0,1,2,3? Hint: pd.read_csv("path/to/brics.csv")
DataFrame_name = pd.read_csv('DataFrame_name.csv', index_col=0) Adding index_col=0 after the path tells pandas to use the first column of the CSV as the row labels, in place of the automated number labels.
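A small sketch of the effect of index_col=0. To stay self-contained it reads from an in-memory string instead of a file on disk; the CSV text is made up for illustration.

```python
import io
import pandas as pd

# Stand-in for a CSV file; its first column holds the row labels.
csv_text = ",country,capital\nBR,Brazil,Brasilia\nRU,Russia,Moscow\n"

# Without index_col, pandas adds the automatic labels 0, 1, 2, ...
default_labels = pd.read_csv(io.StringIO(csv_text))
# With index_col=0, the first column becomes the row labels.
named_labels = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_labels.index))
print(list(named_labels.index))
```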
To import CSV data into python as a pandas DataFrame you can use
DataFrame_name = pd.read_csv('path/to/file.csv') The function is read_csv, with an underscore, and it is called on pd. Put the path of the file you want to convert inside the quotes.
How to label the side portion of a chart? (How do you create row labels)
DataFrame_name.index = ["A", "B"]
How can you access both a row and/or a column using the same method?
DataFrame_name.loc[["row1", "row2"], ["col1", "col2"]] Row labels go before the comma and column labels after it. You type in the names of things; you do not use position numbers.
If you want to select the country column but keep the data in a DataFrame, you will need to use:
Double square brackets: brics[["country"]]
Histogram
Helps you get an idea of the distribution of your variables. It splits the values into bins and draws one bar per bin; the height of each bar shows how many values fall in that bin.
What is the array version of the 3 most common booleans?
np.logical_and(), np.logical_or() and np.logical_not(). Put the comparisons you want to combine inside the parentheses, for example np.logical_and(bmi<3, bmi<4)
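A short sketch of the three array versions side by side. The bmi values and thresholds are made up for illustration.

```python
import numpy as np

# Illustrative BMI values.
bmi = np.array([21.9, 20.5, 24.7, 18.2, 22.1])

both = np.logical_and(bmi > 21, bmi < 23)    # True where both conditions hold
either = np.logical_or(bmi < 19, bmi > 24)   # True where at least one holds
negated = np.logical_not(bmi > 21)           # flips True and False

# A boolean array can then be used to pick out the matching values.
print(bmi[both])
```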
In order to create a histogram, you need to import
matplotlib.pyplot as plt
Typing out row labels can take forever, and they are usually already stored in a CSV file. What would that file typically be called?
Anything ending in .csv, for example brics.csv or DataFrame_name.csv. CSV just stands for comma-separated values.
How do you clear things up so you can start a fresh plot, for example before drawing another histogram?
plt.clf() It clears the current figure.
How to draw gridlines on a plot?
plt.grid(True)
How to create a basic plot?
plt.plot(x,y)
How to plot data as a line chart?
plt.plot(x,y)
How to display plt.plot(x,y)?
plt.show()
How do you display plt.scatter(x,y)?
plt.show() Do not put the data in the parentheses. For example, to display (x,y) just do plt.show() NOT plt.show(x,y)
How do you title a graph?
plt.title("World Population")
How do you label the axes?
plt.xlabel("year") and plt.ylabel("population")
How do you put the x-axis on a logarithmic scale?
plt.xscale("log")
How do you specify at what point your y axis starts? For example, how would you specify to start your y axis at 0?
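plt.yticks([0,2,4,6,8,10]) This puts a tick at each listed value, and the axis stretches to include them, so it starts at 0. To set only the starting point, plt.ylim(bottom=0) also works.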
How do you give each tick a display name?
Pass a second list with one label per tick: plt.yticks([0,2,4,6,8,10], ["0", "2b", "4b", etc.])
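Putting the plotting cards together, here is a minimal sketch of a titled, labeled line chart with custom tick names. The data points are rough illustrative values, the full list of tick labels is an assumption, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove this line to open a window
import matplotlib.pyplot as plt

# Illustrative data (population in billions, rounded).
year = [1950, 1970, 1990, 2010]
pop = [2.5, 3.7, 5.3, 7.0]

plt.plot(year, pop)
plt.title("World Population")
plt.xlabel("year")
plt.ylabel("population")

# First list: where the ticks go. Second list: what each tick displays.
tick_values = [0, 2, 4, 6, 8, 10]
tick_labels = ["0", "2b", "4b", "6b", "8b", "10b"]
plt.yticks(tick_values, tick_labels)

ax = plt.gca()  # keep a handle on the axes so they can be inspected
plt.show()      # note the empty parentheses
```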
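What is the difference between using loc and iloc?
With DataFrame_name.loc you use the names (labels) of the rows and columns. With DataFrame_name.iloc you use index numbers (integer positions). Both can select the same data; one difference is that a .loc slice includes its end label while an .iloc slice stops before its end position.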
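A small sketch comparing the two on the same table. The DataFrame is an illustrative stand-in for the brics data.

```python
import pandas as pd

# Illustrative table with named rows.
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "capital": ["Brasilia", "Moscow", "New Delhi"]},
    index=["BR", "RU", "IN"],
)

by_label = brics.loc["RU", "capital"]   # row label, column label
by_position = brics.iloc[1, 1]          # row position, column position

# Slices behave differently at the end point.
loc_slice = brics.loc["BR":"RU"]        # includes the end label RU
iloc_slice = brics.iloc[0:1]            # stops before position 1
```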
Dictionaries require
{}
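When you are making a dictionary with values, do you need a space after the {?
No. Python ignores spaces inside the braces, so {'a': 1} and { 'a': 1 } are the same. What you must include is a : between each key and its value, and a , between the pairs.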
A Series is
a one-dimensional array that can be labeled, just like a DataFrame. You can create a DataFrame by pasting together a bunch of Series.
What do you always need to remember at the end of an if statement with a boolean condition (when not using the np.logical_ functions)?
: The colon! For example: if area > 15: print("Duh bish")
If you don't specify the bins argument in a histogram, it will be __ by default
10
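A quick way to check the default, sketched with made-up values. plt.hist returns the bin counts and bin edges, so the number of bins can be read off directly; the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove this line to open a window
import matplotlib.pyplot as plt

# Illustrative values.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

# No bins argument: matplotlib falls back to its default.
counts, edges, _ = plt.hist(values)
print(len(counts))

plt.clf()  # clear the figure before drawing again

# Explicit bins argument.
counts5, edges5, _ = plt.hist(values, bins=5)
print(len(counts5))
```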
Pandas
High-level data manipulation tool built on the NumPy package. You can store tabular data, like the brics table, in a DataFrame.
What is a key?
The lookup label in a dictionary. Each entry is a key: value pair, and you use the key to get the value. In europe['france'], 'france' is the key.
How are the true/false results of the and operator similar to Spanish?
If there is at least one male, it is ellos. If there are only females, it is ellas. If it is male/female, it is ellos. With and: if there is at least one False, it is False. If there is only True, it is True. If it is True/False, it is False. It's like men are the false statements and women are true lol.
Keys in dictionaries should be
Immutable objects. The content of immutable objects cannot be changed.
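A minimal sketch of what this rule means in practice. The keys and values are made up for illustration.

```python
# Strings, numbers and tuples are immutable, so they work as keys.
valid = {"france": "paris", 2: "two", (48.85, 2.35): "paris coordinates"}

# A list is mutable, so Python refuses it as a key.
try:
    invalid = {["just", "to", "test"]: "value"}
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```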
How to create a DataFrame from a dictionary?
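First import pandas as pd, THEN create a DataFrame from the dictionary with pd.DataFrame: brics = pd.DataFrame(my_dict) The dictionary keys become the column labels. Avoid naming the dictionary dict, because that hides Python's built-in dict.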
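A short sketch of building a DataFrame from a dictionary and then naming the rows. The variable name data and the row labels are illustrative choices.

```python
import pandas as pd

# Each key becomes a column label; each list becomes that column's values.
data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
}
brics = pd.DataFrame(data)

# Replace the automatic 0, 1, 2, ... row labels with names, one per row.
brics.index = ["BR", "RU", "IN", "CH", "SA"]
print(brics)
```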
What is % in Python?
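It is the modulo operator: it gives the remainder of a division. It is often used to test whether one number is divisible by another.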
When can you use <= without np.logical_and? When do you have to use np.logical_and?
Comparison operators (<, <=, etc.) work with NumPy arrays out of the box. The boolean operators and, or and not do not; for arrays you must use np.logical_and(), np.logical_or() and np.logical_not().
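A sketch of why the plain boolean operator fails on an array while the comparison on its own is fine. The bmi values are made up for illustration.

```python
import numpy as np

bmi = np.array([21.9, 20.5, 24.7])

# A comparison works element by element on the whole array.
above_21 = bmi > 21

# Plain `and` needs a single True/False for the whole array, which is ambiguous.
try:
    result = bmi > 21 and bmi < 23
    failed = False
except ValueError:
    failed = True

# np.logical_and combines the two comparisons element by element instead.
combined = np.logical_and(bmi > 21, bmi < 23)
```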
Rows are __________ and columns are _________
Rows are observations and columns are variables
Why are datasets not best handled by NumPy alone?
They usually contain multiple data types (strings, floats and others), while a NumPy array holds a single type.
.iloc
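Think of iloc like iPhone. It is techy and refers to numbers (integer positions). A slice needs only single brackets: DataFrame_name.iloc[1:7] gives the rows at positions 1 through 6. A list of positions needs double brackets, as in DataFrame_name.iloc[[1, 3]].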
What is .index used for?
To create row labels. Use it if you have typed out the row labels yourself or already have them in a list. Otherwise, use pd.read_csv() and get the labels from a CSV. That is typically more efficient, as you would not normally write out all the row labels.
When do you use , in a dictionary as opposed to : ?
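Commas separate the items. If you write only commas, as in my_set = {'apples', 'green tea', 'burgers', 'bellini'}, you get a set, not a dictionary. A dictionary needs a : between each key and its value, with commas between the pairs. For example: my_dict = {'apples': 'fruit', 'green tea': 'beverage', 'burgers': 'meat', 'bellini': 'booze'} The values need ' ' too when they are strings; leave the quotes off only for numbers or variable names.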
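The difference is easy to check by asking Python for the type of each object. A minimal sketch using the same items as above:

```python
# Commas only: this is a set, not a dictionary.
my_set = {'apples', 'green tea', 'burgers', 'bellini'}

# key: value pairs separated by commas: this is a dictionary.
my_dict = {'apples': 'fruit', 'green tea': 'beverage',
           'burgers': 'meat', 'bellini': 'booze'}

print(type(my_set).__name__, type(my_dict).__name__)
```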
To access a column
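Use square brackets with the column name. Single brackets, DataFrame_name["column"], give a pandas Series; double brackets, DataFrame_name[["column"]], give a DataFrame.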
To access a row
You use slicing. DataFrame_name[1:4] returns the rows at positions 1, 2 and 3.
If I write z % 2 What does that mean?
The remainder when z is divided by 2. If z % 2 == 0, then z is divisible by 2 (even).
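A tiny worked example, with z chosen arbitrarily:

```python
z = 7

# 7 = 3 * 2 + 1, so the remainder is 1.
remainder = z % 2

# A remainder of 0 would mean z is divisible by 2.
is_even = (z % 2 == 0)

print(remainder, is_even)
```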
You can print out columns using
[ ]
What is the difference between printing something with [ ] and printing something with [[ ]]?
[ ] prints out a column as a pandas Series. [[ ]] prints out columns as a pandas DataFrame.
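A quick sketch that checks the type each form returns. The table is an illustrative stand-in for the brics data.

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "capital": ["Brasilia", "Moscow", "New Delhi"]},
    index=["BR", "RU", "IN"],
)

as_series = brics["country"]     # single brackets
as_frame = brics[["country"]]    # double brackets

print(type(as_series).__name__)
print(type(as_frame).__name__)
```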
How do you use the names stored in the first column of the CSV file as the row labels?
brics = pd.read_csv("path/to/brics.csv", index_col=0)
Let's say you only want to type out one column called country from brics. How do you do that?
brics["country"]
How do you delete an item from a list or a dictionary?
For a list, use the index number: del(listname[0]) For a dictionary, use the key: del(dictname['key'])
How to get help or clarification on how to create a histogram?
help(plt.hist)
How do you import matplotlib?
import matplotlib.pyplot as plt
How to create a scatter plot?
import matplotlib.pyplot as plt plt.scatter(x,y)
How do you turn a list into row labels?
DataFrame_name.index = row_labels The list needs one label per row, in order. The row labels do not come from the dictionary: if you build a DataFrame from my_dict = {'apple': ['fruit']}, the key 'apple' becomes a column label and 'fruit' is a value in that column.
How do you add a new item with a definition to a dictionary?
dictname['newitem'] = 'yo'
How do you print out the keys of a dictionary?
print(dictname.keys()) The () is meant to be blank. Do not forget to include it.
How to print the last item in a list?
print(listname[-1])
If you read the CSV using brics = pd.read_csv("path/to/brics.csv") without index_col, the automated row labels 0,1,2,3 will ________
still appear.