Matplotlib + Dictionaries & Pandas
plt.scatter()
- allows one to plot two lists as a scatter plot - first argument holds the x-axis values - second argument holds the y-axis values
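A minimal sketch of the call described above. The list names and values are made up for illustration and do not come from any real dataset.

```python
import matplotlib.pyplot as plt

# Illustrative, made-up values.
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 60, 71, 75, 88]

# First argument goes on the x-axis, second on the y-axis.
points = plt.scatter(hours_studied, test_score)
plt.show()
```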
Pandas
- an open-source Python library for working with tabular datasets, built on top of NumPy and higher-level than a NumPy array - a Pandas DataFrame can hold a different data type in each column, unlike a NumPy array, which coerces all of its elements to a single type
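The type-coercion difference can be seen in a small sketch like the one below. The variable and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# A NumPy array holds one type, so mixed values are coerced (here, to strings).
mixed = np.array([1.5, "two", True])
print(mixed.dtype)

# A DataFrame keeps a separate type for each column.
df = pd.DataFrame({
    "number": [1.5, 2.5],
    "text": ["one", "two"],
    "flag": [True, False],
})
print(df.dtypes)
```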
CSV meaning
- comma-separated values - a plain-text file format for storing tabular information, with the values in each row separated by commas
How to access a column in a dataset using Pandas
- single brackets: dataset_name["column_name"] gives a Pandas Series - double brackets: dataset_name[["column_name"]] gives a Pandas DataFrame - multiple columns: dataset_name[["column_name", "column_name"]]
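A short sketch of the single-bracket versus double-bracket difference, using a small made-up DataFrame (the name cars and its columns are assumptions for illustration).

```python
import pandas as pd

# Made-up example data.
cars = pd.DataFrame({
    "country": ["United States", "Japan", "India"],
    "cars_per_cap": [800, 600, 20],
    "drives_right": [True, False, False],
})

as_series = cars["country"]                   # single brackets: Series
as_frame = cars[["country"]]                  # double brackets: DataFrame
two_cols = cars[["country", "cars_per_cap"]]  # several columns: DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
```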
How to delete a key-value from a dictionary
- del(my_dic[key]) - removes that key and its value from the dictionary
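For example, with a hypothetical dictionary of capitals:

```python
capitals = {"spain": "madrid", "france": "paris", "germany": "berlin"}

# Remove the "france" key together with its value.
del capitals["france"]

print(capitals)  # {'spain': 'madrid', 'germany': 'berlin'}
```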
Python dictionary
- a dictionary maps each key to a corresponding value, like pairing a value in one list with a value in another list - dictionaries are created using curly braces - my_dic = {key : value} - use my_dic[key] to find the corresponding value
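A small sketch of creating a dictionary and looking up a value. The names are hypothetical, and building the dictionary from two lists with zip() is shown only as one possible approach.

```python
countries = ["spain", "france"]
caps = ["madrid", "paris"]

# Written out directly with curly braces.
capitals = {"spain": "madrid", "france": "paris"}
print(capitals["france"])  # paris

# The same mapping built by pairing up the two lists.
paired = dict(zip(countries, caps))
print(paired == capitals)  # True
```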
plt.plot()
- function that plots two lists as a line on x-y axes - first argument holds the x-axis values - second argument holds the y-axis values
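A minimal line-plot sketch. The lists are made-up values chosen only to show the argument order.

```python
import matplotlib.pyplot as plt

# Illustrative, made-up values.
year = [2000, 2005, 2010, 2015]
value = [3, 4, 6, 9]

# x-axis values first, y-axis values second.
lines = plt.plot(year, value)
plt.show()
```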
plt.show()
- function that displays the graph built by earlier plotting calls such as plt.plot()
purposes for data visualization
- helps one explore data - helps one report insights on the data
pandas iloc
- integer position-based row/column access - dataset_name.iloc[[index_value]] - *similar rules of loc apply to iloc*
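A sketch of position-based access with iloc. The DataFrame cars, its row labels, and its values are made up for illustration.

```python
import pandas as pd

# Made-up example data with row labels.
cars = pd.DataFrame(
    {"country": ["United States", "Japan", "India"],
     "cars_per_cap": [800, 600, 20]},
    index=["US", "JP", "IN"],
)

first_row = cars.iloc[[0]]       # row at position 0, as a DataFrame
two_rows = cars.iloc[[0, 2]]     # rows at positions 0 and 2
block = cars.iloc[[0, 2], [1]]   # those rows, column at position 1

print(block)
```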
Condition for keys in a dictionary
- keys have to be immutable objects - a list cannot occur as a key
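The rule can be checked with a quick sketch like this one. Strictly speaking, Python requires keys to be hashable, which the built-in immutable types are. All names here are hypothetical.

```python
# Immutable objects such as strings, floats, and tuples work as keys.
valid = {
    "name": "string key",
    3.5: "float key",
    ("lat", "lon"): "tuple key",
}

# A list is mutable, so using one as a key raises a TypeError.
try:
    invalid = {["a", "b"]: "list key"}
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```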
pandas loc
- label-based row/column access - dataset_name.loc[["row_name"]] - multiple rows: dataset_name.loc[["row_name", "row_name"]] - column labels go after a comma, following the row selection
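A sketch of label-based row access with loc, using a made-up DataFrame whose row labels are short country codes.

```python
import pandas as pd

# Made-up example data with row labels.
cars = pd.DataFrame(
    {"country": ["United States", "Japan", "India"],
     "cars_per_cap": [800, 600, 20]},
    index=["US", "JP", "IN"],
)

one_row = cars.loc[["JP"]]          # one row, as a DataFrame
two_rows = cars.loc[["US", "IN"]]   # several rows by label

print(two_rows)
```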
How to add more data onto lists used for plots
- list_name = [new values] + list_name
How to add or change a key-value to a dictionary
- my_dic[key] = value - if the key is new, the key-value pair is added - if the key already exists, its value is replaced
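The same assignment syntax covers both cases, as in this sketch with a hypothetical stock dictionary:

```python
stock = {"apples": 3}

stock["pears"] = 5     # new key: the pair is added
stock["apples"] = 10   # existing key: the value is replaced

print(stock)  # {'apples': 10, 'pears': 5}
```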
key-value pairs
- name for each key and its corresponding value stored in a dictionary
immutable objects
- objects that cannot be changed/manipulated - ex: floats, booleans, strings - lists are not immutable objects b/c they can be manipulated
How to add x-label, y-label, and title to a plot
- plt.xlabel('label_name') - plt.ylabel('label_name') - plt.title('title_name') - make sure to call these functions before calling the plt.show() function
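A sketch that puts the three calls together before plt.show(). The data and the label text are made up.

```python
import matplotlib.pyplot as plt

# Illustrative, made-up values.
year = [2000, 2005, 2010, 2015]
value = [3, 4, 6, 9]

plt.plot(year, value)

# Customizations come before plt.show().
plt.xlabel("Year")
plt.ylabel("Value")
plt.title("Value over time")

plt.show()
```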
How to change the numbers used for x-axis or y-axis
- plt.yticks([x, x, x,]) - plt.xticks([x, x, x,])
How to add label names for numbers used for x-axis or y-axis
- plt.yticks([x, x, x,], [x-label_name, x-label_name, x-label_name]) - plt.xticks([x, x, x,], [x-label_name, x-label_name, x-label_name])
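A sketch of setting tick positions together with tick labels on the y-axis. The positions and label names are made up for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative, made-up values.
year = [2000, 2005, 2010, 2015]
value = [3, 4, 6, 9]

plt.plot(year, value)

# First list: where the ticks go. Second list: the text shown at each tick.
plt.yticks([0, 5, 10], ["none", "some", "many"])

plt.show()
```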
How to access a row in a dataset using Pandas
- required to use a slice - dataset_name[row_index (inclusive) : row_index (exclusive)]
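A sketch of row selection with a slice in square brackets. The DataFrame is made up; the slice includes the start position and excludes the end position.

```python
import pandas as pd

# Made-up example data with row labels.
cars = pd.DataFrame(
    {"country": ["United States", "Japan", "India", "Egypt"],
     "cars_per_cap": [800, 600, 20, 45]},
    index=["US", "JP", "IN", "EG"],
)

# Rows at positions 1 and 2 (position 3 is excluded).
middle = cars[1:3]
print(middle)
```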
Why use a scatter plot?
- a scatter plot is a good choice when you are looking for a correlation between two variables (a line plot is a better fit when one variable, such as time, runs in order along the x-axis)
differences and similarities between lists and dictionaries
- sim: both support selecting, updating, and deleting elements - lists: indexed by a range of numbers, are an ordered collection of values - dictionaries: indexed by unique keys, act as a lookup table
Putting a dictionary as a value within a dictionary
- this is possible! - to get a value from the inner dictionary, chain the keys: my_dic[outer_key][inner_key]
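A sketch of a dictionary nested inside another dictionary, with hypothetical keys and values:

```python
inventory = {
    "apples": {"count": 3, "color": "red"},
    "pears": {"count": 5, "color": "green"},
}

# Outer key first, then the key inside the inner dictionary.
print(inventory["apples"]["color"])  # red

# An inner dictionary can also be added as a new value.
inventory["plums"] = {"count": 2, "color": "purple"}
```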
Example of DataFrame from a Dictionary using Pandas
- use dataset_name.index = [values] to label the rows of the dataset
How to import CSV data into Python
- use read_csv() function - ex: dataset_name = pd.read_csv('csv_file_name', index_col = x) - use index_col to specify which column in the CSV file should be used as a row label
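A self-contained sketch of read_csv() with index_col. To avoid depending on a file on disk, the CSV text below is made up and passed in through io.StringIO; with a real file, its path would go in the same position.

```python
from io import StringIO

import pandas as pd

# Made-up CSV content standing in for a file.
csv_text = (
    "code,country,cars_per_cap\n"
    "US,United States,800\n"
    "JP,Japan,600\n"
)

# index_col=0 uses the first column as the row labels.
cars = pd.read_csv(StringIO(csv_text), index_col=0)
print(cars)
```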
histogram
- useful for exploring dataset - helps get idea about distribution
How to build a histogram using Matplotlib
1. assign name to value list 2. use plt.hist(list_name, bins=x) 3. use plt.show() - when the number of bins isn't specified, it defaults to 10
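The steps above can be sketched as follows. The values and the choice of 4 bins are made up for illustration.

```python
import matplotlib.pyplot as plt

# 1. Assign a name to the list of values (made-up data).
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

# 2. Build the histogram; plt.hist returns the counts per bin and the bin edges.
counts, bin_edges, patches = plt.hist(values, bins=4)

# 3. Display it.
plt.show()
```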
How to use Pandas to create a Dataframe from a Dictionary
1. import pandas as pd 2. use dataset_name = pd.DataFrame(my_dic) - keys will become column labels - values will become data, column by column
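A sketch of building a DataFrame from a dictionary and then labeling its rows through the index attribute. The dictionary contents and the row labels are made up.

```python
import pandas as pd

# Keys become column labels; each list becomes that column's data.
my_dic = {
    "country": ["United States", "Japan", "India"],
    "cars_per_cap": [800, 600, 20],
}

cars = pd.DataFrame(my_dic)

# Replace the default 0, 1, 2 row labels with custom ones.
cars.index = ["US", "JP", "IN"]
print(cars)
```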
pandas row & column loc slicing
- dataset_name.loc[:, ["column_name"]] - the ":" in the row position selects all the rows in the dataset for the given column(s) - ":" can be used in the column position as well, to select all columns
pandas row & column loc
- dataset_name.loc[["row_name"], ["column_name"]] - selects the given rows and columns together
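A sketch combining row and column selection with loc, including the ":" form for taking everything along one axis. The DataFrame and its labels are made up.

```python
import pandas as pd

# Made-up example data with row labels.
cars = pd.DataFrame(
    {"country": ["United States", "Japan", "India"],
     "cars_per_cap": [800, 600, 20]},
    index=["US", "JP", "IN"],
)

cell = cars.loc[["JP"], ["country"]]   # one row, one column
all_rows = cars.loc[:, ["country"]]    # every row, one column
all_cols = cars.loc[["US", "IN"], :]   # two rows, every column

print(all_cols)
```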
conventional syntax for importing matplotlib
import matplotlib.pyplot as plt