Data Analysis with Python, Week 3: Exploratory Data Analysis
Use describe() to summarize the columns of type object (e.g. strings)
df.describe(include=['object'])
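A minimal sketch of what that call returns, using a made-up dataframe (the column names and values are hypothetical, not from the course data). The text columns are cast to object explicitly, on the assumption that newer pandas versions may otherwise infer a dedicated string type.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "position": ["C", "D", "C", "G", "C"],
    "team": ["TOR", "MTL", "TOR", "BOS", "MTL"],
    "goals": [30, 5, 22, 0, 18],
})

# Cast the text columns to object so include=['object'] selects them
# regardless of how this pandas version infers string columns.
df = df.astype({"position": "object", "team": "object"})

# Only the object columns are summarized; the numeric 'goals' column is skipped.
summary = df.describe(include=["object"])
print(summary)
```

The summary rows for object columns are count, unique, top (the most frequent value) and freq (how often it occurs), rather than mean, std and quartiles.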
Find the correlation between several columns
df[['col_A', 'col_B', 'col_C', 'col_D']].corr()
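A small sketch of the result on made-up numbers (the values below are hypothetical). corr() uses the Pearson method by default and returns a square dataframe with one row and one column per selected column.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": [2, 4, 6, 8, 10],  # rises exactly in step with col_A
    "col_C": [5, 4, 3, 2, 1],   # falls exactly in step with col_A
    "col_D": [1, 3, 2, 5, 4],   # loosely follows col_A
})

# Pairwise Pearson correlation between the selected columns.
corr_matrix = df[["col_A", "col_B", "col_C", "col_D"]].corr()
print(corr_matrix)

# Look up a single pair by row and column label.
r_A_D = corr_matrix.loc["col_A", "col_D"]
```

Values near 1 or -1 indicate a strong linear relationship, and the diagonal is always 1 because each column is perfectly correlated with itself.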
A dataframe, df_NHL_players, has a categorical column named 'position'. Get the number of players in each position.
position_cnts = df_NHL_players['position'].value_counts()
Make boxplots of a quantitative column, one box for each value of a categorical column
sns.boxplot(x="col_category", y="col_quantity", data=df)
Install seaborn from a notebook cell and import it
%%capture
! pip install seaborn
import seaborn as sns
Plot values from one column on the vertical axis and another column on the horizontal axis. The lower limit of the vertical axis should be zero.
import matplotlib.pyplot as plt
sns.regplot(x="engine-size", y="price", data=df)
plt.ylim(0,)  # lower limit 0; upper limit left unchanged
A dataframe, df_NHL_players, has a categorical column named 'position'. Get the number of players in each position and then cast those numbers into a dataframe.
position_cnts = df_NHL_players['position'].value_counts()
df_position_cnts = position_cnts.to_frame()
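A short sketch of the two steps on a made-up roster (the six rows below are hypothetical). value_counts() returns a Series sorted from most to least frequent, and to_frame() turns it into a one-column dataframe. The name of that column is an assumption that depends on the pandas version: recent versions appear to call it 'count', older ones reuse 'position'.

```python
import pandas as pd

# Hypothetical roster, for illustration only.
df_NHL_players = pd.DataFrame({"position": ["C", "D", "C", "G", "D", "C"]})

# Series indexed by position, sorted by descending count.
position_cnts = df_NHL_players["position"].value_counts()

# One-column dataframe with the positions as the index.
df_position_cnts = position_cnts.to_frame()
print(df_position_cnts)

# Select by location rather than by name, since the column name varies by version.
most_common_position = df_position_cnts.index[0]
most_common_count = df_position_cnts.iloc[0, 0]
```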
Plot values from one column on the vertical axis and another column on the horizontal axis
sns.regplot(x="col_A", y="col_B", data=df)