Data Visualization
sns.set_style("dark") plt.figure(figsize=(12,6)) sns.lineplot(data=spotify_data)
Change the style of the figure to dark, use a figure size of (12, 6), and draw a line plot of the Spotify data. HINT: call sns.set_style and choose "dark"
plt.figure(figsize=(8,6)) sns.barplot(x=ign_data["Puzzle"], y=ign_data.index) plt.title("Average Score for Puzzle Games, by Platform")
Create a bar chart showing average score for puzzle games by platform
1) Use plt.figure with figsize=(8,6) 2) Call sns.barplot with x=ign_data['Racing'] and y=ign_data.index 3) Use plt.xlabel("") 4) Use plt.title with "Average Score for Racing Games, by Platform"
Create a bar chart showing average score for racing games by platform
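The steps on this card can be sketched end to end as follows. The small ign_data table here is invented illustrative data, not the real IGN scores; it only stands in for the CSV so the snippet runs on its own.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for the IGN scores table (values are invented)
ign_data = pd.DataFrame(
    {"Racing": [7.0, 6.5, 8.0], "Puzzle": [7.5, 6.0, 7.2]},
    index=pd.Index(["PC", "PlayStation 4", "Xbox One"], name="Platform"),
)

plt.figure(figsize=(8, 6))
# Numeric scores on x and platform names on y give horizontal bars
ax = sns.barplot(x=ign_data["Racing"], y=ign_data.index)
plt.xlabel("")  # blank out the x-axis label
plt.title("Average Score for Racing Games, by Platform")
```

Note that plt.title and plt.xlabel are called as functions; assigning a string to plt.title would overwrite the function instead of setting a title.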
Use seaborn swarmplot with (x=candy_data['chocolate'], y=candy_data['winpercent'])
Create a categorical swarm plot to highlight the relationship between 'chocolate' and 'winpercent'.
1) Use plt.figure with figsize=(10,10) 2) Call sns.heatmap with data=ign_data and annot=True 3) Use plt.xlabel("Genre") 4) Use plt.title with "Average Game Score, by Platform and Genre"
Create a heat map showing average score by genre and platform
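A minimal sketch of this heat map, assuming a tiny invented ign_data table in place of the real file (platforms as rows, genres as columns):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for the IGN scores table (values are invented)
ign_data = pd.DataFrame(
    {"Puzzle": [7.5, 6.0], "Racing": [7.0, 6.5]},
    index=pd.Index(["PC", "PlayStation 4"], name="Platform"),
)

plt.figure(figsize=(10, 10))
# annot=True writes each cell's value inside the cell
ax = sns.heatmap(data=ign_data, annot=True)
plt.xlabel("Genre")
plt.title("Average Game Score, by Platform and Genre")
```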
1. Use plt.figure with figsize=(12,6) 2. Call sns.lineplot and pass data=museum_data inside the parentheses 3. Use plt.title to title your graph "Monthly Visitors to Los Angeles City Museums"
Create a line chart showing the number of visitors to each museum over time
1. Use plt.figure with figsize=(12,6) 2. Call sns.lineplot with data=museum_data['Avila Adobe'] 3. Use plt.title to title your graph "Monthly Visitors to Avila Adobe" 4. Use plt.xlabel("Date") to label the x-axis
Create a line chart that shows how the number of visitors to Avila Adobe has evolved over time.
sns.lineplot(data=museum_data['Firehouse Museum'])
Create a lineplot that shows the data on Firehouse Museum
Use seaborn sns with method scatterplot with (x = candy_data['sugarpercent'], y=candy_data['winpercent'])
Create a scatter plot that shows the relationship between 'sugarpercent' and 'winpercent'
Use seaborn sns with method regplot with (x = candy_data['sugarpercent'], y=candy_data['winpercent'])
Create a scatter plot that shows the relationship between 'sugarpercent' and 'winpercent' with a regression line
Use seaborn sns with method scatterplot with (x = candy_data['pricepercent'], y=candy_data['winpercent'], hue=candy_data['chocolate'])
Create a scatter plot to show the relationship between 'pricepercent' and 'winpercent' with the 'chocolate' column to color-code the points
Use seaborn sns with method lmplot with (x="pricepercent", y="winpercent", hue="chocolate", data=candy_data). Pass the column names as strings instead of indexing each column, and set data=candy_data.
Create a scatter plot to show the relationship between 'pricepercent' and 'winpercent' with the 'chocolate' column to color-code the points with TWO regression lines.
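A short sketch of the lmplot call, assuming an invented ten-row candy_data table so it runs without the CSV. Unlike scatterplot and regplot, lmplot takes column names as strings plus a data argument, and it fits one regression line per hue level.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the candy table (values are invented)
candy_data = pd.DataFrame({
    "pricepercent": [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.4, 0.6, 0.8, 0.95],
    "winpercent": [40.0, 45.0, 52.0, 60.0, 66.0, 50.0, 47.0, 44.0, 41.0, 38.0],
    "chocolate": ["Yes"] * 5 + ["No"] * 5,
})

# One scatter group and one fitted line for each value of 'chocolate'
grid = sns.lmplot(x="pricepercent", y="winpercent", hue="chocolate", data=candy_data)
```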
IGN SCORES import pandas as pd pd.plotting.register_matplotlib_converters() import matplotlib.pyplot as plt %matplotlib inline import seaborn as sns print("Setup Complete")
Create a variable called ign_data that reads the ign_scores CSV file from its file path, using index_col = "Platform"
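The loading step itself might look like the sketch below. The in-memory CSV text is invented and only stands in for the ign_scores file, whose real path is not shown on this card.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the ign_scores file (values are invented)
ign_csv = io.StringIO("Platform,Puzzle,Racing\nPC,7.5,7.0\nPlayStation 4,6.0,6.5\n")

# index_col="Platform" turns that column into the row labels
ign_data = pd.read_csv(ign_csv, index_col="Platform")
print(ign_data)
```

With a real file, the first argument to pd.read_csv would be the file path string instead of the StringIO object.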
import pandas as pd pd.plotting.register_matplotlib_converters() import matplotlib.pyplot as plt %matplotlib inline import seaborn as sns print("Setup Complete") spotify_data = pd.read_csv('C:\\Users\\PC\\Kaggle_CSV_Files\\spotify.csv',index_col="Date", parse_dates=True)
Create a variable for the Spotify data and read it from the file path. Make sure the index column is "Date" and parse_dates is True
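To see what index_col="Date" and parse_dates=True do together, here is a small sketch that reads an invented in-memory CSV (the column names "Song A" and "Song B" are placeholders, not the real Spotify columns):

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for spotify.csv (values are invented)
spotify_csv = io.StringIO(
    "Date,Song A,Song B\n2017-01-06,100,200\n2017-01-07,110,190\n"
)

# index_col="Date" labels the rows by date; parse_dates=True converts
# those labels from plain strings into datetime values
spotify_data = pd.read_csv(spotify_csv, index_col="Date", parse_dates=True)
print(type(spotify_data.index))
```

A datetime index is what lets sns.lineplot place the points on a proper time axis.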
sns.kdeplot(data=cancer_b_data['Radius (worst)'], shade=True, label="Benign") sns.kdeplot(data=cancer_m_data['Radius (worst)'], shade=True, label="Malignant")
Create two KDE plots that show the distribution in values for 'Radius (worst)' for both benign and malignant tumors. (To permit easy comparison, create a single figure containing both KDE plots in the code cell below.) HINT: pass data, shade, and label; there is no kde argument
sns.distplot(a=cancer_b_data['Area (mean)'], label="Benign", kde=False) sns.distplot(a=cancer_m_data['Area (mean)'], label="Malignant", kde=False) plt.legend()
Create two histograms that show the distribution in values for 'Area (mean)' for both benign and malignant tumors. (To permit easy comparison, create a single figure containing both histograms in the code cell below.) HINT: set a to the DataFrame indexed with ['Area (mean)'], add a label, and set kde to False
import pandas as pd pd.plotting.register_matplotlib_converters() import matplotlib.pyplot as plt %matplotlib inline import seaborn as sns print("Setup Complete")
Set up your notebook using six lines of code:
import pandas as pd pd.plotting.register_matplotlib_converters() import matplotlib.pyplot as plt %matplotlib inline import seaborn as sns print("Setup Complete") cancer_b_data = pd.read_csv('C:\\Users\\PC\\Kaggle_CSV_Files\\cancer_b.csv', index_col="Id") cancer_m_data = pd.read_csv('C:\\Users\\PC\\Kaggle_CSV_Files\\cancer_m.csv', index_col="Id")
Load the data file corresponding to benign tumors into a DataFrame called cancer_b_data. The corresponding filepath is cancer_b_filepath. Use the "Id" column to label the rows. Load the data file corresponding to malignant tumors into a DataFrame called cancer_m_data. The corresponding filepath is cancer_m_filepath. Use the "Id" column to label the rows.
import pandas as pd pd.plotting.register_matplotlib_converters() import matplotlib.pyplot as plt %matplotlib inline import seaborn as sns print("Setup Complete") candy_filepath = 'C:\\Users\\PC\\Kaggle_CSV_Files\\candy.csv' candy_data = pd.read_csv(candy_filepath, index_col="id")
Read the candy data file into candy_data. Use the "id" column to label the rows.
.head()
What is the code to get the first 5 rows of data?
.tail()
What is the code to get the last 5 rows of data?
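Both methods default to five rows and accept a different count as an argument. A quick sketch on an invented ten-row frame:

```python
import pandas as pd

# Hypothetical ten-row frame used only to illustrate the two methods
frame = pd.DataFrame({"value": range(10)})

first_five = frame.head()   # rows 0 through 4
last_five = frame.tail()    # rows 5 through 9
first_two = frame.head(2)   # pass a number to change the row count
```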