Python - Pandas
If a column of data actually holds two variables instead of one variable we will have to ____ or cast the variable into separate columns.
pivot
Violin plots are able to show the same values as a _________, but plot the boxes as a kernel density estimation.
Boxplot
________ are used when a discrete variable is plotted against a continuous variable.
Boxplots
The _____ is the most common Pandas object.
DataFrame
T or F. A One-to-One Merge of two dataframes is where we want to join one column to another column and the columns we want to join DO contain duplicate values.
False
T or F. In pandas, strings CANNOT BE CONVERTED to a proper datetime type; therefore, common date and time operations cannot be performed on dates stored as strings.
False
T or F. In statistics jargon, the term bivariate refers to more than 2 variables.
False
T or F. None of the features of the Series data structure carry over into the DataFrame.
False
T or F. Pandas has a pd.merge command that uses pd.join under the hood. So the merge uses the join internally.
False
T or F. The aggregate, transform, and filter methods are commonly used ways of working with ungrouped objects in Pandas.
False
T or F. The string function capitalize will result in the Capitalization of all words in a string.
False
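A quick check of the difference, using only built-in string methods: capitalize upper-cases the first character of the whole string, while title upper-cases the first character of each word. The sample text is made up for illustration.

```python
# Hypothetical sample text, not taken from the cards
text = "black knight"

capitalized = text.capitalize()  # only the first character of the string
titled = text.title()            # first character of every word

print(capitalized)  # Black knight
print(titled)       # Black Knight
```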
T or F. The tail function returns the top five rows of data.
False
T or F. This DataFrame subsetting method, df[column_name], is to section off multiple columns
False
T or F. Using the to_numeric function, if we pass the errors parameter the 'ignore' value, nothing will change in our column but we would get an error message.
False
Histograms can bin a variable to create a bar but a _____ can bin two variables
Hexbin
______________ are the most common means of looking at a single variable.
Histograms
What function is used to open and read in a csv file in Pandas?
read_csv
The subplot syntax takes three parameters in the figure() that include all EXCEPT ___________.
type of graph
T or F. Existing columns CANNOT be directly changed.
False
Missing values may be used or displayed in 3 ways in Python, all EXCEPT _____.
NULL
There are many representations of missing data. In databases, they are called ____ values.
NULL
Depending on where you get your data, missing values can be an empty string "" or even numeric values such as 88 or 99. Pandas displays missing values as ___.
NaN
If you want an ordered dictionary, you need to use the _______ from the collections module.
OrderedDict
Subsetting based on index label (row name) uses the ___ attribute.
loc
Pandas has a function called _____ that will reshape the dataframe into a tidy format
melt
Pandas can export and import data using any of these functions with DataFrames EXCEPT:
to_data
Split and Add Columns Individually (the Simple Method) of separating values on an underscore is based on the Python function ____.
split
T or F. When using read_csv(), the na_values parameter allows you to specify additional missing or NaN values. Although not often used, it allows you to define your own missing-value markers.
True
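A minimal sketch of na_values in use, assuming a small made-up CSV in which the strings 99 and missing stand for missing data. The CSV text and column names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV text: "99" and "missing" are placeholders for missing ages
csv_text = "name,age\nAnn,34\nBob,99\nCid,missing\n"

# na_values adds extra markers that read_csv should treat as NaN
df = pd.read_csv(io.StringIO(csv_text), na_values=["99", "missing"])

print(df)
print(df["age"].isnull().sum())  # number of ages read in as NaN
```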
The numba library has a _______ decorator just like the numpy library
vectorize
The names of most of the basic plots will start with plt (for example, plt.plot()), where plt is the _____ for the imported library
alias
Using the value_counts method on a series will return a frequency table of values that can be printed. The to_numeric function has a parameter called errors that governs what happens when the function encounters a value that it is unable to convert to a numeric value. The errors parameter has three possible values, INCLUDING ALL BUT:
bypass
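As a rough illustration of the errors parameter, the sketch below runs to_numeric on a made-up Series with one unparseable value. It shows 'coerce' and 'raise'; the third value, 'ignore', is left out because recent pandas releases deprecate it.

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed as a number
s = pd.Series(["1", "2", "missing"])

# errors="coerce" turns unparseable values into NaN
coerced = pd.to_numeric(s, errors="coerce")

# errors="raise" (the default) raises an exception instead
try:
    pd.to_numeric(s, errors="raise")
    raised = False
except ValueError:
    raised = True

print(coerced)
print(raised)
```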
Data can have columns that contain values instead of variables. This is usually a convenient format for data ___________ and presentation.
collection
What data is returned if print(df.dtypes) is executed?
column names and the data type of each column
Concatenation is accomplished by using the _______ function from Pandas.
concat
Split and Combine in a Single Step (Simple Method) uses the ______ function.
concat
One of the (conceptually) easier ways to combine data is with ______.
concatenation
Pandas is an open source Python library for ______________.
data analysis
Shape is an attribute of the dataframe, not a function or method of the DataFrame and is coded as:
df.shape
The quintessential example for creating visualizations of data is Anscombe's quartet, so called because the data set contains ____ sets of data, each of which contains two continuous variables.
four
The data type that is stored in a column will govern which kinds of _________ and calculations you can perform on the data found in that column.
functions
Working with grouped/aggregate computations is done with the ______ method on dataframes.
groupby
Subsetting based on row index (row number) uses the ____ attribute.
iloc
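The two subsetting attributes can be compared side by side. This is a small sketch on a made-up DataFrame with string index labels, so the label-based and position-based lookups are easy to tell apart.

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {"country": ["Chile", "Ghana", "Japan"], "population": [19, 31, 125]},
    index=["c", "g", "j"],
)

by_label = df.loc["g"]      # row whose index label is "g"
by_position = df.iloc[1]    # second row, counting from 0
last_row = df.iloc[-1]      # iloc accepts negative positions

print(by_label)
print(last_row)
```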
The join parameter on the concat function can be set to join='____' to keep only the columns that are shared among the data sets.
inner
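A short sketch of the join parameter on concat, using two made-up frames that share only column B. With the default, all columns are kept and gaps are filled with NaN; with join='inner', only the shared column survives.

```python
import pandas as pd

# Hypothetical frames that overlap only on column "B"
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# Default join="outer": keeps A, B, and C, filling gaps with NaN
outer = pd.concat([df1, df2], ignore_index=True)

# join="inner": keeps only the columns present in every frame
inner = pd.concat([df1, df2], join="inner", ignore_index=True)

print(outer)
print(inner)
```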
Many functions and methods in pandas will have an ____ parameter that you can set to ____, if you want to perform the action in place.
inplace, True
When concatenating rows with different columns, the use of a ______ parameter in the concat function can help match up the columns.
join
Dictionaries are the most common way of creating a DataFrame. The ____ represents the column name, and the ____ are the contents of the column.
key, values
To write the ______ function, we use the ______ keyword.
lambda
Methods That Can Be Performed on a Series include all BUT:
len
The NaN value in Pandas comes from the _____ library.
numpy
By default, barplot will calculate a mean, but you can pass any function into the ______ parameter.
estimator
To drop a column, select the columns to drop with the ______ method on the dataframe.
drop
One way to get missing data counts is to use the value_counts method on a series. If you use the ______ parameter, you can also get a missing value count.
dropna
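For example, on a made-up Series with one missing value, passing dropna=False adds a NaN row to the frequency table:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
s = pd.Series(["a", "b", "a", np.nan])

default_counts = s.value_counts()             # NaN is skipped
with_missing = s.value_counts(dropna=False)   # NaN gets its own row

print(default_counts)
print(with_missing)
```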
T or F. The float64 and int64 types represent numeric values without and with decimal points, respectively
False
T or F. A One-to-One Merge of two dataframes is where we want to join one column to another column and the columns we want to join DO NOT contain any duplicate values.
True
T or F. A data set might be split into multiple parts due to the data collection process.
True
T or F. When data is missing there is no concept of equality. NaN is not equivalent to 0 or an empty string.
True
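This can be checked directly. The sketch below compares NaN with itself, with 0, and with an empty string, then uses pd.isnull, which is the reliable way to test for a missing value.

```python
import numpy as np
import pandas as pd

nan_equals_nan = np.nan == np.nan    # NaN is not even equal to itself
nan_equals_zero = np.nan == 0
nan_equals_empty = np.nan == ""

# Equality cannot detect missing values, so test with isnull instead
is_missing = pd.isnull(np.nan)

print(nan_equals_nan, nan_equals_zero, nan_equals_empty)  # False False False
print(is_missing)  # True
```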
To convert values into strings, we use the ____ method on the column.
astype
DataFrame can be thought of as a __________ of series objects
dictionary
After the Split and Add Columns Individually with the Simple Method is executed on the underscore, the values are returned in a ____.
list
The shape attribute returns a _____ in which the first value is the number of rows and the second number is the number of columns in a DataFrame.
tuple
In the melt function, the _____ parameter identifies the columns you want to melt down (or unpivot).
value_vars
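A minimal melt sketch, assuming a small wide table whose income brackets are spread across columns. The column names and counts are made up for illustration; id_vars names the column to keep as is, and value_vars names the columns to unpivot.

```python
import pandas as pd

# Hypothetical wide data: income brackets are stored as column names
wide = pd.DataFrame(
    {
        "religion": ["Agnostic", "Atheist"],
        "<$10k": [27, 12],
        "$10-20k": [34, 27],
    }
)

# Unpivot the bracket columns into (income, count) pairs
tidy = pd.melt(
    wide,
    id_vars="religion",
    value_vars=["<$10k", "$10-20k"],
    var_name="income",
    value_name="count",
)

print(tidy)
```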
Pandas also has a method for testing non-missing values, which is called via pd.______.
notnull()
Using the __________ and __________ methods, respectively, will get counts of unique values and frequency counts on a Pandas Series.
nunique, value_counts
By default, plt.plot will draw lines. If we want it to draw circles (points) instead, we can pass a(n) '_' parameter to tell plt.plot to use points.
o
The join parameter has a default value that is set to '_______' that means it will keep all the columns.
outer
When working with a Pandas dataframe you use ________ notation.
.dot
The string method _____ will remove leading and trailing whitespace from a string.
strip
T or F. A boxplot shows multiple statistics: the minimum, first quartile, median, third quartile, maximum, and, if applicable, outliers based on the interquartile range
True
T or F. Another benefit of saving just the groupby object is that you can then iterate through the groups individually.
True
T or F. Columns that contain string data such as a country are considered Pandas type of object
True
T or F. Comma-separated values (CSV) are the most flexible data storage type.
True
T or F. Data types determine what can and cannot be done to a variable (i.e., column).
True
T or F. Grouped operations are a powerful way to aggregate, transform, and filter data.
True
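The three kinds of grouped operation can be sketched on one small, made-up DataFrame. Aggregation returns one value per group, transform returns one value per original row, and filter keeps or drops whole groups.

```python
import pandas as pd

# Hypothetical scores for two teams
df = pd.DataFrame(
    {"team": ["a", "a", "b", "b", "b"], "score": [1, 3, 10, 20, 30]}
)

grouped = df.groupby("team")["score"]

# Aggregate: one mean per team
means = grouped.mean()

# Transform: subtract each team's mean from its own rows
centered = grouped.transform(lambda x: x - x.mean())

# Filter: keep only teams with at least three rows
big_groups = df.groupby("team").filter(lambda g: len(g) >= 3)

print(means)
print(centered)
print(big_groups)
```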
T or F. In a Many-to-One Merge, one of the dataframes has key values that repeat, and the dataframe that contains the single observations will then be duplicated in the merge.
True
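A small sketch of a many-to-one merge, assuming two made-up tables: one row per site, and several visits per site. The validate argument is optional; it asks pandas to raise an error if the right-hand keys turn out not to be unique.

```python
import pandas as pd

# Hypothetical tables: each site appears once, visits repeat the site key
sites = pd.DataFrame({"site": ["DR-1", "DR-3"], "lat": [-49.85, -47.15]})
visits = pd.DataFrame(
    {"visit_id": [619, 622, 734], "site": ["DR-1", "DR-1", "DR-3"]}
)

# The single DR-1 site row is duplicated for each matching visit
merged = visits.merge(sites, on="site", how="left", validate="many_to_one")

print(merged)
```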
T or F. In general, it's much easier to work with string object types when the variable is not a numeric number.
True
T or F. In statistics jargon, the term multivariate refers to more than 2 variables and can be difficult to visualize.
True
T or F. Numpy is a scientific computing library that typically deals with numeric vectors
True
T or F. Pandas data structure known as Series is very similar to the numpy.ndarray
True
T or F. The Series data structure does not have an explicit to_excel method.
True
T or F. The column names, like shape, are accessed using the columns attribute of the dataframe object.
True
T or F. The goal of the Anscombe's quartet is to show the benefits of visualizations and the pitfalls of looking at only summary statistics.
True
T or F. The head function returns the top five rows of data.
True
T or F. The matplotlib library is Python's fundamental plotting library
True
T or F. The splitlines method is similar to the split method. It is typically used on strings that are multiple lines long, and will return a list in which each element of the list is a line in the multiple-line string.
True
T or F. Tidy data is a framework to structure data sets so they can be easily analyzed.
True
T or F. When we transform data, we pass values from our dataframe into a function. The function then transforms the data.
True
T or F. When you apply a function across a DataFrame (in this case, column-wise with axis=0), the entire axis (e.g., column) is passed into the first argument of the function.
True
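To see what apply passes to the function, the sketch below applies a hypothetical helper, col_range, column-wise. Each call receives a whole column as a Series, so Series methods such as max and min are available inside it.

```python
import pandas as pd

# Hypothetical numeric data
df = pd.DataFrame({"a": [10, 20, 30], "b": [20, 30, 40]})


def col_range(col):
    """Return max minus min for one column (col is a Series)."""
    return col.max() - col.min()


# axis=0 passes each column, in turn, as the first argument
result = df.apply(col_range, axis=0)

print(result)
```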
T or F. You use a programming language like Python and a tool like Pandas to work with data because of automation and reproducibility.
True
If you just needed to add a single object to the end of an existing dataframe, the _________ function can handle that task.
append (deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat is the replacement)
Data _____________ refers to assembling a data set for analysis by combining various data sets together
assembly
Concatenating columns is very similar to concatenating rows. The main difference is the use of ____ parameter in the concat function.
axis
We were able to subset a Series with a boolean vector, so we can subset a DataFrame with a _____.
bool
Pandas supports _______, a concept from the numpy library that describes what happens when performing operations between array-like objects such as Series and DataFrames.
broadcasting
In a histogram, the values are binned, meaning they are grouped together and plotted to show the __________ of the variable.
distribution
After performing a merge on tables, a lot of _____ information (data) can appear across the table's rows
redundant
While performing an action on a dataframe with a _____, it will try to apply the operation on each cell of the dataframe.
scalar
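Broadcasting with a scalar can be sketched in a few lines. The values are made up; the point is that the single number is applied to every element of the Series and to every cell of the DataFrame.

```python
import pandas as pd

# Hypothetical Series and DataFrame
s = pd.Series([1, 2, 3])
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

scaled = s * 10    # the scalar is applied to each element
df_plus = df + 1   # the scalar is applied to each cell

print(scaled.tolist())  # [10, 20, 30]
print(df_plus)
```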
A Series is a ______ ________ of the DataFrame.
single column
To import matplotlib into your project you use the import matplotlib.pyplot as plt because the .pyplot is a _______.
subfolder
If you wish to plot more than one set of data for comparison, matplotlib offers a much handier way to create ______.
subplots
If we want to examine multiple columns, we can specify them by names, positions, or ranges. This is called ________ columns.
subsetting