Python- Pandas

Ace your homework & exams now with Quizwiz!

If a column of data actually holds two variables instead of one variable we will have to ____ or cast the variable into separate columns.

pivot

Violin plots are able to show the same values as a _________, but plot the boxes as a kernel density estimation.

Boxplot

________ are used when a discrete variable is plotted against a continuous variable.

Boxplots

The _____ is the most common Pandas object.

DataFrame

T or F. A One-to-One Merge of two dataframes is where we want to join one column to another column and the columns we want to join DO contain duplicate values.

False

T or F. In pandas, strings CANNOT BY CONVERTED to a proper datetime type, therefore no calculations on string type common date and time operations can be done

False

T or F. In pandas, strings CANNOT BY CONVERTED to a proper datetime type, therefore no calculations on string type common date and time operations can be done.

False

T or F. In statistics jargon, the term bivariate refers to a more than 2 variables.

False

T or F. None of the features of the Series data structure carry over into the DataFrame.

False

T or F. None of these features of the Series data structure carry over into the DataFrame.

False

T or F. Pandas has a pd.merge command that uses pd.join under the hood. So the merge uses the join internally.

False

T or F. The aggregate, transform, and filter methods are commonly used ways of working with ungrouped objects in Pandas.

False

T or F. The string function capitalize will result in the Capitalization of all words in a string.

False

T or F. The tail function returns the top five rows of data.

False

T or F. This DataFrame subsetting method, df[column_name], is to section off multiple columns

False

T or F. Using the to_numeric function, if we pass the errors parameter the 'ignore' value, nothing will change in our column but we would get an error message.

False

Histograms can bin a variable to create a bar but a _____ can bin two variables

Hexbin

______________ are the most common means of looking at a single variable.

Histograms

What function is used to open and read in a csv file in Pandas?

read_csv

The subplot syntax takes three parameters in the figuare() that include all EXCEPT ___________.

type of graph

T or F. Existing columns CANNOT be directly changed.

False

Missing values may be used or displayed in a 3 ways in python, all EXCEPT _____.

NULL

There are many representations of missing data. In databases, they are called ____ values.

NULL

Depending on where you get your data, missing values can be an empty string "" or even numeric values such as 88 or 99. Pandas displays missing values as ___.

NaN

If you want an ordered dictionary, you need to use the _______ from the collections module.

OrderedDict

Subsetting based on index label (row name) uses the ___ attribute.

loc

Pandas has a function called _____ that will reshape the dataframe into a tidy format

melt

Pandas can export and import data using any of these functions with DataFrames EXCEPT:

to_data

Split and Add Columns Individually (the Simple Method) of seperating values on an underscore is based on the Python function ____.

Split

T or F. When using the read_csv(), the na_values parameter allows you to specify additional missing or NaN values. Although not often used it would allowing you to create your own missing value replacement

True

The numba library has a _______ decorator just like the numpy library

Vectorize

The names of most of the basic plots will start with plt.plot() where plt is the _____ to the imported library

alias

Using the value_counts method on a series will return a frequency table of values that can be printed. The to_numeric function has a parameter called errors that governs what happens when the function encounters a value that it is unable to convert to a numeric value. The errors parameters has three possible values INCLUDING ALL BUT:

bypass

Data can have columns that contain values instead of variables. This is usually a convenient format for data ___________ and presentation.

collection

What data is returned if the print(df.types) is executed?

column headers and data type of header

Concatenation is accomplished by using the _______ function from Pandas.

concat

Split and Combine in a Single Step (Simple Method) uses the ______ function.

concat

One of the (conceptually) easier ways to combine data is with ______.

concatenation

Pandas is an open source Python library for ______________.

data analysis

Shape is an attribute of the dataframe, not a function or method of the DataFrame and is coded as:

df.shape

The quintessential example for creating visualizations of data is Anscombe s quartet, so called because the data set contains ____ sets of data, each of which contains two continuous variables.

four

The data type that is stored in a column will govern which kinds of _________ and calculations you can perform on the data found in that column.

functions

Working with grouped/aggregate computations is done with the ______ method on dataframes.

groupby

Subsetting based on row index (row number) uses the ____attribute

iloc

The join parameter on the concat function can be set to join='____' to keep only the columns that are shared among the data sets.

inner

Many functions and methods in pandas will have an ____ parameter that you can set to ____, if you want to perform the action in place.

inplace, True

When concatenate rows with different columns, the use of a ______ parameter in the concat function can help match up the columns.

join

Dictionaries are the most common way of creating a DataFrame. The ____ represents the column name, and the ____ are the contents of the column.

key, values

To write the ______ function, we use the ______ keyword.

lambda

Methods That Can Be Performed on a Series include all BUT:

len

The NaN value in Pandas comes from the _____ library.

numpy

By default, barplot will calculate a mean, but you can pass any function into the ______ parameter.

estimator

To drop a column, select the columns to drop with the ______ method on the dataframe.

drop

one way to get missing data counts is to use the value_counts method on a series. If you use the ______ parameter, you can also get a missing value count.

dropna

T or F. The float64 and int64 types represent numeric values without and with decimal points, respectively

False

T or F. A One-to-One Merge of two dataframes is where we want to join one column to another column and the columns we want to join DO NOT contain any duplicate values.

True

T or F. A data set might be split into multiple parts due to the data collection process.

True

T or F. When data is missing there is no concept of equality. NaN is not be equivalent to 0 or empty string.

True

To convert values into strings, we use the ____ method on the column.

astype

DataFrame can be thought of as a __________ of series objects

dictionary

After the Split and Add Columns Individually with the Simple Method is executed on the underscore, the values are returned in a ____.

list

The shape attribute returns a _____ in which the first value is the number of rows and the second number is the number of columns in a DataFrame.

tuple

In the melt function, the _____ parameter identifies the columns you want to melt down (or unpivot).

value_vars

Pandas also has methods for testing non-missing values that is called via pd.______.

notnull()

Using the__________ and __________ methods, respectively will get counts of unique values and frequency counts on a Pandas Series

nunique, value_counts

Using the__________ and __________ methods, respectively will get counts of unique values and frequency counts on a Pandas Series.

nunique, value_counts

By default, plt.plot will draw lines. If we want it to draw circles (points) instead, we can pass a(n) '_' parameter to tell plt.plot to use points.

o

The join parameter has a default value that is set to '_______' that means it will keep all the columns.

outer

When working with Panda dataframe you use ________ notation.

.dot

When working with Panda dataframes you use _____________ notation.

.dot

The string method _____ will remove leading and trailing whitespace from a string.

Strip

T or F. A boxplot shows multiple statistics: the minimum, first quartile, median, third quartile, maximum, and, if applicable, outliers based on the interquartile range

True

T or F. Another benefit of saving just the groupby object is that you can then iterate though the groups individually.

True

T or F. Columns that contain string data such as a country are considered Pandas type of object

True

T or F. Comma-separated values (CSV) are the most flexible data storage type.

True

T or F. Data types determine what can and cannot be done to a variable (i.e., column).

True

T or F. Grouped operations are a powerful way to aggregate, transform, and filter data.

True

T or F. In a Many-to-One Merge, one of the dataframes has key values that repeat and the dataframes that contains the single observations will then be duplicated in the merge.

True

T or F. In general, it s much easier to work with string object types when the variable is not a numeric number.

True

T or F. In statistics jargon, the term multivariate refers to a more than 2 variables and can be difficult to visualize.

True

T or F. Numpy is a scientific computing library that typically deals with numeric vectors

True

T or F. Pandas data structure known as Series is very similar to the numpy.ndarray

True

T or F. The Series data structure does not have an explicit to_excel method.

True

T or F. The column names, like shape, are specified using the column attribute of the dataframe object.

True

T or F. The goal of the Anscombe's quartet is to show the benefits of visualizations and the pitfalls of looking at only summary statistics.

True

T or F. The head function returns top five rows of data.

True

T or F. The matplotlib library is Python's fundamental plotting library

True

T or F. The splitlines method is similar to the split method. It is typically used on strings that are multiple lines long, and will return a list in which each element of the list is a line in the multiple-line string.

True

T or F. Tidy data is a framework to structure data sets so they can be easily analyzed.

True

T or F. When we transform data, we pass values from our dataframe into a function. The function then transforms the data.

True

T or F. When you apply a function across a DataFrame (in this case, column-wise with axis=0), the entire axis (e.g., column) is passed into the first argument of the function.

True

T or F. You use a programming language like Python and tool like Pandas to work with data because of automation and reproducibility.

True

If you just needed to add a single object to the end of an existing dataframe, the _________ function can handle that task.

append

Data _____________ refers to assembling a data set for analysis by combining various data sets together

assembly

Concatenating columns is very similar to concatenating rows. The main difference is the use of ____ parameter in the concat function.

axis

We were able to subset a Series with a boolean vector, so we can subset a DataFrame with a _____.

bool

Pandas supports _______, which comes from the numpy library which describes what happens when performing operations between array-like objects like Series and DataFrames.

broadcasting

Shape is an attribute of the dataframe, not a function or method of the dataframe and is coded as:

df.shape

In a histogram, the values are binned, meaning they are grouped together and plotted to show the __________ of the variable.

distribution

After performing a merge on tables, a lot of _____ information (data) can appeared across the tables rows

redundant

While performing an action on dataframe with a _____, it will try to apply the operation on each cell of the dataframe.

scalar

A Series is a ______ ________ of the DataFrame.

single column

To import matplotlib into your project you use the import matplotlib.pyplot as plt because the .pyplot is a _______.

subfolder

If you wish to plot more than one set of data for comparison, matplotlib offers a much handier way to create ______.

subplots

If we want to examine multiple columns, we can specify them by names, positions, or ranges. This is called ________ columns.

subsetting

If we want to examine multiple columns, we can specify them by names, positions, or ranges. This is called _________ columns.

subsetting


Related study sets

Describing at what time events will occur (Daily Routine)

View Set

Chapter 5: Business Communication: Creating and Delivering Messages that Matter

View Set

Module 3 A&P Practice Mult. Choice Questions.

View Set

BNS CH. 16 FLUID AND CHEMICAL BALANCE NCLEX STYLE Q'S

View Set