MIST 5750 Python Exam 2
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
data.loc[1]
'a'
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A.add(B, fill_value=0)
0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B
0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
data['a']
0.25
ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
ser2 = pd.Series(['D', 'E', 'F'], index=[1, 2, 3])
pd.concat([ser1, ser2])
1    A
2    B
3    C
1    D
2    E
3    F
dtype: object
[T/F] Data in the real world is rarely clean and homogeneous
True
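[T/F] The methods concat and append provide the same functionality
False. Both return a new object, but append (deprecated, then removed in pandas 2.0) builds a new index and data buffer on every call, so repeated appends are inefficient; concat can combine a whole list of objects in a single call.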
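A minimal sketch of the recommended pattern, using hypothetical example frames: collect the pieces in a list and pass them to pd.concat once, rather than growing an object one append at a time. It also shows that concat leaves its inputs unchanged.

```python
import pandas as pd

# Hypothetical example frames that share the same column
df1 = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({"A": [3, 4]}, index=[2, 3])
df3 = pd.DataFrame({"A": [5, 6]}, index=[4, 5])

# Gather the pieces first, then combine them in one concat call
combined = pd.concat([df1, df2, df3])
print(combined)

# concat returns a new object; the original df1 still has only 2 rows
print(len(df1))
```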
[T/F] The index of a Series object should be an integer
False. Only the implicit (positional) index is made of integers. The explicit index you define can be of any type, such as strings or dates.
[T/F] Pandas only supports inner joins
False. Pandas also supports outer, left, and right joins, chosen with the how parameter of pd.merge( ).
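A short sketch of the join types with two hypothetical frames: the how parameter of pd.merge selects which keys survive.

```python
import pandas as pd

# Hypothetical example frames sharing a "key" column
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

inner = pd.merge(left, right, on="key", how="inner")   # keys in both: b, c
outer = pd.merge(left, right, on="key", how="outer")   # keys in either: a, b, c, d
left_j = pd.merge(left, right, on="key", how="left")   # every key from left: a, b, c
print(inner, outer, left_j, sep="\n\n")
```

Keys missing on one side are filled with NaN in the outer and left results.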
[T/F] Slicing is not allowed in Pandas vectorized string operations
False. Slicing is allowed, e.g. str[0:3] or str.slice(0, 3).
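A minimal sketch of slicing inside the .str accessor, with hypothetical example names. The indexing form and str.slice give the same result, and .str[...] can also index into the lists produced by str.split.

```python
import pandas as pd

names = pd.Series(["Graham Chapman", "John Cleese", "Terry Gilliam"])

first3 = names.str[0:3]                  # indexing syntax applied to every string
first3_alt = names.str.slice(0, 3)       # equivalent method form
last_names = names.str.split().str[-1]   # last element of each split() list
print(first3)
print(last_names)
```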
One advantage of Pandas vectorized string methods compared to built-in string methods is...
Gracefully handling missing values
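A sketch of what "gracefully handling missing values" means in practice, using hypothetical data: a list comprehension over built-in str methods fails on None, while the .str accessor simply passes the missing entry through as missing.

```python
import pandas as pd

raw = ["peter", "Paul", None, "MARY", "gUIDO"]

# With built-in methods this fails on the None entry:
# [s.capitalize() for s in raw]  -> AttributeError

names = pd.Series(raw)
capitalized = names.str.capitalize()  # the missing entry stays missing
print(capitalized)
```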
What are the two special sentinels that can be used to indicate a missing value?
None and NaN
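A small sketch of the two sentinels in a numeric Series: pandas converts None to NaN (so the dtype becomes float64), and isnull( ) and dropna( ) treat both the same way.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, None, 3, np.nan])
print(s.dtype)       # float64: None was converted to NaN
print(s.isnull())    # True for both missing entries
print(s.dropna())    # only the present values remain
```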
[T/F] A MultiIndex can be used to represent data of more than two dimensions in a Series
True
[T/F] A Pandas Series is a one-dimensional array of indexed data
True
[T/F] A MultiIndex can be used to represent two-dimensional data within a one-dimensional Series
True
[T/F] A multi-indexed Series can be converted to a Dataframe
True
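A minimal sketch of two-dimensional data held in a one-dimensional Series via a MultiIndex. The state/year labels and the values are hypothetical; selection works on either level.

```python
import pandas as pd

# Hypothetical values indexed by (state, year)
index = pd.MultiIndex.from_tuples(
    [("California", 2000), ("California", 2010),
     ("Texas", 2000), ("Texas", 2010)],
    names=["state", "year"],
)
pop = pd.Series([10, 12, 20, 25], index=index)

print(pop.loc["Texas"])            # select on the outer level
print(pop.xs(2010, level="year"))  # select on the inner level
```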
[T/F] An essential piece of analysis of large data is efficient summarization by computing aggregations like sum( ), mean( ), min( ), and max( )
True
[T/F] By default, pd.merge( ) discards the index
True
[T/F] By default, pd.merge( ) uses the common columns across the data frames to join
True
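A sketch of both defaults with hypothetical frames: pd.merge joins on the column the two frames share ("employee"), and the result gets a fresh integer index instead of df1's labels.

```python
import pandas as pd

df1 = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa"],
                    "group": ["Accounting", "Engineering", "Engineering"]},
                   index=["x", "y", "z"])
df2 = pd.DataFrame({"employee": ["Lisa", "Bob", "Jake"],
                    "hire_date": [2004, 2008, 2012]})

merged = pd.merge(df1, df2)   # joins on the common "employee" column
print(merged)
print(merged.index)           # labels x, y, z are gone; a new 0..2 index is used
```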
[T/F] Concatenating two DataFrames can result in duplicate index values
True
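A sketch, with hypothetical frames, of where duplicate index values come from and two standard ways to deal with them: ignore_index=True builds a new index, and verify_integrity=True raises a ValueError instead of allowing duplicates.

```python
import pandas as pd

x = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
y = pd.DataFrame({"A": [3, 4]}, index=[0, 1])

dup = pd.concat([x, y])
print(dup.index.tolist())      # labels repeat

fresh = pd.concat([x, y], ignore_index=True)
print(fresh.index.tolist())    # a new 0..3 index

try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as err:
    print("ValueError:", err)  # overlapping index values are rejected
```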
[T/F] It is possible to define custom aggregations in Pandas
True
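A minimal sketch of a custom aggregation: any function that reduces a group to one value can be passed to agg. The value_range helper and the data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["A", "B", "A", "B"],
                   "value": [1, 4, 3, 10]})

def value_range(s):
    """Hypothetical custom aggregation: max minus min of a group."""
    return s.max() - s.min()

result = df.groupby("key")["value"].agg(value_range)
print(result)
```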
[T/F] Nearly all of Python's built-in string methods are mirrored by a Pandas vectorized string method
True
[T/F] Pandas allows a MultiIndex for columns
True
[T/F] Pandas includes functions and methods that make combining and joining data from multiple sources fast and straightforward.
True
[T/F] Regular expressions are supported in Pandas vectorized string operations
True
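A short sketch of regular expressions in the .str accessor with hypothetical strings: contains tests a pattern, extract pulls out a capture group, and replace substitutes when regex=True is passed.

```python
import pandas as pd

s = pd.Series(["apple123", "banana", "cherry7", None])

has_digit = s.str.contains(r"\d", na=False)       # missing entries count as False
digits = s.str.extract(r"(\d+)")                  # capture group -> DataFrame column 0
stripped = s.str.replace(r"\d+", "", regex=True)  # remove every run of digits
print(has_digit, digits, stripped, sep="\n\n")
```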
[T/F] The pd.merge( ) function implements a number of types of joins: the one-to-one, many-to-one, and many-to-many joins
True
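A sketch of a many-to-one join with hypothetical data: "group" repeats on the left but is unique on the right, so each supervisor is copied to every matching employee. The validate argument of pd.merge checks that assumption and raises an error if it is violated.

```python
import pandas as pd

employees = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa", "Sue"],
                          "group": ["Accounting", "Engineering", "Engineering", "HR"]})
supervisors = pd.DataFrame({"group": ["Accounting", "Engineering", "HR"],
                            "supervisor": ["Carly", "Guido", "Steve"]})

# "group" is the common column; right-side keys must be unique for many_to_one
many_to_one = pd.merge(employees, supervisors, validate="many_to_one")
print(many_to_one)
```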
[T/F] Unlike a dictionary, the Series supports array-style operations such as slicing
True
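A minimal sketch of Series slicing, reusing the data Series from the cards above. Slicing by explicit label (.loc) includes the end label, while slicing by implicit position (.iloc) excludes the end position.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=["a", "b", "c", "d"])

print(data.loc["a":"c"])   # explicit labels: "c" is included
print(data.iloc[0:2])      # implicit positions: position 2 is excluded
```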
Does Pandas allow operations on Series with different indices?
Yes
[T/F] DataFrame/Series operations will automatically align indices
True
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
indA | indB  # union: Int64Index(?, dtype='int64')
[1, 2, 3, 5, 7, 9, 11]
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
indA & indB  # intersection: Int64Index(?, dtype='int64')
[3, 5, 7]
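The cards above use the | and & operators, which older pandas treated as set operations on an Index. As far as I know, in pandas 2.0 and later these operators act element-wise instead, and Int64Index was replaced by a plain Index with dtype='int64'. A sketch using the named methods, which work across versions:

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

union = indA.union(indB)                     # [1, 2, 3, 5, 7, 9, 11]
intersection = indA.intersection(indB)       # [3, 5, 7]
sym_diff = indA.symmetric_difference(indB)   # in exactly one index
print(union, intersection, sym_diff, sep="\n")
```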
To fill NA entries in a DataFrame data with a single value, such as zero:
data.fillna(0)
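What method fills each NA value with the previous valid value in the DataFrame (forward fill)?
data.fillna(method='ffill'), or data.ffill( ) in pandas 2.1 and later, where the method argument of fillna is deprecated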
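A small sketch of the fill options on a hypothetical DataFrame. Forward fill copies the previous valid value down each column, so a leading NaN with nothing above it stays missing.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"a": [1.0, np.nan, np.nan, 4.0],
                     "b": [np.nan, 2.0, np.nan, 5.0]})

zeros = data.fillna(0)   # every NaN becomes 0
forward = data.ffill()   # NaN takes the previous valid value in its column
backward = data.bfill()  # NaN takes the next valid value in its column
print(zeros, forward, backward, sep="\n\n")
```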
A handy method to calculate descriptive statistics (count, mean, std, min, quartiles, max) for the numeric columns of a DataFrame is ____________
describe( )
Which function is used to return a copy of the data with missing values filled or imputed
fillna( )
To convert a column of categorical string values into multiple indicator (0/1) columns, we use the ___________ function
get_dummies( )
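A minimal sketch of pd.get_dummies on a hypothetical color column: each distinct string becomes its own indicator column, and dtype=int requests 0/1 values instead of booleans.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "red", "blue"]})

dummies = pd.get_dummies(df["color"], dtype=int)  # one column per category
print(dummies)
```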
To aggregate data in Pandas, we use the function:
groupby
When concatenating two data frames, to restrict the result to the common columns, we can use the option
join='inner'
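A sketch with hypothetical frames that share only columns B and C: the default outer join keeps every column and fills gaps with NaN, while join='inner' keeps only the shared columns.

```python
import pandas as pd

df5 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df6 = pd.DataFrame({"B": [7, 8], "C": [9, 10], "D": [11, 12]})

outer = pd.concat([df5, df6])                # default join='outer'
inner = pd.concat([df5, df6], join="inner")  # only the common columns
print(outer, inner, sep="\n\n")
```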
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
data[['a', 'd']]
a    0.25
d    1.00
dtype: float64
The three steps in a groupby operation are:
split, apply, combine
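A sketch that spells out the three steps by hand and then shows the single groupby call that does the same thing. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["A", "B", "C", "A", "B", "C"],
                   "data": [0, 1, 2, 3, 4, 5]})

# Split: one sub-Series per key
pieces = {k: grp["data"] for k, grp in df.groupby("key")}
# Apply: reduce each piece
sums = {k: s.sum() for k, s in pieces.items()}
# Combine: gather the results into one Series
manual = pd.Series(sums)

# The same three steps in one call
auto = df.groupby("key")["data"].sum()
print(manual, auto, sep="\n\n")
```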
When two Series of different indices are added, the resulting Series will have an index that is the __________ of indices of the two input Series
union
What method can be used to convert a multiply indexed Series into a conventionally indexed DataFrame?
unstack( )
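A minimal sketch of unstack( ) on a hypothetical two-level Series. By default the innermost level becomes the columns; the level argument chooses a different one.

```python
import pandas as pd

index = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["letter", "number"])
s = pd.Series([10, 20, 30, 40], index=index)

wide = s.unstack()                   # "number" becomes the columns
wide_by_letter = s.unstack(level=0)  # "letter" becomes the columns instead
print(wide)
print(wide_by_letter)
```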