QMB3302
What is the shape of the following NumPy array?
(2, 2, 2, 2)
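The array from the original question is not reproduced in this card. As a stand-in, the sketch below builds a hypothetical array of zeros with the same shape and reads it back through the `shape` attribute.

```python
import numpy as np

# Hypothetical stand-in: the array from the original question is not shown
arr = np.zeros((2, 2, 2, 2))

print(arr.shape)  # one entry per dimension
print(arr.ndim)   # number of dimensions
print(arr.size)   # total number of elements
```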
doc.head(6) would return which rows? (e.g. 4-6, etc.)
0-5 (the first six rows)
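A minimal sketch of `head(6)`, assuming a small made-up dataframe named `doc` with the default integer index:

```python
import pandas as pd

# Hypothetical 10-row dataframe standing in for `doc`
doc = pd.DataFrame({"value": range(10, 20)})

first_six = doc.head(6)          # first six rows, index labels 0 through 5
print(first_six.index.tolist())
```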
sentinel value
In a loop: a value that causes the loop to terminate when the loop expression evaluates it. In pandas: a value that marks a missing entry. It can be a data-specific convention or a more global one, such as marking a missing value with NaN.
function that adds thing
addmeup
datetime64
NumPy dtype that stores dates as 64-bit integers, so arrays of dates are compact. Input strings MUST be in year, year-month, or year-month-day order (e.g. '2015-07-04')
docstring
A string literal placed as the first statement of a program (module) or function; Python uses it to automatically create help documentation. """ text """
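A short sketch of a docstring in use. The function name `add_me_up` is made up for illustration.

```python
def add_me_up(a, b):
    """Return the sum of a and b."""
    return a + b

# The docstring is stored on the function and is what help() displays
print(add_me_up.__doc__)
```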
For pandas to work, data must be formatted as lists before it is imported. T/F
False
Two ways to indicate missing data in a dataset
mask and sentinel value
user defined function
a function that the user creates. In Python it is defined with def; a return statement sends a value back to the caller (without one, the function returns None)
do all operators produce an int output
no; for example, the result of standard division is always a float
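A quick check of the result types in Python 3:

```python
true_div = 6 / 2     # standard division: float, even when the result is whole
floor_div = 7 // 2   # floor division of two ints: int
remainder = 7 % 2    # modulo of two ints: int

print(true_div, floor_div, remainder)  # 3.0 3 1
```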
lists are
ordered, changeable (values can be added and removed), and allow duplicates. Items can be accessed by index (thislist[0]) or by slice (thislist[2:5])
timestamp
pandas replacement for Python's native datetime, built on NumPy's datetime64. Associated index: DatetimeIndex
which function should be used to draw a dot (scatter) plot of a very large data set?
plt.plot rather than plt.scatter
identify the parts of the for loop below: for x in y: z
x is the iteration variable, y is the iterable, and z is the loop body (for example, a print statement)
can functions have multiple arguments?
yes
while is closely related to the
for loop
mask
a separate record of which values are missing; it can be a Boolean array stored alongside the data or something else, such as a bit reserved in the data representation
break statements
let you jump out of a loop early. Example: in a while loop that prints x and then sets x = x + 1, the line if x > maxc: break ends the loop once x exceeds maxc
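The loop described on this card can be sketched as follows. The names `max_count` and `seen` are made up for illustration, and the values are collected in a list as well as printed.

```python
max_count = 3   # hypothetical limit
x = 0
seen = []

while True:
    print(x)
    seen.append(x)
    x = x + 1
    if x > max_count:
        break       # leave the loop as soon as x exceeds the limit

print(seen)  # [0, 1, 2, 3]
```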
Which was built on top of the other: matplotlib, seaborn
seaborn was built on top of matplotlib
time-shifts
shift() moves the data and tshift() moves the index; both are specified in multiples of the frequency. Helpful with graphs! (tshift() was removed in pandas 2.0; shift(freq=...) does the same job.)
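A minimal sketch of the two kinds of shift on a made-up daily series. It uses `shift(freq=...)` for the index shift, which works in both older and current pandas versions.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="D")
s = pd.Series([10, 20, 30, 40], index=idx)

# Shift the data: values move down one row, the index stays put,
# and the first slot becomes NaN
shifted_data = s.shift(1)

# Shift the index: every timestamp moves forward one day,
# and the values stay attached to their rows
shifted_index = s.shift(1, freq="D")

print(shifted_data)
print(shifted_index)
```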
NaN
special floating-point value; it can be used in computations, but any arithmetic involving NaN produces another NaN
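A small sketch of how NaN behaves in NumPy computations, using made-up values:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

total = values.sum()            # NaN propagates through the sum
safe_total = np.nansum(values)  # NaN-aware version ignores the missing value

print(values.dtype)  # float64
print(total)         # nan
print(safe_total)    # 4.0
```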
time stamps vs intervals and periods vs deltas/durations
a time stamp is a particular moment in time; an interval or period is a span of time between a start and an end point; a delta or duration is an exact length of time (e.g. 22 seconds)
Imagine we have a dataframe, df. What would be the purpose of df.loc[1]?
to retrieve the row of df whose index label is 1
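A minimal sketch of `df.loc[1]`, assuming a made-up dataframe with the default integer index:

```python
import pandas as pd

# Hypothetical dataframe
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, 80, 70]})

row = df.loc[1]   # the row labeled 1, returned as a Series
print(row)
```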
purpose of %matplotlib inline
to make the plots show up inline
pandas allows multiple different data types (like objects and numbers) in a single table
True
anonymous function
a function with no name assigned to it; in Python it is written with lambda and consists of a single expression
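In Python, an anonymous function is written with `lambda`. A short sketch with made-up values:

```python
# A lambda can be assigned to a variable and called like any function
add = lambda a, b: a + b
print(add(2, 3))  # 5

# More commonly it is passed directly as an argument, here as a sort key
words = sorted(["pear", "fig", "banana"], key=lambda w: len(w))
print(words)  # ['fig', 'pear', 'banana']
```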
strings
"hello"
Let's take just a small slice of this data... we don't need it all. Let's assume we want to create a dataset of home features such as TotalBedrooms, TotalBathrooms, YearBuilt and LastSalePrice. How would you write that code? Call that dataframe "HomeFeatures".
HomeFeatures = homes[['TotalBedrooms', 'TotalBathrooms', 'YearBuilt', 'LastSalePrice']]
label the parts of this statement: for N in "string": print(N)
N is the iteration variable; "string" is the iterable (any character string works)
rolling()
method of Series and DataFrame objects; returns a windowed view, similar to groupby, on which aggregations can be computed
NumPy allows us to do more complicated math on lists and other data structures, and is used in most of the more advanced modules (e.g. pandas). T/F
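A minimal sketch of a rolling aggregation on a made-up series, assuming a window of three values:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Mean over a sliding window of 3 values; the first two positions
# have no full window yet, so they come out as NaN
rolling_mean = s.rolling(window=3).mean()
print(rolling_mean)
```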
True
Can Pandas join two dataframes together?
Yes
Do datasets need something in common (a key) to be joined?
Yes
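A sketch of joining two dataframes on a shared key column. The frames and the column name `id` are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "price": [20, 30, 40]})

# Default inner join: only keys present in both frames are kept
joined = pd.merge(left, right, on="id")
print(joined)
```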
can pandas work with complicated dates?
Yes
lists
['apple', 'banana']
pandas-datareader package
can import financial data from a number of sources
iteration variable
takes a new value each time the loop executes and controls how many times the loop body runs
Iterable
contains a sequence of values that can be iterated over (lists, tuples, strings)
datetime and dateutil
datetime is built in (standard library); dateutil is a third-party module
Timedelta
pandas type for time deltas or durations. Associated index: TimedeltaIndex
Use of None
the first sentinel value pandas uses: a Python object that marks missing values in arrays. It can ONLY be used in arrays of dtype object, so aggregations such as sum or mean raise an error
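A small sketch of why `None` limits computation, using a made-up NumPy array:

```python
import numpy as np

obj_arr = np.array([1, None, 3])
print(obj_arr.dtype)  # object

# Adding an int to None is undefined, so the aggregation fails
try:
    obj_arr.sum()
    raised = False
except TypeError:
    raised = True

print(raised)  # True
```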
period
pandas type for fixed-frequency intervals. Associated index: PeriodIndex
How to import matplotlib
import matplotlib.pyplot as plt
how many types of data does a 2D NumPy array hold, and which type?
one: every element shares a single dtype. It is often integers or floats, but it can be any one NumPy dtype
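A quick check of the single-dtype rule for NumPy arrays, with made-up values. When the inputs mix integers and floats, NumPy upcasts everything to one common type.

```python
import numpy as np

ints = np.array([[1, 2], [3, 4]])
floats = np.array([[1.5, 2], [3, 4]])   # one float upcasts the whole array

print(ints.shape, ints.dtype)       # (2, 2) and an integer dtype
print(floats.shape, floats.dtype)   # (2, 2) float64
```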
Booleans for detecting null values
isnull() and notnull()
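The pandas method names are `isnull()` and `notnull()`. A minimal sketch on a made-up series:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, "hello", None])

null_mask = data.isnull()        # True where a value is missing
present = data[data.notnull()]   # Boolean mask used as a filter

print(null_mask.tolist())  # [False, True, False, True]
print(present.tolist())    # [1, 'hello']
```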
Pandas use of missing data is constrained by
its reliance on the NumPy package, which has no built-in notion of NA values for non-floating-point data types
a%b
remainder of a/b
range()
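returns a range object that can be iterated over, e.g. x = range(1, 3)
pd.date_range()
returns a regular sequence of dates built from a starting date, a number of periods, and a frequency (H = hour, D = day, etc.; codes can be combined, e.g. 2H30T for 2 hours 30 minutes). Recent pandas versions prefer lowercase h and min over H and T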
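A short check of what the range on this card contains. The stop value is excluded.

```python
x = range(1, 3)

values = list(x)   # materialize the range to see its contents
print(values)      # [1, 2]

for i in x:        # a range can be iterated over directly
    print(i)
```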
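A minimal sketch of `pd.date_range()` with a made-up start date. It sticks to day-based frequencies, whose codes are the same across pandas versions.

```python
import pandas as pd

# Four consecutive days starting from the given date
daily = pd.date_range("2024-01-01", periods=4, freq="D")

# A multiple of the frequency: every second day
every_other = pd.date_range("2024-01-01", periods=3, freq="2D")

print(daily)
print(every_other)
```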
Tuples
ordered, unchangeable collections that store multiple items in a single variable, written with parentheses: (__). In a relational database, a tuple is a row or record
advantage of using seaborn over native matplotlib when working with pandas dataframe to visualize two columns
seaborn can use columns from pandas, while matplotlib requires additional formatting
control statement
the part of a structure that determines whether the subsequent block of statements executes
Relative path definition
the path FROM where your code is (your current working directory) TO where your data is
replacing values in lists
thislist = ['apple', 'cherry', 'banana']; thislist[0:2] = ['melon', 'berry']; print(thislist) gives ['melon', 'berry', 'banana']
dropping or filling null values
dropna() drops them; fillna() replaces them with a given value
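Slice assignment replaces exactly the elements covered by the slice, so the slice width matters. A sketch comparing two slices on the same made-up list:

```python
# Replace the first two items with two new items: length stays 3
thislist = ['apple', 'cherry', 'banana']
thislist[0:2] = ['melon', 'berry']
print(thislist)   # ['melon', 'berry', 'banana']

# Replace only the first item with two new items: the list grows to 4
otherlist = ['apple', 'cherry', 'banana']
otherlist[0:1] = ['melon', 'berry']
print(otherlist)  # ['melon', 'berry', 'cherry', 'banana']
```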
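A minimal sketch of the two methods on a made-up series. `dropna()` removes missing entries, while `fillna()` keeps every row and substitutes a value.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

dropped = s.dropna()   # missing entry removed
filled = s.fillna(0)   # missing entry replaced with 0

print(dropped.tolist())  # [1.0, 3.0]
print(filled.tolist())   # [1.0, 0.0, 3.0]
```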