QMB3302


What is the shape of the following NumPy array?

(2, 2, 2, 2)

doc.head(6) would return which rows? (e.g., 4-6)

0-5 (the first six rows, given the default integer index)
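
A minimal sketch of what head(6) returns. The DataFrame `doc` below is made-up illustration data, not the course data set.

```python
import pandas as pd

# Hypothetical ten-row DataFrame with the default integer index 0..9
doc = pd.DataFrame({"value": list(range(10))})

first_six = doc.head(6)          # first six ROWS, all columns
print(list(first_six.index))     # row labels 0 through 5
```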

sentinel value

a value that, when evaluated by the loop expression, causes the loop to terminate. In pandas, a sentinel value indicates a missing entry; it can be a data-specific convention or a more global one, such as marking a missing value with NaN

function that adds thing

addmeup

datetime64

NumPy date type; allows more flexibility and encodes dates compactly as 64-bit integers. Dates MUST be written in year-month-day order

docstring

A string literal placed as the first statement of a program (module) or function; it is used to automatically create help documentation. Written as """ text """
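
A short sketch of a docstring in use. The function name `add_me_up` is a hypothetical example.

```python
def add_me_up(a, b):
    """Return the sum of a and b."""
    return a + b

# The docstring is stored on the function; help(add_me_up) displays it.
doc_text = add_me_up.__doc__
print(doc_text)
```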

For pandas to work, data must be formatted as lists before being imported. T/F

False

Two ways to indicate missing data in a data set

mask and sentinel value

user defined function

a function that the user creates. In Python it is defined with def; a return statement sends a value back to the caller (without one, the function returns None)

Do all operators produce an int output?

no; for example, the result of standard division is always a float
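
To illustrate in Python 3, here is a small sketch comparing three arithmetic operators applied to ints:

```python
quotient = 6 / 3     # true division always returns a float
floored = 7 // 2     # floor division of two ints returns an int
remainder = 7 % 2    # modulo (remainder) of two ints returns an int

print(type(quotient).__name__, quotient)
print(type(floored).__name__, floored)
print(type(remainder).__name__, remainder)
```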

lists are

ordered, changeable (values can be added and removed), and allow duplicates. Items can be accessed by slicing, e.g. thislist[2:5] or thislist[:0]

timestamp

pandas replacement for Python's native datetime, built on NumPy's datetime64. Associated index structure: DatetimeIndex

Which function should be used for dot (scatter) plots of very large data sets?

plt.plot rather than plt.scatter

Identify the parts of the for loop below: for x in y: z

x is the iteration variable, y is the iterable, and z is the loop body (for example, a print statement)
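
A minimal sketch of these parts, using made-up values:

```python
y = [1, 2, 3]              # the iterable
seen = []
for x in y:                # x is the iteration variable
    seen.append(x * 10)    # loop body: runs once per item in y
print(seen)
```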

can functions have multiple arguments?

yes

while is closely related to the

for loop

mask

globally indicates which values are missing; can be a separate Boolean array or a bit reserved in the data representation

break statements

let you jump out of a loop early. Example: inside a while loop, print x, set x = x + 1, then if x > maxc: break
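
The loop described on this card might look like the sketch below. The starting value and the cutoff `maxc` are assumed for illustration.

```python
x = 0
maxc = 3   # hypothetical cutoff value

while True:
    print(x)          # prints 0, 1, 2, 3
    x = x + 1
    if x > maxc:
        break         # leave the loop immediately
```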

Which was built on top of the other: matplotlib or seaborn?

seaborn was built on top of matplotlib

time-shifts

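shift() shifts the data and tshift() shifts the index; both are specified in multiples of the frequency. Helpful with graphs! (tshift() has been removed from recent pandas versions; shift(freq=...) shifts the index instead.)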
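
A small sketch of the two kinds of shift on made-up daily data. It uses shift(freq=...) for the index shift, since tshift() is not available in recent pandas releases.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="D")
s = pd.Series([10, 20, 30, 40], index=idx)

shifted_data = s.shift(1)              # values move down one row; index unchanged
shifted_index = s.shift(1, freq="D")   # index moves forward one day; values unchanged

print(shifted_data)
print(shifted_index)
```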

NaN

special floating-point value; can be used in computations, but any arithmetic involving NaN produces NaN
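
A quick sketch of how NaN behaves in plain Python arithmetic:

```python
import math

nan = float("nan")
total = 1 + nan            # the computation runs, but the result is NaN

print(total)
print(nan == nan)          # False: NaN is not equal to anything, even itself
print(math.isnan(total))   # True: use isnan() to test for NaN
```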

time stamps vs intervals and periods vs deltas/durations

a time stamp is a particular moment in time; an interval or period is a span of time; a delta or duration is a length of time (e.g. 22 seconds)

Imagine we have a dataframe, df. What would be the purpose of df.loc[1]?

to look up and retrieve the row with index label 1 from df
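
A minimal sketch with a made-up DataFrame. With the default integer index, the label 1 refers to the second row.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, 80, 70]})

row = df.loc[1]   # the row whose index LABEL is 1, returned as a Series
print(row)
```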

purpose of %matplotlib inline

to make the plots show up inline

pandas allows multiple different data types (like objects and numbers) in a single table

True

anonymous function

a function with no name assigned to it. In Python it is written with lambda and consists of a single expression
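
In Python, anonymous functions are written with lambda. A brief sketch (the variable names are illustrative):

```python
square = lambda n: n * n          # an anonymous function bound to a variable

words = ["pear", "fig", "banana"]
# A lambda passed directly as an argument, never given a name
by_length = sorted(words, key=lambda w: len(w))

print(square(4))
print(by_length)
```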

strings

"hello"

Let's take just a small slice of this data... we don't need it all. Let's assume we want to create a dataset of home features such as TotalBedrooms, TotalBathrooms, YearBuilt and LastSalePrice. How would you write that code? Call that dataframe "HomeFeatures".

HomeFeatures = homes[['TotalBedrooms', 'TotalBathrooms', 'YearBuilt', 'LastSalePrice']]
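
To see the selection run end to end, here is a sketch on a tiny made-up `homes` DataFrame. The values and the extra `ZipCode` column are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the course's homes data
homes = pd.DataFrame({
    "TotalBedrooms": [3, 4],
    "TotalBathrooms": [2, 3],
    "YearBuilt": [1995, 2008],
    "LastSalePrice": [250000, 410000],
    "ZipCode": ["32601", "32605"],   # extra column the selection leaves out
})

# A list of column names inside the brackets selects just those columns
HomeFeatures = homes[["TotalBedrooms", "TotalBathrooms", "YearBuilt", "LastSalePrice"]]
print(HomeFeatures)
```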

Label the parts of this statement: for N in "string": print(N)

N is the iteration variable; "string" is the iterable and can be any character string

rolling()

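method of Series and DataFrame objects for rolling-window statistics; returns a view similar to what groupby returns, with aggregation operations such as mean() available on it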
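
A short sketch of a rolling mean on made-up numbers, assuming a window of three values:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Each result is the mean of the current value and the two before it;
# the first two positions have too few values and come out as NaN.
rolling_mean = s.rolling(window=3).mean()
print(rolling_mean)
```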

NumPy allows us to do more complicated math on lists and other data structures, and is used in most of the more advanced modules (e.g. pandas). T/F

True

Can Pandas join two dataframes together?

Yes

Do datasets need something in common (a key) to be joined?

Yes
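
One common way to join on a key is pd.merge. The sketch below uses two made-up DataFrames that share an `id` column.

```python
import pandas as pd

# Hypothetical tables sharing the key column "id"
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [88, 92, 75]})

# Default is an inner join: only ids present in both tables are kept
joined = pd.merge(left, right, on="id")
print(joined)
```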

can pandas work with complicated dates?

Yes

lists

['apple', 'banana']

pandas-datareader package

can import financial data from a number of sources

iteration variable

changes each time the loop executes and controls when the loop finishes

Iterable

contains a sequence of values that can be iterated over (lists, tuples, strings)

datetime and dateutil

datetime is built into Python; dateutil is not (it is a third-party module)

Timedelta

pandas type for time deltas or durations. Associated index structure: TimedeltaIndex

Use of None

first sentinel value in pandas; a Python object used for missing values in arrays. Can ONLY be used in arrays with data type object (aggregations such as sum or mean fail)
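
A minimal sketch of this limitation with a made-up NumPy array:

```python
import numpy as np

vals = np.array([1, None, 3])
print(vals.dtype)        # object: None forces the array to hold Python objects

try:
    vals.sum()           # adding an int and None is not defined
    failed = False
except TypeError:
    failed = True

print(failed)
```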

period

pandas type for a fixed-frequency interval. Associated index structure: PeriodIndex

How to import matplotlib

import matplotlib.pyplot as plt

How many types of data do 2D NumPy arrays take, and which type?

one; all elements of an array share a single data type (for example all integers, or all floats)
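
A NumPy array holds a single dtype, and that dtype need not be an integer type. A small sketch with made-up values:

```python
import numpy as np

ints = np.array([[1, 2], [3, 4]])        # all ints: an integer dtype
mixed = np.array([[1, 2.5], [3, 4]])     # one float: every element becomes float

print(ints.dtype, ints.shape)
print(mixed.dtype)
```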

Booleans for detecting null values

isnull() and notnull()
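
A brief sketch of both methods on a made-up Series. Note that pandas spells the second method notnull().

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, "hello", None])

null_mask = s.isnull()        # True where the value is missing
present = s[s.notnull()]      # Boolean mask used to keep non-missing values

print(null_mask.tolist())
print(present.tolist())
```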

Pandas' handling of missing data is constrained by

its reliance on the NumPy package, which has no built-in notion of NA values for non-floating-point data types

a%b

remainder of a/b

range()

returns a range object that can be iterated over, e.g. x = range(1, 3)
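
A quick sketch showing that the stop value is excluded:

```python
x = range(1, 3)       # start at 1, stop before 3
values = list(x)      # materialize the range to inspect it

print(values)         # [1, 2]
```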

pd.date_range()

returns a DatetimeIndex built from a start date, a number of periods, and a frequency (H = hour, D = day, etc.; codes can be combined, e.g. 2H30T)
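
A minimal sketch with an assumed start date, three periods, and a daily frequency:

```python
import pandas as pd

# Three consecutive days starting on an arbitrary example date
days = pd.date_range("2024-01-01", periods=3, freq="D")
print(days)
```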

Tuples

rows or records in a relational database; in Python, multiple items stored in a single variable, ordered and unchangeable, written with parentheses ( )
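
A short sketch of a Python tuple used as a record. The values are made up. Unlike a list, a tuple cannot be changed after creation.

```python
record = ("Ann", 30, "Gainesville")   # one record, written with parentheses

name, age, city = record              # unpack the items into variables

try:
    record[0] = "Bo"                  # tuples do not support item assignment
    changed = True
except TypeError:
    changed = False

print(name, age, city, changed)
```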

advantage of using seaborn over native matplotlib when working with pandas dataframe to visualize two columns

seaborn can use DataFrame columns directly, while matplotlib requires additional formatting

control statement

the part of a structure that determines whether the subsequent block of statements executes

Relative path definition

the path FROM where your code is (your current working directory) TO where your data is

replacing values in lists

thislist = ['apple', 'cherry', 'banana']; thislist[0:2] = ['melon', 'berry']; print(thislist) gives ['melon', 'berry', 'banana']
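
Slice assignment replaces exactly the items the slice covers, so the width of the slice matters. A sketch comparing two slices:

```python
thislist = ["apple", "cherry", "banana"]
thislist[0:2] = ["melon", "berry"]    # replaces 'apple' and 'cherry'
print(thislist)                       # ['melon', 'berry', 'banana']

otherlist = ["apple", "cherry", "banana"]
otherlist[0:1] = ["melon", "berry"]   # replaces only 'apple', so the list grows
print(otherlist)                      # ['melon', 'berry', 'cherry', 'banana']
```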

dropping or filling null values

dropna() removes null values; fillna() replaces them with a given value
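
The two methods do different things: dropna() removes missing entries, while fillna() replaces them. A minimal sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

dropped = s.dropna()     # the NaN entry is removed
filled = s.fillna(0)     # the NaN entry is replaced with 0

print(dropped.tolist())
print(filled.tolist())
```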

