BA 222 Midterm 1
Data
made up of variables and observations
categorical variables
major
[ ]
matrix
what describes normal distribution?
mean and SD
what is the symbol for sample mean?
x-bar (x̄); mu (μ) is the symbol for the population mean
mean
myArray = np.array([1, 2, 3, 4, 5]); np.mean(myArray)
percentile
myArray = np.array([1, 2, 3, 4, 5]); np.percentile(myArray, 75)
maximum
myArray = np.array([1, 2, 3, 4, 5]); print(myArray.max())
minimum
myArray = np.array([1, 2, 3, 4, 5]); print(myArray.min())
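The four cards above (mean, percentile, maximum, minimum) can be run together as one short script. This is a minimal sketch that assumes numpy is installed; the variable names are only illustrative.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

mean_value = np.mean(my_array)            # average of the values: 3.0
p75_value = np.percentile(my_array, 75)   # 75th percentile: 4.0
max_value = my_array.max()                # largest value: 5
min_value = my_array.min()                # smallest value: 1

print(mean_value, p75_value, max_value, min_value)
```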
how to define matrices
myMatrix = [[1,2,3], [4,5,6]]
Python Data Types: lists
mylist = [1, 2, 'tie my shoes', 3.0, [4.5]]
fundamental of type method: access a list
mylist = [1, 2, 'tie my shoes', 3.0, [4.5]]; mylist[2]
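A quick sketch of list access, to show that Python counts positions from 0, so index 2 is the third item. The extra lookups (negative index, nested list) are added here for illustration and are not part of the original card.

```python
my_list = [1, 2, 'tie my shoes', 3.0, [4.5]]

third_item = my_list[2]       # index 2 is the third item: 'tie my shoes'
last_item = my_list[-1]       # negative indexes count from the end: [4.5]
nested_value = my_list[4][0]  # first item of the inner list: 4.5

print(third_item, last_item, nested_value)
```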
what is the symbol for number of observations?
n
when is the t distribution almost exactly the normal distribution?
n > 30
what is the symbol for SD
sigma
the standard deviation
spread of the bell curve
t = "elephant"; type(t)
str
what is the built in function for strings?
str ( )
Python Data Types: string
"hello world" , "4"
fundamental of type method: concatenate strings
"hello" + "world" = "helloworld"
concat string and int
str(3) + 'hello' = '3hello' (convert the int with str() first; 3 + 'hello' raises a TypeError)
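The sketch below shows why joining a string and an int needs a conversion step: the + operator only concatenates when both sides are strings. The variable names are made up for the example.

```python
count = 3

# Convert the int to a string first, then concatenate
label = str(count) + 'hello'   # '3hello'

# Adding an int directly to a string is an error
try:
    count + 'hello'
    direct_add_worked = True
except TypeError:
    direct_add_worked = False

print(label, direct_add_worked)
```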
what does t >= 1.7 correspond to?
0.05 probability
what is the value of the area under the entire probability density function?
1
what is the height formula of the probability density function (for a uniform distribution between a and b)?
1 / (b-a)
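The height 1 / (b - a) is the flat density of a uniform distribution on the interval from a to b. A small sketch, with hypothetical endpoints a = 2 and b = 6, shows that this height makes the total area equal 1 and that the probability of a sub-range is height times width.

```python
a, b = 2.0, 6.0               # hypothetical interval endpoints

height = 1 / (b - a)          # density height: 0.25
total_area = height * (b - a) # rectangle area under the whole curve: 1.0

# Probability of landing between 3 and 4 is the area of that slice
prob_3_to_4 = height * (4.0 - 3.0)

print(height, total_area, prob_3_to_4)
```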
multiply string and int
3 * 'hello' = 'hellohellohello'
Python Data Types: integer
4, 42, -3
Python Data Types: float
4.0, -5.7, 33.77777
68-95-99.7 rule: what percentage of values lie within 1 SD of the mean?
68%
68-95-99.7 rule: what percentage of values lie within 2 SD of the mean?
95%
68-95-99.7 rule: what percentage of values lie within 3 SD of the mean?
99.7%
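The 68-95-99.7 percentages can be checked with the standard library. This sketch uses the fact that, for a normal distribution, the probability of lying within k SDs of the mean equals erf(k / sqrt(2)); the helper name prob_within is invented for the example.

```python
import math

def prob_within(k):
    """Probability that a normally distributed value lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

within_1 = prob_within(1)   # about 0.68
within_2 = prob_within(2)   # about 0.95
within_3 = prob_within(3)   # about 0.997

print(round(within_1, 4), round(within_2, 4), round(within_3, 4))
```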
how to define a function
def function_name(var): followed by an indented body
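A minimal example of defining and then calling a function. The function name square is hypothetical; the pattern is def, a name, parameters in parentheses, a colon, and an indented body.

```python
def square(var):
    """Return var multiplied by itself."""
    return var * var

# ( ) after the name calls the function
result = square(4)
print(result)
```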
how to make charts
1. import matplotlib.pyplot as plt 2. reference_name.plot.bar() 3. plt.show()
importing numpy
1. import numpy as np 2. a = np.array([1, 2, 3, 4, 5]) 3. print(a)
"1" + "1" = ?
'11' (a string, because + concatenates strings)
len ("pygmy hippo")
11 (includes space)
x = list(range(0, 30)); print(x)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
myList3 = [1, 2, 3, 4, 5]; myList3[4] = "test"; print(myList3)
[1, 2, 3, 4, 'test']
numerical variables
age
variable
an attribute or characteristic of the observation we record
observation
an instance of the data being collected
what will happen to the SD as the sample size increases ?
as sample size n increases, the SD will converge to the true SD
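A small simulation can illustrate this convergence. The sketch below assumes a hypothetical normal population with mean 50 and true SD 10, and uses a fixed random seed so the run is repeatable; the sample SD for the largest sample should land close to 10.

```python
import random
import statistics

random.seed(0)
TRUE_MEAN = 50.0   # hypothetical population mean
TRUE_SD = 10.0     # hypothetical population SD

sample_sds = {}
for n in (10, 100, 10_000):
    # Draw n observations and compute the sample SD
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    sample_sds[n] = statistics.stdev(sample)
    print(n, round(sample_sds[n], 2))
```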
( )
calling
find rows and columns
carsdata.shape
the mean
center of the bell curve
What is the t-statistic?
counts how many standard errors of the mean are between our observed mean and the hypothesis
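As a worked sketch, the t-statistic is the gap between the observed mean and the hypothesized mean, divided by the standard error (sample SD over the square root of n). The sample values and the hypothesized mean of 5.0 below are made up for illustration.

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]   # hypothetical observations
hypothesized_mean = 5.0                    # hypothetical value being tested

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)       # sample SD (divides by n - 1)
standard_error = sample_sd / math.sqrt(n)  # SE of the mean

# How many standard errors the observed mean is from the hypothesis
t_stat = (sample_mean - hypothesized_mean) / standard_error
print(round(t_stat, 2))   # roughly 2.45
```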
what does t >= 1.7 mean for the hypothesis?
less than 5% chance (reject)
the mean and count of distance by airline using groupby
flights[["AIRLINE","DISTANCE"]].groupby("AIRLINE").agg(['count', 'mean'])
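To see what this groupby call returns, here is a sketch on a tiny, made-up stand-in for the flights table (the real data is not reproduced here). It assumes pandas is installed.

```python
import pandas as pd

# Hypothetical miniature version of the flights table
flights = pd.DataFrame({
    "AIRLINE": ["AA", "AA", "DL", "DL", "DL"],
    "DISTANCE": [100, 300, 200, 400, 600],
})

# One row per airline, with the count and mean of DISTANCE
summary = flights[["AIRLINE", "DISTANCE"]].groupby("AIRLINE").agg(["count", "mean"])
print(summary)
```

The result has two-level column names, so a single cell is read with a pair such as ("DISTANCE", "mean").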
modulo operator (%)
gets remainder
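A short illustration of the remainder operator, written % in Python. The even-number check is a common use added here as an example.

```python
remainder = 17 % 5       # 17 = 3 * 5 + 2, so the remainder is 2
is_even = 10 % 2 == 0    # a remainder of 0 after dividing by 2 means even

print(remainder, is_even)
```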
what does the area under the curve between two points tell?
how likely the value is to fall in that range
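For a normal distribution, the area between two points can be computed as a difference of cumulative probabilities. This sketch uses statistics.NormalDist from the standard library (Python 3.8+) with a hypothetical mean of 100 and SD of 15.

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)   # hypothetical distribution

# Area under the curve between 85 and 115 (one SD either side of the mean)
prob_between = scores.cdf(115) - scores.cdf(85)
print(round(prob_between, 4))           # about 0.68
```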
what is the z-score
how many SD away from the mean is the value X
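In formula form, the z-score is z = (X - mean) / SD. A minimal sketch with made-up numbers:

```python
mean = 70.0   # hypothetical mean
sd = 5.0      # hypothetical standard deviation
x = 80.0      # the value of interest

z_score = (x - mean) / sd   # 80 is two SDs above the mean
print(z_score)
```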
what does t-distribution tell us?
how many SE away from the hypothetical mean is the observed mean
how to import
import library_name as reference_variable; reference_variable.function_name()
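A concrete instance of the import pattern, using the standard math library with the alias m (the alias is an arbitrary choice for this example).

```python
import math as m       # library_name = math, reference variable = m

root = m.sqrt(16)      # call a function through the alias
print(root)
```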
what is the imported function for plotting with pandas?
import_name.bar(x, y), for example plt.bar(x, y)
how is data presented?
in a matrix
how are variables presented?
in columns
how are observations presented?
in rows
how to upper case an output
print (x.upper())
what can a distribution be described by?
probability density function
randint(0, 100)
random integer from 0-100
access a column by name
reference_name["column_name"], for example flights["DAY"]
columns attribute that lists column names
reference_name.columns
how to find names of the columns in a table
reference_name.columns
head method to show what top of data looks like
reference_name.head( )
reading a csv file
reference_name.read_csv("name_of_file.csv"), for example pd.read_csv("name_of_file.csv")
shape attribute function
reference_name.shape
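The pandas cards above (read a CSV, head, columns, shape, column access) fit together as in the sketch below. It assumes pandas is installed and reads a made-up CSV from an in-memory string so it runs without a file on disk; with a real file, a path string would be passed to read_csv instead.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a real file
csv_text = "AIRLINE,DAY,DISTANCE\nAA,1,100\nDL,2,200\nAA,3,300\n"

flights = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = flights.shape          # (rows, columns)
column_names = list(flights.columns)    # names of the variables
day_column = flights["DAY"]             # one column, selected by name

print(flights.head())                   # top of the data
print(n_rows, n_cols, column_names)
```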
sorting a list from lowest to highest
reference_variable.sort ( )
standard error of the mean
the SD of the mean value for different samples
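This definition can be checked by simulation: draw many samples, take the mean of each, and measure the SD of those means. The sketch assumes a hypothetical normal population (mean 50, SD 10) and samples of size 25, and compares the result with the usual formula SD / sqrt(n).

```python
import math
import random
import statistics

random.seed(1)
POP_MEAN, POP_SD = 50.0, 10.0   # hypothetical population
SAMPLE_SIZE = 25
NUM_SAMPLES = 2000

# Mean of each of many independent samples
sample_means = [
    statistics.mean([random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

observed_se = statistics.stdev(sample_means)    # SD of the sample means
formula_se = POP_SD / math.sqrt(SAMPLE_SIZE)    # 10 / 5 = 2.0

print(round(observed_se, 2), formula_se)
```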
what is standard deviation?
a measure of how spread out the values are around the mean; the sample SD is the estimate calculated from your sample
correlation
the measure of strength of the relationship between 2 variables (scaled from -1 to 1)
distribution
the range of values the data can take on, and how often each value occurs
how to find variance?
the square of the SD
true / false --> integer plus float is a float
true
true / false --> the sum of floats is a float
true
fundamental of type method: check a type
type (3.0)
what is used to see if two values are not equal and return true or false?
use !=, for example x = 3; y = 2; z = 2; print(x != y) --> True; print(y != z) --> False
what happens in a positive correlation?
when X is bigger, Y is on average bigger
what happens in a negative correlation?
when X is bigger, Y is on average smaller
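A sketch of positive and negative correlation using np.corrcoef, which returns a correlation matrix; the value at position [0, 1] is the correlation between the two inputs. The data points are made up, and numpy is assumed to be installed.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y_up = np.array([2, 3, 5, 4, 6])     # tends to rise as x rises
y_down = np.array([6, 4, 5, 3, 2])   # tends to fall as x rises

r_up = np.corrcoef(x, y_up)[0, 1]       # positive, between 0 and 1
r_down = np.corrcoef(x, y_down)[0, 1]   # negative, between -1 and 0

print(round(r_up, 2), round(r_down, 2))
```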
the central limit theorem
for a population with mean mu and SD sigma, if you take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approx. normally distributed
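A simulation sketch of the central limit theorem, assuming a hypothetical population made of the six faces of a fair die, which is flat rather than bell-shaped. Samples of 30 are drawn with replacement, and the sample means cluster around 3.5 with roughly 68% of them within one SD of the center, as a normal distribution would predict.

```python
import random
import statistics

random.seed(42)
population = [1, 2, 3, 4, 5, 6]   # hypothetical population: faces of a fair die
SAMPLE_SIZE = 30
NUM_SAMPLES = 2000

# random.choices samples with replacement
sample_means = [
    statistics.mean(random.choices(population, k=SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

center = statistics.mean(sample_means)    # close to the population mean, 3.5
spread = statistics.stdev(sample_means)   # SD of the sample means

# Share of sample means within one SD of the center
share_within_1_sd = sum(abs(m - center) <= spread for m in sample_means) / NUM_SAMPLES

print(round(center, 2), round(spread, 2), round(share_within_1_sd, 2))
```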
how to see if two values are equal and returns true or false
use ==, for example x = 3; y = 2; z = 2; print(x == y) --> False; print(y == z) --> True
mean
y = np.array([1, 2, 3, 4, 6]); np.mean(y)
median
y = np.array([1, 2, 3, 4, 6]); np.median(y)
SD
y = np.array([1, 2, 3, 4, 6]); np.std(y)
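One detail worth knowing when running these: np.std divides by n by default (the population SD). Passing ddof=1 divides by n - 1 instead, which gives the sample SD. A sketch with the same array, assuming numpy is installed:

```python
import numpy as np

y = np.array([1, 2, 3, 4, 6])

mean_y = np.mean(y)             # 3.2
median_y = np.median(y)         # 3.0
pop_sd = np.std(y)              # divides by n (default, ddof=0)
sample_sd = np.std(y, ddof=1)   # divides by n - 1

print(mean_y, median_y, round(pop_sd, 3), round(sample_sd, 3))
```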