BA 222 Midterm 1


Data

made up of variables and observations

categorical variables

e.g., major

[ ] (square brackets)

used to define lists; a list of lists forms a matrix

what parameters describe a normal distribution?

the mean and the SD

what is the symbol for sample mean?

x̄ (x-bar); μ (mu) is the symbol for the population mean

mean

myArray = np.array([1, 2, 3, 4, 5])
np.mean(myArray)

percentile

myArray = np.array([1, 2, 3, 4, 5])
np.percentile(myArray, 75)

maximum

myArray = np.array([1, 2, 3, 4, 5])
print(myArray.max())

minimum

myArray = np.array([1, 2, 3, 4, 5])
print(myArray.min())
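A minimal sketch putting the mean, percentile, maximum, and minimum cards together on the same array, assuming NumPy is installed. The dictionary name `summary` is illustrative only. `np.percentile` interpolates linearly by default, so the 75th percentile of [1, 2, 3, 4, 5] lands exactly on 4.

```python
import numpy as np

# Same five-value array used in the cards above
myArray = np.array([1, 2, 3, 4, 5])

summary = {
    "mean": np.mean(myArray),           # (1 + 2 + 3 + 4 + 5) / 5 = 3.0
    "p75": np.percentile(myArray, 75),  # default linear interpolation gives 4.0
    "max": myArray.max(),               # largest value: 5
    "min": myArray.min(),               # smallest value: 1
}
print(summary)
```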

how to define matrices (as nested lists)

myMatrix = [[1,2,3], [4,5,6]]

Python Data Types: lists

mylist = [1, 2, 'tie my shoes', 3.0, [4.5]]

fundamental of type method: access a list

mylist = [1, 2, 'tie my shoes', 3.0, [4.5]]
mylist[2]  # 'tie my shoes' (indexing starts at 0)
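List indexes start at 0, which is why index 2 gives the third item. A short sketch of a few more ways to access the same list, plus the nested-list matrix from the earlier card; the names `third`, `last`, `inner`, `first_two`, and `six` are illustrative.

```python
mylist = [1, 2, 'tie my shoes', 3.0, [4.5]]

third = mylist[2]        # indexing starts at 0, so this is 'tie my shoes'
last = mylist[-1]        # negative indexes count from the end: [4.5]
inner = mylist[4][0]     # index into the nested list: 4.5
first_two = mylist[0:2]  # a slice stops before index 2: [1, 2]

myMatrix = [[1, 2, 3], [4, 5, 6]]
six = myMatrix[1][2]     # row 1, then column 2 of the nested-list "matrix": 6
```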

what is the symbol for number of observations?

n

when is the t distribution almost exactly the normal distribution?

n > 30

what is the symbol for SD?

σ (sigma) for the population SD; s for the sample SD

the standard deviation

spread of the bell curve

t = "elephant"
type(t)

str

what built-in function converts a value to a string?

str()

Python Data Types: string

"hello world" , "4"

fundamental of type method: concatenate strings

"hello" + "world" = "helloworld"

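concatenate a string and an int

'3' + 'hello' = '3hello' only works because '3' is a string; 3 + 'hello' raises a TypeError, so convert first: str(3) + 'hello' = '3hello'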

what does t >= 1.7 correspond to?

roughly a 0.05 (5%) probability in one tail (the exact cutoff depends on the degrees of freedom)

what is the value of the area under the entire probability density function?

1

what is the height of the probability density function of a uniform distribution on [a, b]?

1 / (b - a)
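For a uniform distribution on [a, b] the density is a flat line of height 1 / (b - a), so the total area (height times width) is always 1. A minimal sketch; the function name `uniform_pdf` and the interval [2, 6] are hypothetical.

```python
def uniform_pdf(x, a, b):
    """Height of the uniform density on [a, b]: 1 / (b - a) inside, 0 outside."""
    if a <= x <= b:
        return 1 / (b - a)
    return 0.0

a, b = 2.0, 6.0
height = uniform_pdf(3.0, a, b)  # 1 / (6 - 2) = 0.25
area = height * (b - a)          # height * width = 1.0
```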

multiply a string by an int

3 * 'hello' = 'hellohellohello'
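A short sketch of the string cards: + joins two strings, * repeats a string, and mixing an int with a string in + raises a TypeError until the int is converted with str(). The variable names are illustrative.

```python
joined = "hello" + "world"    # 'helloworld'
repeated = 3 * 'hello'        # 'hellohellohello'
digits = "1" + "1"            # '11': both operands are strings, so + concatenates

try:
    3 + 'hello'               # int + str is not allowed
except TypeError:
    fixed = str(3) + 'hello'  # convert the int first: '3hello'
```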

Python Data Types: integer

4, 42, -3

Python Data Types: float

4.0, -5.7, 33.77777

68-95-99.7 rule: what percentage of values lie within 1 SD of the mean?

68%

68-95-99.7 rule: what percentage of values lie within 2 SD of the mean?

95%

68-95-99.7 rule: what percentage of values lie within 3 SD of the mean?

99.7%
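The 68-95-99.7 rule can be checked with the standard library's `statistics.NormalDist`: the area within k SDs of the mean is cdf(k) - cdf(-k) for a standard normal. A minimal sketch; the names `std_normal` and `within` are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the normal curve between -k and +k standard deviations
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} SD: {share:.1%}")
```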

how to define a function

def function_name(var):
    # the function body is indented under the def line
    return var

how to make charts

1. import matplotlib.pyplot as plt
2. reference_name.plot.bar()  (reference_name is a pandas DataFrame)
3. plt.show()

importing numpy

1. import numpy as np
2. a = np.array([1, 2, 3, 4, 5])
3. print(a)

"1" + "1" = ?

'11' (a string, because + concatenates strings)

len ("pygmy hippo")

11 (includes space)

x = list(range(0, 30))
print(x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]

myList3 = [1, 2, 3, 4, 5]
myList3[4] = "test"
print(myList3)

[1, 2, 3, 4, 'test']

numerical variables

e.g., age

variable

an attribute or characteristic of the observation we record

observation

an instance of the data being collected

what will happen to the SD as the sample size increases?

as the sample size n increases, the sample SD will converge to the true (population) SD

( ) after a name

calling (runs the function or method, e.g., print( ))

find the number of rows and columns

carsdata.shape (returns a tuple: (rows, columns))

the mean

center of the bell curve

What is the t-statistic?

it counts how many standard errors of the mean lie between our observed mean and the hypothesized mean
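The t-statistic divides the gap between the observed mean and the hypothesized mean by the standard error of the mean, s / √n, where s is the sample SD. A minimal sketch using only the standard library; the function name `t_statistic`, the sample values, and the hypothesized mean of 6 are all hypothetical.

```python
import math
import statistics

def t_statistic(sample, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)  # stdev divides by n - 1
    return (statistics.mean(sample) - hypothesized_mean) / standard_error

sample = [5, 7, 8, 6, 9]   # hypothetical data with mean 7
t = t_statistic(sample, 6)  # the observed mean is about 1.41 standard errors above 6
print(round(t, 3))
```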

what does a t-statistic of 1.7 or more mean for the hypothesis?

less than a 5% chance of a t this large if the hypothesis were true (reject)

the mean and count of distance by airline using groupby

flights[["AIRLINE","DISTANCE"]].groupby("AIRLINE").agg(['count', 'mean'])
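A minimal sketch of the groupby card on a tiny hypothetical `flights` table (the airline codes and distances are made up), assuming pandas is installed. Passing a list to `.agg` produces one column per statistic under a two-level column header, so a single value is read with `("DISTANCE", "mean")`.

```python
import pandas as pd

# Hypothetical mini version of the flights table
flights = pd.DataFrame({
    "AIRLINE": ["AA", "AA", "DL", "DL", "DL"],
    "DISTANCE": [100, 300, 200, 400, 600],
})

# One row per airline, with count and mean of DISTANCE
by_airline = flights[["AIRLINE", "DISTANCE"]].groupby("AIRLINE").agg(['count', 'mean'])
print(by_airline)
print(by_airline.loc["DL", ("DISTANCE", "mean")])  # mean distance for DL
```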

modulo operator (%)

gets the remainder of a division, e.g., 7 % 3 = 1
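A common use of the remainder is checking divisibility: a number is even when dividing by 2 leaves a remainder of 0. A short sketch; the helper name `is_even` is illustrative.

```python
def is_even(n):
    """% gives the remainder; a remainder of 0 after dividing by 2 means n is even."""
    return n % 2 == 0

remainder = 7 % 3  # 7 = 2 * 3 + 1, so the remainder is 1
evens = [n for n in range(10) if is_even(n)]
```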

what does the area under the curve between two points tell us?

the probability that the value falls in that range

what is the z-score?

how many SDs the value X is away from the mean: z = (X - mean) / SD
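A z-score subtracts the mean and divides by the SD, so positive values sit above the mean and negative values below it. A minimal sketch; the function name `z_score` and the example mean of 100 and SD of 15 are hypothetical.

```python
def z_score(x, mean, sd):
    """How many SDs x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

above = z_score(130, 100, 15)  # (130 - 100) / 15 = 2.0, two SDs above
below = z_score(85, 100, 15)   # (85 - 100) / 15 = -1.0, one SD below
```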

what does the t-distribution tell us?

how many SEs away from the hypothesized mean the observed mean is

how to import

import library_name as reference_variable
reference_variable.function_name()

what is the imported function for plotting a bar chart?

reference_name.bar(x, y), e.g., plt.bar(x, y) after import matplotlib.pyplot as plt

how is data presented?

in a matrix

how are variables presented?

in columns

how are observations presented?

in rows

how to convert a string output to upper case

print(x.upper())

how can a distribution be described?

by a probability density function

random.randint(0, 100)

random integer from 0 to 100, both ends included (after import random)
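A quick sketch showing that the standard library's `random.randint` includes both endpoints. It uses a seeded `random.Random` instance (seed 1 is arbitrary) so the draws are reproducible; NumPy's `np.random.randint(0, 100)` behaves differently and never returns 100.

```python
import random

rng = random.Random(1)  # seeded so the draws are reproducible
draws = [rng.randint(0, 100) for _ in range(10_000)]

# Every draw lies in 0..100, and with this many draws both ends show up
print(min(draws), max(draws))
```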

access a column by name

reference_name["column_name"], e.g., flights["DAY"]

columns attribute that lists column names

reference_name.columns

how to find the column names of a table

reference_name.columns

head method to show the top rows of the data

reference_name.head() (first 5 rows by default)

reading a csv file

import pandas as pd
reference_name = pd.read_csv("name_of_file.csv")

shape attribute

reference_name.shape (returns (rows, columns))
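A minimal sketch tying together the DataFrame cards (reading a CSV, column names, head, shape, and column access), assuming pandas is installed. The CSV text is hypothetical and is read from memory with `io.StringIO` so the example runs without a file; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file such as "flights.csv"
csv_text = """AIRLINE,DAY,DISTANCE
AA,1,100
DL,1,200
DL,2,400
"""

flights = pd.read_csv(io.StringIO(csv_text))
print(flights.shape)          # (rows, columns)
print(list(flights.columns))  # column names
print(flights.head())         # first rows (up to 5 by default)
days = flights["DAY"]         # one column, accessed by name
```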

sorting a list from lowest to highest

reference_variable.sort() (sorts the list in place)
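`list.sort()` changes the list itself and returns None, while the built-in `sorted()` leaves the original alone and returns a new list; both accept `reverse=True` for highest to lowest. The list values below are illustrative.

```python
numbers = [3, 1, 2]
result = numbers.sort()  # sorts in place; the return value is None

descending = sorted(numbers, reverse=True)  # new list, highest to lowest
```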

standard error of the mean

the SD of the sample mean across different samples; SE = SD / √n

what is standard deviation?

a measure of how spread out the values are around the mean (the square root of the variance)

correlation

the measure of strength of the relationship between 2 variables (scaled from -1 to 1)

distribution

the range of values the data can take on and how often each value occurs

how to find variance?

the square of the SD
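A minimal sketch checking that the variance is the square of the SD, using the same array as the mean/median/SD cards and assuming NumPy. By default `np.std` and `np.var` divide by n (population formulas); `ddof=1` divides by n - 1 for the sample versions.

```python
import numpy as np

y = np.array([1, 2, 3, 4, 6])

sd = np.std(y)                       # population SD (ddof=0 by default)
variance = np.var(y)                 # population variance
sample_variance = np.var(y, ddof=1)  # divides by n - 1 instead of n

print(sd ** 2, variance, sample_variance)
```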

true / false --> integer plus float is a float

true

true / false --> the sum of floats is a float

true

fundamental of type method: check a type

type(3.0) returns float

what is used to see if two values are not equal and return True or False?

use !=
x = 3
y = 2
z = 2
print(x != y)  # True
print(y != z)  # False

what happens in a positive correlation?

when X is bigger, Y is on average bigger

what happens in a negative correlation?

when X is bigger, Y is on average smaller
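A minimal sketch of the correlation cards using `np.corrcoef`, which returns a 2x2 matrix whose off-diagonal entry is the correlation between the two inputs. The data are made up: one series rises exactly with x (r = 1), one falls exactly (r = -1), and one rises only on average.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y_up = 2 * x + 1                   # bigger x, bigger y: perfect positive correlation
y_down = 10 - x                    # bigger x, smaller y: perfect negative correlation
y_noisy = np.array([2, 1, 4, 3, 5])  # rises with x on average, not exactly

r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]
r_noisy = np.corrcoef(x, y_noisy)[0, 1]
print(r_up, r_down, r_noisy)
```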

the central limit theorem

if you take sufficiently large random samples (with replacement) from a population with mean μ and SD σ, the distribution of the sample means will be approximately normal, with mean μ and SD σ / √n
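A small simulation sketch of the central limit theorem using only the standard library. The population (the integers 0 to 9, which is flat rather than bell-shaped), the sample size of 50, the 2,000 repetitions, and the seed are all arbitrary choices for illustration. The sample means should center on the population mean, and their SD should be close to σ / √n.

```python
import random
import statistics

rng = random.Random(42)      # fixed seed so the run is reproducible
population = list(range(10))  # flat, clearly non-normal population: 0..9
n = 50                        # size of each sample

# Draw many samples with replacement and keep each sample's mean
sample_means = [statistics.mean(rng.choices(population, k=n)) for _ in range(2000)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_se = statistics.pstdev(population) / n ** 0.5  # sigma / sqrt(n)
print(mean_of_means, sd_of_means, predicted_se)
```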

how to check whether two values are equal (returns True or False)

x = 3
y = 2
z = 2
print(x == y)  # False
print(y == z)  # True

mean

y = np.array([1, 2, 3, 4, 6])
np.mean(y)

median

y = np.array([1, 2, 3, 4, 6])
np.median(y)

SD

y = np.array([1, 2, 3, 4, 6])
np.std(y)  # population SD (ddof=0); use np.std(y, ddof=1) for the sample SD

