PETE 404 Lecture 4 Questions
Interpreting Quartile Skew Coefficient
1) Qs = 0: symmetric 2) Qs < 0: left tail longer 3) Qs > 0: right tail longer
Properties of CDF
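1) bounded between 0 and 1 2) limits of 0 (as x goes to -infinity) and 1 (as x goes to +infinity) 3) non-decreasing 4) right-continuous 5) interval probabilities: P(a < X <= b) = F(b) - F(a)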
Important characteristics for every set of measurements
1) central/typical value 2) spread about central value 3) symmetry
What are the 3 measures of shape
1) coefficient of skewness 2) quartile skew coefficient 3) coefficient of kurtosis
Two types of random variables
1) discrete 2) continuous
Interpreting the coefficient of kurtosis relative to the Gaussian distribution
1) k = 3: same peakedness as a Gaussian distribution 2) k < 3: less peaked than Gaussian 3) k > 3: more peaked than Gaussian
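The comparison against 3 can be checked numerically. The following is a minimal sketch that assumes the common moment-based definition k = m4 / m2^2 (the cards do not state the formula); the function name and both data sets are hypothetical.

```python
def kurtosis_coefficient(data):
    """Moment-based coefficient of kurtosis, m4 / m2**2 (assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # evenly spread values
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]  # mass at the centre plus two extremes

print(kurtosis_coefficient(flat))    # below 3: less peaked than Gaussian
print(kurtosis_coefficient(peaked))  # above 3: more peaked than Gaussian
```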
Measures of location or central tendency
1) mean 2) median 3) mode
The _______ of a continuous distribution corresponds to the point at which ____ is attained
1) mode 2) maximum probability density
Three properties of a PDF
1) non-negative 2) unit area under f(x) 3) intervals: the probability of an interval is the area under f(x) over that interval
5 common measures of spread
1) range 2) variance 3) standard deviation 4) coefficient of variation 5) interquartile range
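As a rough illustration, the five measures of spread can be computed with Python's standard library. This sketch assumes the sample (n-1) variance and the default quartile method of `statistics.quantiles`; the function name and the data are hypothetical.

```python
import statistics


def spread_measures(data):
    """Return the five common measures of spread for a list of numbers."""
    q1, _median, q3 = statistics.quantiles(data, n=4)  # quartile boundaries
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1)
    return {
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "standard_deviation": sd,
        "coefficient_of_variation": sd / mean,  # dispersion normalized by the mean
        "interquartile_range": q3 - q1,
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
measures = spread_measures(sample)
for name, value in measures.items():
    print(name, value)
```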
4 Steps to generating a PDF
1) sort data in ascending order 2) divide the data range into convenient/reasonable intervals (bins) 3) count the number of points, fi, in each bin to calculate the probability of each bin, Pi = fi/n 4) plot bin mid-range versus Pi
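The four steps above can be sketched in a few lines. This is a minimal illustration that assumes equal-width bins and places the largest value in the last bin; the function name and the data are hypothetical, and the data must contain at least two distinct values.

```python
def empirical_pdf(data, n_bins):
    """Return (bin mid-points, bin probabilities P_i = f_i / n)."""
    values = sorted(data)                    # step 1: ascending order
    lo, hi = values[0], values[-1]
    dx = (hi - lo) / n_bins                  # step 2: equal-width bins (assumption)
    counts = [0] * n_bins
    for x in values:                         # step 3: count points f_i per bin
        i = min(int((x - lo) / dx), n_bins - 1)  # maximum value goes in the last bin
        counts[i] += 1
    n = len(values)
    probs = [f / n for f in counts]          # P_i = f_i / n
    mids = [lo + (i + 0.5) * dx for i in range(n_bins)]  # step 4: x-values to plot
    return mids, probs


mids, probs = empirical_pdf([1, 2, 2, 3, 3, 3, 4, 4, 5], n_bins=4)
for m, p in zip(mids, probs):
    print(m, round(p, 3))
```

Rerunning with a different `n_bins` changes the plotted shape, which is the bin-size dependence the later cards refer to.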
Results if dx is too small or too large
1) too small: PDF is too bumpy 2) too large: PDF has low resolution
A ______ defines the probability of finding a value of a random variable (r.v.), X, that is less than or equal to a specified value x
Cumulative distribution function
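For a data set, this definition translates directly into an empirical CDF: the fraction of observations less than or equal to x. A minimal sketch, with a hypothetical function name and data:

```python
from bisect import bisect_right


def empirical_cdf(data):
    """Return F, where F(x) is the fraction of data values <= x."""
    values = sorted(data)
    n = len(values)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(values, x) / n

    return F


F = empirical_cdf([3, 1, 2, 2])
print(F(0), F(2), F(3), F(10))
```

No bins are involved, so the result does not depend on a choice of bin size.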
Why is a CDF preferred over a PDF
the shape of a PDF built from data depends on the bin size; the CDF does not
Equation of probability for each bin
Pi = fi/n
Random Variable
a real-valued function that assigns a value to each outcome in a sample space
A CDF defines what about a random variable
all probabilistic properties; complete statistical characterization
What part of the PDF defines probability
area under the curve
Shape of PDF is a function of ______
bin size
In practice, continuous distributions are only generated by fitting data to ____ CDF functions
closed form
Measure of how peaked (or fat) a distribution is
coefficient of kurtosis
Describes the measure of asymmetry of the histogram
coefficient of skewness
What is a normalized measure of dispersion
coefficient of variation
Continuous CDF
continuous within limits of 0 and 1
Mild Outlier
deviates more than 1.5 times the IQR from the median
Extreme Outlier
deviates more than 3 times the IQR from the median
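The two thresholds can be applied mechanically. This sketch follows the wording of the cards and measures distance from the median; many texts instead measure the 1.5 and 3 IQR fences from the quartiles, so treat the choice as an assumption. The function name and data are hypothetical.

```python
import statistics


def classify_outliers(data):
    """Label each value 'none', 'mild' or 'extreme' by its distance from the median."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    labels = []
    for x in data:
        distance = abs(x - median)
        if distance > 3 * iqr:
            labels.append((x, "extreme"))
        elif distance > 1.5 * iqr:
            labels.append((x, "mild"))
        else:
            labels.append((x, "none"))
    return labels


readings = [10, 11, 12, 13, 14, 15, 16, 17, 18, 30, 60]  # hypothetical data
labels = dict(classify_outliers(readings))
print(labels[14], labels[30], labels[60])
```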
Discrete CDF
discontinuous at xi and constant in between
What is equation for determining bin intervals (dx)
dx = (xn-x1)/n
Rule of thumb for calculating bin intervals
dx = 5(xn-x1)/n
Negative Skewness
elongated tail to the left; mass of data on the right; mean < median
Positive Skewness
elongated tail to the right; mass of data on the left; mean > median
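The sign of the skewness and the mean/median ordering can be checked together. A minimal sketch, assuming the moment-based coefficient of skewness m3 / m2^1.5 (the cards do not give the formula); the function name and data are hypothetical.

```python
import statistics


def skewness_coefficient(data):
    """Moment-based coefficient of skewness, m3 / m2**1.5 (assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]      # long tail to the right
left_tailed = [-x for x in right_tailed]      # mirror image: long tail to the left

for sample in (right_tailed, left_tailed):
    print(skewness_coefficient(sample),
          statistics.mean(sample),
          statistics.median(sample))
```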
Histogram is a graphical representation of a ________
frequency table
Which type of curve do we typically use for risk analysis
inverse cumulative histogram
______ is sensitive to erratic values
mean
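A small, hypothetical example of that sensitivity: replacing one value with an erratic reading moves the mean substantially and leaves the median unchanged.

```python
import statistics

clean = [10, 11, 12, 13, 14]
erratic = [10, 11, 12, 13, 140]  # hypothetical data: one erratic reading

mean_shift = statistics.mean(erratic) - statistics.mean(clean)
median_shift = statistics.median(erratic) - statistics.median(clean)

print("mean shift:", mean_shift)      # the mean moves by about 25
print("median shift:", median_shift)  # the median does not move
```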
On a CDF, the ______ is the value on the x-axis that corresponds to P50 on the y-axis
median
_______ is often the class with the tallest bar on the histogram
mode
Advantage to using PDF over CDF
more intuitive and shows spread of probability more clearly
When working with order statistics, the number of boundaries is always _____ than the number of partitions
one less
Mode is sensitive to _______
only the value with the highest frequency
Variance is very sensitive to _____
outliers
We obtain _______ each time an experiment is performed
a realization of a random variable
Quartile Skew Coefficient
serves the same purpose as the coefficient of skewness but is sensitive only to the central part of the distribution
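A minimal sketch of the calculation, assuming the usual quartile form Qs = ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1), which the cards do not spell out. The function name and data are hypothetical. Only the three quartiles enter the formula, which is why values in the far tails have no influence.

```python
import statistics


def quartile_skew(data):
    """Quartile skew coefficient: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
right_skewed = [1, 2, 3, 4, 5, 6, 7, 8, 20, 40, 80]

print(quartile_skew(symmetric))     # 0 for symmetric data
print(quartile_skew(right_skewed))  # positive: right tail longer
```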
Range is a poor descriptor for _____ samples
small
Percentiles
split into hundredths
Quartiles
split into quarters
Deciles
split into tenths
Quantiles
splitting data into any fraction
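Quartiles, deciles and percentiles are the same operation with a different number of partitions. A brief sketch using `statistics.quantiles` from the Python standard library, on hypothetical data; it also shows that the number of boundaries is one less than the number of partitions.

```python
import statistics

data = list(range(1, 12))  # hypothetical sample: 1, 2, ..., 11

quartiles = statistics.quantiles(data, n=4)      # 3 boundaries, 4 partitions
deciles = statistics.quantiles(data, n=10)       # 9 boundaries, 10 partitions
percentiles = statistics.quantiles(data, n=100)  # 99 boundaries, 100 partitions

p50 = percentiles[49]  # the 50th percentile is the median
print(len(quartiles), len(deciles), len(percentiles), p50)
```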
What is the average squared difference of data about the mean
variance
Low Kurtosis
variance mainly due to frequent modest-sized deviations
High Kurtosis
variance mainly due to infrequent extreme deviations
When should the variance be computed by dividing by n-1 rather than n
when the sample size is small (n < 30)
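The difference between the two divisors is easy to see on a small sample. A minimal sketch with a hypothetical function name and data:

```python
def variance(data, sample=True):
    """Average squared deviation about the mean, dividing by n - 1 or by n."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    divisor = n - 1 if sample else n
    return squared_deviations / divisor


small_sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, n = 8

print(variance(small_sample, sample=False))  # divide by n
print(variance(small_sample, sample=True))   # divide by n - 1: slightly larger
```

The ratio between the two results is n / (n - 1), so the gap shrinks as the sample size grows.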