STT 201 exam 1
Histograms
*1. Histograms.* Creating a histogram: 1) Draw axes (x - for classes, y - for percentages (or counts)). 2) Divide data into classes. 3) Make a bar for each class, such that the height of the bar represents the percentage (or the count) for this class. In a histogram bars share sides. (!) Rules for dividing data into classes: - The decision about the size of a class is based on the data: don't make the classes too narrow or too wide - The size of all classes is the same - There is no overlapping between classes - All data values are assigned to a class - A standard rule for a value that falls exactly on a class boundary is to count it in the class that begins with that value. Once all data is grouped into classes, we obtain its frequency distribution and are ready to make a histogram. (Relative frequencies or percentages can also be used.) *EXAMPLE:* The amounts of money spent by 194 customers at a large department store during a one-day sale were recorded and then displayed by the following histogram. *CONCLUSION:* when making a histogram make sure all bars have the same width and share sides. Include labels for both axes (the label for the x-axis must provide measurement units), all classes, and a title.
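The boundary rule above (a value on a class boundary goes to the class that begins with that value) can be sketched in Python. This is a minimal illustration and not part of the course notes: the function name `frequency_distribution`, its parameters, and the sample amounts are all hypothetical.

```python
def frequency_distribution(values, start, width, n_classes):
    """Count values in equal-width classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        # Floor division puts a boundary value into the class that begins with it.
        k = int((v - start) // width)
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} is not covered by any class")
        counts[k] += 1
    return counts


# Hypothetical amounts spent (in dollars); 10, 20 and 30 sit exactly on boundaries.
amounts = [5, 10, 12, 19, 20, 27, 30, 39]
counts = frequency_distribution(amounts, start=0, width=10, n_classes=4)
for k, c in enumerate(counts):
    print(f"{k * 10}-{(k + 1) * 10} dollars: {c}")
```

Because every class has the same width and the classes do not overlap, each value lands in exactly one class, and the counts add up to the size of the data set.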
Bar Graphs
*2. Bar Graphs.* 1) Draw axes (x - for categories, y - for counts/percentages). 2) Make a bar for each category, such that the height of the bar corresponds to the percentage (or the count) for this category. Widths of all bars must be the same. Bars do not share sides. *EXAMPLE:* The following bar graph displays the results of an investigation of 820 recent non-military plane crashes (www.planecrashinfo.com). *CONCLUSION:* when making a bar graph make sure all bars have the same width, and include labels for each axis and each bar, and a title.
3. A Least Squares Regression Line = a Line of Best Fit
*3. A Least Squares Regression Line = a Line of Best Fit:* - Not only can we not draw a line through all data points; even the best line might not hit any data points. *Goal:* to find the line that comes closer to all data points than any other line. - When we draw a line through the scatterplot, some residuals are positive and some are negative. Therefore, we cannot assess how well the line fits the data just by adding them up: the positive and negative ones will cancel each other out. *Q:* What do we do to avoid this? - Square the residuals: squaring makes them all non-negative, so they can be added up. - When all squared residuals are added together, the sum indicates how well the line fits the data: *smaller sum = better fit.* This sum is called the sum of squared errors (SSE). - A *least squares regression line (a least squares line)* is the line for which the SSE is the smallest.
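To make the "smaller sum = better fit" idea concrete, here is a small Python sketch that computes the SSE for two candidate lines. The data points, the two candidate lines, and the names `sse`, `b0` (intercept) and `b1` (slope) are assumptions made for illustration only.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared errors for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data set of four (x, y) points.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

sse_a = sse(xs, ys, b0=0.0, b1=2.0)  # candidate line A: y_hat = 2x
sse_b = sse(xs, ys, b0=1.0, b1=1.0)  # candidate line B: y_hat = 1 + x

print("SSE for line A:", sse_a)
print("SSE for line B:", sse_b)
print("Better fit:", "A" if sse_a < sse_b else "B")
```

Line A has the smaller SSE, so of these two candidates it fits better. The least squares line is the one that beats every possible candidate, not just these two.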
Box Plots
*Box Plots:* A five-number summary of a data set consists of the smallest value, the lower quartile, the median, the upper quartile, and the largest value: Min, Q1, M, Q3, Max. - Once we have the five-number summary, we can make a box plot of these data. *Creating a box plot:* 1) *Axis:* draw a single horizontal axis that covers the range from the minimum to the maximum of the data set. 2) *Creating a Box:* - Draw vertical lines at the values of the lower quartile, the upper quartile, and the median - Connect the lines corresponding to the quartiles with horizontal lines to form a box (the box can have any height that looks OK). 3) *Fences:* create the fences around the main part of the data. Note that the fences are used only for the construction; they are not a part of the actual box plot. During the construction they are shown with dotted lines for illustration. - An inner lower fence (ILF) is placed 1.5 IQRs below the lower quartile - An inner upper fence (IUF) is placed 1.5 IQRs above the upper quartile. 4) *Whiskers:* draw lines from the sides of the box to the left and to the right, out to the most extreme data values located within or on the fences. 5) *Outliers:* the outliers are any data values that fall beyond the inner fences. They are usually displayed with *
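The fence, whisker, and outlier steps can be sketched as follows. This is only an illustration under stated assumptions: the data set is hypothetical, the quartiles are passed in as already known, and the name `box_plot_parts` is made up for this example.

```python
def box_plot_parts(data, q1, q3):
    """Return fences, whisker ends, and outliers for given quartiles."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr  # inner lower fence (ILF)
    upper_fence = q3 + 1.5 * iqr  # inner upper fence (IUF)
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        # Whiskers reach the most extreme values within or on the fences.
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical sorted data with Q1 = 5, median = 10, Q3 = 18.
data = [2, 4, 5, 6, 9, 10, 11, 13, 18, 25, 60]
parts = box_plot_parts(data, q1=5, q3=18)
print(parts)
```

Here the IQR is 13, so the fences sit at -14.5 and 37.5. The value 60 falls beyond the upper fence and is drawn as an outlier, while the right whisker stops at 25.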
Changing Measurement Units with Linear Transformation
*Changing Measurement Units with Linear Transformation.* In some cases, the same quantitative variable can be recorded with different measurement units. EXAMPLE: weight, height, distance, temperature, currency, etc. Fortunately, the conversion between measurement units is easy. A *linear transformation* is used for this purpose. It changes the original variable x into the new variable $x_{\text{new}}$ given by an equation of the form: $x_{\text{new}} = a + bx$. - Multiplying the value x by a positive constant b changes only the size of the measurement unit. Adding a constant a to the result shifts it upward or downward by |a|. Such a shift also changes the origin (zero point) of the variable. *EXAMPLE 1:* - Changing measurement units for a distance from kilometers (x) to miles. Since 1 kilometer ≈ 0.62 miles, the transformation is: $x_{\text{new}} = 0.62x$. Therefore, a 25-kilometer drive covers 15.5 miles. This linear transformation changes the measurement units without changing the origin - a distance of 0 kilometers is the same as a distance of 0 miles.
Comparing Data Distribution
*Comparing Data Distribution:* The box plots are very useful for comparison of the distributions of multiple data sets consisting of observations on the same variable. EXAMPLE: A study of the efficiency of various containers for hot beverages had been conducted. Containers of three different brands: Thermos, Oxo, and Arctic were selected and each of them was tested 15 times. Each time the water was heated to 180°F, poured into each container, and then all containers were sealed. After 30 minutes, the temperatures were measured and the differences in temperatures for each container were recorded. (lecture 4 page 9)
*Data context example:*
*Data Context example:* *EXAMPLE:* Last year a group of researchers from Ohio State University conducted a study in Canada. The aim of this study was to find out if children who are breastfed are less susceptible to allergies. Mothers of babies born in 2016 had been asked about their babies' age, gender, height, weight, breastfeeding status and number of allergy episodes during their 1st year of life. *1) WHO:* Babies. *2) WHAT:* Age, gender, height, weight, breastfeeding status and number of allergy episodes during baby's 1st year of life. *3) WHY:* The purpose of this study was to find out if children who are breastfed are less susceptible to allergies. *4) WHERE:* In Canada. *5) WHEN:* 2016-2017. *6) HOW:* Survey of mothers.
Data context
*Data Context:* context provides... *1) WHO:* subjects (people, objects, etc.) which data describes. *2) WHAT:* all collected data. *3) WHY:* what was a purpose of data collection. *4) WHEN:* time frame for which data was collected. *5) WHERE:* where data was collected (geographic location). *6) HOW:* how data was collected.
Describing a Linear Pattern with a regression Line
*Describing a Linear Pattern with a Regression Line:* Scatterplots provide a lot of information about the relationship. However, often we want answers to specific numerical questions, such as: "How are the response and explanatory variables related?" *EXAMPLES:* - When examining weights and heights of a sample of people, we might want to know: *1)* What is the increase in average weight for each one-inch increase in height? *2)* We might want to estimate the average weight for people with a specific height. - A *regression analysis* is the area of Statistics that is used to examine the relationship between a quantitative response variable and one or more quantitative explanatory variables. - The main element of a regression analysis is the construction of a *regression equation* - an equation that describes the average relationship between the quantitative response and explanatory variables. The simplest kind of relationship between two variables is a straight-line relationship, called a *linear relationship.* - It frequently occurs in practice, so a straight line is a useful and important type of regression equation. - The term *simple linear regression* refers to the methods used to analyze straight-line relationships. (!) Before we use a straight-line regression model, we should always examine the scatterplot to verify that the pattern of the data is actually linear. A straight line that best describes the linear relationship between two quantitative variables is called a regression line. *A regression line is used for two purposes:* *1)* To predict an unknown value of y for a subject, given this subject's x value. *2)* To estimate the average value of y at any specified value of x.
Describing the Distribution of Quantitative Data part 2
*Describing the Distribution of Quantitative Data part 2:* b) SYMMETRY. A distribution is symmetric if both halves of it look similar. If a data distribution is not symmetric, we need to identify its skewness. Skewness is a measure of the degree of asymmetry of a data distribution. A right-skewed distribution stretches to the right more than to the left. A left-skewed distribution stretches to the left more than to the right. c) UNUSUAL FEATURES (gaps and outliers): A gap is a region of a data distribution without values. An outlier is an extreme value that does not appear to belong with the rest of the data. Unfortunately, there is no obvious way to tell if an outlier is a real data value or an error made while taking the measurement or entering the value into a computer. Reporting measurement units incorrectly may also cause an outlier. Outliers need special attention because they can have a big influence on the conclusions drawn from a data set and, if not treated properly, can lead to the wrong conclusions. (!) Outliers must never be discarded without justification. Instead, the analysis should be performed with and without them and then the results should be compared. (!) If relative frequencies (or percentages) are used instead of frequencies, the shape of a histogram does not change. EXAMPLE: Based on the histogram from the previous example answer the following questions.
Describing the Distribution of Quantitative Data
*Describing the Distribution of Quantitative Data.* - The *distribution* of quantitative data is an overall pattern of how often the possible values occur. To examine the distribution of quantitative data, we have to look at three things: *shape, location (centrality), and spread (variability).* *1. Shape.* *Q:* How do we decide on the shape of a data distribution? We need to identify the following things: - Modality - Symmetry / skewness - Unusual features: gaps and outliers. *a) MODALITY.* - A *mode* is a prominent peak of a distribution. Based on the number of modes, a data distribution can be: 1. Unimodal - a single mode 2. Bimodal - two modes 3. Multimodal - three or more modes 4. Uniform - a distribution that is roughly flat.
Descriptive Statistics
*Descriptive Statistics:* *Purpose:* describe information. *Involves:* collecting, classifying, organizing, displaying, describing, and presenting information.
Displaying Categorical Data
*Displaying Categorical Data:* - When we look at the collected data, the problem is that we are unable to see what is going on. We need some ways to arrange it, so the patterns, relationships, and trends can be seen right away. *Q:* So what should we do? --> Make a display of the data. It will help us to see things clearly, and it's also the best way to present our data to others. (!) It is very important for any data display to be self-explanatory. *Q:* How can we display categorical data? --> The most commonly used displays for categorical data are a pie chart and a bar graph. In order to make a display of categorical data we have to organize the data first. To organize categorical data, a frequency distribution or a relative frequency distribution can be used.
Displaying Quantitative Data --> Stem & Leaf Plots
*Displaying Quantitative Data --> Stem & Leaf Plots:* (!) Unlike a histogram, a stem-and-leaf plot preserves all individual data values. Creating a stem-and-leaf plot: 1) Separate each data value into a stem value consisting of all but the very last (rightmost) digit, and a leaf value (the very last digit). 2) Write the stem values as a vertical column with the smallest value on top, and draw a vertical line to the right of this column. 3) Write each leaf value in the row to the right of the corresponding stem value, in increasing order away from the stem. 4) Indicate the measurement units for the data values on the display. (!) If the same data value occurs multiple times, it must be accounted for each time.
Dot Plots
*Dot Plots:* - Just like a stem-and-leaf plot, a dot plot preserves all individual data values. *Creating a dot plot:* 1) Draw a number line (horizontal axis) that covers the range from the smallest to the largest data value. 2) For each data value, place a dot above the number line at the place corresponding to this data value. When multiple data values are the same, their dots are stacked vertically. *EXAMPLE:* A local bookstore has recorded the amounts of money (in whole $) spent by its customers during a weekend. The dot plot provided below displays these data.
example of relationship between mean and standard deviation
*EX of relationship between mean & Standard deviation:* The average amount of time required to fill orders at a certain fast food restaurant has been observed to be 200 seconds with a standard deviation of 30 seconds. It's known that the distribution of all order-fill times at this restaurant is unimodal and symmetric. Answer the following questions. *Q1:* What percentage of all order-fill times fall between 140 and 260 seconds? --> According to the Empirical Rule: approximately 95% (mean (200) ± 2 standard deviations (2·30)). *Q2:* Which time range includes approximately 99.7% of all order-fill times? --> According to the Empirical Rule: 110-290 seconds (mean (200) ± 3 standard deviations (3·30)).
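The two answers above follow the same arithmetic (mean ± k standard deviations), which can be sketched in Python. The function name `empirical_range` is hypothetical. The 68% figure for one standard deviation is the standard first part of the Empirical Rule and is not stated on this card.

```python
# Approximate coverage for unimodal, symmetric (bell-shaped) distributions.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_range(mean, sd, k):
    """Interval mean ± k standard deviations."""
    return (mean - k * sd, mean + k * sd)


mean, sd = 200, 30  # order-fill times in seconds, from the example
for k, percent in EMPIRICAL_RULE.items():
    low, high = empirical_range(mean, sd, k)
    print(f"about {percent}% of times fall between {low} and {high} seconds")
```

For k = 2 this gives 140-260 seconds and for k = 3 it gives 110-290 seconds, matching Q1 and Q2.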
Changing Measurements with Linear Transformation
*EXAMPLE 2:* A temperature (x) measured in degrees Fahrenheit must be re-expressed in degrees Celsius to be easily understood by most of the world. The transformation is: $x_{\text{new}} = \frac{5}{9}(x - 32)$. Therefore, a temperature of 95°F in the US translates into 35°C in most of the world. - This linear transformation changes both the measurement unit and the origin of the measurements. The origin (the temperature at which water freezes) is 0°C and 32°F.
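Both unit conversions in these notes (kilometers to miles, and Fahrenheit to Celsius) are instances of the same form, new value = a + b·x. A minimal Python sketch, with hypothetical function names, shows how the constants a and b are chosen in each case. Expanding (5/9)(x - 32) gives a = -160/9 and b = 5/9.

```python
def linear_transform(x, a, b):
    """Generic linear transformation: x_new = a + b * x."""
    return a + b * x


def km_to_miles(km):
    # a = 0: the origin does not move, only the unit size changes.
    return linear_transform(km, a=0.0, b=0.62)


def fahrenheit_to_celsius(f):
    # (5/9)(x - 32) = -160/9 + (5/9)x: both the unit and the origin change.
    return linear_transform(f, a=-160 / 9, b=5 / 9)


print(round(km_to_miles(25), 1))            # 25 km expressed in miles
print(round(fahrenheit_to_celsius(95), 1))  # 95°F expressed in °C
print(round(fahrenheit_to_celsius(32), 1))  # freezing point of water in °C
```

Results are rounded for display because floating-point arithmetic may leave tiny rounding errors in the last digits.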
Equation of a regression Line
*Equation of a Regression Line:* $\hat{y} = b_0 + b_1 x$ - *where $\hat{y}$ is a predicted (estimated) value of y, and $b_0$ and $b_1$ are called the intercept and the slope of the line, respectively.* - The *intercept ($b_0$)* is the value of $\hat{y}$ when x = 0 (it is the point at which the regression line crosses the y-axis). - The *slope ($b_1$)* shows how much the predicted or average value of the y-variable changes when the x-variable increases by one unit. The sign of the slope shows whether $\hat{y}$ increases or decreases as x increases. (!) Interpreting the intercept in the context of statistical data makes sense only if x = 0 is included in the data set. *EXAMPLE:* For variables such as age, weight, height, etc., zero is not a plausible value. So, the intercept is used only as a starting value for predictions, but is not interpreted as a meaningful predicted value. - Together the intercept and the slope are called the *regression coefficients.* - Both the intercept and the slope have measurement units. The measurement units of the intercept are the measurement units of the y-variable. The measurement units of the slope are the measurement units of the y-variable per measurement unit of the x-variable.
Frequency Distribution
*Frequency Distribution:* is a listing of all categories along with their frequencies (counts). (!) Sum of all frequencies (counts) = total number of counts in a data set. EXAMPLE: In a survey, 75 college students were asked if they have any siblings.
Displaying Quantitative Data
*II. Displaying Quantitative Data.* -When we look at the collected data the problem is that we are unable to see what is going on. We need some ways to arrange it, so the patterns, relationships, and trends can be seen right away. *Q:* So, what should we do? --> Make a graph. It will help us to see the things clearly and it is the best way to present our data to others. *Q:* Which types of graphs are used for quantitative data? --> There are different ways to graph quantitative data. Most commonly used are a histogram, a stem-and-leaf plot, a dot plot, and a box plot.
Inferential Statistics
*Inferential Statistics:* *Purpose:* draw conclusions about characteristics of a large group (a population), based on a smaller group (a sample), selected from this large group. *Involves:* estimation and hypothesis testing. --> EXAMPLE: estimating an average GPA of all MSU students based on GPAs of smaller group of MSU students.
Making a scatterplot
*Making a ScatterPlot:* - Each point on the scatterplot represents the combination of measurements for an individual subject. - When looking at relationships between two variables, we can often identify one variable as an *explanatory variable* and the other as a *response (outcome) variable.* - In the scatterplot, the response variable is plotted on the vertical axis (y-axis), so it's often called the y-variable, and the explanatory variable is plotted on the horizontal axis (x-axis) and is often called the x-variable. - If there is no explanatory-response distinction, either variable can go on either axis. EXAMPLES: 1) An amount of study time and a corresponding assignment score: the amount of study time is the explanatory variable (x) and the assignment score is the response variable (y). 2) A number of miles driven and an amount of gas left in the tank: the number of miles driven is the explanatory variable (x) and the amount of gas left in the tank is the response variable (y). Sometimes we may be interested in adding categorical variables to the scatterplot. In such cases, different symbols can be used instead of dots, or different colors of dots, for each category. *EXAMPLE:* In examining the relationship between height and weight we are interested in taking gender into account. The symbol "M" may be used for males and the symbol "F" for females.
Measures of Location (Centrality)
*Measures of Location (Centrality).* - A *mean* is the arithmetic average of all values in a data set: the sum of all values divided by the size of the data set. Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$. Just as $\bar{x}$ represents the average value of the observations in a sample, the average of the values in a population can be calculated. This average is called the population mean and is denoted by μ. A *median* is a point that splits a data set in half. A *mode* is the value that occurs most frequently in a data set. Note: for perfectly symmetric data the values of the mean and the median are equal (and they are approximately equal for nearly symmetric data). *EXAMPLE:* For the data set from above: 2, 4, 5, 6, 9, 10, 11, 13, 18, 25, 29, find the measures of location (centrality). *The mean:* (2 + 4 + 5 + 6 + 9 + 10 + 11 + 13 + 18 + 25 + 29) / 11 = 132 / 11 = 12, *the median* is 10, and there is no mode (every value occurs only once).
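The example above can be checked with Python's standard `statistics` module. This is only a sketch. Treating "no mode" as "no value occurs more than once" is an assumption about the convention used in these notes, and the variable names are arbitrary.

```python
from collections import Counter
from statistics import mean, median

data = [2, 4, 5, 6, 9, 10, 11, 13, 18, 25, 29]

data_mean = mean(data)      # sum of all values divided by n
data_median = median(data)  # middle value of the sorted data

# Report a mode only if some value is repeated.
counts = Counter(data)
highest = max(counts.values())
modes = [v for v, c in counts.items() if c == highest] if highest > 1 else []

print("mean:", data_mean)
print("median:", data_median)
print("modes:", modes if modes else "no mode")
```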
Measures of Spread (variability)
*Measures of Spread (Variability).* - Reporting measures of location (centrality) provides only partial information about a data set. Different data sets may have identical measures of location (centrality), but different measures of spread (variability). - The *range* is the difference between the largest and the smallest values in a data set: *Range = Maximum - Minimum* - The *interquartile range (IQR)* is the difference between the upper (Q3) and lower (Q1) quartiles of a data set: *IQR = Upper Quartile - Lower Quartile = Q3 - Q1* - Both the range and the interquartile range are of limited use because each is calculated from only two values and ignores how the rest of the data is spread. Therefore, the most commonly used measures of spread (variability) are the *variance* and the *standard deviation.*
Numerical Summaries of Quantitative Variables
*Numerical Summaries of Quantitative Variables:* In the numerical approach, we use descriptive statistics. They are brief descriptive coefficients that summarize a given data set. Descriptive statistics are broken down into two groups: --> Measures of location (centrality) --> Measures of spread (variability) The measures of location (centrality) include a mode, a median, and a mean. The measures of spread (variability) include a *range,* an *interquartile range (IQR),* a *variance,* and a *standard deviation.*
Percentiles
*Percentiles.* A p-th percentile of a data set is a point below which lie p% of the values in the data set. Some percentiles have special names: - The 25th percentile is called the lower quartile (Q1) - The 50th percentile is called the median - The 75th percentile is called the upper quartile (Q3)
Pie Charts
*Pie Charts:* *Creating a pie chart:* 1) Draw a circle. 2) "Slice" the circle in such a way that the size of each "slice" represents the percentage (or the count) for the corresponding category. *EXAMPLE:* The following pie chart displays the results of a survey conducted by the Centers for Disease Control and Prevention (CDC) about the causes of death of 950 people. *CONCLUSION:* when making a pie chart make sure that the sizes of all "slices" are accurate, provide percentages (and/or counts) for each category, name the categories, and include a title.
Prediction errors and Residuals lecture 5 page 9
*Prediction Errors and Residuals:* - The prediction ability of the regression line is useful when we know only the value of the x-variable and wish to predict the value of the y-variable. - However, we can also use the predicted values to check how well a particular line works for a data set. - To do this, we substitute an observed (true) value of x into our equation to compute the corresponding predicted value of y ($\hat{y}$). Then we compare the obtained predicted value $\hat{y}$ to the observed value of y. *A residual (or a prediction error)* is the difference between the observed value of y and its corresponding predicted value $\hat{y}$. *Residual = $(y - \hat{y})$* - The residual tells us how far off the model's prediction is at that point. For example, a positive residual shows that our model makes an underestimate - the true y-value is higher than the estimated one. Likewise, a negative residual shows an overestimate.
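A short Python sketch of the residual calculation follows. The regression coefficients and the three observed points are hypothetical, chosen only so that one residual is positive, one is negative, and one is zero.

```python
def predict(x, b0, b1):
    """Predicted value y_hat = b0 + b1 * x."""
    return b0 + b1 * x


def residual(x, y, b0, b1):
    """Residual = observed y minus predicted y_hat."""
    return y - predict(x, b0, b1)


b0, b1 = 1.0, 2.0                         # hypothetical intercept and slope
points = [(1, 3.5), (2, 4.0), (3, 7.0)]   # hypothetical observed (x, y) pairs

residuals = [residual(x, y, b0, b1) for x, y in points]
for (x, y), r in zip(points, residuals):
    if r > 0:
        verdict = "underestimate"
    elif r < 0:
        verdict = "overestimate"
    else:
        verdict = "exact"
    print(f"x={x}, observed y={y}, residual={r}: {verdict}")
```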
Probability Theory
*Probability theory:* forms a bridge between the Descriptive and Inferential Statistics
Quartiles
*QUARTILES:* to find the quartiles the following steps must be made: Step 1: sort the data set from the smallest to the largest value (!) Step 2: locate the positions of the quartiles as follows: - The lower quartile (Q1) is the median of the lower (left) half of the sorted values (without the data set's median) - The upper quartile (Q3) is the median of the upper (right) half of the sorted values (without the data set's median). Step 3: look up the values of the quartiles in the sorted data set.
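The steps above can be sketched in Python as follows. The function name `quartiles` is hypothetical. One assumption is made for data sets with an even number of values: since the median is then not one of the data values, the two halves are simply the first and the last n/2 sorted values. Note that library routines may use other quartile conventions and give slightly different results.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    s = sorted(data)                       # Step 1: sort
    n = len(s)
    half = n // 2
    lower = s[:half]                       # values below the median position
    upper = s[half + 1:] if n % 2 == 1 else s[half:]  # skip the median if n is odd
    return median(lower), median(s), median(upper)


data = [2, 4, 5, 6, 9, 10, 11, 13, 18, 25, 29]
q1, m, q3 = quartiles(data)
print("Q1 =", q1, " median =", m, " Q3 =", q3, " IQR =", q3 - q1)
```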
Relationships between Quantitative Variables
*Relationships between Quantitative Variables:* EXAMPLES: 1) Are the heights and weights of people related? 2) Does a person's weight depend on his/her height? 3) If yes, what is the change in weight of a person, on average, for every one-inch increase in person's height? Three tools that are most commonly used to display, describe, and quantify the relationship between two quantitative variables are: 1. Scatterplot 2. Regression Equation 3. Correlation.
Summary of BOX PLOT
*SUMMARY OF BOX PLOT:* - The inside of the box shows the middle half of the data (between the quartiles) - The width of the box equals the IQR - The position of the median (and the lengths of the whiskers) show symmetry/skewness - Outliers are displayed individually. (!) A box plot does not show the modality of a data distribution. However, it's the best tool for identifying outliers. *CONCLUSION:* - when making a box plot make sure that all lines are straight and drawn at the proper places. Include a label for the axis (with measurement units) and a title.
Scatterplot
*ScatterPlot:* - A scatterplot is a two-dimensional graph of the values of two quantitative variables measured on the same subjects. - It is one of the most common ways to display *bivariate* quantitative data. It allows us to see patterns, trends, relationships, and also outliers and subgroups (clusters). (!) Scatterplots are the best way to start observing the relationship between two quantitative variables. EXAMPLES: 1) A car's speed and the amount of time to reach a destination. 2) An educational level and an annual income.
Statistics
*Statistics:* is a science of information. It involves: 1. collecting, 2. classifying, 3. organizing, 4. analyzing, 5. presenting, and 6. interpreting information to answer questions and/or draw conclusions.
example
*Stem & Leaf Plots:* EXAMPLE 1: To study the ages of people attending a certain fitness club on a given day, 30 club members were asked about their ages. The following answers were recorded: 50, 79, 48, 30, 61, 46, 33, 44, 18, 21, 56, 61, 44, 52, 62, 41, 39, 57, 61, 44, 20, 48, 35, 50, 57, 38, 61, 72, 42, and 29. *Q1:* What were the minimum and the maximum ages of club members that day? The youngest was 18 and the oldest was 79 years old. *Q2:* What was the most common age? 61 years (4 members). *Q3:* How many members were between 20 and 35 years old (inclusive)? 6 members (ages 20, 21, 29, 30, 33, and 35). *Q4:* How many members were 62 years old or older? 3 members (ages 62, 72, and 79). *Q5:* What is the modality of the distribution of these data? It's unimodal (the mode is in the 40s). *Q6:* Is the distribution of these data symmetric or skewed? It's symmetric. *Q7:* Are there any unusual features in the distribution of these data? No. *CONCLUSION:* when making a stem-and-leaf plot make sure that the rows are straight, and all numbers are of the same size. Include the measurement units and a title.
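The plot for this example can be reproduced with a few lines of Python. This is a sketch that assumes non-negative whole-number data, with the tens digit as the stem and the last digit as the leaf. The function name `stem_and_leaf` is hypothetical, and stems with no leaves are simply not printed.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all but the last digit) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


ages = [50, 79, 48, 30, 61, 46, 33, 44, 18, 21, 56, 61, 44, 52, 62,
        41, 39, 57, 61, 44, 20, 48, 35, 50, 57, 38, 61, 72, 42, 29]

plot = stem_and_leaf(ages)
print("Ages of fitness club members (years)")
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

The longest row is the one for stem 4 (eight leaves), which is why the distribution is described as unimodal with its mode in the 40s.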
*III. Summarizing Quantitative Variables:*
*Summarizing Quantitative Variables:* - A population is an entire group of subjects about which information is desired. The size of a population (N) is the number of its elements. *EXAMPLE:* average age of all students of a certain university; average salary of all employees of a certain hospital. A sample is a subset (a part) of a population. The size of a sample (n) is the number of its elements. *EXAMPLE:* average age of freshmen of this university; average salary of physicians of this hospital. (!) The selection of the sample can be done in many ways. However, it is very important to select a sample which is representative of the entire population. There are various techniques that can be used for data analysis: - Graphical approach - Tabular approach - Numerical approach. While data displays and tables are easier to use, the numerical approach is more precise and objective. So, the best way is to use both.
variables + types of variables
*Variables:* A variable is a characteristic that differs from one subject to the next. *Types of Variables:* *1) categorical variable:* is a variable that names the categories and tells to which category a subject belongs. Each subject can belong to only one category. *EXAMPLE:* gender, country of birth, presence of a certain characteristic of interest, etc. A particular case of a categorical variable is an *ordinal variable.* In this case the categories have a logical order or ranking. *EXAMPLES:* 1) Size of clothes: S, M, L, XL, etc. 2) Olympic ranking: gold, silver, bronze. Note that a rank number is often assigned to the categories of an ordinal variable. EXAMPLES (Rating scales): 1) 3-point Comparison of Skills scale: 1=Better, 2=The same, 3=Worse 2) 4-point Agreement scale: 1=Strongly Disagree, 2=Disagree, 3=Agree, 4=Strongly Agree 3) 5-point Performance scale: 1=Poor, ..., 5=Excellent
DATA
*What is data?* - Data is systematically recorded information together with its context. --> Data is collected using observational studies, surveys, or designed experiments. It is collected to answer some specific question(s). *EXAMPLES:* 1) What is the average age of students in this course? 2) What percentage of defective items is produced at some factory during a one-month period? 3) Do men tend to drive faster than women, on average? 4) Does a newly developed drug work better than the currently used one?
relative Frequency
*relative frequency:* is a frequency for each category divided by a total number of counts in a data set. The *relative frequency distribution* - is a listing of all categories along with their relative frequencies (or percentages). (!) Sum of all relative frequencies is 1 (or percentage is 100%).
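As a quick illustration, the relative frequencies can be computed from a frequency distribution like this. The category names and the 60/15 split of the 75 surveyed students are hypothetical, since the notes do not give the actual counts. The function name is also made up.

```python
def relative_frequencies(frequencies):
    """Divide each category's count by the total number of counts."""
    total = sum(frequencies.values())
    return {category: count / total for category, count in frequencies.items()}


# Hypothetical frequency distribution for 75 surveyed students.
siblings = {"has siblings": 60, "no siblings": 15}
rel = relative_frequencies(siblings)

for category, value in rel.items():
    print(f"{category}: {value:.2f} ({value:.0%})")
print("sum of relative frequencies:", sum(rel.values()))
```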
Quantitative variable
2) *quantitative variable:* is a variable that takes numerical values for which arithmetic operations make sense. (!) All quantitative variables have measurement units which provide information on how these variables were measured. Reporting numerical variables without their measurement units is useless! *EXAMPLE:* age (years/months), weight (pounds/ounces), temperature (degrees), salary (dollars/cents), etc. (!) Just because a variable consists of numbers, DON'T automatically assume that it is quantitative. EXAMPLE: social security numbers, zip codes, phone numbers. *(!) Quantitative variables can be turned into categorical variables.* EXAMPLE: Age (quantitative variable) can be divided into categories: - Children (under 15 years) - Youth (15-24 years) - Adults (25-64 years) - Seniors (65+ years) The distribution of a variable tells what values it takes and how often it takes them. EXAMPLE: In the example of breastfeeding status and number of allergy episodes during the 1st year of a baby's life, the following variables were collected: baby's age, gender, height, weight, breastfeeding status, and number of allergy episodes during the 1st year of life. Q: Identify the type of each of these variables and provide categories for the categorical ones and measurement units for the quantitative ones. 1) Age: Quantitative (months, since we are working with babies). 2) Gender: Categorical (male, female). 3) Height: Quantitative (centimeters, since the babies under study are from Canada). 4) Weight: Quantitative (kilograms and grams (study in Canada)). 5) Breastfeeding status: Categorical (breastfed, not breastfed). 6) Number of allergy episodes: Quantitative (number of episodes). Q: Are there any ordinal variables in this example? No.
Calculating Median & Lower and Upper Quartiles
Calculating Median and Lower and Upper Quartiles. *1) MEDIAN:* to find the median, the following steps must be made: Step 1: sort the data set from the smallest to the largest value (!) Step 2: find the position of the median using the following formula: (n+1)/2, where n is the size of the data set. Step 3: look up the value of the median in the sorted data set (counting from the bottom of the data set). (!) If the data set has an even number of values, the median is the average of the values at positions n/2 and (n/2)+1.
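The position formula can be traced in a short Python sketch. Positions in the notes are counted from 1, while Python lists are indexed from 0, so each position is shifted by one. The function name `median_by_position` is hypothetical.

```python
def median_by_position(data):
    """Median found via the (n + 1) / 2 position rule."""
    s = sorted(data)                 # Step 1: sort
    n = len(s)
    if n % 2 == 1:
        position = (n + 1) // 2      # Step 2: 1-based position of the median
        return s[position - 1]       # Step 3: look up the value
    # Even n: average the values at 1-based positions n/2 and n/2 + 1.
    return (s[n // 2 - 1] + s[n // 2]) / 2


odd_example = [2, 4, 5, 6, 9, 10, 11, 13, 18, 25, 29]  # n = 11, position 6
even_example = [1, 2, 3, 4]                            # n = 4, positions 2 and 3

print(median_by_position(odd_example))
print(median_by_position(even_example))
```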