Udacity: Data Types
Another way to describe standard deviation is _________________
How much each point varies from the mean of the points.
What's categorical ordinal data?
In addition to classifying cases into categories, variables measured at the ordinal level allow the categories to be ranked with respect to how much of the trait being measured they possess.
How can we split age into smaller and smaller pieces/units?
Years, months, weeks, days, hours, minutes, seconda, etc.
Another standard deviation image example attached. Shows what the standard deviation is measuring.
You would average all of the white numbers in this image to see how different each data point is from the average.
How do I measure the spread of a data set using just one number?
Standard deviation or variance
What does it mean to bin data?
Statistical data binning is a way to group numbers of more or less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to arrange their ages into a smaller number of age intervals (for example, grouping every five years together).
Definition median
The "middle" of a sorted list of numbers.
What does the standard deviation mean?
The Standard Deviation is a measure of how spread out numbers are. Its symbol is σ (the greek letter sigma)
How is the spread of our data best understood?
visually
What's the formula for finding the range?
range=maximum - minimum
What are the minimum and maximum values in this data set: 1,2,3,3,5,8,10
1 is the minimum (or smallest value) 10 is the maximum (or the highest value)
Categorical data can be ordinal or nominal. What does this mean?
1. A categorical variable (sometimes called a nominal variable) is one that has two or more categories, but there is no intrinsic ordering to the categories. For example, gender is a categorical variable having two categories (male and female) and there is no intrinsic ordering to the categories. Hair color is also a categorical variable having a number of categories (blonde, brown, brunette, red, etc.) and again, there is no agreed way to order these from highest to lowest. A purely categorical variable is one that simply allows you to assign categories but you cannot clearly order the categories. If the variable has a clear ordering, then that variable would be an ordinal variable, An ordinal variable is similar to a categorical variable. The difference between the two is that there is a clear ordering of the categories. For example, suppose you have a variable, economic status, with three categories (low, medium and high). In addition to being able to classify people into these three categories, you can order the categories as low, medium and high.
What's an example of a Discrete quantitative data type?
1. A fixed value. 2. The number of dogs I interact with. The number of each type of treatment a salon needs to schedule for the week, the number of children attending a nursery each day or the profit a business makes each month.
What are the 2 defining characteristics of a variable in statistics?
1. A variable is an attribute that describes a person, place, thing, or idea. 2. The value of the variable can "vary" from one entity to another.
What are the 2 subcategories of categorical data?
1. Categorical ordinal (ordered) and 2. categorical nominal(no order)
What are the 4 quantitative variables?
1. Center 2. Spread 3. Shape 4. outliers
What are 2 subcategories of quantitative data?
1. Continuous quantitative data type. 2. Discrete quantitative data type
How do you find Qi and Q3 from the image attached?
1. Find the median which is 4. 2. Which two numbers did I add to find the median? 3 and 5. I draw a line between 3 & 5 Step 3: find the median of all the numbers on the left side of the line to equal Q1. Step 4: find the median of all the numbers on the right hand side of the line to get Q3.
What are the 5 values in the five number summary?
1. Maximum 2. third quartile 3. Second quartile(Median) 4. first quartile 5. Minimum
What are the 3 measures of center in a quantitative variable?
1. Mean 2. Median 3. Mode
What are the 4 aspects used when analyzing quantitative variables?
1. Measures of center 2. Measures of spread 3. Shape 4. Outliers
What are the 4 measures of spread?
1. Range 2. Interquartile range 3. Standard deviation 4. Variance
What is the only way to measure qualitative nominal data?
1. classification into categories is the only measurement procedure permitted. The categories cannot be thought of as "higher" or "lower" than each other along some numerical scale.
What's an example of continuous quantitative data type?
1. data that can take any value. Height, weight, temperature and length are all examples of continuous data. 2. Some continuous data will change over time; the weight of a baby in its first year or the temperature in a room throughout the day. 3. This data is best shown on a line graph
Quantitative
1. data that you can do math on 2. Research that addresses the "what" or "how many" aspects of a question. 3.It is data that can either be counted or compared on a numeric scale.
Qualitative data
1. describes qualities or characteristics. 2. It is collected using questionnaires, interviews, or observation (For example, it could be notes taken during a focus group on the quality of the food at Cafe Mac, or responses from an open-ended questionnaire). 3. difficult to precisely measure and analyze. 4. The data may be in the form of descriptive words that can be examined for patterns or meaning, sometimes through the use of coding. Coding allows the researcher to categorize qualitative data to identify themes that correspond with the research questions and to perform quantitative analysis.
How do you find median with a dataset of even numbers?
1. order the numbers 2. average the two center values
How much of our data falls below Q2?
50%
What's a frequency distribution?
A frequency distribution is an overview of all distinct values in some variable and the number of times they occur.
Definition: Histogram.?
A frequency distribution shows how often each different value in a set of data occurs.
What is the most common visual for quantitative data called?
A histogram
Measures of spread can be thought of as ________________
A measure of spread, sometimes also called a measure of dispersion, is used to describe the variability in a sample or population. It is usually used in conjunction with a measure of central tendency, such as the mean or median, to provide an overall description of a set of data
What's another word for mean?
Average
What are the 4 subcategories of both discrete and continuous quantitative data?
Center, shape, spread, and outliers
The image attached shows how far each employee lives from work.
How do I calculate how the distance varies from one employee to the next. Answer one: use the 5 number summary as a description. Answer two: Standard deviation.
What does the number of values in each bin determine?
How high the bar is on the histogram
What does the word spread mean statistics?
In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range.
What does the 5 number summary do?
It gives us the values for calculating the range and the interquartile range.
What does measures of center tell you about your data?
It' gives you an idea of the average subject.
What's another way to describe discrete quantitative data?
It's a static value. i.e. pages in a book, trees in a yard
What is the median in our dataset?
It's the middle number or 3.
What's an example of categorical ordinal data?
Letter grade, survey rating
What's another word for median?
Middle value
What's another word for mode?
Most common value
What type of values does discrete data take on?
Only countable values like the number of dogs.
Here's a data set: 5,8,3,2, 1,3,10 What's the first step in finding the five number summary?
Order our values: 1,2,3,3,5,8,10
What's the first step in calculating the median?
Put the values in order
What's the first step in creating a histogram?
Putting data into bins.
The median is also called ____________________
Q2
What's another name for median in the five number summary?
Q2 (second quartile)
How do i find Interquartile range?
Q3 - Q1 = Interquartile range
What are the next 2 numbers needed to complete the five number summary on this data: 1,2,3,3,5,8,10
The first 3 going left to right. or Q1 The number next to the last number or Q3
What is the most common way to measure the spread of our data?
The five number summary
Another standard deviation image example attached. Shows what the standard deviation is measuring.
The standard deviation in this image is how far each individual is from the mean/average distance. Step one: subtract employee 1 from the average, then employee 2 from the average, then employee 3 from the mean, then employee four. See image attached to the card below.
What's another way to describe continuous quantitative data?
The values can always change or are always changing. I.e. height, weight.
What numeric values do we consider when looking at measures of spread?
We consider numeric values that consider how far points are from one another.
What's a box plot?
box plot—displays the five-number summary of a set of data where we draw a box from the first quartile to the third quartile.
What type of data are ranked categories?
categorical ordinal data.
When we can split a value into smaller and smaller pieces, like age, for example and still have smaller units we can break it into, this is an example of ______
continous quantitative data
The median of an odd number of data is the _________________________
exact middle value after you order the numbers
What are the 3 measures of center?
mean, median, and the mode
An odd number of values in a dataset vs. an even number of values will determine how we calculate the _______________________
median
The value that occurs most often is known as the ___________________
mode
What is the purpose of the 3rd measure of center-mode?
most common value.
What's a ranked category?
not very nice, nice, nicest
Continous data can take on any ____________________
numeric value