BSTAT TEST 1 REVIEW
What do we call the large set of data about which we draw conclusions?
population
Numerical variable
A variable that assumes meaningful numerical values for observations.
Continuous (random) variable
A variable that assumes uncountable values in an interval.
Categorical variable
A variable that uses labels or names to identify the distinguishing characteristics of observations.
What are two reasons we use sample data instead of population data?
1. Obtaining information on the entire population is expensive. 2. It is impossible to examine every member of the population.
Population parameter
A characteristic of a population; a number that describes something about an entire group or population.
Variable
A general characteristic being observed on a set of people, objects, or events, where each observation varies in kind or degree.
Line Chart
A graph that connects the consecutive observations of a numerical variable with a line.
Bar chart
A graph that depicts the frequency or relative frequency of each category of a categorical variable as a series of horizontal or vertical bars, the lengths of which are proportional to the values that are to be depicted.
Histogram
A graphical depiction of a frequency or a relative frequency distribution for a numerical variable; series of rectangles where the width and height of each rectangle represent the interval width and frequency (or relative frequency) of the respective interval.
Discrete (random) variable
A variable that assumes a countable number of values.
big data
A massive volume of both structured and unstructured data that are often difficult to manage, process, and analyze using traditional data processing tools.
Scatterplot with a categorical variable
A modification of a basic scatterplot that incorporates a categorical variable.
Pie chart
A segmented circle portraying the categories and relative sizes for a categorical variable.
Frequency distribution
A table that groups the observations of a variable into categories or intervals and records the number of observations that fall into each category or interval.
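A minimal Python sketch of a frequency distribution for a categorical variable, with relative frequencies added. The industry labels are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical categorical observations (illustrative only)
industries = ["Retail", "Tech", "Retail", "Finance", "Tech", "Retail"]

# Frequency distribution: number of observations in each category
freq = Counter(industries)

# Relative frequency: each category's count divided by the total number of observations
n = len(industries)
rel_freq = {category: count / n for category, count in freq.items()}

for category, count in freq.items():
    print(category, count, round(rel_freq[category], 3))
```

The relative frequencies always sum to 1, which is a quick check that every observation was counted once.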
Contingency table
A table that shows frequencies for two categorical variables, x and y, where each cell represents a mutually exclusive combination of the pair of x and y observations.
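A small Python sketch of building a contingency table by counting each (x, y) pair. The two variables and their values are hypothetical.

```python
from collections import Counter

# Hypothetical paired observations of two categorical variables
x = ["F", "M", "F", "F", "M", "M"]          # e.g., sex
y = ["Yes", "No", "No", "Yes", "Yes", "No"]  # e.g., answer to a survey question

# Each (x, y) combination is one mutually exclusive cell; count the pairs in each cell
cells = Counter(zip(x, y))

rows = sorted(set(x))
cols = sorted(set(y))
print("x/y", *cols)
for r in rows:
    # Counter returns 0 for a combination that never occurs
    print(r, *(cells[(r, c)] for c in cols))
```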
Stem-and-leaf diagram
A visual method of displaying quantitative data where each value of a data set is separated into two parts: a stem, which consists of the leftmost digits, and a leaf, which consists of the last digit.
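A minimal Python sketch of a stem-and-leaf diagram, assuming hypothetical non-negative whole-number data: dividing by 10 gives the stem (leftmost digits) and the remainder gives the leaf (last digit).

```python
# Hypothetical data values (non-negative integers, illustrative only)
data = [23, 27, 31, 35, 35, 42, 48, 51]

# Split each value into a stem (leftmost digits) and a leaf (last digit)
stems = {}
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems.setdefault(stem, []).append(leaf)

for stem, leaves in stems.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```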
Myers-Briggs assessment breaks down personality types into four categories
Analyst, Diplomat, Explorer, Sentinel
structured data
Data that conform to a predefined row-column format.
What is one example of an interval scale?
Fahrenheit scale for temperatures
Polygon
For a numerical variable, a graph of a frequency or relative frequency distribution in which lines connect a series of neighboring points, where each point represents the midpoint of a particular interval and its associated frequency or relative frequency.
Stacked column chart
Graph of a contingency table; depicts more than one categorical variable and allows for the comparison of composition within each category.
approximation formula to find the width of each interval
(Maximum - Minimum) / Number of intervals
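A worked Python sketch of the approximation: subtract the minimum from the maximum, divide by the number of intervals, then tally observations into the resulting intervals. The data, the choice of 5 intervals, rounding the width up to a whole number, and lower-inclusive/upper-exclusive intervals are all assumptions made for illustration.

```python
import math

# Hypothetical numerical observations
values = [12, 18, 25, 31, 44, 47, 53, 60]
num_intervals = 5

# Approximate interval width = (maximum - minimum) / number of intervals
approx_width = (max(values) - min(values)) / num_intervals
# Rounding up (an assumed convention) ensures the intervals cover every observation
width = math.ceil(approx_width)

# Tally observations into intervals [low, low + width), starting at the minimum
low = min(values)
counts = [0] * num_intervals
for v in values:
    index = min((v - low) // width, num_intervals - 1)  # clamp so the maximum lands in the last interval
    counts[index] += 1

print(approx_width, width, counts)
```

Here (60 - 12) / 5 = 9.6, which is rounded up to a width of 10.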
Interval scale
Observations of a variable can be categorized and ranked, and differences between observations are meaningful.
Ratio scale
Observations of a variable can be categorized and ranked, differences between observations are meaningful, and a true zero point (origin) exists.
Ordinal scale
Observations of a variable can be categorized and ranked.
Nominal scale
Observations of a variable differ merely by name or label.
Volume
One of the V's describing big data; an immense amount of data is compiled from a single source or a wide range of sources.
Variety
One of the V's describing big data; data come in all types, forms, and granularity.
Velocity
One of the V's describing big data; data from a variety of sources get generated at a rapid speed.
Value
One of the V's describing big data; information derived from big data should have value.
Veracity
One of the V's describing big data; refers to the credibility and quality of data.
Subsetting
The process of extracting a portion of a data set.
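A minimal Python sketch of subsetting, assuming a hypothetical data set stored as a list of rows. It extracts a portion of the rows (by a condition) and a portion of the columns (by variable name).

```python
# Hypothetical records: each dict is one observation (row)
records = [
    {"EmployeeID": 101, "Industry": "Retail", "Wage": 15.50},
    {"EmployeeID": 102, "Industry": "Tech", "Wage": 42.00},
    {"EmployeeID": 103, "Industry": "Tech", "Wage": 38.25},
    {"EmployeeID": 104, "Industry": "Finance", "Wage": 35.00},
]

# Subset rows: keep only observations that meet a condition
tech = [r for r in records if r["Industry"] == "Tech"]

# Subset columns: keep only the variables of interest
wages = [{"EmployeeID": r["EmployeeID"], "Wage": r["Wage"]} for r in records]
```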
Omission strategy
When missing values exist, this strategy recommends excluding these observations from subsequent analysis.
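A minimal Python sketch of the omission strategy, assuming hypothetical wage data where None marks a missing value: those observations are dropped before computing a statistic.

```python
# Hypothetical wage observations; None marks a missing value
wages = [15.5, None, 38.25, 35.0, None]

# Omission strategy: exclude observations with missing values from the analysis
complete = [w for w in wages if w is not None]
mean_wage = sum(complete) / len(complete)
```

Note that the sample size shrinks from 5 to 3, which is the main cost of omission.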
Imputation strategy
When missing values exist, this strategy recommends replacing them with some reasonable imputed values.
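A minimal Python sketch of the imputation strategy. It uses mean imputation (replacing each missing value with the mean of the observed values), which is only one common choice of "reasonable imputed value"; the data are hypothetical.

```python
# Hypothetical wage observations; None marks a missing value
wages = [15.5, None, 38.25, 35.0, None]

# Compute the replacement value from the observed (non-missing) data
observed = [w for w in wages if w is not None]
mean_wage = sum(observed) / len(observed)

# Imputation strategy (mean imputation): keep every observation, fill the gaps
imputed = [w if w is not None else mean_wage for w in wages]
```

Unlike omission, all 5 observations are kept.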
Scatterplot
a graphed cluster of dots, each of which represents the values of two variables
Ogive
a line graph of a cumulative frequency or cumulative relative frequency distribution.
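A minimal Python sketch of the values an ogive plots: running totals of the frequencies (cumulative frequency) and those totals divided by the number of observations (cumulative relative frequency). The interval upper limits and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical frequency distribution: upper limit of each interval and its frequency
upper_limits = [20, 30, 40, 50]
frequencies = [3, 5, 8, 4]

# Cumulative frequency: running total of the frequencies
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# Cumulative relative frequency: running total divided by the total count
cumulative_rel = [c / n for c in cumulative]

# An ogive connects the points (upper limit, cumulative value) with line segments
points = list(zip(upper_limits, cumulative_rel))
```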
What is a sample?
a subset of the population
What does a population consist of?
all items of interest in a statistical problem
what are variables classified as?
categorical (qualitative) or numerical (quantitative)
What are nominal and ordinal scales used for?
categorical variables
COUNTIF function
Counts the number of cells in a range that meet a given condition; e.g., count the number of employees in each industry.
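In Excel the call has the form =COUNTIF(range, criteria), e.g., =COUNTIF(C2:C101, "Retail"), where the cell range is hypothetical. A rough Python analog is sketched below; the countif helper is hypothetical and handles only exact-match criteria, whereas Excel's COUNTIF also supports comparison operators and wildcards and matches text without regard to case.

```python
# Hypothetical Industry column (one entry per employee)
industry = ["Retail", "Tech", "Retail", "Finance", "Tech", "Retail"]

def countif(cells, criterion):
    """Hypothetical analog of Excel's COUNTIF for an exact-match criterion only."""
    return sum(1 for cell in cells if cell == criterion)

# Count employees in each industry, like writing one COUNTIF per category
counts = {name: countif(industry, name) for name in sorted(set(industry))}
```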
Cross-sectional data
data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time
Time series data
data collected over several time periods focusing on certain groups of people, specific events, or objects.
2 branches of statistics
descriptive statistics and inferential statistics
What are the two types of numerical variables?
Discrete and continuous.
unstructured data (or unmodeled data)
Data that do not conform to a predefined row-column format.
What is inferential statistics?
drawing conclusions about a large set of data.
COUNTA function
Counts the number of cells that are not empty, so it applies to all four variables in the example data set.
In which field of statistics has the phenomenal growth mainly occurred?
inferential statistics
distribution is not symmetric
it is either positively skewed or negatively skewed
symmetric distribution
mirror image of itself on both sides of its center
What are the four major measurement scales?
Nominal, ordinal, interval, and ratio.
what are examples of structured data?
numbers, dates, and groups of words and numbers, typically stored in a tabular format.
What are interval and ratio scales used for?
numerical variables
ratio scale
represents the strongest level of measurement.
If the set of data is a smaller subset of the population, what do we call it?
sample data
descriptive statistics
summary of important aspects of a data set. Ex: collecting data, organizing the data, and then presenting the data in the form of charts and tables.
COUNT function
Counts the number of cells that contain numeric observations; therefore, in the example data set it applies only to the EmployeeID and Wage variables.
Excel functions COUNT and COUNTA
Used to inspect the number of observations in each column.
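A rough Python analog of comparing COUNT and COUNTA column by column. The columns and values are hypothetical (None stands for an empty cell), and count and counta are hypothetical helpers that mimic the Excel behavior only for plain numbers, text, and blanks.

```python
# Hypothetical columns; None stands for an empty cell
columns = {
    "EmployeeID": [101, 102, 103, None],
    "Industry": ["Retail", "Tech", None, "Finance"],
    "Wage": [15.5, 42.0, 38.25, 35.0],
}

def count(cells):
    # Like COUNT: only cells holding numbers (True/False are excluded as non-numeric)
    return sum(1 for c in cells if isinstance(c, (int, float)) and not isinstance(c, bool))

def counta(cells):
    # Like COUNTA: every non-empty cell, whatever its type
    return sum(1 for c in cells if c is not None)

for name, cells in columns.items():
    print(name, count(cells), counta(cells))
```

A text column such as Industry gets a COUNT of 0 even though it has observations, which is why COUNTA is the one that works for every variable.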
what are examples of unstructured data?
written reports, e-mail messages, doctor's notes, or open-ended survey responses etc.