Exam 2
Events where only one of them can occur
Mutually Exclusive Events
A statement that specifies exactly which data a user requires from a database
Query
The probability that at least one of a set of mutually exclusive events will occur is the sum of their probabilities
Addition rule for mutually exclusive events
A measure similar to R-square, but adjusted for the number of explanatory variables in the equation
Adjusted R-square
Typically, the hypothesis the analyst is trying to prove; also called the research hypothesis
Alternative hypothesis
A primary key that automatically assigns consecutive integer values
Autonumber Key
The distribution of the number of successes in n independent, identical trials, where each trial has probability p of success
Binomial distribution
An increasingly popular term referring to the insights gained from data analysis
Business Intelligence (BI)
A new column created in the Data Model by using a DAX formula
Calculated column
States that the distribution of the sample mean is approximately normal for sufficiently large sample sizes
Central limit theorem
Skewed distribution useful for estimating standard deviations
Chi-square distribution
Test to check whether two attributes are probabilistically independent
Chi-square test for independence
Process of removing errors from a data set
Cleansing Data
A sample where the population is separated into clusters, such as cities or city blocks, and then a random sample of the clusters is selected
Cluster sampling
Updates the probability of an event, given the knowledge that another event has occurred
Conditional probability formula
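A minimal sketch of the conditional probability formula, P(A | B) = P(A and B) / P(B). The function name and the numbers are hypothetical and chosen only for illustration.

```python
def conditional_probability(p_a_and_b: float, p_b: float) -> float:
    """Return P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b


# Hypothetical inputs: P(late and rain) = 0.12, P(rain) = 0.30.
p_late_and_rain = 0.12
p_rain = 0.30

# Probability of being late, updated by the knowledge that it rained.
p_late_given_rain = conditional_probability(p_late_and_rain, p_rain)
print(round(p_late_given_rain, 4))
```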
An interval around the point estimate, calculated from the sample data, where the true value of the population parameter is very likely to be
Confidence interval
An interval that, with a stated level of confidence, captures a population parameter
Confidence interval
Interval that is likely to capture a population mean
Confidence interval for a mean
Interval that is likely to capture the proportion of all population members that satisfy a specified property
Confidence interval for a proportion
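One common way to build this interval is the large-sample form, sample proportion plus or minus a z-multiple times its standard error. The sketch below assumes that form. The function name and the sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes: int, n: int, confidence: float = 0.95):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # z-multiple that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 100 of 400 sampled members have the property.
lower, upper = proportion_confidence_interval(successes=100, n=400)
print(round(lower, 4), round(upper, 4))
```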
Interval that is likely to capture a population standard deviation
Confidence interval for a standard deviation
Interval that is likely to capture the total of all observations in a population
Confidence interval for a total
Interval that is likely to capture the difference between two population means when the samples are independent
Confidence interval for difference between means with independent samples
Interval that is likely to capture the difference between two population means when the samples are paired in a natural way
Confidence interval for difference between means with paired samples
Interval that is likely to capture the difference between similarly defined proportions from two populations
Confidence interval for difference between proportions
An interval that is very likely to contain the population mean
Confidence interval for population mean
Percentage (usually 90%, 95%, or 99%) that indicates how confident you are that the confidence interval captures the true population parameter
Confidence level
A relationship where predicted Y changes by a constant percentage when any X changes by 1%; requires logarithmic transformations
Constant elasticity (or multiplicative) relationship
A measure of the strength of the linear relationship between two variables X and Y
Correlation
"Less than or equal to" probabilities associated with a random variable
Cumulative probability
As used by Tableau Public, a collection of related charts used to tell different aspects of the same basic story
Dashboard
A language recently developed to create calculated columns and measures in Power Pivot
Data Analysis Expressions (DAX) language
Imaginative graphs for providing insights not immediately obvious from numbers alone
Data Visualization
Skewed distribution useful for testing equality of variances
F distribution
Specifies the probability distribution of a continuous random variable
Density function
The variable being estimated or predicted in a regression analysis
Dependent (or response) variable
General terms used by Tableau Public and others, where measures are numeric fields to be summarized and dimensions are usually categorical fields by which the measures are broken down
Dimensions and Measures
Variables coded as 0 or 1, used to capture categorical variables in a regression analysis
Dummy (or indicator) variables
About 68% of the data fall within one standard deviation of the mean, about 95% of the data fall within two standard deviations of the mean, and almost all fall within three standard deviations of the mean
Empirical rules for normal distribution
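The percentages in the empirical rules can be checked against the standard normal cumulative distribution. This is a small sketch using Python's standard library. The helper name is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Probabilities for one, two, and three standard deviations.
within = {k: prob_within(k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(k, round(prob, 4))
```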
A recent Microsoft technology used to mimic relational databases, but all within Excel
Excel Data Model
Events where at least one of them must occur
Exhaustive events
The variables used to explain or predict the dependent variable
Explanatory (or independent) variables
A continuous probability distribution useful for measuring times between events, such as customer arrivals to a service facility; mean and standard deviation both equal the reciprocal of the parameter
Exponential distribution
Test for equality of two population variances, used to check an assumption of two-sample t test for difference between means
F test for equality of two variances
A correction for the standard error when the sample size is fairly large relative to the population size
Finite population correction
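A sketch of the correction, assuming the usual form sqrt((N - n) / (N - 1)) applied as a multiplier to the standard error of the sample mean. The function names and the numbers are hypothetical.

```python
from math import sqrt


def finite_population_correction(population_size: int, sample_size: int) -> float:
    """Return sqrt((N - n) / (N - 1)) for population size N and sample size n."""
    if population_size < 2 or not 1 <= sample_size <= population_size:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return sqrt((population_size - sample_size) / (population_size - 1))


def standard_error_of_mean(sample_sd: float, sample_size: int, population_size: int) -> float:
    """Standard error s / sqrt(n), multiplied by the finite population correction."""
    fpc = finite_population_correction(population_size, sample_size)
    return sample_sd / sqrt(sample_size) * fpc


# Hypothetical case: a sample of 100 from a population of 1000, sample sd 20.
fpc = finite_population_correction(1000, 100)
se = standard_error_of_mean(20.0, 100, 1000)
print(round(fpc, 4), round(se, 4))
```

The correction is always at most 1, so it can only shrink the standard error. It is close to 1 when the sample is a small fraction of the population.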
The predicted value of the dependent variable, found by substituting explanatory values into the regression equation
Fitted value
An older name for a single-table database
Flat file
A field in a database table that is related to a primary key in another table
Foreign Key
A list of all members of the population
Frame
A natural sequence of fields such as Country, State, City that can be used to drill down in a pivot table
Hierarchy
Relevant when sampling without replacement, especially when the fraction of the population sampled is large
Hypergeometric distribution
Products of explanatory variables, used when the effect of one on the dependent variable depends on the value of the other
Interaction variables
Any sample that is chosen according to a sampler's judgment rather than a random mechanism
Judgmental sample
A particular multiplicative relationship used to indicate how cost or time in production decreases over time
Learning curve
The regression equation that minimizes the sum of squared residuals
Least squares line
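The sketch below computes a least squares line for one explanatory variable, along with the fitted values and residuals it implies. It uses the standard closed-form slope and intercept. The data and function name are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = least_squares_line(xs, ys)
fitted = [intercept + slope * x for x in xs]      # fitted values
residuals = [y - f for y, f in zip(ys, fitted)]   # actual minus fitted
print(round(intercept, 4), round(slope, 4))
```

With an intercept in the equation, the residuals sum to zero (up to rounding), which is a quick sanity check on any fit.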
A table used to transform a many-to-many relationship between two tables into two one-to-many relationships; usually composed mostly of foreign keys
Linking Table
A relationship between two tables where each record in each table can be associated with many records in the other table
Many-to-many relationship
A measure of central tendency—the weighted sum of the possible values, weighted by their probabilities
Mean (or expected value) of a probability distribution
The mean and standard deviation of a binomial distribution with parameters n and p are np and √(np(1 − p)), respectively
Mean and standard deviation of a binomial distribution
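As a check on the formulas np and sqrt(np(1 - p)), this sketch compares them with the mean and standard deviation computed directly from the binomial probabilities. The values of n and p are arbitrary illustrative choices.

```python
from math import comb, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.3  # illustrative parameters

# Shortcut formulas.
mean_formula = n * p
sd_formula = sqrt(n * p * (1 - p))

# Direct calculation from the probability distribution.
mean_direct = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var_direct = sum((k - mean_direct) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
sd_direct = sqrt(var_direct)

print(round(mean_formula, 4), round(sd_formula, 4))
```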
The mean of the sampling distribution of the sample mean equals the population mean, which indicates that the sample mean is unbiased
Mean of sample mean
A summarization of data in a Data Model created with a DAX formula and used in the Values area of a pivot table
Measure
A regression model using logarithms of Ys and/or Xs
Model with logarithmic transformations
A regression model with any number of explanatory variables
Multiple regression
Formula for the probability that two events both occur
Multiplication rule
Variables created to capture nonlinear relationships in a regression model
Nonlinear transformations
Any type of estimation error that is not sampling error, including nonresponse bias, nontruthful responses, measurement error, and voluntary response bias
Nonsampling error
A continuous distribution with possible values ranging over the entire number line; its density function is a symmetric bell-shaped curve
Normal Distribution
Useful for finding probabilities and percentiles for nonstandard and standard normal distributions
Normal calculations in Excel
Hypothesis that represents the current thinking or status quo
Null hypothesis
Test where values in only one direction will lead to rejection of the null hypothesis
One-tailed test
A relationship between two tables where each record in the "one" table can be associated with many records in the "many" table but where each record in the "many" table can be associated with only one record in the "one" table
One-to-many relationship
A single numeric value, a "best guess" of a population parameter, based on the data in a sample
Point estimate
A discrete probability distribution that often describes the number of events occurring within a specified period of time or space; mean and variance both equal the parameter λ
Poisson distribution
Contains all members about which a study intends to make inferences
Population
Probability of correctly rejecting the null when it is false
Power
A recent Excel add-in for creating more powerful pivot table reports than are possible with "regular" pivot tables
Power Pivot
A window for viewing and managing the data in an Excel Data Model
Power Pivot window
A recent set of Excel tools for importing external data into Excel from a variety of sources
Power Query
A field in a database table that serves as a unique identifier
Primary Key
Events where knowledge that one of them has occurred is of no value in assessing the probability that the other will occur
Probabilistically independent events
A number between 0 and 1 that measures the likelihood that some event will occur
Probability
Any sample that is chosen by using a random mechanism
Probability sample
A graphical representation of how events occur through time, useful for calculating probabilities of multiple events
Probability tree
The property that the proportion of members sampled is the same from stratum to stratum
Proportional sample sizes (in stratified sampling)
A regression model with linear and squared explanatory variables
Quadratic relationship
The percentage of variation in the response variable explained by the regression model
R-square
Associates a numeric value with each possible outcome in a situation involving uncertainty
Random Variable
A general method for estimating the relationship between a dependent variable and one or more explanatory variables
Regression analysis
The constant and the coefficients of the explanatory variables in a regression equation
Regression coefficients
Sample results that lead to rejection of null hypothesis
Rejection region
A database with multiple tables related by key fields, structured to avoid redundancy
Relational Database
A complete system, often server-based, for managing one or more corporate relational databases (Microsoft SQL Server, Oracle Database, IBM Db2, MySQL, and others)
Relational database management system (RDBMS)
The proportion of times the event occurs out of the number of times a random experiment is performed
Relative frequency
The difference between the actual and fitted values of the dependent variable
Residual
The probability of any event and the probability of its complement sum to 1
Rule of Complements
Formulas that specify the sample size(s) required to obtain sufficiently narrow confidence intervals
Sample size formulas
The distribution of the point estimates from all possible samples (of a given sample size) from the population
Sampling distribution
The inevitable result of basing an inference on a sample rather than on the entire population
Sampling error
Difference between the estimate of a population parameter and the true value of the parameter
Sampling error (or estimation error)
Potential members of a sample from a population
Sampling units
Sampling where any member of the population can be sampled more than once
Sampling with replacement
Sampling where no member of the population can be sampled more than once
Sampling without replacement
Refers to business intelligence that can be generated by employees in all areas of business, without need for IT department help, by using powerful, user-friendly software
Self-Service BI
The probability of a type I error that the analyst chooses to tolerate
Significance level
A sample where each member of the population has the same chance of being chosen
Simple random sample
A regression model with a single explanatory variable
Simple regression
A database with all data in a single table
Single-table database
A measure of variability: the square root of the variance
Standard deviation of a probability distribution
The standard deviation of the sampling distribution of the estimate
Standard error of an estimate
Essentially, the standard deviation of the residuals; indicates the magnitude of the prediction errors
Standard error of estimate
Indicates how sample means from different samples vary
Standard error of sample mean
Transforms any normal distribution with mean μ and standard deviation σ to the standard normal distribution with mean 0 and standard deviation 1
Standardizing a normal random variable
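Standardizing subtracts the mean and divides by the standard deviation, z = (x - mean) / sd. The sketch below shows that a probability computed on the original scale matches the one computed on the standard normal scale. The mean, standard deviation, and x value are hypothetical.

```python
from statistics import NormalDist


def standardize(x: float, mu: float, sigma: float) -> float:
    """Return the z-value: the number of standard deviations x lies from the mean."""
    return (x - mu) / sigma


# Hypothetical normal distribution and value of interest.
mu, sigma, x = 100.0, 15.0, 130.0

z = standardize(x, mu, sigma)
p_original = NormalDist(mu, sigma).cdf(x)  # P(X <= x) on the original scale
p_standard = NormalDist().cdf(z)           # P(Z <= z) on the standard normal scale
print(z, round(p_standard, 4))
```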
Sample results that lead to rejection of null hypothesis
Statistically significant results
As used by Tableau Public, a sequence of visualizations that tell a data narrative, somewhat like a sequence of PowerPoint slides
Story
Sampling in which the population is divided into relatively homogeneous subsets called strata, and then random samples are taken from each of the strata
Stratified sampling
A language developed to express queries that can be used (with minor modifications) for all RDBMS
Structured Query Language (SQL)
A sample where one of the first k members is selected randomly, and then every kth member after this one is selected
Systematic sample
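A short sketch of systematic sampling from a frame stored as a Python list. The function name, the frame, and the choice k = 10 are hypothetical.

```python
import random


def systematic_sample(frame, k, rng=None):
    """Pick one of the first k members at random, then every kth member after it."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random()
    start = rng.randrange(k)  # random index among the first k members
    return frame[start::k]


# Hypothetical frame of 100 member IDs; select every 10th member.
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, rng=random.Random(0))
print(sample)
```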
A free software package developed by Tableau Software for creating data visualizations
Tableau Public
Tests to check whether a population is normally distributed; possibilities include chi-square test, Lilliefors test, and Q-Q plot
Tests for normality
Test where values in both directions will lead to rejection of the null hypothesis
Two-tailed test
Error committed when null hypothesis is true but is rejected
Type I error
Error committed when null hypothesis is false but is not rejected
Type II error
An estimate where the mean of its sampling distribution equals the value of the parameter being estimated
Unbiased estimate
Checks how well a regression model based on one sample predicts a related sample
Validation of fit
A measure of variability: the weighted sum of the squared deviations of the possible values from the mean, weighted by the probabilities
Variance of a probability distribution
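The mean, variance, and standard deviation of a discrete probability distribution can be computed directly from their definitions as probability-weighted sums. The values and probabilities below are made up for illustration.

```python
from math import sqrt

# Hypothetical discrete distribution: possible values and their probabilities.
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]
assert abs(sum(probs) - 1.0) < 1e-12  # probabilities must sum to 1

# Mean: sum of the values weighted by their probabilities.
mean = sum(v * p for v, p in zip(values, probs))

# Variance: sum of squared deviations from the mean, weighted by the probabilities.
variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)
print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```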
Short for visualization, any workbook created in Tableau Public containing charts, dashboards, and/or stories
Viz
Probability of observing a sample result at least as extreme as the one actually observed, calculated assuming the null hypothesis is true
p-value
The sampling distribution of the standardized sample mean when the sample standard deviation is used in place of the population standard deviation
t distribution
Test for a mean from a single population
t test for a population mean
Test for the difference between two population means when samples are independent
t test for difference between means from independent samples
Test for the difference between two population means when samples are paired in a natural way
t test for difference between means from paired samples
Test for a proportion from a single population
z test for a population proportion
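A sketch of a two-tailed z test for a population proportion, assuming the usual large-sample test statistic z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n). The function name and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes: int, n: int, p0: float):
    """Return (z statistic, two-tailed p-value) for the null hypothesis p = p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # computed under the null hypothesis
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample: 60 successes in 100 trials, testing p0 = 0.5.
z_stat, p_value = z_test_proportion(successes=60, n=100, p0=0.5)
print(round(z_stat, 4), round(p_value, 4))
```

The null hypothesis is rejected when the p-value falls below the chosen significance level.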
Test for difference between similarly defined proportions from two populations
z test for difference between proportions