Exam 2

Events where only one of them can occur

Mutually Exclusive Events

A statement that specifies exactly which data a user requires from a database

Query

The probability that at least one of a set of mutually exclusive events will occur is the sum of their probabilities

Addition rule for mutually exclusive events

A measure similar to R-square, but adjusted for the number of explanatory variables in the equation

Adjusted R-square

Typically, the hypothesis the analyst is trying to prove; also called the research hypothesis

Alternative hypothesis

A primary key that automatically assigns consecutive integer values

Autonumber Key

The distribution of the number of successes in n independent, identical trials, where each trial has probability p of success

Binomial distribution

An increasingly popular term referring to the insights gained from data analysis

Business Intelligence (BI)

A new column created in the Data Model by using a DAX formula

Calculated column

States that the distribution of the sample mean is approximately normal for sufficiently large sample sizes

Central limit theorem

Skewed distribution useful for estimating standard deviations

Chi-square distribution

Test to check whether two attributes are probabilistically independent

Chi-square test for independence

Process of removing errors from a data set

Cleansing Data

A sample where the population is separated into clusters, such as cities or city blocks, and then a random sample of the clusters is selected

Cluster sampling

Updates the probability of an event, given the knowledge that another event has occurred

Conditional probability formula
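
As a minimal sketch of the usual form of this formula, P(A | B) = P(A and B) / P(B), the snippet below updates the probability of rolling a 6 once it is known that the roll is even. The function name and the die example are illustrative assumptions, not part of the original card.

```python
from fractions import Fraction

def conditional_probability(p_a_and_b, p_b):
    """Return P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

# Hypothetical example: one roll of a fair die.
# A = "roll is a 6", B = "roll is even", so P(A and B) = 1/6 and P(B) = 1/2.
p_six_given_even = conditional_probability(Fraction(1, 6), Fraction(1, 2))
print(p_six_given_even)  # 1/3
```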

An interval around the point estimate, calculated from the sample data, where the true value of the population parameter is very likely to be

Confidence interval

An interval that, with a stated level of confidence, captures a population parameter

Confidence interval

Interval that is likely to capture a population mean

Confidence interval for a mean

Interval that is likely to capture the proportion of all population members that satisfy a specified property

Confidence interval for a proportion
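
One common way to build such an interval is the large-sample normal approximation: sample proportion plus or minus a z-multiple of its standard error. The sketch below assumes that approach; the function name and the survey counts are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) interval for a population proportion."""
    p_hat = successes / n
    # Standard error of the sample proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # z-multiple that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z * se, p_hat + z * se

# Hypothetical survey: 60 of 100 respondents satisfy the property.
lower, upper = proportion_confidence_interval(60, 100)
print(round(lower, 3), round(upper, 3))  # 0.504 0.696
```

Raising the confidence level widens the interval; raising n narrows it.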

Interval that is likely to capture a population standard deviation

Confidence interval for a standard deviation

Interval that is likely to capture the total of all observations in a population

Confidence interval for a total

Interval that is likely to capture the difference between two population means when the samples are independent

Confidence interval for difference between means with independent samples

Interval that is likely to capture the difference between two population means when the samples are paired in a natural way

Confidence interval for difference between means with paired samples

Interval that is likely to capture the difference between similarly defined proportions from two populations

Confidence interval for difference between proportions

An interval that is very likely to contain the population mean

Confidence interval for population mean

Percentage (usually 90%, 95%, or 99%) that indicates how confident you are that the confidence interval captures the true population parameter

Confidence level

A relationship where predicted Y changes by a constant percentage when any X changes by 1%; requires logarithmic transformations

Constant elasticity (or multiplicative) relationship

A measure of the strength of the linear relationship between two variables X and Y

Correlation

"Less than or equal to" probabilities associated with a random variable

Cumulative probability

As used by Tableau Public, a collection of related charts used to tell different aspects of the same basic story

Dashboard

A language recently developed to create calculated columns and measures in Power Pivot

Data Analysis Expressions (DAX) language

Imaginative graphs for providing insights not immediately obvious from numbers alone

Data Visualization

Skewed distribution useful for testing equality of variances

F distribution

Specifies the probability distribution of a continuous random variable

Density function

The variable being estimated or predicted in a regression analysis

Dependent (or response) variable

General terms used by Tableau Public and others, where measures are numeric fields to be summarized and dimensions are usually categorical fields used to break the measures down

Dimensions and Measures

Variables coded as 0 or 1, used to capture categorical variables in a regression analysis

Dummy (or indicator) variables

About 68% of the data fall within one standard deviation of the mean, about 95% of the data fall within two standard deviations of the mean, and almost all fall within three standard deviations of the mean

Empirical rules for normal distribution
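
The three percentages can be checked directly from the standard normal distribution. This is a small sketch using Python's standard library; the helper name `prob_within` is an assumption made for illustration.

```python
from statistics import NormalDist

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545, and 0.9973.
    print(k, round(prob_within(k), 4))
```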

A recent Microsoft technology used to mimic relational databases, but all within Excel

Excel Data Model

Events where at least one of them must occur

Exhaustive events

The variables used to explain or predict the dependent variable

Explanatory (or independent) variables

A continuous probability distribution useful for measuring times between events, such as customer arrivals to a service facility; mean and standard deviation both equal the reciprocal of the parameter

Exponential distribution

Test for equality of two population variances, used to check an assumption of two-sample t test for difference between means

F test for equality of two variances

A correction for the standard error when the sample size is fairly large relative to the population size

Finite population correction
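
The card does not state the formula. A commonly used form multiplies the standard error by sqrt((N - n) / (N - 1)), where N is the population size and n the sample size. The sketch below assumes that form; the function and parameter names are hypothetical.

```python
import math

def standard_error_of_mean(sigma, n, population_size=None):
    """Standard error of the sample mean, optionally with the finite population correction."""
    se = sigma / math.sqrt(n)
    if population_size is not None:
        # Assumed correction factor: sqrt((N - n) / (N - 1)).
        fpc = math.sqrt((population_size - n) / (population_size - 1))
        se *= fpc
    return se

# Hypothetical numbers: standard deviation 10, sample of 100 from a population of 500.
uncorrected = standard_error_of_mean(10, 100)
corrected = standard_error_of_mean(10, 100, population_size=500)
print(uncorrected, round(corrected, 4))
```

The correction shrinks the standard error, and it drops to zero when the whole population is sampled.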

The predicted value of the dependent variable, found by substituting explanatory values into the regression equation

Fitted value

An older name for a single-table database

Flat file

A field in a database table that is related to a primary key in another table

Foreign Key

A list of all members of the population

Frame

A natural sequence of fields such as Country, State, City that can be used to drill down in a pivot table

Hierarchy

Relevant when sampling without replacement, especially when the fraction of the population sampled is large

Hypergeometric distribution

Products of explanatory variables, used when the effect of one on the dependent variable depends on the value of the other

Interaction variables

Any sample that is chosen according to a sampler's judgment rather than a random mechanism

Judgmental sample

A particular multiplicative relationship used to indicate how cost or time in production decreases over time

Learning curve

The regression equation that minimizes the sum of squared residuals

Least squares line
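
For a single explanatory variable, the least squares slope and intercept have closed forms. The sketch below computes them and then derives fitted values and residuals; the data points and function name are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = least_squares_line(xs, ys)

# Fitted value: substitute x into the equation. Residual: actual minus fitted.
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
print(round(intercept, 2), round(slope, 2))
```

When the equation includes a constant term, the residuals sum to zero, up to rounding.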

A table used to transform a many-to-many relationship between two tables into two one-to-many relationships; usually composed mostly of foreign keys

Linking Table
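
As an illustration, the sketch below builds a tiny in-memory SQLite database in which a linking table resolves a many-to-many relationship between students and courses. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Courses (CourseID INTEGER PRIMARY KEY, Title TEXT);
-- Linking table: each row pairs one student with one course,
-- so it consists entirely of foreign keys.
CREATE TABLE Enrollments (
    StudentID INTEGER REFERENCES Students(StudentID),
    CourseID INTEGER REFERENCES Courses(CourseID),
    PRIMARY KEY (StudentID, CourseID)
);
INSERT INTO Students VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO Courses VALUES (10, 'Statistics'), (20, 'Databases');
INSERT INTO Enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# Query: follow both one-to-many relationships through the linking table.
rows = conn.execute("""
SELECT s.Name, c.Title
FROM Enrollments e
JOIN Students s ON s.StudentID = e.StudentID
JOIN Courses c ON c.CourseID = e.CourseID
ORDER BY s.Name, c.Title
""").fetchall()
print(rows)
```

Each student can appear in many enrollment rows and so can each course, which is how two one-to-many relationships stand in for the many-to-many one.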

A relationship between two tables where each record in each table can be associated with many records in the other table

Many-to-many relationship

A measure of central tendency—the weighted sum of the possible values, weighted by their probabilities

Mean (or expected value) of a probability distribution

The mean and standard deviation of a binomial distribution with parameters n and p are np and √(np(1 − p)), respectively

Mean and standard deviation of a binomial distribution
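
A quick numerical sketch of the standard formulas, mean np and standard deviation sqrt(np(1 - p)); the function name and the values of n and p are chosen only for illustration.

```python
import math

def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial distribution with parameters n and p."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd

# Hypothetical example: 100 tosses of a fair coin, counting heads.
mean, sd = binomial_mean_sd(100, 0.5)
print(mean, sd)  # 50.0 5.0
```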

Indicates property of unbiasedness of sample mean

Mean of sample mean

A summarization of data in a Data Model created with a DAX formula and used in the Values area of a pivot table

Measure

A regression model using logarithms of Ys and/or Xs

Model with logarithmic transformations

A regression model with any number of explanatory variables

Multiple regression

Formula for the probability that two events both occur

Multiplication rule

Variables created to capture nonlinear relationships in a regression model

Nonlinear transformations

Any type of estimation error that is not sampling error, including nonresponse bias, nontruthful responses, measurement error, and voluntary response bias

Nonsampling error

A continuous distribution with possible values ranging over the entire number line; its density function is a symmetric bell-shaped curve

Normal Distribution

Useful for finding probabilities and percentiles for nonstandard and standard normal distributions

Normal calculations in Excel

Hypothesis that represents the current thinking or status quo

Null hypothesis

Test where values in only one direction will lead to rejection of the null hypothesis

One-tailed test

A relationship between two tables where each record in the "one" table can be associated with many records in the "many" table but where each record in the "many" table can be associated with only one record in the "one" table

One-to-many relationship

A single numeric value, a "best guess" of a population parameter, based on the data in a sample

Point estimate

A discrete probability distribution that often describes the number of events occurring within a specified period of time or space; mean and variance both equal the parameter λ

Poisson distribution

Contains all members about which a study intends to make inferences

Population

Probability of correctly rejecting the null when it is false

Power

A recent Excel add-in for creating more powerful pivot table reports than are possible with "regular" pivot tables

Power Pivot

A window for viewing and managing the data in an Excel Data Model

Power Pivot window

A recent set of Excel tools for importing external data into Excel from a variety of sources

Power Query

A field in a database table that serves as a unique identifier

Primary Key

Events where knowledge that one of them has occurred is of no value in assessing the probability that the other will occur

Probabilistically independent events

A number between 0 and 1 that measures the likelihood that some event will occur

Probability

Any sample that is chosen by using a random mechanism

Probability sample

A graphical representation of how events occur through time, useful for calculating probabilities of multiple events

Probability tree

The property that the proportion of members sampled is the same from stratum to stratum

Proportional sample sizes (in stratified sampling)

A regression model with linear and squared explanatory variables

Quadratic relationship

The percentage of variation in the response variable explained by the regression model

R-square

Associates a numeric value with each possible outcome in a situation involving uncertainty

Random Variable

A general method for estimating the relationship between a dependent variable and one or more explanatory variables

Regression analysis

The constant and the coefficients of the explanatory variables in a regression equation

Regression coefficients

Sample results that lead to rejection of null hypothesis

Rejection region

A database with multiple tables related by key fields, structured to avoid redundancy

Relational Database

A complete system, often server-based, for managing one or more corporate relational databases (Microsoft SQL Server, Oracle Database, IBM Db2, MySQL, and others)

Relational database management system (RDBMS)

The proportion of times the event occurs out of the number of times a random experiment is performed

Relative frequency

The difference between the actual and fitted values of the dependent variable

Residual

The probability of any event and the probability of its complement sum to 1

Rule of Complements

Formulas that specify the sample size(s) required to obtain sufficiently narrow confidence intervals

Sample size formulas

The distribution of the point estimates from all possible samples (of a given sample size) from the population

Sampling distribution

The inevitable result of basing an inference on a sample rather than on the entire population

Sampling error

Difference between the estimate of a population parameter and the true value of the parameter

Sampling error (or estimation error)

Potential members of a sample from a population

Sampling units

Sampling where any member of the population can be sampled more than once

Sampling with replacement

Sampling where no member of the population can be sampled more than once

Sampling without replacement

Refers to business intelligence that can be generated by employees in all areas of business, without need for IT department help, by using powerful, user-friendly software

Self-Service BI

The probability of a type I error that an analyst chooses to accept

Significance level

A sample where each member of the population has the same chance of being chosen

Simple random sample

A regression model with a single explanatory variable

Simple regression

A database with all data in a single table

Single-table database

A measure of variability: the square root of the variance

Standard deviation of a probability distribution

The standard deviation of the sampling distribution of the estimate

Standard error of an estimate

Essentially, the standard deviation of the residuals; indicates the magnitude of the prediction errors

Standard error of estimate

Indicates how sample means from different samples vary

Standard error of sample mean

Transforms any normal distribution with mean μ and standard deviation σ to the standard normal distribution with mean 0 and standard deviation 1

Standardizing a normal random variable
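
The transformation is z = (x - μ) / σ. The sketch below applies it and checks that a probability computed on the original scale matches the one computed on the standard normal scale; the mean of 100 and standard deviation of 15 are hypothetical.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Convert x from a normal(mu, sigma) scale to the standard normal scale."""
    return (x - mu) / sigma

# Hypothetical scores: normal with mean 100 and standard deviation 15.
z = standardize(130, 100, 15)
print(z)  # 2.0

# The "less than or equal to" probability is the same on either scale.
p_original = NormalDist(mu=100, sigma=15).cdf(130)
p_standard = NormalDist().cdf(z)
print(round(p_original, 4), round(p_standard, 4))
```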

Sample results that lead to rejection of null hypothesis

Statistically significant results

As used by Tableau Public, a sequence of visualizations that tell a data narrative, somewhat like a sequence of PowerPoint slides

Story

Sampling in which the population is divided into relatively homogeneous subsets called strata, and then random samples are taken from each of the strata

Stratified sampling

A language developed to express queries that can be used (with minor modifications) for all RDBMS

Structured Query Language (SQL)

A sample where one of the first k members is selected randomly, and then every kth member after this one is selected

Systematic sample
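
A minimal sketch of the procedure, assuming the population is available as an ordered list: pick a random starting position among the first k members, then take every kth member from there. The function name and the optional `rng` parameter are illustrative choices.

```python
import random

def systematic_sample(population, k, rng=random):
    """Select one of the first k members at random, then every kth member after it."""
    start = rng.randrange(k)  # random index among the first k members
    return population[start::k]

# Hypothetical frame of 100 members numbered 1 to 100, sampling every 10th.
members = list(range(1, 101))
sample = systematic_sample(members, 10, random.Random(0))
print(sample)
```

Passing a seeded `random.Random` makes the draw reproducible; omitting it uses the module-level generator.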

A free software package developed by Tableau Software for creating data visualizations

Tableau Public

Tests to check whether a population is normally distributed; possibilities include chi-square test, Lilliefors test, and Q-Q plot

Tests for normality

Test where values in both directions will lead to rejection of the null hypothesis

Two-tailed test

Error committed when null hypothesis is true but is rejected

Type I error

Error committed when null hypothesis is false but is not rejected

Type II error

An estimate where the mean of its sampling distribution equals the value of the parameter being estimated

Unbiased estimate

Checks how well a regression model based on one sample predicts a related sample

Validation of fit

A measure of variability: the weighted sum of the squared deviations of the possible values from the mean, weighted by the probabilities

Variance of a probability distribution
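
The mean, variance, and standard deviation of a discrete distribution are all probability-weighted sums, which can be sketched in a few lines. The function name and the coin-toss distribution are assumptions for illustration.

```python
import math

def distribution_summary(values, probs):
    """Return (mean, variance, standard deviation) of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Mean: possible values weighted by their probabilities.
    mean = sum(v * p for v, p in zip(values, probs))
    # Variance: squared deviations from the mean, weighted by the probabilities.
    variance = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, variance, math.sqrt(variance)

# Hypothetical example: number of heads in two tosses of a fair coin.
mean, variance, sd = distribution_summary([0, 1, 2], [0.25, 0.5, 0.25])
print(mean, variance)  # 1.0 0.5
```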

Short for visualization, any workbook created in Tableau Public containing charts, dashboards, and/or stories

Viz

Probability, computed assuming the null hypothesis is true, of observing a sample result at least as extreme as the one actually observed

p-value

The sampling distribution of the standardized sample mean when the sample standard deviation is used in place of the population standard deviation

t distribution

Test for a mean from a single population

t test for a population mean

Test for the difference between two population means when samples are independent

t test for difference between means from independent samples

Test for the difference between two population means when samples are paired in a natural way

t test for difference between means from paired samples

Test for a proportion from a single population

z test for a population proportion
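
A sketch of the usual large-sample version of this test: the test statistic is z = (p̂ - p0) / sqrt(p0(1 - p0) / n), and the two-tailed p-value is the probability of a standard normal value at least as extreme as z. The function name and the sample counts are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_proportion(successes, n, p0):
    """Two-tailed z test of the null hypothesis that the population proportion equals p0."""
    p_hat = successes / n
    # The standard error uses p0 because the test assumes the null hypothesis is true.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample: 60 successes in 100 trials, testing p0 = 0.5.
z_stat, p_value = z_test_proportion(60, 100, 0.5)
print(round(z_stat, 2), round(p_value, 4))  # 2.0 0.0455
```

At a 5% significance level this p-value leads to rejecting the null hypothesis; at a 1% level it does not.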

Test for difference between similarly defined proportions from two populations

z test for difference between proportions

