ECO 361 Midterm
One Tailed Test
The alternative hypothesis contains < or > (the null hypothesis contains the complementary ≥ or ≤)
Two Tailed Test
The null hypothesis contains =; the alternative hypothesis contains ≠
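The direction of the alternative hypothesis changes how the p-value is computed. The Python sketch below assumes a z test statistic whose null distribution is standard normal. `p_value` and its `tail` argument are hypothetical names. `statistics.NormalDist` comes from Python's standard library.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def p_value(z: float, tail: str) -> float:
    """p-value of a z test statistic, assuming a standard normal null distribution.

    tail="right": alternative uses >   -> P(Z >= z)
    tail="left":  alternative uses <   -> P(Z <= z)
    tail="two":   alternative uses !=  -> 2 * P(Z >= |z|)
    """
    if tail == "right":
        return 1 - STANDARD_NORMAL.cdf(z)
    if tail == "left":
        return STANDARD_NORMAL.cdf(z)
    if tail == "two":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left', or 'two'")

z = 1.96
print(round(p_value(z, "right"), 3))  # 0.025
print(round(p_value(z, "two"), 3))    # 0.05
```

A left-tailed p-value for z mirrors a right-tailed p-value for -z. A two-tailed p-value doubles the area in one tail.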
Variable
A general characteristic being observed on a set of people, objects, or events, where each observation varies in kind or degree
Data Management
A process that an organization uses to acquire, organize, store, manipulate, and distribute data
Entity-Relationship Diagram (ERD)
A schematic used to illustrate the structure of the data
Information
A set of data that are organized and processed in a meaningful and purposeful way
Sample
A subset of the population
Big Data
A term used to describe a massive volume of both structured and unstructured data that are extremely difficult to manage, process, and analyze using traditional data-processing tools
Population
All observations or items of interest in an analysis
Numerical Variable
Assumes meaningful numerical values
Categorical Variable
Assumes names or labels
Regression Analysis
Captures the relationship between a response variable and one or more predictor variables
Business Analytics
Combines qualitative reasoning with quantitative tools to identify key business problems and translate data analysis into decisions that improve business performance
Unstructured Data
Data that do not conform to a predefined data model
Structured Data
Data that reside in a predefined row-column format
Knowledge
Derived from a blend of data, contextual information, experience, and intuition
Type II Error
Do not reject the null hypothesis when it is false
Delimited Format
Each column is separated by a delimiter, such as a comma, semicolon, or tab, so a field can contain any number of characters
Fixed Width Format
Each column starts and ends at the same place in every row
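This minimal Python sketch contrasts the two formats on the same hypothetical table. The delimited text is read with the standard `csv` module. The fixed width lines follow an assumed `LAYOUT` of character positions. The column widths and helper names are illustrative and not part of any standard.

```python
import csv
import io

# Delimited format: a delimiter (a comma here) separates the columns,
# so each field can be any length.
delimited_text = "Name,City,Sales\nAnn,Boston,1200\nBartholomew,San Diego,85\n"
rows = list(csv.reader(io.StringIO(delimited_text)))

# Fixed width format: hypothetical layout of (column, start, end) character
# positions that every row must follow.
LAYOUT = [("Name", 0, 12), ("City", 12, 22), ("Sales", 22, 28)]

def to_fixed_width(row):
    # Pad each field with spaces so every column starts and ends at the same
    # position in every row (a value wider than its column would break this).
    return "".join(
        str(value).ljust(end - start) for value, (_, start, end) in zip(row, LAYOUT)
    )

def parse_fixed_width(line):
    # Read each column by slicing the same character positions, then strip padding.
    return [line[start:end].strip() for _, start, end in LAYOUT]

fixed_lines = [to_fixed_width(row) for row in rows]
for line in fixed_lines:
    print(repr(line))
```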
Data
Facts, figures, or other contents, both numerical and nonnumerical
d = 0
Value of a dummy variable for the other category (or categories)
Least Sophisticated Measurement Scale
Nominal
d = 1
Value of a dummy variable for one of the categories
Linear Regression
Postulates that the relationship between the response and predictors is linear
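For a single predictor, a linear regression fitted by ordinary least squares has a closed-form slope and intercept. The sketch below assumes that one-predictor case and uses hypothetical data and a hypothetical function name. Models with several predictors need matrix methods instead.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates (b0, b1) for the line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical predictor (x) and response (y) values.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)
```

Python 3.10 and later also provide `statistics.linear_regression`, which returns the same slope and intercept for this one-predictor case.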
Most Sophisticated Measurement Scale
Ratio
Imputation Strategy
Recommends that missing values be replaced with some reasonable imputed values
Omission Strategy
Recommends that observations with missing values be excluded from subsequent analysis
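The two strategies can be contrasted on a few hypothetical records, with `None` marking a missing value. Mean imputation is used here only as one simple example of a reasonable imputed value. The helper names are illustrative.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "income": 52000, "age": 34},
    {"id": 2, "income": None, "age": 41},
    {"id": 3, "income": 62000, "age": None},
    {"id": 4, "income": 48000, "age": 29},
]

def omit_missing(rows):
    # Omission strategy: exclude every observation that has any missing value.
    return [row for row in rows if all(value is not None for value in row.values())]

def impute_mean(rows, column):
    # Imputation strategy (mean imputation, one simple choice): replace missing
    # values in `column` with the mean of that column's observed values.
    # Returns new dicts, so the original records are left unchanged.
    fill = mean(row[column] for row in rows if row[column] is not None)
    return [
        {**row, column: fill if row[column] is None else row[column]} for row in rows
    ]

print([row["id"] for row in omit_missing(records)])  # [1, 4]
print(impute_mean(records, "income")[1])
```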
Type I Error
Reject the null hypothesis when it is true
Relational Database
The most common type of database; it consists of one or more logically related data files (tables), where each table is a two-dimensional grid of rows and columns
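A small in-memory SQLite database, built with Python's standard `sqlite3` module, illustrates the idea. Two hypothetical tables, `customers` and `orders`, are logically related through a shared `customer_id` column and combined with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each table is a two-dimensional grid: rows are records, columns are fields.
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id),"
    " amount REAL)"
)
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Raj")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)],
)

# The shared key customer_id logically relates the two tables.
totals = cur.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.customer_id "
    "GROUP BY c.customer_id ORDER BY c.customer_id"
).fetchall()
conn.close()
print(totals)  # [('Ann', 65.0), ('Raj', 15.5)]
```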
Confidence Interval
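A range of values that contains the unknown population parameter with a stated level of confidence; the narrower the interval, the more precise it is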
Sample Proportion P̄
The point estimator of the population proportion p
Parameter p
The proportion of successes in the population
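This sketch computes a sample proportion and a confidence interval for p. It assumes the usual normal approximation, which is reasonable when the sample has enough successes and failures. The function name and the sample counts are hypothetical. The two samples share the same sample proportion, so only the sample size changes the interval's width.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Point estimate and normal-approximation interval for a population proportion p."""
    p_bar = successes / n  # sample proportion: the point estimate of p
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_bar * (1 - p_bar) / n)  # z times the standard error
    return p_bar, (p_bar - margin, p_bar + margin)

# Hypothetical samples with the same sample proportion but different sizes.
p_small, (lo_small, hi_small) = proportion_interval(120, 400)
p_large, (lo_large, hi_large) = proportion_interval(480, 1600)
print(p_small, round(lo_small, 4), round(hi_small, 4))
print(p_large, round(lo_large, 4), round(hi_large, 4))
```

The larger sample gives the narrower, and therefore more precise, interval.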
Dummy Variable
A variable, denoted d, that takes on values of 1 or 0 to describe two categories of a categorical variable
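Coding a dummy variable amounts to mapping one chosen category to 1 and everything else to 0. The helper `make_dummy` and the data are hypothetical.

```python
def make_dummy(values, category):
    # d = 1 for the chosen category, d = 0 for the other category (or categories).
    return [1 if value == category else 0 for value in values]

# Hypothetical data: a two-category and a three-category variable.
sex = ["Female", "Male", "Male", "Female"]
region = ["North", "South", "West", "South"]

print(make_dummy(sex, "Female"))    # [1, 0, 0, 1]
print(make_dummy(region, "South"))  # [0, 1, 0, 1]
```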
Reject Null Hypothesis
When the sample evidence is inconsistent with the null hypothesis (p-value < α)
Do Not Reject Null Hypothesis
When the sample evidence does not contradict the null hypothesis (p-value ≥ α)
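The decision rule is a comparison of the p-value with the significance level α. The sketch also simulates many two-tailed z tests in which H0 is true. This shows that α is the long-run rate of Type I errors. The function name, seed, and trial count are arbitrary choices.

```python
import random
from statistics import NormalDist

def decide(p_value, alpha=0.05):
    # Decision rule: reject H0 when the p-value is below the significance level.
    return "Reject H0" if p_value < alpha else "Do not reject H0"

print(decide(0.03))  # Reject H0
print(decide(0.20))  # Do not reject H0

# When H0 is true, the z statistic is standard normal, so a two-tailed test
# at alpha = 0.05 should commit a Type I error in roughly 5% of samples.
random.seed(361)
trials = 10_000
rejections = 0
for _ in range(trials):
    z = random.gauss(0, 1)  # simulated test statistic under a true H0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if decide(p) == "Reject H0":
        rejections += 1
type_i_rate = rejections / trials
print(type_i_rate)  # close to 0.05
```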
Data Wrangling
The process of retrieving, cleansing, integrating, transforming, and enriching data to support analytics