Acct Stats - Exam 1

measures of shape

describe the symmetry and tail behavior of a distribution (skewness coefficient, kurtosis coefficient)

data warehouse

a central repository of data from multiple departments within an organization to support managerial decision making

database

a collection of data logically organized to enable easy retrieval, management, and distribution of data

binning

a common data transformation technique that converts numerical variables into categorical variables by grouping the numerical values into a small number of bins
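
A minimal sketch of binning. The ages, bin edges, and bin labels are hypothetical, chosen only for illustration:

```python
# Hypothetical ages and bin edges, chosen only for illustration.
ages = [19, 23, 31, 38, 45, 52, 67]

def bin_age(age):
    """Map a numerical age to one of three categorical bins."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30 to 49"
    return "50 and over"

# Each numerical value is replaced by the label of the bin it falls in.
age_bins = [bin_age(a) for a in ages]
print(age_bins)
```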

boxplot

a convenient way to graphically display the five-number summary of a variable
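
The five-number summary behind a boxplot (minimum, first quartile, median, third quartile, maximum) can be sketched as follows. The data are made up, and the "inclusive" quartile method is an assumption; textbooks and software use slightly different quartile conventions, so results may differ:

```python
from statistics import quantiles

# Made-up observations for illustration.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12]

# n=4 splits the data into quartiles; "inclusive" is one of several conventions.
q1, median, q3 = quantiles(data, n=4, method="inclusive")
summary = (min(data), q1, median, q3, max(data))
print(summary)
```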

variable

a general characteristic being observed on a set of people, objects, or events, where each observation varies in kind or degree

entity

a generalized category to represent persons, places, things, or events about which we want to store data in a database table

entity-relationship diagram (ERD)

a graphical representation used to model the structure of the data

scatterplot

a graphical tool that helps in determining whether or not two numerical variables are related in some systematic way

structured query language (SQL)

a language for manipulating data in a relational database using relatively simple and intuitive commands; the basic query structure consists of the SELECT, FROM, and WHERE keywords
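
A small sketch of the select, from, where structure, run against an in-memory SQLite database through Python's sqlite3 module. The customers table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "Boston"), (2, "Ben", "Denver"), (3, "Cy", "Boston")],
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows.
rows = conn.execute("SELECT name FROM customers WHERE city = 'Boston'").fetchall()
print(rows)
conn.close()
```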

big data

a massive volume of both structured and unstructured data that are extremely difficult to manage, process, and analyze using traditional data-processing tools

composite primary key

a primary key that contains more than one attribute
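
As a sketch, assuming a hypothetical order_lines table in SQLite: neither order_id nor product_id is unique on its own, but the pair is, so the pair serves as the composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the key is the combination of two attributes.
conn.execute(
    "CREATE TABLE order_lines ("
    "order_id INTEGER, product_id INTEGER, quantity INTEGER, "
    "PRIMARY KEY (order_id, product_id))"
)
conn.execute("INSERT INTO order_lines VALUES (1, 10, 2)")
conn.execute("INSERT INTO order_lines VALUES (1, 11, 1)")  # same order, different product: allowed

try:
    conn.execute("INSERT INTO order_lines VALUES (1, 10, 5)")  # same pair again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
conn.close()
```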

histogram

a series of rectangles where the width and height of each rectangle represent the interval width and frequency of the respective interval

information

a set of data that are organized and processed in a meaningful and purposeful way

HyperText Markup Language (HTML)

a simple text-based markup language for displaying content in web browsers

eXtensible Markup Language (XML)

a simple text-based markup language for representing structured data, uses user-defined markup tags to specify the structure of data

instance

a single occurrence of an entity

database management system (DBMS)

a software application for defining, manipulating, and managing data in databases

JavaScript Object Notation (JSON)

a standard for transmitting human-readable data in compact files
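
A short illustration using Python's json module. The record is hypothetical; the point is that the transmitted text stays readable and converts back to the same data:

```python
import json

# Hypothetical record for illustration.
record = {"name": "Ana", "orders": [101, 102], "active": True}

text = json.dumps(record)    # Python object -> JSON text
restored = json.loads(text)  # JSON text -> Python object

print(text)
```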

sample

a subset of data used for the analysis and to make inferences about the population

primary key

an attribute that uniquely identifies each instance of an entity

mean absolute deviation (MAD)

an average of the absolute differences between the observations and the mean

variance

an average of the squared differences between the observations and the mean
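
A sketch of the mean absolute deviation and the variance on made-up data. Whether the variance divides by n (population) or n - 1 (sample) depends on the context, so both are shown:

```python
from statistics import mean, pvariance, variance

data = [2, 4, 6, 8, 10]  # made-up observations
m = mean(data)

# MAD: average absolute distance from the mean.
mad = sum(abs(x - m) for x in data) / len(data)

population_variance = pvariance(data)  # squared deviations divided by n
sample_variance = variance(data)       # squared deviations divided by n - 1

print(mad, population_variance, sample_variance)
```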

volume

an immense amount of data compiled from a single source or a wide range of sources

heat map

an important visualization tool that uses color or color intensity to display relationships between variables

dummy variable

an indicator or a binary variable, takes on values of 1 or 0 to describe two categories of a categorical variable
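
For example, a hypothetical two-category variable (subscription plan) can be coded as a dummy variable like this:

```python
# Hypothetical categorical variable with two categories.
plan = ["premium", "basic", "basic", "premium"]

# 1 marks the "premium" category, 0 marks the other category.
is_premium = [1 if p == "premium" else 0 for p in plan]
print(is_premium)
```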

empirical rule

for a symmetric, bell-shaped distribution, approximately 68% of all observations fall within one standard deviation of the mean, approximately 95% fall within two standard deviations, and almost all (approximately 99.7%) fall within three standard deviations
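
The percentages can be checked against the standard normal distribution. This sketch uses NormalDist from Python's statistics module and assumes exactly normal data, which real data only approximate:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of observations within k standard deviations of the mean.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(k, round(share, 4))
```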

discrete variable

assumes a countable number of values

dimension table

business dimensions of interest such as customer, product, location, and time

continuous variable

characterized by uncountable values within an interval

business analytics

combines qualitative reasoning with quantitative tools to identify key business problems and translate data analysis into decisions that improve business performance

data

compilations of facts, figures, or other contents, both numerical and nonnumerical

population

consists of all observations or items of interest in an analysis

relational database

consists of one or more logically related data files, where each file is a two-dimensional grid that consists of rows and columns

fact table

contains facts about the business operation, often in a quantitative format

standardizing

converting observations into z-scores
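
A sketch of standardizing, taking each z-score as (observation - mean) / standard deviation. The data are made up, and the sample standard deviation is used here as an assumption:

```python
from statistics import mean, stdev

data = [2, 4, 6, 8, 10]  # made-up observations
m = mean(data)
s = stdev(data)          # sample standard deviation

# Each z-score is the distance from the mean in standard-deviation units.
z_scores = [(x - m) / s for x in data]
print([round(z, 4) for z in z_scores])
```

After standardizing, the z-scores have a mean of 0 and a standard deviation of 1.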

cross-sectional data

data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time

time series data

data collected over several time periods focusing on certain groups of people, specific events, or objects

variety

data come in all types, forms, and granularity, both structured and unstructured

velocity

data from a variety of sources get generated at a rapid speed

bar chart

depicts the frequency or the relative frequency for each category of the categorical variable as a series of horizontal or vertical bars

knowledge

derived from a blend of data, contextual information, experience, and intuition

stacked column chart

designed to visualize more than one categorical variable, allows for the comparison of composition within each category

unstructured data

do not conform to a predefined, row-column format

delimited format

each column is separated by a delimiter such as a comma
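
A brief sketch of reading comma-delimited text with Python's csv module. The file contents are hypothetical and held in a string rather than on disk:

```python
import csv
import io

# Hypothetical comma-delimited contents.
text = "name,city,sales\nAna,Boston,120\nBen,Denver,95\n"

# The reader splits each row into columns at the delimiter.
rows = list(csv.reader(io.StringIO(text), delimiter=","))
print(rows)
```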

relative frequency

equals the proportion of observations in each category or interval compared to the whole

measures of dispersion

gauge the variability of a data set (range, interquartile range, mean absolute deviation, variance, and standard deviation)

frequency distribution

groups the data into categories and records the number of observations that fall into each category for a categorical variable, for a numerical variable it groups data into intervals and records the number of observations that fall into each interval
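
For a categorical variable, a frequency distribution and the matching relative frequencies can be sketched as follows (the grades are made up):

```python
from collections import Counter

grades = ["A", "B", "B", "C", "A", "B"]  # made-up categorical observations

# Frequency: number of observations in each category.
frequency = Counter(grades)

# Relative frequency: each count as a proportion of all observations.
relative_frequency = {g: count / len(grades) for g, count in frequency.items()}

print(frequency)
print(relative_frequency)
```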

fixed-width format

each column starts and ends at the same place in every row of the data file
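
A sketch with a hypothetical layout: name in characters 0 to 7, city in characters 8 to 15, sales in characters 16 to 19. Because the positions are the same in every row, the columns can be recovered by slicing:

```python
# Hypothetical records written out in a fixed-width layout.
records = [("Ana", "Boston", 120), ("Ben", "Denver", 95)]
lines = [f"{name:<8}{city:<8}{sales:>4}" for name, city, sales in records]

# Parse each row by character position rather than by a delimiter.
parsed = [
    (line[0:8].strip(), line[8:16].strip(), int(line[16:20]))
    for line in lines
]

print(lines)
print(parsed)
```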

correlation coefficient

indicates the direction and the strength of the linear relationship between x and y (-1 = perfect negative linear relationship, 0 = no linear relationship, 1 = perfect positive linear relationship)

covariance

indicates whether x and y have a negative linear relationship, a positive linear relationship, or no linear relationship (negative value = negative linear relationship, positive value = positive linear relationship, 0 = no linear relationship)
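
A sketch of the sample covariance and the correlation coefficient on made-up paired data. The n - 1 divisor is the sample version and is an assumption here:

```python
from math import sqrt

x = [1, 2, 3, 4, 5]  # made-up paired observations
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: average product of paired deviations (divisor n - 1).
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y.
s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

# Correlation: covariance rescaled to lie between -1 and 1.
r_xy = cov_xy / (s_x * s_y)
print(cov_xy, round(r_xy, 4))
```

The covariance depends on the units of x and y; the correlation coefficient does not, which is why it also conveys strength.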

skewness coefficient

measures the degree to which a distribution is not symmetric about its mean (symmetric = 0, positively skewed = positive, negatively skewed = negative)

z-score

measures the relative position of an observation within a distribution

kurtosis coefficient

measures whether the tails of a distribution are more or less extreme than those of the normal distribution (normal = 3, excess kurtosis = coefficient - 3); more extreme = leptokurtic, less extreme = platykurtic
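
A sketch of the skewness and kurtosis coefficients using plain moment-based formulas. This is an assumption: statistical software often applies small-sample adjustments, so its output may differ from these values. The function names and data are hypothetical:

```python
def central_moment(values, k):
    """Average of the k-th powers of deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    # Third central moment scaled by the variance to the power 1.5.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # Fourth central moment scaled by the squared variance (normal = 3).
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [2, 4, 6, 8, 10]
right_tailed = [1, 2, 2, 3, 20]

print(skewness(symmetric), kurtosis(symmetric) - 3)  # excess kurtosis
print(skewness(right_tailed))
```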

interval scale

observations can be categorized and ranked, and differences between observations are meaningful, the value of zero is arbitrarily chosen

ordinal scale

observations can be categorized and ranked, however the differences between the ranked observations are meaningless

nominal scale

observations differ merely by name or label, the least sophisticated level of measurement

ratio scale

observations have all the characteristics of interval-scaled data as well as a true zero point, meaningful ratios can be calculated

business intelligence

provides organizations and their users with the ability to access and manipulate data interactively

categorical

qualitative, observations represent categories

measures of association

quantify the direction and strength of the linear relationship between two variables (covariance, correlation coefficient)

numerical

quantitative, observations represent meaningful numbers

omission

recommends that observations with missing values be excluded from subsequent analysis

imputation

recommends that the missing values be replaced with some reasonable imputed values
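
The two strategies for missing values can be contrasted on made-up data. Replacing missing values with the mean of the observed values is only one possible imputation choice and is an assumption here:

```python
from statistics import mean

values = [10, None, 14, None, 12]  # None marks a missing value

# Omission: exclude observations with missing values.
omitted = [v for v in values if v is not None]

# Imputation: replace missing values with the mean of the observed values.
fill = mean(omitted)
imputed = [fill if v is None else v for v in values]

print(omitted)
print(imputed)
```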

descriptive analytics

refers to gathering, organizing, tabulating, and visualizing data to summarize 'what has happened?'

predictive analytics

refers to using historical data to predict 'what could happen in the future?'

prescriptive analytics

refers to using optimization and simulation algorithms to provide advice on 'what should we do?'

measures of central location

relates to the way numerical data tend to cluster around some middle or central value (mean, median, mode, percentile)

relationship between entities

represents certain business facts or rules, one-to-one, one-to-many, or many-to-many

structured data

reside in a predefined, row-column format

line chart

shows a numerical variable as a series of data points connected by a line

contingency table

shows the frequencies for two categorical variables, x and y, where each cell represents a mutually exclusive combination of the pair of x and y values
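
A contingency table can be sketched as a count of each (x, y) pair. The two categorical variables and their values are hypothetical:

```python
from collections import Counter

# Hypothetical paired observations: (region, purchased).
pairs = [("East", "yes"), ("West", "no"), ("East", "yes"), ("West", "yes"), ("East", "no")]

# Each cell counts one mutually exclusive combination of x and y.
table = Counter(pairs)

for (region, purchased), count in sorted(table.items()):
    print(region, purchased, count)
```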

bubble plot

shows the relationship between three numerical variables in a two-dimensional graph

scatterplot with a categorical variable

shows the relationship between two numerical variables and a categorical variable in a two-dimensional graph

data mart

small-scale data warehouses that only contain data that are relevant to certain subjects or decision areas

star schema

structure of a data mart conforms to this multidimensional data model

veracity

the credibility and quality of the data

data transformation

the data conversion process from one format or structure to another

range

the difference between the maximum and the minimum observations

interquartile range (IQR)

the difference between the third quartile and the first quartile, does not rely on extreme observations
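
To illustrate that the interquartile range does not rely on extreme observations, this sketch replaces the largest value in a made-up data set with an extreme one. The "inclusive" quartile method is an assumption; other conventions give slightly different quartiles:

```python
from statistics import quantiles

def iqr(values):
    """Third quartile minus first quartile (inclusive method)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

typical = [2, 4, 4, 5, 7, 8, 9, 11, 12]
with_extreme = typical[:-1] + [120]  # replace the maximum with an extreme value

range_typical = max(typical) - min(typical)
range_extreme = max(with_extreme) - min(with_extreme)

print(range_typical, range_extreme)          # the range changes sharply
print(iqr(typical), iqr(with_extreme))       # the IQR is unchanged
```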

negatively skewed distribution

the long tail extends to the left, reflecting the presence of a small number of relatively small observations

positively skewed distribution

the long tail extends to the right, reflecting the presence of a small number of relatively large observations

median

the middle observation of a variable when the observations are arranged in order

mode

the most frequently occurring observation of a variable

value

the most important aspect of any analytics initiative

parameter

a numerical characteristic of a population, such as the population mean

standard deviation

the positive square root of the variance

outliers

extremely small or large observations

foreign key

the primary key of a related entity
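
A sketch of a foreign key in SQLite, using hypothetical customers and orders tables. The customer_id column in orders refers to the primary key of customers. Note that SQLite enforces foreign keys only after the PRAGMA shown below is switched on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite

# Hypothetical related tables.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # customer 1 exists: accepted

try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no customer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
conn.close()
```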

arithmetic mean

the primary measure of central location

data modeling

the process of defining the structure of a database

subsetting

the process of extracting parts of a data set that are of interest to the analytics professional

data wrangling

the process of retrieving, cleansing, integrating, transforming, and enriching data to support subsequent data analysis

data management

the process that an organization uses to acquire, organize, store, manipulate, and distribute data

statistic

a numerical characteristic of a sample, such as the sample mean

