stat business quiz 1
instance
A single occurrence of an entity is called an
Database Management System (DBMS)
A software application for defining, manipulating, and managing data in databases
volume
An immense amount of data is compiled from a single source or a wide range of sources, including business transactions, household and personal devices, manufacturing equipment, social media, and other online portals.
Predictive Analytics
Analytical models that identify associations between variables and use these associations to estimate the likelihood of a specific outcome
variety
Data also come in all types, forms, and granularity, both structured and unstructured. These data may include numbers, text, and figures as well as audio, video, e-mails, and other multimedia elements.
star schema
Data in a data mart are organized using a multidimensional data model called a _______________, which includes dimension and fact tables.
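A minimal Python sketch of a star schema, assuming a small hypothetical sales data mart built in an in-memory SQLite database. The table names (`fact_sales`, `dim_product`, `dim_store`) and the data are made up for illustration. One fact table holds numeric measures plus a key to each dimension table.

```python
import sqlite3

# Hypothetical sales data mart: one fact table surrounded by dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
-- The fact table stores numeric measures plus a key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Shoes'), (2, 'Hats');
INSERT INTO dim_store   VALUES (10, 'East'), (20, 'West');
INSERT INTO fact_sales  VALUES (1, 10, 50.0), (2, 10, 20.0), (1, 20, 30.0);
""")

# A typical star-schema query: join the facts to the dimensions, then aggregate.
rows = conn.execute("""
    SELECT s.region, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_store   s ON f.store_id   = s.store_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()
print(rows)  # [('East', 'Hats', 20.0), ('East', 'Shoes', 50.0), ('West', 'Shoes', 30.0)]
```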
data transformation
Data transformation is the process of converting data from one format or structure to another.
Predictive Analytics
Examples of ____________ include identifying customers who are most likely to respond to specific marketing campaigns, admitted students who are likely to enroll, credit card transactions that are likely to be fraudulent, or the incidence of crime at certain regions and times.
velocity
In addition to volume, data from a variety of sources are generated at a rapid speed. Managing these data streams can become a critical issue for many organizations.
Data Privacy / Information privacy
Its concerns revolve around (a) how data are legally collected and stored; (b) whether and how data are shared with third parties; and (c) how data collection, usage, and transmission meet all regulatory obligations.
interval
Observations can be categorized and ranked, and differences between observations are meaningful. The main drawback of the ________________ is that the value of zero is arbitrarily chosen.
ordinal
Observations can be categorized and ranked; however, differences between the ranked observations are meaningless
nominal
Observations differ merely by name or label.
ratio
Observations have all the characteristics of an interval-scaled variable as well as a true zero point; thus, meaningful ratios can be calculated.
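The interval/ratio distinction can be checked with a little arithmetic. This sketch uses temperature, a common illustration that does not appear in the cards. Celsius is an interval scale with an arbitrary zero, while Kelvin has a true zero (absolute zero) and is therefore a ratio scale.

```python
# Celsius is an interval scale: 0 degrees C is an arbitrary reference point,
# so ratios of Celsius readings are not meaningful.
def celsius_to_kelvin(c):
    return c + 273.15  # Kelvin has a true zero (absolute zero), so it is a ratio scale

warm, cool = 20.0, 10.0
naive_ratio = warm / cool  # 2.0, but 20 C is NOT "twice as hot" as 10 C
kelvin_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)  # about 1.035

# Differences are meaningful on both scales: the gap is 10 degrees either way.
diff_c = warm - cool
diff_k = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)
```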
Database Management System (DBMS)
Popular ___________ packages include Oracle, IBM DB2, SQL Server, MySQL, and Microsoft Access.
imputation
The ___________ strategy replaces missing values with some reasonable imputed values
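A minimal sketch of the imputation strategy, assuming missing values are stored as `None`. The "reasonable imputed value" here is the mean of the observed values. This is only one common choice; the median or the most frequent value are other options.

```python
from statistics import mean

# Hypothetical column of incomes with missing entries recorded as None.
incomes = [52.0, None, 61.0, 47.0, None]

# Mean imputation: replace each missing value with the mean of the observed values.
observed = [x for x in incomes if x is not None]
fill_value = mean(observed)  # (52 + 61 + 47) / 3 = 53.33...
imputed = [fill_value if x is None else x for x in incomes]
```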
Structured Query Language (SQL)
The most popular query language used today is
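A short SQL query run through Python's built-in `sqlite3` module. The `customers` table and its rows are hypothetical. The query states *what* to retrieve, and the DBMS works out how to retrieve it.

```python
import sqlite3

# Hypothetical table created in an in-memory SQLite database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, state) VALUES (?, ?, ?)",
    [(1, "Ana", "CA"), (2, "Ben", "NY"), (3, "Cho", "CA")],
)

# SELECT ... WHERE ... ORDER BY: retrieve the names of customers in California.
rows = conn.execute(
    "SELECT name FROM customers WHERE state = ? ORDER BY name", ("CA",)
).fetchall()
print(rows)  # [('Ana',), ('Cho',)]
```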
subsetting
The process of extracting portions of a data set that are relevant to the analysis is called
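A minimal sketch of subsetting, assuming the data set is a list of records with hypothetical field names. Subsetting can keep only the relevant rows (observations), only the relevant columns (variables), or both.

```python
# Hypothetical records; in practice these might come from a csv file or a database.
records = [
    {"id": 1, "region": "East", "sales": 120, "notes": "n/a"},
    {"id": 2, "region": "West", "sales": 95,  "notes": "late"},
    {"id": 3, "region": "East", "sales": 80,  "notes": ""},
]

# Subsetting rows: keep only the observations relevant to the analysis (East region).
east = [r for r in records if r["region"] == "East"]

# Subsetting columns: keep only the variables needed (drop "notes" and "region").
keep = ("id", "sales")
subset = [{k: r[k] for k in keep} for r in east]
print(subset)  # [{'id': 1, 'sales': 120}, {'id': 3, 'sales': 80}]
```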
data wrangling
__________ helps improve data quality, reduce the time and effort required to perform analytics, and reveal the true intelligence in the data.
Prescriptive analytics
Examples of _____________ include providing advice on scheduling employees' work hours and adjusting supply levels to meet customer demand, selecting a mix of products to manufacture, choosing an investment portfolio to meet a financial goal, or targeting marketing campaigns to specific customer groups on a limited budget.
data ethics
a branch of ethics that studies and evaluates moral problems related to data
data warehouse
a central repository of data from multiple functional areas within an organization
database
a collection of data logically organized to enable easy retrieval, management, and distribution of data.
Delimited format
a character, called a delimiter, separates the data values in each row; when the delimiter is a comma, the file is called a comma-delimited or comma-separated values (csv) file.
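A small example of reading delimited text with Python's standard `csv` module. The file content and column names are hypothetical, and the file is held in memory rather than on disk. The reader splits each row on the delimiter and respects quotes, so a comma inside a quoted field is not treated as a separator.

```python
import csv
import io

# A small comma-separated (csv) file held in memory; the columns are hypothetical.
text = 'name,age,city\nAna,34,Boston\nBen,29,"Austin, TX"\n'

# csv.reader splits each row on the delimiter (a comma by default) and respects
# quoting, so the comma inside "Austin, TX" stays part of the value.
rows = list(csv.reader(io.StringIO(text)))
print(rows)
# [['name', 'age', 'city'], ['Ana', '34', 'Boston'], ['Ben', '29', 'Austin, TX']]

# The same reader handles other delimiters, e.g. a tab-delimited file.
tab_rows = list(csv.reader(io.StringIO("a\tb\n1\t2\n"), delimiter="\t"))
```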
XML
a simple text-based markup language for representing structured data. It uses user-defined markup tags to specify the structure of data.
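A minimal sketch of reading XML with Python's standard `xml.etree.ElementTree` module. The tag names (`customers`, `customer`, `name`, `state`) are user-defined and made up for this example. They are what give the data its structure.

```python
import xml.etree.ElementTree as ET

# User-defined tags describe the structure: a list of customers, each with fields.
doc = """<customers>
  <customer id="1"><name>Ana</name><state>CA</state></customer>
  <customer id="2"><name>Ben</name><state>NY</state></customer>
</customers>"""

root = ET.fromstring(doc)
# Walk the structure defined by the tags and pull out each record.
records = [
    (c.get("id"), c.findtext("name"), c.findtext("state"))
    for c in root.findall("customer")
]
print(records)  # [('1', 'Ana', 'CA'), ('2', 'Ben', 'NY')]
```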
omission strategy
also called complete-case analysis, recommends that observations with missing values be excluded from the analysis.
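A minimal sketch of the omission strategy (complete-case analysis), assuming missing values are stored as `None` in hypothetical records. Any observation with at least one missing value is dropped.

```python
# Hypothetical observations; None marks a missing value.
rows = [
    {"age": 34, "income": 52.0},
    {"age": None, "income": 61.0},
    {"age": 29, "income": None},
    {"age": 41, "income": 47.0},
]

# Complete-case analysis: keep only the rows with no missing values.
complete = [r for r in rows if all(v is not None for v in r.values())]
print(len(rows), "->", len(complete))  # 4 -> 2
```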
dummy variable
also referred to as an indicator or a binary variable, takes on values of 1 or 0 to describe two categories of a categorical variable.
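A minimal sketch of creating a dummy variable from a two-category variable. The variable name and categories are hypothetical. The category coded 1 is an arbitrary choice, and the other category becomes 0.

```python
# Hypothetical categorical variable with two categories.
employment = ["full-time", "part-time", "full-time", "part-time"]

# Dummy (indicator) variable: 1 if the observation is full-time, 0 otherwise.
full_time = [1 if e == "full-time" else 0 for e in employment]
print(full_time)  # [1, 0, 1, 0]
```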
data
are compilations of facts, figures, or other contents, both numerical and nonnumerical
discrete variable
assumes a countable number of values
Data Privacy / Information privacy
branch of data security related to the proper collection, usage, and transmission of data.
continuous variable
characterized by uncountable values within an interval. Examples include weight, height, time, and investment return.
Business analytics
combines qualitative reasoning with quantitative tools to identify key business problems and translate data analysis into decisions that improve business performance.
Data Privacy / Information privacy
confidentiality, transparency, and accountability are the three key principles of _______________
population
consists of all items of interest in a statistical problem.
relational database
consists of one or more logically related data tables, where each data table is a two-dimensional grid that consists of rows and columns.
Cross-sectional data
data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time.
Time series data
data collected over several time periods focusing on certain groups of people, specific events, or objects.
unstructured data
do not conform to a predefined, row-column format. They tend to be textual (e.g., written reports, e-mail messages, doctor's notes, or open-ended survey responses) or have multimedia contents (e.g., photographs, videos, and audio data).
fixed-width format
each column starts and ends at the same place in every row. The actual data are stored as plain text characters.
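A minimal sketch of parsing fixed-width text. It assumes a hypothetical layout in which the name occupies characters 0 to 9, the age characters 10 to 11, and the city starts at character 13. Every field sits at the same position in every row, so slicing at fixed offsets recovers it.

```python
# Hypothetical fixed-width rows: name in positions 0-9, age in 10-11, city from 13 on.
lines = [
    "Ana       34 Boston",
    "Benjamin  29 Austin",
]

# Field name -> (start, end) character offsets; None means "to the end of the line".
FIELDS = {"name": (0, 10), "age": (10, 12), "city": (13, None)}

def parse(line):
    # Slice each field at its fixed offsets and strip the padding spaces.
    return {k: line[a:b].strip() for k, (a, b) in FIELDS.items()}

records = [parse(l) for l in lines]
```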
Prescriptive analytics
explores several possible actions and suggests a course of action.
Descriptive Analytics
Financial reports, public health statistics, enrollment at universities, student report cards, and crime rates across regions and time are examples of ____________.
Business analytics
is a broad topic, encompassing statistics, computer science, and information systems with a wide variety of applications in marketing, human resource management, economics, finance, health, sports, politics, etc.
entity
is a generalized category to represent persons, places, things, or events about which we want to store data in a database table
entity-relationship diagram (ERD)
is a graphical representation used to model the structure of the data
"Not Only SQL" database
is a non-relational database that supports the storage of a wide range of data types including structured, semistructured, and unstructured data
composite primary key
is a primary key that contains more than one attribute.
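A minimal sketch of a composite primary key in SQLite. It assumes a hypothetical enrollment table in which neither `student_id` nor `course_id` alone is unique, but the pair of them is. The DBMS rejects a second row with the same pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enrollment table: a student takes many courses and a course has many
# students, so only the PAIR (student_id, course_id) identifies a row uniquely.
conn.execute("""
    CREATE TABLE enrollment (
        student_id INTEGER,
        course_id  TEXT,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)
    )
""")
conn.execute("INSERT INTO enrollment VALUES (1, 'STAT101', 'A')")
conn.execute("INSERT INTO enrollment VALUES (1, 'ACCT200', 'B')")  # same student, new course: OK

try:
    conn.execute("INSERT INTO enrollment VALUES (1, 'STAT101', 'C')")  # duplicate pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```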
information
is a set of data that are organized and processed in a meaningful and purposeful way
HTML
is a simple text-based markup language for displaying content in web browsers.
data mart
is a small-scale data warehouse or a subset of the enterprise data warehouse that focuses on one particular subject or decision area
JSON
is a standard for transmitting human-readable data in compact files.
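A minimal sketch of reading and writing JSON with Python's standard `json` module. The record and its fields are hypothetical. Passing `separators=(",", ":")` removes the optional whitespace, which gives the compact form.

```python
import json

# A hypothetical record serialized as JSON text (human-readable and compact).
text = '{"id": 1, "name": "Ana", "tags": ["vip", "east"], "active": true}'

record = json.loads(text)  # JSON text -> Python dict
print(record["tags"][0])   # vip

compact = json.dumps(record, separators=(",", ":"))  # Python dict -> compact JSON text
print(compact)  # {"id":1,"name":"Ana","tags":["vip","east"],"active":true}
```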
Sample
is a subset of the population.
primary key (PK)
is an attribute that uniquely identifies each instance of the entity
foreign key (FK)
is an attribute in one entity that is defined as the primary key of a related entity, linking the two tables
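A minimal sketch of a foreign key in SQLite, using hypothetical `customer` and `orders` tables. `orders.customer_id` holds the primary key of the related customer. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, so an order pointing to a nonexistent customer is rejected only with that setting enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
# orders.customer_id is a foreign key: it holds the primary key of the related customer.
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        amount      REAL
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")  # valid: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.0)")  # no customer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```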
knowledge
is derived from a blend of data, contextual information, experience, and intuition.
data modeling
is the process of defining the structure of a database
binning
is the process of transforming numerical variables into categorical variables by grouping the numerical values into a small number of groups or bins.
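A minimal sketch of binning with the standard `bisect` module. The ages, bin edges, and labels are hypothetical choices. Each value is mapped to the bin whose index equals the number of edges less than or equal to it.

```python
from bisect import bisect_right

# Hypothetical ages and bin edges; the edges and labels are illustrative choices.
ages = [19, 25, 34, 47, 52, 68]
edges = [30, 50]                       # bins: under 30, 30-49, 50 and over
labels = ["young", "middle", "senior"]

# bisect_right returns how many edges are <= the value, which is the bin index.
binned = [labels[bisect_right(edges, a)] for a in ages]
print(binned)  # ['young', 'young', 'middle', 'middle', 'senior', 'senior']
```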
big data
massive volume of both structured and unstructured data that are extremely difficult to manage, process, and analyze using traditional data-processing tools
Descriptive analytics
refers to gathering, organizing, tabulating, and visualizing data to summarize "what has happened?"
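A minimal sketch of descriptive analytics on hypothetical sales figures, using the standard `statistics` and `collections` modules. It tabulates and summarizes data that already exist, which answers "what has happened?"

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sales figures and the region each sale came from.
sales = [120, 95, 80, 150, 95]
regions = ["East", "West", "East", "East", "West"]

# Summary statistics describe what has already happened.
summary = {
    "count": len(sales),
    "mean": mean(sales),      # 540 / 5 = 108
    "median": median(sales),  # middle of sorted [80, 95, 95, 120, 150] = 95
    "min": min(sales),
    "max": max(sales),
}
by_region = Counter(regions)  # frequency table: East 3, West 2
```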
veracity
refers to the credibility and quality of data
structured data
reside in a predefined, row-column format
Data Wrangling
the process of retrieving, cleansing, integrating, transforming, and enriching data to support analytics.
data management
the process that an organization uses to acquire, organize, store, manipulate, and distribute data.
Predictive analytics
using historical data to predict "what could happen in the future?"
Prescriptive analytics
using optimization and simulation algorithms to provide advice on "what should we do?"
Three Vs of Big Data
velocity, volume, variety
