BA II Chp.1

Variety

Data come in all types, forms, and levels of granularity, both structured and unstructured. They may include numbers, text, and figures as well as audio, video, emails, and other multimedia elements.

Veracity

In addition to the three Vs, veracity refers to the credibility and quality of data.

Structured data that is machine-generated

includes information from manufacturing sensors (rotations per minute), speed cameras (miles per hour), web server logs (number of visitors), etc.

Structured data that is human-generated

includes information on price, income, retail sales, age, gender, etc.

Unstructured data that is machine-generated

includes satellite images, meteorological data, surveillance video data, traffic camera images, and others.

Unstructured data that is human-generated

includes texts of internal emails, social media data, presentations, mobile phone conversations, text message data, and so on.

Big data

is a catchphrase, meaning a massive volume of both structured and unstructured data that are extremely difficult to manage, process, and analyze using traditional data-processing tools.

eXtensible Markup Language (XML)

is a simple language for representing structured data. It uses markup tags to define the structure of data, is case-sensitive, and is designed to support readability.
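
A minimal sketch of how markup tags define structure and why case sensitivity matters, using Python's standard `xml.etree.ElementTree` module. The customer record and its tag names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the tag names are made up for this example.
xml_text = """
<customer>
    <name>Ana Silva</name>
    <Age>34</Age>
</customer>
""".strip()

root = ET.fromstring(xml_text)

name = root.find("name").text       # the tag matches exactly
age_lower = root.find("age")        # no match: the tag is <Age>, not <age>
age_exact = root.find("Age").text   # matches because the case agrees

print(name, age_lower, age_exact)
```

Looking up `age` finds nothing because XML treats `<Age>` and `<age>` as different tags.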

Sample

is a subset of the population. We examine sample data to make inferences about the population.

A Continuous Variable

is characterized by uncountable values within an interval. Examples include weight, height, time, and investment return. In practice, however, continuous variables are often measured in discrete values (e.g., because of rounding).
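
A small sketch of the rounding point, assuming hypothetical weights in kilograms: three distinct continuous values collapse into one recorded value once they are rounded to a single decimal place.

```python
# Hypothetical weights in kilograms; the values are made up for illustration.
weights_kg = [70.4832, 70.5127, 70.4961]

# Recording to one decimal place turns a continuous measurement
# into one of a countable set of values.
recorded = [round(w, 1) for w in weights_kg]

print(len(set(weights_kg)), "distinct true values")
print(len(set(recorded)), "distinct recorded value(s)")
```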

knowledge

is derived from a blend of data, contextual information, experience, and intuition.

Data for any variable can be classified into one of four major measurement scales:

nominal, ordinal, interval, or ratio

cross-sectional data

refer to data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time.

time series data

refer to data collected over several time periods focusing on certain groups of people, specific events, or objects.
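
To make the contrast with cross-sectional data concrete, here is a rough sketch of the two layouts. The firms and sales figures are hypothetical and serve only as an illustration.

```python
# Hypothetical quarterly sales figures (in $ thousands), for illustration only.

# Cross-sectional: many subjects (firms) observed at the same point in time.
cross_sectional = {"Firm A": 120, "Firm B": 95, "Firm C": 143}

# Time series: one subject (Firm A) observed over several time periods.
time_series = [("2020-Q1", 120), ("2020-Q2", 131), ("2020-Q3", 127)]

# A cross-section supports comparisons across subjects...
top_firm = max(cross_sectional, key=cross_sectional.get)

# ...while a time series supports comparisons across periods.
changes = [
    later - earlier
    for (_, earlier), (_, later) in zip(time_series, time_series[1:])
]

print(top_firm, changes)
```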

The ordinal scale

reflects a stronger level of measurement compared to the nominal scale. We are able to both categorize and rank the data with respect to some characteristic or trait.

Nominal scale

represents the least sophisticated level of measurement. If presented with this data, all we can do is categorize or group the data. The values in the data set differ merely by label or name.

The ratio scale

represents the strongest level of measurement. Has all the characteristics of the interval scale as well as a true zero point, which allows us to interpret the ratios between observations.

Sample data are generally collected as

cross-sectional data or time series data

Predictive and prescriptive analytics example for Apple Music

"What are the key factors that influence a U.S.-based female listener's music choice?" This answer cannot be found in an enterprise database.

Business intelligence (BI) example for Apple Music or Spotify

"During the first quarter of 2020, how many country songs recommended by the music service were skipped by U.S.-based female listeners within five seconds of playing?"

Structured data examples include

- numbers, dates, and groups of words and numbers, typically stored in a tabular format
- point-of-sale and financial data

JSON (JavaScript Object Notation)

A human-readable text format for data interchange that defines attributes and values in a document.
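
A minimal sketch of attributes and values in a JSON document, read with Python's standard `json` module. The song record and its attribute names are hypothetical.

```python
import json

# Hypothetical document; the attribute names are illustrative only.
json_text = (
    '{"title": "Song A", "genre": "country", '
    '"skipped": true, "seconds_played": 4}'
)

record = json.loads(json_text)   # parse the text into a Python dict
print(record["genre"])
print(record["skipped"])

# Writing the dict back out produces text that another program can read.
round_trip = json.dumps(record)
```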

Volume

An immense amount of data is compiled from a single source or a wide range of sources, including business transactions, household and personal devices, manufacturing equipment, social media, and other online portals.

Descriptive Analytics is often referred to as...

Business intelligence (BI). It uses past data integrated from multiple sources to inform decision making and identify problems and solutions
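
A BI question about past data, such as how many recommended country songs U.S.-based female listeners skipped within five seconds in the first quarter of 2020, can be answered by filtering and counting stored records. The sketch below assumes a hypothetical play log. The field names, the rows, and the reading of "within five seconds" as five seconds or less are all assumptions.

```python
from collections import namedtuple
from datetime import date

# Hypothetical play log; field names and rows are made up for illustration.
Play = namedtuple(
    "Play",
    "genre listener_country listener_gender recommended played_on seconds_played",
)

plays = [
    Play("country", "US", "F", True, date(2020, 2, 14), 3),
    Play("country", "US", "F", True, date(2020, 3, 1), 45),   # not skipped
    Play("pop", "US", "F", True, date(2020, 1, 20), 2),       # wrong genre
    Play("country", "US", "M", True, date(2020, 1, 5), 4),    # male listener
    Play("country", "US", "F", True, date(2020, 4, 2), 1),    # second quarter
    Play("country", "US", "F", True, date(2020, 1, 31), 5),
    Play("country", "US", "F", False, date(2020, 2, 2), 2),   # not recommended
]

q1_start, q1_end = date(2020, 1, 1), date(2020, 3, 31)

# Count the plays that satisfy every condition in the question.
skipped = sum(
    1
    for p in plays
    if p.genre == "country"
    and p.listener_country == "US"
    and p.listener_gender == "F"
    and p.recommended
    and q1_start <= p.played_on <= q1_end
    and p.seconds_played <= 5   # assumption: "within five seconds" includes 5
)

print(skipped)
```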

Information

Data that have been organized, analyzed, and processed in a meaningful and purposeful way.

Variable

For business analytics, we invariably focus on people, firms, or events with particular characteristics. When a characteristic of interest differs in kind or degree among various observations (records), then the characteristic can be termed a variable.

Structured data

Generally reside in a predefined, row-column format. Spreadsheet or database applications are used to enter, store, query, and analyze structured data. Often consist of numerical information that is objective and is not open to interpretation.

Data

In general, data are compilations of facts, figures, or other contents, both numerical and nonnumerical. Data of all types and formats are generated from multiple sources.

tabular format

The presentation of information such as text and numbers in tables.

Ratio scale continued

The ratio scale is used in many business applications. Variables such as sales, profits, and inventory levels are expressed on the ratio scale. A meaningful zero point allows us to state, for example, that profits for firm A are double those of firm B. Variables such as weight, time, and distance are also measured on a ratio scale because zero is meaningful.

The three Vs of big data

Volume, Variety, Velocity

Predictive Analytics Answers

What could happen? Example: identifying customers who are likely to respond to specific marketing campaigns, or admitted students who are likely to enroll.

Descriptive Analytics Answers

What has happened? example: financial reports, enrollment at universities, student report cards.

Prescriptive Analytics Answers

What should we do? It explores several possible actions and suggests a course of action. Example: choosing an investment portfolio to meet a financial goal, or targeting marketing campaigns to specific customer groups on a limited budget.

Nominal and ordinal scales

are used for categorical variables

Interval and ratio scales

are used for numerical variables.

A Discrete Variable

assumes a countable number of values. Examples: you cannot have 1.3 children or score 90.25 points in a basketball game, and a stock price can take on a value of $20.37 or $20.38 but cannot take on a value between these two points.

Population Data

consists of all observations or items of interest in an analysis

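We rely on sampling because we are unable to use population data for two main reasons

cost (obtaining information on the entire population is expensive) and impossibility (it is often not possible to examine every member of the population)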
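
A rough sketch of the idea, assuming a simulated population of hypothetical incomes: a modest random sample is examined in place of every member, and its mean is used to make an inference about the population mean. The population size, the sample size, and the distribution are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the sketch is repeatable

# Simulated population of 10,000 hypothetical incomes (assumed, not real data).
population = [rng.gauss(50_000, 12_000) for _ in range(10_000)]

# Examine only 200 members instead of all 10,000.
sample = rng.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f}")
```

In a real analysis the population mean would usually be unknown. It is computed here only so the two values can be compared.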

Velocity

data from a variety of sources get generated at a rapid speed

Value

derived from big data is perhaps the most important aspect of any analytics initiative.

Unstructured data (or unmodeled data)

does not conform to a predefined, row-column format the way structured data do. Tends to be textual (e.g., written reports, e-mail messages, doctors' notes) or to have multimedia content (e.g., photographs, videos, and audio data).

In a data file with a fixed-width format (or fixed-length format) used to store tabular data

each column starts and ends at the same place in every row. The file stores only raw data, and the fixed widths limit the number of characters in each field.
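
A minimal sketch of a fixed-width layout, assuming hypothetical column positions (name in characters 0 to 9, age in 10 to 12, city in 13 to 20). It also shows the character limit: a name longer than its column is cut off.

```python
# Hypothetical layout: (field name, start position, end position).
FIELDS = [("name", 0, 10), ("age", 10, 13), ("city", 13, 21)]


def to_fixed_width(name, age, city):
    """Pad or truncate each value so every column has a fixed width."""
    return f"{name:<10.10}{age:>3}{city:<8.8}"


def parse(line):
    """Slice each field out of the line by its fixed position."""
    return {field: line[start:end].strip() for field, start, end in FIELDS}


rows = [
    to_fixed_width("Ana", 34, "Lisbon"),
    to_fixed_width("Christopher", 7, "Lyon"),   # 11 characters, column holds 10
]

for row in rows:
    print(repr(row), parse(row))
```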

In a delimited file,

each piece of data can contain as many characters as applicable.

Another widely used file format to store tabular data is the delimited format

each piece of data is separated by a comma. A comma in this format is called a delimiter, and the file is called a comma-delimited or comma-separated values (CSV) file.
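
A short sketch of reading comma-delimited text with Python's standard `csv` module. The column names and rows are hypothetical. Each value can be as long as it needs to be because the delimiter, not the position, marks where a field ends.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a file.
csv_text = "name,age,city\nAna,34,Lisbon\nChristopher,7,Lyon\n"

# DictReader uses the first row as column names and splits on commas.
reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)

for record in records:
    print(record["name"], record["age"], record["city"])
```

Note that every value is read as text, so `age` would need converting before any arithmetic.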

HTML (Hypertext Markup Language)

the predominant language used to create web pages

Interval Scale

We are able to categorize and rank the data as well as find meaningful differences between observations. Example: on the Fahrenheit scale, not only is 60 degrees F hotter than 50 degrees F, but the same difference of 10 degrees also exists between 90 and 80 degrees F. The main drawback is that the value zero is arbitrarily chosen.
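
A worked sketch of why an arbitrary zero matters, using the standard Fahrenheit-to-Celsius conversion. Equal differences stay equal after a change of units, but a ratio such as "80 is twice 40" does not survive the conversion. The specific temperatures are chosen only for illustration.

```python
import math


def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9


# Differences are meaningful: equal gaps in Fahrenheit remain equal in Celsius.
gap_low = fahrenheit_to_celsius(60) - fahrenheit_to_celsius(50)
gap_high = fahrenheit_to_celsius(90) - fahrenheit_to_celsius(80)
print(math.isclose(gap_low, gap_high))

# Ratios are not: the "twice as hot" reading holds only for the Fahrenheit numbers.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)
print(ratio_f, ratio_c)
```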

Ordinal scale weakness

We cannot interpret the difference between the ranked values because the actual numbers used are arbitrary. Example: the category "excellent" may be assigned a rating of 5.
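
A small sketch of this weakness, assuming two hypothetical numeric codings of the same five categories. Both codings give the same ranking, but the gaps between the codes differ, so the gaps carry no meaning.

```python
# Hypothetical survey responses.
ratings = ["good", "excellent", "poor", "very good", "fair"]

# Two arbitrary codings that preserve the same order.
coding_a = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
coding_b = {"poor": 10, "fair": 20, "good": 50, "very good": 60, "excellent": 100}

# The ranking is identical under either coding.
ranked_a = sorted(ratings, key=coding_a.get)
ranked_b = sorted(ratings, key=coding_b.get)
print(ranked_a == ranked_b)

# The gaps (fair to good, very good to excellent) depend on the coding.
gaps_a = (coding_a["good"] - coding_a["fair"],
          coding_a["excellent"] - coding_a["very good"])
gaps_b = (coding_b["good"] - coding_b["fair"],
          coding_b["excellent"] - coding_b["very good"])
print(gaps_a, gaps_b)
```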

For a categorical variable

we use labels or names to identify the distinguishing characteristic of each observation. It can be defined by more than two categories. Example: marital status, course grade

For a numerical variable

we use numbers to identify the distinguishing characteristic of each observation. They are either discrete or continuous

