Chapter 1

big data

A term used to describe a massive volume of both structured and unstructured data that are extremely difficult to manage, process, and analyze using traditional data-processing tools; does not imply complete (population) data

Variety

Data come in all types, forms, and granularity, both structured and unstructured. Can include text, numbers, figures, audio, video, emails, and other multimedia elements

Velocity

Data from a variety of sources get generated at a rapid speed

3 characteristics of big data

Volume, Velocity, Variety

information

a set of data that are organized and processed in a meaningful and purposeful way

HyperText Markup Language (HTML)

a simple text-based markup language for displaying content in web browsers

eXtensible Markup Language (XML)

a simple text-based markup language for representing structured data. Uses user-defined markup tags to specify the structure of data
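
Example (a minimal sketch, not part of the original card): the tag names `customers`, `customer`, `name`, and `income` below are made up for illustration, which is exactly what "user-defined markup tags" allows. Python's standard `xml.etree.ElementTree` module is assumed as the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the tags are user-defined and describe the structure.
xml_text = """
<customers>
  <customer id="1"><name>Ana</name><income>52000</income></customer>
  <customer id="2"><name>Ben</name><income>61000</income></customer>
</customers>
""".strip()

root = ET.fromstring(xml_text)

# Convert the tagged structure into row-like dictionaries.
rows = [
    {
        "id": customer.get("id"),                      # attribute value (string)
        "name": customer.findtext("name"),             # text inside <name>
        "income": int(customer.findtext("income")),    # text converted to a number
    }
    for customer in root.findall("customer")
]

print(rows)
```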

JavaScript Object Notation (JSON)

a standard for transmitting human-readable data in compact files
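
Example (a minimal sketch, assuming Python's standard `json` module; the record and its field names are hypothetical): a data record is written out as human-readable JSON text and then read back unchanged.

```python
import json

# Hypothetical record used only for illustration.
record = {"name": "Ana", "income": 52000, "tags": ["retail", "online"]}

# Serialize to a compact, human-readable text string.
text = json.dumps(record)

# Parse the text back into Python objects.
restored = json.loads(text)

print(text)
print(restored == record)
```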

population

all observations or items of interest in an analysis

Volume

an immense amount of data is compiled from a single source or a wide range of sources, including business transactions, household and personal devices, manufacturing equipment, social media, and other online portals

numerical variable

assume meaningful numerical values

categorical variable

assume names or labels

discrete variable

assumes a countable number of values

continuous variable

characterized by uncountable values within an interval

data

compilations of facts, figures, or other contents, both numerical and nonnumerical

cross-sectional data

data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time

time series data

data collected over several time periods focusing on certain groups of people, specific events, or objects

structured data

data that reside in a predefined, row-column format

knowledge

derived from a blend of data, contextual information, experience, and intuition

unstructured data

do not conform to a predefined, row-column format; textual or multimedia contents

delimited format

each column is separated by a delimiter such as a comma. Each column can contain as many characters as applicable
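
Example (a minimal sketch, assuming a comma as the delimiter and Python's standard `csv` module; the rows are hypothetical): note that the values in a column have different lengths, since only the delimiter marks where a column ends.

```python
import csv
import io

# Hypothetical comma-delimited text; column values vary in length.
delimited_text = (
    "name,city,income\n"
    "Ana,Lisbon,52000\n"
    "Benjamin,Rio de Janeiro,61000\n"
)

# csv.reader splits each line on the delimiter.
rows = list(csv.reader(io.StringIO(delimited_text), delimiter=","))

header, records = rows[0], rows[1:]
print(header)
print(records)
```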

fixed-width format

every column starts and ends at the same place in every row; data stored as plain text characters in a digital file
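
Example (a minimal sketch; the column widths of 10, 16, and 6 characters and the records are assumptions chosen for illustration): each line is padded so that every column occupies the same character positions, and parsing is done by slicing at those positions.

```python
# Hypothetical records and column layout (start, end) in characters.
records = [("Ana", "Lisbon", 52000), ("Benjamin", "Rio de Janeiro", 61000)]
COLUMNS = {"name": (0, 10), "city": (10, 26), "income": (26, 32)}

# Write: pad each value to its fixed width (text left-aligned, number right-aligned).
lines = [f"{name:<10}{city:<16}{income:>6}" for name, city, income in records]

def parse_fixed_width(line):
    """Slice one line at the fixed column positions and trim the padding."""
    return {field: line[start:end].strip() for field, (start, end) in COLUMNS.items()}

# Read: recover the values from the fixed positions.
parsed = [parse_fixed_width(line) for line in lines]

for line in lines:
    print(repr(line))
print(parsed)
```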

descriptive analytics

gathering, organizing, tabulating, and visualizing data to summarize "what has happened?"
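
Example (a minimal sketch using only Python's standard library; the sales figures and region names are hypothetical): tabulating counts and averages per group is a simple case of summarizing what has happened.

```python
from collections import Counter
from statistics import mean

# Hypothetical past sales: (region, amount).
sales = [("North", 120), ("South", 95), ("North", 130), ("West", 80), ("South", 105)]

# Tabulate: number of sales per region.
counts = Counter(region for region, _ in sales)

# Summarize: average sale amount per region.
averages = {
    region: mean(amount for r, amount in sales if r == region)
    for region in counts
}

for region in sorted(counts):
    print(region, counts[region], averages[region])
```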

variable

general characteristic being observed on a set of people, objects, or events, where each observation varies in kind or degree

nominal scale

least sophisticated level of measurement; observations differ merely by name or label

interval scale

observations can be categorized and ranked, and differences between observations are meaningful. The main drawback is that the value of zero is arbitrarily chosen

ordinal scale

observations can be categorized and ranked; however, differences between the ranked observations are meaningless

ratio scale

observations have all the characteristics of interval-scaled data as well as a true zero point; strongest level of measurement

Value

perhaps the most important aspect of any analysis initiative

advanced predictions

predictive and prescriptive analytics; focus on building predictive and prescriptive models that help organizations understand what might happen in the future

business intelligence (BI)

provides historical, current, and predictive views of business operations and environments and gives organizations a competitive advantage in the marketplace; descriptive analytics

Veracity

refers to the credibility and quality of data

machine-generated

structured: information from manufacturing sensors, speed cameras, web server logs; unstructured: satellite images, meteorological data, surveillance video data, traffic camera images

human-generated

structured: information on price, income, retail sales, gender, etc.; unstructured: texts of internal e-mails, social media data, presentations, mobile phone conversations, text message data, etc.

sample

subset of the population
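
Example (a minimal sketch, assuming a hypothetical population of 1,000 numbered items and simple random sampling with Python's standard `random` module; the seed is fixed only to make the run repeatable): the sample mean is an estimate of the population mean, not the same quantity.

```python
import random

# Hypothetical population: all items of interest, numbered 1 to 1000.
population = list(range(1, 1001))

# Draw a simple random sample of 50 items without replacement.
rng = random.Random(42)  # fixed seed, an arbitrary choice for repeatability
sample = rng.sample(population, k=50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(population_mean)
print(sample_mean)
```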

Business Analytics (BA)

uses data and statistical methods to gain insight into the data and provide decision makers with information they can act on

predictive analytics

using historical data to predict "what could happen in the future?"

prescriptive analytics

using optimization and simulation algorithms to provide advice on "what should we do?"

