Week 1

Who Uses Big Data?

Big data plays a role in almost every industry. SAS (n.d.) provides a summary of some of the industries affected by big data.

Manufacturing

Boost quality and output while minimizing waste; support for more agile business decisions.

What is the difference between a survey and a poll?

Both may use sample sets of participants that represent the group that is being surveyed or polled. A poll typically asks one question while a survey is generally used to ask a range of questions.

Retail

Building customer relationships, marketing, handling transactions, revitalizing business

(Big Data) (SAS Institute) Complexity

Data comes from multiple sources, which makes it difficult to link, match, cleanse, and transform data across systems. Connecting relationships, hierarchies, and multiple data linkages is important.

(Big Data) Variety:

Data comes in all types of formats—from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.

Data summary

Data does not depend on information, but information depends on data. Raw data by itself has no meaning. Information results when context or meaning is added to the raw data, resulting in at least the first level of understanding the answer to whatever question prompted the gathering of that data.

Data

Data is a fact or set of facts that have been gathered about an object, idea, place, person, etc. The facts are stored or represented in the form of numbers, measurements, words, descriptions, or observations.

(Big Data) Velocity:

Data streams into the data center at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors, and smart metering drive the need to deal with torrents of data in near-real time.

Nielsen

Households that participate are selected at random from a predefined sample based on census data. The census data provides critical information on household income, size, age of residents, etc. A certain number of houses from each group is selected.
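The selection described above (group households by census attributes, then pick a fixed number at random from each group) resembles stratified random sampling. The following is a minimal sketch under that assumption, not Nielsen's actual procedure. The function name `stratified_sample`, the `income_band` field, and the toy household list are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(households, group_key, per_group, seed=None):
    """Pick up to `per_group` households at random from each group."""
    rng = random.Random(seed)  # seeded generator so the draw is repeatable
    groups = defaultdict(list)
    for household in households:
        groups[household[group_key]].append(household)

    sample = []
    for name in sorted(groups):  # fixed group order keeps results reproducible
        members = groups[name]
        k = min(per_group, len(members))  # a group may be smaller than per_group
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical census-style records: 30 households in three income bands.
households = [
    {"id": i, "income_band": band}
    for i, band in enumerate(["low", "middle", "high"] * 10)
]

picked = stratified_sample(households, "income_band", per_group=2, seed=1)
```

Each income band contributes the same number of households, so no band is left out of the sample by chance.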

(Big Data) (SAS Institute) Variability

In addition to the increasing velocities and varieties of data, data flows can be inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal, and event-triggered peak data loads can be challenging to manage.

Government

Managing utilities, running agencies, dealing with traffic congestion, preventing crime. Governments must also address issues of transparency and privacy.

power of information

Numbers by themselves are useless. Until you know the context, the data only provides the foundation for eventually organizing it in a way that yields the information needed to answer the questions or tell the story.

(Big Data) Volume

Organizations collect data from a variety of sources, including business transactions, social media, and information from sensor or machine-to-machine data. In the past, storage would have been an issue, but new technologies have helped.

Health Care

Respecting privacy as it relates to patient records, treatment plans, and prescription information while at the same time uncovering insights into improving patient care.

What Are the Sources for Big Data?

Streaming data, Social media data, Publicly available sources

Big Data

The term "big data" became mainstream in the early 2000s via the work of industry analyst Doug Laney. His definition of the term incorporates volume, velocity, and variety (SAS, n.d.).

Gallup's method of selecting polling participants

Two of the most familiar polling companies are Gallup and Nielsen. Gallup generates a list of all phone numbers (landline and cell phone) in the United States and then uses a subset of that list, which covers all geographical areas based on area codes, to call and interview individuals.

Banking

Understanding customers and customer satisfaction, minimizing risk and fraud while maintaining regulatory compliance

datum

A single fact is a datum; data is the plural form.

What Is the Importance of Big Data?

Analysis of big data can find answers to questions about potential reductions in cost and time, and help in making smart decisions about new product development (SAS, n.d.). Other critical business tasks can be supported by using the gathered data to determine what has caused failures, issues, or defects, or to detect and mitigate fraudulent behavior before the organization is severely affected (SAS, n.d.). If done properly, data collection (and analysis) will allow a business to focus time, personnel, and resources on the issues that will generate the greatest returns.

surveys and polls

are the means by which the data is collected from the sample.

Discrete data

can only take certain distinct values, such as whole numbers. For example, there are 32 students in the class, the hard drive can store eight gigabytes, the test score was 89 percent.

Streaming data

comes from the information infrastructure within an organization. For example, all transactions accomplished via the IT systems within the business are captured on a daily or even hourly basis. This includes logs of daily activities and email or other types of messages received from internal sources.

Why Is Data Collected?

Data is collected to tell a story or solve a problem. Key elements are the question, the type of data collected, and the follow-on review.

Information is

data that has been converted into a form that makes understanding of the data useful; it is data with meaning

Publicly available sources

Examples include data.gov, the CIA World Factbook, and the European Union Open Data Portal. A browser search of "sources for data sets" will provide a long list of sources for data that addresses many areas of interest.

Qualitative data

descriptive data that includes such facts as color, texture, feel, description of an experience, perceptions of strengths or weaknesses, etc. This is data to which numbers are not normally assigned.

How Is Data Collected?

Methods include direct observation, census, sample, and physical measurements. Options include collecting data on yourself, finding data already collected, asking for updates, or gaining access to data not open to the public.

Quantitative data

facts which are presented as numbers such as test scores, number of students in the class, number of words on a page, capacity of a hard drive. Quantitative data can also be subdivided into discrete and continuous data.

Categorical data

groups the facts into a category such as "new" or "used" or "for sale" or "not for sale," etc.

Social media data

includes audio, photo, and video files that are retrieved from watching activity on Facebook, the business's website, or websites of related businesses or competitors. This can aid marketing, sales, and customer support functions.

Data that is

incorrect or used outside of the context for which it was gathered may result in incorrect information.

Data has

no value until it is converted into usable information ("Value" here only refers to the fact that, standing alone, raw data does not tell a story or answer a question. The data itself may have great "value," financial or otherwise, to the person or entity that seeks to use that data).

Sampling, Surveys, and Polls

It is often not cost-effective or even practical to contact every member of the group or the population for data input. Instead, most such studies are based on gathering responses from a sample, or a subset of the entire population. The results of sampling are considered to be representative of the population. Randomly selecting the subset of participants is the primary way of guaranteeing that anyone could have been selected.
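Random selection of a subset can be sketched in a few lines. This is an illustrative example only: the population of 1,000 numbered members, the sample size of 50, and the function name `simple_random_sample` are assumptions made for the sketch.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw `sample_size` distinct members; every member is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw is repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: members numbered 1 through 1000.
population = list(range(1, 1001))
sample = simple_random_sample(population, 50, seed=42)
```

Because `random.sample` draws without replacement, no member appears twice, and each member has the same chance of being chosen.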

In What Format Is Data Collected?

Obtain the data in machine-readable form, that is, in a form such that the data can be imported into a computer program. One of the most common formats for exchanging or importing data is comma-separated values (CSV).
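As a small illustration of what "machine-readable" means in practice, the sketch below imports CSV text with Python's standard `csv` module. The column names and the three student rows are made-up sample data, and `read_rows` is a hypothetical helper.

```python
import csv
import io

# Made-up CSV content: a header row, then one record per line.
csv_text = """student,score
Ana,89
Ben,74
Caro,93
"""

def read_rows(text):
    """Parse CSV text into a list of dicts, converting scores to integers."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"student": r["student"], "score": int(r["score"])} for r in reader]

rows = read_rows(csv_text)
average = sum(r["score"] for r in rows) / len(rows)
```

Once the rows are imported, the program can work with the values directly, for example by computing the average score.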

Continuous data

reflects a range into which the values may fall. Optimal tire pressures may fall anywhere between 30 PSI (pounds per square inch) and 33 PSI, including any fractional pressure in between these values.

Data remains

static. It does not necessarily improve over time; rather, data can decay as it becomes outdated or is no longer applicable to the question being asked.

Data can be

stored, copied, duplicated, modified, and/or moved.

Information becomes

the basis for understanding a question, or making inferences, or making decisions; it helps tell the story.

How Is Information Used?

Uses are tied very closely to a "need" that has been identified (that question or story) for gathering information. As such, the list does not directly indicate how the information is used, but why it was gathered. The assumption may be made that the information is then used to address the issue (Taylor, 1991).

Information results

when context is added to data—what, when, where, why, how the data was collected

