Week 1
Who Uses Big Data?
Big data plays a role in almost every industry. SAS (n.d.) provides a summary of some of the industries affected by big data.
Manufacturing
Boost quality and output while minimizing waste; support for more agile business decisions.
What is the difference between a survey and a poll?
Both may use sample sets of participants that represent the group being surveyed or polled. A poll typically asks one question, while a survey generally asks a range of questions.
Retail
Building customer relationships, marketing, handling transactions, revitalizing business
(Big Data) (SAS Institute) Complexity
Data comes from multiple sources, which makes it difficult to link, match, cleanse, and transform data across systems. Connecting relationships, hierarchies, and multiple data linkages is important.
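Here is a minimal sketch of the link-and-match problem. It assumes two hypothetical systems that identify the same customers by email address but store that address inconsistently. The records, field names, and the `normalize_key` and `link_records` helpers are invented for illustration. Real record linkage usually needs fuzzier matching than a trimmed, lower-cased key.

```python
# Hypothetical records describing the same customers, held in two separate systems.
crm_records = [
    {"email": " Alice@Example.com ", "name": "Alice Smith"},
    {"email": "bob@example.com", "name": "Bob Jones"},
]
billing_records = [
    {"email": "alice@example.com", "balance": 120.50},
    {"email": "BOB@EXAMPLE.COM", "balance": 0.0},
    {"email": "carol@example.com", "balance": 42.00},
]


def normalize_key(value: str) -> str:
    """Cleanse a matching key: trim surrounding whitespace and ignore letter case."""
    return value.strip().lower()


def link_records(left, right, key):
    """Join two lists of dicts on a cleansed key; records without a match are dropped."""
    index = {normalize_key(r[key]): r for r in right}
    linked = []
    for record in left:
        cleaned = normalize_key(record[key])
        match = index.get(cleaned)
        if match is not None:
            merged = {**record, **match}
            merged[key] = cleaned  # store the cleansed key, not either raw spelling
            linked.append(merged)
    return linked


linked = link_records(crm_records, billing_records, "email")
for row in linked:
    print(row)
```

Carol appears only in the billing system, so she does not show up in the linked result. Deciding what to do with such unmatched records is part of the cleansing work.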
(Big Data) Variety:
Data comes in all types of formats—from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
Data summary
Data does not depend on information, but information depends on data. Raw data by itself has no meaning. Information results when context or meaning is added to the raw data, resulting in at least the first level of understanding the answer to whatever question prompted the gathering of that data.
Data
Data is a fact or set of facts that have been gathered about an object, idea, place, person, etc. The facts are stored or represented in the form of numbers, measurements, words, descriptions, or observations.
(Big Data) Velocity:
Data streams into the data center at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors, and smart metering drive the need to deal with torrents of data in near-real time.
Nielsen
Households that participate are selected at random from a predefined sample based on census data. The census data provides critical information on household income, size, age of residents, etc. A set number of households from each group is then selected.
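Selecting a set number of households from each group is a form of stratified random sampling. The sketch below uses a hypothetical household list and stratifies on income band alone. The actual Nielsen procedure is more involved, and the function and field names here are illustrative only.

```python
import random
from collections import defaultdict


def stratified_sample(households, stratum_of, per_stratum, seed=None):
    """Randomly pick up to per_stratum households from each group (stratum)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for household in households:
        groups[stratum_of(household)].append(household)
    sample = []
    for stratum in sorted(groups):
        members = groups[stratum]
        k = min(per_stratum, len(members))  # small groups contribute everyone they have
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: 60 households, 20 in each income band.
bands_and_sizes = [(band, size) for band in ("low", "middle", "high") for size in (1, 2, 3, 4)]
households = [
    {"id": i, "income_band": band, "size": size}
    for i, (band, size) in enumerate(bands_and_sizes * 5)
]

panel = stratified_sample(households, lambda h: h["income_band"], per_stratum=4, seed=1)
print(len(panel), sorted({h["income_band"] for h in panel}))
```

Because every band contributes the same number of households, no group can be missed by chance. A single random draw from the whole list could miss a group.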
(Big Data) (SAS Institute) Variability
In addition to the increasing velocities and varieties of data, data flows can be inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal, and event-triggered peak data loads can be challenging to manage.
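One simple way to spot periodic or event-triggered peaks is to flag intervals whose load sits well above the typical level. This sketch uses hypothetical hourly message counts and a threshold of the mean plus two standard deviations. That rule is an illustrative choice, not an SAS method. Note also that a single very large spike inflates the very standard deviation it is measured against.

```python
from statistics import mean, pstdev


def find_peaks(hourly_counts, threshold_sd=2.0):
    """Return the hours whose load exceeds mean + threshold_sd * standard deviation."""
    mu = mean(hourly_counts)
    sigma = pstdev(hourly_counts)
    limit = mu + threshold_sd * sigma
    return [hour for hour, count in enumerate(hourly_counts) if count > limit]


# Hypothetical messages per hour over one day; hour 20 has an event-triggered spike.
counts = [120, 110, 100, 95, 90, 100, 150, 300, 420, 400, 380, 390,
          410, 400, 390, 380, 400, 450, 500, 520, 2400, 600, 300, 180]
print(find_peaks(counts))
```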
Government
Managing utilities, running agencies, dealing with traffic congestion, preventing crime. Governments must also address issues of transparency and privacy.
power of information
Numbers by themselves are useless. Until you know the context, the data by itself only provides you with the foundation for eventually organizing the data in such a way as to provide you with the information needed to find answers to the questions or tell the story.
(Big Data) Volume
Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. In the past, storing it all would have been a problem, but new technologies have eased that burden.
Health Care
Respecting privacy as it relates to patient records, treatment plans, prescription information while at the same time uncovering insights into improving patient care
What Are the Sources for Big Data?
Streaming data, Social media data, Publicly available sources
Big Data
The term "big data" became mainstream in the early 2000s via the work of industry analyst Doug Laney. His definition of the term incorporates the following (SAS, n.d.):
Gallup's method of selecting polling participants
Two of the most familiar polling companies are Gallup and Nielsen. Gallup generates a list of all phone numbers (landline and cell phone) in the United States. It then uses a subset of that list, covering all geographical areas based on area codes, to call and interview individuals.
Banking
Understanding customers and customer satisfaction, minimizing risk and fraud while maintaining regulatory compliance
datum
a single fact is a datum, and data is the plural form
What Is the Importance of Big Data?
Analysis of collected data can answer questions about potential reductions in cost and time, and it can help in making smart decisions about new product development (SAS, n.d.). Gathered data can also support other critical business tasks. It can be used to determine the causes of failures, issues, and/or defects, or to detect and mitigate fraudulent behavior before the organization is severely affected (SAS, n.d.). If done properly, data collection (and analysis) allows a business to focus time, personnel, and resources on the issues that will generate the greatest returns.
surveys and polls
are the means by which the data is collected from the sample.
Discrete data
can only take certain values, such as whole numbers. Examples: there are 32 students in the class, the hard drive can store eight gigabytes, the test score was 89 percent.
Streaming data
comes from the information infrastructure within an organization. For example, all transactions accomplished via the IT systems within the business are captured on a daily or even hourly basis. This includes logs of daily activities and email or other types of messages received from internal sources.
Why Is Data Collected?
Data is collected to tell a story or solve a problem. The process involves the question being asked, the type of data collected, and follow-on review.
Information is
data that has been converted into a form that makes understanding of the data useful; it is data with meaning
Publicly available sources
Examples include data.gov, the CIA World Factbook, and the European Union Open Data Portal. A web search for "sources for data sets" will return a long list of data sources covering many areas of interest.
Qualitative data
descriptive data that includes such facts as color, texture, feel, description of an experience, perceptions of strengths or weaknesses, etc. This is data to which numbers are not normally assigned.
How Is Data Collected?
Methods include direct observation, census, sampling, and physical measurement. You can collect data yourself, find data that has already been collected, ask for updates, or gain access to data not open to the public.
Quantitative data
facts which are presented as numbers such as test scores, number of students in the class, number of words on a page, capacity of a hard drive. Quantitative data can also be subdivided into discrete and continuous data.
Categorical data
groups the facts into a category such as "new" or "used" or "for sale" or "not for sale," etc.
Social media data
includes audio, photo, and video files gathered by monitoring activity on Facebook, the business's website, or the websites of related businesses or competitors. This data can aid marketing, sales, and customer support functions.
Data that is
incorrect or used outside of the context for which it was gathered may result in incorrect information.
Data has
no value until it is converted into usable information ("Value" here only refers to the fact that, standing alone, raw data does not tell a story or answer a question. The data itself may have great "value," financial or otherwise, to the person or entity that seeks to use that data).
Sampling, Surveys, and Polls
It is often not cost-effective or even practical to contact every member of the group or population for data input. Instead, most such studies gather responses from a sample, or subset, of the entire population. The results of sampling are considered to be representative of the population. Randomly selecting the subset of participants is the primary way of ensuring that every member of the population had a chance of being selected.
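Python's standard `random.sample` draws without replacement and gives every member of the population the same chance of selection. This minimal sketch uses a hypothetical population of numbered members. The `simple_random_sample` wrapper is illustrative.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member is equally likely to be chosen."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population: 10,000 members identified by number.
population = list(range(1, 10_001))
sample = simple_random_sample(population, 500, seed=42)
print(len(sample), len(set(sample)))  # 500 500: no member is picked twice
```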
In What Format Is Data Collected?
Ideally, obtain the data in machine-readable form, meaning a form that can be imported into a computer program. The most common format for exchanging or importing data is comma-separated values (CSV).
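Python's standard `csv` module reads this format directly. The sketch below keeps a small hypothetical file in a string; with a real file you would pass `open(path, newline="")` instead. Every field arrives as text. A quoted field may also contain a comma, which is why a CSV parser is safer than splitting lines on commas.

```python
import csv
import io

# Hypothetical CSV content; a real file would be opened with open(path, newline="").
raw = 'student,test_score,class_size\nAna,89,32\nBen,76,32\n"Lee, Kim",92,32\n'

rows = []
for row in csv.DictReader(io.StringIO(raw)):
    # csv yields every field as a string, so numeric columns are converted explicitly.
    rows.append({
        "student": row["student"],
        "test_score": int(row["test_score"]),
        "class_size": int(row["class_size"]),
    })

for r in rows:
    print(r)
```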
Continuous data
can take any value within a range. Optimal tire pressures may fall anywhere between 30 PSI (pounds per square inch) and 33 PSI, including any fractional pressure between these values.
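The following code contrasts continuous data with discrete data, such as a count of 32 students. A continuous measurement is checked against a range, while a discrete count must be a whole number. The 30 to 33 PSI bounds come from the tire example. Treating both ends as inclusive is an assumption, and the function names are illustrative.

```python
def is_valid_tire_pressure(psi: float, low: float = 30.0, high: float = 33.0) -> bool:
    """Continuous data: any value in [low, high] is valid, fractions included (ends assumed inclusive)."""
    return low <= psi <= high


def is_valid_student_count(count) -> bool:
    """Discrete data: a count of students must be a non-negative whole number."""
    return isinstance(count, int) and not isinstance(count, bool) and count >= 0


print(is_valid_tire_pressure(31.75))  # True: fractional pressures are allowed
print(is_valid_tire_pressure(34.0))   # False: outside the range
print(is_valid_student_count(32))     # True
print(is_valid_student_count(32.5))   # False: half a student is not a valid count
```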
Data remains
static. It does not necessarily improve over time; rather, data can decay as it becomes outdated or no longer applies to the question being asked.
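One practical response to data decay is to flag records older than some maximum age before using them. This is a minimal sketch. The records, the `collected` field, and the 365-day limit are hypothetical, since how quickly data goes stale depends on the question being asked.

```python
from datetime import date, timedelta


def split_by_freshness(records, as_of, max_age_days):
    """Separate records still current from those collected before the cutoff date."""
    cutoff = as_of - timedelta(days=max_age_days)
    fresh = [r for r in records if r["collected"] >= cutoff]
    stale = [r for r in records if r["collected"] < cutoff]
    return fresh, stale


# Hypothetical customer address records with the date each was collected.
records = [
    {"customer": "A", "address": "12 Elm St", "collected": date(2024, 5, 1)},
    {"customer": "B", "address": "9 Oak Ave", "collected": date(2019, 3, 15)},
]
fresh, stale = split_by_freshness(records, as_of=date(2024, 6, 1), max_age_days=365)
print([r["customer"] for r in fresh], [r["customer"] for r in stale])
```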
Data can be
stored, copied, duplicated, modified, and/or moved.
Information becomes
the basis for understanding a question, or making inferences, or making decisions; it helps tell the story.
How Is Information Used?
Its uses are tied closely to the "need" (the question or story) that prompted gathering the information. As such, the list does not directly indicate how the information is used, but why it was gathered. It can be assumed that the information is then used to address that need (Taylor, 1991):
Information results
when context is added to data—what, when, where, why, how the data was collected
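This small sketch illustrates that information is data plus context. The same number becomes meaningful only once the what, when, where, why, and how are attached. The `Observation` class and all of its values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Observation:
    """A raw value bundled with the context that turns it into information."""
    value: float
    what: str
    unit: str
    when: date
    where: str
    why: str
    how: str

    def describe(self) -> str:
        return (f"{self.what}: {self.value} {self.unit} ({self.where}, "
                f"{self.when.isoformat()}), collected via {self.how} to {self.why}")


raw_value = 89  # on its own, this number answers no question

info = Observation(
    value=raw_value,
    what="Average test score",
    unit="percent",
    when=date(2024, 9, 6),
    where="Class A",
    why="check progress after week 1",
    how="a graded exam",
)
print(info.describe())
```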