Abcc Ch. 17
Load
Once data is transformed and normalized, it is ready to be transferred into the data warehouse or datamart. Loading may happen weekly, daily, or even hourly. The more often data is loaded, the more up to date and timely the analytic reports can be.
Transform
Once you've extracted data, it needs to become normalized. Data is no good to you unless it's organized. Normalizing data means that your data is typically organized into the fields and records of a relational database. Normalizing provides the standard data format required to analyze data.
Information
Processed data that means something
Volume
Refers to the amount of data collected by an organization. How much data does your business need, and further, where do you keep it once you've collected it?
Structured Data Categories (3)
Structured data, Unstructured data, Semi-structured data
5. In what decade did businesses start using DSS?
- 1980's - [1950s] - 1960's - 1970's 1950S
9. How much data is unstructured?
- 70% - [80%] - 90% - 100% 80%
19. What uses a cluster system that allows data to be stored on multiple servers?
- Business Intelligence - Constellation Server Technologies (Oracle Group LLC) - Early Bot Technologies (Lycos Engine) - [Hadoop] HADOOP
14. In BI, what does the acronym CRM mean?
- Certified Reference Material - [Customer Relationship Management] - Computer Resource Management - Crew Resource Management CUSTOMER RELATIONSHIP MANAGEMENT
24. What software helps BI become more usable through visualization?
- Console - Boundary - GUI - [Dashboards] DASHBOARDS
22. What is a graphical interface that characterizes specific data analysis through visualization?
- Console - [Dashboard] - GUI - Boundary DASHBOARD
16. What is a smaller Data Warehouse?
- Data Depository - Data Silo - Data Store - [Datamart] DATAMART
20. What is another term for Data Mining?
- Data Recognition - Data Detection - Early Data Encounter (EDE, at the server) - [Data Discovery] DATA DISCOVERY
15. What consolidates disparate data?
- Data Silo(s) (usually multiple repositories) - Data Depository - Data Store - [Data Warehouse] DATA WAREHOUSE
18. What is Doug Cutting's son's toy elephant's name?
- Edward - Edvald - [Hadoop] - Eugene HADOOP
23. What attempts to reveal future patterns in the marketplace?
- Extrapolative Analytics - Projecting Analytics - [Predictive Analytics] - Prognostic Analytics PREDICTIVE ANALYTICS
21. What is another term for Text Analytics?
- Text Extraction - Text Withdrawal - [Text-mining] - Text Abstraction TEXT MINING
17. What provides a standard format to organize data?
- Uniform Data (Stored) - Standardized Data - [Normalization] - Prevailing Data NORMALIZATION
7. What kind of data resides in fixed formats?
- Unstructured - Semi-Structured - Structured and Semi-Structured - [Structured] STRUCTURED
13. What refers to quality of data?
- Variety - [Veracity] - Velocity - Volume VERACITY
12. What refers to different kinds of data?
- Velocity - Veracity - [Variety] - Volume VARIETY
11. What refers to how fast data is collected?
- Volume - Veracity - Variety - [Velocity] VELOCITY
4. Large collected datasets are called what?
- Yottabyte Sets - Large Data - [Big Data] - Yottabyte Data Sets BIG DATA
1. What refers to an assortment of software applications to analyze an organization's raw data?
- [BI] - UML - UML and SDLC - SDLC BI
3. In relation to BI, what does the acronym DSS mean?
- [Decision Support System] - Department of Social Security - Digital Signature Standard - Digital Spread Spectrum DECISION SUPPORT SYSTEM
6. What held up the emergence of the Cloud?
- [Internet speeds] - Security - Hard Drive Space - Hard Drive Scarcity INTERNET SPEED
2. Why is keeping enormous amounts of inventory a bad thing?
- [It's too expensive] - Your customers always find what they need - It doesn't take up too much space - It's inexpensive IT'S TOO EXPENSIVE
In BI, which of the following is not part of ETL?
- [Transmission] - Transform - Load - Extract TRANSMISSION
8. What data is disorganized and not easily read?
- [Unstructured] - Unstructured and Structured - Structured - Semi-Structured UNSTRUCTURED
10. What refers to the amount of data?
- [Volume] - Variety - Veracity - Velocity VOLUME
Process of collecting internal data
1) Take an inventory of the data your organization makes, and figure out who or what makes it. 2) Find out what your customers are thinking. 3) Decide what kind of data you require by asking what questions you need to answer; coming up with questions goes a long way toward the answers you need and toward a baseline for what data to collect. 4) Find a place to keep the data you want to retrieve.
When did organizations start using and processing data & information?
In the 1950s, organizations started using and processing data and information to support the tactical and strategic decisions they made, or were going to make, which influenced the emergence of decision support systems (DSS).
Unstructured data
80% of all data is unstructured, disorganized data that cannot be easily read or processed by a computer because it is not stored in rows and columns like traditional data tables.
Big Data
Collected data sets from - smartphone metadata - Internet usage records - social media activity - computer usage records - and countless other data sources sifted for patterns and trends.
Concepts of Data analytics (4)
Data mining, Topic analytics, Text analytics, Business analytics
Forms of business analytics (3)
Descriptive analytics, Predictive analytics, Decision analytics
Velocity
How fast can you collect data, and more importantly, how quickly can you analyze it?
Veracity
Is the data your organization collected any good? Just because some other business amasses data you may want doesn't necessarily make it trustworthy or valuable. Lots of data sources have data that is not "clean," meaning it may be too fragmented to be valuable or usable, or that it was simply collected poorly in the first place.
Data analysis
Makes sense of an organization's collected data and turns it into useful information to validate future decisions. It basically applies statistical and logical techniques to define, illustrate, and evaluate data.
Four V's of Big data
Volume, Velocity, Variety, Veracity
Variety
You may have identified what data you wish to collect, but is it Structured, Semi-structured, or Unstructured? Very likely it is a combination of all three, which could potentially throw a wrench into your data collection gears.
Extract
After you've determined where your data resides, you can begin extracting it, often from Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) software. One of ERP's main functions is to centralize an organization's data so that it ends up being a wealth of data with value across the organization. The extraction step sometimes converts unstructured data, like text notes, into semi-structured or structured data by tagging it with metadata.
Dashboards
are easy-to-use graphical interfaces that characterize specific data analysis through visualization and make it a lot easier to make sense of the data and see the resulting information.
Business analytics
attempts to make connections between data so organizations can try to predict future trends that may give them a competitive advantage. Business analytics can also uncover computer system inadequacies within an organization.
Predictive analytics
attempts to reveal future patterns in a marketplace, essentially trying to predict the future by looking for data correlations between one thing, and any other things that pertain to it.
Map reduce is not _________________
bringing vast amounts of data back to centralized servers, as Data Warehouse and Datamart techniques do; this saves immense amounts of network bandwidth and resources.
Decision analytics
builds on Predictive Analytics to make decisions about future industries and marketplaces. Decision Analytics looks at an organization's internal data, analyzes external conditions like supply abundance, and then endorses a best course of action.
SAP BusinessObjects Dashboards
data visualization software that allows you to create and export interactive dashboards. These dashboards contain various components, such as charts, graphs, and buttons, that are bound to data
ETL can ___________ and requires __________
eat bandwidth and storage at an alarming rate; requires massive amounts of time, money, and storage space.
Businesses throughout history have tried to _________________
recognize trends to best serve their customers and, in turn, become more profitable. Essentially, organizations have always tried to predict the future.
Apache Hadoop
is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
Decision support systems (DSS)
which evolved out of academic research, are computer-based systems that support an organization's decision-making activities.
With data, businesses consider
how much data they are going to collect, how fast they can analyze it, what type of data is collected, and whether the data is reliable.
Hadoop
takes its name from a toy elephant owned by Doug Cutting's young son, and evolved as an infrastructure for storing and processing large sets of data across multiple servers. Instead of centralizing files in one place, like a Data Warehouse or Datamart, Hadoop uses a cluster system that allows files to be stored on multiple servers.
Semi structured data
lands somewhere in-between Structured and Unstructured data and can possibly be converted into structured data, but not without a lot of work.
Data
raw, unorganized facts
Business intelligence (BI)
refers to an assortment of software applications used to analyze an organization's raw data, described as computer applications that change data into significant, meaningful information that helps organizations make better decisions - often described as "the set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes".
Structured data
resides in fixed formats. It is typically well labeled, often with the traditional fields and records of common data tables. It does not have to be "table-like," but it needs to have recognizable patterns that allow it to be more easily queried and searched for in a standard format.
Datamart
a smaller, more focused warehouse that limits the complexity of databases, so you can't answer as many questions as with a Data Warehouse, but it is cheaper to implement.
Data mining
sometimes called Data Discovery, is the examination of huge sets of data to find patterns and connections, and to identify outliers.
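As a small illustration of the "identify outliers" part of data mining, here is a minimal sketch that flags values far from the average using a z-score. The sample order totals, the function name `find_outliers`, and the threshold of 2.0 are all assumptions made for this example; real data mining tools use far more sophisticated methods.

```python
import statistics

# Hypothetical daily order totals; the last value is suspiciously large.
ORDER_TOTALS = [10, 12, 11, 13, 12, 95]

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the (assumed) threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]
```

Calling `find_outliers(ORDER_TOTALS)` singles out the one value that does not fit the pattern of the rest.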
Text analytics
sometimes called Text-mining, hunts through unstructured text data to look for useful patterns, like whether an organization's customers on Facebook.com or Instagram.com are unsatisfied with its products or service.
Hadoop is flexible enough _____________
that it allows for one query to be issued that searches through multiple servers on the fly (ad hoc queries), but it is very difficult to implement and run and requires highly qualified data scientists.
Descriptive analytics
the baseline on which other types of analytics are built. Descriptive analytics defines past data you already have that can be grouped into significant pieces, like a department's sales results, and also starts to reveal trends.
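The idea of grouping past data into significant pieces can be sketched in a few lines. The departments, years, and amounts below are made-up sample data, and `totals_by_department` and `trend` are hypothetical helper names used only for this illustration.

```python
from collections import defaultdict

# Hypothetical past sales records: (department, year, amount).
SALES = [
    ("Toys", 2023, 100.0),
    ("Toys", 2024, 150.0),
    ("Books", 2023, 80.0),
    ("Books", 2024, 60.0),
]

def totals_by_department(sales):
    """Group past sales into one total per department."""
    totals = defaultdict(float)
    for dept, _year, amount in sales:
        totals[dept] += amount
    return dict(totals)

def trend(sales, dept):
    """Change in sales between the earliest and latest year for one department."""
    rows = sorted((year, amount) for d, year, amount in sales if d == dept)
    return rows[-1][1] - rows[0][1]
```

A positive `trend` value suggests a department's sales are growing, while a negative one suggests they are shrinking; both only describe what already happened.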
Data visualization
the graphic display of the results of data mining, analytics, and BI in general, typically in real time. Many times, data and information are just too massive and confusing to rely on numbers alone, so products like PowerPoint and Dashboards have become invaluable tools. Data Visualization software helps BI program results become more understandable and, therefore, more meaningful in decision making.
Map Reduce
the processing arm, or engine, of Hadoop that allows data to be queried and processed directly on the server where it lives, instead of moving the data across the network to be analyzed on another computer. Only the query is transported through the network.
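The map and reduce idea can be imitated in plain Python to see why so little data has to travel. This is only a toy simulation, not Hadoop's actual API: the `CLUSTER` dictionary stands in for servers, and the function names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical cluster: each key is a server, each value is the data stored on it.
CLUSTER = {
    "server_a": ["latte", "mocha", "latte"],
    "server_b": ["mocha", "espresso"],
}

def map_on_server(local_orders):
    """Map step: runs where the data lives and returns only a small summary."""
    return Counter(local_orders)

def reduce_counts(partials):
    """Reduce step: merge the per-server summaries into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

def run_query(cluster):
    """Send the same query to every server, then combine the small answers."""
    partials = [map_on_server(data) for data in cluster.values()]
    return reduce_counts(partials)
```

In this sketch the raw order lists never leave their "server"; only the per-server counts are combined, which mirrors the bandwidth saving described above.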
Extract transform and load (ETL)
tools used to standardize data across systems, allowing it to be queried. The steps must happen in order: 1) extracting the data, 2) transforming the data so it fits into your data warehouse or datamart, and 3) loading the data into the data warehouse or datamart.
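The three ordered steps can be sketched as three small functions. This is a minimal illustration under stated assumptions: `RAW_ROWS` is made-up data standing in for a CRM export, an in-memory SQLite database stands in for the data warehouse, and the table and column names are hypothetical.

```python
import sqlite3

# Hypothetical raw records, as they might come out of a CRM export.
RAW_ROWS = [
    {"name": "  Ada Lovelace ", "total": "120.50"},
    {"name": "grace hopper", "total": "80"},
]

def extract():
    """Step 1: pull the raw records from the source system."""
    return list(RAW_ROWS)

def transform(rows):
    """Step 2: normalize each record into clean, typed fields."""
    return [(r["name"].strip().title(), float(r["total"])) for r in rows]

def load(records, conn):
    """Step 3: write the normalized records into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

def run_etl(conn):
    """Run the steps in order, then query the loaded table."""
    load(transform(extract()), conn)
    query = "SELECT customer, total FROM sales ORDER BY customer"
    return conn.execute(query).fetchall()
```

Running `run_etl(sqlite3.connect(":memory:"))` returns the cleaned rows, showing that the data only becomes queryable after all three steps have run in order.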
Topic analytics
tries to catalog phrases of an organization's customer feedback into relevant topics. For example, if a customer said, "the barista was friendly", that would be categorized under the topic "Employee Friendliness."
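The barista example can be mimicked with a very naive keyword lookup. The `TOPIC_KEYWORDS` table and the `categorize` function are assumptions invented for this sketch; real topic analytics products rely on much richer language models than substring matching.

```python
# Hypothetical keyword-to-topic table used only for this illustration.
TOPIC_KEYWORDS = {
    "Employee Friendliness": ["friendly", "rude", "polite"],
    "Wait Time": ["slow", "fast", "wait"],
}

def categorize(feedback):
    """Return every topic whose keywords appear in the feedback text."""
    text = feedback.lower()
    return [
        topic
        for topic, words in TOPIC_KEYWORDS.items()
        if any(word in text for word in words)
    ]
```

Because this matches raw substrings, a word like "breakfast" would wrongly trigger "Wait Time" through "fast", which hints at why production systems need more careful text processing.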
Data warehouses
used to consolidate disparate data in a central location, holding yottabytes of data.